thoughts
growing up, i found some textbooks easier to understand than others, and with some i would simply be lost. i always wanted to know why. i was curious why some subject matter registered easily while some didn't, and whether it was (i) a personal thing, my individual aptitude for the subject; (ii) the subject matter itself and how inherently difficult or complex it was; or (iii) something about how the material was presented in the book.
with my training in network and data science, and a growing interest in meta-science, i want to answer that childhood curiosity using college-level textbooks from physics, chemistry, mathematics, and biology, parsing each one as a network of knowledge that grows as one reads. using LLMs, the recent breakthrough in AI vision and language technology, i will convert books into structured data, such as JSON objects, and then analyze the networks of concepts and the order in which they appear. first, i will study how the knowledge network grows within each individual book, to quantify the complexity of its structure; then i will compare those structures across books and across disciplines, including different books that teach the same topic.
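to make this concrete, here is a minimal sketch of the growth analysis, assuming the LLM extraction step has already produced a JSON list of concept records; the field names below are a placeholder schema of my own, not a fixed format:

```python
# replay a book in reading order and record how its concept network grows.
# assumed (invented) extraction format:
#   [{"concept": "momentum", "section": 3, "depends_on": ["velocity", "mass"]}, ...]
import json
import networkx as nx

def growth_curve(extraction_path: str):
    """One snapshot of network statistics per reading step."""
    with open(extraction_path) as f:
        records = sorted(json.load(f), key=lambda r: r["section"])

    g = nx.DiGraph()
    snapshots = []
    for rec in records:
        g.add_node(rec["concept"])
        for prereq in rec["depends_on"]:
            g.add_edge(prereq, rec["concept"])  # edge: prerequisite -> new concept
        snapshots.append({
            "section": rec["section"],
            "n_concepts": g.number_of_nodes(),
            "n_links": g.number_of_edges(),
            # longest prerequisite chain so far, one crude measure of depth
            "depth": nx.dag_longest_path_length(g)
                     if nx.is_directed_acyclic_graph(g) else None,
        })
    return snapshots
```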
if successful, this research would improve our theoretical representation of knowledge and learning, and capture the kinds of structures we are already designing into textbooks. it might also open up better ways to structure material for human consumption in the future. now that we have LLMs, restructuring books is no major struggle; it should be possible to do it on the fly. things will become more fluid, and we might even have books that adjust themselves agentically to each student. so an investigation of the inherent structure of knowledge, and of our options for rearranging it, could be quite useful.
Lean-based automated theorem proving (ATP) is one of the fastest-moving, most rigorously quantified "scientific" arenas we have: every step is verifiable, and the full developmental history of the library and tactic ecosystem is recorded. that makes it a near-ideal testbed where meta-science can stop being purely observational and become interventional: we can define discovery as a computational search process over a formally specified space (proof states, tactics, and premises), measure progress precisely, and design systems that actually accelerate proof and lemma discovery.
modern ATP work explicitly frames proving as sequential decision-making (often an MDP) and improves performance via structured search and learning—e.g., retrieval-augmented premise selection and tactic prediction in LeanDojo [Yang 2023]; reinforcement learning from proof-assistant feedback combined with Monte-Carlo tree search in DeepSeek-Prover-V1.5 [Xin 2025a]; scalable best-first search (BFS-Prover) [Xin 2025b]; critic-guided expert iteration for stepwise proving (InternLM2.5-StepProver) [Wu 2024]; and "growing libraries" approaches that explicitly model how new lemmas expand the adjacent possible (LEGO-Prover) [Wang 2024a]. in parallel, autoformalization is rapidly scaling: translating natural-language math problems and proofs into formal statements (Lean Workbook [Ying 2024], TheoremLlama [Wang 2024b]), earlier autoformalization with LLMs in proof assistants [Wu 2022], and curriculum-style training over formal statements [Polu 2022]. recent work also targets the semantic mismatch between informal proofs and formal tactic steps by introducing intermediate representations such as a "Chain of States" to align informal logical transitions with formal proof states [Wang n.d.].
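to make the sequential-decision framing concrete: stripped of their learned components, most of these systems share a best-first search loop over proof states. in the sketch below, `score`, `suggest_tactics`, and `apply_tactic` are stubs for the learned value function, the tactic generator, and the proof assistant; this is the shape of the loop, not any particular system's API.

```python
# best-first proof search in skeleton form: the MDP view from the papers above,
# with every learned or system-specific component left as a stub.
import heapq
import itertools

def best_first_search(initial_state, score, suggest_tactics, apply_tactic, budget=10_000):
    """Expand the most promising proof state until no goals remain."""
    counter = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(-score(initial_state), next(counter), initial_state, [])]
    for _ in range(budget):
        if not frontier:
            return None  # search space exhausted
        _, _, state, proof = heapq.heappop(frontier)
        if state.is_solved:
            return proof  # the tactic sequence that closed all goals
        for tactic in suggest_tactics(state):  # e.g. top-k samples from an LLM
            child = apply_tactic(state, tactic)  # proof assistant checks the step
            if child is not None:  # None = the tactic failed on this state
                heapq.heappush(frontier,
                               (-score(child), next(counter), child, proof + [tactic]))
    return None  # budget exhausted
```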
that is why i would like to work on this: Lean/Mathlib provides a uniquely clean marriage of (i) a fully verified, richly logged discovery process and (ii) a rapidly innovating AI toolchain, meaning meta-science can be used not just to observe scientific development but to accelerate it, with measurable, reproducible gains.
towards compositionality in concept learning
https://huggingface.co/datasets/internlm/Lean-Workbook
the dataset contains 57,231 problems in the Lean Workbook split and 82,893 problems in the Lean Workbook Plus split. each problem comes with a natural language statement, an answer, a formal statement, and a formal proof (if available).
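the set loads directly with Hugging Face's `datasets` library; the exact split and column names below should be checked against the dataset card:

```python
# quick look at the problem set; split/column names may differ from the card
from datasets import load_dataset

ds = load_dataset("internlm/Lean-Workbook")
print(ds)  # prints the available splits and their sizes
# each record should carry the natural language statement, the answer,
# the formal statement, and (when available) a formal proof
```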
a last set of goals
a last set of goals for this project: (i) to discover the tactics used in physics and the patterns of their usage, analogous to the tactics used in Lean. (ii) to grow Lean's tactic set by identifying the mechanism for creating new tactics; the working assumption is that tactics are a subset of reasoning traces, i.e., pieces of reasoning that are useful across many scenarios. (iii) to discover new predictive physics laws. (iv) to create new Lean tactics by formalizing candidates generated by LLMs, or perhaps by a more fundamental generative process, and testing those candidates by proving theorems in Lean. (v) to identify meta-traces within EProver saturation runs that lead to solutions, perhaps using RL, and to name them as tactics.
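goals (ii) and (iv) have a concrete mechanical core in Lean 4: a new tactic can be as simple as a named, reusable composition of existing reasoning steps. a minimal sketch, with a made-up name and composition (the mechanism is the point, not this particular tactic):

```lean
-- a "new" tactic as a named composition of existing ones; `try_close` and
-- its body are illustrative, not a proposal for an actually useful tactic
macro "try_close" : tactic => `(tactic| first | rfl | simp_all)

-- a candidate earns the name by closing goals across many theorems:
example (a b : Nat) (h : a = b) : b = a := by try_close
example (p : Prop) (hp : p) : p ∧ p := by try_close
```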
full references
Yang, Kaiyu, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J. Prenger, and Animashree Anandkumar. 2023. "LeanDojo: Theorem Proving with Retrieval-Augmented Language Models." Advances in Neural Information Processing Systems (NeurIPS 2023), 21573–21612.
Xin, Huajian, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. 2025a. "DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte Carlo Tree Search." International Conference on Learning Representations (ICLR 2025).
Xin, Ran, Chengguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, and Kai Shen. 2025b. "BFS-Prover: Scalable Best-First Tree Search for LLM-Based Automatic Theorem Proving." arXiv preprint arXiv:2502.03438.
Wu, Zijian, Suozhi Huang, Zhejian Zhou, Huaiyuan Ying, Jiayu Wang, Dahua Lin, and Kai Chen. 2024. "InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale Lean Problems." arXiv preprint arXiv:2410.15700.
Wang, Haiming, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, et al. 2024a. "LEGO-Prover: Neural Theorem Proving with Growing Libraries." International Conference on Learning Representations (ICLR 2024).
Ying, Huaiyuan, Zijian Wu, Yihan Geng, Jiayu Wang, Dahua Lin, and Kai Chen. 2024. "Lean Workbook: A Large-Scale Lean Problem Set Formalized from Natural Language Math Problems." arXiv preprint arXiv:2406.03847.
Wang, Ruida, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, and Tong Zhang. 2024b. "TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts." Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Association for Computational Linguistics.
Wu, Yuhuai, Albert Q. Jiang, Wenda Li, Markus Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy. 2022. "Autoformalization with Large Language Models." Advances in Neural Information Processing Systems (NeurIPS 2022) 35: 32353–32368.
Polu, Stanislas, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. 2022. "Formal Mathematics Statement Curriculum Learning." arXiv preprint arXiv:2202.01344.
Wang, Ziyu, Bowen Yang, Chenyi Li, Yuan Zhang, Shihao Zhou, Bin Dong, and Zaiwen Wen. n.d. "Translating Informal Proofs into Formal Proofs Using a Chain of States."
knowing all previously derived physics, with memory, and with access to the environment to run experiments and gain information, can an agent find the minimal-description-length physical law? another aspect that would be hard to model is the hardware needed to run experiments, which is itself an exploratory, adjacent-possible space; for now, let's avoid that. so we build a toy world, say the game of life, and put a neural network in it. it might be a weird situation, because the game of life is sufficiently complex for universal computation, but what a neural network would look like in it is unclear to me at this stage. my problem is that i want to run the neural network on the hardware that the game of life provides: a spatial map with rules of evolution, just as physics in our world is the bedrock on which we must build neural networks out of silicon. i am sure it can be done for the game of life too, it's just not immediately obvious how. so let's instead choose a world more suitable for an agent to exist in. perhaps a larger neural network simulates the world around the agent, but with fixed laws; then we are back to the world models idea from Demis Hassabis and DeepMind, with Genie, a world simulator built on video generation.
rethinking this, it is perfectly fine to have a neural network inside the game of life. just assume an agent like that is possible: since the game of life is a universal computer, we might not know what the spatial representation of a neural network would look like in its 2D space, but we do not need to know, just as with brains in our universe. suppose there is an evolutionary process that can evolve neural networks inside the game of life. can such a network discover the exact symbolic equations of the game of life itself, rather than approximations and complex representations inside its weights? think minimum description length, Occam's razor, and the like.
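one thing worth noting: the entire "physics" of this toy world fits in a few lines, which is exactly what makes the minimum-description-length question crisp. a numpy sketch of the update rule the embedded agent would have to recover (the toroidal boundary is my choice, not part of the rules):

```python
# the complete physics of the toy world: Conway's game of life on a 2D 0/1 grid
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One tick of the game of life, with wrap-around (toroidal) edges."""
    # count each cell's 8 neighbours by summing shifted copies of the grid
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # a cell is alive next tick iff it has exactly 3 neighbours,
    # or it is alive now and has exactly 2
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(grid.dtype)
```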
the next important decision: what initial capabilities do we assign to this agent? RL to evolve its own architecture and reward patterns? memory? perhaps an LLM that it builds and trains itself, where we hand it the autoencoder architecture. in practice we will substitute an LLM from our world to save time, being a bit hand-wavy about the self-contained nature of the experiment, purely due to the compute constraints i have (unless i get hired by DeepMind or someplace where this constraint goes away).
all in all, it's cheating to instantiate this agent with these massive capabilities at the start, skipping much of the building phase that would come before, but that is fine; this will still be educational. even the process of creating such a system is a good first step toward systems that can quantify the process of scientific discovery itself. that's the end goal: to learn how science works and how it can be made faster.
experiments
open AI systematic lit review →
science is expanding faster than we can review it, creating a bottleneck for policymakers trying to act on the latest consensus. existing AI tools help, but they're opaque and miss relevant work. this project builds an open, transparent tool for automated literature synthesis, starting with climate assessments like the IPCC's. we expand from seed papers via citation networks and semantic search, filter with LLMs, validate findings with independent models, and integrate everything into live HTML reports with timestamps and source links. it's a glass-box approach, fully open source, that automates discovery and validation while keeping human authors in the loop. it works for any evidence-intensive domain, not just climate.
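in skeleton form the pipeline looks like this; every helper is a stub standing in for a citation API (OpenAlex, say), an LLM relevance filter, an extraction step, and a second independent validation model, so the sketch shows the control flow rather than a real implementation:

```python
# expand from seeds -> LLM-filter -> independently validate -> render.
def systematic_review(seed_ids, fetch_citations, is_relevant,
                      extract_findings, confirms, render_html, depth=2):
    """Grow the corpus from seed papers; keep only doubly-validated findings."""
    corpus, frontier = set(seed_ids), set(seed_ids)
    for _ in range(depth):  # breadth-first citation-network expansion
        candidates = {c for paper in frontier for c in fetch_citations(paper)}
        frontier = {p for p in candidates - corpus if is_relevant(p)}  # LLM screen
        corpus |= frontier
    # glass-box validation: a finding survives only if a second, independent
    # model re-confirms it against the source text
    findings = [(paper, finding) for paper in corpus
                for finding in extract_findings(paper) if confirms(paper, finding)]
    return render_html(findings)  # live report with timestamps and source links
```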
exploring the adjacent possible of equation aesthetics through user-guided genetic evolution: systematically testing individual parameter changes to map which mutations create preferred visual patterns, and learning from user selections to evolve variants that favor approved changes while keeping unapproved ones accessible at reduced probability (a minimal sketch of the loop follows below).
base equations by @yuruyurau
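a minimal sketch of the selection loop, with each equation abstracted to a coefficient vector and rendering left out; the specific re-weighting numbers (1.5x boost, 0.8x decay, 0.05 floor) are invented for illustration:

```python
# one round of user-guided evolution: mutate one coefficient per variant,
# let the user pick, then bias future mutations toward the approved dimension
# while keeping every other dimension reachable at reduced probability.
import random

def evolve_step(params, weights, pick, rate=0.1, n_variants=4):
    """`pick` is a stub that renders the variants and returns the chosen index."""
    variants = []
    for _ in range(n_variants):
        i = random.choices(range(len(params)), weights=weights, k=1)[0]
        child = list(params)
        child[i] += random.gauss(0.0, rate)  # nudge exactly one coefficient
        variants.append((i, child))
    chosen = pick(variants)
    for j, (i, _) in enumerate(variants):
        # approved mutation dimension becomes more likely; unapproved ones
        # decay but never drop to zero, so they stay in the adjacent possible
        weights[i] = weights[i] * 1.5 if j == chosen else max(weights[i] * 0.8, 0.05)
    return variants[chosen][1], weights
```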