thoughts
growing up, i found some textbooks easier to understand than others, and with some i would simply be lost. i always wanted to know why. i was curious why some subject matter registered easily while some didn't, and whether it was (i) a personal thing, my individual aptitude for the subject; (ii) the subject matter itself and how inherently difficult or complex it was; or (iii) something about how the material was presented in the book.
with my training in network and data science, and a growing interest in meta-science, i want to answer that childhood curiosity using college-level textbooks from physics, chemistry, mathematics, and biology, parsing each one as a network of knowledge that grows as one reads. using LLMs, the recent breakthrough in AI vision and language technology, i will convert books into structured data, such as JSON objects, and then analyze the networks of concepts and the order in which they appear. first, i will study how the knowledge network grows within each individual book, to quantify the complexity of its structure; then i will compare those structures across books and across disciplines, including different books that teach the same topic.
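to make this concrete, here is a minimal sketch of the growth analysis, assuming the LLM extraction step has already produced a JSON list of concept records; the field names below are a placeholder schema of my own, not a fixed format:

```python
# replay a book in reading order and record how its concept network grows.
# assumed (invented) extraction format:
#   [{"concept": "momentum", "section": 3, "depends_on": ["velocity", "mass"]}, ...]
import json
import networkx as nx

def growth_curve(extraction_path: str):
    """One snapshot of network statistics per reading step."""
    with open(extraction_path) as f:
        records = sorted(json.load(f), key=lambda r: r["section"])

    g = nx.DiGraph()
    snapshots = []
    for rec in records:
        g.add_node(rec["concept"])
        for prereq in rec["depends_on"]:
            g.add_edge(prereq, rec["concept"])  # edge: prerequisite -> new concept
        snapshots.append({
            "section": rec["section"],
            "n_concepts": g.number_of_nodes(),
            "n_links": g.number_of_edges(),
            # longest prerequisite chain so far, one crude measure of depth
            "depth": nx.dag_longest_path_length(g)
                     if nx.is_directed_acyclic_graph(g) else None,
        })
    return snapshots
```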
if successful, this research would improve our theoretical representation of knowledge and learning, and capture the kinds of structures we are already designing into textbooks. it might also open up better ways to structure material for human consumption in the future. now that we have LLMs, restructuring books is no major struggle; it should be possible to do it on the fly. things will become more fluid, and we might even have books that adjust themselves agentically to each student. so an investigation of the inherent structure of knowledge, and of our options for rearranging it, could be quite useful.
Lean-based automated theorem proving (ATP) is one of the fastest-moving, most rigorously quantified "scientific" arenas we have: every step is verifiable, and the full developmental history of the library and tactic ecosystem is recorded. that makes it a near-ideal testbed where meta-science can stop being purely observational and become interventional: we can define discovery as a computational search process over a formally specified space (proof states, tactics, and premises), measure progress precisely, and design systems that actually accelerate proof and lemma discovery.
modern ATP work explicitly frames proving as sequential decision-making (often an MDP) and improves performance via structured search and learning—e.g., retrieval-augmented premise selection and tactic prediction in LeanDojo [Yang 2023]; reinforcement learning from proof-assistant feedback combined with Monte-Carlo tree search in DeepSeek-Prover-V1.5 [Xin 2025a]; scalable best-first search (BFS-Prover) [Xin 2025b]; critic-guided expert iteration for stepwise proving (InternLM2.5-StepProver) [Wu 2024]; and "growing libraries" approaches that explicitly model how new lemmas expand the adjacent possible (LEGO-Prover) [Wang 2024a]. in parallel, autoformalization is rapidly scaling: translating natural-language math problems and proofs into formal statements (Lean Workbook [Ying 2024], TheoremLlama [Wang 2024b]), earlier autoformalization with LLMs in proof assistants [Wu 2022], and curriculum-style training over formal statements [Polu 2022]. recent work also targets the semantic mismatch between informal proofs and formal tactic steps by introducing intermediate representations such as a "Chain of States" to align informal logical transitions with formal proof states [Wang n.d.].
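to make the sequential-decision framing concrete: stripped of their learned components, most of these systems share a best-first search loop over proof states. in the sketch below, `score`, `suggest_tactics`, and `apply_tactic` are stubs for the learned value function, the tactic generator, and the proof assistant; this is the shape of the loop, not any particular system's API.

```python
# best-first proof search in skeleton form: the MDP view from the papers above,
# with every learned or system-specific component left as a stub.
import heapq
import itertools

def best_first_search(initial_state, score, suggest_tactics, apply_tactic, budget=10_000):
    """Expand the most promising proof state until no goals remain."""
    counter = itertools.count()  # tie-breaker so heapq never compares states
    frontier = [(-score(initial_state), next(counter), initial_state, [])]
    for _ in range(budget):
        if not frontier:
            return None  # search space exhausted
        _, _, state, proof = heapq.heappop(frontier)
        if state.is_solved:
            return proof  # the tactic sequence that closed all goals
        for tactic in suggest_tactics(state):  # e.g. top-k samples from an LLM
            child = apply_tactic(state, tactic)  # proof assistant checks the step
            if child is not None:  # None = the tactic failed on this state
                heapq.heappush(frontier,
                               (-score(child), next(counter), child, proof + [tactic]))
    return None  # budget exhausted
```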
that is why i would like to work on this: Lean/Mathlib provides a uniquely clean marriage of (i) a fully verified, richly logged discovery process and (ii) a rapidly innovating AI toolchain, meaning meta-science can be used not just to observe scientific development but to accelerate it, with measurable, reproducible gains.
towards compositionality in concept learning
https://huggingface.co/datasets/internlm/Lean-Workbook
the dataset contains 57,231 problems in the Lean Workbook split and 82,893 problems in the Lean Workbook Plus split. each problem comes with a natural language statement, an answer, a formal statement, and a formal proof (if available).
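the set loads directly with Hugging Face's `datasets` library; the exact split and column names below should be checked against the dataset card:

```python
# quick look at the problem set; split/column names may differ from the card
from datasets import load_dataset

ds = load_dataset("internlm/Lean-Workbook")
print(ds)  # prints the available splits and their sizes
# each record should carry the natural language statement, the answer,
# the formal statement, and (when available) a formal proof
```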
a last set of goals
a last set of goals for this project: (i) to discover the tactics used in physics and the patterns of their usage, analogous to the tactics used in Lean. (ii) to grow Lean's tactic set by identifying the mechanism for creating new tactics; the working assumption is that tactics are a subset of reasoning traces, i.e., pieces of reasoning that are useful across many scenarios. (iii) to discover new predictive physics laws. (iv) to create new Lean tactics by formalizing candidates generated by LLMs, or perhaps by a more fundamental generative process, and testing those candidates by proving theorems in Lean. (v) to identify meta-traces within EProver saturation runs that lead to solutions, perhaps using RL, and to name them as tactics.
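goals (ii) and (iv) have a concrete mechanical core in Lean 4: a new tactic can be as simple as a named, reusable composition of existing reasoning steps. a minimal sketch, with a made-up name and composition (the mechanism is the point, not this particular tactic):

```lean
-- a "new" tactic as a named composition of existing ones; `try_close` and
-- its body are illustrative, not a proposal for an actually useful tactic
macro "try_close" : tactic => `(tactic| first | rfl | simp_all)

-- a candidate earns the name by closing goals across many theorems:
example (a b : Nat) (h : a = b) : b = a := by try_close
example (p : Prop) (hp : p) : p ∧ p := by try_close
```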
full references
Yang, Kaiyu, Aidan Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan J. Prenger, and Animashree Anandkumar. 2023. "LeanDojo: Theorem Proving with Retrieval-Augmented Language Models." Advances in Neural Information Processing Systems (NeurIPS 2023), 21573–21612.
Xin, Huajian, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, et al. 2025a. "DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte Carlo Tree Search." International Conference on Learning Representations (ICLR 2025).
Xin, Ran, Chengguang Xi, Jie Yang, Feng Chen, Hang Wu, Xia Xiao, Yifan Sun, Shen Zheng, and Kai Shen. 2025b. "BFS-Prover: Scalable Best-First Tree Search for LLM-Based Automatic Theorem Proving." arXiv preprint arXiv:2502.03438.
Wu, Zijian, Suozhi Huang, Zhejian Zhou, Huaiyuan Ying, Jiayu Wang, Dahua Lin, and Kai Chen. 2024. "InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale Lean Problems." arXiv preprint arXiv:2410.15700.
Wang, Haiming, Huajian Xin, Chuanyang Zheng, Lin Li, Zhengying Liu, Qingxing Cao, Yinya Huang, Jing Xiong, Han Shi, Enze Xie, et al. 2024a. "LEGO-Prover: Neural Theorem Proving with Growing Libraries." International Conference on Learning Representations (ICLR 2024).
Ying, Huaiyuan, Zijian Wu, Yihan Geng, Jiayu Wang, Dahua Lin, and Kai Chen. 2024. "Lean Workbook: A Large-Scale Lean Problem Set Formalized from Natural Language Math Problems." arXiv preprint arXiv:2406.03847.
Wang, Ruida, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, and Tong Zhang. 2024b. "TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts." Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024). Association for Computational Linguistics.
Wu, Yuhuai, Albert Q. Jiang, Wenda Li, Markus Rabe, Charles Staats, Mateja Jamnik, and Christian Szegedy. 2022. "Autoformalization with Large Language Models." Advances in Neural Information Processing Systems (NeurIPS 2022) 35: 32353–32368.
Polu, Stanislas, Jesse Michael Han, Kunhao Zheng, Mantas Baksys, Igor Babuschkin, and Ilya Sutskever. 2022. "Formal Mathematics Statement Curriculum Learning." arXiv preprint arXiv:2202.01344.
Wang, Ziyu, Bowen Yang, Chenyi Li, Yuan Zhang, Shihao Zhou, Bin Dong, and Zaiwen Wen. n.d. "Translating Informal Proofs into Formal Proofs Using a Chain of States."
knowing all previously derived physics, with memory, and with access to the environment to run experiments and gain information, can an agent find the minimal-description-length physical law? another aspect that would be hard to model is the hardware needed to run experiments, which is itself an exploratory, adjacent-possible space; for now, let's avoid that. so we build a toy world, say the game of life, and put a neural network in it. it might be a weird situation, because the game of life is sufficiently complex for universal computation, but what a neural network would look like in it is unclear to me at this stage. my problem is that i want to run the neural network on the hardware that the game of life provides: a spatial map with rules of evolution, just as physics in our world is the bedrock on which we must build neural networks out of silicon. i am sure it can be done for the game of life too, it's just not immediately obvious how. so let's instead choose a world more suitable for an agent to exist in. perhaps a larger neural network simulates the world around the agent, but with fixed laws; then we are back to the world models idea from Demis Hassabis and DeepMind, with Genie, a world simulator built on video generation.
rethinking this, it is perfectly fine to have a neural network inside the game of life. just assume an agent like that is possible: since the game of life is a universal computer, we might not know what the spatial representation of a neural network would look like in its 2D space, but we do not need to know, just as with brains in our universe. suppose there is an evolutionary process that can evolve neural networks inside the game of life. can such a network discover the exact symbolic equations of the game of life itself, rather than approximations and complex representations inside its weights? think minimum description length, Occam's razor, and the like.
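one thing worth noting: the entire "physics" of this toy world fits in a few lines, which is exactly what makes the minimum-description-length question crisp. a numpy sketch of the update rule the embedded agent would have to recover (the toroidal boundary is my choice, not part of the rules):

```python
# the complete physics of the toy world: Conway's game of life on a 2D 0/1 grid
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One tick of the game of life, with wrap-around (toroidal) edges."""
    # count each cell's 8 neighbours by summing shifted copies of the grid
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # a cell is alive next tick iff it has exactly 3 neighbours,
    # or it is alive now and has exactly 2
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(grid.dtype)
```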
the next important decision: what initial capabilities do we assign to this agent? RL to evolve its own architecture and reward patterns? memory? perhaps an LLM that it builds and trains itself, where we hand it the autoencoder architecture. in practice we will substitute an LLM from our world to save time, being a bit hand-wavy about the self-contained nature of the experiment, purely due to the compute constraints i have (unless i get hired by DeepMind or someplace where this constraint goes away).
all in all, it's cheating to instantiate this agent with these massive capabilities at the start, skipping much of the building phase that would come before, but that is fine; this will still be educational. even the process of creating such a system is a good first step toward systems that can quantify the process of scientific discovery itself. that's the end goal: to learn how science works and how it can be made faster.
experiments
open AI systematic lit review →
science is expanding faster than we can review it, creating a bottleneck for policymakers trying to act on the latest consensus. existing AI tools help, but they're opaque and miss relevant work. this project builds an open, transparent tool for automated literature synthesis, starting with climate assessments like the IPCC's. we expand from seed papers via citation networks and semantic search, filter with LLMs, validate findings with independent models, and integrate everything into live HTML reports with timestamps and source links. it's a glass-box approach, fully open source, that automates discovery and validation while keeping human authors in the loop. it works for any evidence-intensive domain, not just climate.
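in skeleton form the pipeline looks like this; every helper is a stub standing in for a citation API (OpenAlex, say), an LLM relevance filter, an extraction step, and a second independent validation model, so the sketch shows the control flow rather than a real implementation:

```python
# expand from seeds -> LLM-filter -> independently validate -> render.
def systematic_review(seed_ids, fetch_citations, is_relevant,
                      extract_findings, confirms, render_html, depth=2):
    """Grow the corpus from seed papers; keep only doubly-validated findings."""
    corpus, frontier = set(seed_ids), set(seed_ids)
    for _ in range(depth):  # breadth-first citation-network expansion
        candidates = {c for paper in frontier for c in fetch_citations(paper)}
        frontier = {p for p in candidates - corpus if is_relevant(p)}  # LLM screen
        corpus |= frontier
    # glass-box validation: a finding survives only if a second, independent
    # model re-confirms it against the source text
    findings = [(paper, finding) for paper in corpus
                for finding in extract_findings(paper) if confirms(paper, finding)]
    return render_html(findings)  # live report with timestamps and source links
```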
exploring the adjacent possible of equation aesthetics through user-guided genetic evolution: systematically testing individual parameter changes to map which mutations create preferred visual patterns, and learning from user selections to evolve variants that favor approved changes while keeping unapproved ones accessible at reduced probability (a minimal sketch of the loop follows below).
base equations by @yuruyurau
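a minimal sketch of the selection loop, with each equation abstracted to a coefficient vector and rendering left out; the specific re-weighting numbers (1.5x boost, 0.8x decay, 0.05 floor) are invented for illustration:

```python
# one round of user-guided evolution: mutate one coefficient per variant,
# let the user pick, then bias future mutations toward the approved dimension
# while keeping every other dimension reachable at reduced probability.
import random

def evolve_step(params, weights, pick, rate=0.1, n_variants=4):
    """`pick` is a stub that renders the variants and returns the chosen index."""
    variants = []
    for _ in range(n_variants):
        i = random.choices(range(len(params)), weights=weights, k=1)[0]
        child = list(params)
        child[i] += random.gauss(0.0, rate)  # nudge exactly one coefficient
        variants.append((i, child))
    chosen = pick(variants)
    for j, (i, _) in enumerate(variants):
        # approved mutation dimension becomes more likely; unapproved ones
        # decay but never drop to zero, so they stay in the adjacent possible
        weights[i] = weights[i] * 1.5 if j == chosen else max(weights[i] * 0.8, 0.05)
    return variants[chosen][1], weights
```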