Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add description of some progress using lean-egg #210

Merged
merged 5 commits into from
Oct 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ Some automatically generated progress:
- Oct 1, 2024: Another [~250k transitive implications](equational_theories/Generated/TrivialBruteforce) were proven by simple proof generation. Discussed in the blueprint [here](https://teorth.github.io/equational_theories/blueprint/sect0007.html).
- Oct 1, 2024: [~500k transitive implications](equational_theories/Generated/EquationSearch) were proven by a custom tool that chooses hypotheses and leveraged previously found implications to search by using the implied equations as substitutions. Discussed in the blueprint [here](https://teorth.github.io/equational_theories/blueprint/sect0009.html).
- Oct 2, 2024: All but 7999 implications (99.958%) were [conjecturally resolved using the vampire ATP](equational_theories/Generated/Vampire).
- Oct 2, 2024: 86 (potentially) new implications in `Subgraph.lean` conclusively (i.e. in Lean) resolved using the [lean-egg tactic](https://github.com/marcusrossel/lean-egg). Some of these were simply missed by the transitive closure computation and technically already implied, but some were genuinely new.

Some statistics and data files from a given point in time:
- Oct 2, 2024: [A list of unknown implications whose converse is proven](https://github.com/amirlb/equational_theories/blob/extract_implications_equivalence_creators_data/scripts/equivalence_creators.json), i.e. implications that would reduce the number of equivalence classes of equations. At the time of creation we had 2969 equivalence classes. This file contains 4526 unknown implications, if all hold then it will reduce the number of equivalence classes to 1300.
Expand Down
29 changes: 29 additions & 0 deletions blueprint/src/chapter/egraphs.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
\chapter{E-Graphs}

For proving implications, we used another technique called equality saturation~\cite{DBLP:journals/pacmpl/WillseyNWFTP21} with a tactic, to automatically construct proofs.

\section{Methodology}

The basic methodology of equality saturation is based on E-Graphs, a data structure that can store equivalence classes of terms efficiently.
We used the lean-egg tactic (\url{https://github.com/marcusrossel/lean-egg}), based on equality saturation as a tactic, which (re)constructs a proof from the E-graph~\cite{DBLP:journals/pacmpl/KoehlerGBGTS24} in Lean.
This means that we do not have to trust either the egg tool nor the tactic: if something goes wrong, Lean will not accept the constructed proof. In fact, we found issues with the proof reconstruction from the examples in this project.

The lean-egg tactic works for equational reasoning, i.e. proving equalities as consequences of other equalities (potentially universally quantified), which is exactly what we need to prove implications of laws in Magmas.
In many cases, we have laws of the form $x = y$, where neither set of variables in the left- and right-hand-side of the law is a subset of each other.
In this case the laws cannot be used as rewrite rules: it's not clear what it would be rewritten to, since there are unknowns on both sides of the equation.
For these cases we used a simple heuristic, where we instantiate the variables with terms found in the (proof) context, as those are likely to be important for proving the equality.

\section{Results}

Out of the possible implications between the $34$ equations considered in Chapter~\ref{subgraph-eq}, this method found an additional 86 implications that were not found before.
Some of these seem to be missing in the computation of the transitive closure of implications of the equalities (an investigation is in progress), but some of these are genuinely new theorems, and the \texttt{lean-egg} tactic finds good proofs of these (these can be rewritten using \texttt{calc} style with a different tactic, \texttt{calcify}: \url{https://github.com/nomeata/lean-calcify}). An example of this is the following proof, found by lean-egg:

\begin{theorem}[14 implies 23]\label{14_implies_23} \uses{eq23,eq14}\lean{Subgraph.Equation14_implies_Equation23}\leanok Definition \ref{eq14} is equivalent to Definition \ref{eq23}.
\end{theorem}

\begin{proof}

$$ x = (x \circ x) \circ (x \circ (x \circ x)) = (x \circ x) \circ x $$
\end{proof}

It was also able to (re)prove Theorem~\ref{1689_equiv_2}, albeit with a manually-provided hint (guide, in the sense of~\cite{DBLP:journals/pacmpl/KoehlerGBGTS24}).
2 changes: 1 addition & 1 deletion blueprint/src/chapter/equation_search.tex
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
\chapter{Equation Search}

\href{https://github.com/teorth/equational_theories/tree/main/equational_theories/Generated/EquationSearch}{Approximately ~500k transitive implications were proven by a custom tool leveraging the implication graph}. After previous brute force had derived many implications expressible as a small number of rewrites, this equation search leveraged the implication graph to search further. It began at a hypothesis, and then attempted to perform substitutions by equations it transitively implied. Due to it's naive implementation, it can not perform all forms of substutions, it is limited to substutions that match the exact text of another equation. This search method benefits from being able to start from a hypothesis and can search for any number of goals, rather than having to search the combined space of all possible hypotheses and goals, and can 'reach' farther positions in the search graph than simple rewriting alone.
\href{https://github.com/teorth/equational_theories/tree/main/equational_theories/Generated/EquationSearch}{Approximately ~500k transitive implications were proven by a custom tool leveraging the implication graph}. After previous brute force had derived many implications expressible as a small number of rewrites, this equation search leveraged the implication graph to search further. It began at a hypothesis, and then attempted to perform substitutions by equations it transitively implied. Due to it's naive implementation, it can not perform all forms of substitutions, it is limited to substitutions that match the exact text of another equation. This search method benefits from being able to start from a hypothesis and can search for any number of goals, rather than having to search the combined space of all possible hypotheses and goals, and can 'reach' farther positions in the search graph than simple rewriting alone.

An example proof illustrates the logic it uses:

Expand Down
1 change: 1 addition & 0 deletions blueprint/src/content.tex
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
\input{chapter/trivial_auto.tex}
\input{chapter/all_small_magmas.tex}
\input{chapter/equation_search.tex}
\input{chapter/egraphs.tex}

\bibliographystyle{plain} % We choose the "plain" reference style
\bibliography{references}
Binary file modified blueprint/src/print.pdf
Binary file not shown.
40 changes: 40 additions & 0 deletions blueprint/src/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -43,3 +43,43 @@ @article {Kisielewicz2
DOI = {10.1007/s000120050057},
URL = {https://doi.org/10.1007/s000120050057},
}

@article{DBLP:journals/pacmpl/KoehlerGBGTS24,
author = {Thomas Koehler and
Andr{\'{e}}s Goens and
Siddharth Bhat and
Tobias Grosser and
Phil Trinder and
Michel Steuwer},
title = {Guided Equality Saturation},
journal = {Proc. {ACM} Program. Lang.},
volume = {8},
number = {{POPL}},
pages = {1727--1758},
year = {2024},
url = {https://doi.org/10.1145/3632900},
doi = {10.1145/3632900},
timestamp = {Thu, 29 Feb 2024 20:54:53 +0100},
biburl = {https://dblp.org/rec/journals/pacmpl/KoehlerGBGTS24.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}

@article{DBLP:journals/pacmpl/WillseyNWFTP21,
author = {Max Willsey and
Chandrakana Nandi and
Yisu Remy Wang and
Oliver Flatt and
Zachary Tatlock and
Pavel Panchekha},
title = {egg: Fast and extensible equality saturation},
journal = {Proc. {ACM} Program. Lang.},
volume = {5},
number = {{POPL}},
pages = {1--29},
year = {2021},
url = {https://doi.org/10.1145/3434304},
doi = {10.1145/3434304},
timestamp = {Sun, 12 Feb 2023 18:49:14 +0100},
biburl = {https://dblp.org/rec/journals/pacmpl/WillseyNWFTP21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}