-
-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python3Packages.tree-sitter-grammars: init at 0.22.5 #320783
base: master
Are you sure you want to change the base?
python3Packages.tree-sitter-grammars: init at 0.22.5 #320783
Conversation
}; | ||
}; | ||
in | ||
# TODO pkgset or flattened? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't see any pkgsets inside python3Packages so we will go ahead and flatten this unless we're told otherwise. But we didn't any flattening in top-level either so we guess this will be the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We changed our minds. We saw at least two package sets nested in python3Packages
. We saw bootstrap
and qt6
. And this one is literally mapped from an existing package set (tree-sitter.builtGrammars
) so it seems appropriate.
6937418
to
253c53f
Compare
253c53f
to
1ff067a
Compare
1ff067a
to
fe9e96c
Compare
4eeaa9e
to
8c63440
Compare
8c63440
to
0550b06
Compare
0550b06
to
5947cdb
Compare
5947cdb
to
ea460ca
Compare
pkgs/top-level/python-packages.nix
Outdated
# ImportError: /nix/store/7w7piy6hpdj1swdg5r0bz47gk2g3q855-tree-sitter-perl-grammar-0.22.5/parser: undefined symbol: isnumber | ||
"tree-sitter-perl" | ||
# ImportError: /nix/store/6rkhjf2mhwnfqwq2962fvv9bnd4xmpk7-python3.11-python-tree-sitter-ql-dbscheme-0.22.5/lib/python3.11/site-packages/tree_sitter_ql_dbscheme/_binding.abi3.so: undefined symbol: tree_sitter_ql_dbscheme | ||
"tree-sitter-ql-dbscheme" | ||
# ImportError: /nix/store/ybmnykhrgj6sghw4r7ypf897c4wrncr5-python3.11-python-tree-sitter-org-nvim-0.22.5/lib/python3.11/site-packages/tree_sitter_org_nvim/_binding.abi3.so: undefined symbol: tree_sitter_org_nvim | ||
"tree-sitter-org-nvim" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each of these might be resolved by simply bumping its source.
Some could possibly be resolved with some tinkering, which we decided should not delay the submitting of this PR for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved. All grammars pass!
For the record, this is already ready for review, even though it's marked as draft. |
ea460ca
to
d3890e6
Compare
be0a31b
to
93a9562
Compare
pkgs/top-level/python-packages.nix
Outdated
# ImportError: /nix/store/7w7piy6hpdj1swdg5r0bz47gk2g3q855-tree-sitter-perl-grammar-0.22.5/parser: undefined symbol: isnumber | ||
"tree-sitter-perl" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be blocked by ganezdragon/tree-sitter-perl#45.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved.
93a9562
to
615c75b
Compare
485f437
to
cb96a09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the usage example from #320783 (comment) (with a properly written out shell.nix) to the Nixpkgs manual.
# `name`: grammar derivation pname in the format of `tree-sitter-<lang>` | ||
name, | ||
grammarDrv, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rather cosmetic and maybe @jtojnar has a sharper opinion on this, but it would surprise me less if that was the argument of a nested function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like this ?
{
lib,
buildPythonPackage,
pytestCheckHook,
tree-sitter,
symlinkJoin,
writeTextDir,
pythonOlder,
} : {
# `name`: grammar derivation pname in the format of `tree-sitter-<lang>`
name,
grammarDrv,
} :
# it could cause a symbol mismatch at load time. This override ensures the binding can | ||
# find the correct symbol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# it could cause a symbol mismatch at load time. This override ensures the binding can | |
# find the correct symbol | |
# it could cause a symbol mismatch at load time. This manually curated collection of overrides | |
# ensures the binding can find the correct symbol |
I was confused about this singleton attrset, but after reading the code multiple times understood that it's supposed to be a collection which may grow.
'') | ||
(writeTextDir "pyproject.toml" '' | ||
[build-system] | ||
requires = ["setuptools>=42", "wheel"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not a big deal, but that hard-coded value has a bit of a smell to me. We probably can get that from our buildPythonPackage
, although it doesn't make much sense to change this file on every dependency bump. But at the very least we should document why it's hard-coded and why that value, so future readers will know when to touch it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I doubt that we need any version constraint here since we are tied with the one provided by the current version of python we are building for.
I removed it, I will push it shortly.
define_macros=[ | ||
# Define python limited API for compatibility. | ||
# https://docs.python.org/3/c-api/stable.html#c.Py_LIMITED_API | ||
("Py_LIMITED_API", "0x03080000"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to derive that from the Python we use for building? Or if it's fixed to 3.8 for some reason, at least use the same source for it everywhere it's referenced. See all the other hard-coded values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to remove Py_LIMITED_API
, since we control both the build and runtime environments (using shells), so there's no need to enforce forward compatibility through the limited API.
|
||
preCheck = '' | ||
# https://github.com/NixOS/nixpkgs/issues/255262 | ||
rm -r ${snakeCaseName} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This confuses me, why do we do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pytestCheckHook
uses the command python -m pytest
, and as explained in this comment, this command adds the current directory to the search path. As a result, we end up executing tests for source files that do not contain the required shared object (and thus, failing the tests).
Looking at all the links in issue #255262, there are several ways to address this problem. A few examples include:
- Removing the source files (this is the approach we use).
- Using an alternative test command (e.g., pytest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so "looking at all the links" doesn't scale for readers, but regardless I don't understand the approach: this looks like we're removing the code under test entirely, so what's the point of running the test? Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't mean for you to go through all the links. I was just pointing out that the solutions I found seem to involve some hacky workarounds for the pytestCheckHook
behavior.
While I don't think this is directly related to the PR at hand, I want to better understand the issue so I can explain it more clearly. I am investigating right now.
At a high level, my current understanding, it that it is related to how pytest
loads the tests. We want pytest
to use the compiled version that contains the shared object, rather than the source version that is not yet ready to be used. And without removing this directory, pytest
loads the sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of the issue
The pytest -m
command used by the pytestCheckHook
(pkgs/development/interpreters/python/hooks/pytest-check-hook.sh
) adds the current directory to sys.path
(documented here).
With this setup, the current directory is prioritized before the Python site-packages folder in sys.path
. Both locations contain a directory named ${snakeCaseName}
. When Python tries to load the module, it prioritizes the current directory, which lacks the compiled binding, causing the test to fail. The compiled binding resides in $out/${python.sitePackages}
after the build.
This issue primarily affects bindings since the source does not include the compiled object.
Example: tree-sitter-python
grammar (snakeCaseName
= tree_sitter_python
)
During testing, the sys.path
includes the following critical paths (among others). The second and third paths both contain a tree_sitter_python
directory, but the first path is where the tests reside:
'/build/python-tree-sitter-python-source/tests'
'/build/python-tree-sitter-python-source'
'/nix/store/qs13k1z45jxh1q3hsda470f9r3i0pc86-python3.12-python-tree-sitter-python-0.24.3/lib/python3.12/site-packages'
...
Directory contents:
-
First directory:
/build/python-tree-sitter-python-source
. ├── build │ ├── lib.linux-x86_64-cpython-312 │ │ └── tree_sitter_python │ │ ├── __init__.py │ │ ├── _binding.abi3.so │ │ └── binding.c ... ├── tests │ └── test_language.py ├── tree_sitter_python │ ├── __init__.py │ └── binding.c └── tree_sitter_python.egg-info
-
Second directory:
$out/${python.sitePackages}
(/nix/store/qs13k1z45jxh1q3hsda470f9r3i0pc86-python3.12-python-tree-sitter-python-0.24.3/lib/python3.12/site-packages
)├── tree_sitter_python │ ├── __init__.py │ ├── _binding.abi3.so │ ├── binding.c │ └── __pycache__ └── tree_sitter_python-0.24.3.dist-info
Failure mechanism:
When the test imports tree_sitter_python
, Python looks in the first matching directory in sys.path
. In this case, it prioritizes /build/python-tree-sitter-python-source
, which does not contain the compiled _binding.abi3.so
object. This results in an import error:
ImportError: No module named 'tree_sitter_python._binding'
I hope this raises the confusions.
@@ -60,6 +60,7 @@ let | |||
src = grammar.src or (fetchGrammar grammar); | |||
location = grammar.location or null; | |||
generate = grammar.generate or false; | |||
isBroken = builtins.hasAttr "isBroken" grammar && grammar.isBroken || false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not
isBroken = builtins.hasAttr "isBroken" grammar && grammar.isBroken || false; | |
isBroken = grammar ? isBroken && grammar.isBroken; |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed. The changes will be pushed shortly.
def copy_grammar_to_tmpdir(path: str, tmpdirname: str): | ||
yield "cp" | ||
yield "--no-preserve=mode,ownership" | ||
yield "-r" | ||
yield f"{path}/." | ||
yield tmpdirname | ||
|
||
|
||
def check_grammar(data) -> bool: | ||
path = data["path"] | ||
|
||
with tempfile.TemporaryDirectory() as tmpdirname: | ||
# Copy grammar to a temporary directory | ||
run_cmd(copy_grammar_to_tmpdir(path, tmpdirname)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creative abuse of yield
is a peculiar pattern, is that used a lot around Nixpkgs? I guess it's to save on diff noise when updated, but it cost me quite some time to grasp why we even to do that. In this particular case it's not at all evident why we're removing the implementation details from the call site and don't do the direct thing:
def copy_grammar_to_tmpdir(path: str, tmpdirname: str): | |
yield "cp" | |
yield "--no-preserve=mode,ownership" | |
yield "-r" | |
yield f"{path}/." | |
yield tmpdirname | |
def check_grammar(data) -> bool: | |
path = data["path"] | |
with tempfile.TemporaryDirectory() as tmpdirname: | |
# Copy grammar to a temporary directory | |
run_cmd(copy_grammar_to_tmpdir(path, tmpdirname)) | |
def check_grammar(data) -> bool: | |
path = data["path"] | |
with tempfile.TemporaryDirectory() as tmpdirname: | |
run_cmd([ "cp", "--no-preserve=mode,ownership", "-r", f"{path}/.", tmdirname]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, happy to make this change.
# Run the tree-sitter test command | ||
res = sub.run( | ||
"cd {tmpdirname} ; {tree_sitter_bin} test".format( | ||
tmpdirname=tmpdirname, tree_sitter_bin=bins["tree-sitter"] | ||
), | ||
stdout=sub.PIPE, | ||
shell=True, | ||
) | ||
|
||
# Return True if the command succeeded, False otherwise | ||
return res.returncode == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Run the tree-sitter test command | |
res = sub.run( | |
"cd {tmpdirname} ; {tree_sitter_bin} test".format( | |
tmpdirname=tmpdirname, tree_sitter_bin=bins["tree-sitter"] | |
), | |
stdout=sub.PIPE, | |
shell=True, | |
) | |
# Return True if the command succeeded, False otherwise | |
return res.returncode == 0 | |
test_result = subprocess.run( | |
[bins["tree-sitter"], "test"], | |
stdout=subprocess.PIPE, | |
cwd=tmpdirname | |
) | |
return test_result.returncode == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clearer, thanks
(writeTextDir "tests/test_language.py" '' | ||
from ${snakeCaseName} import language | ||
from tree_sitter import Language, Parser | ||
|
||
|
||
def test_language(): | ||
lang = Language(language()) | ||
assert lang is not None | ||
parser = Parser() | ||
parser.language = lang | ||
tree = parser.parse(bytes("", "utf-8")) | ||
assert tree is not None | ||
'') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given we're already running upstream tests on update, do we want to keep that as a last-resort smoke test? If yes, we should not it as such in order to manage expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It caches binding errors, such as when there is symbols mismatch between the compiled grammar and the binding.
I am adding a comment to explain it.
Something like:
# This test only checks that the binding can load the grammar from the compiled shared object.
# It does not verify the grammar itself; that is tested in
# `pkgs/development/tools/parsing/tree-sitter/grammar.nix`.
def test_language():
lang = Language(language())
Thank you @fricklerhandwerk for the review, I addressed most of your comments. I still need to address the following:
I'll do that shortly, I just need to locate it first.
Regarding Py_LIMITED_API, I don’t have the right answer yet. I need to think about it a bit more. |
ee55fa3
to
a15eec3
Compare
Co-authored-by: yakampe <yanis.kampe.cv@gmail.com> Co-authored-by: GetPsyched <priyanshu@getpsyched.dev> Co-authored-by: Shahar "Dawn" Or <mightyiampresence@gmail.com> Co-authored-by: Robert James Hernandez <rob@sarcasticadmin.com>
Co-authored-by: Robert James Hernandez <rob@sarcasticadmin.com> Co-authored-by: Yifei Sun <ysun@hey.com> Co-authored-by: Ali Jamadi <jamadi1377@gmail.com> Co-authored-by: yakampe <yanis.kampe.cv@gmail.com> Co-authored-by: GetPsyched <priyanshu@getpsyched.dev> Co-authored-by: Adrien Faure <adrien.faure@protonmail.com> Co-authored-by: Shahar "Dawn" Or <mightyiampresence@gmail.com>
77d6268
to
786c636
Compare
Description of changes
As request in ngi-nix/ngipkgs#139
This generates a python binding package for each tree-sitter-grammar that exists in
tree-sitter.builtGrammars
.Things done
nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.