Add CEP for MatchSpec minilanguage #82

Open
@jaimergp jaimergp commented Jun 4, 2024

Closes #80

📝 👓 Markdown preview


jaimergp commented Jun 4, 2024

I'm seeing myself referring to the "MatchSpec" interface in other CEPs yet this is not standardized, so there we go. Let's open that can of worms.


jaimergp commented Jun 5, 2024

This will probably need another CEP on PackageRecord, which will probably ask for Repodata counterparts and... channel structure. Yay. I like how packaging.python.org does this btw. I'll probably copy some of that structure.

Comment on lines +46 to +47
- `license`
- `license_family`

`license` and `license_family` could be used to search for packages with a specific license, I guess, say with `conda search '*[license="Apache-2.0"]'`?

Contributor Author

Yes, true, I hadn't considered search here, only install-oriented operations. I should rephrase this part a bit to cover this aspect.


The `MatchSpec` mini language has gone through several iterations.

The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.

Suggested change
The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.
The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any [version specifier](#version-specifier). `build` can be any [string matcher](#string-matching). See [Match conventions](#match-conventions) below.

Also, should we define what characters are accepted in a package name?

Contributor Author

I feel this is going to be part of a different CEP, PackageRecord.


### Exact matches

To fully specify a package record with a full, exact spec, these fields must be given as exact values: `channel` (preferably by URL), `subdir`, `name`, `version`, `build`. Alternatively, an exact spec can also be given by `*[md5=12345678901234567890123456789012]` or `*[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e]`.

When matching by checksum, should you also add the subdir? If I'm not mistaken, it's possible for two subdirs to contain a package with the same checksum right? Or is this a corner case?

Contributor Author

These checksums are coming from the compressed artifacts, so in principle they should be unique (even with unique contents, the index.json file should have "subdir": <subdir>, I think?).

The hash that conda-build uses for the build_string doesn't consider the subdir, indeed (and maybe it should).

Contributor

Just FYI, rattler does not currently support this. There we require that at least the package name is still specified.

Contributor

@baszalmstra baszalmstra left a comment

Thanks for the great write up @jaimergp !


The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.

The positional syntax also allows the `=` character as a separator, instead of a space. When this is the case, versions are interpreted differently. `pkg=1.8` will be taken as `1.8.*` (fuzzy), but `pkg 1.8` will give `1.8` (exact). To have fuzzy matches with the space syntax, you need to use `pkg =1.8`. This nuance does not apply if a `build` string is present; both `foo==1.0=*` and `foo=1.0=*` are equivalent (they both understand the version as `1.0`, exact).

I know this is just reporting the current state of affairs but, yucky.

In rattler, this form is no longer allowed when parsing in strict mode (it is still accepted in lenient parsing mode).
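
To make the positional rules quoted above concrete, here is a minimal Python sketch. It is not conda's or rattler's parser: `parse_positional` and `_fuzzy` are hypothetical names, the accepted package-name characters are an assumption (this thread leaves them undefined), and only the `=`/`==` and space-separated forms described in the paragraph are handled. Fuzzy versions are returned as `X.*`, exact ones as bare `X`.

```python
import re

# Assumption: package names use letters, digits, "_", "." and "-".
_EQUALS_FORM = re.compile(
    r"^(?P<name>[A-Za-z0-9_.\-]+)"
    r"(?:(?P<op>==?)(?P<version>[^=\s]+)(?:=(?P<build>\S+))?)?$"
)


def _fuzzy(version):
    """Express a fuzzy match as 'X.*' unless a wildcard is already present."""
    return version if "*" in version else version + ".*"


def parse_positional(spec):
    """Return (name, version, build) for a positional spec (illustrative only)."""
    spec = spec.strip()
    tokens = spec.split()
    if len(tokens) > 1:
        # Space-separated form: "pkg 1.8" is exact, "pkg =1.8" is fuzzy.
        name, version = tokens[0], tokens[1]
        build = tokens[2] if len(tokens) > 2 else None
        if build is None:
            if version.startswith("=="):
                version = version[2:]
            elif version.startswith("="):
                version = _fuzzy(version[1:])
        return name, version, build
    match = _EQUALS_FORM.match(spec)
    if match is None:
        raise ValueError(f"unsupported spec in this sketch: {spec!r}")
    name, op, version, build = match.group("name", "op", "version", "build")
    # "pkg=1.8" is fuzzy, but only when no build string follows.
    if version is not None and op == "=" and build is None:
        version = _fuzzy(version)
    return name, version, build
```

Under these assumptions, `parse_positional("pkg=1.8")` and `parse_positional("pkg =1.8")` both yield the fuzzy version `1.8.*`, while `foo==1.0=*` and `foo=1.0=*` yield the same exact version `1.0`.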

following conventions:

- If the string begins with `^` and ends with `$`, it is converted to a regex.
- If the string contains an asterisk (`*`), it is transformed from a glob to a regex.

Maybe I misunderstood what it means to transform a glob to a regex, but `*cuda` is a valid build string glob, right?
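
As an illustration of the two conventions quoted above, the following Python sketch converts a string matcher into a predicate. `build_matcher` is a hypothetical helper, not conda's implementation. The final fallback to exact string comparison is an assumption, since the quoted list is cut off here. Under this reading, a leading asterisk such as `*cuda` is simply a glob that becomes the regex `^.*cuda$`.

```python
import re


def build_matcher(pattern):
    """Return a predicate for a string matcher (illustrative sketch)."""
    if pattern.startswith("^") and pattern.endswith("$"):
        # Rule 1: strings wrapped in ^...$ are used as a regex directly.
        regex = re.compile(pattern)
        return lambda value: regex.match(value) is not None
    if "*" in pattern:
        # Rule 2: a glob becomes a regex; each "*" turns into ".*" and the
        # literal pieces in between are escaped.
        pieces = [re.escape(piece) for piece in pattern.split("*")]
        regex = re.compile("^" + ".*".join(pieces) + "$")
        return lambda value: regex.match(value) is not None
    # Assumption: anything else is compared as an exact string.
    return lambda value: value == pattern


matches_cuda = build_matcher("*cuda")
```

With this sketch, `matches_cuda("py310_cuda")` is true and `matches_cuda("cuda_py310")` is false.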

> < 0.960923
> < 1.0
> < 1.1dev1 # special case 'dev'
> < 1.1_ # appended underscore is special case for openssl-like versions

I understand this is not part of this CEP, but the suffix `_` notion is not present in the description above. It is also another can of worms: `1.0-` is also valid. So are `1.0__` and `1.0--`...

- Exact equality and negated equality: `==`, `!=`.
- Fuzzy equality: `=`, `*`. `=1.0` and `1.0.*` are equivalent, and both would match `1.0.0` and `1.0.1`, but not `1.1` or `0.9`.
- Logical operators: `|` means OR, `,` means AND. `1.0|1.2` would match both `1.0` and `1.2`. `>=1.0,<2.0a0` would match everything between `1.0` and the last version before `2.0a0`. `,` (AND) has higher precedence than `|` (OR). `>=1,<2|>3` means `(>=1,<2)|(>3)`; i.e. greater than or equal to `1` AND less than `2` or greater than `3`, which matches `1`, `1.3` and `3.0`, but not `2.2`.
- Semver-like operator: `~=`. `~=0.5.3` is equivalent to `>=0.5.3, <0.6.0a` and this syntax is preferred for backwards compatibility.

This is not entirely correct: `~=0.5.3` should be equivalent to `>=0.5.3, 0.5.*`. This is an important distinction because both `0.6.0_` and `0.6.0dev` are considered smaller than `0.6.0a`, so they would both still match `>=0.5.3, <0.6.0a`!
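
The precedence rule from the operator list quoted above (`,` binds tighter than `|`) can be sketched in Python by splitting on `|` first and on `,` second. This is a deliberately simplified model: `spec_matches` and its helpers are hypothetical, versions are assumed to be dotted integers only (no `a0`, `dev`, `_` or `*`), and shorter versions are assumed to be padded with zeros before comparison. It does not reproduce conda's version ordering, nor the `~=` subtlety discussed in the comment above.

```python
import operator
import re

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}
_CLAUSE = re.compile(r"\s*(==|!=|>=|<=|>|<)\s*(\d+(?:\.\d+)*)\s*")


def _parse(version):
    """Turn '1.3' into (1, 3); dotted integers only in this sketch."""
    return tuple(int(part) for part in version.split("."))


def _pad(left, right):
    """Pad the shorter tuple with zeros so both have the same length."""
    width = max(len(left), len(right))
    return left + (0,) * (width - len(left)), right + (0,) * (width - len(right))


def clause_matches(clause, version):
    match = _CLAUSE.fullmatch(clause)
    if match is None:
        raise ValueError(f"unsupported clause in this sketch: {clause!r}")
    op, target = match.groups()
    left, right = _pad(_parse(version), _parse(target))
    return _OPS[op](left, right)


def spec_matches(spec, version):
    """OR over '|' branches, AND over ',' clauses inside each branch."""
    return any(
        all(clause_matches(clause, version) for clause in branch.split(","))
        for branch in spec.split("|")
    )
```

For the spec `>=1,<2|>3`, this grouping accepts `1` and `1.3` through the first branch, rejects `2.2`, and accepts `3.5` through the second branch.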


## Reference

- [`conda.models.match_spec.MatchSpec`](https://github.com/conda/conda/blob/24.5.0/conda/models/match_spec.py)

Comment on lines +76 to +77
6. If `channel` is an exact value and `subdir` is an exact value, `subdir` is appended to
`channel` with a `/` separator. Otherwise, `subdir` is included in the key-value brackets.

How does this relate to label channels, e.g. `pytorch/label/nightly::libfaiss`?
With the separator logic, the last component would be assumed to be a subdir.

Contributor Author

The logic in conda is to take the last component and compare it against known subdirs. As a result, channels cannot be named like subdirs; e.g. I can't register a channel named `linux-64`.
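
A minimal Python sketch of the "last component" check described above, assuming a small illustrative set of subdirs. `KNOWN_SUBDIRS` and `split_channel_subdir` are hypothetical names, and the set below is only a sample, not conda's full list.

```python
# Assumption: a sample of subdir names; conda's real list is longer.
KNOWN_SUBDIRS = frozenset(
    {"noarch", "linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64"}
)


def split_channel_subdir(value):
    """Split 'channel/subdir' only if the last component is a known subdir."""
    value = value.rstrip("/")
    head, sep, tail = value.rpartition("/")
    if sep and tail in KNOWN_SUBDIRS:
        return head, tail
    # The last component is not a subdir, so the whole string is the channel.
    return value, None
```

Under this sketch, `pytorch/label/nightly` stays a channel with no subdir because `nightly` is not a known subdir, whereas `conda-forge/linux-64` splits into `conda-forge` and `linux-64`.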


jjerphan commented Oct 10, 2024

Hi Jaime, thank you for this proposal!

Do you think we could come up with an ANTLR4 grammar for MatchSpec so that we could generate parsers for the various package managers (and potentially other utilities such as pre- and post-build checkers) of the ecosystem using it?

If we have an exhaustive set of valid instances of MatchSpec and a comprehensive description of what we want to handle, we could even use LLMs to generate such a grammar.

Successfully merging this pull request may close these issues.

CEP request: Document MatchSpec