Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add mechanism for handling meta.sourceProvenance attributes #161098

Merged
merged 7 commits into from
May 30, 2022

Conversation

risicle
Copy link
Contributor

@risicle risicle commented Feb 20, 2022

Motivation for this change

An attempt at an initial implementation of nixos RFC 89 (https://github.com/NixOS/rfcs/blob/master/rfcs/0089-collect-non-source-package-meta.md)

This is heavily based on patterns used by the meta.licenses infrastructure, so may appear overengineered for its initial level of use. I felt it was more important to follow existing patterns than go for simplicity and risk the two features evolving differently.

Initially I've only added the four source types mentioned in the RFC.

@Artturin
Copy link
Member

Artturin commented Mar 12, 2022

what type will appimages,snaps,flatpaks be, for example chrysalis which we have in nixpkgs.

@risicle
Copy link
Contributor Author

risicle commented Mar 13, 2022

binaryNativeCode

@Artturin
Copy link
Member

https://github.com/NixOS/nixpkgs/blob/master/doc/stdenv/meta.chapter.md
Needs docs

@ghost

This comment was marked as outdated.

@ghost
Copy link

ghost commented Mar 27, 2022

(this relates to `pkgs/os-specific/linux/microcode`, not this PR)

Also: x86_64 processors have microcode, and nearly all of them need microcode updates due to the whole Foreshadow-Meltdown-Spectre fiasco. These microcode updates are not even remotely open source. Nobody not under NDA has much of an idea what the source code would even look like!

https://www.kernel.org/doc/html/latest/x86/microcode.html

These microcode blobs are definitely binaryNativeCode under the current definitions -- as or more dangerous from a security perspective than any other CPU binary.

Since such a huge portion of the current installed base of CPUs needs these blobs, it will probably drive most nixpkgs/nixos users to just set NIXPKGS_ALLOW_NONSOURCE=1, promptly nullifying the benefits of this feature.

You might want to consider adding a category binaryCpuMicrocode:

CPU microcode, provided and cryptographically signed by the CPU manufacturer.

...and a corresponding NIXPKGS_ALLOW_NONSOURCE=microcode to allow only this one particular kind of blob and no other (without having to write a custom allowNonSourcePredicate, which new users are unlikely to do). Intel has been signing their microcode since 1995 and AMD has been signing since 2005 (the K8 and K10 were the last unsigned-microcode chips).

I further suggest that when rejecting a package due to the presence of binaryCpuMicrocode, the rejection message should suggest that the user set NIXPKGS_ALLOW_NONSOURCE=microcode rather than suggesting NIXPKGS_ALLOW_NONSOURCE=1.

Fortunately ARM, MIPS, and PowerPC chips don't need any of this stuff; all they have is chickenbits (and POWER9 actually documents what the chickenbits do!)

@risicle
Copy link
Contributor Author

risicle commented Mar 27, 2022

Would it be possible to mention that while there is no guarantee, nixpkgs maintainers should add additional sourceTypes promptly when it becomes known that these sourceTypes are contained in the package? Just to make it clear that inaction in the face of this knowledge is not a matter that is up to the maintainer's individual discretion. I know this sounds obvious, but I got some weird pushback on #148890 arguing that a BSD-licensed blob with no source code counts as "open source".

Perhaps - I think the key thing here is that we're starting from a position of having zero packages with their sourceProvenance set, and so, at least to start with, people can't really use the information as a guarantee of anything. And suddenly dropping a new mandatory requirement on maintainers heads overnight 🤷 dunno if I like being that guy. I'd rather a "softly softly" introduction.

Note from the RFC that at some point I'd like to add a flag that implies that the sourceProvenance for a package is comprehensive, and you would be able to guarantee such a package has no other source types.

Is there a reason why this field isn't binaryProvenance or just provenance?

I think I don't want to bind it to the word "binary" either - in fact I'm on the fence calling bytecode binaryBytecode as it arguably isn't a binary... if we expand this further to things like fonts and icons, that becomes harder to call "binary". Whereas people use the term "open source" with quite a flexible meaning of the word "source", which is what dragged me in the direction of that word.

... microcode ...

Firstly perhaps this is more an indication that I drew the line between code and firmware in the wrong place using the definition of main CPU or peripheral processor. Suggestions for better wording that would classify microcode as firmware welcome.

More to the point though, these microcodes are not open source and I'd actually fully expect users who care about transparency to already be barring anything non-open-source from their systems and for these packages' sourceProvenance to be considered irrelevant. I wasn't personally going to go round tagging everything proprietary with a binary sourceProvenance. Perhaps should set them to "Well duh".

But I'm sure someone's now going to point me to a package that is unfree but compiled from source 😭

@ghost

This comment was marked as outdated.

@ghost

This comment was marked as outdated.

@ghost

This comment was marked as outdated.

@risicle
Copy link
Contributor Author

risicle commented Mar 28, 2022

It really isn't flexible at all. Source code means "the preferred form in which a programmer would modify the program."

Sure that's what the OSI folks like to think, but as far as how the term actually gets used in the real world, people call all sorts of things "Open Source". Documentation and even encyclopedias that are "Open Source" (guess that means the markdown is available and freely licensed?), icon sets that are "Open Source" (perhaps the SVGs or illustrator files are available and freely licensed?), fonts that are "Open Source". And while some might disapprove of such labelling, it seems far more fitting calling those "Open Source" than calling the products in their final form "binaries". Which we'd effectively be doing if we renamed the attribute to something based on the word "binary".

Anyway, I'd really just like to get this to the point where I no longer have to be the implicit umpire of these debates, because there's a limit to how much I care. All I've ever wanted to do is mark packages that just pull in a bunch of .jars from maven or packages that just pull the .x86_linux.bin from the web. At every step of the way I've tried to avoid getting sucked into the obscure philosophical corners, but at every step it seemed the only way things were going to move forward is if I:

  • wrote the RFC (initially very much as a deliberate MVP)
  • adopted a more complex scheme for it
  • wrote the implementation
  • wrote the documentation

each of which sucked me deeper into the obscure philosophical corners I don't honestly care that much about, and put me in this weird position of being the tsar over decisions.

It's absolutely fine if people want to have these discussions, but I really don't think I should be the umpire, because really I just want to be able to mark the obvious cases. It's been over a year.

@ghost

This comment was marked as outdated.

@risicle
Copy link
Contributor Author

risicle commented Mar 29, 2022

None of your counterexamples are software

My point is that nixpkgs packages contain things other than software. In fact all of my examples are things which are contained in some nixpkgs packages. And while your special interest may be in firmware, people with other special interests will appear. In other discussions the topic of fonts has come up already.

This PR has been open for barely a month.

The first draft of the RFC was a year ago.

the undocumented interaction where meta.license changes what sourceProvenance means

This was not meant to be official or part of the spec in any way, I simply thought nobody was going to bother tagging nonfree software's sourceProvenance. I regret mentioning it as everything's a lit simpler if we assume there's no relationship between sourceProvenance and license.

I have really zero interest in credit - I'm trying to explain that I've ended up having to make decisions about all kinds of things where I don't feel I'm the best person to do that.

I just fear that the PR in its current state will create a lot of confusion down the line.

I think the problem is, though I hate the phrase, "perfect being the enemy of good". I am under no illusion that we're going to create a scheme in its perfect final form before we unleash it to the world. Things can be fixed and iterated upon. We see more than enough tree-wide changes to prove this. I'm mostly interested in getting something that's going to solve the bulk of the problem well enough.

I also have enough experience in creating schemas from my OpenStreetMap days to know that you can spend ages coming up with a watertight perfect schema, but what matters in the end is how people actually tag things. It ends up being as much about ergonomics as precise data representation. Codifying the relationship between licenses and provenance feels quite like when people were trying to argue that a scheme must be able to represent a highway with different styles of streetlight on each side.

@ghost

This comment was marked as outdated.

@ghost

This comment was marked as outdated.

@risicle
Copy link
Contributor Author

risicle commented Mar 29, 2022

I would really appreciate it if you would please not mischaracterize me and my viewpoint like this.

That's not what I'm trying to do - it would have been better put:

And while your special interest may be, for example, in firmware

The changes look good to me, I'll pull them in.

FWIW I really do hope things will get refined later on, because I don't think we're likely to get this right first time.

risicle pushed a commit to risicle/nixpkgs that referenced this pull request Mar 29, 2022
…omSource

This is a suggested implementation of the change recommended in the
first paragraph of this comment:

  NixOS#161098 (comment)
risicle pushed a commit to risicle/nixpkgs that referenced this pull request Mar 29, 2022
This commit is a suggested implementation of the change recommended here:

  NixOS#161098 (comment)
risicle pushed a commit to risicle/nixpkgs that referenced this pull request Mar 29, 2022
This commit clarifies that the meaning of the `meta.sourceProvenance`
field is independent of and unaffected by the value of the
`meta.license` field.  This is based on the intent of the RFC author
as expressed here:

  NixOS#161098 (comment)

This clarification is added for two reasons:

1. If in the future there should be some disagreement about what
   `sourceProvenance` to assign to a package, this may help resolve
   the disagreement.  Any interpretation of `sourceProvenance` which
   is influenced by the `meta.license` is clearly an incorrect
   interpretation.

2. If it should turn out that it is impossible to disentangle
   `sourceProvenance` from `meta.license`, this would indicate the
   need for changes to the `sourceProvenance` scheme.  That change
   might be as simple as replacing the sentence added by this commit
   with some other sentence explaining how the two fields influence
   each other.

This commit implements the recommendation made in the paragraph of
this comments which begins with "Please say this explicitly...":

  NixOS#161098 (comment)
@ghost

This comment was marked as outdated.

@ghost
Copy link

ghost commented May 15, 2022

Hello folks, is there any timeline on merging this?

I would like to apologize again for derailing this PR with my comments a few months back. I was not aware of the discussion attached to the (separate) PR for the RFC; this PR links to directly to the rendered RFC so that is what I read. I have hidden all of my comments above in an attempt to fix some of the damage I caused.

Over the past months I have become increasingly horrified by the number of nixpkgs expressions which simply download unauditable binaries onto my computer instead. I don't need nix in order to do that. The scope of the problem is pretty shocking, and there isn't even any machine-checkable way to know when it's happening. It is really really bad. I'm starting to feel like I cannot in good conscience recommend nixpkgs to people as a result of this.

@risicle, I apologize again. This situation is a total disaster and you seem to be the only one doing anything about it. Thank you for your efforts.

The situation with haskell packages is especially bad. Because so much of the haskell infrastructure in nixpkgs is chronically broken it seems that packages like purescript are doing the silent binary download thing simply to keep from being broken when their dependencies break.

If nixpkgs committers keep ignoring this problem perhaps we will need to look into some sort of automation hack. Nix already scans the binaries it installs into /nix/store. Perhaps we could hack the linker to mark binaries which it emits, and hack nix to look for the mark. It's a crude and incomplete solution (doesn't deal with things like Java binaries, and technically a package could download .o files, run them through the linker, and not get noticed, but I doubt anybody does this) but it might be better than nothing.

@risicle
Copy link
Contributor Author

risicle commented May 15, 2022

Sorry, I've just been busy with other things and now everyone's worrying about the 22.05 release..

Copy link
Contributor

@aforemny aforemny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @risicle for driving this. I am very excited to see this in Nixpkgs!

I have added my review, which mostly contains remarks to further my own understanding. Please, feel free to resolve as you see fit (ie. to resolve without changes is OK, too)!

I could test this, and it behaves to my expectation. So, it LGTM!

doc/stdenv/meta.chapter.md Outdated Show resolved Hide resolved
lib/source-types.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Show resolved Hide resolved
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/packages-marked-as-broken-should-come-with-an-explanation/19187/14

pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
pkgs/stdenv/generic/check-meta.nix Outdated Show resolved Hide resolved
doc/stdenv/meta.chapter.md Outdated Show resolved Hide resolved
risicle and others added 7 commits May 20, 2022 23:29
heavily based on patterns used by licenses infrastructure, so may
appear overengineered for its initial level of use
…ges to meta.license

This commit clarifies that the meaning of the `meta.sourceProvenance`
field is independent of and unaffected by the value of the
`meta.license` field.  This is based on the intent of the RFC author
as expressed here:

  NixOS#161098 (comment)

This clarification is added for two reasons:

1. If in the future there should be some disagreement about what
   `sourceProvenance` to assign to a package, this may help resolve
   the disagreement.  Any interpretation of `sourceProvenance` which
   is influenced by the `meta.license` is clearly an incorrect
   interpretation.

2. If it should turn out that it is impossible to disentangle
   `sourceProvenance` from `meta.license`, this would indicate the
   need for changes to the `sourceProvenance` scheme.  That change
   might be as simple as replacing the sentence added by this commit
   with some other sentence explaining how the two fields influence
   each other.

This commit implements the recommendation made in the paragraph of
this comments which begins with "Please say this explicitly...":

  NixOS#161098 (comment)
strings complicate reasoning about values and may not be needed with `sourceProvenance`

Co-authored-by: Alexander Foremny <aforemny@posteo.de>
Co-authored-by: Adam Joseph <54836058+a-m-joseph@users.noreply.github.com>
it may be what the license handling code does, but it's confusing and not very useful

Co-authored-by: Adam Joseph <54836058+a-m-joseph@users.noreply.github.com>
Co-authored-by: Alexander Foremny <aforemny@posteo.de>
Copy link
Contributor

@aforemny aforemny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @risicle for incorporating all of our comments, and thank you @a-m-joseph in your help reviewing this!

I could re-test the changes here, and the implementation behaves to my expectation.

I am really glad to see this in Nixpkgs!

@aforemny aforemny merged commit ae0df5d into NixOS:master May 30, 2022
aforemny pushed a commit that referenced this pull request May 30, 2022
…ges to meta.license

This commit clarifies that the meaning of the `meta.sourceProvenance`
field is independent of and unaffected by the value of the
`meta.license` field.  This is based on the intent of the RFC author
as expressed here:

  #161098 (comment)

This clarification is added for two reasons:

1. If in the future there should be some disagreement about what
   `sourceProvenance` to assign to a package, this may help resolve
   the disagreement.  Any interpretation of `sourceProvenance` which
   is influenced by the `meta.license` is clearly an incorrect
   interpretation.

2. If it should turn out that it is impossible to disentangle
   `sourceProvenance` from `meta.license`, this would indicate the
   need for changes to the `sourceProvenance` scheme.  That change
   might be as simple as replacing the sentence added by this commit
   with some other sentence explaining how the two fields influence
   each other.

This commit implements the recommendation made in the paragraph of
this comments which begins with "Please say this explicitly...":

  #161098 (comment)
@ghost ghost mentioned this pull request May 30, 2022
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants