Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Source and Source256 fields to dependencies in buildpack.toml. #196

Merged
merged 3 commits into from
Sep 20, 2023
Merged

Conversation

robdimsdale
Copy link
Member

Summary

This PR adds source and source-sha256 fields to the dependencies in buildpack.toml. These fields are not used by buildpacks directly, but instead are used third-party provenance tools which leverage these fields to determine what dependencies a buildpack ships.

Paketo buildpacks that use packit instead of libpack already support these fields - this PR brings the libpack-based buildpacks inline with the existing behavior for the other buildpacks.

I believe that this should be a backwards-compatible change when bumped in the buildpacks and pipeline-builder. We don't rely on these fields for any buildpack logic, so whether they exist or not has no impact on the buildpack behavior. Whenever libpack is bumped in pipeline-builder (and the pipelines are bumped in the buildpacks), it is my understanding that the next time dependencies are updated they will have these new fields.

Implementation notes

I couldn't find an integration test for cmd/update-buildpack-dependency/main.go, but I updated relevant existing unit tests in carton.

I am open to discussion about the best way to handle missing --source and --source-256 fields when invoking cmd/update-buildpack-dependency/main.go. I think using the uri and sha256 is a sensible default when both fields are missing - this is consistent with other paketo buildpacks that already have values for source and source-sha256. I think at the carton level we should treat them as any other plain string.

I'd be open to moving more logic into carton, but we already have a precedent for some initial flag processing in cmd so I wanted to follow that.

As an aside, I removed an instance of it.Focus which was committed previously; likely inadvertently.

Checklist

  • I have viewed, signed, and submitted the Contributor License Agreement.
  • I have linked issue(s) that this PR should close using keywords or the Github UI (See docs)
  • I have added an integration test, if necessary.
  • I have reviewed the styleguide for guidance on my code quality.
  • I'm happy with the commit history on this PR (I have rebased/squashed as needed).

@robdimsdale robdimsdale requested a review from a team December 23, 2022 18:36
- This is primarily used by third-party provenance tools which leverage these fields to determine what dependencies a buildpack ships.
@dmikusa dmikusa added type:enhancement A general enhancement semver:minor A change requiring a minor version bump labels Jan 6, 2023
@dmikusa
Copy link
Contributor

dmikusa commented Jan 6, 2023

I couldn't find an integration test for cmd/update-buildpack-dependency/main.go, but I updated relevant existing unit tests in carton.

I doubt there is one. There have never been many integration tests. It's fine if you just push what you can into the unit tests for carton.

I am open to discussion about the best way to handle missing --source and --source-256 fields when invoking cmd/update-buildpack-dependency/main.go. I think using the uri and sha256 is a sensible default when both fields are missing - this is consistent with other paketo buildpacks that already have values for source and source-sha256. I think at the carton level we should treat them as any other plain string.

I think my preference would be that if you don't set the flags then update-buildpack-dependency does nothing to that field. It feels like a side-effect to have those default to something other than the input value. The tool is very basic and very deliberate, I think there's value in keeping it that way.

My preference would also be that if the fields are empty they do not get included. I think you can add omitempty as a struct tag and it'll not set these. To me, it's confusing to have Source and Source256 if they are empty or if they are pointing to anything but legit source/source256 values. Not having anything is better than having something wrong, IMHO.

I'd be open to moving more logic into carton, but we already have a precedent for some initial flag processing in cmd so I wanted to follow that.

Flag processing is fine in the cmd.

As an aside, I removed an instance of it.Focus which was committed previously; likely inadvertently.

Thanks 🤦

I was talking to @pivotal-david-osullivan about this and I think there are going to be more things to do here. This is probably one of the most painful changes to make in the pipelines because it has to filter through three different levels.

  1. The update-buildpack-dependency tool needs to have the flags and support for updating the new fields (this PR). I can't remember if you need to cut a libpak release at this point. The CI pipelines install it like this which uses @latest I'm not sure if that'll pick up the latest commit or latest release. You might need to cut a release, just keep that in mind.

  2. This is an example of what runs update-buildpack-dependency. That chunk of workflow is generated by octo, specifically here and here. The former is the script, which would need to have the new arguments introduced and the latter is what passes in the env variables that would be used as the new arguments in the script. This should be fairly easy, just linking the new action outputs for source & source256 to the env variables used in the script.

  3. This brings us to the 3rd part. The actions. Updates are picked up by actions. The action checks upstream repos, finds the resources, download them, calculates the sha256 hash, and then puts all that into the step outputs so it can be referenced by the step that actually updates the buildpack.toml (# 2).

    None of the actions are outputting these values, so you'd need to update the actions to do this. I think it will probably vary from action to action where it can fetch that info. In some cases, we might be able to fetch it from an upstream API, we might be able to scrape it from a download page, we might be able to construct the URL, others it might have to be manual. Those individual actions will have to be updated too.

    Going back to my point above about not having update-buildpack-dependency add any empty fields or default values, this is why. We can add steps 1 and 2 above at any pace and they won't change anything, not even adding new fields. Then as we update the actions to pull in the source/source256 info, it'll just start appearing for the buildpacks that use those actions. I know we're not using those fields, but there are others outside of Paketo that are using these pipelines and the most curious thing to do would be not add them until they are fully functional.

Hope that helps!

@robdimsdale
Copy link
Member Author

Thanks for the detailed feedback @dmikusa

The main reason I wanted to introduce the logic of "default to the values of uri/sha256" in this tool is precisely to simply the process of introducing these values: to avoid having to propagate that logic all the way upstream across the multiple steps as you describe.

I'm not convinced that the value of keeping the buildpack dependency tool simple is worth the extra effort, but I'm happy to defer to your requests as a maintainer :)

However, I do think there's something I'm missing:

The way you describe the actions requiring custom logic per dependency to determine the source URI surprises me, and makes me think that I'm missing something in the way the URI field is being used. My mental model is that the actions find a URI for a dependency, calculate its SHA256, and write that into the buildpack.toml. I would expect those values to be sufficient for the source-uri/source-sha256 values too. I don't see under what circumstances they would be different, so I don't see why we would need extra logic.

As an example the bellsoft-liberica buildpack has this value for one of its dependencies: uri = "https://github.com/bell-sw/Liberica/releases/download/8u352+8/bellsoft-jdk8u352+8-linux-amd64.tar.gz".

One possibility is that the pipelines do different things when creating offline buildpacks, but last time I looked a buildpack built using this infrastructure, it seemed like the URI in buildpack.toml didn't change - it always pointed to the upstream, source, URI.

So, can you expand a bit more on what extra logic would have to go into the actions in order to identify values for these new source/source-sha fields?

@dmikusa
Copy link
Contributor

dmikusa commented Jan 6, 2023

So, can you expand a bit more on what extra logic would have to go into the actions in order to identify values for these new source/source-sha fields?

That's a good question, cause I'm starting to think that I'm not understanding the requirements here.

My understanding, what you can base my previous comments on, is that source is the source code and source-sha is the sha256 of the source code bundle. Compared to url which is the actual binary and sha256 which is the hash of the binary. I'm starting to think that's incorrect. Can you clarify what's supposed to go in these fields?

@robdimsdale
Copy link
Member Author

Sure, so the high-level goal is to make it easier for some (proprietary) tooling to be able to perform provenance of open source components. We've already built tooling for the other Paketo buildpacks which uses the source key (and associated checksum of whatever that points to).

For buildpacks that provide a dependency directly - without compilation or pre-processing (and which I believe is all of the buildpacks created with this pipeline tooling) - the source URI would point to the precompiled/bytecode artifact, and therefore would be identical to the URI field.

For buildpacks that do pre-processing/compilation, we host the resulting artifact on https://artifacts.paketo.io, and as such the source field would point to whatever was pulled from upstream. For example, in Python that would be the Python source code, but for something like bundler, that's actually a gem.

Does that help clarify?

Next, I'd like to clarify that all the buildpacks that are built with this tooling use upstream dependencies as-is - that they do not perform any pre-processing/compilation and therefore that they don't re-host artifacts on https://artifacts.paketo.io or elsewhere.

If that is true then the source field will always be identical to the URI field. If that's not the case then we would need to explicitly add in the source field at some point in the pipeline - wherever the compilation/pre-processing takes place.

@dmikusa
Copy link
Contributor

dmikusa commented Jan 6, 2023

Does that help clarify?

Yes, thanks. That helps.

All of the Java buildpacks, utility buildpacks (watchexec, syft, etc..), and APM buildpacks all take upstream dependencies. We don't build any dependencies. I don't have a crystal ball or anything, but I don't see that changing in the future. There'd have to be a really strong reason since we don't have the infrastructure to do that, and it would be a considerable effort.

Given that, can we just short-circuit all this and have you use the url and sha256 fields directly? Why add more fields that will just duplicate information?

I also am concerned about the default values being non-blank because I feel like we'll get into a situation where the tooling is there to add the default value fields, but the tooling never gets updated to actually allow those to be properly controlled. In short, I'd like to make sure we get a full solution, not just a partial solution.

@robdimsdale
Copy link
Member Author

@dmikusa the challenge with updating the tooling to just use uri when source isn't available is that we can't differentiate between the case where uri is a valid alternative to source (and hence source will never be populated) vs the situation where source hasn't yet been added, or was erroneously removed.

Another argument for adding the source field to the libpak-based buildpacks is to bring them more in line with the packit buildpacks. This is a good goal to strive towards, but it isn't the primary reason I'm looking to add these fields.

Would you prefer me to open an RFC to codify these fields for all Paketo buildpacks, and have this discussion there?

@dmikusa
Copy link
Contributor

dmikusa commented Jan 6, 2023

@dmikusa the challenge with updating the tooling to just use uri when source isn't available is that we can't differentiate between the case where uri is a valid alternative to source (and hence source will never be populated) vs the situation where source hasn't yet been added, or was erroneously removed.

That can be done per buildpack. You have a list that never will, if it's in the list you're all set.

Another argument for adding the source field to the libpak-based buildpacks is to bring them more in line with the packit buildpacks. This is a good goal to strive towards, but it isn't the primary reason I'm looking to add these fields.

Would you prefer me to open an RFC to codify these fields for all Paketo buildpacks, and have this discussion there?

I don't have a problem adding the fields. I just had to ask, since it seemed like duplication of data for little benefit.

If we add the fields, I would want to see it done the way I outlined above though, I don't like the idea of having the defaults to url/sha256. I can update that notes based on my new understanding of those fields. I just want to see something that allows the fields to be fully utilized through the pipelines and follows the existing conventions.

fixes test format issues
@pivotal-david-osullivan pivotal-david-osullivan requested a review from a team as a code owner September 19, 2023 09:11
@pivotal-david-osullivan
Copy link
Contributor

@dmikusa @robdimsdale I updated this PR based on the discussion here - the source fields will not be picked up if they are not set in the dependency and will not be updated if they aren't set.

When running unit tests locally I found they were failing since the removal of the 'Focus' - this was causing failing tests to be skipped altogether. I updated the tests' format for their buildpack.toml data which I think was out-of-date, tests passed with this change.

@pivotal-david-osullivan pivotal-david-osullivan merged commit 5dd2a92 into paketo-buildpacks:main Sep 20, 2023
dmikusa added a commit to paketo-buildpacks/libpak-tools that referenced this pull request Nov 9, 2024
Signed-off-by: Daniel Mikusa <dan@mikusa.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver:minor A change requiring a minor version bump type:enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants