publish_sdks workflow needs to be retry-able by language #1043
Comments
@t0yv0 can you link a bit more context to the CI issue here? presumably this was not a Maven issue? |
Maven Central credentials required rotation. |
Oh hey there .. I think this overlaps with this issue: The intention within pulumi-package-publisher is that creating each release is idempotent and can safely be retried because it'll just skip if already created. However, this interacts badly with the fact that we don't fail when Java fails, because of historical flakiness. This means that Java just gets skipped and the job is marked as complete, meaning we can't then retry the failure. I think the solution here is to:
I think this work could be included in the epic to cut a GA of the pulumi-package-publisher action. |
To me as a user it seems like a separate issue from silently ignoring failures. I need to be able to retry Java publishing manually without retrying other SDKs that successfully published. I don't think publishing is idempotent in general; it 100% is not for Maven, and I'd love us not to count on it being idempotent. |
I believe most publishing processes do allow to be idempotently retried at this point: PyPI and npm do so out of the box, and we run nuget push with I think the two issues are related, and maybe it boils down to a design decision on whether we're able/willing to have separate publishing runs for each relevant language. |
What's the reason these are coupled currently? Even if other languages are idempotent, rerunning them just to get Java to publish is not ideal. |
I believe if we publish them with the same runner, we a) save runners and b) cut down on artifact download time overall, but I may be overestimating how much of an issue that would be. |
I think it's reasonable to assume we can implement idempotent behaviour here even if the service doesn't support it directly. Checking whether a package version exists should be possible in all package managers, and failing that, it should just be first write wins, with the re-pushed package ignored. Publishing in a single job is almost certainly going to be faster overall for us than using separate parallel jobs, due to runner contention and the overheads. What we've got is pretty good and working well, so we should just focus on making the Java release reliable and automatically retryable, not ignoring errors, and allowing a retry of the whole job when one or more SDKs fail. |
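For illustration, the "check whether the version exists, otherwise publish" idea could be sketched roughly as below. This is only a minimal sketch under assumptions: `publish_once`, `exists`, and `push` are hypothetical names, not part of pulumi-package-publisher or any registry client, and an in-memory set stands in for a real registry.

```python
from typing import Callable


def publish_once(version: str,
                 exists: Callable[[str], bool],
                 push: Callable[[str], None]) -> str:
    """Publish `version` unless the registry already has it (hypothetical helper)."""
    if exists(version):
        # Already there: treat the retry as a no-op instead of an error.
        return "skipped"
    push(version)
    return "published"


# In-memory stand-in for a package registry (assumption, for demonstration only).
registry = set()

first = publish_once("6.45.0", registry.__contains__, registry.add)
second = publish_once("6.45.0", registry.__contains__, registry.add)
print(first, second)
```

Note that a check like this only helps if the registry reports a new version promptly, which is the point disputed for Maven Central in this thread.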
It's not reasonable for Maven Central. There are hours of delay in the OSSRH<->Central publishing pipe. The only chance to make an idempotent solution is to try publishing and then interpret "already published" error codes as success, or else use a side channel such as an S3 sentinel to make the step artificially idempotent. I concede that reliability (pulumi/pulumi-package-publisher#16) is more important to work on in the first place, but I'm really wondering why we are prioritizing runner contention over usability. I am guessing that in an ideal world GHA would allow steps to be scheduled on a single runner but be independently retryable, so this could be a win-win. As things stand, though, does adding 4 more steps to a 30-step workflow really have any observable effect on runner contention? I think separate GHA steps would also make it much easier for the operator to locate errors and logs. At the very least, maybe break the languages into separate steps; e.g. see how they all go in a single step, mixing up the logs: https://github.com/pulumi/pulumi-aws/actions/runs/10064400756/job/27825467506#step:4:82 |
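The two workarounds mentioned above (counting an "already published" error as success, and recording a sentinel in a side channel) might look something like the following. This is a hedged sketch only: `AlreadyPublished`, `publish_with_sentinel`, and `fake_push` are hypothetical, the sentinel store is a plain set standing in for something like an S3 object, and no real Maven or OSSRH error codes are modelled.

```python
class AlreadyPublished(Exception):
    """Hypothetical error a registry client raises for a duplicate version."""


def publish_with_sentinel(version, push, sentinels):
    """Make a non-idempotent push retry-safe (illustrative only)."""
    if version in sentinels:
        # Side channel says a previous run already succeeded.
        return "skipped (sentinel)"
    try:
        push(version)
    except AlreadyPublished:
        # The registry rejected a duplicate: count it as success.
        outcome = "already published"
    else:
        outcome = "published"
    sentinels.add(version)
    return outcome


# Fake registry that rejects a second push of the same version (assumption).
pushed = set()


def fake_push(version):
    if version in pushed:
        raise AlreadyPublished(version)
    pushed.add(version)


sentinels = set()
r1 = publish_with_sentinel("6.45.0", fake_push, sentinels)
r2 = publish_with_sentinel("6.45.0", fake_push, sentinels)
# Simulate a lost sentinel: the duplicate error is still treated as success.
r3 = publish_with_sentinel("6.45.0", fake_push, set())
print(r1, r2, r3)
```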
AWS v6.45.0 failed to publish Java artifacts to Maven Central due to a CI issue.
https://github.com/pulumi/pulumi-aws/actions/runs/9962834625
As maintainers, we would like to retry Java SDK publishing, and only that, now that the credentials are fixed. However, SDK publishing is currently a monolithic step involving every language.
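As a rough illustration of the requested behaviour, a publishing entry point could accept an optional list of languages and report failures instead of hiding them. The sketch below is not the actual publish_sdks workflow; `publish_sdks`, the `only` parameter, and the stub publishers are all hypothetical.

```python
def publish_sdks(publishers, only=None):
    """Run the selected publishers and return failures keyed by language."""
    selected = only if only is not None else list(publishers)
    failures = {}
    for lang in selected:
        try:
            publishers[lang]()
        except Exception as err:
            # Record the failure so the run can be marked failed and retried.
            failures[lang] = str(err)
    return failures


calls = []
state = {"creds_ok": False}  # stands in for the rotated credentials


def publish_java():
    if not state["creds_ok"]:
        raise RuntimeError("credentials rejected")
    calls.append("java")


publishers = {
    "nodejs": lambda: calls.append("nodejs"),
    "python": lambda: calls.append("python"),
    "java": publish_java,
}

first_run = publish_sdks(publishers)             # Java fails, others succeed
state["creds_ok"] = True                         # credentials fixed
retry = publish_sdks(publishers, only=["java"])  # retry Java and only Java
print(first_run, retry, calls)
```

With something like this, the first run would surface the Java failure, and the retry would touch only the Java SDK.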