
src/cosalib/aliyun.py: save replication progress in meta.json #3839

Closed
wants to merge 1 commit

Conversation

mike-nguyen
Member

Previously, meta.json was only written if replication succeeded in all regions; if any region failed, meta.json was not written at all. Subsequent replication runs would then start over from scratch, which would not be a problem except that Aliyun errors out if an image with the same name already exists. Re-running the operation on the same build was not possible without deleting the images beforehand.

Let's write meta.json after every region replication so we can keep track of the replicated regions and allow re-runs.

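The checkpointing approach described above can be sketched roughly as follows. This is a minimal illustration only, not the actual cosalib code: `replicate_to_region` and the exact meta.json layout are assumptions made for the example.

```python
import json


def replicate_with_checkpoints(meta_path, regions, replicate_to_region):
    """Replicate an image to each region, writing meta.json after every
    region so a failure part-way through does not lose progress."""
    with open(meta_path) as f:
        meta = json.load(f)
    # Regions recorded by a previous (possibly partial) run.
    done = {entry["name"] for entry in meta.get("aliyun", [])}
    for region in regions:
        if region in done:
            continue  # already replicated on an earlier run; skip it
        image_id = replicate_to_region(region)
        meta.setdefault("aliyun", []).append({"name": region, "id": image_id})
        # Write after *every* region, not only at the end, so a re-run
        # can resume from wherever the last run stopped.
        with open(meta_path, "w") as f:
            json.dump(meta, f, indent=2)
    return meta
```

A re-run against the same build then skips the regions already recorded in meta.json instead of failing on Aliyun's duplicate-name error.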
@mike-nguyen
Member Author

/test rhcos

Contributor

@jbtrystram left a comment


I don't understand the patch; it just appears to be three comment lines added. Is that on purpose?

Python indentation is hard to see sometimes :)

Contributor

@jbtrystram left a comment


lgtm

@mike-nguyen
Member Author

/test rhcos


openshift-ci bot commented Jul 27, 2024

@mike-nguyen: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/rhcos | ceb5404 | link | true | /test rhcos |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mike-nguyen
Member Author

Looks like openshift/os#1551 is the cause of the CI issues.

@jlebon
Member

jlebon commented Aug 6, 2024

Hmm, it looks like this and coreos/fedora-coreos-pipeline#1017 are trying to address #3634. But I think the proper fix is to make ore aliyun copy-image itself idempotent. E.g. if it finds an existing image of the same name, then it should just no-op and return the ID of the existing image. This is similar to what's done for AWS.

Right now for example, if the Publish step fails in the pipeline, we're still in a situation where we have missing data in meta.json.

The bit in aliyun.py and aws.py where we filter out regions that were already uploaded should be more of an optimization rather than load-bearing.
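The idempotent behaviour suggested here could look roughly like the sketch below. The helper names `find_image_by_name` and `copy_image` are hypothetical stand-ins for the real `ore` calls, shown only to illustrate the shape of the fix.

```python
def idempotent_copy_image(region, name, find_image_by_name, copy_image):
    """Copy an image to a region, but no-op if an image with the same
    name already exists there, returning the existing image's ID."""
    existing_id = find_image_by_name(region, name)
    if existing_id is not None:
        # Image already present: return its ID instead of erroring out,
        # mirroring the AWS behaviour described above.
        return existing_id
    return copy_image(region, name)
```

With this in place, pre-filtering already-uploaded regions in aliyun.py/aws.py becomes a pure optimization: even if the filter misses, the copy call itself cannot fail on a duplicate name.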

@mike-nguyen
Member Author

mike-nguyen commented Aug 9, 2024

> The bit in aliyun.py and aws.py where we filter out regions that were already uploaded should be more of an optimization rather than load-bearing.

Thanks! I missed that there was more going on behind the scenes on the AWS side. I'll get the Aliyun side to function similarly.

@mike-nguyen
Member Author

Closed in favor of #3872
