[destination-s3] Deferred deletion incorrectly deleting newly created files #45086

Closed
1 task
RocketerJames opened this issue Sep 2, 2024 · 2 comments
Labels
area/connectors Connector related issues community connectors/destination/s3 team/destinations Destinations team's backlog type/bug Something isn't working

Comments

RocketerJames commented Sep 2, 2024

Connector Name

destination-s3

Connector Version

1.0.4

What step the error happened?

During the sync

Relevant information

A recent change to the S3 destination regarding 'Deferred deletes': cba3ccd

According to the logs, the destination checks whether any files tagged with the CURRENT generation (x-amz-meta-ab-generation-id) exist; if none do, it marks the existing files for deletion AT THE END of the sync. In my case, the files in the bucket are from the previous sync's generation (not the current one).

The sync runs and, as expected, overwrites the existing files with new ones carrying the new generation ID. It then performs the aforementioned deletion at the end, and because the file names are the same, the newly written files are the ones that get deleted. As a result, every second sync I run leaves no files in the bucket. Either deletes should not be deferred, or the deletion should only target files from the previous generation. Perhaps a solution could be to check that the files marked for deletion don't match those just created.
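
A minimal sketch of the behaviour described above, using an in-memory stand-in for the bucket rather than real S3 calls. The names `FakeBucket`, `keys_marked_at_start`, `run_sync`, and the `safe` flag are hypothetical and not taken from the connector, and the sketch assumes the generation ID changes on every sync. It only illustrates the suggested check, not how the connector is actually implemented.

```python
from dataclasses import dataclass, field

# Metadata key as it would appear without the x-amz-meta- prefix (assumption).
GENERATION_KEY = "ab-generation-id"


@dataclass
class FakeBucket:
    """In-memory stand-in for an S3 bucket: object key -> metadata dict."""
    objects: dict = field(default_factory=dict)

    def put(self, key, generation):
        # Writing to an existing key overwrites it, as S3 does.
        self.objects[key] = {GENERATION_KEY: str(generation)}

    def delete(self, keys):
        for key in keys:
            self.objects.pop(key, None)


def keys_marked_at_start(bucket, generation):
    """If no object carries the current generation, mark every existing key."""
    has_current = any(
        meta.get(GENERATION_KEY) == str(generation)
        for meta in bucket.objects.values()
    )
    return [] if has_current else list(bucket.objects)


def run_sync(bucket, generation, keys_to_write, safe):
    marked = keys_marked_at_start(bucket, generation)
    written = set()
    for key in keys_to_write:
        bucket.put(key, generation)
        written.add(key)
    if safe:
        # Suggested check: never delete a key that this sync just wrote.
        marked = [key for key in marked if key not in written]
    bucket.delete(marked)  # deferred deletion at the end of the sync


if __name__ == "__main__":
    key = "XXX/XXX_campaigns/data.csv"
    for safe in (False, True):
        bucket = FakeBucket()
        run_sync(bucket, 89, [key], safe=safe)
        run_sync(bucket, 90, [key], safe=safe)
        print(f"safe={safe}: {sorted(bucket.objects)}")
```

With `safe=False`, the second sync that reuses the same key leaves the bucket empty, which matches the symptom reported here; with `safe=True`, the rewritten file survives.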

Please let me know if I've got something wrong here. However, we've been charged for a lot of syncs over the last week whose data files were all deleted immediately. Let me know if you need anything else.

Kind regards.

Relevant log output

2024-09-02 14:30:10 destination > INFO main i.a.c.i.d.s.S3ConsumerFactory(keysForOverwriteDeletion):300 No data exists from previous sync for stream XXX_campaigns from current generation 90, proceeding to clean up existing data
2024-09-02 14:30:11 destination > INFO main i.a.c.i.d.s.S3ConsumerFactory(onStartFunction$lambda$1):83 Marked 1 keys for deletion at end of sync for namespace XXX/XXX stream XXX_campaigns bucketObject /

Contribute

  • Yes, I want to contribute

RocketerJames commented Sep 2, 2024

I should note here that, I think, the namespace contains a date/epoch by default. My guess is that the tests may not have caught this issue because the filenames would have been different for every sync. My stream names and namespaces are fixed, so the filenames are the same each time, for post-processing purposes. However, this was valid before the latest change and should be a valid use case? At least it is not specifically advised against in the documentation, and I'd assume it is the purpose of the generation ID? Hope this helps!
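
To illustrate the point about filenames, here is a small sketch of how a path with an epoch component produces a new key on every sync, while a fixed path reuses the same key. The function `object_key` and its naming scheme are made up for illustration and are not the connector's actual path format.

```python
def object_key(namespace, stream, epoch_ms=None):
    """Build an object key; include the epoch only when one is given."""
    prefix = f"{namespace}/{stream}/"
    if epoch_ms is None:
        return f"{prefix}data.csv"          # fixed name, same on every sync
    return f"{prefix}{epoch_ms}_data.csv"   # unique name per sync


if __name__ == "__main__":
    # Two syncs with an epoch in the path: the keys never collide.
    first = object_key("XXX", "XXX_campaigns", epoch_ms=1725287410000)
    second = object_key("XXX", "XXX_campaigns", epoch_ms=1725373810000)
    print(first != second)

    # Two syncs with a fixed path: the second sync overwrites the first key,
    # so a deletion list built from the old listing also names the new file.
    print(object_key("XXX", "XXX_campaigns") == object_key("XXX", "XXX_campaigns"))
```

In the fixed-path case, a list of keys collected at the start of the sync is indistinguishable from the keys written during it, which is why a name-based deferred delete removes the fresh data.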

@marcosmarxm marcosmarxm changed the title [S3 Destination] Deferred deletion incorrectly deleting newly created files [destination-s3] Deferred deletion incorrectly deleting newly created files Sep 4, 2024

No longer occurring on Cloud, perhaps the result of this fix: #45143
