Fix roll_time and auto_commit settings for FileSink #859

Merged 3 commits on Aug 30, 2024

@michaeldjeffrey michaeldjeffrey commented Aug 29, 2024

In a previous update to file_store I misunderstood the auto_commit and roll_time settings. I thought auto_commit=true would cause a file sink to commit itself on every write, which would mean the two settings were in opposition to each other.

Then arose such a clatter (Oh the folly of my ways)

File sinks that had previously been configured with auto_commit=true and roll_time set to any duration were converted to set only roll_time, with auto_commit=false.

These file sinks were expecting max_size or roll_time to be the driver of when a file was uploaded to s3. With auto_commit=false, the file sink was rolling the file into temporary storage, waiting for a .commit() that would never come.
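To make the failure mode concrete, here is a toy model of it. This is only a sketch and not the file_store implementation: `ToySink`, its fields, and its methods are hypothetical names invented for illustration.

```rust
/// Hypothetical stand-in for a file sink; not the real file_store type.
#[derive(Debug, Default)]
struct ToySink {
    auto_commit: bool,
    /// Rolled files sitting in temporary storage.
    staged: Vec<String>,
    /// Files that have been handed off for upload.
    uploaded: Vec<String>,
}

impl ToySink {
    fn new(auto_commit: bool) -> Self {
        Self {
            auto_commit,
            ..Default::default()
        }
    }

    /// Called when max_size or roll_time closes the current file.
    fn roll(&mut self, file_name: &str) {
        self.staged.push(file_name.to_string());
        if self.auto_commit {
            // An auto-committing sink hands the file off as soon as it rolls.
            self.commit();
        }
    }

    /// Moves every staged file to the upload queue.
    fn commit(&mut self) {
        self.uploaded.append(&mut self.staged);
    }
}
```

With `ToySink::new(false)`, every `roll()` leaves another file in `staged` and nothing reaches `uploaded` unless someone calls `commit()`, which is the situation the converted sinks ended up in.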


Add FileSinkCommitStrategy and FileSinkRollTime to clear up the relationship between roll_time and auto_commit.

Having to make both of these decisions when creating a file sink decreases the likelihood of building up a stash of tmp files that are not being uploaded because a Manual sink was created when an Automatic sink was desired.
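As a rough illustration of the shape this takes, the two decisions could be modeled like the sketch below. The enum names come from this PR, and `Manual` / `Automatic` follow the wording above, but the exact variants, the `SinkOptions` struct, and the helper methods are assumptions made for the example and may not match the code in the diff.

```rust
use std::time::Duration;

/// Value taken from the defaults listed in this description (3 minutes).
pub const DEFAULT_SINK_ROLL_SECS: u64 = 3 * 60;

/// Who commits a rolled file. Variant names are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileSinkCommitStrategy {
    /// Rolled files are committed without caller involvement.
    Automatic,
    /// The caller must call commit explicitly.
    Manual,
}

/// How often a file is rolled. Variant names are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileSinkRollTime {
    Default,
    Duration(Duration),
}

impl FileSinkCommitStrategy {
    pub fn commits_on_roll(self) -> bool {
        matches!(self, Self::Automatic)
    }
}

impl FileSinkRollTime {
    pub fn as_duration(self) -> Duration {
        match self {
            Self::Default => Duration::from_secs(DEFAULT_SINK_ROLL_SECS),
            Self::Duration(d) => d,
        }
    }
}

/// Hypothetical options struct: deliberately has no `Default` impl, so a
/// caller cannot build a sink without choosing both values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SinkOptions {
    pub commit_strategy: FileSinkCommitStrategy,
    pub roll_time: FileSinkRollTime,
}

impl SinkOptions {
    pub fn new(commit_strategy: FileSinkCommitStrategy, roll_time: FileSinkRollTime) -> Self {
        Self {
            commit_strategy,
            roll_time,
        }
    }
}
```

The point of the sketch is that both arguments are required and independent: a roll time says when a file closes, and the commit strategy says who is responsible for what happens next.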

I went through https://github.com/helium/oracles/pull/849/files to grab the settings from before they transitioned to use FileSinkWriteExt::file_sink().

For anyone spot checking against that PR, the defaults are:

  • roll_time: Duration::from_secs(DEFAULT_SINK_ROLL_SECS) (3 minutes)
  • auto_commit: true
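For spot checking, the old defaults and the regression can be written down as a small sketch. `LegacySinkSettings` and `lost_auto_commit` are hypothetical names used only here; the 3 minute value for `DEFAULT_SINK_ROLL_SECS` is taken from the list above.

```rust
use std::time::Duration;

/// 3 minutes, per the defaults listed above.
const DEFAULT_SINK_ROLL_SECS: u64 = 3 * 60;

/// Hypothetical record of a sink's pre-#849 settings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LegacySinkSettings {
    roll_time: Duration,
    auto_commit: bool,
}

impl Default for LegacySinkSettings {
    fn default() -> Self {
        Self {
            roll_time: Duration::from_secs(DEFAULT_SINK_ROLL_SECS),
            auto_commit: true,
        }
    }
}

/// True when a sink that used to auto-commit no longer does.
fn lost_auto_commit(before: LegacySinkSettings, after: LegacySinkSettings) -> bool {
    before.auto_commit && !after.auto_commit
}
```

Any sink for which `lost_auto_commit` would return true is one that started stashing files in temporary storage after the earlier change.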

This hopefully clears up, at least for the use of the FileSinkWriteExt
trait, the relationship between `auto_commit` and `roll_time`. There is
no default implementation for this option, as all options have
implications that should be considered.

The PR that updated FileSink to use the Ext trait had an incorrect
understanding of the relationship between `roll_time` and `auto_commit`.

That understanding has since been corrected and codified in the
`FileSinkCommitStrategy` enum.

For this commit, https://github.com/helium/oracles/pull/849/files was
combed through to get all correct settings for FileSinks before they
were transitioned to use FileSinkWriteExt::file_sink().
@michaeldjeffrey michaeldjeffrey force-pushed the mj/roll-time-auto-commit branch from f96a2cd to 07ee3d4 Compare August 29, 2024 22:36

@andymck andymck left a comment

Can we add explicit details of the problem encountered and solved by this to the description?

* Change so auto_commit and roll_time are separated

* Create FileSinkRollTime enum
@michaeldjeffrey

> Can we add explicit details of the problem encountered and solved by this to the description?

Yes! 🎸

@michaeldjeffrey michaeldjeffrey merged commit f10606d into main Aug 30, 2024
17 checks passed
@michaeldjeffrey michaeldjeffrey deleted the mj/roll-time-auto-commit branch August 30, 2024 16:39