New module to stream FileInfoStream from s3 #339
LGTM, but I wonder if we need to allow the file receiver (i.e. the entity that receives the file stream contents) to specify a max concurrent file limit, so the receiver is not overwhelmed when there are a lot of files, or a lot of messages in the files being streamed. That would let the receiver control the flow rate.
So allow the receiver to specify the size of the channel? I like that idea.
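A minimal sketch of the idea discussed above, assuming a bounded channel whose capacity is chosen by the receiver. It uses `std::sync::mpsc::sync_channel` only to stay self-contained; the PR itself may well use an async channel instead. `FileInfo` and `start_poller` are hypothetical names for illustration, not the PR's API.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical stand-in for a decoded file handed to the receiver.
#[derive(Debug, Clone, PartialEq)]
pub struct FileInfo {
    pub key: String,
}

/// Spawns a producer thread that sends one `FileInfo` per key over a
/// channel bounded at `capacity`. The receiver picks `capacity`, which
/// caps how many files can be queued ahead of it.
pub fn start_poller(keys: Vec<String>, capacity: usize) -> Receiver<FileInfo> {
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for key in keys {
            // `send` blocks while the channel is full, so a slow receiver
            // naturally slows the producer down (backpressure).
            if tx.send(FileInfo { key }).is_err() {
                break; // the receiver was dropped; stop producing
            }
        }
    });
    rx
}
```

With a capacity of 1 the producer can run at most one file ahead of the receiver; a larger capacity trades memory for throughput.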
c7ea52a to 9e3a077
This will poll S3 for new files at a short interval, but keep track of each individual file so it can handle files arriving out of order. The service receives an already-decoded stream over a channel when a new file is seen in S3.
The benefit is that we no longer have to configure offsets based on the previous service's roll time for its output files.
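As a rough illustration of tracking each file individually, the sketch below remembers every processed key and returns only unseen files from each poll, so a file that shows up late with an older timestamp is still picked up. `Listed` and `Tracker` are hypothetical names, and the in-memory `HashSet` stands in for the db table the PR describes.

```rust
use std::collections::HashSet;

/// Hypothetical entry returned by listing the bucket.
#[derive(Debug, Clone, PartialEq)]
pub struct Listed {
    pub key: String,
    pub timestamp: u64,
}

/// Remembers which keys were already handed out. In this sketch the set
/// lives in memory; it stands in for the per-file records kept in the db.
#[derive(Default)]
pub struct Tracker {
    processed: HashSet<String>,
}

impl Tracker {
    /// Returns the files from `listing` that have not been seen before,
    /// oldest first, and marks them as processed.
    pub fn new_files(&mut self, listing: &[Listed]) -> Vec<Listed> {
        let mut fresh: Vec<Listed> = listing
            .iter()
            .filter(|f| !self.processed.contains(&f.key))
            .cloned()
            .collect();
        fresh.sort_by_key(|f| f.timestamp);
        for f in &fresh {
            self.processed.insert(f.key.clone());
        }
        fresh
    }
}
```

Because the check is per key rather than "newer than the last timestamp seen", no offset tuned to the upstream roll time is needed.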
Table definition
Configuration:
start_after --> the start point when querying S3. Only used when no records are found in the db.
max_lookback --> the max it will look back from now when querying S3.
Having both set doesn't really make sense. Do we need both? Should this be an enum?
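One way the enum question could play out is sketched below. `LookbackBehavior` and `query_start` are hypothetical names, timestamps are plain seconds for simplicity, and the way `Max` combines with the latest db record is an assumption, not something the PR states.

```rust
/// Hypothetical config: exactly one of the two behaviours can be chosen,
/// so "both set" is no longer representable.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LookbackBehavior {
    /// Start point (seconds) used only when the db has no records.
    StartAfter(u64),
    /// Maximum number of seconds to look back from now.
    Max(u64),
}

impl LookbackBehavior {
    /// Computes the timestamp to start querying from, given the current
    /// time and the newest record timestamp found in the db (if any).
    pub fn query_start(self, now: u64, latest_in_db: Option<u64>) -> u64 {
        match self {
            // Resume from the db when possible, else fall back to the
            // configured start point.
            LookbackBehavior::StartAfter(start_after) => {
                latest_in_db.unwrap_or(start_after)
            }
            // Assumption: never look back further than `max_lookback`,
            // even if the latest db record is older than that.
            LookbackBehavior::Max(max_lookback) => {
                let floor = now.saturating_sub(max_lookback);
                latest_in_db.map_or(floor, |latest| latest.max(floor))
            }
        }
    }
}
```

Making the two options variants of one type moves the "both set" problem from a runtime check to something the compiler rules out.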
Another design choice that comes off a little weird, though I couldn't come up with anything better: calling FileInfoStream::into_stream consumes the struct and inserts the record into the db within the provided transaction. It feels odd to insert the db record before reading the stream, but the thought was that it shouldn't matter as long as the transaction is committed after the stream is processed. Open to other ideas.
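The ordering concern can be illustrated with a toy model. Everything here is hypothetical and in-memory: `Db`, `Transaction`, and this simplified `FileInfoStream` only mimic the shape described above, where `into_stream` takes `self` by value and stages the record in a caller-provided transaction. It is not the PR's actual implementation.

```rust
/// Toy database: only committed file keys are visible here.
#[derive(Debug, Default)]
pub struct Db {
    pub committed: Vec<String>,
}

/// Toy transaction: staged inserts are discarded unless `commit` is called.
pub struct Transaction<'a> {
    db: &'a mut Db,
    pending: Vec<String>,
}

impl<'a> Transaction<'a> {
    pub fn begin(db: &'a mut Db) -> Self {
        Transaction { db, pending: Vec::new() }
    }

    /// Makes all staged inserts visible in the db.
    pub fn commit(self) {
        let Transaction { db, pending } = self;
        db.committed.extend(pending);
    }
}

/// Simplified stand-in for the PR's `FileInfoStream`.
pub struct FileInfoStream {
    pub key: String,
    pub lines: Vec<String>,
}

impl FileInfoStream {
    /// Consumes the struct, stages its record in `txn`, and returns the
    /// contents. Taking `self` by value means a file cannot be streamed twice.
    pub fn into_stream(self, txn: &mut Transaction<'_>) -> std::vec::IntoIter<String> {
        txn.pending.push(self.key);
        self.lines.into_iter()
    }
}
```

If processing fails and the transaction is dropped without `commit`, the staged record never reaches the db, so the file is seen as new on the next poll. That is why inserting before reading is harmless as long as the commit happens last.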