Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standard chunking format #914

Merged
merged 5 commits into from
Apr 26, 2016
Merged

Conversation

tagomoris
Copy link
Member

This change is to add formatting for buffer chunks. Output without explicit #format definition will use this standard format to write event streams to buffer chunks, and actually, it's same with EventStream.to_msgpack_stream.
At the same time, this change moves a method to iterate events from chunks to event.rb as mixin. It's about event streams, not chunk itself.

* to enable limits by # of records for buffer chunks
* to be able to visualize data flow in fluentd
* it improves performance
* it's needed to implement output plugins like ObjectBufferedOutput of v0.12
* This type of output provides the standard format of binary representation of buffer chunk payloads
  * It is actually equal to results of EventStream#to_msgpack_stream
  * We can provide some tools to read-and-rescue buffer chunks in failure situations for such chunks
* This output style is actually same with ObjectBufferedOutput of v0.12
@tagomoris
Copy link
Member Author

This change is basis for many features in future.

  • Very simple buffered plugin only with #write method
    • It can share many implementations with non-buffered procedure #process because chunk can be iterated
  • Event streams now know # of records in it
    • It leads detailed monitoring how many records are processed in Fluentd process or by each plugins

@tagomoris
Copy link
Member Author

@repeatedly could you review this?

@@ -21,6 +21,10 @@ class EventStream
include Enumerable
include MessagePackFactory::Mixin

def records
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name records seems to represent returns the event entries.
num_records or size, rubish, is better for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size looks bad, because it sometimes means bytes of a data, and a type of EventStream (MessagePackEventStream) is actually a String object.
How about events?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

events is better for me.

@sonots How about this? This method will be used in flowcounter_simple or similar gems.

Copy link
Member

@sonots sonots Apr 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks this method returns an integer, not an array. So, both records and events look bad for me.
What about num_events, num_records, or just length?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll use length.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once I would use length, but I re-found that Chunk#records already exists, and I made EventStream#records as same name with it.
Chunk also has Chunk#size (bytesize), and #size != #length is extraordinary confusing in ruby way. Hmm.

Copy link
Member Author

@tagomoris tagomoris Apr 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to:

  • change Chunk#size to return # of events, and make #length as alias of #size
  • add Chunk#bytesize to return # of bytes of chunk content

But this changes the meaning of chunk.size, and it will break compatibility of plugins.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create another pull-request to change Chunk#records and EventStream#records to #num_events later.
Let me merge this right now.

@tagomoris tagomoris merged commit 433a82b into master Apr 26, 2016
@tagomoris tagomoris deleted the add-standard-chunking-format branch May 17, 2016 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants