-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support compression for IPC with revamped feature flags #2369
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your effort and push this feature forward, and help me to refactor this model.
LGTM
It was my pleasure -- thank you for figuring out the tests and the format that was required |
Also we need to enable the IT test in the arrow. I have not thought about it carefully, but how to enable the |
This issue #2342 also can be closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just some minor nits
} else if decompressed_length == LENGTH_NO_COMPRESSED_DATA { | ||
// no compression | ||
let data = &input[(LENGTH_OF_PREFIX_DATA as usize)..]; | ||
Buffer::from(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an observation, but this copy is kind of unfortunate (although existed before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking the alternate is to create a Buffer initially and write this code in terms of Buffer
rather than &[u8]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... Not important for this PR, but it seems unfortunate the amount of memory copying we are doing, especially when the major design goal of the IPC spec is to avoid this 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #2437
} | ||
|
||
impl CompressionCodec { | ||
#[allow(clippy::ptr_arg)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be misunderstanding this lint, but I don't think it should be firing for this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I remove it and run clippy like
cargo clippy -p arrow --features=prettyprint,csv,ipc,test_utils --all-targets -- -D warnings
Clippy tells me:
error: writing `&mut Vec` instead of `&mut [_]` involves a new object where a slice will do
--> arrow/src/ipc/compression/stub.rs:50:18
|
50 | _output: &mut Vec<u8>,
| ^^^^^^^^^^^^ help: change this to: `&mut [u8]`
|
= note: `-D clippy::ptr-arg` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
Which while true in this case, is not true for the actual codec implementation, and they both need to have the same signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very daft lint, &mut Vec<u8>
is not the same as &mut [u8]
, in particular the latter must initialize memory... I can understand &[u8]
vs &Vec<u8>
but the mutability makes a difference... I somewhat assumed clippy wasn't being silly, guess I was wrong 😅
} | ||
// pad the tail of body data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this previously wasn't needed because the buffers were already 8 byte aligned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably -- I am not sure to be honest. This is code that was in @liukun4515 's original PR but it seemed reasonable to me
arrow/src/ipc/writer.rs
Outdated
@@ -520,17 +593,20 @@ impl<W: Write> FileWriter<W> { | |||
let data_gen = IpcDataGenerator::default(); | |||
let mut writer = BufWriter::new(writer); | |||
// write magic to header | |||
let mut header_size: usize = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut header_size: usize = 0; | |
let header_size: usize = ARROW_MAGIC.len() + 2; |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call -- I have updated in 2ed7ce3 (and added an assert
to verify the currently implicit assumption that ARROW_MAGIC
is 6 bytes long.
arrow/src/ipc/compression/stub.rs
Outdated
// under the License. | ||
|
||
//! Stubs that implement the same interface as ipc_compression | ||
//! but always error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Co-authored-by: Kun Liu <liukun@apache.org> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
…into alamb/help_feature_flags
Benchmark runs are scheduled for baseline = 1c879ae and contender = 5e27d93. 5e27d93 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #1855
Closes liukun4515#1
Closes #1709
Closes #70
Closes #2342
Rationale for this change
Not this is a new PR is based on @liukun4515 's work in #1855. Github is showing a massive diff in liukun4515#1 so I wanted to get this PR up so we could see what the actual change is proposed
The logic is unchanged, but I changed how the feature flagging works so that the code differences are isolated into the
ipc_compression
moduleWhat changes are included in this PR?
ipc_compression
feature flagAre there any user-facing changes?
New feature flag, and support for compression