-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(filecoin-proofs): speed up Fr32Reader #1341
Conversation
a4cee66
to
8b9920a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this earlier and it looks good, thanks for rebasing.
f8340e2
to
80198e6
Compare
Switch to processing bit padding in blocks of 127bytes using u128s
80198e6
to
d4a99cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good, and I can't see any definite problems. However, the naming throughout is confusing enough that I find it hard to follow the logic. In some cases, the naming actively confuses me because words seem to be used ambiguously. If you can clarify such cases by using less ambiguous naming and especially avoid cases where the same word is being used to mean entirely different things (if any such actually exist), that would be valuable. Although it seems like a small thing, once such naming gets locked in, it can be very expensive for future readers of the code to swap in the context required to understand it.
/// Are we done reading? | ||
done: bool, | ||
} | ||
|
||
const NUM_U128S_PER_BLOCK: usize = 8; | ||
const MASK_SKIP_HIGH_2: u128 = 0b0011_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather see this written in a way that can be seen correct by inspection, like:
u128::MAX - (0b11 << 126);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that much harder to inspect to be honest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? It's easy for you to see at a glance that this is wrong?
0b0011_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neither of them is "easy" for me, but in thase case I simply count the 1s and know if it is good, but your notation requires me to think through bitshfiting rules, and how the subtraction operator is used on a bit level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main point was to avoid counting the 1
s — not the specific construction used to avoid it. Especially with an appropriate comment, I'm not averse to asking readers to think through bitshifting rules.
A simpler version which requires less thinking and still no counting would be:
const MASK_SKIP_HIGH_2: u128 = u128::MAX >> 2;
Maybe that is the best of both worlds?
filecoin-proofs/src/fr32_reader.rs
Outdated
impl<R: io::Read> Fr32Reader<R> { | ||
pub fn new(source: R) -> Self { | ||
Fr32Reader { | ||
source, | ||
target_offset: 0, | ||
buffer: Default::default(), | ||
in_buffer: [0; NUM_U128S_PER_BLOCK * 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the 16 into a named constant?
filecoin-proofs/src/fr32_reader.rs
Outdated
// 0..254 | ||
{ | ||
out_buffer[0] = in_buffer[0]; | ||
out_buffer[1] = in_buffer[1] & MASK_SKIP_HIGH_2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere, there should be a comment noting that u128
is 16 bytes, so Fr32
output is arranged in groups of 2 u128
s — and here we we handle groups of 4 Fr32
s because that is the period at which alignment repeats.
filecoin-proofs/src/fr32_reader.rs
Outdated
out_buffer[6] |= in_buffer[6] << 6; // low 122 bits | ||
out_buffer[7] = in_buffer[6] >> 122; // top 6 bits | ||
out_buffer[7] |= in_buffer[7] << 6; // low 120 bits | ||
out_buffer[7] &= MASK_SKIP_HIGH_2; // zero high 2 bits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could implement this with a macro which would make the logic entirely clear while still being efficient. The above is fine, just requires more effort to verify that what is written our here really does correspond to that notional 'macroexpansion'.
filecoin-proofs/src/fr32_reader.rs
Outdated
self.target_offset += 8; | ||
return Ok(1); | ||
Ok(n) => { | ||
let tmp = buf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is tmp
needed? Might be clearer to just assign buf = &mut buf[n..]
if that works.
filecoin-proofs/src/fr32_reader.rs
Outdated
pub fn read_u32(&mut self) -> u32 { | ||
debug_assert!(self.available() >= 32); | ||
self.process_block(); | ||
self.to_process += div_ceil(bytes_read * 8, 254); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This to_process
quantity is very confusing. Maybe it's just the name. We just finished processing (process_block()
), so I would not expect the amount 'to process' to increase. Can you rework the naming so what is actually being tracked is clearer?
Is the word 'process' actually referring to the same thing in these two names? If not, one should really be changed.
filecoin-proofs/src/fr32_reader.rs
Outdated
target[2] = value[2]; | ||
target[3] = value[3]; | ||
} | ||
let start = read; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amplifying my comment when read
was introduced, I think this is a confusing name. I think I would be less confused if it was something like target_read
. As a reader, it's very hard to keep track of the context and semantics of these variables. If there's not a way to make it clearer by the structure, at least maybe more explicit names will help.
I think part of the confusion, too, is that in this context, you're writing to target
but referring to the completed result as having read
. I know that's what this method purports to do, just explaining how it creates confusion, giving that you also read from self.source
into a buffer, then read from that buffer before 'reading' into target
.
Maybe target_offset
would be clearest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it is very unfortunate that the past tense and the verb happen to be the same word here, as this is explicitly the past tense, maybe "bytes_read" would be better?
filecoin-proofs/src/fr32_reader.rs
Outdated
while read < num_bytes { | ||
// load new block | ||
if self.to_process == 0 { | ||
let bytes_read = self.fill_buffer()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fills in_buffer
.
filecoin-proofs/src/fr32_reader.rs
Outdated
/// Currently read byte. | ||
buffer: Buffer, | ||
/// Currently read block. | ||
in_buffer: [u8; NUM_U128S_PER_BLOCK * 16], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of 16, you could explicitly calculate with size_of
to show that the in and out buffers hold the same number of bytes.
filecoin-proofs/src/fr32_reader.rs
Outdated
.copy_from_slice(&self.out_buffer.as_byte_slice()[out_start..out_end]); | ||
read += len; | ||
self.out_offset += len; | ||
self.to_process -= div_ceil(len * 8, 256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
div_ceil(len, 32)
might be clearer. But this raises a question, if to_process
is the number of blocks, and one block is 4 Fr
, shouldn't this be div_ceil(len, 128)
?
I can believe the actual algorithm is doing the right thing here, but I'm confused trying to line up all the names and descriptions throughout this whole changeset.
I think maybe 'block' in this context is being used to mean 'one Fr
s worth of bytes' — but it seems like in the definition of NUM_U128S_PER_BLOCK
(where the value is 8), 'block' is being used to refer to 128 bytes = 4 Fr32
s.
Am I missing something, or can some rectification of names help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to process is in number of Frs sorry, which is 254 bits in, 256bits out
Switch to processing bit padding in blocks of 127bytes using u128s.
This also improves correctness, as now the reader will emit a full
Fr32
at the end, not just a single byte with the additional bits, which is why some of the test code was updated.Speed differences can be seen here, with the diffs being the same benchmarks run against master. https://gist.github.com/dignifiedquire/1c21680f237a8f3aa5ad43f0008ffa41