perf(decoder): avoid lots of calls to ensure data and field length/offset lookups #144
base: master
Conversation
There is no reason to compute the exact same field lengths and offsets at every iteration of these loops. At high scale, this results in tons of redundant lookups into the `LengthOffsetCache`.
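To illustrate the hoisting, here is a minimal, self-contained sketch; the reader and field types are hypothetical stand-ins, not the library's actual classes:

```java
// Minimal sketch of the hoisting pattern with stand-in types.
interface BitReader {
    int readBits16(int offset);
}

enum Field {
    START_OR_ONLY_VENDOR_ID(16);

    private final int length;

    Field(int length) {
        this.length = length;
    }

    // Stands in for the cached length lookup in the real decoder.
    int getLength(BitReader reader) {
        return length;
    }
}

final class HoistingSketch {
    static void decodeEntries(BitReader bbv, int offset, int numEntries) {
        // Look up the per-field length once, outside the loop, instead of
        // re-querying the cache on every iteration.
        final int vendorIdLength = Field.START_OR_ONLY_VENDOR_ID.getLength(bbv);
        for (int i = 0; i < numEntries; i++) {
            int vendorId = bbv.readBits16(offset);
            offset += vendorIdLength; // constant within the loop
            // ... decode the rest of the entry using vendorId ...
        }
    }
}
```

The decoded values are unchanged; only where the length lookup happens moves.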
Implements `readBits64(int)` with new unit tests, and uses all variants of the bit-reading methods to read into a bit set in larger chunks. Also adds a `resultShift` parameter that makes reading vendor consent much simpler: it just needs to be set to 1 instead of the default 0.
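A rough sketch of the chunked bit-set read with a `resultShift`, assuming an MSB-first `readBits64` and using a hypothetical reader interface rather than the library's own:

```java
import java.util.BitSet;

final class BitSetReadSketch {
    // Hypothetical 64-bit read; assumed to return the bits MSB-first,
    // starting at the given bit offset.
    interface BitReader64 {
        long readBits64(int offset);
    }

    // Reads `length` bits starting at `offset` into a BitSet, offsetting each
    // set index by `resultShift` (e.g. 1, so that 1-based vendor IDs map
    // directly to BitSet indices).
    static BitSet readBitSet(BitReader64 reader, int offset, int length, int resultShift) {
        BitSet result = new BitSet(length + resultShift);
        int i = 0;
        // Consume 64-bit chunks instead of reading one bit at a time.
        for (; i + 64 <= length; i += 64) {
            long chunk = reader.readBits64(offset + i);
            for (int b = 0; b < 64; b++) {
                if (((chunk >>> (63 - b)) & 1L) != 0) {
                    result.set(i + b + resultShift);
                }
            }
        }
        // A full implementation would fall back to smaller reads
        // (32/16/8/1 bits) for the remaining length - i bits; omitted here.
        return result;
    }
}
```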
Implements `readBits32(int)` with new unit tests, and eliminates half the reads for publisher restrictions by reading two fields at a time when possible. Also uses the new bit-set reading implementation for vendor consent. All tests pass.
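And a sketch of the paired read, again with a hypothetical reader interface; it assumes the two 16-bit fields sit in adjacent bit positions:

```java
final class PairedReadSketch {
    // Hypothetical reader interface mirroring the methods discussed above.
    interface BitReader32 {
        int readBits16(int offset);
        int readBits32(int offset);
    }

    // Before: two separate 16-bit reads, each with its own bounds check
    // and offset bookkeeping.
    static int[] readPairSeparately(BitReader32 bbv, int offset) {
        int first = bbv.readBits16(offset);
        int second = bbv.readBits16(offset + 16);
        return new int[] { first, second };
    }

    // After: one 32-bit read, split into its upper and lower 16-bit halves
    // (assuming the two fields are adjacent and MSB-first).
    static int[] readPairCombined(BitReader32 bbv, int offset) {
        int content = bbv.readBits32(offset);
        int first = content >>> 16;     // upper 16 bits
        int second = content & 0xFFFF;  // lower 16 bits
        return new int[] { first, second };
    }
}
```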
Tagging some common contributors to help review: @laktech @imayankmishra @iabmayank @srinivas81
thanks! i'll be able to review in a few days.
I also have another change on top of this where we make sure to use primitive
@laktech As promised, I've made a couple more PRs on top of this to test even further optimizations:
Let me know what you think, when you're ready.
hey, are you able to share the benchmark that you're running?
@laktech It wasn't a proper offline benchmark like with JMH or anything. I basically just tried it with a production-like workload as part of a much larger code path, and viewed some flame graph profiles. If you have any better JMH or other benchmarks set up, please feel free to test this out.
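For reference, a minimal JMH harness for this path might look roughly like the sketch below; the benchmark class, the sample consent string, and the `com.iabtcf.decoder.TCString` import path are assumptions, not part of this PR:

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

import com.iabtcf.decoder.TCString; // import path assumed

@State(Scope.Benchmark)
public class TCStringDecodeBenchmark {

    // Placeholder: substitute a production-like TC string with vendor consent,
    // vendor legitimate interest, and publisher restrictions populated.
    private final String consentString =
            "COtybn4PA_zT4KjACBENAPCIAEBAAECAAIAAAAAAAAAA";

    @Benchmark
    public int decodeEagerly() {
        // hashCode() forces eager decoding of all sections, which matches the
        // workload described in this thread.
        return TCString.decode(consentString).hashCode();
    }
}
```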
@laktech Any progress in reviewing?
Nice
Hey, any progress on reviewing this?
@laktech Another bump 🙏
sorry i stepped away from this for a few months. i'll be reviewing this once again.
```java
if (isRangeEntry) {
    int endVendorId = bbv.readBits16(offset);
    offset += FieldDefs.START_OR_ONLY_VENDOR_ID.getLength(bbv);
    final int content = bbv.readBits32(offset);
```
there is significant code here to eliminate a call to readBits16. I don't think it's worth it unless you can show a benchmark that it adds value.
Overall, this looks great. I just have some concerns around the two areas I commented above.
I've made three changes (three commits) to reduce the number of calls to `ensureReadable()` and to the `LengthOffsetCache` internal maps. We've seen a 2x speed improvement in `TCString.decode()` and a 2% overall CPU usage improvement in our production workload, which at high scale results in tangible cost savings.

Just to note, our use case doesn't benefit much from lazy decoding, since we need to decode the vendor consent, vendor legitimate interest, and publisher restrictions for almost all requests, and those three seem to dominate the decoding time in our profiling measurements when `TCStringV2.hashCode()` is invoked upon eager decoding.