use aws_array_list<aws_byte_buffer*> for encoded_checksum_list #318
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
- Coverage   88.91%   88.83%   -0.08%
==========================================
  Files          17       17
  Lines        4934     4936       +2
==========================================
- Hits         4387     4385       -2
- Misses        547      551       +4
source/s3_auto_ranged_put.c
Outdated
+    struct aws_byte_buf **checksum_buf_pointer = NULL;
     aws_array_list_get_at_ptr(
-        &auto_ranged_put->synced_data.encoded_checksum_list, (void **)&checksum_buf, request->part_number - 1);
+        &auto_ranged_put->synced_data.encoded_checksum_list,
+        (void **)&checksum_buf_pointer,
+        request->part_number - 1);
     /* Clean up the buffer in case of it's initialized before and retry happens. */
-    aws_byte_buf_clean_up(checksum_buf);
+    checksum_buf = *checksum_buf_pointer;
+    if (checksum_buf) {
+        aws_byte_buf_clean_up(checksum_buf);
+    } else {
+        checksum_buf = aws_mem_calloc(meta_request->allocator, 1, sizeof(struct aws_byte_buf));
+        *checksum_buf_pointer = checksum_buf;
+    }
this code is accessing synced_data.encoded_checksum_list without holding the lock

also, it's so damned confusing, it took me 20+ minutes to wrap my head around why we need to sometimes clean_up(), and other times allocate (but not aws_byte_buf_init())

I'm torn between suggesting comments that explain what's going on here, or suggesting ways to make this code less confusing (stop allocating the byte-buf that the weird stream will initialize later)
maybe we can move this 1 block of code around to make more sense (rough sketch below):
- delete everything here. s_s3_prepare_upload_part_finish() should not touch synced_data.encoded_checksum_list
- move the aws_mem_calloc(..., sizeof(struct aws_byte_buf)) up into the function above this, where we're already holding a lock to resize synced_data.encoded_checksum_list. But instead of setting NULL, set the allocated byte-buf
- move the aws_byte_buf_clean_up(checksum_buf); up into s_s3_prepare_upload_part(), in the same else block with the comment /* Not the first time preparing request (e.g. retry.
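For illustration only, here's a rough sketch of that shape. The s_s3_prepare_upload_part() / s_s3_prepare_upload_part_finish() and encoded_checksum_list names are the ones quoted above; the stand-in struct and helper functions are hypothetical, not the actual aws-c-s3 code:

```c
#include <aws/common/allocator.h>
#include <aws/common/array_list.h>
#include <aws/common/byte_buf.h>
#include <aws/common/mutex.h>

/* Hypothetical stand-in for the relevant slice of the auto-ranged-put state. */
struct fake_synced_data {
    struct aws_mutex lock;
    struct aws_array_list encoded_checksum_list; /* holds struct aws_byte_buf * */
};

/* Where the list is already being resized under the lock: push an allocated,
 * zeroed byte-buf instead of NULL, so later stages never have to allocate. */
static void s_grow_checksum_list_synced(
    struct fake_synced_data *synced, struct aws_allocator *allocator, size_t new_length) {

    while (aws_array_list_length(&synced->encoded_checksum_list) < new_length) {
        struct aws_byte_buf *checksum_buf = aws_mem_calloc(allocator, 1, sizeof(struct aws_byte_buf));
        aws_array_list_push_back(&synced->encoded_checksum_list, &checksum_buf);
    }
}

/* Retry branch of the prepare step: drop the bytes from the previous attempt.
 * The aws_byte_buf itself stays allocated, so prepare_finish only reads the
 * pointer and never needs to touch encoded_checksum_list at all. */
static void s_reset_checksum_for_retry(struct fake_synced_data *synced, size_t part_index) {
    struct aws_byte_buf **checksum_buf_ptr = NULL;
    aws_mutex_lock(&synced->lock);
    aws_array_list_get_at_ptr(&synced->encoded_checksum_list, (void **)&checksum_buf_ptr, part_index);
    aws_byte_buf_clean_up(*checksum_buf_ptr);
    aws_mutex_unlock(&synced->lock);
}
```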
+1, but I think we should address it in 2 steps: let's fix the issue first with the way it currently works, and then discuss how to refactor it better in a follow-up PR.
yeah, Dengke and I just had a chat and arrived at the same conclusion.
I can revisit this when I let the user pass in the checksum.
Yeah, agreed. Also, as we discussed offline, instead of moving things around I'll just add a lock here. But we are still updating the checksum buffer from the chunk stream without a lock, which we should address later.
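Concretely, that interim fix is roughly the hunk quoted above wrapped in the synced-data lock. A sketch, assuming the list is guarded by the meta request's synced-data lock; the lock/unlock helper names are an assumption, not necessarily what the final commit uses, and the variables come from the surrounding function:

```c
/* Sketch of the interim fix: same logic as the hunk above, but the list is
 * only touched while holding the meta request's synced-data lock.
 * aws_s3_meta_request_lock_synced_data()/..._unlock_synced_data() are
 * assumed helper names. */
aws_s3_meta_request_lock_synced_data(meta_request);

struct aws_byte_buf **checksum_buf_pointer = NULL;
aws_array_list_get_at_ptr(
    &auto_ranged_put->synced_data.encoded_checksum_list,
    (void **)&checksum_buf_pointer,
    request->part_number - 1);

checksum_buf = *checksum_buf_pointer;
if (checksum_buf) {
    /* Retry: release the previous attempt's bytes. */
    aws_byte_buf_clean_up(checksum_buf);
} else {
    /* First attempt: allocate the empty buffer the chunk stream will fill. */
    checksum_buf = aws_mem_calloc(meta_request->allocator, 1, sizeof(struct aws_byte_buf));
    *checksum_buf_pointer = checksum_buf;
}

aws_s3_meta_request_unlock_synced_data(meta_request);
```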
Co-authored-by: Michael Graeb <graebm@amazon.com>
Issue #, if available:

The encoded_checksum_list held struct aws_byte_buf elements by value to store the checksums, and we point to the checksum buffer in the list while we read from the stream and calculate the checksum. However, while we are using that buffer, the other thread (the complete-part callback) can allocate a new part when we finish reading from the other part, which may expand the array_list. The expansion frees the old memory we are still using and leads to a crash.

Description of changes:
Use struct aws_byte_buf * in the list instead. However, the downside is that we need to allocate the buffer every time we create a new one, which adds N allocations and may impact performance (?)
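To make the crash mechanics concrete, here is a minimal, self-contained sketch (my own illustration using aws-c-common directly, not code from this PR) of why a pointer into a by-value aws_array_list of struct aws_byte_buf can dangle after the list grows, while a list of struct aws_byte_buf * keeps each buffer's address stable:

```c
#include <aws/common/allocator.h>
#include <aws/common/array_list.h>
#include <aws/common/byte_buf.h>

int main(void) {
    struct aws_allocator *alloc = aws_default_allocator();

    /* By-value storage: each aws_byte_buf struct lives inside the list's
     * backing memory, so pointers into the list follow that memory around. */
    struct aws_array_list by_value;
    aws_array_list_init_dynamic(&by_value, alloc, 1 /* initial capacity */, sizeof(struct aws_byte_buf));

    struct aws_byte_buf first;
    aws_byte_buf_init(&first, alloc, 64);
    aws_array_list_push_back(&by_value, &first); /* copies the struct into the list */

    struct aws_byte_buf *held = NULL;
    aws_array_list_get_at_ptr(&by_value, (void **)&held, 0); /* pointer into backing memory */

    /* Growing the list can realloc and free the old backing memory. When the
     * complete-part callback does this on another thread while the chunk
     * stream is still writing through `held`, that's the crash. */
    struct aws_byte_buf second;
    aws_byte_buf_init(&second, alloc, 64);
    aws_array_list_push_back(&by_value, &second);
    (void)held; /* may dangle now; must not be dereferenced */

    /* Pointer storage (this PR): the list holds struct aws_byte_buf *, so a
     * resize moves the pointers but never the heap-allocated buffers. */
    struct aws_array_list by_ptr;
    aws_array_list_init_dynamic(&by_ptr, alloc, 1, sizeof(struct aws_byte_buf *));

    struct aws_byte_buf *stable = aws_mem_calloc(alloc, 1, sizeof(struct aws_byte_buf));
    aws_array_list_push_back(&by_ptr, &stable); /* `stable` stays valid across resizes */

    /* Cleanup; the per-checksum aws_mem_calloc/aws_mem_release pair is the
     * extra allocation cost mentioned above. */
    aws_byte_buf_clean_up(&first);
    aws_byte_buf_clean_up(&second);
    aws_byte_buf_clean_up(stable);
    aws_mem_release(alloc, stable);
    aws_array_list_clean_up(&by_value);
    aws_array_list_clean_up(&by_ptr);
    return 0;
}
```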
TODO:

The checksum buffer gets updated from aws_chunk_stream while we read from the input. So, if we use a dynamic array with pre-allocated buffers, there will be a race condition between the expansion of the array (which copies the buffers around) and the buffer being updated from the other thread. To prevent that, have aws_chunk_stream not initialize the buffer and just fill the given one without reallocating, which sounds like a workable solution if the extra allocations added by this PR turn out to be a performance issue.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.