use aws_array_list<aws_byte_buffer*> for encoded_checksum_list #318
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
- Coverage   88.91%   88.83%   -0.08%
==========================================
  Files          17       17
  Lines        4934     4936       +2
==========================================
- Hits         4387     4385       -2
- Misses        547      551       +4
source/s3_auto_ranged_put.c
Outdated
+    struct aws_byte_buf **checksum_buf_pointer = NULL;
     aws_array_list_get_at_ptr(
-        &auto_ranged_put->synced_data.encoded_checksum_list, (void **)&checksum_buf, request->part_number - 1);
+        &auto_ranged_put->synced_data.encoded_checksum_list,
+        (void **)&checksum_buf_pointer,
+        request->part_number - 1);
     /* Clean up the buffer in case of it's initialized before and retry happens. */
-    aws_byte_buf_clean_up(checksum_buf);
+    checksum_buf = *checksum_buf_pointer;
+    if (checksum_buf) {
+        aws_byte_buf_clean_up(checksum_buf);
+    } else {
+        checksum_buf = aws_mem_calloc(meta_request->allocator, 1, sizeof(struct aws_byte_buf));
+        *checksum_buf_pointer = checksum_buf;
+    }
this code is accessing synced_data.encoded_checksum_list without holding the lock

also, it's so damned confusing, it took me 20+ minutes to wrap my head around why we need to sometimes clean_up(), and other times allocate (but not aws_byte_buf_init())

I'm torn between suggesting comments that explain what's going on here, or suggesting ways to make this code less confusing (stop allocating the byte-buf that the weird stream will initialize later)
maybe we can move this 1 block of code around to make more sense (rough sketch below):
- delete everything here. s_s3_prepare_upload_part_finish() should not touch synced_data.encoded_checksum_list
- move the aws_mem_calloc(..., sizeof(struct aws_byte_buf)) up into the function above this, where we're already holding a lock to resize synced_data.encoded_checksum_list. But instead of setting NULL, set the allocated byte-buf
- move the aws_byte_buf_clean_up(checksum_buf); up into s_s3_prepare_upload_part(), in the same else block with the comment /* Not the first time preparing request (e.g. retry.
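For illustration only, here's a rough sketch of that shape. The s_s3_prepare_upload_part() / s_s3_prepare_upload_part_finish() and encoded_checksum_list names are the ones quoted above; the stand-in struct and helper functions are hypothetical, not the actual aws-c-s3 code:

```c
#include <aws/common/allocator.h>
#include <aws/common/array_list.h>
#include <aws/common/byte_buf.h>
#include <aws/common/mutex.h>

/* Hypothetical stand-in for the relevant slice of the auto-ranged-put state. */
struct fake_synced_data {
    struct aws_mutex lock;
    struct aws_array_list encoded_checksum_list; /* holds struct aws_byte_buf * */
};

/* Where the list is already being resized under the lock: push an allocated,
 * zeroed byte-buf instead of NULL, so later stages never have to allocate. */
static void s_grow_checksum_list_synced(
    struct fake_synced_data *synced, struct aws_allocator *allocator, size_t new_length) {

    while (aws_array_list_length(&synced->encoded_checksum_list) < new_length) {
        struct aws_byte_buf *checksum_buf = aws_mem_calloc(allocator, 1, sizeof(struct aws_byte_buf));
        aws_array_list_push_back(&synced->encoded_checksum_list, &checksum_buf);
    }
}

/* Retry branch of the prepare step: drop the bytes from the previous attempt.
 * The aws_byte_buf itself stays allocated, so prepare_finish only reads the
 * pointer and never needs to touch encoded_checksum_list at all. */
static void s_reset_checksum_for_retry(struct fake_synced_data *synced, size_t part_index) {
    struct aws_byte_buf **checksum_buf_ptr = NULL;
    aws_mutex_lock(&synced->lock);
    aws_array_list_get_at_ptr(&synced->encoded_checksum_list, (void **)&checksum_buf_ptr, part_index);
    aws_byte_buf_clean_up(*checksum_buf_ptr);
    aws_mutex_unlock(&synced->lock);
}
```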
+1, but I think we should address it in 2 steps: let's fix the issue first with the way it currently works, and then discuss how to refactor it better in a follow-up PR.
yeah, Dengke and I just had a chat and arrived at the same conclusion.
I can revisit this when I let the user pass in the checksum.
Yeah, agreed. Also, as we discussed offline, instead of moving things around I'll just add a lock here. But we are still updating the checksum buffer from the chunk stream without a lock, which we should address later.
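Concretely, that interim fix is roughly the hunk quoted above wrapped in the synced-data lock. A sketch, assuming the list is guarded by the meta request's synced-data lock; the lock/unlock helper names are an assumption, not necessarily what the final commit uses, and the variables come from the surrounding function:

```c
/* Sketch of the interim fix: same logic as the hunk above, but the list is
 * only touched while holding the meta request's synced-data lock.
 * aws_s3_meta_request_lock_synced_data()/..._unlock_synced_data() are
 * assumed helper names. */
aws_s3_meta_request_lock_synced_data(meta_request);

struct aws_byte_buf **checksum_buf_pointer = NULL;
aws_array_list_get_at_ptr(
    &auto_ranged_put->synced_data.encoded_checksum_list,
    (void **)&checksum_buf_pointer,
    request->part_number - 1);

checksum_buf = *checksum_buf_pointer;
if (checksum_buf) {
    /* Retry: release the previous attempt's bytes. */
    aws_byte_buf_clean_up(checksum_buf);
} else {
    /* First attempt: allocate the empty buffer the chunk stream will fill. */
    checksum_buf = aws_mem_calloc(meta_request->allocator, 1, sizeof(struct aws_byte_buf));
    *checksum_buf_pointer = checksum_buf;
}

aws_s3_meta_request_unlock_synced_data(meta_request);
```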
Co-authored-by: Michael Graeb <graebm@amazon.com>
Issue #, if available:

The encoded_checksum_list held struct aws_byte_buf elements by value to store the checksums, and we point to the checksum buffer in the list while we read from the stream and calculate the checksum. However, while we are using that buffer, the other thread (the complete-part callback) can allocate a new part when we finish reading from the other part, which may expand the array_list. The expansion frees the old memory we are still using and leads to a crash.

Description of changes:
Use struct aws_byte_buf * in the list instead. However, the downside is that we need to allocate the buffer every time we create a new one, which adds N allocations and may impact performance (?)
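To make the crash mechanics concrete, here is a minimal, self-contained sketch (my own illustration using aws-c-common directly, not code from this PR) of why a pointer into a by-value aws_array_list of struct aws_byte_buf can dangle after the list grows, while a list of struct aws_byte_buf * keeps each buffer's address stable:

```c
#include <aws/common/allocator.h>
#include <aws/common/array_list.h>
#include <aws/common/byte_buf.h>

int main(void) {
    struct aws_allocator *alloc = aws_default_allocator();

    /* By-value storage: each aws_byte_buf struct lives inside the list's
     * backing memory, so pointers into the list follow that memory around. */
    struct aws_array_list by_value;
    aws_array_list_init_dynamic(&by_value, alloc, 1 /* initial capacity */, sizeof(struct aws_byte_buf));

    struct aws_byte_buf first;
    aws_byte_buf_init(&first, alloc, 64);
    aws_array_list_push_back(&by_value, &first); /* copies the struct into the list */

    struct aws_byte_buf *held = NULL;
    aws_array_list_get_at_ptr(&by_value, (void **)&held, 0); /* pointer into backing memory */

    /* Growing the list can realloc and free the old backing memory. When the
     * complete-part callback does this on another thread while the chunk
     * stream is still writing through `held`, that's the crash. */
    struct aws_byte_buf second;
    aws_byte_buf_init(&second, alloc, 64);
    aws_array_list_push_back(&by_value, &second);
    (void)held; /* may dangle now; must not be dereferenced */

    /* Pointer storage (this PR): the list holds struct aws_byte_buf *, so a
     * resize moves the pointers but never the heap-allocated buffers. */
    struct aws_array_list by_ptr;
    aws_array_list_init_dynamic(&by_ptr, alloc, 1, sizeof(struct aws_byte_buf *));

    struct aws_byte_buf *stable = aws_mem_calloc(alloc, 1, sizeof(struct aws_byte_buf));
    aws_array_list_push_back(&by_ptr, &stable); /* `stable` stays valid across resizes */

    /* Cleanup; the per-checksum aws_mem_calloc/aws_mem_release pair is the
     * extra allocation cost mentioned above. */
    aws_byte_buf_clean_up(&first);
    aws_byte_buf_clean_up(&second);
    aws_byte_buf_clean_up(stable);
    aws_mem_release(alloc, stable);
    aws_array_list_clean_up(&by_value);
    aws_array_list_clean_up(&by_ptr);
    return 0;
}
```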
TODO:

The checksum buffer gets updated from aws_chunk_stream while we read from the input. So, if we use a dynamic array with pre-allocated buffers, there will be a race condition between the expansion of the array (which copies the buffers around) and the buffer being updated from the other thread. To prevent that, have aws_chunk_stream not initialize the buffer and just fill the given one without reallocating, which sounds like a workable solution if the extra allocations added by this PR turn out to be a performance issue.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.