
fix: Fix long parallel writings #222

Open · wants to merge 3 commits into dev from fix-long-parallel-writings

Conversation

@dmga44 (Collaborator) commented Oct 17, 2024

Versions of the test test_ec_parallel_writing with bigger file sizes
(65M) often fail. The reason behind those failures is the following
sequence of operations (a toy model reproducing the race is sketched
after this list):

  • some mountpoint starts writing some chunk X (some kilobyte of that
    chunk); this triggers the creation of a chunk for that chunkIndex and
    successfully finishes the write operation, sending the master an
    update of the file size.
  • another mountpoint (before the first one updates the file size) starts
    writing some chunk Y, such that Y < X, i.e. Y has a lower index than X.
    It successfully finishes those writes and, when updating the file
    size, sends a size lower than the current one by at least the length
    of chunk X.
  • on the master side, the size update (the fsnodes_setlength
    function) removes the chunks which are not necessary given the
    current file size, thus removing chunk X and trashing the previous
    writes on it.
  • new writes on the X-th chunk allocate a new chunk and successfully
    finish the writing process, but the lost data is irrecoverable.
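
A minimal toy model of the race, assuming 64 MiB chunks and reducing the
master's per-file state to a chunk map (none of this is the actual
SaunaFS code):

```cpp
#include <cstdint>
#include <iostream>
#include <map>

constexpr uint64_t kChunkSize = 64 * 1024 * 1024; // assumed 64 MiB chunks

// Reduced model of the master's per-file chunk table.
struct FileNode {
    uint64_t length = 0;
    std::map<uint32_t, bool> chunks; // chunkIndex -> chunk exists
};

// Mimics the old fsnodes_setlength behavior: every size update drops the
// chunks past the last one covered by the new length.
void setlength_old(FileNode &node, uint64_t newLength) {
    node.length = newLength;
    auto firstToErase = (newLength == 0)
        ? node.chunks.begin()
        : node.chunks.upper_bound(
              static_cast<uint32_t>((newLength - 1) / kChunkSize));
    node.chunks.erase(firstToErase, node.chunks.end()); // chunk X dies here
}

int main() {
    FileNode node;
    node.chunks[8] = true;                      // mount A writes chunk X = 8 ...
    setlength_old(node, 8 * kChunkSize + 4096); // ... and reports its size

    node.chunks[2] = true;                      // mount B wrote chunk Y = 2 < X
    setlength_old(node, 2 * kChunkSize + 4096); // stale, smaller size arrives later

    std::cout << "chunk 8 survived: " << node.chunks.count(8) << "\n"; // prints 0
}
```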

The solution is to extend fsnodes_setlength to accept an
eraseFurtherChunks parameter that determines whether or not those chunks
should be erased (see the sketch after this list). Summarizing:

  • the client's write-end communication
    (SAU_CLTOMA_FUSE_WRITE_CHUNK_END header) will no longer remove chunks
    that are unnecessary given the size sent.
  • the client's truncate (SAU_CLTOMA_FUSE_TRUNCATE header) and
    truncate-end (SAU_CLTOMA_FUSE_TRUNCATE_END header) communications
    will continue removing chunks beyond the given length.
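
A sketch of the extended helper, reusing the toy FileNode above. The real
fsnodes_setlength in src/master/filesystem_operations.cc operates on
master metadata nodes and takes more parameters; only the
eraseFurtherChunks split comes from this PR, the rest is assumption:

```cpp
#include <cstdint>
#include <map>

constexpr uint64_t kChunkSize = 64 * 1024 * 1024; // assumed chunk size

// Reduced stand-in for the master's file metadata node.
struct FileNode {
    uint64_t length = 0;
    std::map<uint32_t, bool> chunks; // chunkIndex -> chunk exists
};

void fsnodes_setlength(FileNode &node, uint64_t newLength,
                       bool eraseFurtherChunks) {
    node.length = newLength;
    if (!eraseFurtherChunks) {
        return; // SAU_CLTOMA_FUSE_WRITE_CHUNK_END path: keep later chunks
    }
    // SAU_CLTOMA_FUSE_TRUNCATE / _TRUNCATE_END path: drop chunks past
    // the last one covered by newLength.
    auto firstToErase = (newLength == 0)
        ? node.chunks.begin()
        : node.chunks.upper_bound(
              static_cast<uint32_t>((newLength - 1) / kChunkSize));
    node.chunks.erase(firstToErase, node.chunks.end());
}
```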

As a side effect, the LENGTH metadata logs must be changed, since the
state of the filesystem cannot be recovered without knowing whether
further chunks must be deleted or not (a hedged parsing sketch follows
this list):

  • old metadata will be handled as always removing further chunks.
  • new metadata will carry a 0|1 flag representing whether the further
    chunks must be deleted.
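
For illustration, a minimal parser that tolerates both formats. The field
layout and names are assumptions; only the optional trailing 0|1 flag and
the "old format always erases" default come from this PR:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical shape of a replayed LENGTH changelog entry.
struct LengthEntry {
    uint32_t inode = 0;
    uint64_t length = 0;
    bool eraseFurtherChunks = true;
};

LengthEntry parseLengthEntry(const std::string &fields) {
    std::istringstream in(fields);
    LengthEntry entry;
    char sep = 0;
    in >> entry.inode >> sep >> entry.length;
    int flag = 0;
    if (in >> sep >> flag) {
        entry.eraseFurtherChunks = (flag == 1); // new format: explicit 0|1
    } else {
        entry.eraseFurtherChunks = true;        // old format: always erase
    }
    return entry;
}

// parseLengthEntry("36,1048576")   -> eraseFurtherChunks == true  (old log)
// parseLengthEntry("36,1048576,0") -> eraseFurtherChunks == false (new log)
```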

A test covering this issue was added to the LongSystemTests suite.

BREAKING CHANGE: changes the changelog format generated by the master
server.

An initial tool for downgrading the system from v5.0.0 to v4.X.X was
added along with the previous changes.

@dmga44 self-assigned this Oct 17, 2024
@dmga44 force-pushed the fix-long-parallel-writings branch 2 times, most recently from c26e505 to b56f355 on October 21, 2024 14:13
@dmga44 requested a review from uristdwarf on October 21, 2024 16:16
@dmga44 force-pushed the fix-long-parallel-writings branch 2 times, most recently from 20d9e43 to 5eb486a on October 23, 2024 02:21
rolysr previously approved these changes Oct 24, 2024

@rolysr (Collaborator) left a comment:

LGTM. Just a minor doubt

@dmga44 force-pushed the fix-long-parallel-writings branch from 5eb486a to 9bedd1f on October 24, 2024 23:36
@dmga44 dismissed stale reviews from antuan96314 and rolysr via 112b7b4 on October 28, 2024 13:50
@dmga44 force-pushed the fix-long-parallel-writings branch from 9bedd1f to 112b7b4 on October 28, 2024 13:50
@uristdwarf (Collaborator) commented:

Can you mark the commits as a breaking change, e.g.:

fix!: Fix long parallel writings

And add a BREAKING CHANGE footer at the end.

See here for an example: https://www.conventionalcommits.org/en/v1.0.0/
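
Assembled from this PR's own title and footer, the requested format would
look roughly like (the body placeholder is illustrative):

```
fix!: Fix long parallel writings

<commit body>

BREAKING CHANGE: changes the changelog format generated by the master
server.
```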

ralcolea previously approved these changes Oct 31, 2024

@ralcolea (Contributor) left a comment:

Great job @dmga44 👍 🔥 🚀

Two review threads on src/master/filesystem_operations.cc, both resolved.
@dmga44 force-pushed the fix-long-parallel-writings branch 6 times, most recently from 6ab85a7 to aa619f8 on November 27, 2024 17:08
This test is a version of test_ec_parallel_writing with a file size of
513MB. A test template for both tests was created to simplify them.
This change provides a place to store everything related to migrations,
which is especially important when breaking changes are involved. The
idea is to collect the summary of the breaking changes and the scripts
for downgrading a specific service affected by the upcoming breaking
changes. The structure is also helpful for the next versions' upgrade
scheme.

Changes:
- added a folder at the top level for the migration tools.
- added a folder inside migrations for the upgrade from v4.X.X to
v5.0.0, and another one inside it for downgrade tools.
- added Breaking-changes-summary.md.
- added downgrade-master.sh; the master is the only service that needs
special downgrade behavior for now. (The resulting layout is sketched
below.)
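
Read literally, that list implies a layout along these lines (the exact
folder names beyond those mentioned are assumptions):

```
migrations/
└── v4.X.X-to-v5.0.0/
    ├── Breaking-changes-summary.md
    └── downgrade/
        └── downgrade-master.sh
```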
@dmga44 force-pushed the fix-long-parallel-writings branch from aa619f8 to 3f0d5be on November 28, 2024 13:24