-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write to UFS when failed to write to Alluxio Worker #18017
Conversation
fbe4069
to
6fdf9ee
Compare
if (throwable != null) { | ||
e.addSuppressed(throwable); | ||
} | ||
LOG.error("Failed to write data to alluxio worker. ", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comments here:
//writing to worker over netty failed. But Alluxio client continues writing to UFS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we abort trying to write to worker for the following requests? And delete already written pages from worker cache.
The reasons are:
- The subsequent write request might fail, too.
- The worker will receive some data but there is already a hole (from the previous failed request). I am not sure if the worker/cache manager will handle this. I am afraid not. The cache manager on worker expects contiguous write. The write request does not have a POSITION for current data.
- suppose for a 1MB page, the alluxio client has written 4K data at [0, 4096). And then failure for next [4096, 8192) write. And then a successful write of 4K at [8192, 12288). So, the page file on worker will have a hole, not filled with actually data. Subsequent reading from paging store will get all ZEROES for such hole. This is not correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I submitted an issue #18157 to make this be a future work.
alluxio-bot, merge this please |
addSuppressed
instead of creating a new exception to throw the original exception.