Fix runtime stats when cache hit #18503

Merged
merged 1 commit into from
Feb 23, 2024

Contributor

@beinan beinan commented Jan 31, 2024

What changes are proposed in this pull request?

Fix runtime stats on cache hits.

Why are the changes needed?

The local cache manager already counts runtime stats, but we did not pass it a proper cache context, so the stats were not recorded correctly on a cache hit.
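
A minimal sketch of why the context matters. The names `StatsContext` and `SketchCacheManager` are hypothetical and are not Alluxio's actual classes. The cache manager reports hit bytes to whatever context it is handed, so a caller that passes a no-op context loses its stats on a cache hit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-query cache context; not Alluxio's real API.
class StatsContext {
  // A shared no-op context: counters reported to it are silently dropped.
  static final StatsContext NOOP = new StatsContext() {
    @Override
    void increment(String name, long value) {
      // intentionally empty
    }
  };

  private final Map<String, Long> counters = new HashMap<>();

  void increment(String name, long value) {
    counters.merge(name, value, Long::sum);
  }

  long get(String name) {
    return counters.getOrDefault(name, 0L);
  }
}

// Hypothetical cache manager that reports hit bytes to the caller's context.
class SketchCacheManager {
  private final Map<String, byte[]> pages = new HashMap<>();

  void put(String pageId, byte[] data) {
    pages.put(pageId, data);
  }

  // Returns the number of bytes served from the cache, or 0 on a miss.
  int get(String pageId, StatsContext context) {
    byte[] page = pages.get(pageId);
    if (page == null) {
      return 0;
    }
    // The stats are only visible to the caller if it passed its own context.
    context.increment("bytesReadFromCache", page.length);
    return page.length;
  }
}
```

With a real context the hit is counted; with `StatsContext.NOOP` the same read succeeds but leaves no trace in the stats.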

Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including

  1. change in user-facing APIs
  2. addition or removal of property keys
  3. webui

@beinan beinan force-pushed the fix_runtime_stats_when_cache_hit branch from 6fe43b3 to c69d301 Compare January 31, 2024 05:19
@@ -927,10 +929,20 @@ private int getPage(PageInfo pageInfo, int pageOffset, int bytesToRead,
// data read from page store is inconsistent from the metastore
LOG.error("Failed to read page {}: supposed to read {} bytes, {} bytes actually read",
pageInfo.getPageId(), bytesToRead, ret);
target.offset(originOffset); //reset the offset
//best efforts to delete the corrupted file without acquire the write lock
deletePage(pageInfo, false);

Contributor

Do we need to delete the metadata as well?

Contributor Author

Deleting the metadata here would require acquiring more locks, which might cause a deadlock. So I only delete the page file, and the page store will re-cache this page on the next visit.
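
A rough sketch of that trade-off, under stated assumptions: `SketchPageCache` and its members are hypothetical, in-memory maps stand in for the metastore and the page files on disk, and locking is omitted entirely. It is not Alluxio's `LocalCacheManager`. On a corrupted read only the page file is removed; the stale metadata entry is overwritten when the page is re-cached on the next visit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical page cache for illustration; not Alluxio's LocalCacheManager.
class SketchPageCache {
  // pageId -> expected page size; stands in for the metastore.
  private final Map<String, Integer> metadata = new HashMap<>();
  // pageId -> bytes; stands in for page files on disk.
  private final Map<String, byte[]> pageFiles = new HashMap<>();
  // Stands in for the underlying data source.
  private final Function<String, byte[]> source;
  int sourceReads = 0;

  SketchPageCache(Function<String, byte[]> source) {
    this.source = source;
  }

  byte[] read(String pageId) {
    byte[] cached = pageFiles.get(pageId);
    Integer expected = metadata.get(pageId);
    if (cached != null) {
      if (expected != null && cached.length == expected) {
        return cached; // healthy cache hit
      }
      // Corrupted page: best-effort delete of the page file only.
      // The metadata entry is left untouched, so no extra lock would be needed.
      pageFiles.remove(pageId);
      return fetch(pageId); // serve this read from the source
    }
    // Page file missing (never cached, or deleted above): fetch and re-cache.
    byte[] fresh = fetch(pageId);
    metadata.put(pageId, fresh.length); // overwrites any stale entry
    pageFiles.put(pageId, fresh);
    return fresh;
  }

  private byte[] fetch(String pageId) {
    sourceReads++;
    return source.apply(pageId);
  }

  // Test hook: simulate on-disk corruption by truncating the page file.
  void truncate(String pageId) {
    pageFiles.put(pageId, new byte[0]);
  }

  boolean hasPageFile(String pageId) {
    return pageFiles.containsKey(pageId);
  }
}
```

The cost of skipping the metadata cleanup in this sketch is one extra read from the source: the visit that detects the corruption and the following visit both go to the source before the page is served from cache again.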

Contributor Author

I think it's a good call; we can probably make a further improvement in the near future. But for now, let's merge this PR and stop the bleeding from the data corruption issue.

LOG.error("Data corrupted page {} from pageStore", pageInfo.getPageId(), e);
target.offset(originOffset); //reset the offset
//best efforts to delete the corrupted file without acquire the write lock
deletePage(pageInfo, false);

Contributor

Do we need to delete the metadata as well?

Contributor

@JiamingMai JiamingMai left a comment

LGTM. Just left a comment. If we read a corrupted page, do we need to delete the metadata as well?

@JiamingMai JiamingMai added the type-code-quality code quality improvement label Feb 4, 2024
@gggrace14

Hi @beinan, this fix works in our testing. Would it be possible to get it into version 308?

@beinan beinan force-pushed the fix_runtime_stats_when_cache_hit branch from c69d301 to 637364c Compare February 23, 2024 03:02
@beinan
Contributor Author

beinan commented Feb 23, 2024

alluxio-bot, merge this please.

@alluxio-bot alluxio-bot merged commit 7429c25 into Alluxio:main Feb 23, 2024
14 checks passed
Labels
type-code-quality code quality improvement
4 participants