Fix the error when multiple subtasks fetch the same data #2322

Merged (9 commits) on Aug 13, 2021

hekaisheng
Contributor

What do these changes do?

When multiple subtasks fetch the same key at the same time, the transfer task may fail. In this PR, ReceiverManagerActor holds a dict that records the data keys being fetched and creates events to mark whether their transfer tasks have finished. SenderManagerActor waits on those events to make sure all data keys are sent.
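
The coordination described above can be sketched roughly as follows. This is a minimal illustration using plain `asyncio`, not the code in this PR: `ReceiverManagerSketch`, `filter_keys`, `finish`, and `send_keys` are hypothetical names, and the actor plumbing and the real network transfer are replaced by an in-memory dict.

```python
import asyncio
from typing import Dict, List, Tuple


class ReceiverManagerSketch:
    """Hypothetical stand-in for the receiver side; not the Mars actor API."""

    def __init__(self) -> None:
        # data key -> event that is set once the key has fully arrived
        self._fetching: Dict[str, asyncio.Event] = {}
        self.store: Dict[str, bytes] = {}

    def filter_keys(
        self, keys: List[str]
    ) -> Tuple[List[str], List[asyncio.Event]]:
        """Split keys into those the caller must send and events to wait on."""
        to_send: List[str] = []
        to_wait: List[asyncio.Event] = []
        for key in keys:
            if key in self.store:
                continue  # already received, nothing to do
            event = self._fetching.get(key)
            if event is None:
                # first requester: record the key and own its transfer
                self._fetching[key] = asyncio.Event()
                to_send.append(key)
            else:
                # another transfer is in flight: wait for it instead
                to_wait.append(event)
        return to_send, to_wait

    def finish(self, key: str, data: bytes) -> None:
        """Store the data and wake every sender waiting on this key."""
        self.store[key] = data
        self._fetching.pop(key).set()


async def send_keys(
    receiver: ReceiverManagerSketch, source: Dict[str, bytes], keys: List[str]
) -> List[str]:
    """Hypothetical sender: transfer unclaimed keys, then wait for the rest."""
    to_send, to_wait = receiver.filter_keys(keys)
    for key in to_send:
        await asyncio.sleep(0)  # stands in for the real network transfer
        receiver.finish(key, source[key])
    if to_wait:
        await asyncio.gather(*(event.wait() for event in to_wait))
    return to_send


async def demo():
    receiver = ReceiverManagerSketch()
    source = {"a": b"1", "b": b"2"}
    # two subtasks request the same keys at the same time
    sent = await asyncio.gather(
        send_keys(receiver, source, ["a", "b"]),
        send_keys(receiver, source, ["a", "b"]),
    )
    return sent, receiver.store
```

Running `asyncio.run(demo())` has the first request transfer both keys while the second only waits, so each key is written once. The sketch omits error handling: if a transfer fails, the event would also need to be set (or the entry removed) so that waiters do not block forever.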

Related issue number

Fixes #2321

hekaisheng added the "type: bug", "to be backported", and "mod: storage" labels on Aug 10, 2021
hekaisheng added this to the v0.8.0a2 milestone on Aug 10, 2021
hekaisheng force-pushed the bugfix/transfer branch 2 times, most recently from 8ed7a5f to 0399efc, on August 11, 2021 03:43
hekaisheng marked this pull request as ready for review on August 11, 2021 05:48
hekaisheng requested a review from qinxuye as a code owner on August 11, 2021 05:48
hekaisheng force-pushed the bugfix/transfer branch 2 times, most recently from df8811a to 7539043, on August 11, 2021 10:04
hekaisheng requested a review from wjsi as a code owner on August 11, 2021 14:38
2 review threads on mars/services/storage/transfer.py (outdated, resolved)
hekaisheng force-pushed the bugfix/transfer branch 5 times, most recently from af3c1f4 to 8a52695, on August 12, 2021 17:00

wjsi (Member) left a comment:

LGTM

qinxuye (Collaborator) left a comment:

LGTM

qinxuye merged commit 9054389 into mars-project:master on Aug 13, 2021
hekaisheng added a commit to hekaisheng/mars that referenced this pull request on Aug 16, 2021
hekaisheng deleted the bugfix/transfer branch on August 16, 2021 02:17
qinxuye added the "backported already" label and removed the "to be backported" label on Aug 16, 2021
chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request on Aug 16, 2021
Labels: backported already, mod: storage, type: bug
Development

Successfully merging this pull request may close these issues.

[BUG] Transfer failed when multiple subtasks fetch the same data
3 participants