
Update worker api support for load multi replicas #18296

Merged: alluxio-bot merged 3 commits into Alluxio:main from the load-replis branch on Oct 26, 2023

Conversation

@jja725 (Contributor) commented Oct 18, 2023:

What changes are proposed in this pull request?

Update the worker API to support loading multiple replicas.

Why are the changes needed?

Part of the work to support loading multiple replicas.

Does this PR introduce any user facing changes?

N/A

@alluxio-bot (Contributor):

Automated checks report:

  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • First word must be capitalized
  • Commits associated with Github account: PASS

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@jja725 changed the title from "add support for load multi replicas" to "add worker api support for load multi replicas" on Oct 18, 2023
@jja725 changed the title from "add worker api support for load multi replicas" to "Update worker api support for load multi replicas" on Oct 18, 2023
@alluxio-bot (Contributor):

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

}
long fileLength = block.getUfsStatus().getUfsFileStatus().getContentLength();
if (block.hasMainWorker()) {
WorkerNetAddress address = GrpcUtils.fromProto(block.getMainWorker());
Contributor:

Is the main worker set by the scheduler? If we are submitting load tasks to all replicas at the same time, how do we ensure the main worker has already loaded the data by the time a secondary worker tries to read from it?

@jja725 (Contributor, Author) commented Oct 23, 2023:

Right now we don't guarantee that, due to the concurrency behavior of getAndLoad. So we recommend that users load one replica first and then set multiple replicas, so the data is read from the UFS only once. If we want to improve this further, we can improve getAndLoad in CacheManager.
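For illustration, a minimal sketch of that recommended two-step workflow. The LoadSubmitter interface and method names below are hypothetical stand-ins; only the idea of a per-job replica count (the replicas option added later in this diff) comes from the PR.

public class ReplicaLoadSketch {
  // Hypothetical stand-in for however a load job is submitted to the scheduler;
  // only the notion of a per-job replica count is taken from this PR.
  interface LoadSubmitter {
    void submitLoad(String path, int replicas);
  }

  static void loadWithReplicas(LoadSubmitter submitter, String path, int replicas) {
    // First load a single replica so the data is read from the UFS exactly once...
    submitter.submitLoad(path, 1);
    // ...then raise the replica count; the extra copies can be served
    // worker-to-worker instead of hitting the UFS again.
    submitter.submitLoad(path, replicas);
  }
}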

@lucyge2022 (Contributor) commented Oct 23, 2023:

Understood. Fixing getAndLoad so that only one read happens is out of scope for this PR; this PR only targets avoiding duplicate reads across multiple workers, not duplicate reads within one worker.

@VisibleForTesting
public void loadDataFromRemote(String filePath, long offset, long lengthToLoad,
    PositionReader reader, int chunkSize) throws IOException {
  ByteBuffer buf = ByteBuffer.allocate(chunkSize);
Contributor:

Use PooledDirectNioByteBuf.allocate(chunkSize) as loadData does, since most of the buffers will be aligned to the page size, and pass a NettyBufTargetBuffer into reader.read(long position, ReadTargetBuffer buffer, int length), as this pool can manage reuse of these mostly aligned buffers. Otherwise, allocating on the heap will cause a huge memory footprint and put heavy lifting on the GC.

Contributor:

Oh, you use ByteBuffer because CacheManager only supports ByteBuffer. Then it's better to allocate a direct buffer than a heap buffer; a large buffer footprint on the heap might cause the program to misbehave.

Contributor (Author):

Used a direct buffer.
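For reference, a minimal sketch of the difference, using only the standard java.nio API; the class and method names below are illustrative and not taken from the PR.

import java.nio.ByteBuffer;

public class ChunkBufferSketch {
  // Heap allocation: the whole chunk lives on the Java heap and adds GC pressure.
  static ByteBuffer heapChunk(int chunkSize) {
    return ByteBuffer.allocate(chunkSize);
  }

  // Direct allocation: the chunk lives in native memory, which is what the review
  // above asks for when a pooled Netty buffer is not an option.
  static ByteBuffer directChunk(int chunkSize) {
    return ByteBuffer.allocateDirect(chunkSize);
  }
}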

ByteBuffer buf = ByteBuffer.allocate(chunkSize);
String fileId = new AlluxioURI(filePath).hash();

while (0 < lengthToLoad) {
Contributor:

lengthToLoad > 0 ?

Contributor (Author):

fixed
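For illustration, a minimal sketch of a chunked read loop with the corrected condition. The Reader interface here is a hypothetical stand-in for the PositionReader used in the PR; only the read(position, buffer, length) shape is assumed, and the cache hand-off is left as a comment.

import java.io.IOException;
import java.nio.ByteBuffer;

public class ChunkedLoadSketch {
  // Hypothetical stand-in for the position reader used in loadDataFromRemote.
  interface Reader {
    int read(long position, ByteBuffer buffer, int length) throws IOException;
  }

  static void loadRange(Reader reader, long offset, long lengthToLoad, int chunkSize)
      throws IOException {
    ByteBuffer buf = ByteBuffer.allocateDirect(chunkSize);
    long position = offset;
    long remaining = lengthToLoad;
    // Loop while there is still data to load, i.e. remaining > 0 as suggested above.
    while (remaining > 0) {
      int toRead = (int) Math.min(chunkSize, remaining);
      int read = reader.read(position, buf, toRead);
      if (read < 0) {
        break; // the source ended earlier than the requested range
      }
      // ... hand the filled buffer to the cache here, then clear it for reuse ...
      buf.clear();
      position += read;
      remaining -= read;
    }
  }
}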

@@ -608,6 +608,7 @@ message LoadJobPOptions {
optional bool partialListing = 3;
optional bool loadMetadataOnly = 4;
optional bool skipIfExists = 5;
optional int32 replicas = 6;
Contributor:

I'm assuming this is not used in this PR, right?

Contributor (Author):

It will be used in a following PR.

Comment on lines 622 to 623
try (PositionReader reader = new NettyDataReader(mFsContext, address, builder)) {
loadDataFromRemote(block.getUfsPath(), block.getOffsetInFile(), block.getLength(),
Contributor:

Now this is worker-to-worker communication, but it reuses client-side code. This can cause some metrics to become inaccurate, as the worker being requested cannot know whether the peer is really a client or another worker.

Can we have a different request type than Protocol.ReadRequest, so that the other worker can tell whether this is a normal read request from a true client or a request from a peer worker for caching purposes? This would allow splitting the code paths and reduce intertwined code.

Contributor:

The NettyDataReader should be moved into the common module; it should also be able to handle a generic *Request that involves data transmission.

@jja725 (Contributor, Author) commented Oct 24, 2023:

I would propose adding a user field to the ReadRequest to indicate who is issuing the read, so we can distinguish the reader (client or worker), because a worker is literally just another user sending the read request. I would do the refactoring of NettyDataReader in a later PR to keep the scope of this PR limited.
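As a rough sketch of how a worker-side handler could use such a field: everything below is hypothetical (the view interface, the getUser() accessor, and the "WORKER" value); only the idea of tagging ReadRequest with its issuer comes from the discussion above.

public class ReadOriginSketch {
  // Hypothetical view of a ReadRequest carrying the proposed user field.
  interface ReadRequestView {
    String getUser(); // e.g. "CLIENT" or "WORKER"
  }

  static boolean isPeerWorkerRead(ReadRequestView request) {
    return "WORKER".equals(request.getUser());
  }

  static void recordReadMetrics(ReadRequestView request) {
    if (isPeerWorkerRead(request)) {
      // account this read as worker-to-worker replication traffic
    } else {
      // account this read as a normal client read
    }
  }
}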

@@ -226,6 +226,7 @@ message Block{
optional int64 offset_in_file = 4;
optional int64 mountId = 5;
optional UfsStatus ufs_status = 6;
optional WorkerNetAddress main_worker = 7;
Contributor:

How is the main worker of a block defined? Is it a consistent relationship between a block and a main worker? What happens when the main worker is not available?

If the client wants to express the idea that "for this load job and this particular block, load from this worker," then I don't think we need to involve the concept of a main worker. Instead, you can define the LoadRequest object as

message LoadRequest {
  message BlockLoadRequest {
    optional Block blockToLoad = 1;
    optional WorkerNetAddress workerToLoadFrom = 2;
  }
  repeated BlockLoadRequest blocks = 1;
  // ... other fields
}

Contributor (Author):

Used a new proto message, LoadDataSubTask, to reduce confusion.

@jja725 force-pushed the load-replis branch 2 times, most recently from 7658a84 to a0bfcac, on October 24, 2023 at 21:05
optional Block block = 1;
optional UfsStatus ufs_status = 2;
optional LoadDataSubTask load_data_subtask = 1;
optional UfsStatus load_metadata_subtask = 2;
Contributor:

In that sense, is it better to make a LoadMetadataSubTask in a future refactor? No need to address it in the current PR, though.

Contributor (Author):

Done, to avoid future incompatibility.
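For illustration, a sketch of how a worker could dispatch on the two subtask fields shown in the diff above. The containing message and the accessor names follow standard protobuf-java codegen conventions but are assumptions, not confirmed by this PR.

public class SubTaskDispatchSketch {
  // Hypothetical view of the message holding the load_data_subtask and
  // load_metadata_subtask fields from the diff above.
  interface SubTaskView {
    boolean hasLoadDataSubtask();
    boolean hasLoadMetadataSubtask();
  }

  // A worker handling a load request can branch on which subtask is present.
  static String describe(SubTaskView subTask) {
    if (subTask.hasLoadDataSubtask()) {
      return "load block data for this subtask";
    }
    if (subTask.hasLoadMetadataSubtask()) {
      return "load metadata (UfsStatus) only for this subtask";
    }
    return "empty subtask";
  }
}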

@lucyge2022 (Contributor) left a comment:

LGTM

@dbw9580 (Contributor) left a comment:

LGTM

@jja725 (Contributor, Author) commented Oct 26, 2023:

alluxio-bot, merge this please

@alluxio-bot (Contributor):

merge failed:
Merge refused because pull request does not have label start with type-

@jja725 added the type-feature label (This issue is a feature request) on Oct 26, 2023
@jja725 (Contributor, Author) commented Oct 26, 2023:

alluxio-bot, merge this please

@alluxio-bot merged commit 7a5734f into Alluxio:main on Oct 26, 2023
14 checks passed
ssz1997 pushed a commit to ssz1997/alluxio that referenced this pull request Dec 15, 2023
### What changes are proposed in this pull request?
Update worker api support for load multi replicas

### Why are the changes needed?
part of PR to support load multi replicas

### Does this PR introduce any user facing changes?
na

pr-link: Alluxio#18296
change-id: cid-0213f2aba669b7687ac42cf932cdcec911d397a4