Add RESTful API for distributed load #18254

Merged 1 commit on Oct 17, 2023

JiamingMai
Contributor

@JiamingMai JiamingMai commented Oct 10, 2023

Add RESTful API for distributed load.

Usage:

SUBMIT:
description: submit a load job
example:
http://localhost:28080/v1/load?path=/&opType=submit&partialListing=false&verify=true&bandwidth=1000&loadMetadataOnly=false&verbose=true&skipIfExists=true

STOP:
description: stop the load job
example:
http://localhost:28080/v1/load?path=/&opType=stop

PROGRESS:
description: get the progress of the load job
example:
http://localhost:28080/v1/load?path=/&opType=progress&progressFormat=text&verbose=true
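
For readers who want to assemble these URLs programmatically, here is a minimal sketch. The `/v1/load` endpoint and the parameter names come from the examples above; the class `LoadUrlSketch` and its method are hypothetical helpers, not part of this PR. Values are URL-encoded, so `path=/` becomes `path=%2F`, which a standard query-string decoder reads back as `/`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of this PR) that assembles /v1/load request URLs. */
final class LoadUrlSketch {
  private LoadUrlSketch() {}

  /**
   * Builds a load URL from a base address, the target path, the operation type,
   * and any additional query parameters (for example verify or bandwidth).
   */
  static String buildLoadUrl(String base, String path, String opType,
      Map<String, String> extraParams) {
    // Keep insertion order so the resulting URL is predictable.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("path", path);
    params.put("opType", opType);
    params.putAll(extraParams);

    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> e : params.entrySet()) {
      query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
    }
    return base + "/v1/load?" + query;
  }

  /** URL-encodes a single query key or value as UTF-8. */
  private static String encode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }
}
```

Calling `LoadUrlSketch.buildLoadUrl("http://localhost:28080", "/", "stop", Map.of())` would produce the STOP request shown above, with the path encoded.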

@JiamingMai JiamingMai requested a review from elega October 10, 2023 06:36
@JiamingMai JiamingMai self-assigned this Oct 10, 2023
@JiamingMai JiamingMai added the type-feature This issue is a feature request label Oct 10, 2023
@JiamingMai JiamingMai changed the title Add RESTful API for distributed load [WIP] Add RESTful API for distributed load Oct 10, 2023
@JiamingMai JiamingMai force-pushed the add-load-restful-api branch 3 times, most recently from fce7f1d to d682742 Compare October 10, 2023 14:35
@JiamingMai JiamingMai changed the title [WIP] Add RESTful API for distributed load Add RESTful API for distributed load Oct 12, 2023
@JiamingMai
Contributor Author

JiamingMai commented Oct 12, 2023

The following screenshots show an example of the load RESTful API in use:

[image] [image]

@JiamingMai JiamingMai force-pushed the add-load-restful-api branch 2 times, most recently from 2d3eb9b to 426c1cd Compare October 12, 2023 07:33
/**
* This base under file system status iterator is for listing files iteratively.
*/
public class BaseUfsStatusIterator implements Iterator<UfsStatus> {
Contributor

Is this used for listing the statuses of files only? If so, consider renaming it to UfsFileStatusIterator for clarity.

Contributor

Also, the iterator's type could be Iterator<UfsFileStatus>.

Contributor Author

I renamed the class to UfsFileStatusIterator. However, changing the type to Iterator<UfsFileStatus> may break too much existing code, so let's keep Iterator<UfsStatus> in this PR.


private final String mPath;

private final ListOptions mOptions;
Contributor

mOptions is unused? I think you need to at least check the recursive flag.

Contributor Author

I think we can remove this mOptions.


private void initQueue(String path) {
try {
UfsStatus[] statuses = mUfs.listStatus(path);
Contributor

Suggested change
- UfsStatus[] statuses = mUfs.listStatus(path);
+ UfsStatus[] statuses = mUfs.listStatus(path, mOptions);

Contributor Author

@JiamingMai JiamingMai Oct 12, 2023

We can't do this because we don't want listStatus to run recursively. Otherwise, the partial listing feature would not be implemented.

return null;
}
return Iterators.forArray(result);
return new BaseUfsStatusIterator(this, path, options);
Contributor

The contract of listStatusIterable requires:

   * @return An iterator of ufs status. Returns
   *  {@code null} if this abstract pathname does not denote a directory.

So you need to check if path is a file, and return null in this case.

Contributor Author

Updated.
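
The contract quoted above (return `null` when the path is not a directory) can be illustrated with a small sketch. This is an assumption-laden illustration, not the code in this PR: it uses `java.nio.file` on the local disk rather than Alluxio's under file system classes, and `ListIterableContractSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustration only: local-disk stand-in for a "list as iterator" method. */
final class ListIterableContractSketch {
  private ListIterableContractSketch() {}

  /**
   * Returns an iterator over the direct children of {@code path},
   * or {@code null} if {@code path} does not denote a directory.
   */
  static Iterator<Path> listIterable(Path path) {
    // Files and non-existent paths are both "not a directory" and yield null.
    if (!Files.isDirectory(path)) {
      return null;
    }
    // Files.list returns a stream that must be closed, so snapshot the children first.
    try (Stream<Path> children = Files.list(path)) {
      List<Path> snapshot = children.sorted().collect(Collectors.toList());
      return snapshot.iterator();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Callers then need a null check before iterating, which is the cost of this particular contract.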

Comment on lines 135 to 142
private HttpLoadOptions() {
}

private void setOpType(OpType opType) {
mOpType = opType;
}
Contributor

Instead of having private setters, you can pass all the options as arguments to the constructor.

Contributor Author

We need a builder pattern, so it is better to set the fields through setters rather than the constructor, since this lets the user set different fields independently. Otherwise, the user has to call a constructor that sets all the fields at once, which breaks the builder pattern.

Contributor

You already have setters on HttpLoadOptions.Builder, so you can remove the setters on the HttpLoadOptions class.
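
A minimal sketch of the shape being suggested: setters live only on the builder, and the options class itself is immutable with a private constructor. `LoadOptionsSketch` is a hypothetical, cut-down stand-in with three fields, not the `HttpLoadOptions` class from this PR.

```java
/** Sketch only: a cut-down stand-in for an options class, not the code in this PR. */
final class LoadOptionsSketch {
  enum OpType { SUBMIT, STOP, PROGRESS }

  private final OpType mOpType;
  private final boolean mVerify;
  private final boolean mVerbose;

  // Only the builder can construct instances; the options class has no setters.
  private LoadOptionsSketch(Builder builder) {
    mOpType = builder.mOpType;
    mVerify = builder.mVerify;
    mVerbose = builder.mVerbose;
  }

  OpType getOpType() { return mOpType; }
  boolean isVerify() { return mVerify; }
  boolean isVerbose() { return mVerbose; }

  /** Collects field values independently, then produces an immutable options object. */
  static final class Builder {
    private OpType mOpType;
    private boolean mVerify;
    private boolean mVerbose;

    Builder setOpType(OpType opType) { mOpType = opType; return this; }
    Builder setVerify(boolean verify) { mVerify = verify; return this; }
    Builder setVerbose(boolean verbose) { mVerbose = verbose; return this; }

    LoadOptionsSketch build() {
      // Required fields are checked once, at build time.
      if (mOpType == null) {
        throw new IllegalStateException("opType must be set");
      }
      return new LoadOptionsSketch(this);
    }
  }
}
```

Users still set fields one at a time, for example `new LoadOptionsSketch.Builder().setOpType(LoadOptionsSketch.OpType.SUBMIT).setVerify(true).build()`, but the built object can no longer be mutated.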


private static final JobProgressReportFormat DEFAULT_FORMAT = JobProgressReportFormat.TEXT;

private static final String JOB_TYPE = "load";
Contributor

stringly typed? is there an enum for this?

Contributor Author

Updated. I used an enum instead.

Comment on lines 119 to 135
private OpType mOpType;

private boolean mPartialListing;

private boolean mVerify;

private long mBandWidth;

private String mProgressFormat;

private boolean mVerbose;

private boolean mLoadMetadataOnly;

private boolean mSkipIfExists;
Contributor

I don't think null is a sensible default value for mOpType, or 0 for mBandWidth.
Make them Optional<> and default to empty, since a user may not set these fields explicitly via the HTTP URL params. Let the callers decide what the sensible default values are.

Contributor Author

@JiamingMai JiamingMai Oct 12, 2023

Updated. I used OptionalLong for mBandWidth. As for opType, the user must specify the exact type; we check the parameter and throw an exception if the value from the HTTP request is not valid. So I think opType is OK without Optional.
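
The split described here (an optional bandwidth, a required and validated opType) might look roughly like the following sketch. `LoadParamsSketch`, its method names, and the case-insensitive matching of `opType` are assumptions for illustration; the parameter names `opType` and `bandwidth` come from the usage examples in the PR description.

```java
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical query-parameter parsing, not the code merged in this PR. */
final class LoadParamsSketch {
  enum OpType { SUBMIT, STOP, PROGRESS }

  private LoadParamsSketch() {}

  /** opType is required: a missing or unknown value is rejected with an exception. */
  static OpType parseOpType(Map<String, String> params) {
    String raw = params.get("opType");
    if (raw == null) {
      throw new IllegalArgumentException("Missing required parameter: opType");
    }
    try {
      // Assumption: values are matched case-insensitively ("submit" -> SUBMIT).
      return OpType.valueOf(raw.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unknown opType: " + raw, e);
    }
  }

  /** bandwidth is optional: absent means empty, so the caller picks the default. */
  static OptionalLong parseBandwidth(Map<String, String> params) {
    String raw = params.get("bandwidth");
    return raw == null ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(raw));
  }
}
```

With this shape, an unset bandwidth is distinguishable from an explicit `bandwidth=0`, which is the concern raised in the review comment.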

@JiamingMai JiamingMai force-pushed the add-load-restful-api branch 6 times, most recently from fa42afc to 38b31c9 Compare October 12, 2023 09:41
@JiamingMai JiamingMai force-pushed the add-load-restful-api branch 2 times, most recently from 6e53d4c to 4ea7946 Compare October 16, 2023 02:57
Contributor

@dbw9580 dbw9580 left a comment

LGTM

@JiamingMai
Contributor Author

alluxio-bot, merge this please

@alluxio-bot alluxio-bot merged commit 7c3b462 into Alluxio:main Oct 17, 2023
14 checks passed
@jja725
Contributor

jja725 commented Oct 17, 2023

The UfsFileStatusIterator implementation is incorrect given the following structure.

    AlluxioURI uriA = new AlluxioURI("/testRoot/testFileA");
    AlluxioURI uriB = new AlluxioURI("/testRoot/testFileB");
    AlluxioURI uriC = new AlluxioURI("/testRoot/testDirectory/testFileC");

This would give us /testRoot/testFileA, /testRoot/testFileA & /testRoot/testFileC
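
For comparison, here is a minimal sketch of a lazy, directory-at-a-time file iterator that handles the nested structure above. It is not the fix that went into Alluxio: `LazyFileIteratorSketch`, its `Entry` type, and the listing function are all hypothetical, and the listing function is assumed to return each child's full path. It lists one directory per step (non-recursively), queues subdirectories for later, and emits files with their full paths.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Sketch only: iterates over files under a root, listing one directory at a time. */
final class LazyFileIteratorSketch implements Iterator<String> {
  /** Hypothetical listing entry: a child's full path and whether it is a directory. */
  static final class Entry {
    final String mPath;
    final boolean mIsDirectory;

    Entry(String path, boolean isDirectory) {
      mPath = path;
      mIsDirectory = isDirectory;
    }
  }

  private final Function<String, List<Entry>> mLister;
  private final Deque<String> mPendingDirs = new ArrayDeque<>();
  private final Deque<String> mReadyFiles = new ArrayDeque<>();

  LazyFileIteratorSketch(String root, Function<String, List<Entry>> lister) {
    mLister = lister;
    mPendingDirs.add(root);
  }

  @Override
  public boolean hasNext() {
    // List the next pending directory only when no file is buffered.
    while (mReadyFiles.isEmpty() && !mPendingDirs.isEmpty()) {
      for (Entry child : mLister.apply(mPendingDirs.poll())) {
        if (child.mIsDirectory) {
          mPendingDirs.add(child.mPath);
        } else {
          mReadyFiles.add(child.mPath);
        }
      }
    }
    return !mReadyFiles.isEmpty();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return mReadyFiles.poll();
  }
}
```

Fed a listing function that mirrors the three URIs above, this sketch yields /testRoot/testFileA, /testRoot/testFileB, and /testRoot/testDirectory/testFileC, which is presumably the intended result.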

@JiamingMai JiamingMai deleted the add-load-restful-api branch November 20, 2023 10:34
ssz1997 pushed a commit to ssz1997/alluxio that referenced this pull request Dec 15, 2023
			pr-link: Alluxio#18254
			change-id: cid-8a2e4f6747d7ba6cb8c25032cc13ce4ec719da8f