Implement HDFS list status iterator #18295
```java
/**
 * HDFS under file system status iterator.
 */
public class HdfsUfsStatusIterator implements Iterator<UfsStatus> {
```
What's the HDFS version compatibility of this?
Can you try building your code with HDFS 2 to see if it builds?
It's a new API in Hadoop 2.7:
https://hadoop.apache.org/docs/r2.7.2/api/org/apache/hadoop/fs/FileSystem.html
I can build the code successfully with HDFS 2.7.2.
I have tested connecting to an HDFS 2.7.2 node with this PR's code, and it works.
```java
} else {
  ufsStatus = new UfsFileStatus(path.getName(), alluxioUri.hash(), fileStatus.getLen(),
      fileStatus.getModificationTime(), fileStatus.getOwner(), fileStatus.getGroup(),
      fileStatus.getPermission().toShort(), mUserBlockSizeBytesDefault);
```
Shouldn't the block size be the HDFS block size instead of the Alluxio one?
+1
Updated. I used the HDFS block size instead.
```java
        fileStatus.getModificationTime());
    mDirPathsToProcess.addLast(new Pair<>(path.toString(), ufsStatus));
  } else {
    ufsStatus = new UfsFileStatus(path.getName(), alluxioUri.hash(), fileStatus.getLen(),
```
Use `approximateContentHash` to keep this consistent.
Updated. I used `approximateContentHash` instead.
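A minimal sketch of the idea behind this suggestion, assuming (as the name suggests) that an approximate content hash is a cheap fingerprint built from metadata every listing already returns, such as file length and modification time, rather than from the file's bytes. `ContentHashSketch` and its string format are hypothetical; Alluxio's own helper may differ in signature and format.

```java
public class ContentHashSketch {
  /**
   * Hypothetical stand-in for an "approximate" content hash: a fingerprint built
   * only from listing metadata (length and modification time), so no file bytes
   * are read. The string format is an assumption made for this sketch.
   */
  static String approximateContentHash(long length, long modificationTimeMs) {
    return "(len:" + length + ", modtime:" + modificationTimeMs + ")";
  }

  public static void main(String[] args) {
    // Listing the same unchanged file twice yields the same fingerprint...
    String first = approximateContentHash(1024L, 1700000000000L);
    String again = approximateContentHash(1024L, 1700000000000L);
    // ...while a size change (for example, an append) yields a different one.
    String appended = approximateContentHash(2048L, 1700000000000L);
    System.out.println(first.equals(again));    // true
    System.out.println(first.equals(appended)); // false
  }
}
```

Computing the hash the same way in every code path that builds a file status means an unchanged file fingerprints identically regardless of whether it was listed through the iterator or through the existing call.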
```java
} else {
  ufsStatus = new UfsFileStatus(path.getName(), alluxioUri.hash(), fileStatus.getLen(),
      fileStatus.getModificationTime(), fileStatus.getOwner(), fileStatus.getGroup(),
      fileStatus.getPermission().toShort(), mUserBlockSizeBytesDefault);
```
+1
alluxio-bot, merge this please
When there are many files in HDFS, a `listStatus` request takes a long time and a large amount of memory to complete, and it sometimes causes an out-of-memory error (OOM). This PR provides an iterator for the HDFS under file system to list files.

pr-link: #18295
change-id: cid-11019e8f163210c7664f3f2b6ddf3bae27e8ee8c
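The general technique can be illustrated with a minimal, self-contained sketch: instead of materializing every status of a recursive listing up front, an iterator keeps a cursor over one directory at a time plus a FIFO queue of directories still to visit (the hunk's `mDirPathsToProcess.addLast(...)` hints at a similar queue, though the PR's actual structure may differ). `RemoteIter` only mirrors the shape of Hadoop's `RemoteIterator`, whose `hasNext()`/`next()` may throw `IOException`; `DirLister`, `Status`, `LazyStatusIterator`, and the in-memory tree in `demo()` are hypothetical stand-ins so the code runs without Hadoop or Alluxio.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class LazyListingSketch {
  /** Same shape as Hadoop's RemoteIterator: hasNext()/next() may throw IOException. */
  interface RemoteIter<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /** Hypothetical: lists the direct children of ONE directory. */
  interface DirLister {
    RemoteIter<Status> list(String dir) throws IOException;
  }

  /** Hypothetical minimal status: full path plus a directory flag. */
  static final class Status {
    final String path;
    final boolean isDirectory;
    Status(String path, boolean isDirectory) {
      this.path = path;
      this.isDirectory = isDirectory;
    }
  }

  /** Recursive listing that never materializes the whole tree at once. */
  static final class LazyStatusIterator implements Iterator<Status> {
    private final DirLister mLister;
    // Directories seen but not yet opened; FIFO order gives a breadth-first walk.
    private final Deque<String> mDirsToProcess = new ArrayDeque<>();
    // Cursor over the directory currently being read (null until first use).
    private RemoteIter<Status> mCurrent;

    LazyStatusIterator(DirLister lister, String root) {
      mLister = lister;
      mDirsToProcess.addLast(root);
    }

    @Override
    public boolean hasNext() {
      try {
        // Open pending directories until one has an entry or none are left.
        while (mCurrent == null || !mCurrent.hasNext()) {
          if (mDirsToProcess.isEmpty()) {
            return false;
          }
          mCurrent = mLister.list(mDirsToProcess.pollFirst());
        }
        return true;
      } catch (IOException ex) {
        throw new UncheckedIOException(ex); // Iterator cannot throw checked exceptions
      }
    }

    @Override
    public Status next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      try {
        Status status = mCurrent.next();
        if (status.isDirectory) {
          mDirsToProcess.addLast(status.path); // visit its children later, not now
        }
        return status;
      } catch (IOException ex) {
        throw new UncheckedIOException(ex);
      }
    }
  }

  /** Demo: an in-memory tree (dir -> children) exposed through DirLister. */
  static List<String> demo() {
    Map<String, List<Status>> tree = new HashMap<>();
    tree.put("/root", Arrays.asList(new Status("/root/a", false),
        new Status("/root/d1", true), new Status("/root/b", false)));
    tree.put("/root/d1", Arrays.asList(new Status("/root/d1/c", false),
        new Status("/root/d1/d2", true)));
    tree.put("/root/d1/d2", Arrays.asList(new Status("/root/d1/d2/e", false)));
    DirLister lister = dir -> {
      Iterator<Status> it = tree.getOrDefault(dir, new ArrayList<>()).iterator();
      return new RemoteIter<Status>() {
        @Override public boolean hasNext() { return it.hasNext(); }
        @Override public Status next() { return it.next(); }
      };
    };
    List<String> paths = new ArrayList<>();
    for (Iterator<Status> iter = new LazyStatusIterator(lister, "/root"); iter.hasNext(); ) {
      paths.add(iter.next().path);
    }
    return paths;
  }
}
```

`demo()` returns the paths in breadth-first order: `/root/a`, `/root/d1`, `/root/b`, `/root/d1/c`, `/root/d1/d2`, `/root/d1/d2/e`. Only directory paths wait in the queue; file statuses are handed to the caller one at a time. A single very large directory still depends on the underlying lister fetching its entries in batches rather than all at once.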