(WIP) Updated all calls to get file system through FSUtils.getFs() #191

Closed
wants to merge 1 commit into from

Conversation

gekath
Contributor

@gekath gekath commented Jun 6, 2017

Updates all consumers of FSUtils.getFs() to pass a path. The scheme is now inferred from the given path and/or the configuration, instead of from fs.defaultFS. All calls that obtain a filesystem go through FSUtils.getFs(), with the aim of supporting multiple filesystems.
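
As a rough illustration of inferring the scheme from a path rather than from fs.defaultFS, here is a minimal sketch that uses only `java.net.URI`. The class `SchemeInferenceSketch`, the helper `inferScheme`, and the example paths are hypothetical and are not taken from this PR. In Hadoop itself, `new Path(path).getFileSystem(conf)` resolves a filesystem from the path's URI in a similar spirit.

```java
import java.net.URI;

public class SchemeInferenceSketch {

    // Hypothetical helper (not part of Hoodie or Hadoop): return the scheme
    // carried by the path, falling back to the scheme of the configured
    // default filesystem when the path has none.
    public static String inferScheme(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        if (scheme != null) {
            return scheme;
        }
        return URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        // Assumed value standing in for fs.defaultFS.
        String defaultFs = "hdfs://namenode:8020";

        // Fully qualified paths keep their own scheme.
        System.out.println(inferScheme("file:///tmp/hoodie/input", defaultFs));
        System.out.println(inferScheme("gs://bucket/hoodie/output", defaultFs));

        // A scheme-less path falls back to the default filesystem's scheme.
        System.out.println(inferScheme("/tmp/hoodie/sample-table", defaultFs));
    }
}
```

With this shape, a job can read from one base path and write to another without either call depending on a single global default. Note that `URI.create` rejects strings containing characters such as spaces, so real paths would need escaping first.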

/fix #96
/cc @vinothchandar @prazanna @zqureshi

@gekath gekath force-pushed the support-mult-fs branch 2 times, most recently from 1bcce25 to ef5a407 Compare June 6, 2017 20:53
private static DistributedFileSystem dfs;
private static Logger logger = LogManager.getLogger(TestMultipleFSExample.class);

@Parameter(names={"--table-path", "-p"}, description = "path for Hoodie sample table")
Member

This is just a test, not a CLI, right? So why the parameters?

Member

@vinothchandar vinothchandar left a comment

@gekath Have you been able to test this out in production? This is a rather big change, so I would like to understand how much testing has been done.

@gekath gekath force-pushed the support-mult-fs branch 2 times, most recently from 63b301d to d6af8c9 Compare June 19, 2017 20:36
@gekath gekath changed the title Updated all calls to get file system through FSUtils.getFs() (WIP) Updated all calls to get file system through FSUtils.getFs() Jun 19, 2017
@gekath
Contributor Author

gekath commented Jun 19, 2017

@vinothchandar this PR focused on testing that reading from the local file system and writing to HDFS results in the correct record count; the behavior should be similar between any two file systems. Our specific goal is the use case of reading from HDFS and writing to GCS. I will look at adding more tests (changed to WIP).

Key updates are in:

  1. Every call to FSUtils.getFs() now requires a basePath, from which the URI is inferred; this PR updates every consumer of getFs().
  2. In HoodieParquetWriter, registerFileSystem() now takes a path so that getFs() has a path from which to infer the URI.

@vinothchandar
Member

Sounds good. Will take a closer look sometime this week.
What I meant was, have you been able to test this out on any real production datasets yet?

}

public static FileSystem getFs(String path, Configuration conf) {
System.out.println(path);
Member

Please use a logger, or remove the System.out.println.
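
For reference, the kind of change requested here could look like the sketch below. It swaps in `java.util.logging` so that it compiles without extra dependencies; the test class quoted earlier in this PR obtains its logger through `LogManager.getLogger(...)` instead. The class `GetFsLoggingSketch` and the method `resolve` are hypothetical stand-ins for the real getFs(String path, Configuration conf).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GetFsLoggingSketch {

    private static final Logger LOGGER =
        Logger.getLogger(GetFsLoggingSketch.class.getName());

    // Hypothetical stand-in for getFs(String path, Configuration conf).
    // It only shows where the log call goes and returns the path unchanged.
    public static String resolve(String path) {
        // Logged at FINE, which java.util.logging hides by default, so
        // normal runs do not print every path the way System.out.println did.
        LOGGER.log(Level.FINE, "Resolving file system for path {0}", path);
        return path;
    }

    public static void main(String[] args) {
        resolve("hdfs://namenode:8020/tmp/hoodie/sample-table");
    }
}
```

Logging at a debug-style level keeps the path available for troubleshooting while leaving standard output clean for callers.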

Member

@vinothchandar vinothchandar left a comment

Changes look safe to me. I need to look at the HoodieWrapperFileSystem changes more closely.
Please ping back once you have tested this more for your use cases. I'll pull and test on HDFS at Uber as well, and we can proceed from there.

@vinothchandar
Member

@gekath any update on this?

@gekath
Contributor Author

gekath commented Jul 5, 2017

Hi @vinothchandar, we're working on integrating Hoodie into our current pipeline and will then proceed to test performance.

@vinothchandar
Member

Sounds good. Please keep us posted on progress/blockers.

@prazanna
Contributor

prazanna commented Aug 2, 2017

@gekath - Do you have any updates on this PR? Thanks.

@zqureshi
Contributor

zqureshi commented Aug 3, 2017

@prazanna not much progress on this front yet.

@vinothchandar
Member

From what @gekath and I discussed offline, it seems this is being actively tested :)

@gekath gekath changed the title (WIP) Updated all calls to get file system through FSUtils.getFs() Updated all calls to get file system through FSUtils.getFs() Aug 24, 2017
@gekath gekath force-pushed the support-mult-fs branch 2 times, most recently from 82e2216 to 3949388 Compare August 24, 2017 17:16
Merge conflicts

Merge conflicts

Removed print statement

Merge conflicts

Merge conflicts

Merge conflicts

Merge conflict

Removed default getFs that takes no arguments.
@vinothchandar
Member

@alunarbeach do you want to drive this?

@vinothchandar vinothchandar self-assigned this Dec 4, 2017
@vinothchandar vinothchandar changed the title Updated all calls to get file system through FSUtils.getFs() (WIP) Updated all calls to get file system through FSUtils.getFs() Dec 11, 2017
@vinothchandar
Member

Closing in favor of #293

vinishjail97 pushed a commit to vinishjail97/hudi that referenced this pull request Dec 15, 2023
Co-authored-by: StreamingFlames <18889897088@163.com>
Co-authored-by: Nicholas Jiang <programgeek@163.com>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>
Co-authored-by: Alexey Kudinkin <alexey@infinilake.com>
Co-authored-by: RexAn <bonean131@gmail.com>
Development

Successfully merging this pull request may close these issues.

Support multiple filesystems
4 participants