Stocator should create folder names with a trailing '/' in IBM COS #210
Comments
@mariobriggs I will handle this. Thanks.
This issue is also breaking our Apache Spark reads of part files. When Apache Spark writes the part files, it creates a 0-byte directory file with no trailing slash. When we add the trailing slash to that directory file, the reads work again.
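As a rough sketch of the workaround described above, the Python function below plans the fix for a bucket listing. It looks for zero-byte objects whose key has no trailing '/' but which have other objects stored under `key + '/'`, and proposes the trailing-slash key each one could be copied to. It assumes the listing is a list of dicts with `Key` and `Size` fields (the shape of the `Contents` entries in an S3-compatible `list_objects_v2` response); the name `plan_marker_renames` and the sample listing are hypothetical, and the actual copy against IBM COS is left out.

```python
from typing import Dict, Iterable, List, Tuple


def plan_marker_renames(objects: Iterable[Dict]) -> List[Tuple[str, str]]:
    """Return (old_key, new_key) pairs for folder markers lacking a trailing '/'.

    Assumption: a marker is a zero-byte object whose key has no trailing '/'
    and which has at least one other object stored under key + '/'.
    """
    objs = list(objects)
    keys = {o["Key"] for o in objs}
    plan = []
    for o in objs:
        key = o["Key"]
        if o["Size"] != 0 or key.endswith("/"):
            continue  # not an empty marker, or already has a trailing slash
        prefix = key + "/"
        # Only treat the object as a folder marker if something lives "inside" it.
        has_children = any(k.startswith(prefix) for k in keys)
        # Skip markers that already have a trailing-slash twin.
        if has_children and prefix not in keys:
            plan.append((key, prefix))
    return sorted(plan)


# Hypothetical listing left behind by a Spark write.
listing = [
    {"Key": "call_center", "Size": 0},
    {"Key": "call_center/_SUCCESS", "Size": 0},
    {"Key": "call_center/part-00000.parquet", "Size": 1024},
    {"Key": "notes.txt", "Size": 0},
]
print(plan_marker_renames(listing))
# [('call_center', 'call_center/')]
```

Each pair could then be applied with an S3-compatible client's copy call (for example boto3's `copy_object`). The sketch copies rather than deletes because, per the maintainer later in this thread, the original empty object carries Stocator-specific metadata. The prefix scan is quadratic, which is fine for a one-off script but not for very large buckets.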
@gilv, how is progress on a fix coming along?
I am also seeing this as a problem in our project. Thanks @gilv for looking into it.
Has this issue been fixed after 16 months? I am still seeing an empty file being created.
@robin-sun, why is an empty file a problem? If you write a "foo" file with Stocator via Spark, it will be
You can then use Spark to read "foo" again, and everything works. If you list the object storage via the CLI, you will see an empty object "foo" and "foo/_SUCCESS". Why is this a problem?
Hi Gil, but I guess the real question is: why do we need an empty file if it is not used or useful at all?
I think the real problem is this: if you write to COS using Stocator, then all of your reader clients are forced to use Stocator as well. The readers are not under your control, and that is problematic.
Thanks, Mario
On Tue, Dec 15, 2020 4:00 PM, Robin Sun wrote:
Hi Gil, this is causing errors when downloading the whole parent folder to a Windows OS, as Windows doesn't support a file and a folder with the same name. I have to download the output folders one by one.
But I guess the real question is: why do we need an empty file if it is not used or useful at all?
Hi Mario/Gil, could you help me understand why we need an empty file there?
@mariobriggs @robin-sun Using an empty object to simulate a folder in object storage was not invented by Stocator; other Big Data systems use it too. It is the easiest way for the Hadoop ecosystem to mark a "folder". So yes, this approach does have compatibility issues with Windows. We need the empty object because it carries Stocator-specific metadata. If you just need to download all the data created by Stocator to Windows, write a small script that ignores the empty objects.
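A minimal sketch of such a script, assuming the bucket listing is available as dicts with `Key` and `Size` fields (the shape of the `Contents` entries in an S3-compatible `list_objects_v2` response). The hypothetical `download_plan` function maps each key to a Windows path under a local root and skips only the zero-byte objects that have other objects stored beneath them, since those are the names that collide with a directory on Windows. This is narrower than skipping every empty object, so zero-byte files such as `_SUCCESS` are still downloaded. The download calls themselves are omitted.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Dict, Iterable, List, Tuple


def download_plan(objects: Iterable[Dict], local_root: str) -> List[Tuple[str, str]]:
    """Map object keys to Windows paths, skipping zero-byte "folder" objects."""
    objs = list(objects)
    keys = [o["Key"] for o in objs]
    plan = []
    for o in objs:
        key = o["Key"]
        if key.endswith("/"):
            # Trailing-slash marker: the directory is created implicitly by its children.
            continue
        has_children = any(k.startswith(key + "/") for k in keys)
        if o["Size"] == 0 and has_children:
            # Empty marker such as "call_center": on Windows a file with this
            # name would clash with the "call_center" directory, so skip it.
            continue
        local_path = PureWindowsPath(local_root, *PurePosixPath(key).parts)
        plan.append((key, str(local_path)))
    return plan


# Hypothetical listing of what a Spark write through Stocator might leave behind.
listing = [
    {"Key": "call_center", "Size": 0},
    {"Key": "call_center/_SUCCESS", "Size": 0},
    {"Key": "call_center/part-00000.parquet", "Size": 1024},
]
for key, path in download_plan(listing, "C:\\data"):
    print(key, "->", path)
```

Each (key, path) pair could then be handed to whatever download call your client provides. Keys containing characters that Windows forbids in file names are not handled here.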
I am using Stocator via Spark to write a DataFrame to IBM COS.
In the above call, Stocator creates the folder 'call_center' in IBM COS. However, Stocator does not create the folder name with a trailing '/', and as a result, reading these IBM COS folders breaks when using other tools such as Alluxio and CyberDuck.
Below is an example of the CyberDuck UI. Notice that the folder 'call_center' is also listed as a 0-byte file.
Browsing through the Stocator code, I see that the code to create the folder name with a trailing '/' is commented out, and using a build with that code uncommented solved the issue.
Looking forward to a fix.