How to upload the parquet file to s3? #130

Open
aijazkhan81 opened this issue Aug 6, 2021 · 3 comments

Comments

@aijazkhan81

var writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet');
await writer.appendRow({name: 'apples', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.appendRow({name: 'oranges', quantity: 10, price: 2.5, date: new Date(), in_stock: true});
await writer.close();

I have done this part, and the file is saved locally. How can I get the file's contents into a variable? If I could do that, uploading the file would be easier.
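
One way to get the written file "into a variable" is to read it back once `writer.close()` has resolved. This is a minimal sketch, assuming the AWS SDK v2 `S3` client (whose `upload(params).promise()` accepts a Buffer as `Body`). `uploadParquetFile` and its parameter names are hypothetical and only for illustration:

```javascript
const fs = require('fs');

// Hypothetical helper: reads a finished parquet file into a Buffer and uploads it.
// `s3` is assumed to be an AWS SDK v2 S3 client, e.g. new AWS.S3().
async function uploadParquetFile(s3, filePath, bucket, key) {
  // Call this only after `await writer.close()`; before that the file is incomplete.
  const body = await fs.promises.readFile(filePath); // whole file as a Buffer
  return s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();
}
```

It could be called as `await uploadParquetFile(s3, 'fruits.parquet', 'bucketName', 'path/of/the/file')`. For large files, passing `fs.createReadStream(filePath)` as `Body` instead of a Buffer avoids holding the whole file in memory.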

@sambonbonne

In case it helps, I managed to do this using streams only, but I don't know whether appendRow is compatible with stream mode:

  1. I receive my stream from a request; here I will call it sourceStream (you will probably need to create your own Readable stream)
  2. I create a ParquetTransformer, here called parquetStream, and pipe sourceStream into it
  3. I create an AWS S3 upload request with the resulting stream as Body, using the official AWS SDK
// assuming sourceStream and parquetStream already exist
s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: sourceStream.pipe(parquetStream)
}); // I call .promise() on the result, but that is specific to my use case

Using streams has the advantage of saving RAM, because the whole file never has to be held in memory at once.
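
If there is no incoming request to take the source stream from, Node's built-in `stream.Readable.from` can build one from any iterable. The sketch below uses only Node's standard library. `generateRows` and `createRowStream` are hypothetical names, and the row fields simply mirror the ones in the question:

```javascript
const { Readable } = require('stream');

// Hypothetical row source: yields rows one at a time instead of building a large array.
async function* generateRows(count) {
  for (let i = 0; i < count; i++) {
    yield { name: `fruit-${i}`, quantity: i, price: 2.5, date: new Date(), in_stock: true };
  }
}

// Returns an object-mode Readable; rows are produced only as the consumer asks for them.
function createRowStream(count) {
  return Readable.from(generateRows(count));
}
```

The stream returned by `createRowStream` can then play the role of sourceStream in the steps above.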

@eliasrosa

@sambonbonne, my friend!

Could you post a more complete example, please?

I don't understand anything about streams.

Thank you very much!

@sambonbonne

@eliasrosa I'm sorry, I don't know how to make a more complete example. I can add some variables or something but I'm not sure it will help:

const parquetStream = new ParquetTransformer({ /* your parquet and transform parameters */ });

// assuming you already have a Readable source stream named sourceStream
const conversionStream = sourceStream.pipe(parquetStream);

s3Bucket.upload({
  Bucket: 'bucketName',
  Key: 'path/of/the/file',
  Body: conversionStream
});
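
For readers who want the pieces wired together, here is a rough end-to-end sketch. It assumes parquetjs (for `ParquetTransformer`) and an AWS SDK v2 `S3` client; both are passed in as arguments, so the snippet itself needs only Node's standard library. `streamRowsToS3` and its parameter names are hypothetical, and the `new parquet.ParquetTransformer(schema)` call is an assumption that should be checked against the parquetjs version in use:

```javascript
const { Readable, PassThrough, pipeline } = require('stream');

// Hypothetical helper wiring rows -> parquet -> S3.
// `parquet` is assumed to be require('parquetjs'), `s3` an AWS SDK v2 S3 client.
async function streamRowsToS3({ parquet, s3, schema, rows, bucket, key }) {
  const sourceStream = Readable.from(rows);                     // array or (async) iterable of rows
  const parquetStream = new parquet.ParquetTransformer(schema); // rows in, parquet bytes out
  const body = new PassThrough();                               // byte stream handed to S3

  // Start the upload first so S3 reads from `body` while rows are still being encoded.
  const uploading = s3.upload({ Bucket: bucket, Key: key, Body: body }).promise();

  // pipeline (unlike .pipe) reports errors from any stage through its callback.
  const converting = new Promise((resolve, reject) => {
    pipeline(sourceStream, parquetStream, body, (err) => (err ? reject(err) : resolve()));
  });

  // Rejects if either the conversion or the upload fails.
  const [result] = await Promise.all([uploading, converting]);
  return result;
}
```

The intermediate `PassThrough` is a design choice rather than a requirement: it gives S3 a plain byte stream to read and keeps the conversion's error handling in one place.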

I don't want to discourage you, but I think you should not try to use streams without understanding them. Streams are an important part of Node.js and have several advantages, so learning more about them would probably be useful if you work with Node.js.

(I hope you won't take this answer as an attack, I just don't know how I can help better)
