[#172621882] generate zip file with user data #43

Merged
merged 37 commits into from
May 15, 2020

Conversation

balanza
Contributor

@balanza balanza commented May 14, 2020

This PR implements the generation of a zip bundle containing all user data and saves it to a storage.

The functionality is exposed as a runnable script for local execution.

Run locally

yarn build && node dist/ExtractUserDataActivity/cli.js <FISCAL CODE>

Notable points

  • bundles are compressed as zip and encrypted with the zip20 algorithm
  • the activity returns either an error or an object containing the name of the saved blob and the password to open the zip

Yet to be solved

  • AllUserData.encode fails on actual DB data. This matters because the encoder is what purges sensitive data. Until this is solved, a workaround patch is applied.
  • utils/zip module should be unit tested

balanza and others added 30 commits May 7, 2020 19:30
Co-authored-by: Danilo Spinelli <gunzip@users.noreply.github.com>
@balanza balanza requested review from gunzip and AleDore May 14, 2020 20:56
@balanza balanza self-assigned this May 14, 2020
@pagopa-github-bot
Contributor

pagopa-github-bot commented May 14, 2020

Affected stories

  • 🌟 #172621882: As a CIT, when I express the wish to download my data, I want to receive a download link within 7 days of the request

New dependencies added: archiver and archiver-zip-encrypted.

archiver

Author: Chris Talkington

Description: a streaming interface for archive generation

Homepage: https://github.com/archiverjs/node-archiver

Created: over 7 years ago
Last Updated: about 1 month ago
License: MIT
Maintainers: 1
Releases: 69
Direct Dependencies: archiver-utils, async, buffer-crc32, glob, readable-stream, tar-stream and zip-stream
Keywords: archive, archiver, stream, zip and tar
README

Archiver

a streaming interface for archive generation

Visit the API documentation for a list of all methods available.

Install

npm install archiver --save

Quick Start

// require modules
var fs = require('fs');
var archiver = require('archiver');

// create a file to stream archive data to.
var output = fs.createWriteStream(__dirname + '/example.zip');
var archive = archiver('zip', {
  zlib: { level: 9 } // Sets the compression level.
});

// listen for all archive data to be written
// 'close' event is fired only when a file descriptor is involved
output.on('close', function() {
  console.log(archive.pointer() + ' total bytes');
  console.log('archiver has been finalized and the output file descriptor has closed.');
});

// This event is fired when the data source is drained no matter what was the data source.
// It is not part of this library but rather from the NodeJS Stream API.
// @see: https://nodejs.org/api/stream.html#stream_event_end
output.on('end', function() {
  console.log('Data has been drained');
});

// good practice to catch warnings (ie stat failures and other non-blocking errors)
archive.on('warning', function(err) {
  if (err.code === 'ENOENT') {
    // log warning
  } else {
    // throw error
    throw err;
  }
});

// good practice to catch this error explicitly
archive.on('error', function(err) {
  throw err;
});

// pipe archive data to the file
archive.pipe(output);

// append a file from stream
var file1 = __dirname + '/file1.txt';
archive.append(fs.createReadStream(file1), { name: 'file1.txt' });

// append a file from string
archive.append('string cheese!', { name: 'file2.txt' });

// append a file from buffer
var buffer3 = Buffer.from('buff it!');
archive.append(buffer3, { name: 'file3.txt' });

// append a file
archive.file('file1.txt', { name: 'file4.txt' });

// append files from a sub-directory and naming it `new-subdir` within the archive
archive.directory('subdir/', 'new-subdir');

// append files from a sub-directory, putting its contents at the root of archive
archive.directory('subdir/', false);

// append files from a glob pattern
archive.glob('subdir/*.txt');

// finalize the archive (ie we are done appending files but streams have to finish yet)
// 'close', 'end' or 'finish' may be fired right after calling this method so register to them beforehand
archive.finalize();

Formats

Archiver ships with out of the box support for TAR and ZIP archives.

You can register additional formats with registerFormat.

Formats will be changing in the future to implement a middleware approach.

archiver-zip-encrypted

Author: Atrem Karpenko

Description: AES-256 and legacy Zip 2.0 encryption for Zip files

Homepage: https://github.com/artem-karpenko/archiver-zip-encrypted#readme

Created: about 1 year ago
Last Updated: 3 months ago
License: MIT
Maintainers: 1
Releases: 9
Direct Dependencies: aes-js, archiver, archiver-utils, compress-commons and zip-stream
Keywords: zip, encryption, aes, archiver and password
README

archiver-zip-encrypted

AES-256 and legacy Zip 2.0 encryption for Zip files.

Plugin for archiver that adds encryption
capabilities to Zip compression. Pure JS, no external zip software needed.

Install

npm install archiver-zip-encrypted --save

Usage

const archiver = require('archiver');

// register format for archiver
// note: only do it once per Node.js process/application, as duplicate registration will throw an error
archiver.registerFormat('zip-encrypted', require("archiver-zip-encrypted"));

// create archive and specify method of encryption and password
let archive = archiver.create('zip-encrypted', {zlib: {level: 8}, encryptionMethod: 'aes256', password: '123'});
archive.append('File contents', {name: 'file.name'})
// ... add contents to archive as usual using archiver

Encryption methods

Plugin supports 2 encryption methods:

  • 'aes256' - this is implementation of AES-256 encryption introduced by WinZip in 2003.
    It is the most safe option in regards of encryption, but limits possibilities of opening resulting archives.
    It's known to be supported by recent versions 7-Zip and WinZip. It is NOT supported by
    Linux unzip 6.00 (by Info-Zip). It is also NOT supported by Windows explorer (i.e. not possible to open Zip file as folder),
    even in Windows 10.
  • 'zip20' - this is implementation of legacy Zip 2.0 encryption (also called "ZipCrypto" in 7-Zip application).
    This is the first encryption method added to Zip format and hence is widely supported, in particular
    by standard tools in Linux and Windows. However its security is proven to be breakable
    so I would not recommend using it unless you absolutely have to make it work w/o external software.

For more information on these encryption methods and its drawbacks in particular see WinZip documentation.
It's worth noting that neither of these encryption methods encrypt file names and their metainformation,
such as original size, filesystem dates, permissions etc.

Generated by 🚫 dangerJS

ExtractUserDataActivity/cli.ts Outdated
const writableBlobStream = blobService.createWriteStreamToBlockBlob(
userDataContainerName,
blobName,
(err, _) => {
Contributor

It's not clear why we need the callback here. Can't we just call

const writableBlobStream = blobService.createWriteStreamToBlockBlob(userDataContainerName, blobName)

and write to the archiver stream using append?

Contributor Author

The callback reports the outcome of saving to the remote storage (docs).

We use append to add content as you'd expect, but the final result of the operation is delivered in the callback.

Contributor

That's right, but the callback is optional (the stream writes to the remote storage even if you don't provide one). Why do we need it?

Contributor Author

@balanza balanza May 15, 2020

Because otherwise there would be no way to know if and when the blob has been saved to the storage.
The flow is: zipStream --> blobStream --> blobSaving. We can listen for the end and finish events of the streams, but the saving step can still fail, in my opinion.
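
The flow above can be sketched as a promise that settles only when the storage callback fires. This is a minimal sketch under assumptions: CreateBlobWriteStream is a hypothetical signature loosely modeled on the callback-style call shown in the diff, not the storage SDK's actual typing, and saveStreamToBlob is an illustrative name that does not appear in the PR.

```typescript
import { Readable, Writable } from "stream";

// Hypothetical signature: modeled loosely on the callback-style call in the diff,
// not on the storage SDK's real typings.
type CreateBlobWriteStream = (
  containerName: string,
  blobName: string,
  callback: (err: Error | null) => void
) => Writable;

// Resolves with the blob name only after the storage layer confirms the save;
// rejects if the final save step reports an error.
export const saveStreamToBlob = (
  createWriteStream: CreateBlobWriteStream,
  containerName: string,
  blobName: string,
  source: Readable
): Promise<string> =>
  new Promise<string>((resolve, reject) => {
    const blobStream = createWriteStream(containerName, blobName, err =>
      err ? reject(err) : resolve(blobName)
    );
    // Stream-level failures are reported too, so the promise cannot hang on them.
    source.on("error", reject);
    blobStream.on("error", reject);
    source.pipe(blobStream);
  });
```

Without the callback, a caller could only observe the end and finish events of the two streams, which says nothing about whether the final save succeeded.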

@@ -0,0 +1,51 @@
/**
Contributor

Consider using a package, e.g. https://github.com/klughammer/node-randomstring, instead of writing our own implementation.

Contributor Author

I will if we need more complex stuff; so far it's an easy job and I'd rather not add another dependency.

Contributor

I'll do that instead. Your code unsafely uses Math.random; we need something better:
https://github.com/klughammer/node-randomstring/blob/master/lib/randomstring.js#L9

More code = more bugs, please use randomstring.

Contributor Author

randomstring doesn't work for us, as it doesn't ensure the string contains at least one character from each character group. I'd need to find another library to include.

Contributor

Where does this requirement come from? You can concatenate several random strings if you really need this (do we?).

Contributor Author

@balanza balanza May 15, 2020

"Passwords need to be strong". What does strong mean? I don't know. My interpretation was:

  • 18 characters
  • at least one uppercase letter
  • at least one lowercase letter
  • at least one number
  • at least one symbol
  • no pattern

Should we check with security guys?

Contributor

According to Pietro, we don't need a constraint on the number of symbols from each set, so an 18-character string drawn from a single character set containing upper/lower case alphanumeric characters and special symbols is OK.
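
A minimal sketch of that outcome, assuming Node's crypto.randomBytes as the random source instead of Math.random. The charset contents and the names generatePassword, PASSWORD_CHARSET and PASSWORD_LENGTH are illustrative assumptions; the thread does not list the exact symbols or the final implementation.

```typescript
import { randomBytes } from "crypto";

// Assumed charset: the thread does not list the exact symbols, so this set is illustrative.
const PASSWORD_CHARSET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()-_=+";
const PASSWORD_LENGTH = 18;

// Picks each character with a CSPRNG. Bytes that would introduce modulo bias
// are discarded (rejection sampling) instead of being wrapped around.
export const generatePassword = (
  length: number = PASSWORD_LENGTH,
  charset: string = PASSWORD_CHARSET
): string => {
  if (charset.length === 0 || charset.length > 256) {
    throw new Error("charset must contain between 1 and 256 characters");
  }
  // Largest multiple of charset.length that fits in one byte's range.
  const limit = 256 - (256 % charset.length);
  const chars: string[] = [];
  while (chars.length < length) {
    const bytes = randomBytes(length);
    for (let i = 0; i < bytes.length && chars.length < length; i++) {
      if (bytes[i] < limit) {
        chars.push(charset.charAt(bytes[i] % charset.length));
      }
    }
  }
  return chars.join("");
};
```

Drawing every character from one combined set keeps the code short and avoids the shuffle step that per-group guarantees would require.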

utils/zip.ts Outdated
? archiver("zip-encrypted", {
encryptionMethod,
password,
zlib: { level: 8 }
Contributor

I'd extract this value into DEFAULT_ZLIB_LEVEL

}
}
);
readableZipStream.pipe(writableBlobStream);
Contributor

pipe should be called before any operation on the stream (append / finalize)

Contributor Author

@balanza balanza May 15, 2020

It is implicitly true already, as append and finalize from the archiver module operate on another event-loop iteration. I agree we should write explicit code.
I see three options:

  1. we use the archiver.append function outside of the zip module, so we can call it after the zip stream has been piped. This leaks the archiver API into the activity handler, which should not need to know about it
  2. we keep the archiver.append call inside the zip module, as it is now, but we enforce a process.nextTick
  3. (don't know if it works) we keep the archiver.append call inside the zip module, but we wait for the pipe event of the readable stream (so we're sure we won't load files before it has been piped somewhere)

Keep in mind that the main reason for having a zip module is that module initialization must happen once per application, otherwise it would throw.

Contributor

I'd simply put all the implementation here and remove the zip module. Importing modules must not have side effects, even more so if they may cause an exception. If you need to initialize something, implement an init() method and make it safe.
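
One way such a safe init() could look is sketched below. It assumes only the registerFormat call shown in the archiver-zip-encrypted README; FormatRegistry and initEncryptedZip are hypothetical names, and the registry is passed in as a parameter so the sketch does not depend on the real packages.

```typescript
// Hypothetical minimal view of the registry API: only the one method used here.
interface FormatRegistry {
  registerFormat(name: string, formatModule: unknown): void;
}

let encryptedZipRegistered = false;

// Safe to call any number of times: the plugin is registered at most once,
// so the duplicate-registration error is never triggered by this helper.
export const initEncryptedZip = (
  registry: FormatRegistry,
  formatModule: unknown
): void => {
  if (encryptedZipRegistered) {
    return;
  }
  registry.registerFormat("zip-encrypted", formatModule);
  encryptedZipRegistered = true;
};
```

With the real libraries this would presumably be invoked as initEncryptedZip(archiver, require("archiver-zip-encrypted")) from the handler's setup code, so that merely importing a module never registers anything.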

Comment on lines 219 to 223
createCompressedStream: (
// tslint:disable-next-line: no-any
data: Record<string, any>,
password: NonEmptyString
) => stream.Readable
Contributor

Passing this as a parameter would have been useful if you had used a generic type (e.g. () => stream.Readable), but since this is the exact type of createCompressedStream, you can just import the function and use it directly without passing it as a parameter.

Contributor Author

I did it mainly for testing, so it's easier to mock that function.

Contributor

We'll find a way to mock it anyway, e.g. jest.mock("../zip").

balanza and others added 2 commits May 15, 2020 10:16
Co-authored-by: Danilo Spinelli <gunzip@users.noreply.github.com>
@codecov-io

Codecov Report

Merging #43 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master      #43   +/-   ##
=======================================
  Coverage   86.91%   86.91%           
=======================================
  Files          24       24           
  Lines         795      795           
  Branches       51       51           
=======================================
  Hits          691      691           
  Misses        103      103           
  Partials        1        1           


* Mock implementation of the zip module
*/

export const createCompressedStream = jest.fn(() => ({
Contributor

Generally we don't need this kind of mock: simply calling jest.mock(<module path>) transforms every function exported from that module into a mock (jest.fn).

]);
export type StrongPassword = t.TypeOf<typeof StrongPassword>;

const shuffleString = (str: string): string => {
Contributor

See the comment above: remove this module and just generate an 18-character string from the needed charset.

password?: NonEmptyString,
encryptionMethod: EncryptionMethodEnum = EncryptionMethodEnum.ZIP20
): Readable => {
const zipArchive = password
Contributor

In which case would we want a passwordless zip? This is not needed.

@gunzip gunzip merged commit 7347aae into master May 15, 2020
@gunzip gunzip deleted the 172621882-accesso-ai-dati branch May 15, 2020 13:48
4 participants