Skip to content

SEL-Columbia/mopup_stream

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Commandline tool to split csv

NPM

This small tool spawn off from our need during the Nigeria MDGs Info System data mopup process, we needed to process millions of lines of csv file with a constraint of memory, and a good way to go was to split the csv based on one column and have each be processed separately in R.

We used streams to pick up one line at a time and dump the result to the output directory.

To use, do

csv-split -i [file_name] -b [group_by_column] [-o [output_directory]]

if output_directory is not specified, it will default to [file_name]_by_[group]

you can also pipe from stdin:

cat data.csv | csv-split -b [group_by_column]

  Usage: csv-splitter [options]

  Options:

    -h, --help                output usage information
    -V, --version             output the version number
    -i, --input <file>        select an input csv
    -o, --output <directory>  select an output directory
    -b, --groupby <group>     the column you want to group by

About

streamlining mopup data manipulations

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published