gen-dataset

A command line tool to quickly generate a lot of files in a lot of directories. It creates an M-ary directory tree, in which every directory has the same number of subdirectories down to a fixed depth, and randomly places any number of files of any size within that tree. The distribution of files per directory is roughly equal. If a size is provided, each file is filled with zeros up to that size.

Installation

Precompiled Static Binary

  • Download Binary
    sudo wget https://github.com/joshuaboud/gen-dataset/releases/download/v1.3/gen-dataset -P /usr/local/bin
  • Mark Executable
    sudo chmod +x /usr/local/bin/gen-dataset

From Source

  • Install Boost Development Libraries
  • Get Source and Install
    git clone https://github.com/joshuaboud/gen-dataset.git
    cd gen-dataset
    make -j8
    sudo make install

Usage

usage:
  gen-dataset  -c [-b -d -s -S -t -w -y] [path]

flags:
  -b, --branches <int>              - number of subdirectories per directory
  -c, --count <int>                 - total number of files to create
  -d, --depth <int>                 - number of directory levels
  -s, --size <float[K..T][i]B>      - file size
  -S, --buff-size <float[K..T][i]B> - write buffer size (default=1M)
  -t, --threads <int>               - number of parallel file creation threads
  -w, --max-wait <float (seconds)>  - max random wait between file creation
  -y, --yes                         - don't prompt before creating files

Example

Generate 10 1GiB files in a single subdirectory named 'subdir':

gen-dataset -c 10 -s 1GiB subdir

Generate 10,000 1MiB files in 3905 directories:

gen-dataset -d 5 -b 5 -c 10000 -s 1MiB

Simulate real usage by randomly waiting up to 2.5 seconds between file creations:

gen-dataset -d 4 -b 6 -c 1000 -s 1MiB -w 2.5

Generate 1,000,000 empty files in 55986 directories with 16 threads writing the files:

gen-dataset -d 6 -b 6 -c 1000000 -t 16
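
The directory counts quoted in the examples above (3905 and 55986) match the sum of b^k for k = 1..d, where b is the value of -b and d is the value of -d. The sketch below reproduces that arithmetic. It assumes the top-level target directory is not counted, which is an inference from those two figures rather than something taken from the source code. The function name dir_count is hypothetical and is not part of gen-dataset.

```shell
# Hypothetical helper (not part of gen-dataset): estimate how many
# directories a given -d (depth) and -b (branches) would produce.
# Assumption: the top-level target directory itself is not counted.
dir_count() {
    depth=$1
    branches=$2
    total=0   # running sum of directories over all levels
    level=1   # number of directories on the current level
    i=1
    while [ "$i" -le "$depth" ]; do
        level=$((level * branches))   # each directory gets $branches children
        total=$((total + level))
        i=$((i + 1))
    done
    echo "$total"
}

dir_count 5 5   # prints 3905
dir_count 6 6   # prints 55986
```

Dividing the value of -c by this count gives a rough idea of how many files to expect per directory, for example about 18 files per directory for -c 1000000 with -d 6 -b 6.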