Skip to content

Data simulation software that creates data sets with particular characteristics

License

Notifications You must be signed in to change notification settings

EpistasisLab/hibachi

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

68 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hibachi

Data simulation software that creates data sets with particular characteristics

usage: hib.py [-h] [-e EVALUATION] [-f FILE] [-g GENERATIONS]
              [-i INFORMATION_GAIN] [-m MODEL_FILE] [-o OUTDIR]
              [-p POPULATION] [-r RANDOM_DATA_FILES] [-s SEED] [-A]
              [-C COLUMNS] [-F] [-P PERCENT] [-R ROWS] [-S] [-T]

Run hibachi evaluations on your data

optional arguments:
  -h, --help            show this help message and exit
  -e EVALUATION, --evaluation EVALUATION
                        name of evaluation
                        [normal|folds|subsets|noise|oddsratio]
                        (default=normal) note: oddsration sets columns == 10
  -f FILE, --file FILE  name of training data file (REQ) filename of random
                        will create all data
  -g GENERATIONS, --generations GENERATIONS
                        number of generations (default=40)
  -i INFORMATION_GAIN, --information_gain INFORMATION_GAIN
                        information gain 2 way or 3 way (default=2)
  -m MODEL_FILE, --model_file MODEL_FILE
                        model file to use to create Class from; otherwise
                        analyze data for new model. Other options available
                        when using -m: [f,o,s,P,T]
  -o OUTDIR, --outdir OUTDIR
                        name of output directory (default = .) Note: the
                        directory will be created if it does not exist
  -p POPULATION, --population POPULATION
                        size of population (default=100)
  -r RANDOM_DATA_FILES, --random_data_files RANDOM_DATA_FILES
                        number of random data to use instead of files
                        (default=0)
  -s SEED, --seed SEED  random seed to use (default=random value 1-1000)
  -A, --showallfitnesses
                        show all fitnesses in a multi objective optimization
  -C COLUMNS, --columns COLUMNS
                        random data columns (default=3) note: evaluation of
                        oddsratio sets columns to 10
  -F, --fitness         plot fitness results
  -P PERCENT, --percent PERCENT
                        percentage of case for case/control (default=25)
  -R ROWS, --rows ROWS  random data rows (default=1000)
  -S, --statistics      plot statistics
  -T, --trees           plot best individual trees

Prerequisites:

  graphviz libraries
  Python 3.4+ and packages
    argparse
    collections
    csv
    deap
    glob
    itertools
    math
    matplotlib
    networkx
    numpy 
    operator
    os
    pandas
    pygraphviz
    random
    scipy
    sklearn
    sys
    time

About

Data simulation software that creates data sets with particular characteristics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published