Generate random CSVs.
This project is intended to provide:
- A publicly available Python package for generating random comma separated values.
- A utility for generating random comma separated values via command line interface.
The purpose of the package is further integration of randcsv with automated testing suites.
A modern (>=3.6) version of Python is required to use randcsv.
The randcsv logic uses the secrets library, released with Python 3.6, to generate "random" values and
make "random" decisions. While the secrets library can be used to produce cryptographically secure
random numbers, users are advised to review the source directly (pertinent functions found
here) to ensure this particular implementation is suitable for their needs when cryptographic security is a concern.
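As a rough illustration of the kind of values the secrets library can produce, the following is a minimal sketch and not the randcsv source. The helper names are hypothetical, and the way a byte size maps to each value is an assumption made for this example.

```python
import secrets

# Hypothetical helpers for illustration only; these are not randcsv functions.


def random_integer(byte_size=8):
    """Return a non-negative integer drawn from byte_size * 8 random bits."""
    return secrets.randbits(byte_size * 8)


def random_token(byte_size=8):
    """Return a URL-safe text token built from byte_size random bytes."""
    return secrets.token_urlsafe(byte_size)


def random_float():
    """Return a float in [0, 1) from the OS-backed generator."""
    return secrets.SystemRandom().random()


def random_value(data_types):
    """Pick one of the given data types at random, then generate a value."""
    generators = {
        "integer": random_integer,
        "token": random_token,
        "float": random_float,
    }
    kind = secrets.choice(data_types)
    return generators[kind]()


print(random_value(["integer", "token", "float"]))
```

Whether values built this way are suitable for a security-sensitive purpose still depends on how they are used, which is why reviewing the actual source remains the safer route.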
The package is publicly hosted on PyPI under the name randcsv; you can install it using pip.
- Install randcsv.
$ pip install randcsv
Collecting randcsv
Downloading randcsv-0.1.3-py3-none-any.whl (10 kB)
Installing collected packages: randcsv
Successfully installed randcsv-0.1.3
The randcsv API consists of a single class definition, RandCSV. Example usage is shown below.
from randcsv import RandCSV

# Make a 10 x 4 CSV with title and index.
#
# Use all available data types: integer,
# token, and float.
#
# Approx. 10% NaN values, 15% empty values (implies
# approx. 75% randomly distributed "regular" values).
data = RandCSV(
    10,
    4,
    byte_size=8,
    data_types=['integer', 'token', 'float'],
    nan_freq=.1,
    empty_freq=.15,
    index_col=True,
    title_row=True,
)

# The data.data property would then contain a list of random
# value lists, where the shape of data.data would be: 10 x 4.

# Save the CSV to a file `example.csv`
data.to_file('example.csv')
You should then find a file example.csv in the current working directory.
An example output is shown below:
0 | 1 | 2 | 3 |
---|---|---|---|
1 | 0.5733712036037724 | -eLl9GnlEXo | |
2 | | | nan |
3 | RT3zxzTg4KI | nan | e2gOPMuGUGk |
4 | 12957925104777645606 | 0.13727825684393494 | 57589281133002397 |
5 | 0.46730821418402785 | 0.7212639567220399 | 10156229384055835642 |
6 | 2884154713072591035 | 0.36739108321888597 | 0.9194898822958113 |
7 | 17487691859213678632 | MORTDt3Y6Vc | 680401081312304743 |
8 | 0.6864180672941529 | 16386949079868257309 | nX-IUxLb-A8 |
9 | 0.3868689478103007 | uZsUJyCLRU8 | |
n.b. The CSV shape will be M x N (-m x -n) including a title row and index column, if applicable.
- (2, 1) and (2, 2) are examples of empty values
- (3, 2) and (2, 3) are examples of NaN values
- (5, 1) and (8, 1) are examples of floating point data types [0, 1)
- (7, 2) and (8, 3) are examples of token data types
- (7, 1) and (6, 1) are examples of integer data types
n.b. The error associated with the frequency of value types has been empirically tested at < 10% for 10,000 randomly generated regular, NaN, and None (empty) values.
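The frequency behaviour described above can be explored with a small experiment. The following is a minimal sketch under stated assumptions, not the randcsv implementation: the function names pick_kind and observed_frequencies are hypothetical, and the single uniform draw used to decide between NaN, empty, and regular values is one possible approach chosen for illustration.

```python
import secrets

# OS-backed random generator exposed by the secrets module.
_rng = secrets.SystemRandom()


def pick_kind(nan_freq=0.0, empty_freq=0.0):
    """Decide whether one cell is 'nan', 'empty', or 'regular' (hypothetical)."""
    draw = _rng.random()  # uniform in [0, 1)
    if draw < nan_freq:
        return "nan"
    if draw < nan_freq + empty_freq:
        return "empty"
    return "regular"


def observed_frequencies(n, nan_freq, empty_freq):
    """Generate n decisions and return the observed share of each kind."""
    counts = {"nan": 0, "empty": 0, "regular": 0}
    for _ in range(n):
        counts[pick_kind(nan_freq, empty_freq)] += 1
    return {kind: count / n for kind, count in counts.items()}


# Same settings as the API example: 10% NaN, 15% empty, 10,000 values.
freqs = observed_frequencies(10_000, nan_freq=0.1, empty_freq=0.15)
print(freqs)
```

With 10,000 draws the observed shares should land close to 0.10, 0.15, and 0.75, though the exact numbers differ on every run.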
The recommended way to install the randcsv CLI is using pipx, which requires Python version >=3.6.
A step-by-step installation is shown here (performed on Ubuntu 20.04).
- Install pipx using pip.
$ python3 -m pip install --user pipx
Collecting pipx
.... (output has been truncated)
Installing collected packages: pyparsing, packaging, argcomplete, click, distro, userpath, pipx
WARNING: The script distro is installed in '/home/<username>/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script userpath is installed in '/home/<username>/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
WARNING: The script pipx is installed in '/home/<username>/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed argcomplete-1.12.1 click-7.1.2 distro-1.5.0 packaging-20.4 pipx-0.15.5.1 pyparsing-2.4.7 userpath-1.4.1
- As the warnings in the output of the previous command suggest, ensure all required pipx scripts are available on PATH.
$ python3 -m pipx ensurepath
Success! Added /home/<username>/.local/bin to the PATH environment
variable.
/home/<username>/.local/bin has been been added to PATH, but you need to
open a new terminal or re-login for this PATH change to take
effect.
Consider adding shell completions for pipx. Run 'pipx completions' for
instructions.
You will need to open a new terminal or re-login for the PATH changes
to take effect.
Otherwise pipx is ready to go! ✨ 🌟 ✨
- Install the randcsv CLI.
$ pipx install randcsv
installed package randcsv 0.1.3, Python 3.8.3
These apps are now globally available
- randcsv
done! ✨ 🌟 ✨
The randcsv command line tool makes available the following configuration parameters:
n.b. All parameters are available via long-hand and short-hand flags. Long-hand
flags begin with two (2) hyphens (--) and short-hand flags begin with one (1) hyphen (-).
- --rows, -m: Integer (Required)
  - Number of rows the desired CSV file contains.
- --cols, -n: Integer (Required)
  - Number of columns the desired CSV file contains.
- --output, -o: String (Optional. Default: --output rand.csv)
  - Output file name.
- --data-types, -d: List (Optional. Default: --data-types integer)
  - Data types present in the desired CSV file. Supported data types are: token, integer, float. This argument accepts multiple values. Example: --data-types float integer token, or any combination thereof. If more than one data type is provided, the logic randomly selects one of the provided data types on a per-value basis.
- --nan-freq, -a: Float (Optional. Default: --nan-freq 0.0)
  - Frequency of NaN values contained in the desired CSV file. Example: --nan-freq 0.25 implies that, in the limit of an infinitely large CSV file, 25% of all values will be nan.
- --empty-freq, -e: Float (Optional. Default: --empty-freq 0.0)
  - Frequency of empty values contained in the desired CSV file. Example: --empty-freq 0.25 implies that, in the limit of an infinitely large CSV file, 25% of all values will be empty (no value).
- --index, -i: Boolean (Optional. Default: omit flag)
  - Flag signaling whether the leftmost column should be a row index (ascending integer).
- --title, -t: Boolean (Optional. Default: omit flag)
  - Flag signaling whether the topmost row should be a column index (ascending integer).
- --byte-size, -b: Integer (Optional. Default: --byte-size 8)
  - Number of bytes used to generate the random values. Increasing the byte size will increase the size of the set of possible random values.
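To make the flag descriptions above concrete, here is a minimal argparse sketch that mirrors the documented parameters and defaults. It is an illustration only and not the randcsv CLI source; the program name randcsv-sketch and the function build_parser are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="randcsv-sketch")
    parser.add_argument("-m", "--rows", type=int, required=True,
                        help="Number of rows in the CSV file.")
    parser.add_argument("-n", "--cols", type=int, required=True,
                        help="Number of columns in the CSV file.")
    parser.add_argument("-o", "--output", default="rand.csv",
                        help="Output file name.")
    parser.add_argument("-d", "--data-types", nargs="+",
                        choices=["token", "integer", "float"],
                        default=["integer"],
                        help="One or more data types to draw values from.")
    parser.add_argument("-a", "--nan-freq", type=float, default=0.0,
                        help="Frequency of NaN values.")
    parser.add_argument("-e", "--empty-freq", type=float, default=0.0,
                        help="Frequency of empty values.")
    parser.add_argument("-i", "--index", action="store_true",
                        help="Include a row index as the leftmost column.")
    parser.add_argument("-t", "--title", action="store_true",
                        help="Include a column index as the topmost row.")
    parser.add_argument("-b", "--byte-size", type=int, default=8,
                        help="Number of bytes used per random value.")
    return parser


# Parse a sample argument list instead of reading sys.argv.
args = build_parser().parse_args([
    "--rows", "10", "--cols", "4",
    "--data-types", "float", "integer", "token",
    "--nan-freq", "0.1", "--index",
])
print(args)
```

Flags that are omitted fall back to their documented defaults, so in this sample the output name stays rand.csv, the byte size stays 8, and no title row is requested.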
If you would like to file a bug or make a suggestion, please use the GitHub issue tracker.
You can find the source documented online at Read the Docs.