
🎲 Simple statistical functions implemented in readable Python.


sheriferson/simplestatistics


simplestatistics


simple-statistics for Python.

simplestatistics is compatible with Python 3.

Version 0.4.0 was the last release that avoided Python 3-specific features. Going forward, simplestatistics will adopt Python 3 features (e.g., type hints).

Installation

Install the current PyPI release:

pip install simplestatistics

Or install the development version from GitHub:

pip install git+https://github.com/sheriferson/simplestatistics

Usage

>>> import simplestatistics as ss
>>> ss.mean([1, 2, 3])
2.0
>>> ss.t_test([1, 2, 2.4, 3, 0.9], 2)
-0.3461277235039039
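
As a cross-check on the t_test call above, the one-sample t statistic can be worked out by hand. This is a minimal sketch, not the library's implementation: the helper name one_sample_t is hypothetical, and it assumes the usual definition t = (mean - mu) / (s / sqrt(n)), where s is the sample standard deviation.

```python
import math

def one_sample_t(sample, mu):
    """Hypothetical helper: one-sample t statistic against a hypothesized mean mu."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample variance uses the n - 1 denominator.
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    standard_error = math.sqrt(variance) / math.sqrt(n)
    return (mean - mu) / standard_error

t = one_sample_t([1, 2, 2.4, 3, 0.9], 2)
print(round(t, 6))  # -0.346128
```

The rounded value agrees with the ss.t_test output shown in the usage example.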

Documentation

You can read the documentation online.

Or you can generate it yourself:

From inside the simplestatistics/ directory, run:

make html

Documentation will be generated in _build/html/.

Tests

To run all doctests and see test coverage:

pip install -r requirements.txt
pytest simplestatistics --doctest-modules --cov=simplestatistics

The code adheres to PEP 8 guidelines, with the following pylint checkers disabled:

  • invalid-name
  • len-as-condition
  • superfluous-parens
  • unidiomatic-typecheck

To lint the code, make sure you have pylint installed (pip install pylint), cd into the simplestatistics/statistics directory, then run:

pylint -d 'invalid-name, len-as-condition, superfluous-parens, unidiomatic-typecheck' *.py

Functions and examples

Descriptive statistics

Function    Example
Min         min([-3, 0, 3])
Max         max([1, 2, 3])
Sum         sum([1, 2, 3.5])
Quantiles   quantile([3, 6, 7, 8, 8, 9, 10, 13, 15, 16, 20], [0.25, 0.75])
Product     product([1.25, 2.75], [2.5, 3.40])

Measures of central tendency

Function           Example
Mean               mean([1, 2, 3])
Median             median([10, 2, -5, -1])
Mode               mode([2, 1, 3, 2, 1])
Geometric mean     geometric_mean([1, 10])
Harmonic mean      harmonic_mean([1, 2, 4])
Root mean square   root_mean_square([1, -1, 1, -1])
Add to mean        add_to_mean(40, 4, (10, 12))
Skewness           skew([1, 2, 5])
Kurtosis           kurtosis([1, 2, 3, 4, 5])
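
To make a few of these measures concrete, here is a minimal sketch of the median, geometric mean, and harmonic mean in plain Python. The functions are stand-ins written for illustration under the textbook definitions; they are not the library's source and may differ from it in edge-case handling.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def geometric_mean(values):
    """n-th root of the product of n values."""
    product = 1.0
    for value in values:
        product *= value
    return product ** (1 / len(values))

def harmonic_mean(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1 / value for value in values)

print(median([10, 2, -5, -1]))    # 0.5
print(geometric_mean([1, 10]))    # square root of 10, about 3.1623
print(harmonic_mean([1, 2, 4]))   # 3 / 1.75, about 1.7143
```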

Measures of dispersion

Function                                           Example
Sample and population variance                     variance([1, 2, 3], sample = True)
Sample and population standard deviation           standard_deviation([1, 2, 3], sample = True)
Sample and population coefficient of variation     coefficient_of_variation([1, 2, 3], sample = True)
Interquartile range                                interquartile_range([1, 3, 5, 7])
Sum of Nth power deviations                        sum_nth_power_deviations([-1, 0, 2, 4], 3)
Sample and population standard scores (z-scores)   z_scores([-2, -1, 0, 1, 2], sample = True)
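
The sample flag in the examples above presumably switches between the sample and population formulas. The sketch below illustrates that distinction under the assumption that sample=True divides by n - 1 and sample=False divides by n; the functions are illustrative stand-ins, not the library's code.

```python
import math

def variance(values, sample=True):
    """Variance with an n - 1 (sample) or n (population) denominator."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)

def z_scores(values, sample=True):
    """Each value's distance from the mean, in standard deviations."""
    mean = sum(values) / len(values)
    standard_deviation = math.sqrt(variance(values, sample))
    return [(v - mean) / standard_deviation for v in values]

print(variance([1, 2, 3], sample=True))   # 1.0
print(variance([1, 2, 3], sample=False))  # 0.6666666666666666
print(z_scores([-2, -1, 0, 1, 2]))
```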

Linear regression

Function                                    Example
Simple linear regression                    linear_regression([1, 2, 3, 4, 5], [4, 4.5, 5.5, 5.3, 6])
Linear regression line function generator   linear_regression_line([.5, 9.5])([1, 2, 3])
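
For readers who want to see the arithmetic, here is a minimal least-squares sketch. It assumes the coefficients are ordered (slope, intercept), which is inferred from the example call and not confirmed here; the functions are illustrative stand-ins for the library's.

```python
def linear_regression(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def linear_regression_line(coefficients):
    """Return a function that evaluates the fitted line at each x in a list."""
    slope, intercept = coefficients  # assumed order: (slope, intercept)
    return lambda xs: [slope * value + intercept for value in xs]

# Slope is about 0.48 and intercept about 3.62 for this data.
print(linear_regression([1, 2, 3, 4, 5], [4, 4.5, 5.5, 5.3, 6]))
print(linear_regression_line((0.5, 9.5))([1, 2, 3]))  # [10.0, 10.5, 11.0]
```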

Similarity

Function      Example
Correlation   correlate([2, 1, 0, -1, -2, -3, -4, -5], [0, 1, 1, 2, 3, 2, 4, 5])
Covariance    covariance([1,2,3,4,5,6], [6,5,4,3,2,1])
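
As a rough illustration of how the two measures relate, the sketch below computes a sample covariance and derives Pearson's correlation from it. The n - 1 denominator is an assumption; the library's own functions may use a different convention.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlate(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

print(covariance([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]))  # -3.5
print(correlate([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]))   # -1.0
```

A perfectly reversed sequence gives a correlation of -1, which makes it a convenient sanity check.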

Distributions

Function                         Example
Factorial                        factorial(20) or factorial([1, 5, 20])
Choose                           choose(5, 3)
Normal distribution              normal(4, 8, 2) or normal([1, 4], 8, 2)
Binomial distribution            binomial(4, 12, 0.2) or binomial([3,4,5], 12, 0.5)
Bernoulli distribution           bernoulli(0.25)
Poisson distribution             poisson(3, [0, 1, 2, 3])
Gamma function                   gamma_function([1, 2, 3, 4, 5])
Beta distribution                beta([.1, .2, .3], 5, 2)
One-sample t-test                t_test([1, 2, 3, 4, 5, 6], 3.385)
Chi Squared Distribution Table   chi_squared_dist_table(k = 10, p = .01)
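
Several of these functions build on one another. The sketch below shows one way factorial, choose, and the binomial probability mass function fit together. The argument order binomial(k, n, p) is inferred from the example above and is an assumption, and unlike the library functions these stand-ins accept only scalar arguments.

```python
def factorial(n):
    """n! for a non-negative integer n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def choose(n, k):
    """Number of ways to pick k items from n, ignoring order."""
    return factorial(n) // (factorial(k) * factorial(n - k))

def binomial(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return choose(n, k) * p ** k * (1 - p) ** (n - k)

print(factorial(20))          # 2432902008176640000
print(choose(5, 3))           # 10
print(binomial(4, 12, 0.2))   # about 0.1329
```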

Classifiers

Function                    Example
Naive Bayesian classifier   See documentation for examples of how to train and classify.
Perceptron                  See documentation for examples of how to train and classify.

Errors

Function               Example
Gauss error function   error_function(1)

Hyperbolic functions

Function   Example
sinh       sinh(2)
cosh       cosh(2.5)
tanh       tanh(.2)

Spirit and rules

  • Everything should be implemented in raw, organic, locally sourced Python.
  • Use libraries only if you have to, and only when they are unrelated to the math/statistics. For example, from functools import reduce makes reduce available for those using Python 3. That's okay, because it's about making Python work and not about making the stats easier.
  • It's okay to use operators and functions if they correspond to regular calculator buttons. For example, all calculators have a built-in square root function, so there is no need to implement that ourselves; we can use math.sqrt(). Anything beyond that, like mean or median, we have to write ourselves.
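
A contribution in this spirit might look like the sketch below: math.sqrt is treated as a calculator button, while the mean is written out by hand instead of being imported. It is an illustration of the rules only, not code taken from the repository.

```python
import math  # allowed: square root is a regular calculator button

def mean(values):
    # Written by hand rather than imported from a statistics library.
    return sum(values) / len(values)

def root_mean_square(values):
    # Square each value, average the squares, then take the square root.
    return math.sqrt(mean([v * v for v in values]))

print(mean([1, 2, 3]))                    # 2.0
print(root_mean_square([1, -1, 1, -1]))   # 1.0
```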

Pull requests are welcome!
