An R package for automatic exploratory data analysis
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
The Automatic Statistician runs on macOS and Linux systems that have a working installation of Anaconda, the Python/R data science platform. On macOS, XQuartz must also be installed.
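As a quick pre-flight check, the prerequisites above can be probed from R. This is a minimal sketch, not part of AutoStatR: the helper name check_autostatr_prereqs is hypothetical, it assumes Anaconda's conda executable is on the PATH (an Anaconda install that is not on the PATH will be reported as missing), and it assumes XQuartz sits at its default location /Applications/Utilities/XQuartz.app.

```r
# Hypothetical helper (not part of AutoStatR): probes the documented prerequisites.
check_autostatr_prereqs <- function() {
  os <- Sys.info()[["sysname"]]               # "Darwin" on macOS, "Linux" on Linux
  supported_os <- os %in% c("Darwin", "Linux")
  # TRUE if a conda executable is found on the PATH (assumption: Anaconda adds it)
  has_conda <- unname(nzchar(Sys.which("conda")))
  # XQuartz is only required on macOS; its default install path is assumed here
  has_xquartz <- if (os == "Darwin") {
    dir.exists("/Applications/Utilities/XQuartz.app")
  } else {
    NA
  }
  list(os = os, supported_os = supported_os,
       has_conda = has_conda, has_xquartz = has_xquartz)
}
```

Calling check_autostatr_prereqs() returns a list; any FALSE entry points to a prerequisite worth checking by hand.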
Make sure the devtools package is installed in your R environment, or install it with:
install.packages("devtools")
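When the setup is scripted, installing devtools only if it is missing avoids a needless reinstall. A minimal sketch using only base R (requireNamespace and install.packages); the helper name ensure_installed is hypothetical.

```r
# Hypothetical helper: install a CRAN package only if it cannot already be loaded.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Invisibly return whether the package is now loadable
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

For example, ensure_installed("devtools") can be run before devtools::install_github().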
Now, install the AutoStatR package from GitHub as follows:
devtools::install_github("thllwg/AutoStatR")
The Automatic Statistician relies on Python libraries (e.g. TPOT, accessed via tpotr). These are installed when the package is loaded: on every load, the package checks that the required dependencies are available and (re)installs any that are missing.
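To see whether the Python side is in place, a check along these lines can help. It is a sketch under assumptions: it assumes the R-to-Python bridge is the reticulate package (this README does not say so), the module names tpot and sklearn are illustrative, and the helper check_python_modules is hypothetical. Note that reticulate may select a different Python interpreter than the one AutoStatR configures on load.

```r
# Hypothetical helper: report which Python modules reticulate can import.
# Assumption: reticulate is the bridge between R and Python.
check_python_modules <- function(modules = c("tpot", "sklearn")) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    # reticulate is not installed, so availability cannot be determined
    return(setNames(rep(NA, length(modules)), modules))
  }
  # One TRUE/FALSE per module; any error while probing counts as unavailable
  vapply(modules, function(m) {
    tryCatch(isTRUE(reticulate::py_module_available(m)),
             error = function(e) FALSE)
  }, logical(1))
}
```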
AutoStatR does not ship with automated unit tests. Instead, the scripts in the example directory can be used to test its functionality. Usage is simple: select a dataset and pass it to autostatr(), AutoStatR's single-function interface, to have it analysed:
# Load AutoStatR and the iris dataset
library(AutoStatR)
data(iris)
# Split the dataset into a training set (95%) and a test set (5%)
set.seed(42)  # fix the random seed so the split is reproducible
smp_size <- floor(0.95 * nrow(iris))
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
train <- iris[train_ind, ]
test <- iris[-train_ind, ]
# Call the Automatic Statistician
autostatr(data = train, data_to_predict = test, target = "Species", type = "classif", title = "Iris")
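The split above can be wrapped in a reusable function so other datasets are prepared the same way. This is a minimal sketch; split_train_test is a hypothetical helper, not part of AutoStatR. It computes the test rows with setdiff() rather than negative indexing, because df[-integer(0), ] would silently return zero rows if the training sample were ever empty. Passing a seed calls set.seed(), which also changes the global random number stream.

```r
# Hypothetical helper: random train/test split of a data frame
split_train_test <- function(df, train_frac = 0.95, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # make the split repeatable
  n <- nrow(df)
  n_train <- floor(train_frac * n)
  train_ind <- sample(seq_len(n), size = n_train)
  # Remaining rows form the test set (safe even if train_ind is empty)
  test_ind <- setdiff(seq_len(n), train_ind)
  list(train = df[train_ind, ], test = df[test_ind, ])
}
```

Usage, following the example above: split <- split_train_test(iris, 0.95, seed = 42), then autostatr(data = split$train, data_to_predict = split$test, target = "Species", type = "classif", title = "Iris").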
You can read more about tpotr and its application to AutoML in the tpotr documentation.
- Thorben Hellweg - thllwg
- Christopher Olbrich - ChristopherOlbrich
- Christian Werner - Bl7tzcrank