Demo videos: PlottingVideo.mp4, EDA.mp4, AppThemes.mp4
Note: if you're only looking to update Quantico, you only need to reinstall the Quantico package in Step 4 below.
If you are setting up R for the first time, run Steps 1-3.
Step 1: Install the "R-release" version of Rtools and place it on your C:\ drive: https://cran.r-project.org/bin/windows/Rtools/
Step 2: Install R: https://cran.r-project.org/bin/windows/base/
Step 3: Install RStudio Desktop: https://posit.co/download/rstudio-desktop/
Step 4: Install package dependencies:
options(install.packages.compile.from.source = "always")
# CRAN Packages
install.packages("AutoPlots")
install.packages("devtools")
install.packages("data.table")
install.packages("collapse")
install.packages("bit64")
install.packages("doParallel")
install.packages("foreach")
install.packages("lubridate")
install.packages("timeDate")
install.packages("combinat")
install.packages("DBI")
install.packages("e1071")
install.packages("fBasics")
install.packages("itertools")
install.packages("MLmetrics")
install.packages("nortest")
install.packages("pROC")
install.packages("RColorBrewer")
install.packages("RPostgres")
install.packages("Rfast")
install.packages("stringr")
install.packages("xgboost")
install.packages("lightgbm")
install.packages("regmedint")
install.packages("RCurl")
install.packages("jsonlite")
install.packages("h2o")
install.packages("AzureStor")
install.packages("gitlink")
install.packages("arrow")
install.packages("reactable")
install.packages("DT")
install.packages("shiny")
install.packages("shinydashboard")
install.packages("shinyWidgets")
install.packages("shiny.fluent")
install.packages("shinyjs")
install.packages("shinyjqui")
install.packages("shinyAce")
install.packages("shinybusy")
install.packages("gyro")
install.packages("arrangements")
install.packages("echarts4r")
install.packages("tidytext")
install.packages("tibble")
install.packages("stopwords")
install.packages("SentimentAnalysis")
install.packages("quanteda")
install.packages("quanteda.textstats")
install.packages("datamods")
install.packages("phosphoricons")
install.packages("correlation")
# GitHub Packages
devtools::install_url('https://github.com/catboost/catboost/releases/download/v1.2/catboost-R-Windows-1.2.tgz', INSTALL_opts = c("--no-multiarch", "--no-test-load"))
devtools::install_github("AdrianAntico/prettydoc", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/AutoNLP", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/Rodeo", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/AutoQuant", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/esquisse", upgrade = FALSE, dependencies = FALSE, force = TRUE)
devtools::install_github("AdrianAntico/Quantico", upgrade = FALSE, dependencies = FALSE, force = TRUE)
In your RStudio session, run Quantico::runQuantico() to kick off a Quantico session.
Easy start
# Optionally, set the WorkingDirectory argument to your desired file path
# Note: For the best user experience I recommend using Chrome and having the zoom level set to 75%
Quantico::runQuantico(WorkingDirectory = getwd())
If you have a PostgreSQL installation you can supply the PostGRE_* parameters up front (or just pass them in while in session)
# Optionally, set the WorkingDirectory argument to your desired file path (remember to use "/" rather than "\" in your path)
# Note: For the best user experience I recommend using Chrome and having the zoom level set to 75%
Quantico::runQuantico(
MaxTabs = 2L,
WorkingDirectory = getwd(),
PostGRE_DBNames = NULL, # list of database names you want connected
PostGRE_Host = 'localhost',
PostGRE_Port = 54321,
PostGRE_User = '...',
PostGRE_Password = '...')
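For reference, the PostGRE_* arguments mirror a standard DBI/RPostgres connection (both packages are in the dependency list above). A minimal sketch of the equivalent manual connection, assuming the app connects through DBI; the database name below is hypothetical:
# Hypothetical manual connection mirroring the PostGRE_* arguments
library(DBI)
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname   = "my_db",      # hypothetical database name
  host     = "localhost",
  port     = 54321,
  user     = "...",
  password = "...")
DBI::dbListTables(con)     # list available tables
DBI::dbDisconnect(con)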
- Background
- App Interface
- Documentation
- Data Management
- Code Generation
- Visualization
- Table Viewing
- Exploratory Data Analysis
- Data Wrangling
- Feature Engineering
- Unsupervised Learning
- Inference
- Inference Reporting
- Machine Learning
- Machine Learning Reporting
- Forecasting
- Forecasting Reporting
Quantico
is a Shiny app for data science, analytics, and business intelligence. The app is non-reactive in places where reactivity with big data would cause a poor user experience. All data operations utilize data.table for fast processing and low memory utilization. Visualizations are based on the echarts4r library; machine learning algorithms currently include CatBoost, XGBoost, LightGBM, and some of the H2O models. Time series models are based on the forecast package, while panel forecast models are ML-backed and can utilize CatBoost, XGBoost, or LightGBM. Data can currently be accessed via PostgreSQL or locally, and session saving and restoration is available. There are 15 different colored app themes, along with various background images if a user wants to zone out for a bit.
The fundamental goal of Quantico is to make life easier. While there are several GUIs available in the R ecosystem, I haven't found one that really serves my needs. I want to be able to explore data quickly and produce results that can be shared across an organization, for example. Some tasks can take anywhere from an hour to a full day in a typical coding environment (more or less, depending on one's skills), while they can be produced within minutes with Quantico. Another aspect is handling big data. The data.table package can process big data quickly while keeping your memory footprint small, enabling larger datasets to be managed within the app on a given device. Lastly, I want to be able to transition from the in-app experience to a coding environment with ease, which is handled nicely by the code generation part of the app. If I need to take something to the next level that the app doesn't support, I can grab the code and pick up where I left off in my favorite IDE.
The primary goals of the app design are to make it easy and fast to use, and to create a look and feel that is fun. The sidebar is predominantly intended for setting up inputs and running tasks (aside from the settings options), while the main panel is for displaying output. With this design, the space available for viewing output is maximized.
Note:
For the best viewing experience I recommend using Chrome and having the zoom level set to 75%
Tasks
- Data Management
- Session Saving & Restoration
- Code Generation
- Visualization
- Data Viewer
- Data Wrangling
- Feature Engineering
- Unsupervised Learning
- Machine Learning
- Statistical Inference
- Forecasting
In-App Output:
- Multi-Plot Visualization
- Multi-Data Viewer
- Exploratory Data Analysis
- Statistical Inference
- Machine Learning
- Forecasting
Export Output:
- Multi-Plot Visualization
- Exploratory Data Analysis
- Machine Learning
- Forecasting
The documentation is located in the Documentation tab within the Home tab. There is a sidebar full of hyperlinks to speed up navigation: simply click the topic of choice (and again if there are sub-categories) and the app will navigate to that location.
On the side bar, under Load / Save, you have a few options:
- Local
- Sessions
- PostGRE
With the Local modal you can load and save:
- CSV data
- Parquet data
- machine learning models
You can save your session state and reload it at a later time. For example, you might have a pre-configured plot output setup that you don't want to recreate every time you run the app; this is similar to having saved reports. Further, all output panels will re-populate with what was set up at the time of the last save.
With the PostGRE modal you can:
- Query data
- Create tables
- Create databases
- Remove tables
- Remove databases
The Code Generation tab returns the code that was used to execute the various tasks and generate output. You can select from a variety of code themes as well. This can be really helpful to those looking to kickstart a project and then convert to a coding environment later. Some output can simply be generated much more quickly using the app, so this should be a time saver even for the most seasoned programmers.
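As a purely hypothetical illustration (not the app's literal output), a row-subsetting task would come back as plain data.table code along these lines, ready to paste into any IDE:
# Hypothetical generated code for a Subset Rows task; file and column names are made up
library(data.table)
data <- data.table::fread("MyData.csv")
data <- data[Region == "East" & Sales > 100]   # subset rows
data.table::fwrite(data, "MyData_Subset.csv")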
Plotting is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. One of the goals is to make plotting as easy as possible: you don't have to pre-aggregate your data for plotting purposes since those steps will be carried out for you (although you can). Just pass in your data and use the inputs to tell the software what you want.
Distribution | Aggregate | Time Series | Relationship | Model Evaluation |
---|---|---|---|---|
Histogram | Barplot | Line | Correlogram | Residuals |
Density | Stacked Barplot | Area | Parallel | Residuals Scatter |
Boxplot | 3D Barplot | Step | Scatter | Partial Dependence Line |
Word Cloud | Heatmap | River | 3D Scatter | Partial Dependence Heatmap |
Probability Plot | Radar | Autocorrelation | Copula | Calibration Line |
 | Piechart | Partial Autocorr | 3D Copula | Calibration Boxplot |
 | Donut | | | Variable Importance |
 | Rosetype | | | Shapley Importance |
 | | | | ROC Plot |
 | | | | Confusion Matrix |
 | | | | Gains |
 | | | | Lift |
For plots that enable faceting, you only have to select the number of rows and columns and the app will take care of the rest. If your group variable contains more levels than the facet grid can hold and you didn't subset the group levels to match that count, the levels with the most records are displayed first; ties are broken in alphabetical order.
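A sketch of that selection rule in data.table terms, as an illustration of the ordering logic described above (not the app's internal code):
# Rank group levels by record count, breaking ties alphabetically
library(data.table)
dt <- data.table(Group = c("b", "b", "a", "c", "c"))
lvls <- dt[, .N, by = Group][order(-N, Group)]
lvls$Group[1:2]   # the two levels shown in a 1x2 facet grid: "b", "c"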
Since the software will automatically aggregate your data (for some of the plot types) you can specify how you'd like your data aggregated. Below is a list of options:
- count: counts of values by group. Select any of the numeric YVars available in your data just so it doesn't create an error for a missing YVar
- proportion: proportion of total by group. Select any of the numeric YVars available in your data just so it doesn't create an error for a missing YVar
- mean
- meanabs (absolute values are taken first, then the measure)
- median
- medianabs (absolute values are taken first, then the measure)
- sum
- sumabs (absolute values are taken first, then the measure)
- sd (standard deviation)
- sdabs (absolute values are taken first, then the measure)
- skewness
- skewnessabs (absolute values are taken first, then the measure)
- kurtosis
- kurtosisabs (absolute values are taken first, then the measure)
- CoeffVar (coefficient of variation)
- CoeffVarabs (absolute values are taken first, then the measure)
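To make the *abs variants concrete, here is a small data.table sketch of the definitions above (illustrative only, with made-up column names):
# "abs" variants take absolute values first, then apply the measure
library(data.table)
dt <- data.table(Group = c("a", "a", "b"), YVar = c(-2, 4, 3))
dt[, .(count   = .N,
       mean    = mean(YVar),
       meanabs = mean(abs(YVar)),
       sumabs  = sum(abs(YVar))), by = Group]
# Group "a": mean = 1, meanabs = 3, sumabs = 6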
If you have a date X-variable you can choose to display your plot at a higher-grain datetime. For example, if you have daily data and you are looking to build a barplot time series, you can switch the default date aggregate parameter from "as-is" to "month" to display a monthly aggregated time series.
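For example, rolling a daily series up to months can be expressed with lubridate (already in the dependency list above); the column names here are hypothetical:
# Aggregate a daily series to monthly totals
library(data.table)
library(lubridate)
dt <- data.table(Date = as.Date("2024-01-01") + 0:89, Sales = runif(90))
dt[, Month := lubridate::floor_date(Date, unit = "month")]
dt[, .(Sales = sum(Sales)), by = Month]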
For numeric variables you can choose to have them transformed automatically:
- Asinh: inverse hyperbolic sine
- Log: natural logarithm
- LogPlus1: natural log of x, shifted by the absolute value of the minimum value when the minimum value is negative
- Sqrt: square root
- Asin: inverse sine
- Logit
- BoxCox
- YeoJohnson
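A sketch of two of these as plain R, following the definitions above (Asinh is base R's asinh(); the +1 in LogPlus1 is an assumption implied by the name, so the shifted minimum maps to log(1) = 0):
# LogPlus1: shift by |min| when the minimum is negative so the log is defined
LogPlus1 <- function(x) {
  mn <- min(x, na.rm = TRUE)
  shift <- if (mn < 0) abs(mn) else 0
  log(x + shift + 1)
}
LogPlus1(c(-3, 0, 7))   # shift = 3, so this returns log(c(1, 4, 11))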
In the plotting panel, simply click one of the top buttons (e.g. Plot 1, Plot 2, ...) and select a plot type from the dropdown menu. Then click the button below it to fill out the necessary parameters for your plot. Lastly, drag the newly created box in the dragula pane to the bottom row in order for it to display.
When you click the button below the plot type dropdown, a modal will appear with up to five tabs for inputs and selections:
- Data Selection Tab
- Axis Variables Tab
- Grouping Variables Tab (in most cases but not all)
- Filter Variables Tab
- Formatting Tab
The Data Selection tab is where you'll choose your dataset and the number of records to display. For plots that require data aggregation, the display record count won't typically matter, but for non-aggregated plots the displayed records are randomly sampled from your data right before the plot is built, not before any data preparation steps.
The Axis Variables tab is where you'll define your axis variables and any transformations you'd like applied. The modals are designed to only supply inputs that are actually used for the given plot type. For example, histogram plots only require variables to be defined across a single dimension (you can select more than one variable, however), whereas with line plots you'll need to define an X-axis variable (a date variable) and Y-axis variables.
Transformations: Automatic transformations can be selected and generated for numeric variables during the data preparation process while the software builds the plots.
The Group Variables tab is where you can optionally define up to three group variables and a faceting selection (if applicable). Since multiple group variables are allowed, the plotting engine concatenates the group variables and displays the combined levels. For each group variable you can select the levels you wish to display. For faceting, simply select the number of rows and columns to form the grid of your choice.
The Filter Variables tab is where you can optionally define filters for your data before the plot is displayed. You can select up to four filter variables; for each one, you define the logical operation you want conducted and the associated values for that operation.
The Formatting tab is where you can rename the plot title and axis titles. You can also select to have data values shown on the plots.
You can save your plotting setup to an HTML file. Just click the Save button after you've set up your plots. While you can set up a grid of output in the app, the plots will be stacked on top of each other in the HTML file due to limited space. The only exception is faceted plots, which are themselves a grid within a grid.
The Tables Viewer output tab allows you to view multiple tables stacked on top of each other. You can alter the number of records displayed, the total records brought into the table, whether they're randomly sampled or not, and a few other formatting options. This can be useful for inspecting new or altered data after running the various tasks.
The Exploratory Data Analysis Report can display a variety of data insights, by a group variable if desired, including:
- Data dictionary information
- Univariate statistics
- Univariate box plots
- Univariate bar plots
- Correlogram
- Trend line plots
The EDA Report can be generated by clicking the Save button on the EDA Output Panel either before or after generating the EDA info in app.
Data wrangling is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. All of the available methods are listed below.
Category | Method |
---|---|
Shrink | Aggregate |
 | Subset Rows |
 | Subset Columns |
 | Sampling |
Grow | Join |
 | Union |
Dataset | Partition Data |
 | Sort Data |
 | Remove Data |
 | Model Data Prep |
Pivot | Melt Data |
 | Cast Data |
Columns | Type Casting |
 | Time Trend |
 | Rename Columns |
 | Concatenate Columns |
Misc | Meta Programming |
 | Time Series Fill |
 | Time Series Roll Fill |
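Since all operations run on data.table, the Pivot methods correspond to data.table's melt and dcast. A small illustrative sketch with made-up column names:
# Melt (wide to long) and cast (long to wide)
library(data.table)
wide <- data.table(ID = 1:2, Q1 = c(10, 20), Q2 = c(30, 40))
long <- data.table::melt(wide, id.vars = "ID",
                         variable.name = "Quarter", value.name = "Sales")
data.table::dcast(long, ID ~ Quarter, value.var = "Sales")   # back to wide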
Feature engineering is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. All of the available methods are listed below.
Category | Method |
---|---|
Numeric | Percent Rank |
 | Standardize |
 | Transformations |
 | Interaction |
Categorical | Character Encoding |
 | Partial Dummies |
Calendar | Calendar Variables |
 | Holiday Variables |
Windowing | Rolling Numeric |
 | Differencing |
 | Rolling Categorical |
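For orientation, the Windowing methods map onto data.table operations like these (an illustration with made-up column names, not the app's internal code):
# Lags, differences, and rolling means by group
library(data.table)
dt <- data.table(Store = rep(c("a", "b"), each = 5),
                 Date  = rep(as.Date("2024-01-01") + 0:4, 2),
                 Sales = 1:10)
setorder(dt, Store, Date)
dt[, Lag1  := shift(Sales, 1L), by = Store]           # rolling numeric (lag)
dt[, Diff1 := Sales - shift(Sales, 1L), by = Store]   # differencing
dt[, Roll3 := frollmean(Sales, 3L), by = Store]       # rolling mean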
Unsupervised learning is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. All of the available methods are listed below.
Category | Method |
---|---|
Text | Word2Vec |
 | Text Summary |
 | Sentiment |
 | Readability |
 | Lexical Diversity |
Other | Clustering |
 | Anomaly Detection |
 | Dimensionality Reduction |
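For orientation, clustering and dimensionality reduction look like the base R sketch below; this is purely conceptual, since the app may use different backends (e.g. H2O):
# K-means clustering and PCA on standardized numeric features
dat <- scale(iris[, 1:4])
km  <- kmeans(dat, centers = 3)
pca <- prcomp(dat)
head(pca$x[, 1:2])                # first two principal components
table(km$cluster, iris$Species)   # cluster vs. known class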
Inference is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. All of the available methods are listed below.
- Normality Testing
- Correlation Testing
- One-Sample T-Test
- Two-Sample T-Test
- F-Test
- Chi-Square Test
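Each of these has a standard base R analogue (the app wraps them with summary statistics and visuals, and may use other implementations, e.g. the nortest package). For orientation:
# Base R analogues of the inference methods above
x <- rnorm(50); y <- rnorm(50, mean = 0.5)
shapiro.test(x)                   # normality testing
cor.test(x, y)                    # correlation testing
t.test(x, mu = 0)                 # one-sample t-test
t.test(x, y)                      # two-sample t-test
var.test(x, y)                    # F-test for equality of variances
chisq.test(table(x > 0, y > 0))   # chi-square test of independence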
The Inference Reports are dependent upon the inference method chosen. They all return summary statistics and visuals to help assess effects and assumptions.
ML is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. The in-app documentation contains information on each of the ML algorithm types.
Currently available algorithms include:
- CatBoost
- XGBoost
- LightGBM
- H2O-DRF
- H2O-GBM
- H2O-GLM
- H2O-HGLM
- Causal Mediation
Some of the built-in features include:
- Automatic transformations and back-transformations if the user requests them
- Data partitioning into train, validation, and test data sets if the user only supplies a training data set (see the sketch after this list)
- Categorical variable encoding and back-transformation if the user supplies categorical variables as features
- Computation of model metrics for evaluation
- Data conversion to the structure appropriate for the selected algorithm
- Multi-armed bandit grid tuning
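For example, the automatic partitioning step is conceptually a random split like the sketch below; the 70/20/10 proportions are an assumption for illustration, not the app's documented defaults:
# Random train/validation/test partition with data.table
library(data.table)
set.seed(42)
dt  <- data.table(x = rnorm(1000), y = rnorm(1000))
idx <- sample(c("train", "validate", "test"), nrow(dt),
              replace = TRUE, prob = c(0.7, 0.2, 0.1))
train    <- dt[idx == "train"]
validate <- dt[idx == "validate"]
test     <- dt[idx == "test"]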
The ML Evaluation Report can be generated by clicking the Save button on the ML Output Panel either before or after generating the ML info in app.
Forecasting is a vitally important aspect of this software, and it's important that you know how to utilize the functionality as intended. The in-app documentation contains information on each of the forecasting algorithm types.
Currently available algorithms can be split into Single Series and Panel Series:
- Single Series
  - TBATS
  - SARIMA
  - ETS
  - ARFIMA
  - NNET
  - Grid Tuning
  - Forecasting
- Panel Series
  - CatBoost
  - XGBoost
  - LightGBM
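Since the single-series models come from the forecast package, they correspond to calls like these (a minimal sketch on a built-in dataset):
# Single-series models via the forecast package
library(forecast)
y <- AirPassengers                 # built-in monthly series
fit_ets   <- ets(y)                # ETS
fit_arima <- auto.arima(y)         # (S)ARIMA via automatic order selection
fit_tbats <- tbats(y)              # TBATS
forecast(fit_ets, h = 12)          # 12-month-ahead forecast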
There are various run modes to train, backtest, and forecast:
- Train Model: this is equivalent to building an ML model
- Retrain Existing Model: this is for retraining a model that's already been built. Perhaps you simply want an updated model but not a new forecast at the moment
- Backtest: this task will train a new model (if an FC ArgsList is not supplied) and generate an N-period-ahead forecast that is evaluated against Validation Data supplied by the user. If you don't have a validation dataset, go to Data Wrangling and subset rows based on a time variable; the subset data will be your Training Data and your original dataset will be the Validation Data (see the sketch after this list)
- Backtest Cross Evaluation: once you have a good model designed, you can mock production by running this procedure. Here, you'll set the data refresh rate and the model update rate. Performance measures are returned in a data.table once the procedure is finished
- Feature Engineering Test: this task will loop through various builds, starting from the most simple up to a moderately sophisticated model. An evaluation table is generated that you can view in the Tables tab when the procedure is complete. Evaluation metrics are based on the Backtest method. The features tested are listed below, in order; if a feature is beneficial it will remain in the models trained thereafter:
  - LogPlus1 vs None: tests whether a target variable transformation is beneficial
  - Series Difference vs None: tests whether differencing your series is useful
  - Calendar Variables vs None: tests whether Calendar Variables are useful
  - Holiday Variable vs None: tests whether Holiday Variables are useful
  - Credibility vs Target Encoding: tests whether target encoding is better than credibility encoding
  - Time Weights vs None: tests whether Time Weighting is useful
  - Anomaly Detection vs None: tests whether Anomaly Detection is useful
  - Time Trend Variable vs None: tests whether a Time Trend Variable is useful
  - Lag 1 vs None: tests whether utilizing lags is useful
- Forecast: if you have a trained model, you can call it to generate a forecast
- Retrain + Forecast: if you have a model, you can refresh it and have it generate a new forecast
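Conceptually, the Backtest run mode holds out the last N periods, forecasts them, and scores the result. A bare-bones sketch with the forecast package (illustrative only, not the app's internal code):
# Hold out the last 12 months, forecast them, and score the forecast
library(forecast)
y     <- AirPassengers                   # monthly series, 1949-1960
train <- window(y, end = c(1959, 12))    # training data
valid <- window(y, start = c(1960, 1))   # validation data (the holdout)
fc    <- forecast(ets(train), h = 12)    # 12-period-ahead forecast
accuracy(fc, valid)                      # error measures, including MAPE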
The FC Evaluation Report can be generated by clicking the Save button on the FC Output Panel either before or after generating the FC info in app.