PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

Low Resistance Useability
Easy Customization
Scalable and Easier to Deploy

It has been built on the shoulders of giants like PyTorch(obviously), and PyTorch Lightning.

Installation

Although the installation includes PyTorch, the best and recommended way is to first install PyTorch from here, picking up the right CUDA version for your machine.

Once, you have got Pytorch installed, just use:

 pip install pytorch_tabular[all]

to install the complete library with extra dependencies.

And :

 pip install pytorch_tabular

for the bare essentials.

The sources for pytorch_tabular can be downloaded from the Github repo_.

You can either clone the public repository:

git clone git://github.com/manujosephv/pytorch_tabular

Once you have a copy of the source, you can install it with:

python setup.py install

Documentation

For complete Documentation with tutorials visit []

Available Models

FeedForward Network with Category Embedding is a simple FF network, but with and Embedding layers for the categorical columns.
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data is a model presented in ICLR 2020 and according to the authors have beaten well-tuned Gradient Boosting models on many datasets.
TabNet: Attentive Interpretable Tabular Learning is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision making to model the output.

To implement new models, see the How to implement new models tutorial. It covers basic as well as advanced architectures.

Usage

from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig, ExperimentConfig

data_config = DataConfig(
    target=['target'], #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
    gpus=1, #index of the GPU to use. 0, means CPU
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU", # Activation between each layers
    learning_rate = 1e-3
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_from_checkpoint("examples/basic")

Blog

PyTorch Tabular – A Framework for Deep Learning for Tabular Data

Future Roadmap(Contributions are Welcome)

Add GaussRank as Feature Transformation
Add ability to use custom activations in CategoryEmbeddingModel
Add differential dropouts(layer-wise) in CategoryEmbeddingModel
Add Fourier Encoding for cyclic time variables
Integrate Optuna Hyperparameter Tuning
Add Text and Image Modalities for mixed modal problems
Integrate Wide and Deep model
Integrate TabTransformer

References and Citations

[1] Sergei Popov, Stanislav Morozov, Artem Babenko. "Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data". arXiv:1909.06312 [cs.LG] (2019)

[2] Sercan O. Arik, Tomas Pfister;. "TabNet: Attentive Interpretable Tabular Learning". arXiv:1908.07442 (2019).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Table of Contents

Installation

Documentation

Available Models

Usage

Blog

Future Roadmap(Contributions are Welcome)

References and Citations

Files

README.md

Latest commit

History

README.md

File metadata and controls

Table of Contents

Installation

Documentation

Available Models

Usage

Blog

Future Roadmap(Contributions are Welcome)

References and Citations