carefree-learn 0.2.1

@carefree0910 carefree0910 released this 29 Oct 05:39
· 1614 commits to dev since this release

Release Notes

We're happy to announce carefree-learn v0.2.x, which can now solve not only tabular tasks but also other general deep learning tasks!

Introduction

Deep Learning with PyTorch made easy 🚀!

Like many similar projects, carefree-learn can be treated as a high-level library to help with training neural networks in PyTorch. However, carefree-learn does more than that.

  • carefree-learn is highly customizable for developers. (Almost) every piece of functionality or process is wrapped into its own module (a Python class), and each module can be replaced or enhanced either directly in the source code or from local code, with the help of some pre-defined functions provided by carefree-learn (see Register Mechanism).
  • carefree-learn supports easy-to-use saving and loading. By default, everything will be wrapped into a .zip file, and the ONNX format is natively supported!
  • carefree-learn supports Distributed Training.
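
The register mechanism mentioned above can be pictured as a name-to-class registry that a decorator fills in. The following is a minimal sketch in plain Python, not carefree-learn's actual implementation: `REGISTRY`, `register`, and `build` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Type

# Hypothetical registry mapping a string key to a class (illustration only).
REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Return a class decorator that stores the decorated class under `name`."""

    def decorator(cls: Type) -> Type:
        # A later registration under the same name replaces the earlier one,
        # so local code can override a default module without editing it.
        REGISTRY[name] = cls
        return cls

    return decorator


def build(name: str, *args: Any, **kwargs: Any) -> Any:
    """Instantiate whichever class is currently registered under `name`."""
    return REGISTRY[name](*args, **kwargs)


@register("scaler")
class DefaultScaler:
    def __call__(self, value: float) -> float:
        return value


@register("scaler")  # overrides DefaultScaler from "local code"
class DoubleScaler:
    def __call__(self, value: float) -> float:
        return value * 2


scaled = build("scaler")(3.0)
print(scaled)
```

Looking modules up by name is what lets a configuration refer to a custom component with a plain string, as the "plus_one" processor example further down does.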

Apart from these, carefree-learn also has quite a few specific advantages in each area:

Machine Learning 📈

  • carefree-learn provides an end-to-end pipeline for tabular tasks, which AUTOMATICALLY handles the following (this part is mainly done by carefree-data, though):
    • Detection of redundant feature columns which can be excluded (all SAME, all DIFFERENT, etc).
    • Detection of feature column types (whether a feature column is a string column / numerical column / categorical column).
    • Imputation of missing values.
    • Encoding of string columns and categorical columns (Embedding or One Hot Encoding).
    • Pre-processing of numerical columns (Normalize, Min Max, etc.).
    • And much more...
  • carefree-learn can help you deal with almost ANY kind of tabular dataset, no matter how dirty and messy it is. A model can be trained either directly on numpy arrays or indirectly on files located on your machine. This makes carefree-learn stand out from similar projects.
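
To make the automatic steps listed above more concrete, here is a minimal sketch of three of them (redundant-column detection, column type detection, and imputation followed by Min Max scaling) in plain Python. It is an illustration under simplifying assumptions, not how carefree-data implements them: `column_kind` and `impute_and_min_max` are hypothetical helpers, missing values are represented by `None`, and categorical detection is left out.

```python
from typing import List, Optional, Sequence


def is_number(text: str) -> bool:
    """Return True if `text` parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def column_kind(column: Sequence[Optional[str]]) -> str:
    """Classify a raw column as 'redundant', 'numerical' or 'string'."""
    present = [v for v in column if v is not None]
    unique = set(present)
    if len(unique) <= 1:
        return "redundant"  # all SAME: carries no information
    if all(is_number(v) for v in present):
        return "numerical"
    if len(unique) == len(present):
        return "redundant"  # all DIFFERENT strings: behaves like an ID
    return "string"


def impute_and_min_max(column: Sequence[Optional[str]]) -> List[float]:
    """Fill missing values with the column mean, then rescale to [0, 1]."""
    present = [float(v) for v in column if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else float(v) for v in column]
    low, high = min(filled), max(filled)
    span = (high - low) or 1.0  # avoid dividing by zero on constant columns
    return [(v - low) / span for v in filled]


ids = ["a1", "a2", "a3", "a4"]
const = ["x", "x", "x", "x"]
age = ["20", None, "40", "30"]
city = ["NY", "LA", "NY", None]

kinds = [column_kind(c) for c in (ids, const, age, city)]
scaled_age = impute_and_min_max(age)
print(kinds, scaled_age)
```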

When we say ANY, it means that carefree-learn can even train on one single sample.

For example:

import cflearn

toy = cflearn.ml.make_toy_model()
data = toy.data.cf_data.converted
print(f"x={data.x}, y={data.y}")  # x=[[0.]], y=[[1.]]


This is especially useful when we need to write unit tests or to verify whether our custom modules (e.g. custom pre-processes) are correctly integrated into carefree-learn.

For example:

import cflearn
import numpy as np

# here we implement a custom processor
@cflearn.register_processor("plus_one")
class PlusOne(cflearn.Processor):
    @property
    def input_dim(self) -> int:
        return 1

    @property
    def output_dim(self) -> int:
        return 1

    def fit(self, columns: np.ndarray) -> cflearn.Processor:
        return self

    def _process(self, columns: np.ndarray) -> np.ndarray:
        return columns + 1

    def _recover(self, processed_columns: np.ndarray) -> np.ndarray:
        return processed_columns - 1

# we need to specify that we use the custom process method to process our labels
toy = cflearn.ml.make_toy_model(cf_data_config={"label_process_method": "plus_one"})
data = toy.data.cf_data
y = data.converted.y
processed_y = data.processed.y
print(f"y={y}, new_y={processed_y}")  # y=[[1.]], new_y=[[2.]]
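
The processor above pairs a forward transform with its inverse, so a useful check for any custom processor is that recovering a processed column returns the original values. The sketch below shows that round-trip check on a stateful transform in plain Python. `MinMaxSketch` is a hypothetical stand-alone class that mirrors the fit / process / recover shape of the example; it does not subclass `cflearn.Processor`.

```python
import math
from typing import List


class MinMaxSketch:
    """Stand-alone illustration of a fit / process / recover contract."""

    def fit(self, column: List[float]) -> "MinMaxSketch":
        # Learn the statistics that both directions of the transform need.
        self.low = min(column)
        self.span = (max(column) - self.low) or 1.0
        return self

    def process(self, column: List[float]) -> List[float]:
        return [(v - self.low) / self.span for v in column]

    def recover(self, processed: List[float]) -> List[float]:
        return [v * self.span + self.low for v in processed]


labels = [10.0, 20.0, 30.0]
proc = MinMaxSketch().fit(labels)
processed = proc.process(labels)
recovered = proc.recover(processed)

# The round trip should reproduce the input up to floating point error.
round_trip_ok = all(math.isclose(a, b) for a, b in zip(labels, recovered))
print(processed, round_trip_ok)
```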

There is one more thing we'd like to mention: carefree-learn is Pandas-free. The reasons why we excluded Pandas are listed in carefree-data.


Computer Vision 🖼️

  • carefree-learn also provides an end-to-end pipeline for computer vision tasks, and:
    • Supports native torchvision datasets.

      data = cflearn.cv.MNISTData(transform="to_tensor")

      Currently only MNIST is supported, but more datasets will be added in the future (if needed)!

    • Focuses on the ImageFolderDataset for customization, which:

      • Automatically splits the dataset into train & valid.
      • Supports generating labels in parallel, which is very useful when calculating labels is time consuming.

      See IFD introduction for more details.

  • carefree-learn supports various kinds of Callbacks, which can be used for saving intermediate visualizations / results.
    • For instance, carefree-learn implements an ArtifactCallback, which can dump detailed artifacts to disk during training.
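
The two ImageFolderDataset behaviours described above (an automatic train / valid split and parallel label generation) can be sketched with the standard library alone. This is an assumption-laden illustration, not the ImageFolderDataset implementation: `split_train_valid`, `make_labels`, and `label_from_name` are hypothetical helpers, the file names are made up, and a thread pool stands in for whatever parallel backend the library actually uses.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def split_train_valid(
    paths: List[str], valid_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically, then hold out a fraction for validation."""
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    num_valid = max(1, int(len(shuffled) * valid_ratio))
    return shuffled[num_valid:], shuffled[:num_valid]


def make_labels(
    paths: List[str], label_fn: Callable[[str], int], num_workers: int = 4
) -> Dict[str, int]:
    """Compute one label per path concurrently; map() keeps input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return dict(zip(paths, pool.map(label_fn, paths)))


def label_from_name(path: str) -> int:
    # Hypothetical rule: files named "cat_*" are class 0, all others class 1.
    return 0 if path.startswith("cat") else 1


paths = [f"cat_{i:03d}.png" for i in range(5)]
paths += [f"dog_{i:03d}.png" for i in range(5)]

train, valid = split_train_valid(paths)
labels = make_labels(paths, label_from_name)
print(len(train), len(valid))
```

Threads suit label functions that mostly wait on I/O (for example, reading annotation files); CPU-heavy label functions would usually call for a process pool instead.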

Examples

Machine Learning 📈

import cflearn
import numpy as np
x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)

Computer Vision 🖼️

import cflearn
data = cflearn.cv.MNISTData(batch_size=16, transform="to_tensor")
m = cflearn.api.resnet18_gray(10).fit(data)

Please refer to Quick Start and Developer Guides for detailed information.

Migration Guide

From v0.1.x to v0.2.x, the design principles of carefree-learn changed in two aspects:

Framework

  • The DataLayer in v0.1.x has changed to the more general DataModule in v0.2.x.
  • The Model in v0.1.x, which is constructed by pipes, has changed to a general Model.

These changes are made because we want to make carefree-learn compatible with general deep learning tasks (e.g. computer vision tasks).

Data Module

Internally, the Pipeline trains & predicts on a DataModule in v0.2.x, but carefree-learn also provides APIs that keep the user experience as close to v0.1.x as possible:

Train

v0.1.x:

import cflearn
import numpy as np
x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.make().fit(x, y)

v0.2.x:

import cflearn
import numpy as np
x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)

Predict

v0.1.x:

predictions = m.predict(x)

v0.2.x:

predictions = m.predict(cflearn.MLInferenceData(x))

Evaluate

v0.1.x:

cflearn.evaluate(x, y, metrics=["mae", "mse"], pipelines=m)

v0.2.x:

cflearn.ml.evaluate(cflearn.MLInferenceData(x, y), metrics=["mae", "mse"], pipelines=m)
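
The "mae" and "mse" metric names in the evaluate calls conventionally stand for mean absolute error and mean squared error. As a reference for reading the reported numbers, here is a minimal plain-Python sketch of both formulas. It is not how carefree-learn computes them internally, and the sample values are made up for illustration.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |target - prediction|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: average of (target - prediction) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions, chosen so the arithmetic is easy to follow.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.0, 5.0]

scores = {"mae": mae(y_true, y_pred), "mse": mse(y_true, y_pred)}
print(scores)
```

Both metrics are errors, so lower is better; mse penalizes large individual errors more heavily than mae.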

Model

It's not very straightforward to migrate models from v0.1.x to v0.2.x, so if you need such a migration, feel free to submit an issue and we will analyze the problems case by case!