# carefree-learn 0.2.1 Release Notes

We're happy to announce that `carefree-learn` released `v0.2.x`, which makes it capable of solving not only tabular tasks, but also other general deep learning tasks!
## Introduction

Deep Learning with PyTorch made easy 🚀!

Like many similar projects, `carefree-learn` can be treated as a high-level library that helps with training neural networks in PyTorch. However, `carefree-learn` does more than that.
- `carefree-learn` is highly customizable for developers. We have already wrapped (almost) every single functionality / process into a single module (a Python class), and each of them can be replaced or enhanced either directly from source codes or from local codes with the help of some pre-defined functions provided by `carefree-learn` (see Register Mechanism).
- `carefree-learn` supports easy-to-use saving and loading. By default, everything will be wrapped into a `.zip` file, and the `onnx` format is natively supported!
- `carefree-learn` supports Distributed Training.
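The register mechanism mentioned above can be illustrated with a minimal, generic sketch. Note that all names below (`processor_registry`, `register_processor`, `Processor`) are hypothetical, written only to show the decorator-based pattern, not `carefree-learn`'s actual internals:

```python
# A minimal sketch of a decorator-based register mechanism.
# All names here are illustrative, not carefree-learn's actual API.

processor_registry = {}

def register_processor(name):
    """Return a decorator that registers a class under the given name."""
    def decorator(cls):
        processor_registry[name] = cls
        return cls
    return decorator

class Processor:
    def process(self, x):
        raise NotImplementedError

@register_processor("identity")
class Identity(Processor):
    def process(self, x):
        return x

# The library can then look up user-defined classes by name,
# e.g. from a config dict such as {"processor": "identity"}.
processor = processor_registry["identity"]()
print(processor.process(3))  # 3
```

This is why user code never needs to touch the library's source: registering a class by name makes it reachable from configs.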
Apart from these, `carefree-learn` also has quite a few specific advantages in each area:
## Machine Learning 📈

`carefree-learn` provides an end-to-end pipeline for tabular tasks, which AUTOMATICALLY deals with (this part is mainly handled by `carefree-data`, though):

- Detection of redundant feature columns which can be excluded (all SAME, all DIFFERENT, etc.).
- Detection of feature column types (whether a feature column is a string column / numerical column / categorical column).
- Imputation of missing values.
- Encoding of string columns and categorical columns (Embedding or One Hot Encoding).
- Pre-processing of numerical columns (Normalize, Min Max, etc.).
- And much more...
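As a rough illustration of two of these steps (plain numpy, not `carefree-data`'s implementation), min-max pre-processing and one-hot encoding can be sketched like this:

```python
import numpy as np

# Min-max scaling of a numerical column into [0, 1].
numerical = np.array([3.0, 5.0, 9.0])
scaled = (numerical - numerical.min()) / (numerical.max() - numerical.min())

# One-hot encoding of a categorical column via integer-array indexing:
# row i of the identity matrix is the one-hot vector for category i.
categorical = np.array([0, 2, 1])
one_hot = np.eye(3)[categorical]

print(scaled)
print(one_hot)
```

In practice the pipeline also has to handle messy inputs (strings, missing values) before steps like these can run, which is exactly what the automatic detection and imputation above take care of.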
`carefree-learn` can help you deal with almost ANY kind of tabular dataset, no matter how dirty and messy it is. It can be either trained directly with some numpy arrays, or trained indirectly with some files located on your machine. This makes `carefree-learn` stand out from similar projects.

When we say ANY, it means that `carefree-learn` can even train on one single sample.
For example:

```python
import cflearn

toy = cflearn.ml.make_toy_model()
data = toy.data.cf_data.converted
print(f"x={data.x}, y={data.y}")  # x=[[0.]], y=[[1.]]
```
This is especially useful when we need to write unittests or to verify whether our custom modules (e.g. custom pre-processes) are correctly integrated into `carefree-learn`.

For example:
```python
import cflearn
import numpy as np

# here we implement a custom processor
@cflearn.register_processor("plus_one")
class PlusOne(cflearn.Processor):
    @property
    def input_dim(self) -> int:
        return 1

    @property
    def output_dim(self) -> int:
        return 1

    def fit(self, columns: np.ndarray) -> cflearn.Processor:
        return self

    def _process(self, columns: np.ndarray) -> np.ndarray:
        return columns + 1

    def _recover(self, processed_columns: np.ndarray) -> np.ndarray:
        return processed_columns - 1

# we need to specify that we use the custom process method to process our labels
toy = cflearn.ml.make_toy_model(cf_data_config={"label_process_method": "plus_one"})
data = toy.data.cf_data
y = data.converted.y
processed_y = data.processed.y
print(f"y={y}, new_y={processed_y}")  # y=[[1.]], new_y=[[2.]]
```
There is one more thing we'd like to mention: `carefree-learn` is Pandas-free. The reasons why we excluded Pandas are listed in `carefree-data`.
## Computer Vision 🖼️

`carefree-learn` also provides an end-to-end pipeline for computer vision tasks, and:

- Supports native `torchvision` datasets.

  ```python
  data = cflearn.cv.MNISTData(transform="to_tensor")
  ```

  > Currently only `mnist` is supported, but we will add more in the future (if needed)!

- Focuses on the `ImageFolderDataset` for customization, which:
  - Automatically splits the dataset into train & valid.
  - Supports generating labels in parallel, which is very useful when calculating labels is time consuming.

  > See IFD introduction for more details.

- Supports various kinds of `Callback`s, which can be used for saving intermediate visualizations / results.
  - For instance, `carefree-learn` implements an `ArtifactCallback`, which can dump artifacts to disk elaborately during training.
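The idea behind parallel label generation can be sketched generically with a thread pool (this is only an illustration with made-up names, not `carefree-learn`'s implementation):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-sample label function; imagine it being slow,
# e.g. parsing a file or running a pre-trained model.
def compute_label(path):
    # here we simply derive the label from the parent folder name
    return path.split("/")[-2]

paths = [f"data/{cls}/{i}.png" for cls in ("cat", "dog") for i in range(3)]

# Compute labels in parallel; Executor.map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    labels = list(pool.map(compute_label, paths))

print(labels)  # ['cat', 'cat', 'cat', 'dog', 'dog', 'dog']
```

When the per-sample work is expensive, spreading it across workers like this turns a linear scan of the image folder into a much faster batched pass.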
## Examples

Machine Learning 📈:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)
```

Computer Vision 🖼️:

```python
import cflearn

data = cflearn.cv.MNISTData(batch_size=16, transform="to_tensor")
m = cflearn.api.resnet18_gray(10).fit(data)
```
Please refer to Quick Start and Developer Guides for detailed information.
## Migration Guide

From `v0.1.x` to `v0.2.x`, the design principle of `carefree-learn` changed in two aspects:
- The `DataLayer` in `v0.1.x` has changed to the more general `DataModule` in `v0.2.x`.
- The `Model` in `v0.1.x`, which is constructed by `pipe`s, has changed to the general `Model`.
These changes are made because we want to make `carefree-learn` compatible with general deep learning tasks (e.g. computer vision tasks).
### Data Module

Internally, the `Pipeline` will train & predict on `DataModule` in `v0.2.x`, but `carefree-learn` also provides useful APIs to keep the user experience as close to `v0.1.x` as possible:
#### Train

`v0.1.x`:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.make().fit(x, y)
```

`v0.2.x`:

```python
import cflearn
import numpy as np

x = np.random.random([1000, 10])
y = np.random.random([1000, 1])
m = cflearn.api.fit_ml(x, y, carefree=True)
```
#### Predict

`v0.1.x`:

```python
predictions = m.predict(x)
```

`v0.2.x`:

```python
predictions = m.predict(cflearn.MLInferenceData(x))
```
#### Evaluate

`v0.1.x`:

```python
cflearn.evaluate(x, y, metrics=["mae", "mse"], pipelines=m)
```

`v0.2.x`:

```python
cflearn.ml.evaluate(cflearn.MLInferenceData(x, y), metrics=["mae", "mse"], pipelines=m)
```
### Model

It's not very straightforward to migrate models from `v0.1.x` to `v0.2.x`, so if you require such a migration, feel free to submit an issue and we will analyze the problems case by case!