GitHub - xorbitsai/xorbits: Scalable Python DS & ML, in an API compatible & lightning fast way.

What is Xorbits?

Xorbits is an open-source computing framework that makes it easy to scale data science and machine learning workloads — from data preprocessing to tuning, training, and model serving. Xorbits can leverage multi-cores or GPUs to accelerate computation on a single machine or scale out up to thousands of machines to support processing terabytes of data and training or serving large models.

Xorbits provides a suite of best-in-class libraries for data scientists and machine learning practitioners, and lets them scale tasks without extensive knowledge of infrastructure.

Xorbits features a familiar Python API that supports a variety of libraries, including pandas, NumPy, PyTorch, and XGBoost. With a change to just one line of code, replacing "import pandas as pd" with "import xorbits.pandas as pd", your pandas workflow can be scaled using Xorbits.
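A minimal sketch of that one-line change follows. The sample data and column names are made up for illustration, and the fallback to plain pandas is an addition of this sketch so the snippet still runs where Xorbits is not installed. The to_pandas() call used to fetch the result is assumed to be available on Xorbits objects.

```python
# The one-line change: import the Xorbits pandas module instead of pandas.
# The try/except fallback is only for this sketch, not part of Xorbits.
try:
    import xorbits.pandas as pd  # replaces: import pandas as pd
    USING_XORBITS = True
except ImportError:
    import pandas as pd
    USING_XORBITS = False

# Hypothetical sample data; the rest of the workflow is ordinary pandas code.
df = pd.DataFrame(
    {
        "city": ["Austin", "Boston", "Austin", "Boston"],
        "sales": [1, 2, 3, 4],
    }
)

# Group and aggregate exactly as you would with pandas.
totals = df.groupby("city")["sales"].sum()

# Assumption: Xorbits results convert to a local pandas object via to_pandas().
if USING_XORBITS:
    totals = totals.to_pandas()

print(totals)  # Austin sums to 4, Boston sums to 6
```

Everything after the import is unchanged pandas code, which is the point of the API-compatible design.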

Why Xorbits?

As ML and AI workloads continue to grow in complexity, their computational demands soar. Single-node development environments such as your laptop are convenient, but they fall short when it comes to accommodating these scaling demands.

Seamlessly scale your workflow from laptop to cluster

To use Xorbits, you do not need to specify how to distribute the data or even know how many cores your system has. You can keep using your existing notebooks and still enjoy a significant speed boost from Xorbits, even on your laptop.

Process large datasets that pandas can't

Xorbits can leverage all of your computational cores. It is especially beneficial for handling larger datasets, where pandas may slow down or run out of memory.

Lightning-fast speed

According to our benchmark tests, Xorbits surpasses other popular pandas API frameworks in speed and scalability. See our performance comparison, explanation, and research paper.

Leverage the Python ecosystem with native integrations

Xorbits aims to take full advantage of the entire ML ecosystem, offering native integration with pandas and other libraries.

Where to get it?

The source code is currently hosted on GitHub at: https://github.com/xorbitsai/xorbits

Binary installers for the latest released version are available at the Python Package Index (PyPI).

# PyPI
pip install xorbits

Other resources

License

Apache 2

Roadmaps

The main goals we want to achieve in the future include the following:

Transitioning from pandas-native to Arrow-native data storage, which will substantially reduce memory cost and be friendlier to the compute engine.
Introducing native engines that leverage technologies like vectorization and codegen to accelerate computations.
Scaling as many libraries and algorithms as possible!

More detailed roadmaps will be revealed soon. Stay tuned!

Relationship with Mars

The creators of Xorbits are mainly the creators of Mars, and Xorbits is currently built on Mars to reduce duplicated work. However, the vision of Xorbits means it is not appropriate to put everything into Mars. Instead, we need a new project to better support the roadmaps. In the future, we will replace some core internal components with new ones that we will propose. Stay tuned!

Getting involved

Platform	Purpose
GitHub Issues	Reporting bugs and filing feature requests.
Stack Overflow	Asking questions about how to use Xorbits.
Slack	Collaborating with other Xorbits users.

Citing Xorbits

If Xorbits helps you, please cite our paper using the following metadata:

@inproceedings{lu2024Xorbits,
  title = {Xorbits: Automating Operator Tiling for Distributed Data Science},
  shorttitle = {Xorbits},
  booktitle = {2024 {{IEEE}} 40th {{International Conference}} on {{Data Engineering}} ({{ICDE}})},
  author = {Lu, Weizheng and He, Kaisheng and Qin, Xuye and Li, Chengjie and Wang, Zhong and Yuan, Tao and Liao, Xia and Zhang, Feng and Chen, Yueguo and Du, Xiaoyong},
  year = {2024},
  month = may,
  pages = {5211--5223},
  issn = {2375-026X},
  doi = {10.1109/ICDE60146.2024.00392},
}
