English | 简体中文
cube-studio is a one-stop cloud-native machine learning platform open-sourced by Tencent Music. It currently provides the following features:
- 1. Data management: feature store with online and offline features; dataset management for structured and media data; data labeling platform
- 2. Development: notebooks (VS Code/Jupyter); Docker image management; online image building
- 3. Training: drag-and-drop online pipeline editor; open template market; distributed computing/training tasks, e.g. tf/pytorch/mxnet/spark/ray/horovod/kaldi/volcano; batch priority scheduling; resource monitoring/alerting/balancing; cron scheduling
- 4. AutoML: nni, katib, ray
- 5. Inference: model management; serverless traffic control; tf/pytorch/onnx/tensorrt model deployment with tfserving/torchserver/onnxruntime/triton inference; vGPU; load balancing, high availability, elastic scaling
- 6. Infrastructure: multi-user; multi-project; multi-cluster; edge cluster mode; blockchain-based sharing
https://github.com/tencentmusic/cube-studio/wiki
For learning, deployment, consulting, contribution, or cooperation, join the group: WeChat ID luanpeng1234, with the remark "open source". A construction guide is available in the wiki.
Tips:
- 1. You can develop your own templates; they are easy to build and can be tailored to your own scenarios
template | type | description |
---|---|---|
linux | base | Custom stand-alone operating environment; free to implement any custom single-machine function |
datax | import/export | Import and export between heterogeneous data sources |
media-download | data processing | Distributed download of media files |
video-audio | data processing | Distributed extraction of audio from video |
video-img | data processing | Distributed extraction of images from video |
sparkjob | data processing | Serverless Spark jobs |
ray | data processing | Multi-machine distributed computing with the Python Ray framework |
volcano | data processing | Multi-machine distributed computing with the Volcano framework |
xgb | machine learning | XGBoost model training and inference |
ray-sklearn | machine learning | scikit-learn on the Ray framework with multi-machine distributed parallel computing |
pytorchjob-train | model train | Multi-machine distributed training with PyTorch |
horovod-train | model train | Multi-machine distributed training with Horovod |
tfjob | model train | Multi-machine distributed training with TensorFlow |
tfjob-train | model train | Distributed training with TensorFlow: plain and runner methods |
tfjob-runner | model train | Distributed training with TensorFlow: runner method |
tfjob-plain | model train | Distributed training with TensorFlow: plain method |
kaldi-train | model train | Multi-machine distributed training with Kaldi |
tf-model-evaluation | model evaluate | Distributed model evaluation with TensorFlow 2.3 |
tf-offline-predict | model inference | Distributed offline model inference with TensorFlow 2.3 |
model-offline-predict | model inference | Distributed offline model inference, framework-agnostic |
deploy-service | model deploy | Deploy an inference service |
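As a rough illustration of what a custom template can look like: platform task templates are generally packaged as container images whose entrypoint receives its parameters as command-line flags that users fill in from the UI. The sketch below is a minimal, hypothetical entrypoint in that style; the flag names (`--input_path`, `--output_path`, `--num_workers`) are illustrative assumptions, not the cube-studio template specification.

```python
# Hypothetical sketch of a custom template entrypoint.
# Assumption: parameters arrive as command-line flags; the flag names
# below are made up for illustration, not part of any cube-studio spec.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="example custom template")
    parser.add_argument("--input_path", required=True, help="where to read data")
    parser.add_argument("--output_path", required=True, help="where to write results")
    parser.add_argument("--num_workers", type=int, default=1, help="degree of parallelism")
    return parser


def run(args: argparse.Namespace) -> str:
    # A real template would do its actual work here
    # (download, transform, train, ...). This sketch just
    # reports the parameters it was launched with.
    return (
        f"processing {args.input_path} -> {args.output_path} "
        f"with {args.num_workers} worker(s)"
    )


if __name__ == "__main__":
    print(run(build_parser().parse_args()))
```

Packaging this script as the entrypoint of a Docker image and registering that image as a template would then let the same container run both stand-alone (like the `linux` template) and as a step in a drag-and-drop pipeline.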
algorithm: @hujunaifuture @jaffe-fly @JLWLL @ma-chengcheng @chendile
platform: @xiaoyangmai @VincentWei2021 @SeibertronSS @cyxnzb @gilearn @wulingling0108