Improving README. (#1308)
* Update README.md

* Port landing paras to docs index.rst.
concretevitamin authored Oct 28, 2022
1 parent 6e332b5 commit edae25a
Showing 2 changed files with 82 additions and 54 deletions.
103 changes: 63 additions & 40 deletions README.md
@@ -5,67 +5,83 @@
</picture>
</p>

[![Join Slack](https://img.shields.io/badge/SkyPilot-Join%20Slack-blue?logo=slack)](https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q)
![pytest](https://github.com/skypilot-org/skypilot/actions/workflows/pytest.yml/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/skypilot/badge/?version=latest)](https://skypilot.readthedocs.io/en/latest/?badge=latest)
<p align="center">
<a href="https://skypilot.readthedocs.io/en/latest/">
<img alt="Documentation" src="https://readthedocs.org/projects/skypilot/badge/?version=latest">
</a>

<a href="https://github.com/skypilot-org/skypilot/releases">
<img alt="GitHub Release" src="https://img.shields.io/github/release/skypilot-org/skypilot.svg">
</a>

<a href="https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q">
<img alt="Join Slack" src="https://img.shields.io/badge/SkyPilot-Join%20Slack-blue?logo=slack">
</a>

</p>


SkyPilot is a framework for easily running machine learning workloads[^1] on any cloud.
<h3 align="center">
Run jobs on any cloud, easily and cost effectively
</h3>

Use the clouds **easily** and **cost effectively**, without needing cloud infra expertise.
SkyPilot is a framework for easily and cost effectively running ML workloads[^1] on any cloud.

_Ease of use_
* **Run existing projects on the cloud** with zero code changes
* Use a **unified interface** to run on any cloud, without vendor lock-in (currently AWS, Azure, GCP)
* **Queue jobs** on one or multiple clusters
* **Automatic failover** to find scarce resources (GPUs) across regions and clouds
* **Use datasets on the cloud** like you would on a local file system
SkyPilot abstracts away cloud infra burden:
- Launch jobs & clusters on any cloud (AWS, Azure, GCP)
- Find scarce resources across zones/regions/clouds
- Queue jobs & use cloud object stores

_Cost saving_
* Run jobs on **spot instances** with **automatic recovery** from preemptions
* Hands-free cluster management: **automatically stopping idle clusters**
* One-click use of **TPUs**, for high-performance, cost-effective training
* Automatically benchmark and find the cheapest hardware for your job
SkyPilot cuts your cloud costs:
* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html): **3x cost savings** using spot VMs, with auto-recovery from preemptions
* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): hands-free cleanup of idle clusters
* [Benchmark](https://skypilot.readthedocs.io/en/latest/reference/benchmark/index.html): find best VM types for your jobs
* Optimizer: **2x cost savings** by auto-picking best prices across zones/regions/clouds

SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.

Install with pip (choose your clouds) or [from source](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html):
```
pip install "skypilot[aws,gcp,azure]"
```

## Getting Started
You can find our documentation [here](https://skypilot.readthedocs.io/en/latest/).
- [Installation](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)
- [Quickstart](https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html)
- [CLI reference](https://skypilot.readthedocs.io/en/latest/reference/cli.html)

## Example SkyPilot Task
## SkyPilot in 1 minute

A SkyPilot task specifies: resource requirements, data to be synced, setup commands, and the task commands.

Once written in this [**unified interface**](https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html) (YAML or Python API), the task can be launched on any available cloud.
Once written in this [**unified interface**](https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html) (YAML or Python API), the task can be launched on any available cloud. This avoids vendor lock-in, and allows easily moving jobs to a different provider.

Example:
Paste the following into a file `my_task.yaml`:

```yaml
# my_task.yaml
resources:
  # 1x NVIDIA V100 GPU
  accelerators: V100:1
  accelerators: V100:1  # 1x NVIDIA V100 GPU

# Number of VMs to launch in the cluster
num_nodes: 1
num_nodes: 1 # Number of VMs to launch

# Working directory (optional) containing the project codebase.
# Its contents are synced to ~/sky_workdir/ on the cluster.
workdir: ~/torch_examples

# Commands to be run before executing the job
# Commands to be run before executing the job.
# Typical use: pip install -r requirements.txt, git clone, etc.
setup: |
  pip install torch torchvision
# Commands to run as a job
# Typical use: make use of resources, such as running training.
# Commands to run as a job.
# Typical use: launch the main program.
run: |
  cd mnist
  python main.py --epochs 1
```
Prepare the workdir by cloning locally:
Prepare the workdir by cloning:
```bash
git clone https://github.com/pytorch/examples.git ~/torch_examples
```
@@ -74,10 +90,11 @@ Launch with `sky launch`:
```bash
sky launch my_task.yaml
```
SkyPilot will perform multiple actions for you:

SkyPilot then performs the heavy-lifting for you, including:
1. Find the lowest priced VM instance type across different clouds
2. Provision the VM
3. Copy the local contents of `workdir` to the VM
2. Provision the VM, with auto-failover if the cloud returned capacity errors
3. Sync the local `workdir` to the VM
4. Run the task's `setup` commands to prepare the VM for running the task
5. Run the task's `run` commands
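
The first two steps above (pick the lowest-priced option, then fail over when a cloud reports a capacity error) can be pictured with a toy model. This is a hedged sketch only, not SkyPilot's actual optimizer or provisioner: the `Candidate` class, the `provision` stand-in, the VM names, and the hourly prices are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cloud: str
    instance_type: str   # hypothetical VM name, not a real instance type
    hourly_price: float  # made-up price, for illustration only


class CapacityError(Exception):
    """Raised by the stand-in provisioner when a cloud has no capacity."""


def provision(candidate, unavailable):
    # Stand-in for a real provisioning call: it fails when the cloud
    # is listed as being out of capacity.
    if candidate.cloud in unavailable:
        raise CapacityError(candidate.cloud)
    return f"{candidate.cloud}:{candidate.instance_type}"


def launch_cheapest(candidates, unavailable=frozenset()):
    # Try candidates from cheapest to most expensive, moving on to the
    # next one whenever provisioning hits a capacity error.
    for candidate in sorted(candidates, key=lambda c: c.hourly_price):
        try:
            return provision(candidate, unavailable)
        except CapacityError:
            continue
    raise RuntimeError("no candidate could be provisioned")


candidates = [
    Candidate("aws", "vm-a", 3.0),
    Candidate("gcp", "vm-b", 2.5),
    Candidate("azure", "vm-c", 3.5),
]
```

With these made-up prices, `launch_cheapest(candidates)` picks the `gcp` entry, and `launch_cheapest(candidates, unavailable={"gcp"})` falls back to the next cheapest, `aws`.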

@@ -86,20 +103,26 @@ SkyPilot will perform multiple actions for you:
</p>


See [**`examples`**](./examples) for more YAMLs that run popular ML frameworks on the cloud with one command (PyTorch/Distributed PyTorch, TensorFlow/Distributed TensorFlow, HuggingFace, JAX, Flax, Docker).
Refer to [Quickstart](https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html) to get started with SkyPilot.

## Learn more

Besides YAML, SkyPilot offers a corresponding [**Python API**](https://github.com/skypilot-org/skypilot/blob/master/sky/core.py) for programmatic use.
- [Documentation](https://skypilot.readthedocs.io/en/latest/)
- [Example: HuggingFace](https://skypilot.readthedocs.io/en/latest/getting-started/tutorial.html)
- [Tutorials](https://github.com/skypilot-org/skypilot-tutorial)
- [YAML reference](https://skypilot.readthedocs.io/en/latest/reference/yaml-spec.html)
- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), and [many more](./examples).

Refer to [Quickstart](https://skypilot.readthedocs.io/en/latest/getting-started/quickstart.html) for more on how to get started with SkyPilot.

## Issues, feature requests, and questions
We are excited to hear your feedback!
* For issues and feature requests, please [open a GitHub issue](https://github.com/skypilot-org/skypilot/issues/new).
* For questions, please use [GitHub Discussions](https://github.com/skypilot-org/skypilot/discussions).

## Issues, feature requests and questions
We are excited to hear your feedback! SkyPilot has two channels for engaging with the community - [GitHub Issues](https://github.com/skypilot-org/skypilot/issues) and [GitHub Discussions](https://github.com/skypilot-org/skypilot/discussions).
* For bug reports and issues, please [open an issue](https://github.com/skypilot-org/skypilot/issues/new).
* For feature requests or general questions, please join us on [GitHub Discussions](https://github.com/skypilot-org/skypilot/discussions).
For general discussions, join us on the [SkyPilot Slack](https://join.slack.com/t/skypilot-org/shared_invite/zt-1i4pa7lyc-g6Lo4_rqqCFWOSXdvwTs3Q).

## Contributing
We welcome and value all contributions to the project! Please refer to the [contribution guide](CONTRIBUTING.md) for more on how to get involved.
We welcome and value all contributions to the project! Please refer to [CONTRIBUTING](CONTRIBUTING.md) for how to get involved.

<!-- Footnote -->
[^1]: While SkyPilot is currently targeted at machine learning workloads, it supports and has been used for other general workloads. We're excited to hear about your use case and how we can better support your requirements - please join us in [this discussion](https://github.com/skypilot-org/skypilot/discussions/1016)!
[^1]: While SkyPilot is currently targeted at machine learning workloads, it supports and has been used for other general batch workloads. We're excited to hear about your use case and how we can better support your requirements; please join us in [this discussion](https://github.com/skypilot-org/skypilot/discussions/1016)!
33 changes: 19 additions & 14 deletions docs/source/index.rst
@@ -1,4 +1,4 @@
SkyPilot Documentation
Welcome to SkyPilot!
=========================

.. figure:: ./images/skypilot-wide-light-1k.png
@@ -17,24 +17,29 @@ SkyPilot Documentation
<a class="github-button" href="https://github.com/skypilot-org/skypilot/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork skypilot-org/skypilot on GitHub">Fork</a>
</p>

SkyPilot is a framework for easily running machine learning workloads on any cloud.
<p style="text-align:center">
<strong>Run jobs on any cloud, easily and cost effectively</strong>
</p>

SkyPilot is a framework for easily and cost effectively running ML workloads on any cloud.

SkyPilot abstracts away cloud infra burden:

Use the clouds **easily** and **cost effectively**, without needing cloud infra expertise.
- Launch jobs & clusters on any cloud (AWS, Azure, GCP)
- Find scarce resources across zones/regions/clouds
- Queue jobs & use cloud object stores

*Ease of use*
SkyPilot cuts your cloud costs:

- **Run existing projects on the cloud** with zero code changes
- Use a **unified interface** to run on any cloud, without vendor lock-in (currently AWS, Azure, GCP)
- **Queue jobs** on one or multiple clusters
- **Automatic failover** to find scarce resources (GPUs) across regions and clouds
- **Use datasets on the cloud** like you would on a local file system
* :ref:`Managed Spot <Managed Spot Jobs>`: **3x cost savings** using spot VMs, with auto-recovery from preemptions
* :ref:`Autostop <Auto-stopping>`: hands-free cleanup of idle clusters
* :ref:`Benchmark <Benchmark>`: find best VM types for your jobs
* Optimizer: **2x cost savings** by auto-picking best prices across zones/regions/clouds

*Cost saving*
SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.

- Run jobs on **spot instances** with **automatic recovery** from preemptions
- Hands-free cluster management: **automatically stopping idle clusters**
- One-click use of **TPUs**, for high-performance, cost-effective training
- Automatically benchmark and find the cheapest hardware for your job
Documentation
--------------------------

.. toctree::
:maxdepth: 1