[no_early_kickoff][Train] ray.train.huggingface restructure #33278

Merged
45 commits merged on May 8, 2023

Commits
4bcb4e3
WIP
Yard1 Mar 13, 2023
673cd80
WIP
Yard1 Mar 13, 2023
f4469e1
WIP
Yard1 Mar 13, 2023
e838585
Merge branch 'ray-project:master' into accelerate_trainer_2
Yard1 Mar 13, 2023
5595dd3
WIP
Yard1 Mar 13, 2023
6bf6603
Fix
Yard1 Mar 13, 2023
fb333c9
Restructure `ray.train.huggingface`
Yard1 Mar 13, 2023
1685593
Merge branch 'master' into huggingface_restructure
Yard1 Mar 25, 2023
812cc08
Fix
Yard1 Mar 25, 2023
a4dc374
Fix
Yard1 Mar 25, 2023
c6ebc40
Fix
Yard1 Mar 25, 2023
cfad9cb
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 Mar 25, 2023
5cfd904
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 Mar 25, 2023
dacf5c6
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 Mar 25, 2023
00d4242
Fix
Yard1 Mar 25, 2023
a6fcd01
Fix docs
Yard1 Mar 27, 2023
cd9df03
Merge branch 'master' into huggingface_restructure
Yard1 Mar 27, 2023
e2b84eb
Fix
Yard1 Mar 27, 2023
73d1bff
Apply feedback from code review
Yard1 Mar 28, 2023
6263184
Merge branch 'master' into huggingface_restructure
Yard1 Mar 28, 2023
e9eee1e
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 Mar 31, 2023
bf9d662
Merge branch 'master' into huggingface_restructure
Yard1 Apr 17, 2023
ca88b8c
Merge branch 'huggingface_restructure' of https://github.com/Yard1/ra…
Yard1 Apr 17, 2023
3d3e1ac
Allow local datasets in HuggingFaceTrainer
Yard1 Apr 17, 2023
1fd2c53
Merge branch 'hf_trainer_allow_non_ray_datasets' into huggingface_res…
Yard1 Apr 17, 2023
56289d6
Clarify
Yard1 Apr 17, 2023
daa8e66
Merge branch 'hf_trainer_allow_non_ray_datasets' into huggingface_res…
Yard1 Apr 17, 2023
e785d2e
Update
Yard1 Apr 17, 2023
c9c5203
Merge branch 'master' into huggingface_restructure
Yard1 Apr 28, 2023
4696da2
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 1, 2023
34cf027
Tweak docs
Yard1 May 1, 2023
ac8e23b
Merge branch 'master' into huggingface_restructure
Yard1 May 2, 2023
39507c9
Change paths
Yard1 May 2, 2023
cfe66a9
Add alias
Yard1 May 2, 2023
cf9b54b
Remove transformers alias
Yard1 May 2, 2023
969bde1
Rename to hf_transformers
Yard1 May 2, 2023
2680826
Fix
Yard1 May 2, 2023
076bec6
Fix
Yard1 May 3, 2023
9ac55dd
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 3, 2023
3392072
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 3, 2023
17f18bf
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 3, 2023
264e3f6
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 4, 2023
e37ffd9
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 4, 2023
8f7e745
Merge branch 'master' into huggingface_restructure
Yard1 May 4, 2023
9e75a43
Merge branch 'ray-project:master' into huggingface_restructure
Yard1 May 8, 2023
2 changes: 1 addition & 1 deletion doc/source/ray-air/api/predictor.rst
@@ -92,6 +92,6 @@ Built-in Predictors for Library Integrations
~lightgbm.LightGBMPredictor
~tensorflow.TensorflowPredictor
~torch.TorchPredictor
~huggingface.HuggingFacePredictor
~hf_transformers.TransformersPredictor
~sklearn.SklearnPredictor
~rl.RLPredictor
2 changes: 1 addition & 1 deletion doc/source/ray-air/doc_code/accelerate_trainer.py
@@ -5,7 +5,7 @@

import ray
from ray.air import session, Checkpoint
from ray.train.huggingface.accelerate import AccelerateTrainer
from ray.train.hf_accelerate import AccelerateTrainer
from ray.air.config import ScalingConfig


4 changes: 2 additions & 2 deletions doc/source/ray-air/doc_code/hf_trainer.py
@@ -9,7 +9,7 @@
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

import ray
from ray.train.huggingface import HuggingFaceTrainer
from ray.train.hf_transformers import TransformersTrainer
from ray.air.config import ScalingConfig


@@ -81,7 +81,7 @@ def trainer_init_per_worker(train_dataset, eval_dataset, **config):


scaling_config = ScalingConfig(num_workers=3, use_gpu=use_gpu)
trainer = HuggingFaceTrainer(
trainer = TransformersTrainer(
trainer_init_per_worker=trainer_init_per_worker,
scaling_config=scaling_config,
datasets={"train": ray_train_ds, "evaluation": ray_evaluation_ds},
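The hunk above is truncated by the diff view, so here is a compact, hedged sketch of what a complete script against the renamed module might look like. The model choice, placeholder data, and hyperparameters below are illustrative assumptions for the sketch, not values taken from this PR; only the import path, class name, and constructor arguments mirror the diff.

```python
# Hedged sketch: TransformersTrainer under the new ray.train.hf_transformers path.
# Model choice, placeholder data, and hyperparameters are assumptions.
import ray
from ray.air.config import ScalingConfig
from ray.train.hf_transformers import TransformersTrainer


def trainer_init_per_worker(train_dataset, eval_dataset, **config):
    # Build and return an ordinary 🤗 Transformers Trainer; Ray wraps it in DDP.
    from transformers import (
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
    args = TrainingArguments(
        output_dir="/tmp/hf_sketch",
        per_device_train_batch_size=config.get("batch_size", 16),
        max_steps=5,   # required here because Ray feeds an iterable dataset
        no_cuda=True,  # assumption: CPU-only for the sketch
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )


# Placeholder, pre-tokenized rows; a real run would tokenize raw text first.
row = {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1], "labels": 1}
train_ds = ray.data.from_items([row] * 32)
eval_ds = ray.data.from_items([row] * 8)

trainer = TransformersTrainer(
    trainer_init_per_worker=trainer_init_per_worker,
    trainer_init_config={"batch_size": 16},
    scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    datasets={"train": train_ds, "evaluation": eval_ds},
)
result = trainer.fit()
```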
26 changes: 14 additions & 12 deletions doc/source/ray-air/examples/gptj_deepspeed_fine_tuning.ipynb
@@ -402,16 +402,17 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fine-tuning the model with Ray AIR <a name=\"train\"></a>\n",
"\n",
"We can now configure Ray AIR's {class}`~ray.train.huggingface.huggingface_trainer.HuggingFaceTrainer` to perform distributed fine-tuning of the model. In order to do that, we specify a `trainer_init_per_worker` function, which creates a 🤗 Transformers `Trainer` that will be distributed by Ray using Distributed Data Parallelism (using PyTorch Distributed backend internally). This means that each worker will have its own copy of the model, but operate on different data, At the end of each step, all the workers will sync gradients.\n",
"We can now configure Ray AIR's {class}`~ray.train.hf_transformers.TransformersTrainer` to perform distributed fine-tuning of the model. In order to do that, we specify a `trainer_init_per_worker` function, which creates a 🤗 Transformers `Trainer` that will be distributed by Ray using Distributed Data Parallelism (using PyTorch Distributed backend internally). This means that each worker will have its own copy of the model, but operate on different data, At the end of each step, all the workers will sync gradients.\n",
"\n",
"Because GPT-J is a relatively large model, it may not be possible to fit it on smaller GPU types (<=16 GB GRAM). To deal with that issue, we can use [DeepSpeed](https://github.com/microsoft/DeepSpeed), a library to optimize the training process and allow us to (among other things) offload and partition optimizer and parameter states, reducing GRAM usage. Furthermore, DeepSpeed ZeRO Stage 3 allows us to load large models without running out of memory.\n",
"\n",
"🤗 Transformers and Ray AIR's integration ({class}`~ray.train.huggingface.huggingface_trainer.HuggingFaceTrainer`) allow you to easily configure and use DDP and DeepSpeed. All you need to do is specify the DeepSpeed configuration in the [`TrainingArguments`](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments) object.\n",
"🤗 Transformers and Ray AIR's integration ({class}`~ray.train.hf_transformers.TransformersTrainer`) allow you to easily configure and use DDP and DeepSpeed. All you need to do is specify the DeepSpeed configuration in the [`TrainingArguments`](https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments) object.\n",
"\n",
"```{tip}\n",
"There are many DeepSpeed settings that allow you to trade-off speed for memory usage. The settings used below are tailored to the cluster setup used (16 g4dn.4xlarge nodes) and per device batch size of 16. Some things to keep in mind:\n",
@@ -564,7 +565,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"With our `trainer_init_per_worker` complete, we can now instantiate the {class}`~ray.train.huggingface.huggingface_trainer.HuggingFaceTrainer`. Aside from the function, we set the `scaling_config`, controlling the amount of workers and resources used, and the `datasets` we will use for training and evaluation.\n",
"With our `trainer_init_per_worker` complete, we can now instantiate the {class}`~ray.train.hf_transformers.TransformersTrainer`. Aside from the function, we set the `scaling_config`, controlling the amount of workers and resources used, and the `datasets` we will use for training and evaluation.\n",
"\n",
"We pass the preprocessors we have defined earlier as an argument, wrapped in a {class}`~ray.data.preprocessors.chain.Chain`. The preprocessor will be included with the returned {class}`~ray.air.checkpoint.Checkpoint`, meaning it will also be applied during inference.\n",
"\n",
@@ -579,12 +580,12 @@
"metadata": {},
"outputs": [],
"source": [
"from ray.train.huggingface import HuggingFaceTrainer\n",
"from ray.train.hf_transformers import TransformersTrainer\n",
"from ray.air.config import ScalingConfig\n",
"from ray.data.preprocessors import Chain\n",
"\n",
"\n",
"trainer = HuggingFaceTrainer(\n",
"trainer = TransformersTrainer(\n",
" trainer_init_per_worker=trainer_init_per_worker,\n",
" trainer_init_config={\n",
" \"batch_size\": 16, # per device\n",
@@ -601,10 +602,11 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we call the {meth}`~ray.train.huggingface.huggingface_trainer.HuggingFaceTrainer.fit` method to start training with Ray AIR. We will save the {class}`~ray.air.Result` object to a variable so we can access metrics and checkpoints."
"Finally, we call the {meth}`~ray.train.hf_transformers.TransformersTrainer.fit` method to start training with Ray AIR. We will save the {class}`~ray.air.Result` object to a variable so we can access metrics and checkpoints."
]
},
{
@@ -642,7 +644,7 @@
"<tr><th>Trial name </th><th>status </th><th>loc </th><th style=\"text-align: right;\"> iter</th><th style=\"text-align: right;\"> total time (s)</th><th style=\"text-align: right;\"> loss</th><th style=\"text-align: right;\"> learning_rate</th><th style=\"text-align: right;\"> epoch</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>HuggingFaceTrainer_f623d_00000</td><td>TERMINATED</td><td>10.0.30.196:30861</td><td style=\"text-align: right;\"> 85</td><td style=\"text-align: right;\"> 2579.3</td><td style=\"text-align: right;\">0.0715</td><td style=\"text-align: right;\"> 4.70588e-07</td><td style=\"text-align: right;\"> 1</td></tr>\n",
"<tr><td>TransformersTrainer_f623d_00000</td><td>TERMINATED</td><td>10.0.30.196:30861</td><td style=\"text-align: right;\"> 85</td><td style=\"text-align: right;\"> 2579.3</td><td style=\"text-align: right;\">0.0715</td><td style=\"text-align: right;\"> 4.70588e-07</td><td style=\"text-align: right;\"> 1</td></tr>\n",
"</tbody>\n",
"</table>\n",
" </div>\n",
@@ -979,7 +981,7 @@
{
"data": {
"text/plain": [
"HuggingFaceCheckpoint(local_path=/home/ray/ray_results/HuggingFaceTrainer_2023-03-06_16-35-29/HuggingFaceTrainer_f623d_00000_0_2023-03-06_16-35-30/checkpoint_000000)"
"TransformersCheckpoint(local_path=/home/ray/ray_results/TransformersTrainer_2023-03-06_16-35-29/TransformersTrainer_f623d_00000_0_2023-03-06_16-35-30/checkpoint_000000)"
]
},
"execution_count": 18,
@@ -998,13 +1000,13 @@
"source": [
"### Generate text from prompt\n",
"\n",
"We can use the {class}`~ray.train.huggingface.huggingface_predictor.HuggingFacePredictor` to generate predictions from our fine-tuned model.\n",
"We can use the {class}`~ray.train.hf_transformers.huggingface_predictor.TransformersPredictor` to generate predictions from our fine-tuned model.\n",
"\n",
"```{tip}\n",
"For large scale batch inference, consider configuring cloud checkpointing and then pass the cloud-backed {class}`~ray.air.checkpoint.Checkpoint` to {class}`~ray.train.batch_predictor.BatchPredictor`. More information [here](air-predictors).\n",
"```\n",
"\n",
"Because the {class}`~ray.train.huggingface.huggingface_predictor.HuggingFacePredictor` uses a 🤗 Transformers [`pipeline`](https://huggingface.co/docs/transformers/en/main_classes/pipelines) under the hood, we disable the tokenizer AIR Preprocessor we have used for training and let the `pipeline` to tokenize the data itself."
"Because the {class}`~ray.train.hf_transformers.huggingface_predictor.TransformersPredictor` uses a 🤗 Transformers [`pipeline`](https://huggingface.co/docs/transformers/en/main_classes/pipelines) under the hood, we disable the tokenizer AIR Preprocessor we have used for training and let the `pipeline` to tokenize the data itself."
]
},
{
@@ -1030,13 +1032,13 @@
"metadata": {},
"outputs": [],
"source": [
"from ray.train.huggingface import HuggingFacePredictor\n",
"from ray.train.hf_transformers import TransformersPredictor\n",
"import pandas as pd\n",
"\n",
"prompts = pd.DataFrame([\"Romeo and Juliet\", \"Romeo\", \"Juliet\"], columns=[\"text\"])\n",
"\n",
"# Predict on the head node.\n",
"predictor = HuggingFacePredictor.from_checkpoint(\n",
"predictor = TransformersPredictor.from_checkpoint(\n",
" checkpoint=checkpoint,\n",
" task=\"text-generation\",\n",
" torch_dtype=torch.float16 if use_gpu else None,\n",
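The tip earlier in this cell recommends `BatchPredictor` with a cloud-backed checkpoint for large-scale inference. Below is a hedged sketch of that path under the renamed module; the checkpoint URI, prompts, and worker resources are placeholder assumptions.

```python
# Hedged sketch: batch inference with the renamed TransformersPredictor.
import pandas as pd
import ray
from ray.air.checkpoint import Checkpoint
from ray.train.batch_predictor import BatchPredictor
from ray.train.hf_transformers import TransformersPredictor

# Placeholder URI; in practice, point this at the checkpoint written by training.
checkpoint = Checkpoint.from_uri(
    "s3://my-bucket/TransformersTrainer_example/checkpoint_000000"
)

prompts_ds = ray.data.from_pandas(
    pd.DataFrame(["Romeo and Juliet", "Romeo", "Juliet"], columns=["text"])
)

batch_predictor = BatchPredictor.from_checkpoint(
    checkpoint,
    TransformersPredictor,
    task="text-generation",
)
predictions = batch_predictor.predict(
    prompts_ds,
    num_gpus_per_worker=1,  # assumption: one GPU per prediction worker
)
predictions.show()
```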