diff --git a/README.md b/README.md
index 66bb69c..b20f750 100644
--- a/README.md
+++ b/README.md
@@ -20,13 +20,13 @@ cd internalization
    conda activate internalization
    ``` 
 
-- **Step 2.** Install the dependencies and download the datasets:
+- **Step 2.** Install the dependencies and download the data:
 
    ```bash
    pip install -r requirements.txt
-   # download the datasets from Google Drive
-   gdown --folder 'https://drive.google.com/drive/folders/1KQDClI3cbFzPhzfknF2xmtqE-aIW1EDf?usp=sharing'
+   mkdir -p datasets/cvdb  # make a folder for the dataset
    ```
+   Download the CVDB dataset from https://data.sciencespo.fr/dataset.xhtml?persistentId=doi:10.21410/7E4/RDAG3O# and extract `cross-verified-database.csv` into `datasets/cvdb/`.
 
 - **Step 3 (Optional).**
    Configure `wandb`: