Code, datasets, and extended write-up for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes".
We encourage the use of conda environments:
conda create --name evaporate python=3.8
conda activate evaporate
Clone the repositories and install the dependencies as follows:
# Evaporate code
git clone git@github.com:HazyResearch/evaporate.git
cd evaporate
pip install -r requirements.txt
# Weak supervision code
cd metal-evap
git submodule init
git submodule update
pip install -e .
cd ..
# Manifest
git clone git@github.com:HazyResearch/manifest.git
cd manifest
pip install -e .
The data used in the paper is hosted on HuggingFace's datasets platform: https://huggingface.co/datasets/hazyresearch/evaporate.
To download the datasets, run the following commands in your terminal:
git lfs install
git clone https://huggingface.co/datasets/hazyresearch/evaporate
Or download it via Python:
from datasets import load_dataset
dataset = load_dataset("hazyresearch/evaporate")
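If you used the git clone option, a quick sanity check is to count the files under each top-level entry of the downloaded folder. The snippet below is a minimal sketch, not part of the Evaporate codebase: `summarize_clone` is a hypothetical helper, and the path `evaporate` assumes the dataset was cloned into a folder of that name in the current directory (adjust it if you cloned elsewhere, for example inside the code repository).

```python
from collections import Counter
from pathlib import Path


def summarize_clone(root):
    """Count files under each top-level entry of a cloned dataset folder.

    Hypothetical helper for illustration; it makes no assumption about
    how the dataset itself is organized.
    """
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip git metadata so only dataset content is counted.
        if rel.parts[0] == ".git":
            continue
        # A file sitting directly in the root is counted under its own name.
        counts[rel.parts[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Assumption: the clone landed in ./evaporate
    for name, n in sorted(summarize_clone("evaporate").items()):
        print(f"{name}: {n} files")
```

If a folder you expect is missing or very small, the Git LFS files may not have been fetched; rerunning `git lfs pull` inside the clone is the usual remedy.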
The extended write-up is included in this GitHub repository.