Evaporate

Code, datasets, and extended write-up for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes".

Setup

We encourage the use of conda environments:

conda create --name evaporate python=3.8
conda activate evaporate

Clone as follows:

# Evaporate code
git clone git@github.com:HazyResearch/evaporate.git
cd evaporate
pip install -r requirements.txt

# Weak supervision code
cd metal-evap
git submodule init
git submodule update
pip install -e .
cd ..

# Manifest 
git clone git@github.com:HazyResearch/manifest.git
cd manifest
pip install -e .
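
As a quick sanity check after the steps above, the following minimal sketch reports whether the installed packages are importable from the active environment. The import names `manifest` and `datasets` are assumptions based on the package names; adjust the list to match your setup.

```python
import importlib.util


def check_installed(module_names):
    """Return {name: bool} saying whether each top-level module can be located."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # Assumed import names (not confirmed by this README): "manifest" for the
    # Manifest install, "datasets" for the Hugging Face datasets library.
    for name, found in check_installed(["manifest", "datasets"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry usually means the corresponding `pip install` ran in a different environment than the one currently activated.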

Datasets

The data used in the paper is hosted on the Hugging Face Hub: https://huggingface.co/datasets/hazyresearch/evaporate.

To download the datasets, run the following commands in your terminal:

git lfs install
git clone https://huggingface.co/datasets/hazyresearch/evaporate

Or download it via Python:

from datasets import load_dataset
dataset = load_dataset("hazyresearch/evaporate")
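
If you cloned the dataset repository with git, a small script can show what the download contains before you point any code at it. The sketch below counts files by extension under a local directory. The helper name `summarize_files` and the path `evaporate` (the directory name `git clone` creates by default) are illustrative assumptions, and nothing here depends on the dataset's actual layout.

```python
from collections import Counter
from pathlib import Path


def summarize_files(root):
    """Count files under `root` by extension, skipping the .git directory."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        # Ignore git's internal bookkeeping files.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the cloned dataset; change it to match your checkout.
    dataset_dir = Path("evaporate")
    if dataset_dir.is_dir():
        for ext, n in summarize_files(dataset_dir).most_common():
            print(f"{ext}: {n}")
    else:
        print(f"{dataset_dir} not found; clone the dataset first")
```

Note that the code repository and the dataset repository both clone into a directory named `evaporate` by default, so clone them into separate parent directories or pass an explicit target directory to `git clone`.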

Extended write-up

The extended write-up is included in this GitHub repository.
