Create a dataset out of the PDFs content #3

liadmagen · 2018-10-12T10:30:13Z

As part of the corpus creation process, the content of the PDFs should be converted to text and aggregated into a single large dataset.

This dataset should be stored in the data/papers/processed folder, and the script that creates it should be saved as src/papers/data/make_dataset.py.
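
To make the request more concrete, here is a minimal sketch of what make_dataset.py could look like. It is not a finished implementation, and several details are assumptions not stated in this issue: the input folder (data/papers/raw in the usage note), the output file name (dataset.jsonl), the JSON Lines format with one record per PDF, and the use of pdfminer.six for text extraction. The function names `build_dataset` and `pdfminer_extract` are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def pdfminer_extract(pdf_path: Path) -> str:
    """Extract text from one PDF using pdfminer.six.

    Assumption: pdfminer.six is installed. It is imported lazily so the
    rest of the module still works with a different extractor.
    """
    from pdfminer.high_level import extract_text

    return extract_text(str(pdf_path))


def build_dataset(
    raw_dir,
    out_file,
    extract: Callable[[Path], str] = pdfminer_extract,
) -> int:
    """Convert every *.pdf in raw_dir to text and write one JSON record
    per document to out_file (JSON Lines). Returns the number written."""
    raw_dir = Path(raw_dir)
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)

    written = 0
    with out_file.open("w", encoding="utf-8") as out:
        # Sort so the dataset order is reproducible between runs.
        for pdf_path in sorted(raw_dir.glob("*.pdf")):
            try:
                text = extract(pdf_path)
            except Exception as exc:
                # A single unreadable PDF should not abort the whole run.
                print(f"skipping {pdf_path.name}: {exc}")
                continue
            record = {"file": pdf_path.name, "text": text.strip()}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Under these assumptions, calling `build_dataset("data/papers/raw", "data/papers/processed/dataset.jsonl")` would produce the aggregated file in the processed folder. Passing the extractor as a parameter keeps the choice of PDF library open and makes the aggregation step easy to test without real PDFs.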

liadmagen added the enhancement, help wanted, good first issue, and hacktoberfest labels on Oct 12, 2018
