Create a dataset out of the PDF content #3
Labels: enhancement, good first issue, hacktoberfest 🍁, help wanted
As part of the corpus creation process, the PDF content should be converted to text and aggregated into a single large dataset.
This dataset should be stored in the data/papers/processed folder, and the script that creates it should be saved as src/papers/data/make_dataset.py.
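As a starting point for anyone picking this up, here is a minimal sketch of what make_dataset.py could look like. It is not a settled design. Several details are assumptions that the issue does not specify: the input folder (data/papers/raw), the output file name (dataset.jsonl), the JSON Lines format with one record per PDF, and the use of the pypdf library for text extraction. The extractor is passed in as a parameter so that another PDF library can be swapped in.

```python
import json
from pathlib import Path


def extract_text_with_pypdf(pdf_path):
    """Extract the text of one PDF using pypdf (assumed choice of library)."""
    # Imported lazily so the rest of the module works without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for image-only pages, hence the `or ""`.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def make_dataset(raw_dir, out_file, extract=extract_text_with_pypdf):
    """Convert every *.pdf in raw_dir to text and write one JSON record per line.

    Returns the number of records written. The record layout
    ({"file": ..., "text": ...}) is a hypothetical choice for illustration.
    """
    raw_dir = Path(raw_dir)
    out_file = Path(out_file)
    out_file.parent.mkdir(parents=True, exist_ok=True)

    count = 0
    with out_file.open("w", encoding="utf-8") as fh:
        # Sort for a reproducible record order across runs.
        for pdf_path in sorted(raw_dir.glob("*.pdf")):
            try:
                text = extract(pdf_path)
            except Exception as exc:
                # A single unreadable PDF should not abort the whole run.
                print(f"skipping {pdf_path.name}: {exc}")
                continue
            record = {"file": pdf_path.name, "text": text}
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Under these assumptions, the script would be run from the repository root with a call such as make_dataset("data/papers/raw", "data/papers/processed/dataset.jsonl"). JSON Lines is used here because records can be appended and read back one at a time, but a single CSV or Parquet file would serve equally well if the project prefers it.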