Gutenberg-parser

A set of R scripts that parse Project Gutenberg plain-text UTF-8 files and break them down into a hash map of sentences.

Steps to replicate:

1: Download any UTF-8 .txt book file from the Project Gutenberg website

2: Place the .txt files in the same directory as the R scripts

3: Edit the hard-coded filenames in both Break.R and Clean.R to match your files (TODO: automate this process)

4: Run Clean.R first, then Break.R
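
For orientation, here is a minimal sketch of what a clean-then-break pipeline like this might look like. It is not the code in Clean.R or Break.R. The function names `clean_gutenberg` and `break_sentences` are hypothetical, and the sketch assumes the book text sits between the usual `*** START OF` and `*** END OF` marker lines, that sentences end in `.`, `!` or `?`, and that the hash map is an R environment keyed by sentence number.

```r
# Hypothetical helper (not taken from Clean.R): keep only the text between
# the "*** START OF" and "*** END OF" marker lines, then collapse whitespace.
clean_gutenberg <- function(lines) {
  start <- grep("^\\*\\*\\* ?START OF", lines)
  end <- grep("^\\*\\*\\* ?END OF", lines)
  if (length(start) > 0 && length(end) > 0 && start[1] < end[1]) {
    # seq() with length.out handles the case of adjacent markers safely
    keep <- seq(start[1] + 1, length.out = end[1] - start[1] - 1)
    lines <- lines[keep]
  }
  text <- paste(lines, collapse = " ")
  trimws(gsub("[[:space:]]+", " ", text))
}

# Hypothetical helper (not taken from Break.R): split the cleaned text into
# sentences and store them in an environment, which R implements as a hash map.
# The split rule is naive: abbreviations such as "Mr." will also cause a split.
break_sentences <- function(text) {
  sentences <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  map <- new.env(hash = TRUE)
  for (i in seq_along(sentences)) {
    assign(as.character(i), sentences[i], envir = map)
  }
  map
}

# Small made-up input standing in for a downloaded book file
raw <- c(
  "Header text",
  "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***",
  "It was a dark night.  The rain",
  "fell. Who was there?",
  "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***",
  "License text"
)
sentences <- break_sentences(clean_gutenberg(raw))
```

With a real book, `raw` would instead come from something like `readLines("57.txt", encoding = "UTF-8")`. Individual sentences can then be looked up by key, for example `get("2", envir = sentences)`.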
