legistext-NY

R scripts for basic text analysis of New York State legislation: bills introduced in the State Senate or Assembly.
TidyBill.R reads a bill in .pdf format and converts it to a 'tidy' .csv format in which each text paragraph is accompanied by an outline tag that can be used to reference the paragraph, plus an optional list of keywords. Commonalities.R uses these keywords to identify paragraphs in a second bill that may be related to each paragraph in a first bill. The accompanying .pdf and .csv files illustrate the capabilities of the TidyBill and Commonalities scripts.
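The keyword-matching idea can be illustrated with a minimal base-R sketch. This is not the code in Commonalities.R: the column names (`tag`, `keywords`), the `;` keyword separator, and the helper names `split_keywords` and `shared_keywords` are assumptions made up for the example, and the actual .csv layout may differ.

```r
# Hypothetical tidy-bill tables. The column names (tag, keywords) and the
# ";" keyword separator are assumptions, not the scripts' actual schema.
bill_a <- data.frame(
  tag = c("S1", "S2"),
  keywords = c("income;tax;rate", "estate;gift"),
  stringsAsFactors = FALSE
)
bill_b <- data.frame(
  tag = c("S1.a", "S3"),
  keywords = c("tax;rate;bracket", "climate;fund"),
  stringsAsFactors = FALSE
)

# Split one keyword string into a lower-cased set of unique keywords.
split_keywords <- function(x) {
  parts <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  unique(tolower(parts[nzchar(parts)]))
}

# Count shared keywords for every pair of paragraphs (one from each bill)
# and keep the pairs that share at least min_shared keywords.
shared_keywords <- function(a, b, min_shared = 1) {
  pairs <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))
  shared <- mapply(function(i, j) {
    length(intersect(split_keywords(a$keywords[i]),
                     split_keywords(b$keywords[j])))
  }, pairs$i, pairs$j)
  out <- data.frame(
    tag_a = a$tag[pairs$i],
    tag_b = b$tag[pairs$j],
    n_shared = shared,
    stringsAsFactors = FALSE
  )
  out <- out[out$n_shared >= min_shared, , drop = FALSE]
  rownames(out) <- NULL
  out
}

matches <- shared_keywords(bill_a, bill_b)
print(matches)
```

In this toy data only the pair S1 / S1.a shares keywords ("tax" and "rate"), so it is the only row reported. Raising `min_shared` would make the matching stricter.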

Limitations

Bills in .pdf format do not conform to any formal specification, so the scripts use various heuristics to identify paragraph outline tags. They don't always get this right. The objective is only to provide a .csv file that can be made presentable with light editing.
The scripts do no special handling of edits that appear in the .pdf file. Text is included if it is visible, even if it is shown with strikethrough markings in the .pdf file. Such text is usually framed by square brackets [] in the .csv file, so it is easy to find and delete.
The method for specifying the input files is crude. A Tkinter front-end would be nice.
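
The bracketed strikethrough text mentioned above could be removed with a short base-R helper along these lines. This is only a sketch: `strip_bracketed` is a hypothetical name, not a function in this repository, and it assumes the brackets are not nested. Because struck text is only usually bracketed, and some brackets may be legitimate, the result should still be reviewed by hand.

```r
# Remove every "[...]" span (non-nested) and tidy the leftover spacing.
# strip_bracketed is a hypothetical helper, not part of the repository.
strip_bracketed <- function(x) {
  x <- gsub("\\[[^\\]]*\\]", "", x, perl = TRUE)  # drop bracketed spans
  trimws(gsub("[ \t]+", " ", x))                   # collapse repeated blanks
}

example <- "A tax of [five] six percent is imposed."
cleaned <- strip_bracketed(example)
print(cleaned)
```

Applied to a text column of a data frame read with `read.csv()`, the same function works on the whole column at once, since `gsub()` is vectorized.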

