Ever wondered what it would look like if Australian legislation were available in git / GitHub?
This project picks up where the original by xlfe left off.
Text is extracted, but the output still contains some odd formatting and leftover style information, and much of the document structure is still missing (no table conversion is attempted).
- Get a list of all current acts and their ComLawID (acts_current.txt)
- Update spider.py to correctly crawl the legislation for updates.
- Get a list of all the RTF/DOC/DOCX versions and volumes of those acts (details_current.json)
- Download all the relevant RTF/DOC/DOCX files (Amazon S3)
- Extract the structure of documents and convert to Markdown (in progress)
  - Read the DOCX format and extract indents and font sizes
  - Convert these to Markdown indents and heading sizes
  - Extract table structures
- Write the Markdown to git, dating each commit by the date the legislation came into force
- Access the historical series of each act to build its history
- Create presentation site
- Automate/schedule the gathering, conversion, and upload of new acts.
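
The "historical git commit" item above could be approached by back-dating commits through git's `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables. The following is a minimal sketch only, not the project's actual code: the function names `commit_env` and `commit_act` are hypothetical, and using midnight UTC on the commencement date is an assumption.

```python
import os
import subprocess
from datetime import date, datetime, time, timezone


def commit_env(in_force: date) -> dict:
    """Return a copy of the environment that back-dates a git commit."""
    # Assumption: midnight UTC on the day the legislation came into force.
    stamp = datetime.combine(in_force, time.min, tzinfo=timezone.utc).isoformat()
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp      # date shown as the author date
    env["GIT_COMMITTER_DATE"] = stamp   # date shown as the commit date
    return env


def commit_act(repo_dir: str, path: str, message: str, in_force: date) -> None:
    """Stage one Markdown file and commit it dated to `in_force`."""
    subprocess.run(["git", "add", path], cwd=repo_dir, check=True)
    subprocess.run(
        ["git", "commit", "-m", message],
        cwd=repo_dir,
        check=True,
        env=commit_env(in_force),
    )
```

Committing the versions of an act in chronological order would then give a `git log` for that file that reads like the act's amendment history.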
- spider.py: Crawl legislation by year and get the ComLawID
- download.py: Get the legislation details from the ComLawID
- convert.py: The actual conversion to Markdown (messy!)
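
To illustrate the conversion idea described earlier (font sizes become heading levels, indents become nested list items), here is a rough sketch. It is not what convert.py does: the `Paragraph` record, the `heading_levels` and `to_markdown` functions, and the 11 pt body-size default are all assumptions made for the example, and it presumes font size and indent depth have already been read from the DOCX file.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    """Hypothetical record for one paragraph extracted from a document."""
    text: str
    font_size: float  # in points
    indent: int = 0   # nesting depth, 0 = flush left


def heading_levels(paragraphs, body_size):
    """Map each distinct font size above body_size to a level (largest = 1)."""
    sizes = sorted(
        {p.font_size for p in paragraphs if p.font_size > body_size},
        reverse=True,
    )
    # Markdown only has six heading levels, so clamp anything deeper.
    return {size: min(level, 6) for level, size in enumerate(sizes, start=1)}


def to_markdown(paragraphs, body_size=11.0):
    """Render paragraphs as Markdown headings, list items, or plain text."""
    levels = heading_levels(paragraphs, body_size)
    lines = []
    for p in paragraphs:
        text = p.text.strip()
        if not text:
            continue  # skip empty paragraphs
        if p.font_size in levels:
            lines.append("#" * levels[p.font_size] + " " + text)
        elif p.indent > 0:
            # Indented body text becomes a list item, two spaces per extra level.
            lines.append("  " * (p.indent - 1) + "- " + text)
        else:
            lines.append(text)
    return "\n\n".join(lines)
```

Ranking the sizes found in each document, rather than hard-coding point values, is one way to cope with acts that use different templates; whether that holds across the whole corpus is untested here.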