This repo aims to make Taiwan's laws easy for computer geeks to process. The output includes:
- Law articles in JSON
- Law change history in JSON
- A script to create a Git repo of the laws
- Progress of law motions in JSON
For a single law:
% npm run prepublish && ./node_modules/.bin/lsc law2json.ls --outdir output/law/json data/law/憲法/中華民國憲法
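..or for all laws (-mindepth/-maxdepth work with both GNU and BSD find, whereas -depth 2 is BSD-only):
% npm run prepublish && find data/law -mindepth 2 -maxdepth 2 -type d -exec ./node_modules/.bin/lsc law2json.ls --outdir output/law/json {} +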
% (cd output/law && git init)
% for dir in $(find output/law/json -mindepth 2 -maxdepth 2 -type d); do
./json2git.py "$dir/law_history.json" output/law
done
% git remote add github git@github.com:victorhsieh/tw-law-corpus.git
% git push -f github master
% git push github 'refs/notes/*'
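As a rough illustration of what the loop above iterates over, the following Python sketch lists every law_history.json that sits exactly two levels below output/law/json. The category/law directory layout is inferred from the commands above, and find_history_files is a hypothetical helper, not part of this repo. Unlike the shell loop, it also skips directories that have no law_history.json.

```python
from pathlib import Path


def find_history_files(json_root):
    """Return law_history.json paths found exactly two levels below json_root.

    Assumes the layout <json_root>/<category>/<law>/law_history.json,
    which is inferred from the find command, not verified against the repo.
    """
    root = Path(json_root)
    # "*/*" only matches entries at depth 2, like the find invocation.
    return sorted(
        d / "law_history.json"
        for d in root.glob("*/*")
        if d.is_dir() and (d / "law_history.json").is_file()
    )


if __name__ == "__main__":
    for path in find_history_files("output/law/json"):
        # Each path is what the loop hands to ./json2git.py as its first argument.
        print(path)
```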
The source pages are committed so that we can check for updates. If you need to fetch the source yourself, follow these steps:
- Open http://lis.ly.gov.tw/lgcgi/lglaw -> 分類瀏覽 (browse by category) -> any entry (任意一筆)
- Set the $PORTAL environment variable to the resulting URL
% ./prepare_categories.sh # you will probably need to update the link manually
% ./fetcher.sh data/file-link.txt
Fetch a single category:
% npm run prepublish && ./node_modules/.bin/lsc prepare_law.ls --cat 憲法 --dir data/law
% ./fetcher.sh data/law/憲法/file-link.tsv
..or fetch every category:
% npm run prepublish
% for cat in data/law/*; do
./node_modules/.bin/lsc prepare_law.ls --cat `basename $cat` --dir data/law
./fetcher.sh $cat/file-link.tsv
done
Generate file-link-all-revision.tsv for a single category:
% npm run prepublish && ./node_modules/.bin/lsc prepare_law_all_revision.ls --cat 憲法 --dir data/law
% ./fetcher.sh -r data/law/憲法/file-link-all-revision.tsv
..or for all categories:
% npm run prepublish
% for cat in data/law/*; do
./node_modules/.bin/lsc prepare_law_all_revision.ls --cat `basename $cat` --dir data/law
./fetcher.sh -r $cat/file-link-all-revision.tsv
done
Fetching the progress of law motions is currently done manually.
- Open http://lis.ly.gov.tw/lgcgi/ttsweb?@0:0:1:lgmempropg08@@0
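- Select the range from the first session (會期) to the last one, then search
- Click 詳目顯示 (detailed view) and sort by proposal date in ascending order (依提案日期「遞增」)
- Open the JavaScript console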
- localStorage['page'] = 1
- document.querySelector('select[name="_TTS.DISPLAYPAGE"]').options[0].value = 200; document.querySelector('input[name="_TTS.PGTOP"]').value = 200*(localStorage['page']-1)+1; localStorage['page']++; document.querySelector('input[name="_IMG_顯示結果"]').click();
- document.querySelector('input[name="_IMG_本頁全部"]').click();
- Repeat the previous two steps until you have every page.
- Rename the downloads to data/progress/8/ad-8-$N.txt
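The paging arithmetic in the console snippet above can be checked with a small Python sketch. The 200-records-per-page size and the formula 200*(page-1)+1 come from that snippet. The helper names are hypothetical, and generalizing the ad-8 file name to other terms is an assumption based on the --ad 8 option. Whether $N starts at 1 is not stated in this README.

```python
PAGE_SIZE = 200  # records per page, as set on _TTS.DISPLAYPAGE in the console snippet


def page_top(page, page_size=PAGE_SIZE):
    """Return the 1-based index of the first record on `page` (pages start at 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return page_size * (page - 1) + 1


def progress_file(ad, n):
    """Build the target name data/progress/<ad>/ad-<ad>-<n>.txt (assumed pattern)."""
    return f"data/progress/{ad}/ad-{ad}-{n}.txt"


if __name__ == "__main__":
    for page in (1, 2, 3):
        # Prints the value written to _TTS.PGTOP for each page.
        print(page, page_top(page))
```

For example, page 1 starts at record 1, page 2 at record 201, and page 3 at record 401.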
% ./node_modules/.bin/lsc parse_progress.ls --ad 8 > progress.json
# one record per line, for mongoimport
% ./node_modules/.bin/lsc parse_progress.ls --ad 8 --newline > progress.json
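If you want to consume the --newline output without MongoDB, a minimal Python sketch might look like the following. It assumes that each non-empty line of progress.json is one complete JSON document encoded as UTF-8, which is what "one record per line" suggests. read_records is a hypothetical helper, and no field names are assumed.

```python
import json


def read_records(path):
    """Yield one parsed JSON value per non-empty line of `path`."""
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Point at the offending line instead of failing anonymously.
                raise ValueError(f"{path}:{line_no}: not valid JSON") from exc


if __name__ == "__main__":
    import sys

    # Usage: python read_records.py progress.json
    count = sum(1 for _ in read_records(sys.argv[1]))
    print(count, "records")
```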
% npm install
- Open http://lis.ly.gov.tw/lgcgi/lglaw -> 分類瀏覽 (browse by category) -> any entry (任意一筆)
- Set the $PORTAL environment variable to the resulting URL
You can use $PORTAL, or rely on the default value in tasks/prepare_categories.ls (line 12).
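The $PORTAL-or-default behaviour described above is a common pattern. As a hedged illustration only, it could be sketched in Python as below. The real logic lives in tasks/prepare_categories.ls and may differ. The default URL here is a placeholder, resolve_portal is a hypothetical name, and treating an empty $PORTAL as unset is an assumption.

```python
import os

# Placeholder only: the real default is defined in tasks/prepare_categories.ls.
HYPOTHETICAL_DEFAULT_PORTAL = "http://example.invalid/portal"


def resolve_portal(environ=None, default=HYPOTHETICAL_DEFAULT_PORTAL):
    """Return $PORTAL if it is set and non-empty, otherwise `default`."""
    env = os.environ if environ is None else environ
    value = env.get("PORTAL", "").strip()
    return value or default


if __name__ == "__main__":
    print(resolve_portal())
```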
% gulp
The default task is gulp prepare_categories.
Fetch a single category:
% gulp fetch:single --cat 主計-會計
Fetch all categories:
% gulp fetch:all
Fetch all revisions for a single category:
% gulp fetch:single_revision --cat 主計-會計
Fetch all revisions for all categories:
% gulp fetch:all_revision
Convert a single law:
% gulp json:single --name 會計師法
Convert all laws:
% gulp json:all