docker build -t srw-generate-data-assets -f docker\generate-data-assets.Dockerfile .
docker run ^
--rm ^
-v %cd%\data\magdeburg-8:/app/input-dir:ro ^
-v %cd%\src\assets\electoral-periods\magdeburg-8:/app/output-dir ^
-v %cd%\data\Magdeburg.json:/app/Magdeburg.json:ro ^
srw-generate-data-assets
docker build -t srw-generate-paper-assets -f docker\generate-paper-assets.Dockerfile .
docker run ^
--rm ^
-v %cd%\data\Magdeburg.json:/app/Magdeburg.json:ro ^
-v %cd%\output\papers:/app/papers:ro ^
-v %cd%\src\assets\papers:/app/generated ^
srw-generate-paper-assets
docker build -t srw-generate-routes-file -f docker\generate-routes-file.Dockerfile .
docker run ^
--rm ^
-v %cd%\data:/app/data:ro ^
-v %cd%:/app/generated ^
srw-generate-routes-file
docker build -t srw-download-paper-files -f docker\download-paper-files.Dockerfile .
docker run ^
--rm ^
-v %cd%\output\papers\2024:/app/papers ^
-v %cd%\data\Magdeburg.json:/app/Magdeburg.json:ro ^
srw-download-paper-files ^
2024
docker build -t srw-tika -f docker\tika-batch-extract.Dockerfile .
When running the container, provide the input and output folders as volume mounts. The input folder should contain the PDF files to process; the extracted text files are written to the output folder.
docker run ^
--rm ^
-v %cd%\output\papers\2023:/input ^
-v %cd%\output\papers\2023-extracted:/output ^
srw-tika
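The two volume mounts can also be assembled in a shell-independent way. The following is a minimal Python sketch, not part of the repository: the helper name `build_tika_command` is hypothetical, and only the image name `srw-tika` and the container paths `/input` and `/output` are taken from the command above.

```python
from pathlib import Path


def build_tika_command(input_dir: str, output_dir: str) -> list[str]:
    """Build the `docker run` argument list for the srw-tika image.

    Hypothetical helper for illustration; it only assembles arguments
    and does not start a container itself.
    """
    # Bind mounts need absolute host paths, so resolve relative ones first.
    host_input = Path(input_dir).resolve()
    host_output = Path(output_dir).resolve()
    return [
        "docker", "run", "--rm",
        "-v", f"{host_input}:/input",    # folder with the PDF files
        "-v", f"{host_output}:/output",  # folder for the extracted text
        "srw-tika",
    ]
```

Passing the returned list to `subprocess.run(command, check=True)` would start the container with both folders mounted.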
This tool scans the voting images and generates a JSON file containing the voting data.
docker build -t srw-scan-voting-images -f docker\scan-voting-images.Dockerfile .
docker run ^
--rm ^
-v %cd%\data\magdeburg-7\2022-09-01\config-2022-09-01.json:/app/session-config.json:ro ^
-v %cd%\sessions-media-files\2022-09-01:/app/voting-images:ro ^
-v %cd%\output\sessions-scan-results\2022-09-01:/app/output ^
srw-scan-voting-images ^
2022-09-01
docker build -t srw-index-search -f docker\index-search.Dockerfile .
The Typesense connection information has to be provided by setting the following environment variables:
TYPESENSE_SERVER_URL
TYPESENSE_API_KEY
TYPESENSE_COLLECTION_NAME
docker run ^
--rm ^
-e TYPESENSE_SERVER_URL=http://host.docker.internal:8108 ^
-e TYPESENSE_COLLECTION_NAME=papers-and-speeches-0001 ^
-e TYPESENSE_API_KEY=abc123 ^
-v %cd%\data\Magdeburg.json:/app/Magdeburg.json:ro ^
-v %cd%\output\papers\all-extracted:/app/papers-content:ro ^
-v %cd%\data:/app/electoral-periods:ro ^
srw-index-search
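Since the container depends on all three variables, it can help to check them before starting it. The following is a small Python sketch under that assumption: the function name `typesense_env_flags` is hypothetical, and only the variable names come from the list above.

```python
import os
from collections.abc import Mapping

# Variable names as listed above.
REQUIRED_VARS = (
    "TYPESENSE_SERVER_URL",
    "TYPESENSE_API_KEY",
    "TYPESENSE_COLLECTION_NAME",
)


def typesense_env_flags(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return `-e NAME=value` arguments for `docker run`.

    Hypothetical helper: raises RuntimeError if a required variable
    is unset or empty, so the problem shows up before the container starts.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    flags: list[str] = []
    for name in REQUIRED_VARS:
        flags += ["-e", f"{name}={env[name]}"]
    return flags
```

The returned flags can be placed between `docker run --rm` and the volume mounts when the command is assembled as an argument list.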
This tool parses multiple RTTM files (from one session) and generates a single JSON file containing the data of all speakers.
docker build -t srw-parse-speakers -f docker\parse-speakers.Dockerfile .
docker run ^
--rm ^
-v %cd%\sessions-media-files\2022-09-01:/app/input:ro ^
-v %cd%\output\sessions-scan-results\2022-09-01:/app/output ^
srw-parse-speakers ^
2022-09-01
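To illustrate the general idea of merging RTTM files into one JSON document, here is a rough Python sketch. It is not the implementation used by `srw-parse-speakers`: the function names and the output fields (`source`, `speaker`, `start`, `end`) are assumptions made for this example. It relies only on the standard RTTM layout, in which `SPEAKER` lines carry the onset in the fourth field, the duration in the fifth, and the speaker name in the eighth.

```python
import json
from pathlib import Path


def parse_rttm(text: str, source: str) -> list[dict]:
    """Extract speaker segments from the content of one RTTM file."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        duration = float(fields[4])
        segments.append({
            "source": source,          # file the segment came from
            "speaker": fields[7],      # speaker label, e.g. SPEAKER_00
            "start": start,            # onset in seconds
            "end": round(start + duration, 3),
        })
    return segments


def merge_rttm_files(input_dir: str) -> str:
    """Merge all *.rttm files of a folder into one JSON string."""
    segments = []
    for path in sorted(Path(input_dir).glob("*.rttm")):
        segments.extend(parse_rttm(path.read_text(encoding="utf-8"), path.name))
    return json.dumps(segments, indent=2)
```

Note that the timestamps in each file are relative to that file's own recording, so a real merge across several recordings of one session would likely also need a per-file offset.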
docker build -t srw-speech-to-text -f docker\speech-to-text.Dockerfile .
The OpenAI credentials have to be provided by setting the following environment variables:
OPENAI_ORGANIZATION_ID
OPENAI_PROJECT_ID
OPENAI_API_KEY
docker run ^
--rm ^
-e OPENAI_ORGANIZATION_ID=<OpenAI organization id> ^
-e OPENAI_PROJECT_ID=<OpenAI project id> ^
-e OPENAI_API_KEY=<OpenAI api key> ^
-v %cd%\output\sessions-scan-results\2022-09-01:/app/output ^
-v %cd%\output\speeches\2022-09-01:/app/speeches:ro ^
srw-speech-to-text ^
2022-09-01
docker build -t srw-stadtratwatch-web -f docker\stadtrat-watch-web.Dockerfile .
docker run --rm -p 8080:80 srw-stadtratwatch-web