[PR] Preparing 2.0 Release #39

Open

wants to merge 103 commits into base: main

Changes from all commits: 103 commits
b95aba4
docs: drafting doc changes, docker as main distribution channel
Kaszanas Nov 11, 2024
c6a62c1
docs: adjusted readibility
Kaszanas Nov 11, 2024
776de34
docs: adjusted the description in processed mapping copier
Kaszanas Nov 11, 2024
e7486f4
docs: added full package names in README
Kaszanas Nov 11, 2024
ef5a36d
docs: simplified docs, sc2egset using docker
Kaszanas Nov 11, 2024
d5e28cb
refactor: no random uuid, using file hash in flattener
Kaszanas Nov 11, 2024
3d03363
docs: fixing typo in PR template
Kaszanas Nov 11, 2024
b5b9c9b
refactor: multiprocessing off in sc2_replaypack_processor
Kaszanas Nov 12, 2024
b7c31d2
refactor: renamed sc2_replaypack_processor -> sc2egset_replaypack
Kaszanas Nov 12, 2024
a7c84c2
docs: added link to citation at the top
Kaszanas Nov 12, 2024
3fd124e
perf: downloading maps as a pre-process step
Kaszanas Nov 13, 2024
b172e40
docs: added more README documentation, added TOC
Kaszanas Nov 17, 2024
82282ca
docs: formatting CONTRIBUTING
Kaszanas Nov 17, 2024
4aa2092
refactor: capitalized "AS" in docker
Kaszanas Nov 17, 2024
ec5cbe8
docs: drafted script README files with Docker
Kaszanas Nov 17, 2024
cd01672
Merge pull request #41 from Kaszanas/40-script-docker-docs
Kaszanas Nov 17, 2024
7a968ce
docs: updated all CLI Usage for scripts
Kaszanas Nov 17, 2024
fd8ad2a
fix: fixed log level, fixing path initialization
Kaszanas Nov 17, 2024
8b10ac8
fix: fixing glob issues, testing directory flattener
Kaszanas Nov 18, 2024
be6d51e
docs: solving #42 and #43, refined documentation
Kaszanas Nov 18, 2024
7088375
docs: removed redundant information from README
Kaszanas Nov 18, 2024
02870fc
docs: added generic information in README, editing
Kaszanas Nov 18, 2024
2b633c3
perf: directory_flattener, hash from filepath, added tqdm
Kaszanas Nov 18, 2024
a9e2bd4
fix: converting paths with click, changed target name
Kaszanas Nov 18, 2024
f601edf
docs: fixed READMEs after review
Kaszanas Nov 18, 2024
302a8ef
build: bumped dependency versions
Kaszanas Nov 20, 2024
6e76dfe
Merge pull request #45 from Kaszanas/44-bump-dependency-versions
Kaszanas Nov 20, 2024
67bbba0
refactor: renamed dir_packager to directory_packager
Kaszanas Nov 20, 2024
02a669e
Merge pull request #47 from Kaszanas/46-dir-packager-full-name
Kaszanas Nov 20, 2024
fe4a914
fix: fixing paths in Dockerfile
Kaszanas Nov 20, 2024
f222ec1
Merge branch 'dev' of https://github.com/Kaszanas/SC2DatasetPreparato…
Kaszanas Nov 20, 2024
a93e82d
fix: mounting curdir as a dot
Kaszanas Nov 20, 2024
227f080
Merge pull request #49 from Kaszanas/48-current-directory-docker
Kaszanas Nov 20, 2024
c00b38e
test: added dotenv to set TEST_WORKSPACE
Kaszanas Nov 20, 2024
046ff31
refactor: refreshed ci installing poetry
Kaszanas Nov 20, 2024
af14698
build: bumped poetry version in Dockerfile
Kaszanas Nov 20, 2024
e1b1349
test: commented out test, file_renamer_test not ready
Kaszanas Nov 20, 2024
b500fd6
feat: added default flag values for golang
Kaszanas Nov 20, 2024
7e8b72c
Merge pull request #52 from Kaszanas/51-set-default-flags-go
Kaszanas Nov 20, 2024
28ee746
fix: fixing imports in sc2reset
Kaszanas Nov 20, 2024
c6e5c49
test: added extractor arguments in test
Kaszanas Nov 20, 2024
a3a31c7
fix: fixing opening and writing to file
Kaszanas Nov 20, 2024
9203288
feat: sc2infoextractorgo executable path in settings
Kaszanas Nov 20, 2024
6d7447d
fix: fixing return value, removed range loop
Kaszanas Nov 20, 2024
06d5513
build: adjusted dockerfiles, copying files separately
Kaszanas Nov 20, 2024
748f840
feat: test workspace in .env
Kaszanas Nov 20, 2024
74236e0
test: adjusted test target in make
Kaszanas Nov 20, 2024
fba1bf2
Merge pull request #54 from Kaszanas/53-run-tests-fix-commands
Kaszanas Nov 20, 2024
9ce6193
fix: fixing pre-commit in dev docker
Kaszanas Nov 20, 2024
e3ba2b2
ci: removing volume from docker-test-compose
Kaszanas Nov 20, 2024
e39f5c1
build: copying CONTRIBUTING to dev docker image
Kaszanas Nov 20, 2024
7f67860
ci: adjusted TEST_COMMAND, not writing logs
Kaszanas Nov 20, 2024
391fe50
build: copying scripts to top in docker images
Kaszanas Nov 20, 2024
1ae85c5
Merge pull request #56 from Kaszanas/55-docker-copy-scripts-top-dir
Kaszanas Nov 20, 2024
41caf7a
docs: added info on pre-commit and commitizen, #34
Kaszanas Nov 20, 2024
985c6c2
docs: added information on code standards, #34
Kaszanas Nov 20, 2024
e0d82da
docs: updated all README files for scripts
Kaszanas Nov 20, 2024
cff767f
refactor: changed the processing dir structure
Kaszanas Nov 20, 2024
76a5416
refactor: adjusted make targets for sc2egset, removed unused param
Kaszanas Nov 20, 2024
6bce3cb
ci: added docker releases
Kaszanas Nov 20, 2024
e8faa7d
Merge pull request #58 from Kaszanas/57-docker-release-on-branch-pushes
Kaszanas Nov 20, 2024
f6eb987
build: added maps needed for SC2InfoExtractorGo
Kaszanas Nov 20, 2024
79a29cd
Merge pull request #60 from Kaszanas/59-copy-maps-sc2infoextractorgo
Kaszanas Nov 20, 2024
7ed1f1a
refactor: using dev dockerfile in sc2reset_sc2egset process
Kaszanas Nov 20, 2024
4f117c6
docs: changed docs for a more concise read
Kaszanas Jan 3, 2025
623fb61
build: bumped ruff and commitizen versions
Kaszanas Jan 3, 2025
ce58589
build: ran poetry lock
Kaszanas Jan 3, 2025
e82d355
docs: refined documentation, added TODO
Kaszanas Jan 5, 2025
d257a0d
build: added variables in makefile, adjusted targets, added echo
Kaszanas Jan 5, 2025
d5a5393
docs: changed docs, new CLI text, renamed container
Kaszanas Jan 5, 2025
18a6abc
build: removed dockerfiles per script, using main dockerfile
Kaszanas Jan 5, 2025
4d7b41f
refactor: drafting refactor of sc2egset_replaypack_processor
Kaszanas Jan 5, 2025
828a356
feat: added processed_mapping_copier target to makefile
Kaszanas Jan 5, 2025
d4e3cb7
feat: draft functionality of sc2egset_replaypack... full pipeline
Kaszanas Jan 5, 2025
67ec3e0
feat: drafted utils/user_prompt
Kaszanas Jan 6, 2025
76be1bc
refactor: renamed user prompting function
Kaszanas Jan 6, 2025
78a8d00
refactor: applied user prompting in sc2egset_replaypack_processor
Kaszanas Jan 6, 2025
e6118de
feat(directory_flattener.py): added user_prompt feature
Kaszanas Jan 6, 2025
2969e07
refactor(user_prompt.py): added logging
Kaszanas Jan 6, 2025
308b772
feat(directory_packager.py): added user prompting
Kaszanas Jan 6, 2025
e3bf197
refactor: using glob instead of os.walk
Kaszanas Jan 6, 2025
cc6a65a
docs: changed CLI description
Kaszanas Jan 6, 2025
75c744e
refactor: renamed force to force_overwrite
Kaszanas Jan 6, 2025
8a40d05
feat: added force_overwrite flag to CLI
Kaszanas Jan 6, 2025
aa10695
feat(json_merger.py): added user prompting, and CLI flag
Kaszanas Jan 6, 2025
cf28564
refactor(processed_mapping_copier.py): using pathlib, refactored func…
Kaszanas Jan 6, 2025
2efea73
refactor: applied user prompting for every script
Kaszanas Jan 6, 2025
e99c242
Merge pull request #64 from Kaszanas/63-prompt-user-possible-overwrite
Kaszanas Jan 6, 2025
7b0ae21
ci: attempt at fixing GH Actions, new make target name
Kaszanas Jan 6, 2025
7b31022
ci: fixing next step in CI pipeline, new target name
Kaszanas Jan 6, 2025
bcb41db
test: fixing tests with new features, fixing assertions
Kaszanas Jan 7, 2025
e1a1a00
feat: drafted full SC2ReSet/SC2EGSet pipeline
Kaszanas Jan 8, 2025
bc9f7ca
refactor: added logging statements
Kaszanas Jan 8, 2025
0c9288f
refactor: removed old directory structure from processing
Kaszanas Jan 8, 2025
64458f7
fix: manually tested directory_packager, working version
Kaszanas Jan 8, 2025
157bd50
feat: (directory_packager.py) added tqdm progres bar
Kaszanas Jan 8, 2025
8719e88
refactor: command saved to a variable
Kaszanas Jan 8, 2025
579f345
build(makefile): added targets for seeding maps locally
Kaszanas Jan 9, 2025
14c8cf9
build(docker): changed location of the maps directory in docker
Kaszanas Jan 9, 2025
7356c1d
feat: ignoring maps directory
Kaszanas Jan 9, 2025
9569bc8
fix(directory_flattener.py): manually tested flattening directories
Kaszanas Jan 9, 2025
377d838
feat: separate sc2egset_pipeline and replaypack_processor
Kaszanas Jan 9, 2025
af66764
test: fixing tests after func args change
Kaszanas Jan 9, 2025
2 changes: 2 additions & 0 deletions .env.template
@@ -0,0 +1,2 @@
# To have imports resolve correctly this should be the path to the root of the project:
TEST_WORKSPACE=
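For illustration only, one way to populate this file locally might be the following; the `~/datasetpreparator` clone location is an assumed placeholder, not something specified in this PR:

```bash
# Sketch: create a local .env and point TEST_WORKSPACE at the repository root
# (the clone path below is an assumption).
cd ~/datasetpreparator
echo "TEST_WORKSPACE=$(pwd)" > .env
```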
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.MD
@@ -2,7 +2,7 @@
## Description
<!--- Describe your changes in detail -->

## Related IssueS
## Related Issues
<!--- This project only accepts pull requests related to open issues -->
<!--- If suggesting a new feature or change, please discuss it in an issue first -->
<!--- If fixing a bug, there should be an issue describing it with steps to reproduce -->
15 changes: 9 additions & 6 deletions .github/workflows/ci.yml
@@ -1,26 +1,29 @@
name: continuous integration (ci)

on: [pull_request, workflow_dispatch]
on:
pull_request:
push:
branches:
- main
- dev
workflow_dispatch:

# To successfully find the files that are required for testing:
env:
TEST_WORKSPACE: ${{ github.workspace }}

jobs:

pre_commit:
# Set up operating system
runs-on: ubuntu-latest

# Define job steps
steps:

- name: Check-out repository
uses: actions/checkout@v4

- name: Build Dev Docker Image
run: |
make docker_build_dev
make docker_build_devcontainer

- name: Docker Run pre-commit on all files.
run: |
@@ -41,7 +44,7 @@ jobs:

- name: Build Dev Docker Image
run: |
make docker_build_dev PYTHON_VERSION=${{ matrix.python-version }}
make docker_build_devcontainer PYTHON_VERSION=${{ matrix.python-version }}

- name: Build Docker Image With Python ${{ matrix.python-version }}
run: |
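For reference, the build step in this workflow can presumably be reproduced locally with the make targets it calls; the `PYTHON_VERSION=3.11` value below is an illustrative choice, not one pinned by this PR:

```bash
# Sketch: run the same dev-image build that CI performs.
make docker_build_devcontainer                      # default Python version
make docker_build_devcontainer PYTHON_VERSION=3.11  # matrix-style pinned version
```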
47 changes: 47 additions & 0 deletions .github/workflows/docker_images.yml
@@ -0,0 +1,47 @@
name: Publish Docker Images

# This should run only after the tests from the CI pipeline have passed.
# On a rare occasion contributors can trigger this manually, and it should also
# run after a release has been published.
on:
workflow_run:
workflows: ["continuous integration (ci)"]
types:
- completed
push:
branches:
- main
- dev
workflow_dispatch:
release:
types: [published]

jobs:
push_to_registries:
name: Push Docker Image to Docker Hub
runs-on: ubuntu-latest
permissions:
packages: write
contents: read
steps:
- name: Check out Code
uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11
- name: Log in to Docker Hub
uses: docker/login-action@e92390c5fb421da1463c202d546fed0ec5c39f20
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_TOKEN }}
- name: Extract Metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@8e5442c4ef9f78752691e2d8f8d19755c6f78e81
with:
images: |
kaszanas/datasetpreparator
- name: Build and Push Docker images
uses: docker/build-push-action@2cdde995de11925a030ce8070c3d77a52ffcf1c0
with:
context: .
file: ./docker/Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
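Once this workflow has pushed an image, a quick sanity check of the published tags might look like the sketch below; the `latest` tag is an assumption based on the default behaviour of `docker/metadata-action`, not something configured explicitly here:

```bash
# Sketch: pull a published image and confirm when it was built.
docker pull kaszanas/datasetpreparator:latest
docker image inspect kaszanas/datasetpreparator:latest --format '{{ .Created }}'
```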
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,7 +1,8 @@
/.vscode
/venv*

/processing
processing/
maps/

*.SC2Replay
*.SC2Map
@@ -34,3 +35,5 @@ ruff_cache/

# PyCharm
/.idea

.env
32 changes: 22 additions & 10 deletions CONTRIBUTING.md
@@ -56,24 +56,36 @@ docker run -it -v .:/app datasetpreparator:devcontainer

### Local Development

Ready to contribute? Here's how to set up `datasetpreparator` for local development.
Ready to contribute? Here's how to set up `datasetpreparator` for local development. The code style standards that we use are defined in the `.pre-commit-config.yaml` file.

1. Download a copy of `datasetpreparator` locally.
2. Install `datasetpreparator` using `poetry`:

```console
poetry install
```
```console
poetry install
```

3. Install the pre-commit hooks:

```console
poetry run pre-commit install
```

3. Use `git` (or similar) to create a branch for local development and make your changes:
4. Use `git` (or similar) to create a branch for local development and make your changes:

```console
git checkout -b name-of-your-bugfix-or-feature
```
```console
git checkout -b name-of-your-bugfix-or-feature
```

5. When you're done making changes, check that your changes conform to any code formatting requirements and pass any tests.

4. When you're done making changes, check that your changes conform to any code formatting requirements and pass any tests.
6. Format your commit with `commitizen`:

```console
poetry run cz commit
```

5. Commit your changes and open a pull request.
7. Commit your changes (we are using commitizen to check commit messages) and open a pull request.
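Taken together, a typical local session could look like the following sketch; the branch name is a placeholder and the commands are the ones referenced in the steps above:

```bash
# Condensed sketch of the contributor workflow described above.
git clone https://github.com/Kaszanas/SC2DatasetPreparator.git
cd SC2DatasetPreparator
poetry install                         # install the project and its dependencies
poetry run pre-commit install          # install the git hooks
git checkout -b name-of-your-bugfix-or-feature
# ... make your changes ...
poetry run pre-commit run --all-files  # check formatting before committing
poetry run cz commit                   # commitizen-formatted commit message
```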

## Pull Request Guidelines

89 changes: 67 additions & 22 deletions README.md
@@ -2,53 +2,98 @@

# DatasetPreparator

Tools in this repository were used to create the **[SC2ReSet: StarCraft II Esport Replaypack Set](https://doi.org/10.5281/zenodo.5575796)**, and finally **[SC2EGSet: StarCraft II Esport Game State Dataset](https://doi.org/10.5281/zenodo.5503997)**.
This project contains various scripts that assist in the process of preparing datasets. For a broad overview of the tools, please refer to the **[Detailed Tools Description](#detailed-tools-description)**.

Tools in this repository were used to create the **[SC2ReSet: StarCraft II Esport Replaypack Set](https://doi.org/10.5281/zenodo.5575796)**, and finally **[SC2EGSet: StarCraft II Esport Game State Dataset](https://doi.org/10.5281/zenodo.5503997)**; for citation information, see **[Cite Us!](#cite-us)**.

## Installation

To install the current version of the toolset as separate CLI tools, run the following command:
> [!NOTE]
> To run this project, you need the following prerequisites installed on your system:
> - Docker
> - make

Our preferred way of distributing the toolset is through DockerHub. We use the Docker image to provide a fully reproducible environment for our scripts.

To pull the image from DockerHub, run the following command:

```bash
docker pull kaszanas/datasetpreparator:latest
```
pip install datasetpreparator[all]

If you wish to clone the repository and build the Docker image yourself, run the following command:

```bash
make docker_build
```

After that each of the scripts should be available to call from the command line directly.
After building the image, please refer to the **[Command Line Arguments Usage](#command-line-arguments-usage)** section for the usage of the scripts, and see the **[Detailed Tools Description](#detailed-tools-description)** for a full description of each script.


## Command Line Arguments Usage

When using Docker, you will have to pass the arguments through the `docker run` command and mount the input/output directory. Below is an example of how to run the `directory_flattener` script using Docker. For ease of use, we have prepared an example directory structure in the `processing` directory; the command below uses it to flatten the directory structure:

```bash
docker run \
-v "./processing:/app/processing" \
datasetpreparator:latest \
python3 directory_flattener.py \
--input_path /app/processing/directory_flattener/input \
--output_path /app/processing/directory_flattener/output
```
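To run the same script against your own replays, the host directory in the volume mount can be swapped out; the `./my_replays` directory below is a hypothetical example, not part of the repository layout:

```bash
# Sketch: flattening a user-supplied directory instead of the bundled example.
docker run \
  -v "./my_replays:/app/my_replays" \
  datasetpreparator:latest \
  python3 directory_flattener.py \
  --input_path /app/my_replays/input \
  --output_path /app/my_replays/output
```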

## Dataset Preparation Steps
## SC2EGSet Dataset Reproduction Steps

To reproduce our process of defining the dataset and to be able to compare your results with our work, we describe how to perform the processing below.
> [!NOTE]
> The instructions below are for reproducing the SC2EGSet dataset. If you wish to use the tools in this repository separately for your own dataset, please refer to the **[Detailed Tools Description](#detailed-tools-description)**.

### Using Docker

1. Build the docker image from: https://github.com/Kaszanas/SC2InfoExtractorGo
2. Run the commands as described in the ```makefile```, but first make sure that all of the script parameters are set according to your needs.
We provide a release image containing all of the scripts. To see the usage of these scripts, please refer to their respective ``README.md`` files, as described in [Detailed Tools Description](#detailed-tools-description).

### Using Python
The following steps were used to prepare the SC2EGSet dataset:
1. Build the docker image for the DatasetPreparator using the provided ```makefile``` command: ```make docker_build```. This will bundle all of the dependencies, such as the [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo).
2. Place the input replaypacks into the `./processing/directory_flattener/input` directory.
3. Run the command ```make sc2reset_sc2egset``` to process the replaypacks and create the dataset. The output will be placed in the `./processing/sc2egset_replaypack_processor/output` directory, as shown in the sketch below.
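The sketch below consolidates these three steps; the replaypack source path is a placeholder, and the `make` targets and `processing` paths are the ones referenced above:

```bash
# Sketch of the SC2EGSet reproduction steps described above.
make docker_build                               # build the DatasetPreparator image
cp -r /path/to/your/replaypacks/. \
      ./processing/directory_flattener/input/   # place the input replaypacks
make sc2reset_sc2egset                          # run the full processing pipeline
ls ./processing/sc2egset_replaypack_processor/output   # inspect the results
```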

0. Obtain replays to process. This can be a replaypack or your own replay folder.
1. Download latest version of [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo), or build it from source.
2. **Optional** If the replays that you have are held in nested directories it is best to use ```src/directory_flattener.py```. This will copy the directory and place all of the files in the top-level directory, where they can be further processed. In order to preserve the old directory structure, a .json file is created which maps each file's unique hash to its location in the old structure: ```{"replayUniqueHash": "whereItWasInOldStructure"}```. This step is required in order to properly use [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo), as it only lists the files immediately available at the top level of the input directory.
3. **Optional** Use the map downloader ```src/sc2_map_downloader.py``` to download maps that were used in the replays that you obtained. This is required for the next step.
4. **Optional** Use the [SC2MapLocaleExtractor](https://github.com/Kaszanas/SC2MapLocaleExtractor) to obtain the mapping of ```{"foreign_map_name": "english_map_name"}``` which is required for the [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo) to translate the map names in the output .json files.
5. Perform replaypack processing using ```src/sc2_replaypack_processor.py``` with the [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo) placed in PATH, or next to the script.
6. **Optional** Using the ```src/file_renamer.py```, rename the files that were generated in the previous step. This is not required and is done to increase the readability of the output directory structure.
7. Using the ```src/file_packager.py```, create .zip archives containing the datasets and the supplementary files. By finishing this stage, your dataset should be ready to upload.

#### Customization
### Detailed Tools Description

In order to specify different processing flags for https://github.com/Kaszanas/SC2InfoExtractorGo, please modify the ```sc2_replaypack_processor.py``` file directly.
Each of the scripts has its usage described in its respective `README.md` file; you can find the detailed description of the available tools below.

## Command Line Arguments Usage
#### CLI Usage; Generic scripts
1. [Directory Packager (dir_packager): README](src/datasetpreparator/dir_packager/README.md)
2. [Directory Flattener (directory_flattener): README](src/datasetpreparator/directory_flattener/README.md)
3. [File Renamer (file_renamer): README](src/datasetpreparator/file_renamer/README.md)
4. [JSON Merger (json_merger): README](src/datasetpreparator/json_merger/README.md)
5. [Processed Mapping Copier (processed_mapping_copier): README](src/datasetpreparator/processed_mapping_copier/README.md)

#### CLI Usage; StarCraft 2 Specific Scripts
1. [SC2 Map Downloader (sc2_map_downloader): README](src/datasetpreparator/sc2/sc2_map_downloader/README.md)
2. [SC2EGSet Replaypack Processor (sc2egset_replaypack_processor): README](src/datasetpreparator/sc2/sc2egset_replaypack_processor/README.md)
3. [SC2ReSet Replaypack Downloader (sc2reset_replaypack_downloader): README](src/datasetpreparator/sc2/sc2reset_replaypack_downloader/README.md)


<!-- ### Using Python

1. Obtain replays to process. This can be a replaypack or your own replay folder.
2. Download latest version of [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo), or build it from source.
3. **Optional** If the replays that you have are held in nested directories it is best to use ```src/directory_flattener.py```. This will copy the directory and place all of the files in the top-level directory, where they can be further processed. In order to preserve the old directory structure, a .json file is created which maps each file's unique hash to its location in the old structure: ```{"replayUniqueHash": "whereItWasInOldStructure"}```. This step is required in order to properly use [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo), as it only lists the files immediately available at the top level of the input directory.
4. **Optional** Use the map downloader ```src/sc2_map_downloader.py``` to download maps that were used in the replays that you obtained. This is required for the next step.
5. **Optional** Use the [SC2MapLocaleExtractor](https://github.com/Kaszanas/SC2MapLocaleExtractor) to obtain the mapping of ```{"foreign_map_name": "english_map_name"}``` which is required for the [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo) to translate the map names in the output .json files.
6. Perform replaypack processing using ```src/sc2_replaypack_processor.py``` with the [SC2InfoExtractorGo](https://github.com/Kaszanas/SC2InfoExtractorGo) placed in PATH, or next to the script.
7. **Optional** Using the ```src/file_renamer.py```, rename the files that were generated in the previous step. This is not required and is done to increase the readability of the output directory structure.
8. Using the ```src/file_packager.py```, create .zip archives containing the datasets and the supplementary files. By finishing this stage, your dataset should be ready to upload. -->

Each of the scripts has its usage described in its respective `README.md` file.

## Contributing and Reporting Issues

If you want to report a bug, request a feature, or open any other issue, please do so in the **[issue tracker](https://github.com/Kaszanas/SC2DatasetPreparator/issues/new/choose)**.

Please see **[CONTRIBUTING.md](https://github.com/Kaszanas/SC2DatasetPreparator/blob/main/CONTRIBUTING.md)** for detailed development instructions and contribution guidelines.

## Citing
## Cite Us!

### This Repository

1 change: 1 addition & 0 deletions ci/install_poetry.py
@@ -23,6 +23,7 @@

For full documentation, visit https://python-poetry.org/docs/#installation.
""" # noqa: E501

import sys


51 changes: 44 additions & 7 deletions docker/Dockerfile
@@ -1,31 +1,68 @@
# Built .exe replay parsing tool is required to run sc2_replaypack_processor
# https://github.com/Kaszanas/SC2InfoExtractorGo

ARG PYTHON_VERSION=3.11

FROM kaszanas/sc2infoextractorgo:latest as extractor
# Built .exe replay parsing tool is required to run sc2_replaypack_processor
# https://github.com/Kaszanas/SC2InfoExtractorGo
FROM kaszanas/sc2infoextractorgo:latest AS extractor

FROM python:${PYTHON_VERSION}-alpine
FROM python:${PYTHON_VERSION}-alpine AS build

WORKDIR /app

# Copying the replay parsing tool:
COPY --from=extractor /SC2InfoExtractorGo /SC2InfoExtractorGo
# sc2egset_replaypack_processor requires the .exe file to be in the same directory as the script:
COPY --from=extractor /app/SC2InfoExtractorGo /app/SC2InfoExtractorGo
COPY --from=extractor /app/maps/ /app/processing/maps/

# Ensure the executable has the right permissions
RUN chmod +x /app/SC2InfoExtractorGo

# Copy only what is required to install the project:
COPY pyproject.toml poetry.lock ci/install_poetry.py /app/

# Install poetry
# TODO: this is rather ugly, we are installing poetry into the release Docker build. Use multi-stage builds instead.
ENV POETRY_HOME=/opt/poetry
RUN python3 install_poetry.py --version 1.8.2 && \
RUN python3 install_poetry.py --version 1.8.4 && \
$POETRY_HOME/bin/poetry --version

# Install only dependencies without installing current project:
RUN $POETRY_HOME/bin/poetry config virtualenvs.create false && $POETRY_HOME/bin/poetry install --no-root
RUN $POETRY_HOME/bin/poetry \
config virtualenvs.create false \
&& $POETRY_HOME/bin/poetry install --no-root

# Copy entire repository contents
COPY . .

# Copy test files:
COPY /src/ /app/src/
COPY /tests/__init__.py /app/tests/__init__.py
COPY /tests/conftest.py /app/tests/conftest.py
COPY /tests/test_utils.py /app/tests/test_utils.py
COPY /tests/test_settings.py /app/tests/test_settings.py
COPY /tests/test_main.py /app/tests/test_main.py
COPY /tests/test_cases/ /app/tests/test_cases/

# Copy docs files:
COPY /docs/ /app/docs/
COPY mkdocs.yml /app/mkdocs.yml
COPY README.md /app/README.md
COPY CONTRIBUTING.md /app/CONTRIBUTING.md

# Bring the scripts to the top level.
# They import parts of the project, but as long as the project is installed
# in the same environment, they can run from anywhere once the environment
# is activated.
COPY /src/datasetpreparator/directory_flattener/directory_flattener.py \
/src/datasetpreparator/directory_packager/directory_packager.py \
/src/datasetpreparator/file_renamer/file_renamer.py \
/src/datasetpreparator/json_merger/json_merger.py \
/src/datasetpreparator/processed_mapping_copier/processed_mapping_copier.py \
/src/datasetpreparator/sc2/sc2_map_downloader/sc2_map_downloader.py \
/src/datasetpreparator/sc2/sc2egset_replaypack_processor/sc2egset_replaypack_processor.py \
/src/datasetpreparator/sc2/sc2reset_replaypack_downloader/sc2reset_replaypack_downloader.py \
/app/


# Install current project:
RUN $POETRY_HOME/bin/poetry install --all-extras
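For orientation, building and exercising the resulting image might look like the sketch below; the image tag and the `--help` invocation are illustrative assumptions, based on the scripts copied to `/app` above:

```bash
# Sketch: build the release image from this Dockerfile and call one of the
# top-level scripts (tag name is an assumed example).
docker build -f docker/Dockerfile -t datasetpreparator:latest .
docker run --rm datasetpreparator:latest python3 directory_flattener.py --help
```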