Feat/dialogrpt ru #121

Merged
merged 42 commits into from
Mar 18, 2022
Changes from 38 commits
be7a207
fix: file drafts
dilyararimovna Mar 14, 2022
239c8b3
feat: files for dialogrpt
dilyararimovna Mar 14, 2022
28cddf3
feat: dialogrpt pipeline and scores
dilyararimovna Mar 14, 2022
6ca5b97
feat: dialogrpt pipeline and scores
dilyararimovna Mar 14, 2022
74dba5c
feat: dialogrpt readme
dilyararimovna Mar 14, 2022
3cdff45
fix: small readme
dilyararimovna Mar 15, 2022
57636c7
fix: sno healthcheck
dilyararimovna Mar 15, 2022
11898e9
feat: add dialogrpt to pipeline
dilyararimovna Mar 15, 2022
ad820fc
fix: codestyle
dilyararimovna Mar 15, 2022
b88cf17
fix: test files
dilyararimovna Mar 15, 2022
9c59cb4
feat: upd packages in dockerfile
dilyararimovna Mar 15, 2022
d6e0b46
fix: path to file
dilyararimovna Mar 15, 2022
44885b5
fix: shared file
dilyararimovna Mar 15, 2022
1108b6d
fix: codestyle
dilyararimovna Mar 15, 2022
c6021e7
fix: imports
dilyararimovna Mar 15, 2022
96b5eeb
fix: option consider
dilyararimovna Mar 15, 2022
c98d82f
fix: option consider
dilyararimovna Mar 15, 2022
c6e4320
fix: codestyle
dilyararimovna Mar 15, 2022
1242e9f
fix: vars
dilyararimovna Mar 15, 2022
febb3ee
Merge remote-tracking branch 'origin/feat/dialogrpt_ru' into feat/dia…
dilyararimovna Mar 15, 2022
55325ce
fix: test file
dilyararimovna Mar 15, 2022
8fcc9e4
fix: convert to list predictions
dilyararimovna Mar 15, 2022
fe89aae
fix: tests
dilyararimovna Mar 15, 2022
915143d
fix: codestyle
dilyararimovna Mar 15, 2022
6dd19de
fix: codestyle
dilyararimovna Mar 15, 2022
22501d1
fix: codestyle
dilyararimovna Mar 15, 2022
261fc60
fix: readme
dilyararimovna Mar 15, 2022
5fd8431
fix: dialogrpt to tests
dilyararimovna Mar 16, 2022
6dd1f0c
Merge remote-tracking branch 'origin/feat/russian_baseline' into feat…
dilyararimovna Mar 16, 2022
73b6b46
feat: no extra files, add tokenizer as parameter
dilyararimovna Mar 16, 2022
63019e8
fix: codestyle
dilyararimovna Mar 16, 2022
b582846
fix: var name
dilyararimovna Mar 16, 2022
d68f7ca
fix: batch prediction
dilyararimovna Mar 16, 2022
8e00607
fix: batch prediction parameter
dilyararimovna Mar 16, 2022
c047ccf
fix: test choice
dilyararimovna Mar 16, 2022
9d745de
fix: format values
dilyararimovna Mar 16, 2022
17a45fb
fix: codestyle
dilyararimovna Mar 16, 2022
6dc95d3
fix: upd deeppavlov download
dilyararimovna Mar 17, 2022
3010472
fix: dialogrpt container name
dilyararimovna Mar 18, 2022
ae78d8d
fix: dialogrpt as hyp annotator
dilyararimovna Mar 18, 2022
178a932
Merge remote-tracking branch 'origin/feat/russian_baseline' into feat…
dilyararimovna Mar 18, 2022
d0f6c5c
fix: dialogrpt test
dilyararimovna Mar 18, 2022
3 changes: 3 additions & 0 deletions assistant_dists/dream_russian/cpu.yml
Original file line number Diff line number Diff line change
@@ -10,3 +10,6 @@ services:
  dialogpt:
    environment:
      CUDA_VISIBLE_DEVICES: ""
  dialogrpt:
    environment:
      CUDA_VISIBLE_DEVICES: ""
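
Setting `CUDA_VISIBLE_DEVICES` to an empty string hides every GPU from the process, so the service runs on CPU in this configuration. Below is a minimal sketch of how the value can be interpreted, assuming only integer device indices are used. The helper name `visible_gpu_ids` is hypothetical and not part of this PR, and the real CUDA runtime also accepts device UUIDs.

```python
from typing import List, Optional


def visible_gpu_ids(value: Optional[str]) -> Optional[List[int]]:
    """Interpret a CUDA_VISIBLE_DEVICES value (illustrative helper only).

    Returns None when the variable is unset (no restriction), an empty list
    when it is set to "" (no GPU visible, CPU only), and otherwise the listed
    integer device indices.
    """
    if value is None:
        return None
    # Skip empty fragments so that "" yields an empty list.
    return [int(part) for part in value.split(",") if part.strip()]


print(visible_gpu_ids(""))   # the cpu.yml setting: no GPU visible
print(visible_gpu_ids("7"))  # the test.yml setting: only device 7 visible
```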
5 changes: 5 additions & 0 deletions assistant_dists/dream_russian/dev.yml
@@ -100,6 +100,11 @@ services:
      - "./common:/src/common"
    ports:
      - 8092:8092
  dialogrpt:
    volumes:
      - "./services/dialogrpt_ru:/src"
    ports:
      - 8122:8122
  dff-template-skill:
    volumes:
      - "./skills/dff_template_skill:/src"
20 changes: 20 additions & 0 deletions assistant_dists/dream_russian/docker-compose.override.yml
@@ -290,6 +290,25 @@ services:
        reservations:
          memory: 128M

  dialogrpt:
    env_file: [ .env ]
    build:
      context: ./services/dialogrpt_ru/
      args:
        SERVICE_PORT: 8122
        PRETRAINED_MODEL_FNAME: dialogrpt_ru_ckpt_v0.pth
        TOKENIZER_NAME_OR_PATH: "Grossmend/rudialogpt3_medium_based_on_gpt2"
    command: flask run -h 0.0.0.0 -p 8122
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - FLASK_APP=server
    deploy:
      resources:
        limits:
          memory: 2.5G
        reservations:
          memory: 2.5G

  dff-template-skill:
    env_file: [.env]
    build:
@@ -305,4 +324,5 @@ services:
          memory: 128M
        reservations:
          memory: 128M

version: '3.7'
13 changes: 13 additions & 0 deletions assistant_dists/dream_russian/pipeline_conf.json
@@ -302,6 +302,19 @@
                    "skills"
                ],
                "state_manager_method": "add_hypothesis_annotation_batch"
            },
            "dialogrpt": {
                "connector": {
                    "protocol": "http",
                    "timeout": 1,
                    "url": "http://dialogrpt:8122/respond"
                },
                "dialog_formatter": "state_formatters.dp_formatters:hypotheses_with_context_list",
                "response_formatter": "state_formatters.dp_formatters:simple_formatter_service",
                "previous_services": [
                    "skills"
                ],
                "state_manager_method": "add_hypothesis_annotation_batch"
            }
        },
        "response_selectors": {
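
A connector `url` has to name a host that resolves inside the compose network, normally the service name, together with the port that service listens on (here `dialogrpt` and `8122`). The sketch below shows one way such a consistency check could look. The function `check_connector` and the simplified `services` mapping are assumptions made for illustration and are not part of dp-agent or this repository.

```python
from typing import Dict, List
from urllib.parse import urlparse


def check_connector(url: str, services: Dict[str, int]) -> List[str]:
    """Return a list of problems found for a connector URL.

    `services` maps a compose service name to the port it listens on; this
    flattened shape is an assumption made for the example.
    """
    parsed = urlparse(url)
    problems = []
    if parsed.hostname not in services:
        problems.append(f"unknown service host: {parsed.hostname}")
    elif parsed.port != services[parsed.hostname]:
        expected = services[parsed.hostname]
        problems.append(f"port {parsed.port} does not match {parsed.hostname}:{expected}")
    return problems


# Values taken from the compose files in this PR.
services = {"dialogrpt": 8122}

print(check_connector("http://dialogrpt:8122/respond", services))
# "some-other-service" is a made-up host used to trigger the check.
print(check_connector("http://some-other-service:8122/respond", services))
```

Running a check like this in CI would flag a connector that points at the wrong container before the annotator silently times out.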
3 changes: 3 additions & 0 deletions assistant_dists/dream_russian/test.yml
@@ -41,5 +41,8 @@ services:
  dialogpt:
    environment:
      - CUDA_VISIBLE_DEVICES=7
  dialogrpt:
    environment:
      - CUDA_VISIBLE_DEVICES=7
  dff-template-skill:
version: '3.7'
2 changes: 1 addition & 1 deletion dockerfile_agent
@@ -29,7 +29,7 @@ RUN mkdir /pavlov && \
    pip install -e .

RUN pip install sentry-sdk==0.16.0 requests==2.24.0 pandas GitPython==3.1.14 pyaml openpyxl==3.0.0 xlrd==1.2.0
-RUN pip install git+git://github.com/deepmipt/dp-agent.git@35960a8fb0ac8df8ecf75215c895a64c225c1490
+RUN pip install https://codeload.github.com/deepmipt/dp-agent/tar.gz/35960a8fb0ac8df8ecf75215c895a64c225c1490

ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
26 changes: 26 additions & 0 deletions services/dialogrpt_ru/Dockerfile
@@ -0,0 +1,26 @@
# syntax=docker/dockerfile:experimental

FROM pytorch/pytorch:1.5-cuda10.1-cudnn7-runtime

RUN apt-get update && apt-get install -y --allow-unauthenticated wget && rm -rf /var/lib/apt/lists/*

WORKDIR /src

ARG PRETRAINED_MODEL_FNAME
ENV PRETRAINED_MODEL_FNAME ${PRETRAINED_MODEL_FNAME}
ARG SERVICE_PORT
ENV SERVICE_PORT ${SERVICE_PORT}
ARG TOKENIZER_NAME_OR_PATH
ENV TOKENIZER_NAME_OR_PATH ${TOKENIZER_NAME_OR_PATH}

RUN mkdir /data/

RUN wget -c -q http://files.deeppavlov.ai/deeppavlov_data/${PRETRAINED_MODEL_FNAME} -P /data/

COPY ./requirements.txt /src/requirements.txt
RUN pip install -r /src/requirements.txt

COPY . /src

CMD gunicorn --workers=1 server:app -b 0.0.0.0:${SERVICE_PORT} --timeout=300
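
The Dockerfile passes `PRETRAINED_MODEL_FNAME`, `SERVICE_PORT` and `TOKENIZER_NAME_OR_PATH` to the container as environment variables and downloads the checkpoint into `/data/`. The service code is not shown in this part of the diff, so the following is only a sketch of how those variables could be read at startup. `ServiceSettings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceSettings:
    model_path: str
    tokenizer_name_or_path: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> ServiceSettings:
    """Build settings from the variables defined in the Dockerfile."""
    return ServiceSettings(
        # wget -P /data/ keeps the original file name of the checkpoint.
        model_path=f"/data/{env['PRETRAINED_MODEL_FNAME']}",
        tokenizer_name_or_path=env["TOKENIZER_NAME_OR_PATH"],
        port=int(env["SERVICE_PORT"]),
    )


# Example with the build args from docker-compose.override.yml.
settings = load_settings(
    {
        "PRETRAINED_MODEL_FNAME": "dialogrpt_ru_ckpt_v0.pth",
        "TOKENIZER_NAME_OR_PATH": "Grossmend/rudialogpt3_medium_based_on_gpt2",
        "SERVICE_PORT": "8122",
    }
)
print(settings.model_path)
```

Passing the mapping explicitly keeps the function easy to test without touching the real process environment.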

15 changes: 15 additions & 0 deletions services/dialogrpt_ru/README.md
@@ -0,0 +1,15 @@
# Russian DialogRPT model

The code is taken from https://github.com/golsun/DialogRPT

The model was trained on 827k samples (plus 95k validation samples) from the Russian website Pikabu.

The data was parsed from Pikabu by `zhirzemli` (OpenDataScience Slack nickname). The parsing code is available [on GitHub](https://github.com/alexeykarnachev/dialogs_data_parsers),
and the data is available [here](https://drive.google.com/file/d/1XYCprTqn_MlzDD9qgj7ANJkwFigK66mv/view?usp=sharing).

Final accuracy on the validation set: 0.64.

The model was trained on 8 GPUs with the following command:
```
python src/main.py train --data=data/out/updown --min_score_gap=20 --min_rank_gap=0.5 --max_seq_len 256 --batch 16 1>out.txt 2>&1
```
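
The training flags and the reported accuracy can be read as follows, under assumptions that this README does not confirm: `--min_score_gap` is taken to be the minimum difference in raw score (for example, upvotes) between the better and the worse reply to the same context, `--min_rank_gap` the minimum difference in their normalized ranks, and accuracy the share of validation pairs in which the better reply gets the higher model score. The names below are hypothetical and this is not the upstream DialogRPT code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ReplyPair:
    score_pos: float  # raw score of the better reply
    score_neg: float  # raw score of the worse reply
    rank_pos: float   # normalized rank of the better reply, in [0, 1]
    rank_neg: float   # normalized rank of the worse reply, in [0, 1]


def keep_pair(pair: ReplyPair, min_score_gap: float = 20.0, min_rank_gap: float = 0.5) -> bool:
    """Keep only pairs whose replies are clearly separated in score and rank."""
    return (
        pair.score_pos - pair.score_neg >= min_score_gap
        and pair.rank_pos - pair.rank_neg >= min_rank_gap
    )


def pairwise_accuracy(pred_pos: Sequence[float], pred_neg: Sequence[float]) -> float:
    """Fraction of pairs where the better reply receives the higher prediction."""
    if len(pred_pos) != len(pred_neg) or not pred_pos:
        raise ValueError("expected two non-empty sequences of equal length")
    correct = sum(p > n for p, n in zip(pred_pos, pred_neg))
    return correct / len(pred_pos)


print(keep_pair(ReplyPair(120, 90, 0.9, 0.3)))
print(pairwise_accuracy([0.9, 0.2, 0.7, 0.6], [0.1, 0.4, 0.3, 0.5]))
```

Under this reading, random guessing would give an accuracy of about 0.5, which is the baseline against which the 0.64 figure should be judged.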