dialoGPT Persona #185

Merged
merged 67 commits into from
Oct 10, 2022
Changes from 27 commits
Commits
72842e3
Fix requirements.txt (#84)
AndriiHura Jan 24, 2022
a67e15c
fix itsdangerous requirements
mtalimanchuk Feb 18, 2022
0f8ef0e
pin itsdangerous requirements for all flask==1.1.1 servers
mtalimanchuk Feb 18, 2022
6f0684a
Merge pull request #102 from deepmipt/fix/combined-classification-fla…
mtalimanchuk Feb 18, 2022
d237711
Merge pull request #103 from deepmipt/dev
dilyararimovna Feb 18, 2022
e990264
Merge pull request #107 from deepmipt/dev
dilyararimovna Mar 2, 2022
3208f71
Merge pull request #119 from deepmipt/dev
dilyararimovna Mar 11, 2022
ab44553
Merge pull request #123 from deepmipt/dev
dilyararimovna Mar 18, 2022
1c9a463
Merge pull request #137 from deepmipt/dev
dilyararimovna Apr 8, 2022
f8e4a59
Merge pull request #145 from deepmipt/dev
dilyararimovna Apr 30, 2022
48872a6
Merge pull request #150 from deepmipt/dev
dilyararimovna May 4, 2022
ed42f0c
Merge pull request #153 from deepmipt/dev
dilyararimovna May 5, 2022
30f290c
Merge pull request #155 from deepmipt/dev
dilyararimovna May 6, 2022
de510bc
Merge pull request #158 from deepmipt/dev
dilyararimovna May 11, 2022
ab2dcbd
Merge pull request #165 from deepmipt/dev
dilyararimovna May 27, 2022
525783a
Merge pull request #174 from deepmipt/dev
dilyararimovna Jun 27, 2022
7e87a36
Merge pull request #177 from deepmipt/dev
dilyararimovna Jun 30, 2022
3c2169d
increase timeout to 5s
dmitrymailk Jul 5, 2022
0e9f1bb
add logs
dmitrymailk Jul 5, 2022
49e3270
increase to 100
dmitrymailk Jul 5, 2022
f286d94
add first working version gpt_persona
dmitrymailk Jul 13, 2022
57849b6
add sentence_ranking(not working)
dmitrymailk Jul 14, 2022
ff686d9
fix wrong endpoint
dmitrymailk Jul 14, 2022
36af140
create sentecnce ranker annotator
dmitrymailk Jul 14, 2022
ba674a6
rewrite text generation logic
dmitrymailk Jul 14, 2022
5cb5695
add comments to code
dmitrymailk Jul 14, 2022
3d25e53
fix gpt_persona fallback
dmitrymailk Jul 15, 2022
563f643
clean code, add train script, write tests
dmitrymailk Jul 20, 2022
b0d7f2b
add get_intents, remove dataset, add hyperparams
dmitrymailk Jul 24, 2022
d8d1499
add batch support
dmitrymailk Jul 26, 2022
2d66aa4
fix: move files
dilyararimovna Aug 29, 2022
dad1c10
fix: move files
dilyararimovna Aug 29, 2022
f409638
fix: merge
dilyararimovna Aug 29, 2022
43973f9
fix: codestyle
dilyararimovna Aug 29, 2022
3783b76
fix: remove sentence ranker
dilyararimovna Aug 29, 2022
c2811fe
feat: new distribution and rename skill
dilyararimovna Aug 29, 2022
353379c
feat: new annotator
dilyararimovna Aug 29, 2022
ab88168
feat: relative persona extractor
dilyararimovna Aug 29, 2022
a601237
fix: codestyle
dilyararimovna Aug 29, 2022
051356d
fix: proxy
dilyararimovna Aug 29, 2022
d19cce1
Merge branch 'persona_bot' of https://github.com/dmitrymailk/dream in…
dilyararimovna Aug 29, 2022
ea82137
fix: params
dilyararimovna Aug 29, 2022
b791fb6
fix: volumes
dilyararimovna Aug 29, 2022
270326f
fix: reqs
dilyararimovna Aug 29, 2022
1569dfc
fix: tests
dilyararimovna Aug 29, 2022
ca5042b
fix: tests relative sents extr
dilyararimovna Aug 29, 2022
a42faa6
fix: batching
dilyararimovna Aug 30, 2022
56b72e3
fix: codestyle
dilyararimovna Aug 30, 2022
fd257db
fix: persona extractor tests
dilyararimovna Aug 30, 2022
9f2eed5
fix: persona get
dilyararimovna Aug 30, 2022
bcd32bf
fix: tests
dilyararimovna Aug 30, 2022
9142095
fix: imports
dilyararimovna Aug 30, 2022
1ddfa2b
fix: logs
dilyararimovna Aug 31, 2022
e3f43f6
fix: docs
dilyararimovna Aug 31, 2022
3b9baff
fix: add midas
dilyararimovna Aug 31, 2022
222071b
Merge remote-tracking branch 'origin/dev' into persona_bot
dilyararimovna Sep 16, 2022
f35fda6
fix: command
dilyararimovna Sep 16, 2022
9a8e663
Merge remote-tracking branch 'origin/dev' into persona_bot
dilyararimovna Sep 20, 2022
4c7dfdd
feat: add to main dist and tests
dilyararimovna Sep 20, 2022
28c7dc4
feat: remove infilling, add dialogpt persona based, docs
dilyararimovna Sep 21, 2022
aefd444
fix: gpus
dilyararimovna Sep 26, 2022
84ca2fd
fix: params
dilyararimovna Sep 27, 2022
191764c
fix: param
dilyararimovna Sep 27, 2022
6a7453f
Merge remote-tracking branch 'origin/dev' into persona_bot
dilyararimovna Sep 27, 2022
4cedb68
fix: indent
dilyararimovna Sep 27, 2022
fa51ce6
fix: remove infilling from tests
dilyararimovna Sep 28, 2022
999f185
Merge remote-tracking branch 'origin/dev' into persona_bot
dilyararimovna Oct 7, 2022
252 changes: 144 additions & 108 deletions README.md

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions annotators/sentence_ranker/Dockerfile
@@ -0,0 +1,23 @@
# syntax=docker/dockerfile:experimental

FROM pytorch/pytorch:1.5-cuda10.1-cudnn7-runtime

WORKDIR /src

ARG PRETRAINED_MODEL_NAME_OR_PATH
ENV PRETRAINED_MODEL_NAME_OR_PATH ${PRETRAINED_MODEL_NAME_OR_PATH}
ARG SERVICE_PORT
ENV SERVICE_PORT ${SERVICE_PORT}
# ARG N_HYPOTHESES_TO_GENERATE
# ENV N_HYPOTHESES_TO_GENERATE ${N_HYPOTHESES_TO_GENERATE}


COPY ./requirements.txt /src/requirements.txt
COPY ./persona_sentences.txt /src/persona_sentences.txt
RUN pip install -r /src/requirements.txt

RUN python -c "from sentence_transformers import SentenceTransformer;SentenceTransformer('${PRETRAINED_MODEL_NAME_OR_PATH}')"

COPY . /src

CMD gunicorn --workers=1 server:app -b 0.0.0.0:${SERVICE_PORT} --timeout=300
30 changes: 30 additions & 0 deletions annotators/sentence_ranker/persona_sentences.txt
@@ -0,0 +1,30 @@
I used to be ADDICTED to Diet Coke.
I hate puzzles.
I was in a commercial for my aunt and uncle’s store.
I have a scar on my left hand from where I cat bit me when I was babysitting when I was 12.
I used to try to sleep in my splits when I was a dancer because I wanted to have perfect flexibility.
I love being outside, but I hate smelling like outside.
I am afraid of jack-in-the-boxes even though I know when they are going to pop out.
In second grade, I fell off the counter I was standing on trying to get a mug for milk and I had to wear an ice pack in my pants to school for a couple weeks.
For the third grade play, the part I wanted the most was to play the mom.
I almost went to the University of Iowa to major in English and be on the dance team.
The high school I graduated from, I only went to for my senior year.
My sister, Caroline, and I get asked if we are twins all the time, and when people find out we aren’t actually twins, they always think she’s older.
I’m really close to my Dad.
When my parents first met Cam’s parents, they kept saying how Cam’s mom looked so familiar, but couldn’t figure out where they’d seen her before.
Cam and I planned for him to ask me out on November 15th because 15 was his favorite number.
When I was younger I was OBSESSED with Jesse McCartney in the way that some people nowadays are obsessed with One Direction.
I either get told I look like Hilary Duff or Leighton Meester. I’m ok with either of those!!
A lot of people can’t figure out what nationality I am. Some people say I look 100% American and some people think I’m Asian. I am 50% Honduran and my mom is from the states.
No, I am not fluent in Spanish.
I haven’t had un-painted toenails in 6 years. I just don’t like the way my toenails look without polish haha!!
I was a drill team officer in high school and it was one of the most challenging and rewarding positions.
I’m a literature nerd. I LOVE Shakespeare, Jane Austen, Charlotte Bronte, and anything Greek mythology.
I ALMOST got certified to be a Zumba instructor, but then I was transferring so I didn’t.
Oddly enough, I think my favorite food ever is tuna salad.
I’m obsessed with libraries and churches. When I’m in a new town, I like to wander around those if I can find them!
My favorite place in the world is Eureka Springs, Arkansas.
I didn’t get a cell phone until my freshman year of high school, and I didn’t get an iPhone until I was a junior in high school.
If I could have any job in the world, I would own a coffee shop that also sold art, flowers, and stationary.
I plan on working for myself at some point in my life.
I’m insanely good with state capitols.
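The persona file above holds one sentence per line, and `server.py` later in this diff reads it by splitting on newlines and dropping empty items. A minimal sketch of that loading step follows; the helper name `load_persona_sentences` and the temporary demo file are assumptions made for illustration and are not part of the PR.

```python
import os
import tempfile
from typing import List


def load_persona_sentences(path: str) -> List[str]:
    """Read one persona sentence per line, dropping empty lines."""
    with open(path, encoding="utf-8") as persona_file:
        return [line for line in persona_file.read().split("\n") if len(line) > 0]


# Demo on a temporary file holding two of the sentences above and a blank line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("I hate puzzles.\n\nNo, I am not fluent in Spanish.\n")
    demo_path = tmp.name

demo_sentences = load_persona_sentences(demo_path)
os.remove(demo_path)
print(demo_sentences)
```

Opening the file in a `with` block closes the handle as soon as the text has been read.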
11 changes: 11 additions & 0 deletions annotators/sentence_ranker/requirements.txt
@@ -0,0 +1,11 @@
transformers==4.20.1
flask==1.1.1
itsdangerous==2.0.1
gunicorn==19.9.0
requests==2.22.0
sentry-sdk[flask]==0.14.1
healthcheck==1.3.3
jinja2<=3.0.3
Werkzeug<=2.0.3
sentence-transformers==2.2.2
huggingface-hub==0.4.0
131 changes: 131 additions & 0 deletions annotators/sentence_ranker/server.py
@@ -0,0 +1,131 @@
from typing import List, Tuple
from sentence_transformers import SentenceTransformer
import logging
import time
import os

import sentry_sdk
import torch
from flask import Flask, request, jsonify
from sentry_sdk.integrations.flask import FlaskIntegration
sentry_sdk.init(dsn=os.getenv("SENTRY_DSN"), integrations=[FlaskIntegration()])

class SentenceRanker:
    def __init__(self,
                 persona_sentences=None,
                 sentence_model=None
                 ):
        """Pre-compute embeddings for the persona sentences.

        Args:
            persona_sentences (List[str]): sentences that make up the full persona. Defaults to None.
            sentence_model (SentenceTransformer): model that encodes a sentence into a vector. Defaults to None.
        """
        # Review comment (Collaborator): write this docstring in English.
        self.persona_sentences = persona_sentences
        self.sentence_model = sentence_model
        self.sentence_embeddings = self.sentence_model.encode(
            persona_sentences,
            convert_to_tensor=True
        )
        # cache for repeated queries
        self.ranked_sentences = {}

    def rank_sentences(self, query, k):
        """Return the top k persona sentences that are most similar to query.

        Args:
            query (List[str]): single-item list holding the sentence to match against
            k (int): number of sentences to return

        Returns:
            List[List[str], float]: ranked sentences and the highest cosine similarity among all sentences
        """
        key = f"{query}_{k}"
        if self.ranked_sentences.get(key, False):
            return self.ranked_sentences[key]

        user_sentence_embeddings = self.sentence_model.encode(query, convert_to_tensor=True)

        cos_sim_ranks = self.cos_sim(
            user_sentence_embeddings,
            self.sentence_embeddings
        )

        top_indices = torch.argsort(cos_sim_ranks, descending=True)
        max_similarity = float(cos_sim_ranks[top_indices][0])
        top_indices = list(top_indices[:k].cpu().numpy())
        similar_sentences = [self.persona_sentences[idx] for idx in top_indices]
        # store and return the same list so cached and fresh results have one type
        result = [similar_sentences, max_similarity]
        self.ranked_sentences[key] = result
        return result

    def cos_sim(self, a, b):
        """Return the cosine similarity between a and every row of b.

        K - number of sentences to compare against
        N - dimension of the embedding vector
        Args:
            a (torch.FloatTensor): shape (1, N)
            b (torch.FloatTensor): shape (K, N)

        Returns:
            torch.FloatTensor: shape (K,) tensor of cosine similarities
        """
        a_norm = torch.nn.functional.normalize(a, p=2, dim=1)
        b_norm = torch.nn.functional.normalize(b, p=2, dim=1)
        return torch.sum(a_norm * b_norm, dim=1)

logging.basicConfig(format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.INFO)
# logging.getLogger("werkzeug").setLevel("INFO")
logger = logging.getLogger(__name__)

PRETRAINED_MODEL_NAME_OR_PATH = os.environ.get("PRETRAINED_MODEL_NAME_OR_PATH")
# logging.info(f'PRETRAINED_MODEL_NAME_OR_PATH = {PRETRAINED_MODEL_NAME_OR_PATH}')
DEFAULT_CONFIDENCE = 10
N_HYPOTHESES_TO_GENERATE = int(os.environ.get("N_HYPOTHESES_TO_GENERATE", 1))
ZERO_CONFIDENCE = 0.0
MAX_HISTORY_DEPTH = 3
TOP_SIMILAR_SENTENCES = 5

try:
    sentence_model = SentenceTransformer(PRETRAINED_MODEL_NAME_OR_PATH)

    # read the persona file and close the handle right away
    with open("./persona_sentences.txt") as persona_file:
        persona = persona_file.read()
    persona_sentences = persona.split("\n")
    persona_sentences = [item for item in persona_sentences if len(item) > 0]

    sentence_ranker = SentenceRanker(
        persona_sentences=persona_sentences,
        sentence_model=sentence_model
    )
    logger.info("sentence_ranker is ready")
except Exception as e:
    sentry_sdk.capture_exception(e)
    logger.exception(e)
    raise e

app = Flask(__name__)

@app.route("/response", methods=["POST"])
def respond():
    # initialised before the try block so the return below always has a value
    process_result = []
    try:
        dialogs = request.json.get("dialogs", [])
        # take the last human utterance and read the sentseg annotator's output from it
        context_str = dialogs[0]["human_utterances"][-1]["annotations"]['sentseg']['punct_sent']
        max_likelihood_sentences, max_sentence_similarity = sentence_ranker.rank_sentences(
            [context_str],
            k=TOP_SIMILAR_SENTENCES
        )

        process_result.append([
            max_likelihood_sentences,
            max_sentence_similarity
        ])

    except Exception as exc:
        logger.exception(exc)
        sentry_sdk.capture_exception(exc)

    return jsonify(
        process_result
    )
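The ranking step in `SentenceRanker.rank_sentences` can be tried without downloading a model. The sketch below is not the PR's code: it swaps the SentenceTransformer embeddings for hand-written toy vectors and uses plain Python lists instead of torch tensors. The names `cosine` and `rank_by_cosine`, the shortened sentences, and the 2-d vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(
    query_vec: Sequence[float],
    sentence_vecs: List[Sequence[float]],
    sentences: List[str],
    k: int,
) -> Tuple[List[str], float]:
    """Return the k sentences most similar to the query and the best score."""
    scores = [cosine(query_vec, vec) for vec in sentence_vecs]
    # indices sorted from most to least similar
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = [sentences[i] for i in order[:k]]
    best = scores[order[0]] if order else 0.0
    return top, best


# Toy 2-d "embeddings"; real ones would come from the sentence model.
sentences = ["I hate puzzles.", "I love Shakespeare.", "I like tuna salad."]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
top_sentences, best_score = rank_by_cosine([0.0, 2.0], vectors, sentences, k=2)
print(top_sentences, best_score)
```

As in `server.py`, the returned score is the highest similarity over all sentences, so a caller can use it to judge how relevant the persona is to the query at all.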
4 changes: 4 additions & 0 deletions assistant_dists/dream_mini/cpu.yml
@@ -8,6 +8,10 @@ services:
     environment:
       DEVICE: cpu
       CUDA_VISIBLE_DEVICES: ""
+  dialogpt-persona:
+    environment:
+      DEVICE: cpu
+      CUDA_VISIBLE_DEVICES: ""
Collaborator

the sentence ranker too

   intent-catcher:
     environment:
       DEVICE: cpu
13 changes: 13 additions & 0 deletions assistant_dists/dream_mini/dev.yml
@@ -57,4 +57,17 @@ services:
       - "./services/dialogpt:/src"
     ports:
       - 8125:8125
+
+  dialogpt-persona:
+    volumes:
+      - "./services/dialogpt_persona:/src"
+    ports:
+      - 8131:8131
+
+  sentence-ranker:
+    volumes:
+      - "./annotators/sentence_ranker:/src"
+    ports:
+      - 8130:8130
+
 version: "3.7"
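With port 8130 published by `dev.yml`, the sentence ranker can be called from the host. The sketch below only builds the request body that `respond()` in `server.py` reads and unpacks the reply shape it returns. The helper names `build_payload` and `parse_reply` and the sample strings are assumptions made for illustration; no network call is made here.

```python
import json
from typing import Any, Dict, List, Tuple


def build_payload(punct_sent: str) -> Dict[str, Any]:
    """Build the smallest payload that carries a sentseg-annotated utterance."""
    utterance = {"annotations": {"sentseg": {"punct_sent": punct_sent}}}
    return {"dialogs": [{"human_utterances": [utterance]}]}


def parse_reply(body: str) -> Tuple[List[str], float]:
    """Unpack the first [sentences, similarity] pair from the JSON reply."""
    results = json.loads(body)
    if not results:
        # the reply may be an empty list when the handler caught an exception
        return [], 0.0
    sentences, similarity = results[0]
    return list(sentences), float(similarity)


payload = build_payload("do you like books?")
print(json.dumps(payload))
print(parse_reply('[[["I hate puzzles."], 0.42]]'))
```

Against a running container, the payload could be sent with `requests.post("http://localhost:8130/response", json=payload)`, assuming the `8130:8130` mapping above is active.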
67 changes: 51 additions & 16 deletions assistant_dists/dream_mini/docker-compose.override.yml
@@ -2,12 +2,10 @@ services:
   agent:
     command: sh -c 'bin/wait && python -m deeppavlov_agent.run -ch http_client -pl assistant_dists/dream_mini/pipeline_conf.json --cors'
     environment:
-      WAIT_HOSTS: "convers-evaluator-annotator:8004, dff-program-y-skill:8008, sentseg:8011, convers-evaluation-selector:8009,
-        dff-intent-responder-skill:8012, intent-catcher:8014, badlisted-words:8018,
-        spelling-preprocessing:8074, dialogpt:8125"
+      WAIT_HOSTS: "convers-evaluator-annotator:8004, dff-program-y-skill:8008, sentseg:8011, convers-evaluation-selector:8009, dff-intent-responder-skill:8012, intent-catcher:8014, badlisted-words:8018, spelling-preprocessing:8074, dialogpt:8125, dialogpt-persona:8131, sentence-ranker:8130"
Collaborator

Please put this back on several lines; it is easier to read that way.

       WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-480}
   convers-evaluator-annotator:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       args:
         CONFIG: conveval.json
@@ -27,7 +25,7 @@ services:
           memory: 2G

   dff-program-y-skill:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       args:
         SERVICE_PORT: 8008
@@ -43,9 +41,8 @@ services:
         reservations:
           memory: 1024M

-
   sentseg:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       context: ./annotators/SentSeg/
     command: flask run -h 0.0.0.0 -p 8011
@@ -59,18 +56,18 @@ services:
           memory: 1.5G

   convers-evaluation-selector:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       args:
-        TAG_BASED_SELECTION: 1
+        TAG_BASED_SELECTION: 0
         CALL_BY_NAME_PROBABILITY: 0.5
-        PROMPT_PROBA: 0.3
+        PROMPT_PROBA: 0
         ACKNOWLEDGEMENT_PROBA: 0.3
         PRIORITIZE_WITH_REQUIRED_ACT: 1
         PRIORITIZE_NO_DIALOG_BREAKDOWN: 0
-        PRIORITIZE_WITH_SAME_TOPIC_ENTITY: 1
+        PRIORITIZE_WITH_SAME_TOPIC_ENTITY: 0
         IGNORE_DISLIKED_SKILLS: 0
-        GREETING_FIRST: 1
+        GREETING_FIRST: 0
Collaborator

Revert the parameters to the old values.

         RESTRICTION_FOR_SENSITIVE_CASE: 1
         PRIORITIZE_PROMTS_WHEN_NO_SCRIPTS: 0
         ADD_ACKNOWLEDGMENTS_IF_POSSIBLE: 1
@@ -105,15 +102,15 @@ services:
           memory: 128M

   intent-catcher:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       context: .
       dockerfile: ./annotators/IntentCatcherTransformers/Dockerfile
       args:
         SERVICE_PORT: 8014
         CONFIG_NAME: intents_model_dp_config.json
         INTENT_PHRASES_PATH: intent_phrases.json
-    command: python -m flask run -h 0.0.0.0 -p 8014
+    command: python -m flask run -h 0.0.0.0 -p 8014
     environment:
       - FLASK_APP=server
       - CUDA_VISIBLE_DEVICES=0
@@ -125,7 +122,7 @@ services:
           memory: 3.5G

   badlisted-words:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       context: annotators/BadlistedWordsDetector/
     command: flask run -h 0.0.0.0 -p 8018
@@ -139,7 +136,7 @@ services:
           memory: 256M

   spelling-preprocessing:
-    env_file: [.env]
+    env_file: [ .env ]
     build:
       context: ./annotators/spelling_preprocessing/
     command: flask run -h 0.0.0.0 -p 8074
@@ -172,4 +169,42 @@ services:
         reservations:
           memory: 2G

+  dialogpt-persona:
+    env_file: [ .env ]
+    build:
+      args:
+        SERVICE_PORT: 8131
+        SERVICE_NAME: dialogpt_persona
+        PRETRAINED_MODEL_NAME_OR_PATH: dim/dialogpt-medium-persona-chat
+      context: ./services/dialogpt_persona/
+    command: flask run -h 0.0.0.0 -p 8131
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 2G
+        reservations:
+          memory: 2G
+
+  sentence-ranker:
+    env_file: [ .env ]
+    build:
+      args:
+        SERVICE_PORT: 8130
+        SERVICE_NAME: sentence_ranker
+        PRETRAINED_MODEL_NAME_OR_PATH: 'sentence-transformers/nli-distilroberta-base-v2'
Collaborator

The quotes are not needed here.

+      context: ./annotators/sentence_ranker/
+    command: flask run -h 0.0.0.0 -p 8130
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 1G
+        reservations:
+          memory: 1G
+
 version: '3.7'
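The `WAIT_HOSTS` value in this file is a comma-separated list of `host:port` entries, and the PR extends it with `dialogpt-persona:8131` and `sentence-ranker:8130`. The sketch below shows one way to split such a value into pairs, for example to check that every new service is listed. It does not reproduce what `bin/wait` does internally; the helper name `parse_wait_hosts` and the shortened sample value are assumptions made for illustration.

```python
from typing import List, Tuple


def parse_wait_hosts(value: str) -> List[Tuple[str, int]]:
    """Split a 'host:port, host:port' string into (host, port) pairs."""
    pairs = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and line breaks
        host, _, port = item.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


# Shortened sample value; the real list in the file is longer.
hosts = parse_wait_hosts("sentseg:8011, dialogpt:8125, dialogpt-persona:8131, sentence-ranker:8130")
print(hosts)
```

Because entries are stripped of surrounding whitespace, the same helper handles both the single-line and the multi-line form of the value.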