deeppavlov · dilyararimovna · Jun 5, 2023 · Apr 4, 2023 · Apr 5, 2023 · Apr 5, 2023
diff --git a/assistant_dists/dream_google_api/README.md b/assistant_dists/dream_google_api/README.md
@@ -0,0 +1,125 @@
+# Dream Prompted Distribution
+
+**_This distribution can serve as a TEMPLATE for a prompt-based distribution containing any number of
+prompt-based skills, each conditioned on a single prompt throughout the whole conversation._**
+
+**Note!** Each Prompt-based Skill utilizes the **same prompt during the whole dialog**!
+
+# What is Dream Prompted Distribution
+
+Dream Prompted distribution is an example of a prompt-based dialogue system that contains one prompt-based skill;
+here, the prompt is a persona description.
+
+Dream Prompted distribution contains the following skills:
+* Dummy Skill (`dummy_skill`) is a fallback skill (it is part of the agent container, so no separate container is required)
+* DFF Dream Persona Prompted Skill (`dff_dream_persona_prompted_skill`) is a skill created via DFF (Dialog Flow Framework)
+which generates a response to the current dialogue context taking into account the given prompt, i.e., the bot's persona description.
+
+### DFF Dream Persona Prompted Skill
+
+The **DFF Dream Persona Prompted Skill** is a lightweight container that sends requests to the generative service,
+which uses a neural network for prompt-based generation.
+DFF Dream Persona Prompted Skill accepts three main environment variables:
+  * `PROMPT_FILE` contains a path to a JSON file containing a dictionary with the prompt,
+  * `GENERATIVE_SERVICE_URL` contains a URL of the generative service to be used.
+  The service must use the same input-output format as Transformers-LM (`transformers_lm`).
+  * `N_UTTERANCES_CONTEXT` contains the length of the considered context, in number of dialogue utterances.
+
+**Note!** DFF Dream Persona Prompted Skill uses a special universal template, `skills/dff_template_prompted_skill`,
+which does not require creating a new skill directory. To create a new skill,
+use the same template folder but specify another prompt file, service port, and container name.
+
+### Prompt Selector
+
+The distribution may contain **several Prompt-based skills.** Therefore, the **Prompt Selector** component is provided.
+The Prompt Selector is also a lightweight container. It uses the **Sentence Ranker** component
+(its URL is given in the `.env` file as `SENTENCE_RANKER_SERVICE_URL`) to select the `N_SENTENCES_TO_RETURN`
+most relevant prompts among the given ones (precisely, it returns an ordered list of prompt names).
+The comma-separated list of prompt names to consider is given in the environment variable `PROMPTS_TO_CONSIDER`.
+Each considered prompt should be located at `dream/common/prompts/<prompt_name>.json`.
+
+**Note!** In the Dream Persona Prompted Distribution we give the Prompt Selector a list of prompts, `dream_persona,pizza`,
+separated with a comma, just to demonstrate the input format of `PROMPTS_TO_CONSIDER`. In fact,
+Dream Persona Prompted Distribution contains only one prompted skill, which uses the Dream Persona prompt.
+
+### Skill Selector
+
+No changes to the Skill Selector are needed: it automatically calls all the skills with the most relevant prompts
+according to the Prompt Selector. If Prompt Selector annotations are detected in the user utterance,
+the Skill Selector turns on the skill named `dff_<prompt_name>_prompted_skill` for each `prompt_name` among the
+`N_SENTENCES_TO_RETURN` most relevant prompts detected by the Prompt Selector.
+Therefore, a prompt name can contain `'_'` but not `'-'`.
+
+**Note!** You may pass prompt names to the Prompt Selector
+even if the corresponding skills are not present in the distribution. For example, if you specify 5 prompt names
+while your distribution contains only 2 prompted skills, and you set the number of returned most relevant prompts
+(`N_SENTENCES_TO_RETURN`) to 3, the Prompt Selector may choose only prompts for which
+you do not have skills. In that case, the response on that step will be provided by other skills present in the distribution
+(in particular, by Dummy Skill for the current version of Dream Prompted distribution).
+
+# How to Create a New Prompted Distribution
+
+If one wants to create a new prompted distribution (distribution containing prompt-based skill(s)), one should:
+
+1. Copy the `dream/assistant_dists/dream_persona_prompted` directory to `dream/assistant_dists/dream_custom_prompted`
+(the name is an example!).
+2. **For each prompt-based skill, one needs to**:
+   1. create a `dream/common/prompts/<prompt_name>.json` file containing a prompt.
+   **Important!** `<prompt_name>` should only contain letters, numbers and underscores (`_`) but no dashes (`-`)!
+   2. in the `dream/assistant_dists/dream_custom_prompted/` folder, in the files `docker-compose.override.yml` and `dev.yml`,
+   copy the description of the `dream-persona` container and replace the strings `dream-persona` with `<prompt-name>`
+   (container names use dashes) and
+   `dream_persona` with `<prompt_name>` (component names use underscores).
+   If your prompt name is written as a single word
+   (for example, `spacexfaq` not `spacex_faq`), replace both `dream-persona` and `dream_persona` with your prompt name.
+   3. for each new container (a new container for each new DFF skill) change the `SERVICE_PORT` 
+   to an unused one.
+3. Choose the generative service to be used. To do that, one needs to:
+   1. in the `dream/assistant_dists/dream_custom_prompted/` folder, in the files `docker-compose.override.yml` and `dev.yml`,
+   replace the `transformers-lm-gptj` container description with a new one.
+   In particular, in the `PRETRAINED_MODEL_NAME_OR_PATH` parameter one may replace
+   the Language Model (LM) in use, `GPT-J`, with another one from the `Transformers` library.
+   Please change the port (`8130` for `transformers-lm-gptj`) to an unused one.
+   2. in all prompted skills' container descriptions, change `GENERATIVE_SERVICE_URL` to point to your generative model.
+   Take into account that the service URL is constructed as `http://<container-name>:<port>/<endpoint>`.
+4. For each prompted skill, one needs to create an input state formatter. To do that, one needs to:
+   1. in `dream/state_formatters/dp_formatters.py` duplicate the function:
+   ```python
+   def dff_dream_persona_prompted_skill_formatter(dialog):
+       return utils.dff_formatter(
+           dialog, "dff_dream_persona_prompted_skill",
+           types_utterances=["human_utterances", "bot_utterances", "utterances"]
+       )
+   ```
+   2. replace the string `dream_persona` with `<prompt_name>` (component names use underscores) in each duplicated function.
+5. In `dream/assistant_dists/dream_custom_prompted/pipeline_conf.json` for each prompt-based skill, one needs to:
+   1. copy the description of DFF Dream Persona Prompted Skill:
+   ```json
+            "dff_dream_persona_prompted_skill": {
+                "connector": {
+                    "protocol": "http",
+                    "timeout": 4.5,
+                    "url": "http://dff-dream-persona-gpt-j-prompted-skill:8134/respond"
+                },
+                "dialog_formatter": "state_formatters.dp_formatters:dff_dream_persona_prompted_skill_formatter",
+                "response_formatter": "state_formatters.dp_formatters:skill_with_attributes_formatter_service",
+                "previous_services": [
+                    "skill_selectors"
+                ],
+                "state_manager_method": "add_hypothesis"
+            },
+   ```
+   2. replace the strings `dream-persona` with `<prompt-name>` (container names use dashes) and
+   `dream_persona` with `<prompt_name>` (component names use underscores). This changes the container name,
+   skill name, and formatter name.
+   3. replace the port (`8134` in the example) with the one assigned in
+   `dream/assistant_dists/dream_custom_prompted/docker-compose.override.yml`.
+6. If one does not want to keep DFF Dream Persona Prompted Skill in their distribution, one should remove all mentions
+of the DFF Dream Persona Prompted Skill container from the `yml` configs and the `pipeline_conf.json` file.
+
+**Note!** Naming each skill after its `<prompt_name>` according to the instructions above
+is essential: it lets the Skill Selector automatically turn on the prompt-based skills whose prompts are returned as
+the `N_SENTENCES_TO_RETURN` most relevant ones.
+
+
+
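The naming rules described in the README above can be sketched in Python. This is a minimal illustration, not code from the repository: the helper names `validate_prompt_name`, `skill_name_for`, `dashed`, and `skills_to_turn_on` are hypothetical, and the selection logic only mirrors the described behaviour (turn on `dff_<prompt_name>_prompted_skill` for each returned prompt whose skill is present in the distribution).

```python
import re

# Prompt names may contain only letters, digits and underscores (no dashes).
_PROMPT_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def validate_prompt_name(prompt_name: str) -> str:
    """Return the name unchanged, or raise ValueError if it breaks the naming rule."""
    if not _PROMPT_NAME_RE.fullmatch(prompt_name):
        raise ValueError(f"invalid prompt name: {prompt_name!r}")
    return prompt_name


def skill_name_for(prompt_name: str) -> str:
    """Component (skill) names use underscores: dff_<prompt_name>_prompted_skill."""
    return f"dff_{validate_prompt_name(prompt_name)}_prompted_skill"


def dashed(prompt_name: str) -> str:
    """Container names use dashes, so dream_persona becomes dream-persona."""
    return validate_prompt_name(prompt_name).replace("_", "-")


def skills_to_turn_on(selected_prompts, available_skills):
    """Keep only the prompted skills that are actually present in the distribution."""
    wanted = [skill_name_for(name) for name in selected_prompts]
    return [skill for skill in wanted if skill in available_skills]


if __name__ == "__main__":
    # Hypothetical distribution: one prompted skill plus the fallback skill.
    available = {"dff_dream_persona_prompted_skill", "dummy_skill"}
    print(skill_name_for("dream_persona"))
    print(dashed("dream_persona"))
    print(skills_to_turn_on(["pizza", "dream_persona"], available))
    print(skills_to_turn_on(["pizza"], available))
```

When the last call returns an empty list, no prompted skill is turned on, which corresponds to the case in the README where another skill (for example, Dummy Skill) provides the response.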
diff --git a/assistant_dists/dream_google_api/cpu.yml b/assistant_dists/dream_google_api/cpu.yml
@@ -0,0 +1,14 @@
+version: '3.7'
+services:
+  combined-classification:
+    environment:
+      DEVICE: cpu
+      CUDA_VISIBLE_DEVICES: ""
+  sentence-ranker:
+    environment:
+      DEVICE: cpu
+      CUDA_VISIBLE_DEVICES: ""
+  transformers-lm-gptj:
+    environment:
+      DEVICE: cpu
+      CUDA_VISIBLE_DEVICES: ""
diff --git a/assistant_dists/dream_google_api/db_conf.json b/assistant_dists/dream_google_api/db_conf.json
@@ -0,0 +1,6 @@
+{
+    "host": "DB_HOST",
+    "port": "DB_PORT",
+    "name": "DB_NAME",
+    "env": true
+}
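One plausible reading of `"env": true` in this file is that the other values (`DB_HOST`, `DB_PORT`, `DB_NAME`) are names of environment variables rather than literal settings. The diff does not confirm this, so the sketch below is an assumption: `resolve_db_conf` is a hypothetical helper, and the sample environment values are made up for illustration.

```python
import os


def resolve_db_conf(conf, environ=None):
    """Resolve a db_conf-style dict.

    Assumption: when conf["env"] is true, every other value is the NAME of an
    environment variable to read; otherwise the values are used literally.
    """
    environ = os.environ if environ is None else environ
    settings = {key: value for key, value in conf.items() if key != "env"}
    if not conf.get("env"):
        return settings
    return {key: environ.get(var_name) for key, var_name in settings.items()}


if __name__ == "__main__":
    conf = {"host": "DB_HOST", "port": "DB_PORT", "name": "DB_NAME", "env": True}
    # Hypothetical environment, for illustration only.
    fake_env = {"DB_HOST": "mongo", "DB_PORT": "27017", "DB_NAME": "example_db"}
    print(resolve_db_conf(conf, fake_env))
```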
diff --git a/assistant_dists/dream_google_api/dev.yml b/assistant_dists/dream_google_api/dev.yml
@@ -0,0 +1,56 @@
+# These volumes make debugging easier: no need to rebuild the container every time the code changes
+services:
+  agent:
+    volumes:
+      - ".:/dp-agent"
+    ports:
+      - 4242:4242
+  sentseg:
+    volumes:
+      - "./annotators/SentSeg:/src"
+    ports:
+      - 8011:8011
+  convers-evaluation-no-scripts-selector:
+    volumes:
+      - "./response_selectors/convers_evaluation_based_selector:/src"
+      - "./common:/src/common"
+    ports:
+      - 8009:8009
+  badlisted-words:
+    volumes:
+      - "./annotators/BadlistedWordsDetector:/src"
+      - "./common:/src/common"
+    ports:
+      - 8018:8018
+  spelling-preprocessing:
+    volumes:
+      - "./annotators/spelling_preprocessing:/src"
+    ports:
+      - 8074:8074
+  combined-classification:
+    volumes:
+      - "./common:/src/common"
+      - "./annotators/combined_classification:/src"
+    ports:
+      - 8087:8087
+  sentence-ranker:
+    volumes:
+      - "./services/sentence_ranker:/src"
+      - "~/.deeppavlov/cache:/root/.cache"
+    ports:
+      - 8128:8128
+  dialogpt:
+    volumes:
+      - "./services/dialogpt:/src"
+      - "./common:/src/common"
+      - "~/.deeppavlov/cache:/root/.cache"
+    ports:
+      - 8125:8125
+  dff-google-api-skill:
+    volumes:
+      - "./skills/dff_google_api_skill:/src"
+      - "./common:/src/common"
+    ports:
+      - 8156:8156
+
+version: "3.7"
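The services in this file generally publish a container port under the same host port number (for example `8011:8011`). A minimal sketch of a check for that convention follows; `split_mapping` and `mismatched` are hypothetical helper names, the sample entries are typed in by hand rather than parsed from YAML, and `example-service` is an invented entry used only to show a mismatch.

```python
def split_mapping(mapping: str):
    """Split a short-syntax 'HOST:CONTAINER' port mapping into two integers."""
    host, container = mapping.split(":")
    return int(host), int(container)


def mismatched(ports_by_service: dict) -> dict:
    """Return the services whose host port differs from the container port."""
    result = {}
    for service, mapping in ports_by_service.items():
        host, container = split_mapping(mapping)
        if host != container:
            result[service] = mapping
    return result


if __name__ == "__main__":
    sample = {
        "sentseg": "8011:8011",
        "sentence-ranker": "8128:8128",
        "example-service": "9000:9001",  # hypothetical entry, not from the diff
    }
    print(mismatched(sample))
```

A mismatch is not necessarily an error, since Compose allows different host and container ports, but the container-side number has to be the port the service actually listens on.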
diff --git a/assistant_dists/dream_google_api/docker-compose.override.yml b/assistant_dists/dream_google_api/docker-compose.override.yml
@@ -0,0 +1,167 @@
+services:
+  agent:
+    command: sh -c 'bin/wait && python -m deeppavlov_agent.run agent.pipeline_config=assistant_dists/dream_google_api/pipeline_conf.json'
+    environment:
+      WAIT_HOSTS: "sentseg:8011, convers-evaluation-no-scripts-selector:8009, badlisted-words:8018, combined-classification:8087, 
+        spelling-preprocessing:8074, sentence-ranker:8128, dialogpt:8125, dff-google-api-skill:8156"
+      WAIT_HOSTS_TIMEOUT: ${WAIT_TIMEOUT:-1000}
+
+  sentseg:
+    env_file: [ .env ]
+    build:
+      context: ./annotators/SentSeg/
+    command: flask run -h 0.0.0.0 -p 8011
+    environment:
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 1.5G
+        reservations:
+          memory: 1.5G
+
+  combined-classification:
+    env_file: [ .env ]
+    build:
+      args:
+        CONFIG: combined_classifier.json
+        SERVICE_PORT: 8087
+      context: .
+      dockerfile: ./annotators/combined_classification/Dockerfile
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
+    deploy:
+      resources:
+        limits:
+          memory: 2G
+        reservations:
+          memory: 2G
+
+  convers-evaluation-no-scripts-selector:
+    env_file: [ .env ]
+    build:
+      args:
+        TAG_BASED_SELECTION: 1
+        CALL_BY_NAME_PROBABILITY: 0.5
+        PROMPT_PROBA: 0.1
+        ACKNOWLEDGEMENT_PROBA: 0.3
+        PRIORITIZE_WITH_REQUIRED_ACT: 0
+        PRIORITIZE_NO_DIALOG_BREAKDOWN: 0
+        PRIORITIZE_WITH_SAME_TOPIC_ENTITY: 0
+        IGNORE_DISLIKED_SKILLS: 0
+        GREETING_FIRST: 1
+        RESTRICTION_FOR_SENSITIVE_CASE: 1
+        PRIORITIZE_PROMTS_WHEN_NO_SCRIPTS: 0
+        MAX_TURNS_WITHOUT_SCRIPTS: 7
+        ADD_ACKNOWLEDGMENTS_IF_POSSIBLE: 1
+        PRIORITIZE_SCRIPTED_SKILLS: 0
+        CONFIDENCE_STRENGTH: 0.8
+        CONV_EVAL_STRENGTH: 0.4
+        PRIORITIZE_HUMAN_INITIATIVE: 1
+        QUESTION_TO_QUESTION_DOWNSCORE_COEF: 0.8
+      context: .
+      dockerfile: ./response_selectors/convers_evaluation_based_selector/Dockerfile
+    command: flask run -h 0.0.0.0 -p 8009
+    environment:
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 100M
+        reservations:
+          memory: 100M
+
+  badlisted-words:
+    env_file: [ .env ]
+    build:
+      args:
+        SERVICE_PORT: 8018
+        SERVICE_NAME: badlisted_words
+      context: annotators/BadlistedWordsDetector/
+    command: flask run -h 0.0.0.0 -p 8018
+    environment:
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 256M
+        reservations:
+          memory: 256M
+
+  spelling-preprocessing:
+    env_file: [ .env ]
+    build:
+      args:
+        SERVICE_PORT: 8074
+        SERVICE_NAME: spelling_preprocessing
+      context: ./annotators/spelling_preprocessing/
+    command: flask run -h 0.0.0.0 -p 8074
+    environment:
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 100M
+        reservations:
+          memory: 100M
+
+  sentence-ranker:
+    env_file: [ .env ]
+    build:
+      args:
+        SERVICE_PORT: 8128
+        SERVICE_NAME: sentence_ranker
+        PRETRAINED_MODEL_NAME_OR_PATH: sentence-transformers/bert-base-nli-mean-tokens
+      context: ./services/sentence_ranker/
+    command: flask run -h 0.0.0.0 -p 8128
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 3G
+        reservations:
+          memory: 3G
+
+  dialogpt:
+    env_file: [ .env ]
+    build:
+      args:
+        SERVICE_PORT: 8125
+        SERVICE_NAME: dialogpt
+        PRETRAINED_MODEL_NAME_OR_PATH: microsoft/DialoGPT-medium
+        N_HYPOTHESES_TO_GENERATE: 5
+        CONFIG_NAME: dialogpt_en.json
+        MAX_HISTORY_DEPTH: 2
+      context: .
+      dockerfile: ./services/dialogpt/Dockerfile
+    command: flask run -h 0.0.0.0 -p 8125
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
+      - FLASK_APP=server
+    deploy:
+      resources:
+        limits:
+          memory: 2G
+        reservations:
+          memory: 2G
+
+  dff-google-api-skill:
+    env_file: [ .env,.env_secret ]
+    build:
+      args:
+        SERVICE_PORT: 8156
+        SERVICE_NAME: dff_google_api_skill
+        ENVVARS_TO_SEND: OPENAI_API_KEY,GOOGLE_CSE_ID,GOOGLE_API_KEY
+      context: .
+      dockerfile: ./skills/dff_google_api_skill/Dockerfile
+    command: gunicorn --workers=1 server:app -b 0.0.0.0:8156 --reload
+    deploy:
+      resources:
+        limits:
+          memory: 128M
+        reservations:
+          memory: 128M
+
+version: '3.7'
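`WAIT_HOSTS` in the `agent` service is a comma-separated list of `host:port` pairs, and it appears to name every service the agent waits for before starting. A minimal sketch, assuming that format, parses the value and reports expected services that are missing or listed with a different port. `parse_wait_hosts` and `missing_or_wrong` are hypothetical names, and the expected ports are copied by hand from the `SERVICE_PORT` and `-p` values in this file.

```python
def parse_wait_hosts(value: str) -> dict:
    """Parse 'host:port, host:port, ...' into a {host: port} dict."""
    hosts = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, port = item.rsplit(":", 1)
        hosts[host] = int(port)
    return hosts


def missing_or_wrong(wait_hosts: dict, expected: dict) -> dict:
    """Return expected services that are absent from wait_hosts or have another port."""
    return {name: port for name, port in expected.items() if wait_hosts.get(name) != port}


# Value copied from the agent service above (the line break becomes a space).
WAIT_HOSTS = (
    "sentseg:8011, convers-evaluation-no-scripts-selector:8009, badlisted-words:8018, "
    "combined-classification:8087, spelling-preprocessing:8074, sentence-ranker:8128, "
    "dialogpt:8125, dff-google-api-skill:8156"
)

# Ports the services are started on, copied by hand from this file.
EXPECTED = {
    "sentseg": 8011,
    "sentence-ranker": 8128,
    "dialogpt": 8125,
    "dff-google-api-skill": 8156,
}

if __name__ == "__main__":
    parsed = parse_wait_hosts(WAIT_HOSTS)
    print(len(parsed))
    print(missing_or_wrong(parsed, EXPECTED))
```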
diff --git a/assistant_dists/dream_google_api/gpu1.yml b/assistant_dists/dream_google_api/gpu1.yml
@@ -0,0 +1,25 @@
+services:
+  agent:
+    restart: unless-stopped
+    volumes:
+      - "/cephfs/home/ignatov/artifacts:/output"
+      - ".:/dp-agent"
+    ports:
+      - ${AGENT_PORT}:4242
+  combined-classification:
+    restart: unless-stopped
+    environment:
+      - CUDA_VISIBLE_DEVICES=1
+  mongo:
+    restart: unless-stopped
+    command: mongod
+    image: mongo:4.0.0
+  sentence-ranker:
+    restart: unless-stopped
+    environment:
+      - CUDA_VISIBLE_DEVICES=1
+  transformers-lm-gptj:
+    restart: unless-stopped
+    environment:
+      - CUDA_VISIBLE_DEVICES=0
+version: '3.7'