Skip to content

Latest commit

 

History

History
60 lines (47 loc) · 3.81 KB

W1+W2+W3.md

File metadata and controls

60 lines (47 loc) · 3.81 KB

This section describes how to run Hume with DART as a BYOD system.

System Preparation

For this mode, you will need to prepare a config file.

You will also need to create a new working folder, we'll use runtime as an example in this document.

Docker Config File

The config file is a json file that must be visible at runtime from /extra/config.json inside the container. It's modified from https://github.com/twosixlabs-dart/python-kafka-consumer/blob/master/pyconsumer/resources/env/test.json. An example is provided as follows:

Note: Do not change unannotated fields in the example unless you really know what you're doing.

{
  "kafka.bootstrap.servers": "str,required: example is wm-ingest-pipeline-streaming-1.prod.dart.worldmodelers.com:9093",
  "auth": {
    "username": "str,required. If auth is not required, please delete auth dict directly",
    "password": "str,required"
  },
  "app": {
    "id": "hume",
    "auto_offset_reset": "earliest",
    "enable_auto_commit": false
  },
  "topic": {
    "from": "str,required: topic to listen to from kafka"
  },
  "CDR_retrieval": "str,required: example is https://wm-ingest-pipeline-rest-1.prod.dart.worldmodelers.com/dart/api/v1/cdrs",
  "DART_upload": "str,required: example is https://wm-ingest-pipeline-rest-1.prod.dart.worldmodelers.com/dart/api/v1/readers",
  "Ontology_retrieval": "str,required: example is https://wm-ingest-pipeline-rest-1.prod.dart.worldmodelers.com/dart/api/v1/ontologies",
  "DART_upload_labels": [
    "from_docker"
  ],
  "hume.domain": "WM",
  "hume.num_of_vcpus": "int, required: Number of cpu cores available to you. It at least needs to be 2, and please only include number of physical cores instead of SMT cores.",
  "hume.tmp_dir": "required,str: a bind point for sharing data in between your local system and docker instance",
  "hume.streaming.mini_batch_size": "int, required: For streaming mode, this deterimines the size of a mini batch (which is used for improved efficiency)",
  "hume.streaming.mini_batch_wait_time": "int, required: For streaming mode, even if we received fewer documents than mini_batch_size, the system will start processing after mini_batch_wait_time seconds.",
  "hume.streaming.keep_pipeline_data": "bool, optional: When enabled all intermediate files during processing will be preserved. This is mainly a debugging switch for developer.",
  "hume.streaming.skip_processed": "bool, optional: When enabled, we'll keep a record of processed (document_id, ontology_id) tuples in a file under hume.tmp_dir/processed.log and if we see the same kafka reading request, we'll ignore it. ",
  "hume.use_regrounding_cache": "bool, optional: When enabled, the system will save intermediate files of processing result into a directory that if we see the same document later, we can skip certain processing steps and only kick off regrounding pipeline.",
  "hume.regrounding_cache_path": "str, required when hume.use_regrounding_cache is true: a persist directory for hosting regrounding cache. Ideally to be a persist storage that can be shared in between docker instances. Also please don't reuse grounding cache from OIAD.",
  "hume.cdr_cache_dir": "str or null, optional: When specified, Hume will use the path from inside docker to cache CDRs locally so when new reading requests come as long as it's found in local cache, it won't goto API for fetching it again"
}

Please copy the config file to runtime/config.json

To Run the Hume System

Run Hume with the following command:

docker run -it -v runtime:/extra docker.io/wmbbn/hume:R2022_03_21 /usr/local/envs/py3-jni/bin/python3 /wm_rootfs/git/Hume/src/python/dart_integration/streaming_processing.py

Upon succeessful deployment, any resulting json_ld CAGs will be uploaded back to DART at DART_upload and the system will continue to listen for kafka messages until you stop it.