The Evaluation Workbench (EWB) API Dockers form a multi-container application that includes the Solr cluster and the REST APIs for the Topic Modeling, Inference, and Classification services. The application is orchestrated with a docker-compose script, which connects all services through the ewb-net network.
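As a rough illustration of the orchestration, the sketch below derives one valid start order from the dependencies named in the service descriptions in this document. The mapping is transcribed by hand for illustration and may be incomplete; the docker-compose script remains the source of truth.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, as stated in the service descriptions.
# Assumption: this hand-written mapping mirrors the compose file; it may be incomplete.
DEPENDS_ON = {
    "ewb-zoo": set(),
    "ewb-solr": {"ewb-zoo"},
    "ewb-solr-config": {"ewb-solr", "ewb-zoo"},
    "ewb-tm": {"ewb-solr"},
}


def startup_order(depends_on):
    """Return one start order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))
```

Several orders can be valid at once (for example, ewb-tm and ewb-solr-config may start in either order), so the sketch guarantees only that dependencies come first.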
The Topic Modeling service comprises a RESTful API that uses the Solr search engine for data storage and retrieval. It indexes logical corpora and their associated topic models, formatted according to the specifications provided by the topicmodeler, and supports information retrieval through a set of queries.
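To make the retrieval side more concrete, here is a minimal sketch that builds a query URL for Solr's standard select handler. It addresses Solr directly rather than the EWB REST API, whose endpoints are not described here. The host, the collection name my_corpus, and the field name title are assumptions chosen for illustration; 8983 is Solr's default port.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select handler with a JSON response.

    The collection and query passed in are illustrative placeholders,
    not names defined by EWB.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and field name.
url = build_select_url("http://localhost:8983", "my_corpus", "title:topic", rows=5)

if __name__ == "__main__":
    print(url)
```

The query string is URL-encoded, so characters such as the colon in a field query are escaped before the request is sent.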
This system relies on the following services:
- ewb-tm: This service hosts the Topic Modeling RESTful API server. It is built from the Dockerfile in the ewb-tm directory. It depends on the Solr service and requires access to the following mounted volumes: ./data/source, ./data/inference, and ./ewb_config. These volumes give access to the data from the ITMT (the project folder containing the topic models) and are used to deliver results obtained through the EWB or generated by the Inference service. The ewb_config volume also holds some important configuration variables.
- ewb-solr: This service runs the Solr search engine. It uses the official Solr image from Docker Hub and depends on the zoo service. The service mounts several volumes, including:
  - The Solr data directory (./db/data/solr:/var/solr) for data persistence.
  - Two custom Solr plugins:
    - solr-ewb-jensen-shanon-distance-plugin, which uses the Jensen–Shannon divergence as a vector scoring method.
    - solr-ewb-jensen-sims, which retrieves documents with similarities within a specified range.
  - The Solr configuration directory (./solr_config:/opt/solr/server/solr), which gives access to the specific Solr schemas for EWB.
- ewb-solr-initializer: This temporary service has the sole purpose of initializing the mounted volume /db/data with the permissions required by Solr.
- ewb-zoo: This service runs Zookeeper, which Solr needs to coordinate cluster nodes. It uses the official zookeeper image and mounts two volumes, one for data and one for logs.
- ewb-solr-config: This service handles Solr configuration. It is built from the Dockerfile in the solr_config directory. It depends on the Solr and zoo services and mounts the Docker socket and the bash_scripts directory, which contains a script for initializing the Solr configuration for EWB.
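For readers unfamiliar with the scoring method named in the plugin list, the following sketch computes the Jensen–Shannon divergence between two probability vectors from its textbook definition, JSD(P, Q) = 0.5 KL(P || M) + 0.5 KL(Q || M) with M = (P + Q) / 2. It is a plain-Python illustration only and does not reflect how the Solr plugin is implemented; the example vectors are hypothetical document-topic distributions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors, in [0, 1] with log base 2."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0, so the logs are defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical document-topic distributions over three topics.
    doc_a = [0.7, 0.2, 0.1]
    doc_b = [0.1, 0.3, 0.6]
    print(js_divergence(doc_a, doc_b))
```

The divergence is symmetric, equals 0 for identical distributions, and reaches 1 (in bits) for distributions with disjoint support, which is what makes it convenient as a bounded similarity score.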
The Inference service is a Topic Model Inferencer, built from the Dockerfile in the ewb-inferencer directory. It relies on access to mounted volumes at ./data/source, ./data/inference, and ./ewb_config.
Its primary purpose is to be used internally by the Topic Modeling Service, although it can also function as a standalone component.
The Classification service is an inference system for hierarchical classification, built on top of the clf-inference-intelcomp library, which classifies texts according to a given hierarchy of language models. It relies on access to mounted volumes at ./data/classifier and ./ewb_config.
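Because each service expects specific host directories to be mounted, a small pre-flight check can catch a missing folder before the containers start. The sketch below is one possible way to do this, not part of the project. The directory paths come from the descriptions above; the service keys ewb-inferencer and ewb-classifier are assumed names, and the helper missing_volumes is hypothetical.

```python
from pathlib import Path

# Host directories each service expects, taken from the descriptions above.
# Assumption: the keys "ewb-inferencer" and "ewb-classifier" are illustrative service names.
REQUIRED_VOLUMES = {
    "ewb-tm": ["data/source", "data/inference", "ewb_config"],
    "ewb-inferencer": ["data/source", "data/inference", "ewb_config"],
    "ewb-classifier": ["data/classifier", "ewb_config"],
}


def missing_volumes(project_root, required=REQUIRED_VOLUMES):
    """Return {service: [missing directories]} for directories absent under project_root."""
    root = Path(project_root)
    report = {}
    for service, paths in required.items():
        missing = [p for p in paths if not (root / p).is_dir()]
        if missing:
            report[service] = missing
    return report


if __name__ == "__main__":
    # Check the current directory, assumed to be the project root.
    for service, paths in missing_volumes(".").items():
        print(f"{service}: missing {', '.join(paths)}")
```

An empty result means every expected directory exists; it says nothing about whether the contents are valid.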
Python requirements files (ewb-tm, ewb-inferencer and ewb-classifier).
Note that the requirements are installed directly in their respective services at build time.
A sample corpus and model can be downloaded from here.