https://opensemanticsearch.org
Open Semantic Search is:
- an integrated search server,
- an ETL framework for document processing (crawling, text extraction, text analysis, named entity recognition and OCR for images and embedded images in PDF),
- search user interfaces, text mining, text analytics and search apps for fulltext search, faceted search, exploratory search and knowledge graph search.
This README.md is documentation for software developers.
The documentation for users and admins is included in the software packages/images and linked in the search user interface (Menu "Help").
You can find the documentation of the search engine architecture in `docs/doc/modules/README.md`.
This integrated HTML documentation is generated by the static site generator MkDocs with the config file `mkdocs.yml`.
The source of the documentation (Markdown format) and the charts (Mermaid format) is editable in the directory `docs`.
How to build the deb package for installation on Debian or Ubuntu server or the docker images for running in Docker containers:
Clone the repository including the dependencies:
git clone --recurse-submodules --remote-submodules https://github.com/opensemanticsearch/open-semantic-search.git
cd open-semantic-search
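Before building, it can help to confirm that the submodules were actually checked out. The following is a minimal sketch, relying on the fact that `git submodule status` marks a submodule that is not initialized with a leading `-`. The helper name `uninitialized_submodules` is hypothetical and not part of this repository:

```shell
# Hypothetical helper (not part of this repository): reads the output of
# `git submodule status` on stdin and prints the paths of submodules that
# are not initialized (git marks those lines with a leading "-").
uninitialized_submodules() {
    awk '/^-/ { print $2 }'
}

# Usage inside the cloned repository:
#   git submodule status --recursive | uninitialized_submodules
# If anything is listed, it can be fetched with:
#   git submodule update --init --recursive
```

An empty result suggests that all submodules below `src` are in place.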
To build a `deb` package for Debian GNU/Linux or Ubuntu Linux, call the build script `build-deb` as user root (change user by `su` or `sudo su`):
./build-deb
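Since the build script expects to run as root, a small guard can make a wrong invocation fail early. This is only a sketch; the function `is_root` is a hypothetical helper and not something `build-deb` provides:

```shell
# Hypothetical guard (not part of build-deb): succeeds only for UID 0.
# The UID can be passed as an argument so the check can be exercised
# without actually being root; by default the current user's UID is used.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Usage from the repository root:
#   if is_root; then ./build-deb; else echo "build-deb must be run as root" >&2; fi
```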
How to build an Open Semantic Desktop Search Appliance for VirtualBox is documented in `src/open-semantic-desktop-search/README.md`.
Build the Docker images using the default docker-compose config `docker-compose.yml`:
docker-compose build
After these builds all the Docker images/dependencies/services can be started together by docker-compose with the config file `docker-compose.yml`.
You can start the whole environment by running:
docker-compose up
which will expose the web user interface on port `8080`.
You can browse the Open Semantic Search user interface in your favourite browser by this URL:
http://localhost:8080/search/
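The services may need some time after `docker-compose up` before the user interface answers. A minimal sketch of a readiness check, assuming `curl` is installed and the default URL from this README is unchanged; the function `wait_for_ui` and its default values for attempts and delay are illustrative assumptions:

```shell
# Default URL as given in this README; adjust if you changed the port mapping.
SEARCH_UI_URL="http://localhost:8080/search/"

# Hypothetical helper: polls the URL until it answers successfully or the
# number of attempts is used up. Returns 0 on success, 1 otherwise.
#   $1 = URL, $2 = maximum attempts, $3 = delay in seconds between attempts
wait_for_ui() {
    url="${1:-$SEARCH_UI_URL}"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: treat HTTP errors as failure, -sS: quiet but keep error text
        if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_for_ui && echo "Search UI is reachable at $SEARCH_UI_URL"
```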
For CI/CD there are several kinds of automated tests:
Since the submodule Open Semantic ETL depends on several services such as Solr, spaCy-services or Tika-Server, which it accesses over HTTP and REST APIs, many automated tests run as integration tests within the docker-compose environment configured in `docker-compose.etl.test.yml`, so that these services are available while the unit tests and integration tests run.
docker-compose -f docker-compose.etl.test.yml build
docker-compose -f docker-compose.etl.test.yml up
Some automated integration tests and end-to-end (E2E) tests run within a web browser controlled by the browser automation framework Playwright and the Node.js/JavaScript-based test framework Jest.
You can extend the automated tests in `test/test.js`.
They run in the Docker image defined by `Dockerfile-test` and need the services of the docker-compose environment `docker-compose.test.yml`:
docker-compose -f docker-compose.test.yml build
docker-compose -f docker-compose.test.yml up
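In a CI job it is often useful to let the pipeline fail when the tests fail. One possible approach, sketched here under assumptions, uses the `--abort-on-container-exit` and `--exit-code-from` options of `docker-compose up`. The wrapper `run_compose_tests` is hypothetical, and the service name in the usage line is a placeholder, since the actual service names have to be looked up in the respective compose file:

```shell
# Compose command; override with COMPOSE="docker compose" for Compose v2.
COMPOSE="${COMPOSE:-docker-compose}"

# Hypothetical CI wrapper: builds the test environment, runs it, and returns
# the exit code of the container that runs the tests.
#   $1 = compose file, $2 = name of the service that runs the tests
run_compose_tests() {
    $COMPOSE -f "$1" build || return 1
    $COMPOSE -f "$1" up --abort-on-container-exit --exit-code-from "$2"
}

# Usage (the service name "test" is a placeholder, check the compose file):
#   run_compose_tests docker-compose.test.yml test
```

The same wrapper could be pointed at `docker-compose.etl.test.yml` for the ETL tests.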
Dependencies are resolved automatically when building or installing the Debian or Ubuntu packages or when building the Docker images.
Documentation on these dependencies, which may help when debugging dependency hell issues or installing in other environments:
Dependencies on other Git repositories / submodules of components like Open Semantic ETL are defined in the Git config file `.gitmodules`.
The submodules will be checked out automatically to the subdirectory `src` if you check out this repository by git in recursive mode.
The submodules `src/tika-server.deb` and `src/solr.deb` need the JAR of Apache Tika-Server and Apache Solr.
If they are not present, they will be downloaded from the Apache Software Foundation by `wget` in the `build-deb` script or the submodules' `Dockerfile`.
Dependencies on tools and libraries that are available in the Debian or Ubuntu package repositories are defined in the section `Depends` of the deb package config file `DEBIAN/control`.
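When debugging dependency problems, it can be handy to list the entries of the `Depends` field one per line. A minimal sketch, assuming a control file in the standard Debian format (comma-separated field, continuation lines indented by whitespace); the function name `list_depends` is illustrative and not part of this repository:

```shell
# Hypothetical helper: prints each entry of the "Depends" field of a Debian
# control file on its own line, including any version constraints.
#   $1 = path to the control file
list_depends() {
    awk '
        # Start of the field: remember everything after "Depends:".
        /^Depends:/ { collecting = 1; sub(/^Depends:[ \t]*/, ""); deps = $0; next }
        # Continuation lines of the field start with whitespace.
        collecting && /^[ \t]/ { deps = deps " " $0; next }
        # Any other line ends the field.
        { collecting = 0 }
        END {
            n = split(deps, items, ",")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", items[i])
                if (items[i] != "") print items[i]
            }
        }
    ' "$1"
}

# Usage:
#   list_depends DEBIAN/control
```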
Dependencies on Python libraries that are not available as packages of the Linux distribution but in the Python Package Index (PyPI) are defined in `src/open-semantic-etl/src/opensemanticetl/requirements.txt`.
These dependencies will be installed automatically on installation of the Debian/Ubuntu packages by the `DEBIAN/postinst` script of the Debian/Ubuntu packages, or by `docker build` as configured by the `Dockerfile`, by running:
pip3 install -r /usr/lib/python3/dist-packages/opensemanticetl/requirements.txt
Most contributors are not shown by the GitHub user interface as "Contributors" of this repository, since this main repository is structured by Git submodules like Open Semantic ETL and other modules, which are managed in separate Git(Hub) repositories.
So thanks to all (current and former) contributors:
- Markus Mandalka (@mandalka)
- @g-braeunlich
- @maehr
- @sdinten
- @wsldankers
- @rivimey
- @rbussche
- @mosea3
- @bhelou
- @hpiedcoq
- @andreclinio
- @agharbeia
- @ciyer
- @davidshq ...
Feel free to extend if you contributed/supported/sponsored in different forms.