Add OpenSearch to the toolchain #4

Jadw1 · 2024-11-22T17:32:27Z

Because we want to use OpenSearch as a prototype implementation of Approximate Nearest Neighbour search, we need to be able to run an OpenSearch node or cluster in the Scylla testing environment.
There is no need to run the external database directly on our machines; we can use Docker to set it up simply and quickly. [1][2]

Issue requirements:

Add the OpenSearch Python client as a dependency. [3]
There should be a script or docker-compose file to quickly set up OpenSearch locally for use in the development environment.
There is no need to run OpenSearch in every Scylla unit test, so add a new test suite in `test/` and handle it in `test.py`. All tests in the new suite should have access to a running OpenSearch instance.
Add a simple test in the new suite to verify that OpenSearch is running correctly.
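
As a rough illustration of the local setup and the smoke check, here is a minimal sketch using only the Python standard library. The function names, the container name, and the `latest` tag are assumptions made for this example, and it assumes the image honours `DISABLE_SECURITY_PLUGIN` as described in [1], so the node answers over plain HTTP. The real suite would more likely use the client from [3].

```python
import json
import time
import urllib.error
import urllib.request

# Image from [2]; the "latest" tag is an assumption, pin a version in practice.
IMAGE = "opensearchproject/opensearch:latest"


def docker_run_command(name: str = "opensearch-dev", port: int = 9200) -> list[str]:
    """Build a `docker run` command for a single-node, security-disabled node."""
    return [
        "docker", "run", "--detach", "--rm",
        "--name", name,
        "--publish", f"{port}:9200",
        "--env", "discovery.type=single-node",
        "--env", "DISABLE_SECURITY_PLUGIN=true",  # plain HTTP, development only
        IMAGE,
    ]


def wait_until_ready(url: str = "http://localhost:9200", timeout: float = 60.0) -> dict:
    """Poll /_cluster/health until the node answers or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{url}/_cluster/health", timeout=5) as resp:
                return json.load(resp)
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"OpenSearch at {url} not ready after {timeout}s")
            time.sleep(1)
```

A script could pass `docker_run_command()` to `subprocess.run(..., check=True)` and then call `wait_until_ready()`. The simple test could assert that the returned `status` field is `green` or `yellow`.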

[1] https://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker/
[2] https://hub.docker.com/r/opensearchproject/opensearch
[3] https://opensearch.org/docs/latest/clients/python-low-level/

Some references for extending `test.py` capabilities

I imagine you can create classes similar to TopologyTest and TopologyTestSuite (your classes can probably inherit from them). In addition to what the topology tests do, yours should also start an OpenSearch node/cluster.

The next thing to do is to create a library to manage the lifecycle of the OpenSearch database. It can be similar to ManagerClient but much simpler (the minimal set of operations is: start, stop, clear, execute).
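
To make the four operations concrete, here is a minimal sketch of what such a library could look like. `OpenSearchManager`, `http_request`, and all of the parameters are hypothetical names for this example, not existing Scylla or OpenSearch APIs. It assumes a Docker-managed single node with the security plugin disabled. The Docker runner and the HTTP function are injectable so the class can be exercised without a running container.

```python
import json
import subprocess
import urllib.request
from typing import Any, Callable, Optional


def http_request(method: str, url: str, body: Optional[dict] = None) -> Any:
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read() or b"null")


class OpenSearchManager:
    """Hypothetical lifecycle helper offering start, stop, clear and execute."""

    def __init__(self, base_url: str = "http://localhost:9200",
                 container: str = "opensearch-test",
                 image: str = "opensearchproject/opensearch:latest",
                 run_cmd: Callable[..., Any] = subprocess.run,
                 request: Callable[..., Any] = http_request) -> None:
        self.base_url = base_url
        self.container = container
        self.image = image
        self._run_cmd = run_cmd    # injectable so tests can fake Docker
        self._request = request    # injectable so tests can fake HTTP

    def start(self) -> None:
        """Start a single-node container in the background (does not wait for readiness)."""
        self._run_cmd(["docker", "run", "--detach", "--rm",
                       "--name", self.container,
                       "--publish", "9200:9200",  # assumes base_url uses port 9200
                       "--env", "discovery.type=single-node",
                       "--env", "DISABLE_SECURITY_PLUGIN=true",
                       self.image], check=True)

    def stop(self) -> None:
        """Stop the container; --rm at start means it is removed as well."""
        self._run_cmd(["docker", "stop", self.container], check=True)

    def execute(self, method: str, path: str, body: Optional[dict] = None) -> Any:
        """Run an arbitrary REST call against the node."""
        return self._request(method, f"{self.base_url}/{path.lstrip('/')}", body)

    def clear(self) -> list[str]:
        """Delete every non-system index by name and return the deleted names."""
        indices = self.execute("GET", "_cat/indices?format=json")
        names = [i["index"] for i in indices if not i["index"].startswith(".")]
        for name in names:
            self.execute("DELETE", name)
        return names
```

`clear` deletes indices one by one instead of using a wildcard, because wildcard deletion can be rejected depending on the cluster's `action.destructive_requires_name` setting.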


Jadw1 · 2024-12-06T10:37:05Z

> The next thing to do is to create a library to manage the lifecycle of the OpenSearch database. It can be similar to ManagerClient but much simpler (the minimal set of operations is: start, stop, clear, execute).

There are 3 possible ways to manage the lifecycle of OpenSearch:

1. Copy the design of Scylla's ManagerClient and allow running a new instance of Scylla for each test run. The lifecycle can then be managed from a test.
2. Have one instance for the whole test.py run and clear the cluster (restore it to a fresh state) after each test run.
3. Have one instance for the whole test.py run and create an OpenSearch counterpart of the Scylla keyspace for each test run.

Note: in (2) and (3), a test should not be able to stop, start, or clear the whole OpenSearch cluster.

Because we're using OpenSearch only as an index, I think (2) and (3) are possible and there won't be any unwanted interaction between test runs, even if they are running in parallel.
Ideally, you should choose the method with the lowest time overhead. But it's totally fine to take the implementation time into account and go with the easiest option. If it becomes too painful, we can improve it later.
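
To illustrate option (3), here is a sketch of how a test run could get its own namespace inside a shared OpenSearch instance. The naming scheme and the helper names are assumptions for this example: each run gets a unique keyspace, and every OpenSearch index for that keyspace is named `<keyspace>-<table>`. Since `-` cannot appear in an unquoted CQL identifier, the prefix identifies the keyspace unambiguously, and cleanup only touches that run's indices.

```python
import re
import uuid

# Unquoted CQL-style identifier; the first character must not be "_" because
# OpenSearch index names cannot start with an underscore.
_IDENT = re.compile(r"[A-Za-z0-9][A-Za-z0-9_]*")


def unique_keyspace(test_name: str) -> str:
    """Return a keyspace name unique to one test run (hypothetical scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "_", test_name.lower()).strip("_")[:24]
    return f"ks_{slug}_{uuid.uuid4().hex[:8]}"


def index_name(keyspace: str, table: str) -> str:
    """Map a (keyspace, table) pair to a lowercase OpenSearch index name."""
    for part in (keyspace, table):
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unsupported identifier: {part!r}")
    return f"{keyspace}-{table}".lower()


def indices_of_keyspace(keyspace: str, all_indices: list[str]) -> list[str]:
    """Select the indices that belong to one keyspace, e.g. for per-test cleanup."""
    prefix = f"{keyspace.lower()}-"
    return [name for name in all_indices if name.startswith(prefix)]
```

With this scheme, parallel tests never share an index name, and a test's teardown can delete only the names returned by `indices_of_keyspace` without touching the rest of the cluster.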

smoczy123 assigned smoczy123 and Balwancia Dec 2, 2024
