Add OpenSearch to the toolchain #4

Open
Jadw1 opened this issue Nov 22, 2024 · 1 comment
Jadw1 commented Nov 22, 2024

We want to use OpenSearch as the prototype implementation of Approximate Nearest Neighbour Search, so we need to be able to run an OpenSearch node or cluster in the Scylla testing environment.
There is no need to run the external database directly on our machines; Docker makes this simple and quick. [1][2]

Issue requirements:

  • Add the OpenSearch Python client dependency. [3]
  • Provide a script or docker-compose file to quickly set up OpenSearch locally for use in the development environment.
  • OpenSearch does not need to run in every Scylla unit test, so add a new test suite in test/ and handle it in test.py. All tests in the new suite should have access to a running OpenSearch instance.
  • Add a simple test in the new suite to verify that OpenSearch is running correctly.

[1] https://opensearch.org/docs/latest/install-and-configure/install-opensearch/docker/
[2] https://hub.docker.com/r/opensearchproject/opensearch
[3] https://opensearch.org/docs/latest/clients/python-low-level/
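
As a rough illustration of the last requirement, the check below polls the `_cluster/health` REST endpoint until the node answers. It is only a sketch under assumptions not stated in this issue: the node listens on plain HTTP at `localhost:9200` with the security plugin disabled, and the helper names `fetch_cluster_health` and `wait_until_healthy` are made up for this example. It uses only the standard library; the real suite would more likely go through the Python client from [3].

```python
import json
import time
import urllib.request


def fetch_cluster_health(base_url="http://localhost:9200", timeout=5.0):
    """Return the parsed JSON body of GET /_cluster/health.

    Assumes plain HTTP without authentication (security plugin disabled).
    """
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_until_healthy(fetch=fetch_cluster_health, retries=30, delay=1.0, sleep=time.sleep):
    """Poll until the cluster reports 'green' or 'yellow', or give up.

    'yellow' is accepted because a single-node cluster may be unable to
    assign replica shards and can therefore stay yellow.
    """
    last_error = None
    for _ in range(retries):
        try:
            health = fetch()
            if health.get("status") in ("green", "yellow"):
                return health
            last_error = RuntimeError(f"cluster status is {health.get('status')!r}")
        except (OSError, ValueError) as exc:
            # OSError covers refused connections and timeouts while the node
            # is still booting; ValueError covers a non-JSON response body.
            last_error = exc
        sleep(delay)
    raise RuntimeError(f"OpenSearch did not become healthy: {last_error}")
```

A test in the new suite could then simply call `wait_until_healthy()` and assert that the returned `status` is `green` or `yellow`.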


Some references for extending test.py capabilities

I imagine you can create classes similar to TopologyTest and TopologyTestSuite (your classes can probably inherit from them). In addition to what the topology tests do, yours should also start an OpenSearch node or cluster.

The next thing to do is to create a library to manage the lifecycle of the OpenSearch database. It can be similar to ManagerClient but much simpler (the minimal set of operations is: start, stop, clear, execute).
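
To make the four operations concrete, here is one possible shape for such a library. It is a sketch, not a proposal for the final API: the class name `OpenSearchManager`, the container name, the port mapping, and the choice to drive a single Docker container through `subprocess` are all assumptions made for illustration. The environment variables `discovery.type=single-node` and `DISABLE_SECURITY_PLUGIN=true` are taken from the OpenSearch Docker documentation [1]; double-check them against the image version actually used.

```python
import json
import subprocess
import urllib.request


class OpenSearchManager:
    """Hypothetical helper exposing start, stop, clear and execute for one Docker-run node."""

    def __init__(self, name="scylla-test-opensearch", port=9200,
                 image="opensearchproject/opensearch:latest",
                 run_command=subprocess.run):
        self.name = name
        self.port = port
        self.image = image
        # Injectable so the class can be exercised without Docker installed.
        self._run = run_command

    @property
    def base_url(self):
        return f"http://localhost:{self.port}"

    def start_command(self):
        """Build the `docker run` command for a single-node, security-disabled node."""
        return [
            "docker", "run", "--detach", "--rm",
            "--name", self.name,
            "--publish", f"{self.port}:9200",
            "--env", "discovery.type=single-node",
            "--env", "DISABLE_SECURITY_PLUGIN=true",  # assumption: test-only, no TLS/auth
            self.image,
        ]

    def start(self):
        self._run(self.start_command(), check=True)

    def stop(self):
        self._run(["docker", "stop", self.name], check=True)

    def execute(self, method, path, body=None):
        """Send one REST request and return the decoded JSON response (or None)."""
        data = None if body is None else json.dumps(body).encode("utf-8")
        request = urllib.request.Request(
            self.base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            raw = response.read()
        return json.loads(raw) if raw else None

    def clear(self):
        """Delete every non-system index, one by one, to restore a fresh state."""
        indices = self.execute("GET", "/_cat/indices?format=json") or []
        for entry in indices:
            index = entry["index"]
            if not index.startswith("."):  # leave dot-prefixed system indices alone
                self.execute("DELETE", f"/{index}")
```

Deleting indices by name rather than with a wildcard avoids depending on the cluster's `action.destructive_requires_name` setting. Note that `start()` returns as soon as the container is launched, so callers still need to wait until the node accepts requests.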


Jadw1 commented Dec 6, 2024

> The next thing to do is to create a library to manage the lifecycle of the OpenSearch database. It can be similar to ManagerClient but much simpler (the minimal set of operations is: start, stop, clear, execute).

There are three possible ways to manage the lifecycle of OpenSearch:

  1. Copy the design of Scylla's ManagerClient and allow running a new instance of Scylla per test run. The lifecycle can then be managed from a test.
  2. Have one instance for the whole test.py run and clear the cluster (restore it to a fresh state) after each run.
  3. Have one instance for the whole test.py run and create an OpenSearch counterpart of the Scylla keyspace per test run.

*In (2) and (3), a test shouldn't be able to stop, start, or clear the whole OpenSearch cluster.

Because we're using OpenSearch only as an index, I think (2) and (3) are feasible and there won't be any unwanted interaction between test runs, even if they are running in parallel.
Ideally, you should choose the method with the lowest time overhead. But it's totally fine to take implementation time into account and go with the easiest approach. If it becomes too painful, we can improve it later.
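
For option (3), one way to picture the "keyspace counterpart" is a per-keyspace index prefix, so each test only creates and deletes indices under its own prefix and parallel tests do not touch each other's data. The sketch below is purely illustrative: the naming scheme `ks-<keyspace>-<table>`, the class `KeyspaceScopedIndices`, and the `execute(method, path, body)` callable it expects are all hypothetical and not part of any existing code.

```python
import re


def index_prefix_for(keyspace):
    """Map a Scylla keyspace name to an OpenSearch index prefix (hypothetical scheme).

    OpenSearch index names must be lowercase and must not start with '_' or '-',
    so the name is lowercased, unusual characters are replaced, and a fixed
    'ks-' lead is added. Keyspaces that differ only by case would collide here.
    """
    cleaned = re.sub(r"[^a-z0-9_]", "_", keyspace.lower())
    return f"ks-{cleaned}-"


class KeyspaceScopedIndices:
    """Create and drop only the indices that belong to one keyspace."""

    def __init__(self, keyspace, execute):
        self.prefix = index_prefix_for(keyspace)
        # execute is any callable(method, path, body=None) that performs a
        # REST request against the shared OpenSearch instance.
        self._execute = execute

    def index_name(self, table):
        return self.prefix + table.lower()

    def create(self, table, body=None):
        return self._execute("PUT", "/" + self.index_name(table), body)

    def drop_all(self):
        """Delete this keyspace's indices by name; other tests' indices are untouched."""
        listed = self._execute("GET", f"/_cat/indices/{self.prefix}*?format=json") or []
        for entry in listed:
            self._execute("DELETE", "/" + entry["index"])
```

A test would build one `KeyspaceScopedIndices` for its keyspace and call `drop_all()` on teardown, which matches the constraint that tests in (2) and (3) never stop or clear the whole cluster.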
