
Pytests overhaul #569

Merged: 16 commits into dragonflydb:main on Jan 9, 2023
Conversation

@dranikpg (Contributor) commented Dec 17, 2022

Big testing overhaul


A lot of the new features that we introduce for replication, snapshotting, compression, serialization and cancellation don't have proper tests. Many of those features are also hard to test in general, under load and for corner cases.

So far the pytests have done a good job of uncovering bugs, but they used only simple commands and a single database.

The new DflySeeder can issue command sequences that converge to a target number of keys and oscillate around it once reached. It supports all main data types (strings, lists, sets, hsets, zsets) and 10 incremental commands (but can be extended to any number).

It allows creating captures on the master instance (which is expected to work faultlessly) and then comparing them to the state of different instances, showing any differences if needed.

It's designed to be efficient (fully async, parallel work on multiple dbs, pipelined requests), so Python's performance is not the bottleneck.

Example:

# Create a seeder with a target number of keys (100k) of the specified size (200), working on 5 dbs
seeder = DflySeeder(keys=100_000, value_size=200, dbcount=5)

# Stop when we are within 5% of the target number of keys (i.e. above 95_000),
# because it's probabilistic and we might never reach exactly 100_000
await seeder.run(target_deviation=0.05)

# Run 3 iterations (full batches) in stable state
await seeder.run(target_times=3)

# Create a capture
capture = await seeder.capture()

# Compare capture to replica on port 1112
assert await seeder.compare(capture, port=1112)
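
The target_deviation criterion above reflects how the seeder converges. As a rough illustration of the idea (not the actual DflySeeder internals; names and probabilities are made up), the generator can bias towards key-creating commands while below the target and balance creating and deleting commands once it gets close, so the key count oscillates around the target:

import random

def choose_action(current_keys, target_keys, target_deviation=0.05):
    # Far below the target: mostly issue commands that create keys
    if current_keys < target_keys * (1 - target_deviation):
        grow_probability = 0.85
    else:
        # Near the target: grow and shrink with equal probability,
        # so the key count oscillates around the target
        grow_probability = 0.5
    return 'grow' if random.random() < grow_probability else 'shrink'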

fixes #530

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@@ -0,0 +1,245 @@
import asyncio
Collaborator:
can you add here what it does?

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg force-pushed the pytest-generator branch 2 times, most recently from 412e627 to c158593 on December 24, 2022 17:47
@dranikpg changed the title from "EXPERIMENT: Pytest data generator" to "Pytests overhaul" on Dec 24, 2022
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@@ -99,6 +105,16 @@ def create(self, **kwargs) -> DflyInstance:
self.instances.append(instance)
return instance

def start_all(self, instances):
Contributor:
Maybe we can use Python testcontainers instead of running them as subprocesses (if I understand correctly, that's what happens here: it spins up DF subprocesses). They rejected my PR for supporting DF, but you can start a DF container with the right parameters.

Contributor Author:
This is inconvenient (increased memory requirements) and requires rebuilding a container just to test a change.

assert False, str(e)
def gen_test_data():
for i in range(10):
yield "key-"+str(i), "value-"+str(i)
Contributor:
nitpick - yield f"key-{i}", f"value-{i}"
BTW, why did you remove gen_test_data?

Contributor Author:
Because it's now the only place where it would be used... Maybe I should keep it, though.

def gen_test_data(n, start=0, seed=None):
for i in range(start, n):
yield "k-"+str(i), "v-"+str(i) + ("-"+str(seed) if seed else "")
async def wait_available_async(client: aioredis.Redis):
Contributor:
In what cases is this useful? From what I know, await blocks the current task until the awaited function returns. So when will the iteration take place?

Contributor Author:
This is for waiting until an instance becomes available for queries (i.e. exits the LOADING state). It's indeed supposed to block the whole time, because the test has nothing else to do except wait for the instance to become available for comparing data.
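
As an illustration of that idea (a minimal sketch, assuming the instance answers PING with a LOADING error while it is still loading; the exact exception types and the real implementation may differ):

import asyncio
import aioredis

async def wait_available_async(client: aioredis.Redis):
    # Block until the instance exits the LOADING state and starts serving queries
    while True:
        try:
            await client.ping()
            return
        except (aioredis.ResponseError, aioredis.ConnectionError) as e:
            # Keep polling only while the error indicates the LOADING state
            if "LOADING" not in str(e):
                raise
        await asyncio.sleep(0.01)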

client = aioredis.Redis(port=port, db=target_db)
return DataCapture(await self._capture_entries(client, keys))

async def compare(self, initial_capture, port=6379):
Contributor:
You have self.port, so why not default this to self.port instead of 6379?

Comment on lines 158 to 164
('LPOP {k}', ValueType.LIST),
#('SADD {k} {val}', ValueType.SET),
#('SPOP {k}', ValueType.SET),
('HSETNX {k} v0 {val}', ValueType.HSET),
('HINCRBY {k} v1 1', ValueType.HSET),
#('ZPOPMIN {k} 1', ValueType.ZSET),
#('ZADD {k} 0 {val}', ValueType.ZSET)
Contributor Author:
The stable state currently has issues with set and zset commands (for example, SPOP pops different values), so I commented them out for now to let the tests run.

tests/dragonfly/utility.py (resolved review thread)
@romange (Collaborator) commented Dec 31, 2022 via email

Signed-off-by: Vladislav <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg marked this pull request as ready for review January 9, 2023 07:42
@dranikpg requested review from romange and adiholden January 9, 2023 09:33
@dranikpg (Contributor Author) commented Jan 9, 2023

They now seem to pass consistently.

It takes about 3 minutes to run them fully on my machine. I reduced the tests a little, because when something fails, it usually does so already on the medium-sized ones. Otherwise testing would take an eternity 😄

Some parts are commented out; those are the commands we don't support yet. They work on Redis, though, and once we support them, we'll just uncomment those parts (SPOP, for example).

romange previously approved these changes Jan 9, 2023

@romange (Collaborator) left a comment
Vlad, it's an amazing addition to our tests and to our testing methodology!
You really raise the bar for our testing quality. I gave you a few readability comments.

tests/README.md Outdated
@@ -15,6 +15,8 @@ You can override the location of the binary using `DRAGONFLY_PATH` environment v
### Custom arguments

- use `--gdb` to start all instances inside gdb.
- use `--df arg=val` to pass custom arguments to all dragonfly instances.
Collaborator:
can you provide a full command instead of a single option?

Contributor Author:
What does a full command mean? You can use it multiple times, like `--df logtostdout --df proactor_threads=2`; I'll add this info.
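
For example, a full invocation might look like this (illustrative only; the exact pytest entry point depends on how the suite is launched locally):

pytest dragonfly --df logtostdout --df proactor_threads=2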

tests/dragonfly/utility.py: three outdated review threads (resolved)
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg (Contributor Author) commented Jan 9, 2023

Pushed fixes to your comments and some more last-minute ones.

@dranikpg requested a review from romange January 9, 2023 13:01
@dranikpg merged commit 5ef8454 into dragonflydb:main Jan 9, 2023
@dranikpg deleted the pytest-generator branch February 27, 2023 16:39
Closes: Testing data generator for pytests
3 participants