
chore: collection size #3844

Merged: 1 commit merged on Oct 8, 2024


@dranikpg (Contributor) commented Oct 1, 2024

Fixes #3840

Added an element_size_ratio parameter to both the static and dynamic seeders.

Data size ~ data volume; element size = data size ^ ratio (^ is exponentiation, as in the Lua seeder script); element count = data volume / element size.
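The sizing rule above can be sketched in Python as follows. This is a minimal sketch, not the seeder's code: the helper name `element_layout` is hypothetical, data volume is treated as equal to `data_size`, and rounding both values up with `math.ceil` is an assumption (the Lua line quoted later in the review rounds the element size up; rounding the count up is a guess).

```python
import math


def element_layout(data_size: int, element_size_ratio: float) -> tuple[int, int]:
    """Return (element_size, element_count) for a given data size.

    Hypothetical helper following the PR description:
      element size  = data size ** ratio   (written ^ in the Lua seeder)
      element count = data size / element size
    Both values are rounded up with math.ceil (an assumption for the count).
    """
    if not 0 <= element_size_ratio <= 1:
        raise ValueError("element_size_ratio must be between 0 and 1")
    element_size = math.ceil(data_size ** element_size_ratio)
    element_count = math.ceil(data_size / element_size)
    return element_size, element_count


print(element_layout(10_000, 1 / 2))  # (100, 100)
```

With `element_size_ratio=1`, a data size of 10,000 becomes a single element of size 10,000, the same shape the test assertions below check with `llen` and `lpop`.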

dranikpg marked this pull request as ready for review October 1, 2024 17:09
dranikpg requested a review from adiholden October 1, 2024 17:09

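```python
# The collection under keys[0] should be a list holding exactly
# one element of length 10_000.
keys = await async_client.keys()
assert (await async_client.llen(keys[0])) == 1
assert len(await async_client.lpop(keys[0])) == 10_000
```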
Contributor

A question:

If element_size_ratio=1/2, then data_size is 10k ** (1/2) == 100 and variance is 1, so dsize=100. My question is: why do we xor the dsize? That is: LG_funcs.esize = math.ceil(dsize ^ delement_ratio), which is how many elements a given type should contain (that is, llen(keys[0])).

So, to summarize:

  1. Why do we xor this? LG_funcs.esize = math.ceil(dsize ^ delement_ratio)
  2. Why do we express the number of elements per set via all of this? Can't we just specify how many elements we want, and of what size?

There is something I am missing, so I am asking here 😄

Contributor Author

  1. It's not xor, it's power 🙃
  2. If it's too difficult and fragile, no one will use it properly. It's just a 0/1 slider: 0 means the smallest possible elements, 1 means the biggest possible.
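The "0/1 slider" reading can be checked with a short sweep. This is a sketch under the same assumptions as the formula in the PR description (element size = data size raised to the ratio, rounded up); the helper `element_size` and the sample data sizes are hypothetical. Whatever the data size, ratio 0 gives the smallest elements (size 1) and ratio 1 gives a single element holding everything.

```python
import math


def element_size(data_size: int, ratio: float) -> int:
    """Hypothetical helper: element size = data_size ** ratio, rounded up."""
    return math.ceil(data_size ** ratio)


# Sweep the slider for a few data sizes; each row lists element sizes
# for ratios 0.0, 0.25, 0.5, 0.75 and 1.0.
for data_size in (100, 10_000, 1_000_000):
    sizes = [element_size(data_size, r) for r in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(data_size, sizes)
```

The design choice this illustrates: a single bounded knob scales sensibly with any data size, whereas an absolute element size or count has to be re-tuned whenever the data size changes.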

Contributor

For 1: isn't power ** rather than caret ^, which is xor?

Contributor Author

Contributor

I brainfarted. It's Lua, not Python 😮‍💨 🤦

Now it all makes sense. Ignore my blindness...
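The source of the confusion can be shown in a few lines of Python. The variable names `dsize`, `delement_ratio` and `esize` mirror the Lua line quoted in the question; the values are illustrative. In Lua, `^` is exponentiation, so the Python equivalent uses `**`; in Python, `^` is bitwise XOR and is only defined for integers.

```python
import math

dsize = 10_000
delement_ratio = 1 / 2

# Lua: LG_funcs.esize = math.ceil(dsize ^ delement_ratio)
# Python spells exponentiation as **:
esize = math.ceil(dsize ** delement_ratio)
print(esize)  # 100

# In Python, ^ is bitwise XOR on integers, not a power:
print(10_000 ^ 2)  # 10002

# XOR with a float operand is a TypeError in Python.
xor_error = None
try:
    dsize ^ delement_ratio
except TypeError as exc:
    xor_error = exc
print("TypeError:", xor_error)
```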

Contributor

Lol @dranikpg, we commented at exactly the same time. I hadn't read your link, but I noticed it myself, and you replied at the exact moment I figured it out and replied 🤣

kostasrim previously approved these changes Oct 3, 2024

@kostasrim (Contributor) left a comment

LGTM, maybe wait for Adi?

```diff
@@ -79,13 +79,15 @@ def __init__(
data_size=100,
variance=5,
samples=10,
element_size_ratio=1 / 3,
```
Collaborator

Can we use element count instead?
When writing a test case, I want to define the total size of the data structure and the number of elements it will have; each element's size will then be total size / element count.
I find an element count parameter much more intuitive than element size ratio, which defines the element size as total size ^ element size ratio.
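The two parameterizations discussed here can be contrasted in a small sketch. Both helpers, `by_ratio` and `by_count`, are hypothetical, and rounding up with `math.ceil` is an assumption; neither is meant to describe the collection size parameter that was eventually merged. Using the defaults from the diff hunk above (data_size=100, element_size_ratio=1/3), both describe the same layout, but only the count-based form states the element count directly.

```python
import math


def by_ratio(data_size: int, element_size_ratio: float) -> tuple[int, int]:
    """Ratio-based (original PR): size = data_size ** ratio, count follows."""
    size = math.ceil(data_size ** element_size_ratio)
    return size, math.ceil(data_size / size)


def by_count(data_size: int, element_count: int) -> tuple[int, int]:
    """Count-based (reviewer's proposal): size = data_size / element_count."""
    if element_count < 1:
        raise ValueError("element_count must be >= 1")
    return math.ceil(data_size / element_count), element_count


# Defaults from the diff hunk: data_size=100, element_size_ratio=1/3.
print(by_ratio(100, 1 / 3))  # caller must derive the count: (5, 20)
print(by_count(100, 20))     # caller states the count: (5, 20)
```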

dranikpg force-pushed the seeder-element-ratio branch from 7ce25b5 to 76fc7aa on October 7, 2024 17:06
dranikpg changed the title from "chore: element ratio parameter for seeder" to "chore: collection size" on Oct 7, 2024

@dranikpg (Contributor, Author) commented Oct 7, 2024

Updated to collection size parameter 🎩

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
dranikpg force-pushed the seeder-element-ratio branch from 76fc7aa to 9eda2e4 on October 8, 2024 11:09
dranikpg merged commit 786c9cd into dragonflydb:main on Oct 8, 2024
9 checks passed
dranikpg deleted the seeder-element-ratio branch on October 20, 2024 14:55
Development: successfully merging this pull request may close these issues:
pytests : Improve seeder adding element count for entries like set/hash/list

3 participants