Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate unit tests for index types #580

Merged
merged 6 commits into from
Jan 11, 2023
Merged

Separate unit tests for index types #580

merged 6 commits into from
Jan 11, 2023

Conversation

jsmassa
Copy link
Collaborator

@jsmassa jsmassa commented Nov 21, 2022

Addresses #520

Turns all default values for a database into dynamic vars, so they can be set in tests.edn to test specific information.

The idea is to use the default configuration for all tests, so each test has to be written only once.

Problematic with this strategy are

  • :attribute-refs? because the entity IDs are shifted and depending on the function used either keyword or eids are returned, and
  • :schema-on-write because it needs a schema to be transacted, which means that here entity IDs are also shifted

For these cases, we would need some kind of translation, similar to the way it is done in the test/datahike/test/attribute-refs test namespaces. If possible, I would like to get rid of these test namespaces as they only contain a subset of the tests and are cumbersome to maintain.
Also, we would need to rewrite a lot of the tests to not check for the entity id but for example run queries as it is already done in some updated tests

Unproblematic, however, are config options such like

  • :store-backend and
  • :keep-history?

Please, tell me if this strategy makes sense or if there is some problem with it that I have overlooked.

I am also interested to hear which config options you think should be tested, or which combinations you think would make sense.

@jsmassa jsmassa changed the title separate index type tests Separate unit tests for index types Nov 21, 2022
@jsmassa jsmassa self-assigned this Nov 21, 2022
tests.edn Show resolved Hide resolved
TimoKramer
TimoKramer previously approved these changes Nov 22, 2022
.circleci/config.yml Outdated Show resolved Hide resolved
.circleci/config.yml Outdated Show resolved Hide resolved
tests.edn Show resolved Hide resolved
tests.edn Show resolved Hide resolved
@jsmassa jsmassa marked this pull request as ready for review December 22, 2022 12:30
@whilo
Copy link
Member

whilo commented Dec 25, 2022

The strategy makes sense and normalizing the tests like this must be the key strategy to avoid duplication and arbitrary symmetry breaking (i.e. having different tests for different configurations). Thanks for addressing this @jsmassa !

I am not familiar enough with the attribute-ref mismatches. For schema settings I would suggest to always assume the stricter requirements (schema-on-write) and then potentially skip it automatically for schema-on-read. This can be done with a function in the test namespaces that dispatches on the schema configuration. Do you think this would work? Would this also be possible for the attribute-refs?

Also, is there still a reason to have attribute-refs only optionally? Edit: Yes, of course, because they depend on schema-on-write. But does this mean that when we always provide a schema all tests can be run with attribute-refs?

@whilo
Copy link
Member

whilo commented Dec 27, 2022

A related serious problem I just stumbled over in #332 is that many tests use the same database, which worked so far for the memory database because the connections were not (properly) shared. It would be good if the db :id for the memory database could be automatically retrieved from the test name. Can we address this here as well?

Edit: I have solved this there for now by manually duplicating the connections for the tests.

@jsmassa
Copy link
Collaborator Author

jsmassa commented Jan 5, 2023

The schema isn't really a problem. Yes, we can always add a schema, but we could also only add it when required. That doesn't make much of a difference, I think.
The most problematic thing is still the entity ID which is being checked in a lot of the tests, so there is heaps of rewriting to do. Entity IDs for user entities are shifted using attribute references because system attributes like :db/ident or :db/valueType also need an entity ID and those are added to the database before the first user entities can be transacted.

1 similar comment
@jsmassa
Copy link
Collaborator Author

jsmassa commented Jan 5, 2023

The schema isn't really a problem. Yes, we can always add a schema, but we could also only add it when required. That doesn't make much of a difference, I think.
The most problematic thing is still the entity ID which is being checked in a lot of the tests, so there is heaps of rewriting to do. Entity IDs for user entities are shifted using attribute references because system attributes like :db/ident or :db/valueType also need an entity ID and those are added to the database before the first user entities can be transacted.

@whilo
Copy link
Member

whilo commented Jan 8, 2023

Could we just grab max-eid at the beginning of the tests and use this as the offset?

whilo
whilo previously approved these changes Jan 8, 2023
Copy link
Member

@whilo whilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Nice work!

@jsmassa
Copy link
Collaborator Author

jsmassa commented Jan 10, 2023

Could we just grab max-eid at the beginning of the tests and use this as the offset?

This is approximately how I did it for the attribute reference tests. Some of the tests used specific input datoms though in addition to the expected output datoms, so the input had to be manipulated as well. This lead me to write some helper functions to shift collections of datoms according to the amount of the system entity datoms. So far, there are also separate functions to shift the value as well which is needed for attributes with reference value type. Of course, this can be made simpler with a more complex helper function detecting when the value is a reference and has to be shifted.

In some instances there might not be a way around using functions to shift the entity IDs to the correct values, but I would say that in the end, a database user will not care about which ID an entity gets allocated inside the database or if the attributes are saved as references or keywords. This is why I think that, ideally, we should avoid checking those details for high-level functions like transact, q or pull.

I would say that the strategies I used for the attribute reference tests in test/datahike/test/attribute_refs/ are ok and - with some improvements - can be used similarly to refactor most tests to work with all configurations. I would hope though that we can find a reformulation of a lot of the tests that make them more focused on the actual fact they are challenging and better readable than they would be with a habitual shifting of numbers.

@jsmassa jsmassa requested review from whilo and TimoKramer January 11, 2023 13:18
@whilo
Copy link
Member

whilo commented Jan 11, 2023

Could we just grab max-eid at the beginning of the tests and use this as the offset?

This is approximately how I did it for the attribute reference tests. Some of the tests used specific input datoms though in addition to the expected output datoms, so the input had to be manipulated as well. This lead me to write some helper functions to shift collections of datoms according to the amount of the system entity datoms. So far, there are also separate functions to shift the value as well which is needed for attributes with reference value type. Of course, this can be made simpler with a more complex helper function detecting when the value is a reference and has to be shifted.

In some instances there might not be a way around using functions to shift the entity IDs to the correct values, but I would say that in the end, a database user will not care about which ID an entity gets allocated inside the database or if the attributes are saved as references or keywords. This is why I think that, ideally, we should avoid checking those details for high-level functions like transact, q or pull.

I would say that the strategies I used for the attribute reference tests in test/datahike/test/attribute_refs/ are ok and - with some improvements - can be used similarly to refactor most tests to work with all configurations. I would hope though that we can find a reformulation of a lot of the tests that make them more focused on the actual fact they are challenging and better readable than they would be with a habitual shifting of numbers.

I agree that we should not overfit to specific eids and rewrite tests to not rely on them in general is the right way forward.

Copy link
Member

@whilo whilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jsmassa jsmassa merged commit bdd20b6 into main Jan 11, 2023
@jsmassa jsmassa deleted the 520-config-tests branch January 11, 2023 21:30
whilo pushed a commit that referenced this pull request Mar 5, 2023
* separate index type tests

* Do not use tools/unittest

* Remove focused flag

* Remove unnecessary ssh bits
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants