Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding ability to include conditions along with node and relation types in the neo4j staleness removal task #1464

Conversation

kristenarmes
Copy link
Contributor

@kristenarmes kristenarmes commented Aug 31, 2021

Signed-off-by: Kristen Armes karmes@lyft.com

Summary of Changes

Currently, the neo4j staleness removal task can only remove stale nodes and/or relations based on specified types. This new functionality allows users to specify a condition along with the type. To do this, you can specify the inputs to be a list of TargetWithCondition objects that each define a target type and a condition. Only stale nodes or relations of that type and that meet the condition will be removed when using this type of input.

For example, you could add a condition that you only want to remove stale nodes of type Table that have some kind of relation to a node of type Column, rather than deleting all stale Table nodes.

This change is backwards compatible so users may continue to only specify the type without conditions for deleting stale nodes or relations.

Tests

The tests have been updated in test_neo4j_staleness_removal_task.py along with the addition of two new tests: test_validation_statement_with_target_condition and test_delete_statement_with_target_condition. These test the validation and deletion of nodes and relations when target conditions are included.

Documentation

A description of the feature has been added to databuilder/README.md to describe what the feature does and how to use it.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

…es in the neo4j staleness removal task

Signed-off-by: Kristen Armes <karmes@lyft.com>
@kristenarmes kristenarmes requested a review from a team as a code owner August 31, 2021 02:32
@boring-cyborg boring-cyborg bot added area:databuilder From databuilder folder area:all Related to all the project area:docs labels Aug 31, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 31, 2021

Congratulations on your first Pull Request and welcome to Amundsen community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/amundsen-io/amundsen/blob/main/CONTRIBUTING.md)

Copy link

@dkunitsk dkunitsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Pretty much all my comments are nits.

databuilder/README.md Outdated Show resolved Hide resolved
databuilder/README.md Show resolved Hide resolved
Signed-off-by: Kristen Armes <karmes@lyft.com>
@@ -64,6 +70,32 @@ class Neo4jStalenessRemovalTask(Task):

"""

delete_stale_nodes_statement = textwrap.dedent("""
MATCH (target:{{type}})
{staleness_condition}{{extra_condition}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an we make staleness_condition one of {{ as well so it has to be named when formatting. Just makes things more uniform IMO. Same points about the other statements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed about the uniformity but the reason I put the single brace around staleness_condition (and the reason it was single braces before it was named) is bc each time .format is called on the statement it removes one set of curly braces, so since we have the _decorate_staleness function that handles populating the staleness condition first, and then after that the target type and extra_condition are left with a single {} and are populated in each of the other functions after. Do you think this is alright, or an alternative would be to bring all the formatting into the _decorate_staleness function if that is preferable?

extra_condition=stale_extra_condition),
param_dict={MARKER_VAR_NAME: self.marker})

total_record_value = next(iter(total_records), None)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you just return the count from the statement instead of the tuple, i believe you won't have to do this weird thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _execute_cypher_query function returns a Iterable[Dict[str, Any]] type which is the reason for this, the same next iter thing is being done in _batch_delete when only a count is returned (although I didn't realize it when I wrote this part)

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
@dkunitsk dkunitsk merged commit 1b25881 into amundsen-io:main Sep 3, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 3, 2021

Awesome work, congrats on your first merged pull request!

@kristenarmes kristenarmes deleted the optional-conditions-for-staleness-removal-task branch September 3, 2021 18:26
ozandogrultan pushed a commit to deliveryhero/amundsen that referenced this pull request Apr 28, 2022
…on types in the neo4j staleness removal task (amundsen-io#1464)

* Adding ability to include conditions along with node and relation types in the neo4j staleness removal task

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing up query templates

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing type issue in test

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing formatting requirements

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Ozan Dogrultan <ozan.dogrultan@deliveryhero.com>
zacr pushed a commit to SaltIO/amundsen that referenced this pull request May 13, 2022
…on types in the neo4j staleness removal task (amundsen-io#1464)

* Adding ability to include conditions along with node and relation types in the neo4j staleness removal task

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing up query templates

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing type issue in test

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing formatting requirements

Signed-off-by: Kristen Armes <karmes@lyft.com>
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this pull request Jun 30, 2022
…on types in the neo4j staleness removal task (amundsen-io#1464)

* Adding ability to include conditions along with node and relation types in the neo4j staleness removal task

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing up query templates

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing type issue in test

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing formatting requirements

Signed-off-by: Kristen Armes <karmes@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:all Related to all the project area:databuilder From databuilder folder
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants