Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In memory label propagation community detection #136

Merged
merged 6 commits into from
Sep 23, 2024
Merged

In memory label propagation community detection #136

merged 6 commits into from
Sep 23, 2024

Conversation

prasmussen15
Copy link
Collaborator

@prasmussen15 prasmussen15 commented Sep 21, 2024

Important

Replaces Neo4j GraphDataScience Plugin with in-memory label propagation for community detection, updating functions and tests accordingly.

  • Community Detection: Replaces Neo4j GraphDataScience Plugin with in-memory label propagation in get_community_clusters() in community_operations.py. Introduces Neighbor class to support label propagation.
  • Function Updates: Updates build_communities() in graphiti.py to use the modified get_community_clusters().
  • Testing: Updates test_graphiti_int.py to call build_communities() during tests.
  • Misc: Removes Neo4j GraphDataScience Plugin requirement from README.md.

This description was created by Ellipsis for 3d60056. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 2c524a6 in 27 seconds

More details
  • Looked at 179 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. graphiti_core/utils/maintenance/community_operations.py:108
  • Draft comment:
    The line new_community = max(community_lst[0][1], curr_community) is incorrect. It should select the community with the highest count.
            new_community = community_lst[0][1]
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The current code logic seems to be intentionally using max to compare the current community with the one having the highest count. This might be a safeguard to prevent unnecessary changes. The suggestion to always select the community with the highest count might not align with the intended logic. Without strong evidence that the current logic is incorrect, the comment should be deleted.
    I might be missing the context of why the current logic uses max. There could be a specific reason for this choice that isn't immediately obvious.
    The use of max suggests a deliberate choice to compare values, which implies there might be a reason for this logic. Without clear evidence that this is incorrect, it's safer to assume the current logic is intentional.
    Delete the comment as it suggests a change without strong evidence that the current logic is incorrect.
2. tests/test_graphiti_int.py:77
  • Draft comment:
    Ensure graphiti is closed even if an exception occurs by using a try-finally block.
try:
    # existing code
finally:
    graphiti.close()
  • Reason this comment was not posted:
    Confidence changes required: 80%
    In test_graphiti_init, the graphiti instance is not closed if an exception occurs before reaching graphiti.close(). This can lead to resource leaks.
3. graphiti_core/utils/maintenance/community_operations.py:47
  • Draft comment:
    The logic for executing queries and processing results is repeated. Consider refactoring this into a separate function to adhere to the DRY principle.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The get_community_clusters function in graphiti_core/utils/maintenance/community_operations.py has a DRY violation. The logic for executing queries and processing results is repeated. This can be refactored into a separate function to adhere to the DRY principle.
4. graphiti_core/utils/maintenance/community_operations.py:48
  • Draft comment:
    The variable community_clusters is reassigned in the loop, which can be misleading. Consider renaming it to something more descriptive like all_community_clusters.
  • Reason this comment was not posted:
    Confidence changes required: 70%
    The get_community_clusters function in graphiti_core/utils/maintenance/community_operations.py has a misleading variable name community_clusters which is reassigned in the loop. This can cause confusion. It should be renamed to something more descriptive like all_community_clusters.
5. tests/test_graphiti_int.py:81
  • Draft comment:
    Avoid logging sensitive data directly. Consider sanitizing or removing sensitive information from the logs.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_ATJcZke4Hujh152c


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 3cb0a47 in 19 seconds

More details
  • Looked at 26 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. graphiti_core/utils/maintenance/community_operations.py:98
  • Draft comment:
    Redundant type hint for community_candidates. It's already a defaultdict(int).
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The type hint for community_candidates is redundant since it's already initialized as a defaultdict(int). This applies to other similar cases in the code.
2. graphiti_core/utils/maintenance/community_operations.py:95
  • Draft comment:
    Use idiomatic type hinting for defaultdict:
community_candidates = defaultdict(int)  # type: defaultdict[str, int]
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code in community_operations.py uses a non-idiomatic way to declare a defaultdict type hint. It should be corrected for better readability and adherence to idiomatic Python.

Workflow ID: wflow_qmHU4GSTNbBsHTCD


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 0a41df3 in 16 seconds

More details
  • Looked at 57 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 drafted comments based on config settings.
1. graphiti_core/utils/maintenance/community_operations.py:98
  • Draft comment:
    Ensure community_candidates uses int as the key type for community IDs.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The community_candidates dictionary should use int as the key type, not str, since it stores community IDs which are integers.
2. graphiti_core/utils/maintenance/community_operations.py:82
  • Draft comment:
    Simplify the comment to: "Label propagation algorithm: Assign each node its own community, update communities based on neighbors, and iterate until stable."
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The comment on the label_propagation function is too detailed and could be more concise.
3. graphiti_core/utils/maintenance/community_operations.py:81
  • Draft comment:
    The function label_propagation should have a concise comment explaining its purpose, e.g., "Perform label propagation to detect communities."
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The function label_propagation is responsible for the entire label propagation algorithm, which is a single responsibility. However, the comment is too detailed and should be concise.

Workflow ID: wflow_h2W7OITpR2jMpHAE


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Incremental review on 7a43c41 in 19 seconds

More details
  • Looked at 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. README.md:81
  • Draft comment:
    Consider updating the documentation to reflect the removal of the Neo4j GraphDataScience Plugin requirement if it's no longer needed.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The README file has been modified to remove a requirement for the Neo4j GraphDataScience Plugin. This change should be reflected in the documentation if it's no longer needed.

Workflow ID: wflow_Pe0YvK7k9SRbd448


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

README.md Show resolved Hide resolved
@danielchalef
Copy link
Member

danielchalef commented Sep 21, 2024

@prasmussen15 I see the following when running build_communities from this branch. let me know what else you need to debug this.

  File "/Users/danielchalef/dev/zep/chat-tenetmedia/ingest.py", line 408, in <module>
    asyncio.run(main())
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 217, in async_wrapper
    self._handle_exception(observation, e)
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 508, in _handle_exception
    raise e
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 215, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/ingest.py", line 388, in main
    await client.build_communities()
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 569, in build_communities
    community_nodes, community_edges = await build_communities(self.driver, self.llm_client)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/community_operations.py", line 199, in build_communities
    community_clusters = await get_community_clusters(driver)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/community_operations.py", line 70, in get_community_clusters
    cluster_uuids = label_propagation(projection)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/community_operations.py", line 109, in label_propagation
    new_community = max(community_lst[0][1], curr_community)
                        ~~~~~~~~~~~~~^^^
IndexError: list index out of range```

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 3d60056 in 20 seconds

More details
  • Looked at 99 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. graphiti_core/utils/maintenance/community_operations.py:154
  • Draft comment:
    Inconsistent indentation in function definitions. Ensure consistent indentation for better readability. This issue is also present in other function definitions in this file.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code has inconsistent indentation in function definitions, which can lead to readability issues.
2. graphiti_core/utils/maintenance/community_operations.py:196
  • Draft comment:
    Inconsistent indentation in function definitions. Ensure consistent indentation for better readability. This issue is also present in other function definitions in this file.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code has inconsistent indentation in function definitions, which can lead to readability issues.
3. graphiti_core/utils/maintenance/community_operations.py:223
  • Draft comment:
    Inconsistent indentation in function definitions. Ensure consistent indentation for better readability. This issue is also present in other function definitions in this file.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code has inconsistent indentation in function definitions, which can lead to readability issues.
4. graphiti_core/utils/maintenance/community_operations.py:284
  • Draft comment:
    Inconsistent indentation in function definitions. Ensure consistent indentation for better readability. This issue is also present in other function definitions in this file.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code has inconsistent indentation in function definitions, which can lead to readability issues.
5. examples/podcast/podcast_runner.py:79
  • Draft comment:
    Remove commented-out code if it's not needed to keep the codebase clean.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code contains commented-out sections which should be removed if not needed.
6. examples/podcast/podcast_runner.py:63
  • Draft comment:
    Avoid hardcoding credentials like neo4j_password. Use environment variables or a secure vault instead.
  • Reason this comment was not posted:
    Comment was on unchanged code.
7. graphiti_core/utils/maintenance/community_operations.py:153
  • Draft comment:
    Refactor build_community to adhere to the Single Responsibility Principle by separating concerns like summarization, node creation, and logging.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The function build_community is not following the single responsibility principle as it is doing multiple tasks like summarizing, creating nodes, and logging.
8. graphiti_core/utils/maintenance/community_operations.py:195
  • Draft comment:
    Refactor build_communities to adhere to the Single Responsibility Principle by separating concerns like building communities and separating nodes and edges.
  • Reason this comment was not posted:
    Confidence changes required: 80%
    The function build_communities is not following the single responsibility principle as it is doing multiple tasks like building communities and separating nodes and edges.

Workflow ID: wflow_l7gJS589vnJFOn4D


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@prasmussen15
Copy link
Collaborator Author

@prasmussen15 I see the following when running build_communities from this branch. let me know what else you need to debug this.

  File "/Users/danielchalef/dev/zep/chat-tenetmedia/ingest.py", line 408, in <module>
    asyncio.run(main())
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.6/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 217, in async_wrapper
    self._handle_exception(observation, e)
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 508, in _handle_exception
    raise e
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/langfuse/decorators/langfuse_decorator.py", line 215, in async_wrapper
    result = await func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/ingest.py", line 388, in main
    await client.build_communities()
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/graphiti.py", line 569, in build_communities
    community_nodes, community_edges = await build_communities(self.driver, self.llm_client)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/community_operations.py", line 199, in build_communities
    community_clusters = await get_community_clusters(driver)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/community_operations.py", line 70, in get_community_clusters
    cluster_uuids = label_propagation(projection)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/danielchalef/dev/zep/chat-tenetmedia/.venv/lib/python3.12/site-packages/graphiti_core/utils/maintenance/community_operations.py", line 109, in label_propagation
    new_community = max(community_lst[0][1], curr_community)
                        ~~~~~~~~~~~~~^^^
IndexError: list index out of range```

Fixed, this was caused when we tried to determine the community of an isolated node (they have no neighbors)

@prasmussen15 prasmussen15 merged commit 5506a01 into main Sep 23, 2024
6 checks passed
@prasmussen15 prasmussen15 deleted the leiden branch September 23, 2024 15:05
@github-actions github-actions bot locked and limited conversation to collaborators Sep 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants