[BUG] Some classifications not returned from findEntityXXX requests #6311

Closed
1 task done
mandy-chessell opened this issue Mar 11, 2022 · 4 comments
Labels
bug Something isn't working triage New bug/issue which needs checking & assigning

Comments

@mandy-chessell
Contributor

mandy-chessell commented Mar 11, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The federated findEntityXXX repository requests supported by the enterprise repository services do not return classifications for entities if they are stored in a repository attached to an entity proxy. These methods do return all classifications attached to home entities and reference copies.

The reason for the difference is that an entity proxy cannot be returned from the findEntityXXX methods because they return EntityDetail objects.

The methods that retrieve a single entity (isEntityKnown, getEntitySummary, getEntityDetail) do return all classifications. Their executors request a specific entity and so receive EntityProxyOnlyException if the entity is known but only a proxy is available. The executor can then call getHomeClassifications() on that repository to retrieve any classifications homed there.
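
A minimal sketch of the single-entity pattern described above, not the actual executor code. ProxySketchRepository and SingleEntitySketch are hypothetical, the exception class is a simplified stand-in for the OMRS one, and an entity detail is reduced to a list of classification names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the OMRS types. Only the names
// EntityProxyOnlyException and getHomeClassifications come from the issue text.
class EntityProxyOnlyException extends Exception {
    EntityProxyOnlyException(String guid) {
        super("Only a proxy is stored for " + guid);
    }
}

final class ProxySketchRepository {
    private final boolean proxyOnly;
    private final List<String> classifications;

    ProxySketchRepository(boolean proxyOnly, List<String> classifications) {
        this.proxyOnly = proxyOnly;
        this.classifications = classifications;
    }

    // Assumption: the entity detail is reduced to its list of classification names.
    List<String> getEntityDetail(String guid) throws EntityProxyOnlyException {
        if (proxyOnly) {
            throw new EntityProxyOnlyException(guid);
        }
        return classifications;
    }

    // Classifications homed in this repository, even when the entity is only a proxy.
    List<String> getHomeClassifications(String guid) {
        return classifications;
    }
}

final class SingleEntitySketch {
    static List<String> collectClassifications(List<ProxySketchRepository> cohort, String guid) {
        List<String> merged = new ArrayList<>();
        for (ProxySketchRepository repository : cohort) {
            List<String> found;
            try {
                found = repository.getEntityDetail(guid);
            } catch (EntityProxyOnlyException proxyOnly) {
                // The entity is known here, but only as a proxy: ask for stranded classifications.
                found = repository.getHomeClassifications(guid);
            }
            for (String name : found) {
                if (!merged.contains(name)) {
                    merged.add(name);
                }
            }
        }
        return merged;
    }
}
```

The find requests never see the exception because they return lists of entity details, so the catch branch in this sketch has no equivalent there.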

Expected Behavior

All classifications for an entity should be returned from the federated query no matter how they are stored.

The federating query mechanism needs to minimise calls to the cohort member repositories and allow the query to execute in parallel.

Steps To Reproduce

The entities need to be stored in a repository that uses the adapter pattern (i.e. it runs in a repository proxy), has no event mapper and does not support the Anchors classification. Without an event mapper, events relating to its metadata are not sent over the cohort topic(s), so no reference copies from this repository are created in the other cohort members. Without support for the Anchors classification, this classification has to be stored in another repository - and because there are no reference copies, it will be attached to an entity proxy.

This repository needs to be part of a cohort with a native repository so there is somewhere to store the Anchors classifications.

Then issue a query to find these entities through an OMAS to engage the federated query. Before this fix, the Anchors classification is missing; after this fix, the Anchors classification is returned.

Environment

- Egeria: 3.7 SNAPSHOT or earlier

Any Further Information?

No response

@mandy-chessell mandy-chessell added bug Something isn't working triage New bug/issue which needs checking & assigning labels Mar 11, 2022
@mandy-chessell mandy-chessell self-assigned this Mar 11, 2022
@mandy-chessell
Contributor Author

mandy-chessell commented Mar 11, 2022

This fix requires the findEntityXXX executors to operate in two phases. The first phase retrieves the entities and the second phase retrieves any stranded home classifications.

The first phase visits each repository and retrieves either nothing or a list of entity details. These are assembled in the accumulator, where old/duplicated versions are discarded and the classifications are merged into a common list.
The second phase needs to visit each repository for each entity returned by the first phase and retrieve any homed classifications for that entity. Any retrieved classifications are added to the accumulator.

In order to avoid calling a repository that has already returned an entity, the accumulator needs to remember which repositories have returned a specific entity so they are skipped in the second phase.
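
A rough sketch of the bookkeeping this implies, assuming a hypothetical EntityAccumulatorSketch class rather than the real accumulator. Entities are reduced to a GUID and a set of classification names, and version handling is left out.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical accumulator: class and method names are illustrative only.
final class EntityAccumulatorSketch {
    private final Map<String, Set<String>> classificationsByEntity = new LinkedHashMap<>();
    private final Map<String, Set<String>> contributorsByEntity = new HashMap<>();

    // Phase one: record an entity returned by a repository and merge its classifications.
    synchronized void addEntity(String repositoryId, String entityGUID, Collection<String> classifications) {
        classificationsByEntity.computeIfAbsent(entityGUID, key -> new TreeSet<>()).addAll(classifications);
        contributorsByEntity.computeIfAbsent(entityGUID, key -> new HashSet<>()).add(repositoryId);
    }

    // Phase two: a repository is only asked for home classifications of an entity
    // that was found in phase one and that this repository did not itself return.
    synchronized boolean needsHomeClassifications(String repositoryId, String entityGUID) {
        Set<String> contributors = contributorsByEntity.get(entityGUID);
        return contributors != null && !contributors.contains(repositoryId);
    }

    // Phase two: merge any stranded classifications into the entity's common list.
    synchronized void addHomeClassifications(String entityGUID, Collection<String> classifications) {
        Set<String> known = classificationsByEntity.get(entityGUID);
        if (known != null) {
            known.addAll(classifications);
        }
    }

    // Returns a copy so callers cannot modify the accumulator's state.
    synchronized Set<String> getClassifications(String entityGUID) {
        return new TreeSet<>(classificationsByEntity.getOrDefault(entityGUID, Set.of()));
    }
}
```

With this shape, a repository that returned the entity in phase one answers false to needsHomeClassifications and is skipped, while a repository holding only a proxy is still visited.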

In this fix I have also changed the single entity queries to use the same pattern. Their current implementation manages the two phases in the executor. This means the request has to be executed in sequence using a single executor. Moving the accumulation of the entity, classifications and the list of visited repositories to the accumulator means these requests can be executed in parallel using a different executor in each thread.

Note: at this time, the federated query operates in a single thread, but the design/code allows for queries to be issued in parallel threads - the code is waiting for thread pool support to be added to ParallelFederationControl. This parallel operation is needed for large cohorts.
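
To illustrate why moving state into the accumulator makes parallel execution possible, here is a hypothetical sketch using a standard Java thread pool. It is not the ParallelFederationControl implementation: ParallelFederationSketch is an invented name, each repository is simulated by a pre-computed list of entity GUIDs, and the accumulator is just a concurrent set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; names and structure are assumptions.
final class ParallelFederationSketch {
    static Set<String> queryAll(Map<String, List<String>> resultsByRepository) {
        // Shared, thread-safe accumulator: duplicates from different repositories collapse.
        Set<String> accumulator = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, resultsByRepository.size()));
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<String> repositoryResults : resultsByRepository.values()) {
                // One task per repository; no task keeps results of its own.
                tasks.add(() -> {
                    accumulator.addAll(repositoryResults);
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every repository has been queried
        } catch (InterruptedException interrupted) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Federated query interrupted", interrupted);
        } finally {
            pool.shutdown();
        }
        return new TreeSet<>(accumulator);
    }
}
```

Because every task writes only to the shared accumulator, the order in which repositories respond does not affect the merged result.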

mandy-chessell added a commit to mandy-chessell/egeria that referenced this issue Mar 14, 2022
Signed-off-by: Mandy Chessell <mandy.e.chessell@gmail.com>
@alexandra-bucur

@mandy-chessell, I managed to test the changes with the IGC proxy and there is something missing in the first phase. When the FindEntitiesByPropertyExecutor's issueRequestToRepository method is called, it always returns true. In the case of the glossary term from IGC, the first connector (the local one) retrieves no results. This means that the federation control logic in executeCommand hits the break and stops before calling the next connector (no entities are searched for in IGC, so no entities are returned at all).

The same happens when I try to publish the glossary term context through Asset Lineage. This time the executor is GetRelationshipsForEntityExecutor. No relationship is found.

I tried to add
if (results == null) { return false; } in issueRequestToRepository after every accumulator call (sometimes it is also called when catching certain exceptions). This made the search in Asset Catalog and the publish context in Asset Lineage work again.

I am not sure if some other logic should be added or if other executors should have a similar change, but I saw we have a lot more than these 2. Could you please take a look at this issue so I don't mess with the flow? :) Thank you!

@mandy-chessell
Contributor Author

mandy-chessell commented Mar 15, 2022

issueRequestToRepository should always return false to ensure all repositories are visited. This is required for all find executors.
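
A simplified sketch of the control flow under discussion, assuming the loop stops as soon as an executor returns true. ExecutorSketch and FederationControlSketch are hypothetical names; only issueRequestToRepository and executeCommand are taken from the comments above, and their real signatures differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-method view of an executor.
interface ExecutorSketch {
    // Returns true when no further repositories need to be called.
    boolean issueRequestToRepository(String repositoryName);
}

final class FederationControlSketch {
    // Calls each repository in turn and returns the names of those actually visited.
    static List<String> executeCommand(List<String> repositories, ExecutorSketch executor) {
        List<String> visited = new ArrayList<>();
        for (String repositoryName : repositories) {
            visited.add(repositoryName);
            if (executor.issueRequestToRepository(repositoryName)) {
                break; // the executor says it has everything it needs
            }
        }
        return visited;
    }
}
```

An executor that always returns true stops after the local repository, which matches the IGC symptom reported above. One that always returns false visits every cohort member.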

mandy-chessell added a commit to mandy-chessell/egeria that referenced this issue Mar 16, 2022
Signed-off-by: Mandy Chessell <mandy.e.chessell@gmail.com>
mandy-chessell added a commit that referenced this issue Mar 17, 2022
@mandy-chessell
Contributor Author

I think this is finished now
