Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Index Mapping Tool #1609

Conversation

dbwiddis
Copy link
Member

@dbwiddis dbwiddis commented Nov 8, 2023

Description

Implements a tool to fetch index settings and mappings.

Issues Resolved

In support of #1161

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env November 8, 2023 01:52 — with GitHub Actions Failure
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 01:52 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 01:52 — with GitHub Actions Inactive
@dbwiddis dbwiddis changed the title Ipmlement Index Mapping Tool Implement Index Mapping Tool Nov 8, 2023
Copy link

codecov bot commented Nov 8, 2023

Codecov Report

Merging #1609 (4e63a2a) into feature/agent_framework_dev (90d9b31) will decrease coverage by 0.01%.
Report is 1 commits behind head on feature/agent_framework_dev.
The diff coverage is 80.30%.

@@                        Coverage Diff                        @@
##             feature/agent_framework_dev    #1609      +/-   ##
=================================================================
- Coverage                          72.19%   72.18%   -0.01%     
- Complexity                          2492     2494       +2     
=================================================================
  Files                                220      221       +1     
  Lines                              11018    11084      +66     
  Branches                            1119     1128       +9     
=================================================================
+ Hits                                7954     8001      +47     
- Misses                              2592     2606      +14     
- Partials                             472      477       +5     
Flag Coverage Δ
ml-commons 72.18% <80.30%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...g/opensearch/ml/engine/tools/IndexMappingTool.java 80.30% <80.30%> (ø)

... and 3 files with indirect coverage changes

Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 03:55 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 03:55 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env November 8, 2023 03:55 — with GitHub Actions Failure
List<String> indexList = parameters.containsKey("index")
? gson.fromJson(parameters.get("index"), List.class)
: Collections.emptyList();
final String[] indices = indexList.toArray(Strings.EMPTY_ARRAY);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If indices is an empty list, should we early return the result, not go through the following logic then return empty result? Early returning can improve the tool execution performance in edge cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I originally copied the code from cat indices where "empty" meant "match all the indices". But I guess that isn't the case here.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
Comment on lines +43 to +46
@Getter
private String type;
@Getter
private String version;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than defining these fields for each tool can we create an abstract tool with type and version defined?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question @arjunkumargiri . These were not part of the Tool interface when I started this PR, and @jngz-es made a direct commit to this dev branch here without any documentation of what they mean, so I added them here and in #1582 just to have something that implements the interface.

I was under the understanding that this dev branch was for rapid iteration (thus the non-PR maintainer commit linked above). I'd really like to not hold up this PR until there is full documentation for the new committed interface that I'm required to implement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, @arjunkumargiri . As @dbwiddis mentioned, this dev branch is for rapid development to unblock different teams, I can do refactoring afterwards.

Comment on lines 48 to 49
@SuppressWarnings("unused")
private String modelId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why modelId is required if it is not used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partly because I'm just copying over from the CatIndices where it was also unused but was based on @ylwu-amzn's POC code where it was unused as well.

There have been no detailed requirements written for what I'm writing this PR against, and I understand the Tool interface isn't finalized, so I'm doing my best to guess at what will eventually be needed.

This can probably be deleted but I'd rather have it here unused and delete later than delete now and re-add later if we find out it's needed.

See previous comment on doing my best to rapidly iterate and keep up with changing requirements rather than waiting for everything to be finalized.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modelId is not required. The current code is not cleaned up. May have some testing code. Agree that let's don't wait, we can move forward and fine tune later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but we will need clean up code before merging onto main branch.

listener.onResponse(empty);
return;
}
StringBuilder sb = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the output of tool follow json format and let the model understand from json format? this will help simplify each tool by directly returning json response

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, the requirements for the output have not been specified and the format I gave was in response to my question for how it should be. If we want a different format it needs to be clearly specified for me to write code against. @ylwu-amzn @jngz-es

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the output of tool follow json format and let the model understand from json format

Unless we do see the benefit of this, I would suggest not enforcing this. Keep it flexible and we could fine tune later.

the requirements for the output have not been specified and the format I gave was in response to my question for how it should be

It's hard to tell what's the exact format which will works the best without testing and fine tuning. Let's not worry too much about format for now. We can iterate fast and fine tune later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with moving forward on dev branch. However, I would prefer to define a unified format for all tool output so that it is easier for other tool development. If we are inclined on moving tuning the output later, I would prefer tools output in json format and let agent implement a common mechanism to covert from json to other formats.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 06:57 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 06:57 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 06:57 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 06:57 — with GitHub Actions Inactive
Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 16:06 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env November 8, 2023 16:06 — with GitHub Actions Failure
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 16:06 — with GitHub Actions Inactive
Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env November 8, 2023 17:52 — with GitHub Actions Failure
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 17:52 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 17:52 — with GitHub Actions Inactive
Signed-off-by: Daniel Widdis <widdis@gmail.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 18:56 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env November 8, 2023 18:56 — with GitHub Actions Failure
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 18:56 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env November 8, 2023 18:56 — with GitHub Actions Inactive
@ylwu-amzn ylwu-amzn merged commit abac194 into opensearch-project:feature/agent_framework_dev Nov 22, 2023
5 of 7 checks passed
@ylwu-amzn
Copy link
Collaborator

This PR opens for a long time. I don't see more comments on it. Just merged it to unblock Dan. You can add more comments, @dbwiddis , you can fix in new PR if necessary.

@dbwiddis dbwiddis mentioned this pull request Jan 19, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants