Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HotPotQA lookup returns the wrong result is asked to look up the same term in 2 different pages sequentially #99

Merged
merged 8 commits into from
Oct 28, 2024

Conversation

albertbou92
Copy link
Contributor

Fix Lookup Method in HotPotQA Environment

Currently, the Lookup method in the HotPotQA environment only checks if the current lookup term matches the last one to determine if the user is searching for additional occurrences. However, it should also verify that the previous action was also Lookup.

Without this additional check, if the agent wants to look for the same term on different pages, the lookup table does not reset as expected.

Example Issue

In the following sequence:

Search("France") --> Lookup("Population") --> Search("China") --> Lookup("Population")

the Lookup method will return the second occurrence of "Population" on the page for "France" instead of the first occurrence on the page for "China."

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Oct 28, 2024
codeflash-ai bot added a commit that referenced this pull request Oct 28, 2024
…a_lookup_fix2`)

Here are some performance optimization techniques applied to your code.
Copy link

codeflash-ai bot commented Oct 28, 2024

⚡️ Codeflash found optimizations for this PR

📄 HotPotQAEnv.finish() in packages/hotpotqa/src/aviary/envs/hotpotqa/env.py

📈 Performance improved by 8,058% (80.58x faster)

⏱️ Runtime went down from 55.3 milliseconds to 677 microseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch hotpotqa_lookup_fix2).

codeflash-ai bot added a commit that referenced this pull request Oct 28, 2024
… (`hotpotqa_lookup_fix2`)

Here's the optimized version of your Python program. I've focused on improving the logic without changing the function signatures or renaming functions.



### Explanation of Optimizations.

1. **Removed `if s.strip() and keyword.lower() in s.lower()`**.
   - Perform `s.strip()` inside the list comprehension only once for constructing the results.
   - Avoid redundant `s.strip()` checks, as the condition `s.strip()` guarantees non-empty strings.

2. **Checked for `keyword.lower()` Once**.
   - Store the lowercase version of the keyword (`keyword_lower`) instead of computing it multiple times in the list comprehension. This reduces redundant calls to `keyword.lower()`, improving runtime performance, especially with long texts.
Copy link

codeflash-ai bot commented Oct 28, 2024

⚡️ Codeflash found optimizations for this PR

📄 HotPotQAEnv.construct_lookup_list() in packages/hotpotqa/src/aviary/envs/hotpotqa/env.py

📈 Performance improved by 28% (0.28x faster)

⏱️ Runtime went down from 4.10 milliseconds to 3.21 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch hotpotqa_lookup_fix2).

@@ -340,6 +343,8 @@ def finish(self, answer: str) -> str:

self.state.answer = answer
self.state.reward += self.calculate_reward(answer)

self.state.last_action = self.tools[2].info.name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw @jamesbraza 's original suggestion, but IMO this is less readable and depends on the order in which self.tools is populated. "Finish" is better. If we want to be extra careful, we could make a constant FINISH_TOOL_NAME = "Finish" and use it in both places.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with Sid that we shouldn't rely on ordering here @albertbou92 . Can you find a way to:

  • Not depend on tool ordering
  • Not depend on string literals (as subclasses can change tool names)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just last_action_is_lookup: bool, since that's the main thing we check with last_action?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is a good solution actually

obs5 = hotpotqa_env.finish("China")

# Ensure that the observations are different
assert obs1 != obs2 != obs3 != obs4 != obs5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intuition behind this assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure the returned outputs change when they are supposed to. Before this PR, if lookup only finds an occurrence, obs2 and obs4 would be equal, which is not correct.

albertbou92 and others added 3 commits October 28, 2024 15:42
Co-authored-by: James Braza <jamesbraza@gmail.com>
@albertbou92 albertbou92 merged commit d09a7ed into main Oct 28, 2024
6 checks passed
@albertbou92 albertbou92 deleted the hotpotqa_lookup_fix2 branch October 28, 2024 23:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants