HotPotQA lookup returns the wrong result if asked to look up the same term on 2 different pages sequentially #99

albertbou92 · 2024-10-28T19:21:47Z

Fix Lookup Method in HotPotQA Environment

Currently, the Lookup method in the HotPotQA environment decides whether the agent is asking for additional occurrences only by checking whether the current lookup term matches the previous one. It should also verify that the previous action was a Lookup.

Without this additional check, if the agent looks up the same term on two different pages, the lookup results are not reset for the new page as expected.

Example Issue

In the following sequence:

Search("France") --> Lookup("Population") --> Search("China") --> Lookup("Population")

the Lookup method returns the second occurrence of "Population" on the "France" page instead of the first occurrence on the "China" page.
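The failure mode can be reproduced with a toy model. This is a minimal sketch, not the aviary `HotPotQAEnv` code: the class `BuggyLookupEnv`, its fields, and the `PAGES` contents are hypothetical, and each "page" is assumed to be a list of paragraphs. The only behaviour it borrows from the description above is that Lookup continues to the next occurrence whenever the term matches the previous one.

```python
from dataclasses import dataclass, field

# Hypothetical toy pages (lists of paragraphs); not real Wikipedia content.
PAGES: dict[str, list[str]] = {
    "France": ["France has a large population.", "Population density in France varies."],
    "China": ["China has the second-largest population."],
}


@dataclass
class BuggyLookupEnv:
    """Toy model of the pre-fix logic: only the keyword decides whether to continue."""

    page: list[str] = field(default_factory=list)
    lookup_keyword: str = ""
    lookup_list: list[str] = field(default_factory=list)
    lookup_index: int = 0

    def search(self, entity: str) -> str:
        # Load a new page; note that the lookup state is NOT touched here.
        self.page = PAGES[entity]
        return self.page[0]

    def lookup(self, keyword: str) -> str:
        if keyword != self.lookup_keyword:
            # New keyword: rebuild the matches from the current page.
            self.lookup_keyword = keyword
            self.lookup_list = [p for p in self.page if keyword.lower() in p.lower()]
            self.lookup_index = 0
        else:
            # Same keyword: assume the agent wants the next occurrence,
            # even if a Search has loaded a different page in between.
            self.lookup_index += 1
        if self.lookup_index >= len(self.lookup_list):
            return "No more results."
        return self.lookup_list[self.lookup_index]


env = BuggyLookupEnv()
env.search("France")
first = env.lookup("Population")
env.search("China")
second = env.lookup("Population")
print(first)
print(second)
```

Running the Search/Lookup/Search/Lookup sequence prints the first "France" match and then the second "France" match, never reaching the "China" paragraph.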


codeflash-ai · 2024-10-28T19:38:17Z

⚡️ Codeflash found optimizations for this PR

📄 `HotPotQAEnv.finish()` in `packages/hotpotqa/src/aviary/envs/hotpotqa/env.py`

📈 Performance improved by 8,058% (80.58x faster)

⏱️ Runtime went down from 55.3 milliseconds to 677 microseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

⚡️ Speed up method HotPotQAEnv.finish by 8,058% in PR #99 (hotpotqa_lookup_fix2) #100

If you approve, it will be merged into this PR (branch hotpotqa_lookup_fix2).

packages/hotpotqa/src/aviary/envs/hotpotqa/env.py

… (`hotpotqa_lookup_fix2`) Here's the optimized version of your Python program. I've focused on improving the logic without changing the function signatures or renaming functions.

Explanation of Optimizations

1. **Removed `if s.strip() and keyword.lower() in s.lower()`**.
   - Perform `s.strip()` inside the list comprehension only once for constructing the results.
   - Avoid redundant `s.strip()` checks, as the condition `s.strip()` guarantees non-empty strings.
2. **Checked for `keyword.lower()` Once**.
   - Store the lowercase version of the keyword (`keyword_lower`) instead of computing it multiple times in the list comprehension. This reduces redundant calls to `keyword.lower()`, improving runtime performance, especially with long texts.
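For illustration, here is a minimal before/after sketch of the second optimization (hoisting `keyword.lower()` out of the comprehension). The real `HotPotQAEnv.construct_lookup_list` is not shown in this thread, so the function names, the `(keyword, text)` signature, and splitting the page on newlines into paragraphs are all assumptions. The sketch keeps the `s.strip()` emptiness check in both versions so they are guaranteed to return identical results.

```python
def construct_lookup_list_before(keyword: str, text: str) -> list[str]:
    # keyword.lower() is recomputed for every paragraph.
    return [
        s.strip()
        for s in text.split("\n")
        if s.strip() and keyword.lower() in s.lower()
    ]


def construct_lookup_list_after(keyword: str, text: str) -> list[str]:
    # Lowercase the keyword once, outside the comprehension.
    keyword_lower = keyword.lower()
    return [
        s.strip()
        for s in text.split("\n")
        if s.strip() and keyword_lower in s.lower()
    ]


page = "Paris is the capital.\n\n  The population of France is large.  \nPopulation growth has slowed."
before = construct_lookup_list_before("Population", page)
after = construct_lookup_list_after("Population", page)
print(before == after, after)
```

Both versions still lowercase every paragraph once; hoisting only removes one `keyword.lower()` call per paragraph.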

codeflash-ai · 2024-10-28T21:18:27Z

⚡️ Codeflash found optimizations for this PR

📄 `HotPotQAEnv.construct_lookup_list()` in `packages/hotpotqa/src/aviary/envs/hotpotqa/env.py`

📈 Performance improved by 28% (0.28x faster)

⏱️ Runtime went down from 4.10 milliseconds to 3.21 milliseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

⚡️ Speed up method HotPotQAEnv.construct_lookup_list by 28% in PR #99 (hotpotqa_lookup_fix2) #101

If you approve, it will be merged into this PR (branch hotpotqa_lookup_fix2).
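The two percentages are consistent with Codeflash reporting the gain as (old runtime - new runtime) / new runtime, so "28% (0.28x faster)" means the new code runs in roughly 1/1.28 of the original time. That reading is inferred from the reported numbers, not taken from Codeflash documentation. A quick check from the rounded runtimes:

```python
def relative_gain(old_s: float, new_s: float) -> float:
    """Fractional speedup: (old - new) / new, i.e. old / new - 1."""
    return (old_s - new_s) / new_s


finish_gain = relative_gain(55.3e-3, 677e-6)  # HotPotQAEnv.finish
lookup_gain = relative_gain(4.10e-3, 3.21e-3)  # HotPotQAEnv.construct_lookup_list

print(f"finish: {finish_gain:.0%}")
print(f"construct_lookup_list: {lookup_gain:.0%}")
```

From the rounded runtimes the `finish` figure comes out near 8,068% rather than 8,058%; since 55.3 ms and 677 µs are themselves rounded, the reported value lies within the resulting range (roughly 8,055% to 8,083%).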

sidnarayanan · 2024-10-28T21:39:03Z

packages/hotpotqa/src/aviary/envs/hotpotqa/env.py

```diff
@@ -340,6 +343,8 @@ def finish(self, answer: str) -> str:
         self.state.answer = answer
         self.state.reward += self.calculate_reward(answer)
+
+        self.state.last_action = self.tools[2].info.name
```


I saw @jamesbraza's original suggestion, but IMO this is less readable and depends on the order in which `self.tools` is populated. `"Finish"` is better. If we want to be extra careful, we could make a constant `FINISH_TOOL_NAME = "Finish"` and use it in both places.

Yeah, I agree with Sid that we shouldn't rely on ordering here, @albertbou92. Can you find a way to:

- Not depend on tool ordering
- Not depend on string literals (as subclasses can change tool names)

Maybe just `last_action_is_lookup: bool`, since that's the main thing we check with `last_action`?

Yeah this is a good solution actually
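A sketch of the `last_action_is_lookup: bool` idea suggested above, using the same kind of toy model. Everything here (`FlagLookupEnv`, `PAGES`, the field names, the placeholder Finish observation) is hypothetical and not the merged aviary code; the point is only that each tool sets the flag itself, so the check depends neither on the order of `self.tools` nor on tool-name string literals.

```python
from dataclasses import dataclass, field

# Hypothetical toy pages (lists of paragraphs); not real Wikipedia content.
PAGES: dict[str, list[str]] = {
    "France": ["France has a large population.", "Population density in France varies."],
    "China": ["China has the second-largest population."],
}


@dataclass
class FlagLookupEnv:
    """Toy env that tracks 'was the previous action a Lookup?' as a plain bool."""

    page: list[str] = field(default_factory=list)
    lookup_keyword: str = ""
    lookup_list: list[str] = field(default_factory=list)
    lookup_index: int = 0
    # Independent of tool names and of the order of any tool list.
    last_action_is_lookup: bool = False

    def search(self, entity: str) -> str:
        self.page = PAGES[entity]
        self.last_action_is_lookup = False
        return self.page[0]

    def lookup(self, keyword: str) -> str:
        # Continue only if the previous action was a Lookup for the same keyword.
        if self.last_action_is_lookup and keyword == self.lookup_keyword:
            self.lookup_index += 1
        else:
            self.lookup_keyword = keyword
            self.lookup_list = [p for p in self.page if keyword.lower() in p.lower()]
            self.lookup_index = 0
        self.last_action_is_lookup = True
        if self.lookup_index >= len(self.lookup_list):
            return "No more results."
        return self.lookup_list[self.lookup_index]

    def finish(self, answer: str) -> str:
        self.last_action_is_lookup = False
        return answer  # placeholder observation for this sketch


env = FlagLookupEnv()
env.search("France")
obs_a = env.lookup("Population")  # first match on the France page
obs_b = env.lookup("Population")  # consecutive Lookup: next match on the same page
env.search("China")
obs_c = env.lookup("Population")  # Search cleared the flag: first match on China
```

Because the flag is a single boolean owned by the environment, a subclass can rename or reorder its tools without touching this check.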

Ryan-Rhys · 2024-10-28T22:14:38Z

packages/hotpotqa/tests/test_hotpotqa_env.py

```diff
+    obs5 = hotpotqa_env.finish("China")
+
+    # Ensure that the observations are different
+    assert obs1 != obs2 != obs3 != obs4 != obs5
```


What's the intuition behind this assertion?

Just making sure the returned outputs change when they are supposed to. Before this PR, if lookup only finds an occurrence, obs2 and obs4 would be equal, which is not correct.
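One caveat about the assertion discussed above: Python chains comparison operators, so `obs1 != obs2 != obs3 != obs4 != obs5` means `obs1 != obs2 and obs2 != obs3 and obs3 != obs4 and obs4 != obs5`. Only neighbouring observations are compared, so the chain alone does not check `obs2 != obs4`, the case mentioned in the reply. The snippet below shows the difference with placeholder strings; an explicit `assert obs2 != obs4`, or a pairwise-distinct check such as `len(set(obs)) == len(obs)` (assuming the observations are hashable strings), states the intent directly.

```python
# Placeholder observations: the 1st and 3rd are equal, neighbours differ.
obs = ["France paragraph", "China paragraph", "France paragraph"]

# Chained form: evaluates (obs[0] != obs[1]) and (obs[1] != obs[2]).
chained = obs[0] != obs[1] != obs[2]

# Pairwise-distinct form: every observation must differ from every other one.
pairwise_distinct = len(set(obs)) == len(obs)

print(chained, pairwise_distinct)
```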


albertbou92 added 3 commits October 27, 2024 14:51

increase hotpotqa lookup range to a full paragraph

356db3e

Merge branch 'main' into hotpotqa_lookup_range

bdc09f5

lookup bug fix plus test

5b997a4

dosubot bot added the `size:M` (This PR changes 30-99 lines, ignoring generated files.) and `bug` (Something isn't working) labels Oct 28, 2024

format

4fb7bcb

albertbou92 requested review from jamesbraza, sidnarayanan and Ryan-Rhys October 28, 2024 19:23

codeflash-ai bot added a commit that referenced this pull request Oct 28, 2024

⚡️ Speed up method HotPotQAEnv.finish by 8,058% in PR #99 (`hotpotqa_lookup_fix2`)

Here are some performance optimization techniques applied to your code.

02b791f

codeflash-ai bot mentioned this pull request Oct 28, 2024

⚡️ Speed up method HotPotQAEnv.finish by 8,058% in PR #99 (hotpotqa_lookup_fix2) #100

Closed

jamesbraza reviewed Oct 28, 2024

packages/hotpotqa/src/aviary/envs/hotpotqa/env.py (outdated, resolved)

packages/hotpotqa/src/aviary/envs/hotpotqa/env.py (outdated, resolved)

fixes

cff7513

Ryan-Rhys reviewed Oct 28, 2024

packages/hotpotqa/src/aviary/envs/hotpotqa/env.py (resolved)

codeflash-ai bot mentioned this pull request Oct 28, 2024

⚡️ Speed up method HotPotQAEnv.construct_lookup_list by 28% in PR #99 (hotpotqa_lookup_fix2) #101

Closed

sidnarayanan reviewed Oct 28, 2024

Ryan-Rhys reviewed Oct 28, 2024

Ryan-Rhys approved these changes Oct 28, 2024

jamesbraza reviewed Oct 28, 2024

packages/hotpotqa/tests/test_hotpotqa_env.py (outdated, resolved)

albertbou92 and others added 3 commits October 28, 2024 15:42

Update packages/hotpotqa/tests/test_hotpotqa_env.py

43fb054

Co-authored-by: James Braza <jamesbraza@gmail.com>

fixes

04a31c0

format

3274129

albertbou92 merged commit d09a7ed into main Oct 28, 2024
6 checks passed

albertbou92 deleted the hotpotqa_lookup_fix2 branch October 28, 2024 23:03

albertbou92 mentioned this pull request Oct 28, 2024

Increase HotPotQA lookup range to a full paragraph #96

Closed
