Understanding processing of Mind2Web dataset for Lumos grounding #5

DanielRoeder1 · 2024-05-07T16:28:14Z

Hello,

I am trying to map the Lumos WebAgent grounding dataset onto the original Mind2Web dataset. Unfortunately, the IDs (annotation_id, action_uid) were removed in the Lumos version, but via query extraction and matching I can match 1001/1009 samples to their corresponding Mind2Web entries.
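For reference, a minimal sketch of this kind of query matching. It assumes the task query has already been extracted from each Lumos sample as a plain string and that the Mind2Web task descriptions are available keyed by annotation_id. The function names and input shapes are hypothetical and not taken from either codebase.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with one space, and trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_queries(lumos_queries, mind2web_tasks):
    """Map each Lumos query index to a Mind2Web annotation_id, or None.

    lumos_queries: list of query strings extracted from Lumos (assumed input).
    mind2web_tasks: dict of annotation_id -> task description (assumed input).
    """
    index = {}
    for annotation_id, task in mind2web_tasks.items():
        # Keep the first annotation_id if two tasks normalize to the same text.
        index.setdefault(normalize(task), annotation_id)
    return {i: index.get(normalize(q)) for i, q in enumerate(lumos_queries)}
```

Exact matching after normalization is deliberately strict. Samples that come back as None would need a looser comparison or manual inspection.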

The problem I am facing now is that Lumos must have done some processing on the actions themselves. Lumos sometimes has more and sometimes fewer actions (i.e. user msgs defining a grounding sentence) than Mind2Web. Why is this the case? Which processing was applied?

For my work I need a mapping of the Lumos grounding steps (that is the user msgs in the Lumos dataset) to the html_source code found in Mind2Web.

Happy to receive any guidance or advice, and thanks for the great open-source work!

yuchenlin · 2024-08-09T07:51:26Z

@WadeYin9712 plz take a look at this issue?

WadeYin9712 · 2024-08-12T05:22:20Z

Hi Daniel,

Sorry for the late reply! I was pretty busy working on another ongoing project.

The mismatch might be due to the annotation conversion process: sometimes the LLM outputs something in an invalid format, and those outputs are arbitrarily discarded (you can take a look at prompt_convertion.py in the data folder). I wasn't aware of the issue with extra actions, though. It should be simple to filter these out by matching the actions against the original ones in Mind2Web: if an action doesn't appear in Mind2Web, something must have gone wrong, so feel free to remove it.
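A rough sketch of that filtering idea, not the actual Lumos code. It assumes both action lists for one sample have already been converted into a comparable, hashable form, for example hypothetical `(operation, target)` tuples. The function name and the `key` parameter are made up for illustration.

```python
from collections import Counter


def filter_to_original(lumos_actions, mind2web_actions, key=lambda a: a):
    """Split Lumos actions into those found in Mind2Web and those that are not.

    A multiset is used so that a repeated action is kept at most as many
    times as it occurs in the Mind2Web sample.
    """
    remaining = Counter(key(a) for a in mind2web_actions)
    kept, dropped = [], []
    for action in lumos_actions:
        k = key(action)
        if remaining[k] > 0:
            remaining[k] -= 1
            kept.append(action)
        else:
            dropped.append(action)
    return kept, dropped


# Hypothetical example data for one sample.
m2w = [("CLICK", "search"), ("TYPE", "paris")]
lumos = [("CLICK", "search"), ("CLICK", "login"), ("TYPE", "paris")]
kept, dropped = filter_to_original(lumos, m2w)
print(kept)
print(dropped)
```

This only removes extra actions. Mind2Web actions with no Lumos counterpart (the "fewer actions" case) would still have to be detected separately, for instance by checking which keys are left over after the loop.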

Let me know if you have further questions!
