I am trying to map the Lumos WebAgent grounding dataset onto the original Mind2Web dataset. Unfortunately, the ids (annotation_id, action_uid) were removed in the Lumos version, but via query extraction and matching I can match 1001/1009 samples to their corresponding Mind2Web entries.
The problem I am facing now is that Lumos must have done some processing on the actions themselves. Lumos sometimes has more and sometimes fewer actions (i.e. user messages defining a grounding sentence) than the matching Mind2Web entry. Why is this the case? Which processing was applied?
For my work I need a mapping from the Lumos grounding steps (that is, the user messages in the Lumos dataset) to the html_source code found in Mind2Web.
Happy to receive any guidance or advice, and thanks for the great open-source work!
Sorry for the late reply! I was pretty busy working on another ongoing project.
The missing actions are probably due to the annotation conversion process: the LLM sometimes outputs something in an invalid format, and those outputs are simply discarded (you can take a look at prompt_convertion.py in the data folder). I wasn't aware of the issue with extra actions, though. It should be simple to filter these out by matching the Lumos actions against the original ones in Mind2Web: if an action doesn't appear in Mind2Web, something must have gone wrong with it, so feel free to remove it.
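A minimal sketch of that filtering step, under some assumptions: each Lumos grounding step and each Mind2Web action has already been reduced to a plain string, and the two sides are similar enough for fuzzy string matching to work. The function names `normalize` and `align_actions` and the `threshold` value are hypothetical and not part of the Lumos or Mind2Web code. If the Lumos sentences are phrased very differently from the Mind2Web action strings, a different similarity measure would be needed.

```python
from difflib import SequenceMatcher


def normalize(text):
    # Lowercase and collapse whitespace so formatting differences do not matter.
    return " ".join(text.lower().split())


def align_actions(lumos_actions, m2w_actions, threshold=0.6):
    """Greedy in-order alignment of Lumos steps to Mind2Web actions.

    Returns a list of (lumos_index, m2w_index) pairs. m2w_index is None
    when no remaining Mind2Web action scores above `threshold`, which
    marks the Lumos step as a candidate for removal.
    """
    m2w_norm = [normalize(a) for a in m2w_actions]
    pairs = []
    start = 0  # only look at Mind2Web actions after the last match
    for i, action in enumerate(lumos_actions):
        query = normalize(action)
        best_j, best_score = None, threshold
        for j in range(start, len(m2w_norm)):
            score = SequenceMatcher(None, query, m2w_norm[j]).ratio()
            if score > best_score:
                best_j, best_score = j, score
        pairs.append((i, best_j))
        if best_j is not None:
            start = best_j + 1
    return pairs


# Illustrative strings only; these are not taken from either dataset.
m2w = ["[button]  Search -> CLICK", "[textbox]  City -> TYPE: Paris"]
lumos = ["[button] search -> click", "zzz qqq", "[textbox] city -> type: paris"]

pairs = align_actions(lumos, m2w)
kept = [lumos[i] for i, j in pairs if j is not None]
```

The matched Mind2Web index can then be used to look up the corresponding action entry (and its HTML) in the original task, while steps paired with `None` are the extra actions to drop. Mind2Web actions that never appear as a match are the ones discarded during conversion.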