
How is the evaluation on downstream tasks carried out? #7

Open
ee2110 opened this issue Feb 3, 2024 · 1 comment

Comments


ee2110 commented Feb 3, 2024

Hi, thank you for the great work and interesting ideas!

  1. Are the validation/test sets from the COIN & CrossTask datasets used during evaluation?
  2. Are the downstream models (MLP / Transformer) trained with COIN & CrossTask data before evaluation?
  3. During evaluation for task recognition, are all annotated video segments from a video fed into the pre-trained model e(.), or is only one specific segment from a video used? I wonder how the accuracy was calculated.

I hope to get more information about these; I enjoyed reading your work.

Below is a screenshot of a diagram taken from the paper.

Thank you.

@hongluzhou
Contributor

Thank you for your interest in our work and for your kind words!

  1. We used the train/test sets from COIN (`if split == 'train' and self.coin_json['database'][video_sid]['subset'] == 'training':`) and created train/test sets for CrossTask on our own using random splits (`def get_task_cls_train_test_splits(cross_task_video_dir, train_ratio=0.8):`).

  2. Yes, downstream models were trained on the train set of the downstream datasets before evaluating them on the downstream test set.

  3. We used the pre-trained model to extract features of the video segments that contain steps. These features served as the input to the downstream models (`video_feats = np.load(`).
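The random CrossTask split mentioned in point 1 could look something like the following minimal sketch. Only the function signature comes from the linked repository code; the body, the seeding, and the assumption that each video is a file under `cross_task_video_dir` are illustrative guesses:

```python
import os
import random

def get_task_cls_train_test_splits(cross_task_video_dir, train_ratio=0.8, seed=0):
    """Randomly split CrossTask video IDs into train/test sets.

    Hypothetical reconstruction: the actual repository function may differ
    in file layout, ordering, and seeding. Sorting before shuffling makes
    the split reproducible for a fixed seed.
    """
    video_ids = sorted(os.listdir(cross_task_video_dir))
    rng = random.Random(seed)
    rng.shuffle(video_ids)
    n_train = int(len(video_ids) * train_ratio)
    return video_ids[:n_train], video_ids[n_train:]
```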
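And as a hedged illustration of how task-recognition accuracy might then be computed from the pre-extracted segment features in point 3: all names below are hypothetical, and the mean-pooling over segments is an assumption (the actual downstream model, e.g. a Transformer over segments, may aggregate differently):

```python
import numpy as np

def task_recognition_accuracy(video_feats_list, labels, classify_fn):
    """Accuracy of a downstream task classifier over a set of videos.

    Hypothetical sketch: each element of `video_feats_list` holds the
    pre-extracted features of all annotated step segments of one video,
    shape (num_segments, feat_dim). Segment features are mean-pooled into
    a single video-level vector, which the downstream model `classify_fn`
    maps to a predicted task label.
    """
    correct = 0
    for feats, label in zip(video_feats_list, labels):
        video_vec = feats.mean(axis=0)   # pool the step segments
        pred = classify_fn(video_vec)    # downstream MLP / Transformer head
        correct += int(pred == label)
    return correct / len(labels)
```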
