Hey guys,
I recently came across your paper and found some problems that I'd like to discuss.
For nearly all results in the paper that are claimed to be "zero-shot", the model was clearly trained on the same dataset, so the evaluation is not truly zero-shot. For example, this table:
shows superiority in zero-shot (ZS) evaluation against the baselines. However, CLARA's training set contains CREMA-D, RAVDESS, etc., while (some of) the baselines didn't use this "trick".
Can you clarify why it is believed that this is zero-shot performance?
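To make the concern concrete, here is a minimal sketch of the overlap check I have in mind. The function name and the evaluation list are my own placeholders, not taken from the paper or the repository; only CREMA-D and RAVDESS come from the training set mentioned above.

```python
def zero_shot_violations(train_sets, eval_sets):
    """Return the evaluation datasets that also appear in the training mix.

    Names are compared case-insensitively, treating "_" and "-" as equal.
    """
    def normalize(name):
        return name.strip().lower().replace("_", "-")

    seen_in_training = {normalize(name) for name in train_sets}
    return sorted(name for name in eval_sets if normalize(name) in seen_in_training)


# Illustrative placeholders only: the real lists would come from the paper's
# training and evaluation setup.
train_sets = ["CREMA-D", "RAVDESS"]
eval_sets = ["crema-d", "RAVDESS", "some-held-out-set"]

overlap = zero_shot_violations(train_sets, eval_sets)
print("Datasets seen during training:", overlap)
```

Any dataset that this check returns would, as far as I understand the term, need to be reported as in-domain rather than zero-shot.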
What are the test datasets? For example, in Table VI:
Since you have trained on so much speech data, why is there no zero-shot evaluation for MSW or even some English datasets?
Are pretrained checkpoints available? The links seem broken in the README.
Kind regards,
Heinrich