The junior-v2 interview is showing its age. I created it back when llama was all we had, and at the time every single open source model failed the test.
The clustering we now see at the top of the leaderboard is a result of the massive improvements in open source coding models over the past 6 months. Anything above 0.95 is effectively a binary pass, and junior-v2 cannot tell models apart in this range.
A more difficult test suite is needed.