Judging #135
Comments
I agree that there are quite a few false positives, but some of the ones you've listed are opinions. For the opinion ones, other bots report them too, so I have to as well, or else, like you, I miss out on points. For the false positives, I try to minimize them as much as possible, but I still make mistakes during the short time we have to review things. Sometimes I have to make a tweak to a rule that I've marked as "never wrong", and that ends up introducing a bug into the rule, but I try to review even those the day after, and I always try not to make the same mistake twice (assuming I catch it). I've actually been working on most of these mistakes all day. I'm not sure if that helps, but that's what I see from my side. The more you point out these sorts of mistakes, the fewer of them you'll see in future races.

From the judging side, what would you like to see happen differently? My hope is that bots-judging-bots is implemented soon, which will free up the judge to look more carefully at the ones other bots complain about.

edit: after going through them all, none of the duplicates mentioned are actually duplicates
I suggest there should at least be a list of rules. For example, "missing zero validation" is one issue; it doesn't matter if you write "missing zero check in the constructor, in the initializer, in a setter", etc. as 15 different titles, it is still only one issue and should be judged as such. Because, like I mentioned, I can write 15 different titles from one detector. Disputes shouldn't count toward points (if they do), and I also think there should be no points for "global" issues like invariant tests etc. I don't know if you guys agree.
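To illustrate the splitting problem with a hypothetical Solidity sketch (the contract and names here are made up for the example, not taken from any real report): a single "missing zero-address validation" detector fires in several locations, and each hit can be written up either as one finding or as several differently-titled ones.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical contract: one "missing zero-address validation" detector
// would flag both the constructor and the setter below. Whether that is
// one finding or two separately-titled findings is exactly the judging
// question being debated here.
contract Vault {
    address public owner;

    constructor(address _owner) {
        owner = _owner; // flagged: no check that _owner != address(0)
    }

    function setOwner(address _owner) external {
        owner = _owner; // flagged: same root cause, different location
    }
}
```

A bot could emit "missing zero check in constructor" and "missing zero check in setter" as two findings, or one "missing zero-address validation" finding with two instances; the rule list being proposed would pin down which of those the judges reward.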
And I am also 100% sure every bot has quite a lot of false positives, but I think the judges are not giving penalties for them, so it ends up like: if a report has x gas issues, then that report should get x points, which makes everything bad. To be honest, I think 40 issues with 0 false positives is much, much better than 80 issues with 20 false positives.
Who's going to spend the time and effort to maintain the list? Who has the power to decide which ones are split and which ones aren't? I think leaving it up to the bots, and then having the judges decide how to group them, is the least time-intensive way, and that's what's happening right now. The downside is that not all judges spend the time to group them, so it's hit-or-miss, and you just have to list all combinations to make sure you don't miss out.

I agree with you that there should be large penalties for mistakes, and I've mentioned it many times in the primary bot racing org issue and in the bot racing Discord channel, but a vocal group opposes it, and I'm essentially the only one asking for it. If more people than just me were regularly asking for it, maybe it would happen.
So you think if I can split an issue into x pieces, then I should do it? If the judges also agree with this, then yeah, I can triple my detector count tonight without any problem :D

- `x += 1` is better
- `struct.x += 1` is better
- `array[index] += 1` is better
- `struct.array[index] += 1` is better

Let's go
some of those save gas and some don't, so be careful =) |
Hi, I don't know if I am misjudging this or not, but I don't think bot race judging is fair.
This is from the last bot race's winning bot: https://github.com/code-423n4/2023-11-panoptic/blob/main/bot-report.md
I didn't spend too much time on it, so it's possible I have mistakes here, but long story short, this report has almost 33% false positives, and to be honest, 33% false positives means they basically brute-forced everything and pasted the results. And there are also a lot of different issues I still don't understand:
[N-71] Variables need not be initialized to false
[N-72] Variables need not be initialized to zero
so are they really different findings, and should they get 2 points? Or:
[N-26] Contracts should have full test coverage
[N-13] Consider adding formal verification proofs
[G-10] Enable IR-based code generation
[N-39] Large or complicated code bases should implement invariant tests
are these really valid findings? I am not doing these cheap tricks in my bot, and as a result I get a low score because the judge thinks I don't find enough issues. And one last thing: I am not sure if the disputed section is also counted, but if it is, I think that's also a huge mistake. Disputed means something like "other bots may say this is an issue, but actually it's not". Without even taking a look, I can write 250 different disputed issues and send them with my report, brute-force style, like:
d-01: Other bots may think there is a signature replay attack in the contract; however, that is not the case.
Who can tell me this d-01 is not valid? Other bots may say it is, and 99.9% of the time it won't be?
Can I get some answers please? Thanks!