Less fp's #4279

Open
wants to merge 5 commits into
base: master

Conversation

ghost

@ghost ghost commented Aug 1, 2020

  • Excludes StackOverflow, Maths, Mathoverflow and Cross Validated from the "post is mostly images" reason (since StackOverflow can have <img> code counted and the other 3 have a lot of MathJax used)
  • Adds Cross Validated to the exclusion list for the "mostly punctuation marks in {}" reason (MathJax)
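A per-site exclusion like the one described above could be sketched as follows. This is only an illustration, not the code in this PR: `EXCLUDED_SITES` and `should_run_rule` are hypothetical names, and the way the real rule definitions take their site lists may differ.

```python
# Hypothetical sketch of a per-site exclusion for one detection reason.
# The hostnames are the public domains of the four sites named above.
EXCLUDED_SITES = {
    "stackoverflow.com",
    "math.stackexchange.com",
    "mathoverflow.net",
    "stats.stackexchange.com",
}


def should_run_rule(site, excluded=EXCLUDED_SITES):
    """Return True if the rule should be evaluated for posts from `site`."""
    return site not in excluded
```

With this shape, a post from `stats.stackexchange.com` skips the rule while a post from any site not in the set is still checked.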

Statistics:

Excluding StackOverflow, Maths, Mathoverflow and Cross Validated from the "post is mostly images" reason will result in:

  • 31 fewer fp's
  • 0 fewer tp's

The current accuracy of this reason is 17% (17)
New accuracy: 40% (40)

Excluding Cross Validated from the "mostly punctuation marks in {}" reason will result in:

  • 0 fewer tp's (all tp's caught by other reasons)
  • 30 fewer fp's
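
To show why removing only false positives raises a reason's accuracy, here is a small sketch. It assumes "accuracy" means tp / (tp + fp), which this thread does not define, and the counts are hypothetical placeholders, not the real figures for either reason.

```python
def accuracy(tp, fp):
    """Share of flagged posts that were true positives (assumed definition)."""
    total = tp + fp
    return tp / total if total else 0.0


# Hypothetical counts, chosen only for illustration.
tp, fp = 10, 50
fp_removed = 30  # false positives no longer flagged after the exclusion

before = accuracy(tp, fp)
after = accuracy(tp, fp - fp_removed)
print(f"before: {before:.0%}, after: {after:.0%}")
```

Because the true positive count is unchanged and the denominator shrinks, the ratio can only go up.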

Daniil-M-beep added 3 commits August 1, 2020 11:27
Signed-off-by: Daniil-M-beep <64322880+Daniil-M-beep@users.noreply.github.com>
@ghost
Author

ghost commented Aug 1, 2020

Note: The failures are not because of my code but because of how the tests are set up.

Edit: Fixed now

@user12986714
Contributor

Fewer fp's for the mostly-img reason would be great, but I don't think excluding sites is the optimal approach. See #4190

@ghost
Author

ghost commented Aug 1, 2020

Fewer fp's for the mostly-img reason would be great, but I don't think excluding sites is the optimal approach. See #4190

MathJax would still get caught though

@ghost ghost closed this Aug 7, 2020
@ghost ghost deleted the ShuffleFindspam branch August 7, 2020 23:50
@ghost ghost restored the ShuffleFindspam branch August 7, 2020 23:51
@ghost ghost reopened this Aug 7, 2020
@NobodyNada
Member

Excludes StackOverflow, Maths, Mathoverflow and Cross Validated from the "post is mostly images" reason (since StackOverflow can have <img> code counted and the other 3 have a lot of MathJax used)

Isn't there a stripcodeblocks option that would help on Stack Overflow? As for the math sites...as far as I can tell, MathJax doesn't render as images, it just embeds the text in a <span class="math-container"> (which is rendered by client-side JS).
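
Both points can be checked with a rough sketch like the one below. It is not SmokeDetector's actual check: `image_stats` and the regular expressions are hypothetical, and regex-based HTML handling is crude. It assumes the post body is rendered HTML in which code sits inside `<pre>` blocks and MathJax sits as text inside a `<span class="math-container">`.

```python
import re

# Hypothetical patterns; a real implementation may parse HTML differently.
CODE_BLOCK = re.compile(r"<pre\b[^>]*>.*?</pre>", re.DOTALL | re.IGNORECASE)
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def image_stats(body_html, strip_code=True):
    """Return (image_count, text_length) for a rendered post body."""
    if strip_code:
        # Drop code blocks so their contents are not counted at all.
        body_html = CODE_BLOCK.sub("", body_html)
    images = len(IMG_TAG.findall(body_html))
    text = ANY_TAG.sub("", body_html).strip()
    return images, len(text)


math_post = '<p>Solve <span class="math-container">$x^2 = 4$</span></p>'
image_post = '<p><img src="a.png"><img src="b.png"></p>'
print(image_stats(math_post))   # MathJax source is text, so no images are counted
print(image_stats(image_post))  # two images and no text
```

Under these assumptions, real MathJax never contributes to the image count; only equations uploaded as actual images would.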

@ghost
Author

ghost commented Aug 29, 2020

Excludes StackOverflow, Maths, Mathoverflow and Cross Validated from the "post is mostly images" reason (since StackOverflow can have <img> code counted and the other 3 have a lot of MathJax used)

Isn't there a stripcodeblocks option that would help on Stack Overflow? As for the math sites...as far as I can tell, MathJax doesn't render as images, it just embeds the text in a <span class="math-container"> (which is rendered by client-side JS).

I have updated the PR, and it no longer has anything to do with SO. Also, the issue is MathJax posted as images, not actual MathJax.

@makyen makyen added the status: confirmed Confirmed as something that needs working on. label Aug 30, 2020

3 participants