Fix <meta charset> with single quotes #152

Merged 4 commits on Oct 4, 2021

lhvy (Contributor) commented Sep 30, 2021

Resolves #144 with a small change to the regex so that it accepts either single or double quotes.

P.S. I'm participating in Hacktoberfest 2021. If this PR is up to standard and merged, I'd appreciate it if the hacktoberfest-accepted label could be added. Thanks!
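
For readers following along, here is a minimal sketch of what accepting either quote style can look like. It is not the code from this PR: the PR changes a regex in src/scraper.rs, while this sketch uses plain string matching from the standard library so that it needs no external crate. The name `find_charset_sketch` and its signature are hypothetical.

```rust
/// Hypothetical helper (not suckit's `Scraper::find_charset`): returns the
/// lowercased value of the first `charset=` attribute found in `html`.
fn find_charset_sketch(html: &str) -> Option<String> {
    // ASCII lowercasing keeps byte offsets identical to the input.
    let lower = html.to_ascii_lowercase();
    let start = lower.find("charset=")? + "charset=".len();
    let rest = &lower[start..];

    // Accept an opening single quote, an opening double quote, or no quote.
    let (value_area, closing): (&str, Option<char>) = match rest.chars().next()? {
        q @ ('"' | '\'') => (&rest[1..], Some(q)),
        _ => (rest, None),
    };

    let end = match closing {
        // Quoted: the value runs up to the matching closing quote.
        Some(q) => value_area.find(q)?,
        // Unquoted: stop at whitespace, a quote, ';', '/' or '>'.
        None => value_area
            .find(|c: char| c.is_whitespace() || matches!(c, '"' | '\'' | ';' | '/' | '>'))
            .unwrap_or(value_area.len()),
    };

    let value = &value_area[..end];
    if value.is_empty() {
        None
    } else {
        Some(value.to_string())
    }
}
```

Under these assumptions, `find_charset_sketch("<meta charset='UTF-8'>")` and `find_charset_sketch("<meta charset=\"UTF-8\">")` both yield `Some("utf-8".to_string())`, and a tag with mismatched quotes yields `None`.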

codecov bot commented Oct 1, 2021

Codecov Report

Merging #152 (2dd32c2) into master (c14cb08) will increase coverage by 1.96%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #152      +/-   ##
==========================================
+ Coverage   62.62%   64.58%   +1.96%     
==========================================
  Files          17       17              
  Lines         610      624      +14     
==========================================
+ Hits          382      403      +21     
+ Misses        228      221       -7     

Impacted Files      Coverage Δ
src/downloader.rs   72.89% <100.00%> (ø)
src/scraper.rs      22.70% <100.00%> (+10.42%) ⬆️
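
The percentages in the report follow from the Hits and Lines rows. The sketch below recomputes them, assuming coverage is simply hits divided by total lines; `coverage_percent` is a hypothetical helper, not part of Codecov or suckit.

```rust
/// Hypothetical helper: line coverage as a percentage,
/// i.e. covered lines (hits) over total lines.
fn coverage_percent(hits: u32, lines: u32) -> f64 {
    100.0 * f64::from(hits) / f64::from(lines)
}

/// Formats the coverage before and after a change, plus the difference,
/// each rounded to two decimal places.
fn coverage_summary(before: (u32, u32), after: (u32, u32)) -> (String, String, String) {
    let b = coverage_percent(before.0, before.1);
    let a = coverage_percent(after.0, after.1);
    (
        format!("{:.2}", b),
        format!("{:+.2}", a - b),
        format!("{:.2}", a),
    )
}
```

Calling `coverage_summary((382, 610), (403, 624))` with the figures from the report gives 62.62 before, 64.58 after, and a change of +1.96, which matches the Coverage row.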

CohenArthur (Collaborator) left a comment

Could you add tests for this? Thanks!

lhvy (Contributor, Author) commented Oct 1, 2021

I'm unsure how to do this... The tests don't fail on the warn message, and suckit seems to scrape <meta charset='utf-8'> and output <meta charset="utf-8">.

CohenArthur (Collaborator) commented

@lhvy Here's a test if you want to add it to the scraper.rs file:

    #[test]
    fn test_charset_parsing() {
        assert_eq!(Scraper::find_charset("<meta charset=\"UTF-8\">".as_bytes(), None), Some(String::from("utf-8")));
        assert_eq!(Scraper::find_charset("<meta charset='UTF-8'>".as_bytes(), None), Some(String::from("utf-8")));
    }

This just checks that the function returns the correct charset whether the value is enclosed in double quotes or single quotes.

CohenArthur (Collaborator) left a comment

Thanks!

Skallwar (Owner) left a comment

Great work, thanks!

src/scraper.rs (review thread: outdated, resolved)
lhvy requested a review from Skallwar (October 4, 2021 11:34)
Skallwar merged commit eebbf56 into Skallwar:master (Oct 4, 2021)
lhvy deleted the encoding-regex branch (October 4, 2021 22:10)
This pull request was closed.
Development

Successfully merging this pull request may close these issues.

Quoting issue on charset detection
3 participants