Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove quoted_na arg and solve other bugs due to major release of readr #27

Merged
merged 12 commits into from
Sep 8, 2021

Conversation

damianooldoni
Copy link
Contributor

This PR would like to update the package as some bugs have been detected due to a major release (2.x) of readr package.
Also a warning due to the use of the deprecated arg quoted_na.

This PR has still not to be reviewed. I wrote it now so to have automatic tests running on different OS via GitHub actions.

This is needed as the argument is now deprecated
@damianooldoni damianooldoni marked this pull request as draft August 21, 2021 13:34
A classic copy paste typo. WIth crlf instead of cr this test works fine.
@peterdesmet
Copy link
Member

Odd that quoted_na triggers deprecation warning: it is still listed in the documentation of readr.

@damianooldoni
Copy link
Contributor Author

After a detailed debugging I finally found the reason why R hangs for tests with numbers, e.g. test of resources at lines 218-249.
It's a locale problem when passing grouping mark "". This has been reported (tidyverse/readr#1241) and tackled in master as the other one about single digit hours (tidyverse/readr#1276). So, moving readr from Imports to Remotes should fix all these problems. And it should avoid further failures as readr version 2 is still not mature.

@peterdesmet
Copy link
Member

Yes, I debugged it like that as well. 😊 Good that it will be fixed at readr. We could also always pass “.” I’ll finish this PR.

One last question: is quoted_na effectively deprecated?

@damianooldoni
Copy link
Contributor Author

About the question of quoted_na and its deprecation. I get this while reading a dummy file I have just created:

image

@damianooldoni
Copy link
Contributor Author

About the problems with numbers, I think that passing "." to the grouping mark could create problems as it is also used as decimal mark, isn't?

About the bug in readr which has been solved at dev level, notice that this has NOT been solved in readr itself but in the dev version of one of its dependencies, vroom. Still, as I noticed in tidyverse/readr#1241 (comment), as long vroom in CRAN is not updated readr will not solve this.
My advice is to avoid to use "" as grouping mark.

@peterdesmet
Copy link
Member

My advice is to avoid to use "" as grouping mark.

Indeed, we could pass , (you're right, not .) as default grouping mark to locale (it is the default for https://readr.tidyverse.org/reference/locale.html):

  1. By default, grouping mark will be ignored, because we parse numbers with col_double() which doesn't take grouping mark into account
  2. If one of the fields has a groupChar=x defined, that x will be used. Great, because that is the intended behaviour
  3. If one of the fields has bareNumber=false (and groupChar is undefined), then , will be used, since we also have to use col_number() to get rid of non-numeric characters. That could lead to some unforeseen consequences, e.g. when the decimalMark is also ,.

I therefore wonder if we couldn't set grouping mark by default to a special character that is unlikely to be used in non-bare-numbers?

@damianooldoni
Copy link
Contributor Author

I just gave a try and I found that:

  1. we will get an error if we try to set a grouping mark equal to the decimal mark
library(readr)
local_damianoo <- locale(decimal_mark = ",",
                         grouping_mark = ",")
#> Error: `decimal_mark` and `grouping_mark` must be different
  1. by default grouping mark is one of , or . depending on what you choose as decimal mark
local_damianoo <- locale(decimal_mark = ".")
cat(paste0("Decimal mark: \"", local_damianoo$decimal_mark,
      "\"\nGrouping mark: \"", local_damianoo$grouping_mark, "\""))
#> Decimal mark: "."
#> Grouping mark: ","

local_damianoo <- locale(decimal_mark = ",")
cat(paste0("Decimal mark: \"", local_damianoo$decimal_mark,
           "\"\nGrouping mark: \"", local_damianoo$grouping_mark, "\""))
#> Decimal mark: ","
#> Grouping mark: "."
  1. by default decimal mark is one of , or . depending on what you choose as grouping mark
local_damianoo <- locale(grouping_mark = ".")
cat(paste0("Decimal mark: \"", local_damianoo$decimal_mark,
      "\"\nGrouping mark: \"", local_damianoo$grouping_mark, "\""))
#> Decimal mark: ","
#> Grouping mark: "."

local_damianoo <- locale(grouping_mark = ",")
cat(paste0("Decimal mark: \"", local_damianoo$decimal_mark,
           "\"\nGrouping mark: \"", local_damianoo$grouping_mark, "\""))
#> Decimal mark: "."
#> Grouping mark: ","

@peterdesmet peterdesmet added this to the readr milestone Sep 7, 2021
@codecov
Copy link

codecov bot commented Sep 8, 2021

Codecov Report

Merging #27 (c96d633) into main (a195f68) will not change coverage.
The diff coverage is 100.00%.

❗ Current head c96d633 differs from pull request most recent head 7c1a84d. Consider uploading reports for the commit 7c1a84d to get more accurate results
Impacted file tree graph

@@            Coverage Diff            @@
##              main       #27   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         2           
  Lines          186       185    -1     
=========================================
- Hits           186       185    -1     
Impacted Files Coverage Δ
R/read_resource.R 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a195f68...7c1a84d. Read the comment docs.

@peterdesmet peterdesmet marked this pull request as ready for review September 8, 2021 09:42
@peterdesmet peterdesmet merged commit 2362e7f into main Sep 8, 2021
@peterdesmet peterdesmet deleted the remove-quoted-na-arg branch September 8, 2021 09:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants