Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking coverage of OA entities #211

Merged
merged 26 commits into from
Nov 18, 2024
Merged

Conversation

yjunechoe
Copy link
Collaborator

New file openalexR-coverage.R tracks oa2df()'s coverage of OA entity fields (currently just one draft for the Work entity)

New function get_coverage() introduces an interface to the coverage information, given an entity

This could also use some tests to ensure a match between what oa2df() actually returns vs. what we say it returns.

@yjunechoe yjunechoe marked this pull request as draft February 26, 2024 15:57
@yjunechoe
Copy link
Collaborator Author

@trangdata Could you help me double check my understanding of how we process variables from Work objects? And feel free to suggest anything to add to the comments column

@trangdata
Copy link
Collaborator

Thank you for putting this together @yjunechoe 💯 We needed this a long time ago. You're as thorough as always! 🚀

I will make a PR if I think of other things to add to comments.

@rkrug
Copy link

rkrug commented Oct 15, 2024

I like this functionality.

One point: in the case of e.g. biblio, where there are values extracted, it would be great if the field could be specified, e.g. biblio.volume, volume and also biblio, NA to indicate where the value is coming from and that biblio itself is removed.

That would make it clearer to understand and also make it possible to rename sub fields after extraction.

@rkrug
Copy link

rkrug commented Oct 15, 2024

Oh - I would also suggest to add the sorting of the fields in this function (or a separate function).

@trangdata trangdata mentioned this pull request Nov 13, 2024
- add summary.stats column for institutions
- works column author is now authorships
- institutions column display_name_international is now international_display_name
- add coverage tracking for other entities
@trangdata trangdata marked this pull request as ready for review November 15, 2024 17:48
@trangdata
Copy link
Collaborator

trangdata commented Nov 15, 2024

@yjunechoe I can't add you as a reviewer because you opened the PR. But I would love to get your feedback when you have a minute. (no rush though!)

  • add warnings for the breaking changes
  • add keywords as an entity, remove documentation for concepts
  • add coverage for all entities, update test
  • update readme and news

@yjunechoe
Copy link
Collaborator Author

yjunechoe commented Nov 18, 2024

This looks great @trangdata !! Thanks so much for finishing up the coverage data - I know that's a lot of manual labor!

One thing that stands out is our current handling of data, e.g.:

get_coverage <- function(entity = NULL) {
  utils::data("oa2df_coverage", envir = environment(), package = "openalexR")
  oa2df_coverage <- oa2df_coverage

I don't mean to overthink fixing something that's not broken (and I can't even come up with something better ATM) but just want to mull over this a bit. An alternative to utils::data() that's less side-effect-y is to use oa2df_coverage <- get("oa2df_coverage", envir = asNamespace("openalexR")). Anyways I'm attempting to nerd-snipe some R wizards with a mastodon post so let me come back to this but otherwise the PR LGTM!


P.S. - Now that we have {rlang}, we're 1-step removed from also adding {cli} which I think may be worth considering later as QoL improvement: ex: warn() -> cli_warn(). The cli::cli_progress_*() functions can also replace our dependency to {progress}!

@trangdata
Copy link
Collaborator

trangdata commented Nov 18, 2024

Thank you @yjunechoe! So are you saying oa2df_coverage <- get("oa2df_coverage", envir = asNamespace("openalexR")) would work? Or it's still not working?

P.S. - Now that we have {rlang}, we're 1-step removed from also adding {cli} which I think may be worth considering later as QoL improvement: ex: warn() -> cli_warn(). The cli::cli_progress_*() functions can also replace our dependency to {progress}!

Great point! I'm happy to add cli to Imports (and remove progress)!

@rkrug
Copy link

rkrug commented Nov 18, 2024

Just another idea (after reading the problem with the data): why not have a function returning the data, and also provide the function with one argument, the version of openalexR which defaults to the currently installed version. By doing this, one could easily go back in time and see what changed between an older version and the new version in regards to coverage - very useful if one want’s to re-run a script after some time again and it does not work.

This was referenced Nov 18, 2024
@trangdata
Copy link
Collaborator

@yjunechoe OK I have moved these points to #297 and #296. Let's continue our discussions there.

@rkrug Thanks for your input. I have added the warning in multiple places and described the changes in NEWS.md. We understand that breaking changes require adjustment, but they are often essential for keeping the package aligned with best practices and maximizing functionality over the long term.

@trangdata trangdata merged commit 6a806c2 into ropensci:main Nov 18, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants