oa_fetch () missing additional author's institutions? #270
Comments
If the author has multiple institutions, we track only the first:
oa_fetch_test1$author[[1]][2,]$institution_id
#> [1] "https://openalex.org/I58286723"
oa_fetch_test1$author[[1]][2,]$institution_lineage
#> [1] "https://openalex.org/I1329765538, https://openalex.org/I58286723"
oa_fetch_test1$author[[1]][2,]$institution_lineage |>
  strsplit(", ") |>
  el(1) |>
  oa_fetch(entity = "institutions") |>
  subset(, c("id", "display_name"))
#> # A tibble: 2 × 2
#> id display_name
#> <chr> <chr>
#> 1 https://openalex.org/I1329765538 Universities Space Research Association
#> 2 https://openalex.org/I58286723 Lunar and Planetary Institute
Ref: #155
Actually sorry that's not quite right. I still don't see "University of Arizona". I'm not sure whether the data structure allowed multiple institutions back when we first implemented this - @trangdata do you recall?
The structure for this "Malhotra" author is:
#> 'data.frame': 1 obs. of 12 variables:
#> $ au_id : chr "https://openalex.org/A5003933592"
#> $ au_display_name : chr "Renu Malhotra"
#> $ au_orcid : chr "https://orcid.org/0000-0002-1226-3305"
#> $ author_position : chr "middle"
#> $ is_corresponding : logi FALSE
#> $ au_affiliation_raw : chr "Lunar and Planetary Laboratory, The University of Arizona, USA"
#> $ institution_id : chr "https://openalex.org/I58286723"
#> $ institution_display_name: chr "Lunar and Planetary Institute"
#> $ institution_ror : chr "https://ror.org/01r4eh644"
#> $ institution_country_code: chr "US"
#> $ institution_type : chr "facility"
#> $ institution_lineage : chr "https://openalex.org/I1329765538, https://openalex.org/I58286723" |
Thank you. It would be nice to have all the institutions available, given the number of cases like the above. In my case, about 10% of works are affected. There are multiple ways to implement it, such as a list() column or an additional field. |
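The two options mentioned above can be sketched in base R. This is only an illustration on a mock table built from values quoted in this thread, not how openalexR stores authors; the column names `institution_ids` and `institution_ids_chr` are hypothetical.

```r
# Mock author table; affiliation strings and IDs are copied from this thread.
authors <- data.frame(
  au_affiliation_raw = c(
    "Department of Astronomy & Astrophysics, University of Toronto, Canada",
    "Lunar and Planetary Laboratory, The University of Arizona, USA"
  ),
  stringsAsFactors = FALSE
)

# Option 1: a list column holding one character vector of IDs per author.
# (`institution_ids` is a hypothetical column name.)
authors$institution_ids <- list(
  "https://openalex.org/I185261750",
  c("https://openalex.org/I58286723", "https://openalex.org/I138006243")
)

# Option 2: an additional flat field with the IDs collapsed into one string.
# (`institution_ids_chr` is a hypothetical column name.)
authors$institution_ids_chr <- vapply(
  authors$institution_ids, paste, character(1), collapse = ", "
)

# Number of institutions recorded per author.
lengths(authors$institution_ids)
```

The list column keeps each ID separately addressable, while the collapsed string is easier to print and export but has to be split again before use.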
Thank you for this conversation @yhan818 and @yjunechoe. I think OpenAlex used to provide only one affiliation per author, and when they introduced more affiliations/institutions, we stuck with exporting only the first one for simplicity. But you're right, we could make these list columns.
Lines 222 to 236 in 774aff7
|
OK so currently, we have the following columns for each author:
oa_fetch_test1 <- openalexR::oa_fetch(entity = "works", id = "https://openalex.org/W4401226694")
oa_fetch_test1$author[[1]] |>
dplyr::select(au_affiliation_raw, starts_with("institution"))
#> au_affiliation_raw
#> 1 Department of Astronomy & Astrophysics, University of Toronto, Canada
#> 2 Lunar and Planetary Laboratory, The University of Arizona, USA
#> 3 Dept. of Physics and Astronomy, Northwestern University, 2145 Sheridan Rd., Evanston, IL 60208 and Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA), USA
#> institution_id institution_display_name
#> 1 https://openalex.org/I185261750 University of Toronto
#> 2 https://openalex.org/I58286723 Lunar and Planetary Institute
#> 3 https://openalex.org/I111979921 Northwestern University
#> institution_ror institution_country_code institution_type
#> 1 https://ror.org/03dbr7087 CA education
#> 2 https://ror.org/01r4eh644 US facility
#> 3 https://ror.org/000e0be47 US education
#> institution_lineage
#> 1 https://openalex.org/I185261750
#> 2 https://openalex.org/I1329765538, https://openalex.org/I58286723
#> 3 https://openalex.org/I111979921
Created on 2024-09-08 with reprex v2.0.2
The question is, do we want to include the following:
oa_fetch_test1$author[[1]]$affiliations
# [[1]]
# [[1]]$raw_affiliation_string
# [1] "Department of Astronomy & Astrophysics, University of Toronto, Canada"
#
# [[1]]$institution_ids
# [[1]]$institution_ids[[1]]
# [1] "https://openalex.org/I185261750"
#
#
#
# [[2]]
# [[2]]$raw_affiliation_string
# [1] "Lunar and Planetary Laboratory, The University of Arizona, USA"
#
# [[2]]$institution_ids
# [[2]]$institution_ids[[1]]
# [1] "https://openalex.org/I58286723"
#
# [[2]]$institution_ids[[2]]
# [1] "https://openalex.org/I138006243"
#
#
#
# [[3]]
# [[3]]$raw_affiliation_string
# [1] "Dept. of Physics and Astronomy, Northwestern University, 2145 Sheridan Rd., Evanston, IL 60208 and Center for Interdisciplinary Exploration and Research in Astrophysics (CIERA), USA"
#
# [[3]]$institution_ids
# [[3]]$institution_ids[[1]]
# [1] "https://openalex.org/I111979921"
oa_fetch_test1$author[[1]]$institutions
#> [[1]]
#> # A tibble: 1 × 6
#> id display_name ror country_code type lineage
#> <chr> <chr> <chr> <chr> <chr> <named list>
#> 1 https://openalex.org/I185261750 University of Toronto https://ror.org/03dbr7087 CA education <list [1]>
#>
#> [[2]]
#> # A tibble: 2 × 6
#> id display_name ror country_code type lineage
#> <chr> <chr> <chr> <chr> <chr> <named list>
#> 1 https://openalex.org/I58286723 Lunar and Planetary Institute https://ror.org/01r4eh644 US facility <list [2]>
#> 2 https://openalex.org/I138006243 University of Arizona https://ror.org/03m2x1q45 US education <list [1]>
#>
#> [[3]]
#> # A tibble: 1 × 6
#> id display_name ror country_code type lineage
#> <chr> <chr> <chr> <chr> <chr> <named list>
#> 1 https://openalex.org/I111979921 Northwestern University https://ror.org/000e0be47 US education <list [1]>
What do we think? @yjunechoe @yhan818 What do we want to keep for backward compatibility? (Again, I think it's good to keep in mind that this change from one institution to several came from OpenAlex, so maybe a breaking change is necessary.)
Also note that there may be a cost in performance to do all this concatenation when we include everything like the
According to the documentation:
|
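If the nested `institutions` column were adopted, users who want a flat table could unnest it themselves. The sketch below assumes a list of per-author data frames shaped like the output above, rebuilt by hand with only `id` and `display_name`; the column name `author_row` is hypothetical.

```r
# Mock of the nested structure shown above: one data frame per author.
institutions <- list(
  data.frame(
    id = "https://openalex.org/I185261750",
    display_name = "University of Toronto",
    stringsAsFactors = FALSE
  ),
  data.frame(
    id = c("https://openalex.org/I58286723", "https://openalex.org/I138006243"),
    display_name = c("Lunar and Planetary Institute", "University of Arizona"),
    stringsAsFactors = FALSE
  ),
  data.frame(
    id = "https://openalex.org/I111979921",
    display_name = "Northwestern University",
    stringsAsFactors = FALSE
  )
)

# One row per (author, institution) pair; `author_row` is the author's position.
long <- do.call(
  rbind,
  Map(function(i, inst) cbind(author_row = i, inst),
      seq_along(institutions), institutions)
)

# The current "first institution only" view can be recovered from the long table.
first_only <- long[!duplicated(long$author_row), ]
```

Here `long` has four rows (the second author contributes two), and `first_only` has one row per author, matching what is exported today.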
OpenAlex changed some outputs quite heavily in 2024. It has a new data model and has added new entities (e.g. grants). In general, maintaining backward compatibility is good practice. For example, it will not break code developed using the current openalexR. Shall we add a new field (e.g. the author's affiliations) and leave the old one untouched? |
@yhan818 I agree generally it's good practice to maintain backward compatibility, but we do have to balance that out with other factors like cost of maintenance, computation, complexity, etc. I have shared this view before. To sum up, as a third-party package, I think it's important we try to mirror how OpenAlex changes. |
To keep up with OpenAlex changes is a moving target, and openalexR will always be running behind. But one could do the following, to offer both:
The problem would be step one, i.e. changing a default value, which would break compatibility, but this could be introduced over a few versions with a deprecation warning. |
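A staged default change with a deprecation warning could look roughly like the sketch below. The function `first_or_all()` and its argument `all_institutions` are hypothetical and not part of openalexR; the input is assumed to be a list of character vectors of institution IDs, one per author.

```r
# Hypothetical helper, not part of openalexR.
# `institution_ids`: assumed list of character vectors, one element per author.
first_or_all <- function(institution_ids, all_institutions = NULL) {
  if (is.null(all_institutions)) {
    # Transitional release: keep the old behaviour but announce the new default.
    warning(
      "The default of `all_institutions` will change from FALSE to TRUE ",
      "in a future version; set it explicitly to silence this warning.",
      call. = FALSE
    )
    all_institutions <- FALSE
  }
  if (all_institutions) {
    # New behaviour: return every institution for each author.
    institution_ids
  } else {
    # Old behaviour: return only the first institution for each author.
    vapply(institution_ids, function(x) x[1], character(1))
  }
}

ids <- list(
  "https://openalex.org/I185261750",
  c("https://openalex.org/I58286723", "https://openalex.org/I138006243")
)
first_or_all(ids, all_institutions = TRUE)
```

Callers who pass the argument explicitly never see the warning, so existing code can be migrated before the default flips.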
Agreed with both of you in principle. Given the ongoing changes in OpenAlex, it is not yet mature, so backward compatibility may not be that important. I am fine with either approach. |
I am conducting institutional-level citation analysis.
There are some cases where an author has multiple affiliations. A parent organization may have multiple child organizations. For example, the University of Arizona (ROR: https://ror.org/03m2x1q45) has multiple units, including the Lunar and Planetary Institute (https://ror.org/01r4eh644).
For certain works, an author has multiple institutions/affiliations associated with the work's metadata in OpenAlex.
oa_fetch_test1 <- oa_fetch(entity = "works", id = "https://openalex.org/W4401226694")
View(oa_fetch_test1[[4]][[1]])
The result has only "2 https://openalex.org/I58286723 Lunar and Planetary Institute https://ror.org/01r4eh644".
The OpenAlex API data has both "Lunar and Planetary Institute" and "University of Arizona".
So is oa_fetch() for "works" missing the additional institutions from OpenAlex's API data?
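For institutional-level analysis, the practical difference shows up when matching authors to a target institution. A minimal sketch, assuming the institution IDs are available as one character vector per author (a mock list built from IDs quoted in this thread, not an actual `oa_fetch()` result):

```r
# Mock institution IDs per author, copied from this thread.
institution_ids <- list(
  "https://openalex.org/I185261750",
  c("https://openalex.org/I58286723", "https://openalex.org/I138006243"),
  "https://openalex.org/I111979921"
)
target <- "https://openalex.org/I138006243"  # University of Arizona

# TRUE when the target appears anywhere among an author's institutions.
in_any <- vapply(institution_ids, function(ids) target %in% ids, logical(1))

# TRUE only when the target is the author's first institution.
in_first <- vapply(
  institution_ids, function(ids) identical(ids[1], target), logical(1)
)
```

With only the first institution kept, the second author would not be counted for the University of Arizona, whereas matching against all IDs finds the affiliation.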