Achieving compatibility between openalexR and openalexPro #288

Closed
rkrug opened this issue Oct 13, 2024 · 4 comments
Labels
question Further information is requested

Comments

@rkrug

rkrug commented Oct 13, 2024

Hi

I am working on openalexPro (https://github.com/rkrug/openalexPro) and finally have it working. It downloads the json files (using adaptations of api_request() and oa_request()), converts them to parquet format, converts the abstracts from the inverted index and, in addition, creates a citation for each work (first author, followed by "et al.", the second author, or nothing, then (publication_year)). All of this is done completely in duckdb. It works very nicely, and fields that are missing in a work are populated with the correct structure if they are present in at least one work. Finally, the result is read into R as a tibble.
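For readers unfamiliar with the inverted-index step mentioned above, here is a minimal sketch in plain R. It is not the openalexPro implementation (which does this in duckdb); the helper name `abstract_from_inverted_index` and the example data are made up for illustration. It assumes the index is a named list that maps each word to the 0-based positions at which it occurs.

```r
# Hypothetical helper (not part of openalexR or openalexPro): rebuild plain
# text from an inverted index, i.e. a named list mapping word -> positions.
abstract_from_inverted_index <- function(inv_index) {
  if (length(inv_index) == 0) {
    return(NA_character_)
  }
  # Repeat each word once per position, then order the words by position.
  words <- rep(names(inv_index), lengths(inv_index))
  positions <- unlist(inv_index, use.names = FALSE)
  paste(words[order(positions)], collapse = " ")
}

# Made-up example index; "works" occurs at positions 1 and 4.
example_index <- list(
  The = 0L,
  works = c(1L, 4L),
  cite = 2L,
  other = 3L
)
abstract_text <- abstract_from_inverted_index(example_index)
print(abstract_text)
```

A work without an abstract (an empty or NULL index) yields `NA_character_` here, so the column keeps a consistent type.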

Now I am looking at compatibility with openalexR oa_fetch(output = "tibble"). I have the following questions (and I think it is easier to ask than to dig into the code, as the conversion is quite complex):

  1. How are the works ordered? Any specific sorting, or simply a by-product of the code?
  2. How are the columns sorted? It seems, for example, that the column doi is moved. Is there a specific sort order?
  3. Are there any columns you are dropping, renaming, or processing?

My idea is to use that "compatibility mode" (if possible) for the openalexPro system of packages, so that these (graphing, analysis, etc.) can also be used from openalexR.

I would also welcome comments on the download procedure for the json files, but this is not that important. My aim is, again, to keep compatibility with the input format of openalexR.

Any feedback welcome,

Rainer

@rkrug
Author

rkrug commented Oct 14, 2024

OK - I found

col_order <- c(
concerning column order (2). Can I assume that these are all the columns returned?

Concerning (3): are these all the columns returned, with the others dropped?
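As a rough illustration of what a "compatibility mode" for column order could look like, here is a small base R sketch. The vector `reference_order` below is a made-up placeholder, not the actual `col_order` from openalexR, and `align_columns` is a hypothetical helper. It puts the known columns first in the reference order and keeps any other columns at the end instead of dropping them, so it is easy to see what is not covered.

```r
# Placeholder reference order; NOT the real col_order from openalexR.
reference_order <- c("id", "title", "doi", "publication_year")

# Hypothetical helper: reorder columns to follow a reference order.
# Columns missing from the reference are appended rather than dropped.
align_columns <- function(df, reference_order) {
  known <- intersect(reference_order, names(df))
  extra <- setdiff(names(df), reference_order)
  df[, c(known, extra), drop = FALSE]
}

# Made-up one-row example with columns in a different order.
works <- data.frame(
  doi = "https://doi.org/10.0000/example",
  extra_field = 1L,
  id = "W0000000000",
  title = "An example work",
  stringsAsFactors = FALSE
)
aligned <- align_columns(works, reference_order)
print(names(aligned))
```

Whether unlisted columns should be appended or dropped depends on the answer to (3); appending is simply the safer assumption for a sketch.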

@trangdata
Collaborator

Hi Rainer, I assume you mean oa_fetch(output = "tibble")?

1 - The works are not ordered. The user gets them in whichever order OpenAlex returns them.
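Since no row order is guaranteed, comparing the output of two tools is easier after putting both into a canonical order first. A minimal base R sketch, assuming each result has a unique `id` column; the helper name `sort_by_id` and the data are invented for illustration:

```r
# Hypothetical helper: sort a result by its `id` column and reset row names,
# so that two results can be compared independently of download order.
sort_by_id <- function(df) {
  df <- df[order(df$id), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Made-up results holding the same works in different orders.
from_a <- data.frame(
  id = c("W3", "W1", "W2"),
  year = c(2021L, 2019L, 2020L),
  stringsAsFactors = FALSE
)
from_b <- data.frame(
  id = c("W1", "W2", "W3"),
  year = c(2019L, 2020L, 2021L),
  stringsAsFactors = FALSE
)

same_content <- isTRUE(all.equal(sort_by_id(from_a), sort_by_id(from_b)))
print(same_content)
```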

2, 3 - The package was originally written to accommodate bibliometric analyses, so some of the columns were renamed. We're still working on tracking the coverage in #211 (works and authors done — TODO other entities. Maybe the files changed there will give you a better idea).

trangdata added the question label Oct 14, 2024
@rkrug
Author

rkrug commented Oct 14, 2024

Thanks - that looks great.

I can definitely work with that.

One point: in cases such as biblio, where values are extracted from a field, it would be great if the source field could be specified, e.g. "biblio.volume, volume" and also "biblio, NA", to indicate where each value comes from and that biblio itself is removed.

That would make it clearer to understand and also make it possible to rename sub fields.
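To make the suggestion concrete, such a mapping could be kept as a small table and applied mechanically. The sketch below is only an illustration in base R: `field_map`, `apply_field_map`, and the flattened example data are hypothetical, and the `biblio.issue` row is added just to show a second sub-field. An `NA` in the `output` column marks a source field that is removed.

```r
# Hypothetical mapping table: source field -> output column.
# NA in `output` means the source field itself is dropped.
field_map <- data.frame(
  source = c("biblio.volume", "biblio.issue", "biblio"),
  output = c("volume", "issue", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop and rename columns according to the mapping.
apply_field_map <- function(df, field_map) {
  present <- field_map[field_map$source %in% names(df), , drop = FALSE]
  to_drop <- present$source[is.na(present$output)]
  to_rename <- present[!is.na(present$output), , drop = FALSE]
  df <- df[, setdiff(names(df), to_drop), drop = FALSE]
  idx <- match(to_rename$source, names(df))
  names(df)[idx] <- to_rename$output
  df
}

# Made-up, already flattened example record.
flat <- data.frame(
  id = "W0000000000",
  biblio.volume = "12",
  biblio.issue = "3",
  biblio = NA,
  stringsAsFactors = FALSE
)
mapped <- apply_field_map(flat, field_map)
print(names(mapped))
```

Keeping the mapping as data rather than code would also make it possible to rename sub-fields by editing one row.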

Thanks again,

Rainer

@rkrug
Author

rkrug commented Oct 14, 2024

Yes - tibble. Edited the question.


2 participants