Achieving compatibility between openalexR and openalexPro #288

rkrug · 2024-10-13T15:38:57Z

Hi

I am working on openalexPro (https://github.com/rkrug/openalexPro) and finally have it working. It downloads the JSON files (an adaptation of api_request() and oa_request()), converts them to Parquet format, converts the abstracts from the inverted index and, in addition, creates a citation for each work (first author, then "et al.", the second author, or nothing, followed by the publication year). This is done completely in DuckDB, which works very nicely and also populates missing fields with the correct structure, provided the field is present in at least one work. The result is then read into R as a tibble.
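For readers unfamiliar with the inverted-index step mentioned above: OpenAlex delivers abstracts in the `abstract_inverted_index` field as a mapping from each word to the positions where it occurs. The following is a minimal, language-neutral sketch of that conversion in Python. It is not the openalexPro implementation (which, as described, runs in DuckDB); the function name and the example data are made up for illustration.

```python
def abstract_from_inverted_index(inverted_index):
    """Rebuild plain text from a {word: [positions]} mapping.

    Hypothetical helper for illustration; returns None when the
    work has no abstract (missing or empty index).
    """
    if not inverted_index:
        return None
    # Flatten to (position, word) pairs, then order by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


# Invented example data in the shape of `abstract_inverted_index`.
example = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
abstract = abstract_from_inverted_index(example)
print(abstract)  # the cat chased the mouse
```

A word that occurs several times simply appears once per listed position, so the sort restores the original word order.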

Now I am looking at compatibility with openalexR's oa_fetch(output = "tibble"). I have the following questions (I think it is easier to ask than to dig into the code, as the conversion is quite complex):

1. How are the works ordered? Is there any specific sorting, or is the order simply a by-product of the code?
2. How are the columns sorted? It seems, for example, that the column doi is moved. Is there a specific sort order?
3. Are there any columns you are dropping, renaming, or processing?

My idea is to use that "compatibility mode" (if possible) for the openalexPro system of packages, so that these (graphing, analysis, etc.) can also be used from openalexR.

I would also welcome comments on the download procedure for the JSON files, but this is not that important. My aim is, again, to keep compatibility with the input format of openalexR.

Any feedback welcome,

Rainer


rkrug · 2024-10-14T08:21:56Z

OK - I found

openalexR/R/oa2df.R

Line 139 in bb8321d

col_order <- c(

concerning column order (2). Can I assume that these are all the columns returned?

Concerning (3): are these all the columns returned, with the others dropped?

trangdata · 2024-10-14T16:01:32Z

Hi Rainer, I assume you mean oa_fetch(output = "tibble")?

1 - The works are not ordered. The user gets them in whichever order OpenAlex returns them.

2, 3 - The package was originally written to accommodate bibliometric analyses, so some of the columns were renamed. We're still working on tracking the coverage in #211 (works and authors are done; other entities are still TODO). Maybe the files changed there will give you a better idea.

rkrug · 2024-10-14T18:51:07Z

Thanks - that looks great.

I can definitely work with that.

One point: in cases such as biblio, where values are extracted from a nested field, it would be great if the source field could be specified, e.g. "biblio.volume, volume" and also "biblio, NA", to indicate where the value is coming from and that biblio itself is removed.

That would make it clearer to understand and also make it possible to rename sub fields.
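To make the suggestion concrete, here is a minimal sketch of how such a "source field, output column" table could be applied to one work record. It is written in Python purely as an illustration and is not how openalexR handles its columns; `FIELD_MAP`, `get_path`, `apply_field_map`, and the example record are all hypothetical, with `None` standing in for the proposed "NA".

```python
# Each entry: (source path in the OpenAlex record, output column).
# None as output means "this field is removed" (the "biblio, NA" case).
FIELD_MAP = [
    ("biblio.volume", "volume"),
    ("biblio", None),
]


def get_path(record, dotted):
    """Follow a dotted path such as 'biblio.volume'; None if absent."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def apply_field_map(record, field_map):
    """Return a copy with mapped sub-fields extracted and parents dropped."""
    out = dict(record)
    # First extract every renamed value from the original record ...
    for source, target in field_map:
        if target is not None:
            out[target] = get_path(record, source)
    # ... then drop the (top-level) fields marked as removed.
    for source, target in field_map:
        if target is None:
            out.pop(source, None)
    return out


work = {"id": "W1", "biblio": {"volume": "12", "issue": "3"}}
flat = apply_field_map(work, FIELD_MAP)
print(flat)  # {'id': 'W1', 'volume': '12'}
```

With a table like this, the origin of every output column is explicit, and renaming a sub-field only means changing one entry.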

Thanks again,

Rainer

rkrug · 2024-10-14T18:52:23Z

Yes - tibble. Edited the question.

trangdata added the "question" label (further information is requested) on Oct 14, 2024

trangdata closed this as completed Oct 15, 2024
