Bug: BibTeX + EndNote XML citation output for dataset with Permalink #10769

Open
vera opened this issue Aug 12, 2024 · 6 comments · May be fixed by #10790
Labels
Type: Bug a defect

Comments

@vera
Contributor

vera commented Aug 12, 2024

What steps does it take to reproduce the issue?

  1. Create dataset with permalink PID (in my example, the permalink of my dataset is https://clinicaltrials.gov/study/NCT00080262)
  2. Open dataset page and click "Cite Dataset" > "BibTeX" or "EndNote XML"

Two problems:

  1. I'm seeing malformed BibTeX output in line 1 (the entry key is missing the leading http) and in the doi line (extra slash after http).

    In the EndNote XML output, there is also an extra slash in <electronic-resource-num>.

    I briefly checked the code (BibTeX, EndNote XML) and I'm not sure why this happens.

    The RIS citation is fine.

  2. In the BibTeX output, the permalink should not be given as doi, since it's not a DOI.

BibTeX:

@data{s://clinicaltrials.gov/study/NCT00080262_2024,
author = {$AUTHORS},
publisher = {Root},
title = {{$TITLE}},
year = {2024},
version = {V1},
doi = {http/s://clinicaltrials.gov/study/NCT00080262},
url = {http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262}
}

EndNote XML:

<?xml version='1.0' encoding='UTF-8'?><xml><records><record><ref-type name="Dataset">59</ref-type><contributors><authors>...</authors></contributors><titles><title>...</title></titles><section>...</section><dates><year>...</year></dates><edition>...</edition><publisher>...</publisher><urls><related-urls><url>http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262</url></related-urls></urls><electronic-resource-num>perma/http/s://clinicaltrials.gov/study/NCT00080262</electronic-resource-num></record></records></xml>

Which version of Dataverse are you using?

6.2

Any related open or closed issues to this bug report?

not aware

Screenshots:

-

Are you thinking about creating a pull request for this issue?

yes, would be interested

@qqmyers
Member

qqmyers commented Aug 12, 2024

It looks like the citation code is assuming a / as a separator rather than using PidProvider specific code to create the entries. The specific issue of the / being 4 characters in is from using an unmanaged permalink. Because permalinks don't require a separator, there is no reliable way to tell the authority from the shoulder, so the code picks the first four chars as the authority.
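
The effect described above can be reproduced with a minimal sketch. This is not the actual Dataverse parsing code: the class name, the method names, and the fixed four-character cut are assumptions made for illustration, based only on the explanation in this comment and the output in the report.

```java
// Hypothetical illustration; not the actual Dataverse parsing code.
final class PermalinkSplitSketch {

    // Assumption: with no separator to rely on, the first four characters
    // are taken as the authority and the remainder as the identifier.
    static final int ASSUMED_AUTHORITY_LENGTH = 4;

    static String[] split(String pid) {
        int cut = Math.min(ASSUMED_AUTHORITY_LENGTH, pid.length());
        return new String[] { pid.substring(0, cut), pid.substring(cut) };
    }

    // Mirrors a citation writer that always joins authority and identifier with "/".
    static String joinWithSlash(String[] parts) {
        return parts[0] + "/" + parts[1];
    }

    public static void main(String[] args) {
        String[] parts = split("https://clinicaltrials.gov/study/NCT00080262");
        System.out.println("authority:  " + parts[0]);
        System.out.println("identifier: " + parts[1]);
        System.out.println("joined:     " + joinWithSlash(parts));
    }
}
```

Under these assumptions the identifier part starts with s://, which matches the BibTeX entry key in the report, and the joined form matches the reported doi value.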

@johannes-darms
Copy link
Contributor

@qqmyers Should we update the code to use the PIDProvider specific properties and create a PR?

@qqmyers
Member

qqmyers commented Aug 12, 2024

I haven't looked at the code to be certain, but I think that makes sense. The GlobalId class has methods to get whatever form or part of a PID you want, so I think at this point, there shouldn't be core code outside that class hardcoding the protocol name or trying to parse/generate a PID for display.

@vera
Contributor Author

vera commented Aug 12, 2024

> It looks like the citation code is assuming a / as a separator rather than using PidProvider specific code to create the entries. The specific issue of the / being 4 characters in is from using an unmanaged permalink. Because permalinks don't require a separator, there is no reliable way to tell the authority from the shoulder, so the code picks the first four chars as the authority.

I see, that makes sense.

For completeness, here's what the export looks like with a managed Permalink:

BibTeX:

@data{NCT00080262_2024,
author = {$AUTHORS},
publisher = {Root},
title = {{$TITLE}},
year = {2024},
version = {V1},
doi = {https://clinicaltrials.gov/study//NCT00080262},
url = {http://localhost:8080/citation?persistentId=perma:https://clinicaltrials.gov/study/NCT00080262}
}

-> Line 1 seems fine, but the doi property has an extra slash in a different position (before the unique part of the Permalink)

EndNote XML:

<electronic-resource-num>perma/https://clinicaltrials.gov/study//NCT00080262</electronic-resource-num>

-> same issue (extra slash before the unique part of the Permalink)

RIS citation is still fine.
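
The double slash in the managed case is consistent with an authority that already ends in / being joined with a hardcoded /. The sketch below contrasts that with a join whose separator is supplied by the caller. The class and method names are hypothetical, and the authority/identifier split shown is inferred from the output above, not confirmed against the Dataverse code.

```java
// Hypothetical illustration; names and the authority/identifier split are assumptions.
final class SeparatorJoinSketch {

    // Behaviour consistent with the export above: "/" is always inserted.
    static String joinHardcoded(String authority, String identifier) {
        return authority + "/" + identifier;
    }

    // Alternative: the separator is passed in (for example from the PID
    // provider's configuration) and may be empty.
    static String joinWithSeparator(String authority, String separator, String identifier) {
        return authority + separator + identifier;
    }

    public static void main(String[] args) {
        String authority = "https://clinicaltrials.gov/study/";
        String identifier = "NCT00080262";
        System.out.println(joinHardcoded(authority, identifier));
        System.out.println(joinWithSeparator(authority, "", identifier));
    }
}
```

With an empty separator the second call yields the permalink as originally configured, without the extra slash.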

@qqmyers
Member

qqmyers commented Aug 12, 2024

Cool. I see

out.write("doi = {");
out.write(persistentId.getAuthority());
out.write("/");
out.write(persistentId.getIdentifier());
which is where the hardcoded doi and / come from. I'm not sure what BibTeX allows for non-DOIs - looks like url is an option according to https://www.bibtex.com/g/bibtex-format/.
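
One possible shape for choosing the field by PID type is sketched below. It is only an illustration: idField and its parameters are hypothetical and not part of the Dataverse API, the DOI value is made up, and a real fix would still need to reconcile this with the url entry the export already writes.

```java
// Hypothetical sketch; idField and its parameters are not Dataverse API.
final class BibtexIdFieldSketch {

    // Picks the BibTeX field for a PID based on its protocol.
    // rawId is the bare identifier, resolverUrl a resolvable URL for the PID.
    static String idField(String protocol, String rawId, String resolverUrl) {
        if ("doi".equalsIgnoreCase(protocol)) {
            return "doi = {" + rawId + "},";
        }
        // Handles and permalinks are not DOIs, so emit a URL instead.
        return "url = {" + resolverUrl + "},";
    }

    public static void main(String[] args) {
        // Made-up DOI, for illustration only.
        System.out.println(idField("doi", "10.5072/FK2/ABCDEF",
                "https://doi.org/10.5072/FK2/ABCDEF"));
        // Permalink from this report.
        System.out.println(idField("perma", "https://clinicaltrials.gov/study/NCT00080262",
                "https://clinicaltrials.gov/study/NCT00080262"));
    }
}
```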

@pdurbin
Member

pdurbin commented Aug 12, 2024

Yeah, I agree, "url" sounds like a good option when "doi" isn't available.

I just checked a dataset that uses Handles ( https://data.cimmyt.org/dataset.xhtml?persistentId=hdl:11529/10016 ) and the BibTeX output includes a false DOI like this:

doi = {11529/10016},

So yeah, it would probably be good to do something here to not assume DOIs.
