-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added option to import CFF files #7946
Conversation
Thanks for tackling this issue. For the Authors take a look at Jabrefs Author List and Authors parser. Maybe you can use that. |
For the java classes from the yaml, you can generate them from https://www.jsonschema2pojo.org/ |
@AidanM11 What is the status here? It would be really cool if you could continue working on this |
Sorry about the delay, had a busy couple weeks. I've added tests and made some minor changes to the code. As of now, the importer can parse and map most of the fields CFF supports. The main exception is the person/entity system from CFF. In CFF, the authors and contact fields are lists of Person or Entity objects, which each have their own extensive lists of fields for all kinds of information. It seems like it would be very difficult to fit that system into the way jabref handles fields, so instead I set it up so that the importer just extracts the names of each object in the 'authors' list and concatenates them into an author string. In this case, the import operation seems unavoidably lossy. Similarly, CFF supports a list of 'identifiers' of different types (url, swh, other). JabRef does have support for these fields, but the CFF specs appear allow the user to add as many of these identifiers as they like (I'm not entirely sure about this, but it seems like it). So a single entry could have multiple SWH identifiers, which seems tricky to work into the JabRef system. However, there are also some 'static' fields in CFF that don't have JabRef counterparts (specifically: commit, license-url, repository-code, repository-artifact). You mentioned earlier that if CFF fields don't map to JabRef standard fields, I could "import/add them with a prefix". I wasn't sure what that meant, and was unable to find any examples of it in the codebase, so I was wondering if I could get a little more clarification on that. So, overall, the importer is nearly finished, I just need to work out what to do with identifiers (maybe just use the first one, or do a comma-separated list?) and also what to do about the 'static' fields without mappings. |
hm.put("abstract", StandardField.ABSTRACT); | ||
hm.put("message", StandardField.COMMENT); | ||
hm.put("date-released", StandardField.DATE); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For fields that can't be mapped, you can simply create a new UnkownField: (example is taken from the MSBibImporter)
And set CFF_PREFIX to sth ilke cff-
new UnknownField(CFF_PREFIX + "medium"), "Medium");
JabRef can show non-standard fields in the entry editor as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a better solution than prefixing those fields, since none of the citation styles would pick them up - but you might want to have the repository printed in the reference list. I've started a discussion at the biblatex repo how to handle those fields: plk/biblatex#1169
String aAlias = auth.vals.get("alias"); | ||
|
||
if (aName != null) { | ||
authors.add(aName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JabRef has a class called Author which will handle the construction of authors, see org.jabref.model.entry
public Author(String first, String firstabbr, String von, String last, String jr)
Afterwards, you put in an AuthorLIst and return the and separated value
See for example the Medra-Fetcher (org.jabref.logic.importer.fetcher)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from String firstabbr
, CFF has the same fields for person
type objects (e.g., authors). The main difference between these and the entity
type objects (for this use case at least) is that entities only have a single field name
instead, which I guess could be imported as is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first
->given-names
von
->name-particle
last
->family-names
jr
->name-suffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that seems to be correct, for reference:
jabref/src/main/java/org/jabref/model/entry/Author.java
Lines 22 to 31 in cef4151
/** | |
* Creates the Author object. If any part of the name is absent, <CODE>null</CODE> must be passed; otherwise other methods may return erroneous results. | |
* | |
* @param first the first name of the author (may consist of several tokens, like "Charles Louis Xavier Joseph" in "Charles Louis Xavier Joseph de la Vall{\'e}e Poussin") | |
* @param firstabbr the abbreviated first name of the author (may consist of several tokens, like "C. L. X. J." in "Charles Louis Xavier Joseph de la Vall{\'e}e Poussin"). It is a responsibility of the caller to create a reasonable abbreviation of the first name. | |
* @param von the von part of the author's name (may consist of several tokens, like "de la" in "Charles Louis Xavier Joseph de la Vall{\'e}e Poussin") | |
* @param last the last name of the author (may consist of several tokens, like "Vall{\'e}e Poussin" in "Charles Louis Xavier Joseph de la Vall{\'e}e Poussin") | |
* @param jr the junior part of the author's name (may consist of several tokens, like "Jr. III" in "Smith, Jr. III, John") | |
*/ | |
public Author(String first, String firstabbr, String von, String last, String jr) { |
Hi 👋, CFF co-lead here. It's really great to see this work happening for JabRef!
FYI: as of version |
Yes, people can supply several identifiers for a software (which reflects reality).
Happy to help with such a heuristic which I think should ideally be based on the software citation principles as much as possible.
In terms of what the software citation principles suggest, |
Alright, so I've changed the author handling to incorporate the JabRef Author system, as @Siedlerchr suggested, and it seems to be working well. For DOI:
For SWHID
As you can see, when there are multiple identifiers of the same type, the current system just discards them. This doesn't seem ideal, but as the CFF file does not contain any other information that could differentiate those identifiers, it seems like the only alternative is to choose a random one, and that seems like a bad option. I guess it could also put them all into the field in a comma-separated list, but that seems like it goes against the intended use of those fields. Anyway, besides that, I've still got a small list of things to do before this pull request is finished (updated tests, handling fields without mappings, dataset type support, etc) so I'll keep working on it. Just wanted to make sure this identifier handling is satisfactory. |
I think this sounds good for the heuristic, DOI is always a good/valid option. Provides the easiest way to get any addtional metadata if needed. |
Thanks @AidanM11 for doing this work! I like your heuristics. Perhaps one thing to consider is that it's relatively common to have two DOIs in |
src/test/java/org/jabref/logic/importer/fileformat/CffImporterTest.java
Outdated
Show resolved
Hide resolved
Thanks for the work, can you please merge in the latest upstream main branch of JabRef and resolve the conflicts? |
src/main/java/org/jabref/logic/importer/fileformat/CffImporter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jabref/logic/importer/fileformat/CffImporter.java
Outdated
Show resolved
Hide resolved
List<String> relSwhIds = swhIds.stream() | ||
.filter(id -> id.split(":").length > 3) // quick filter for invalid swhids | ||
.filter(id -> id.split(":")[2].equals("rel")) | ||
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here you can also use findFirst again as above
src/main/java/org/jabref/logic/importer/fileformat/CffImporter.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jabref/logic/importer/fileformat/CffImporter.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me and thanks for the tests! I have just some minor improvements regarding the code style.
@Siedlerchr I've made the requested changes. Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, lgtm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed some minor style issues. Otherwise LGTM, thanks for making JabRef better.
As soon as tests are green, we're merging.
* upstream/main: Enable copy&paste for IntelliJ run configuration GitBook: [main] one page modified GitBook: [main] one page modified GitBook: [main] 19 pages and 6 assets modified Refine documentation Oobranch f : add frontend (#7791) Observable Preferences C (General) (#8047) Welcome btut 🎉 Observable preferences B (ProtectedTerms, EntryEditor and MrDlib) (#8046) Refine howto (#8049) Decision on handling of localized preferences. Added option to import CFF files (#7946) Observable preferences A (Appearance and Importer) (#8041) Squashed 'buildres/csl/csl-locales/' changes from ec6d62a9e7..7a507fc008 Squashed 'buildres/csl/csl-styles/' changes from ec4a4c0..2b37392
Added ability to import CFF files as requested in #7945. As per CFF specifications, there is only one entry per CFF file, and CFF entries are always of the type "Software". JabRef does not support all the fields that CFF does, so a fair amount of information could be lost on import (specifically details about the authors beyond their names). I still need to test this more thoroughly and add automated tests, so I've marked this as a draft. I also plan on adding export support.
fixes #7945
Images:
CHANGELOG.md
described in a way that is understandable for the average user (if applicable)