Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531

addak · 2022-02-26T17:35:59Z

Changes:-

Add missing setters/extractors for COVERAGE, LANGUAGE, RIGHTS, SOURCE, TYPE wherever possible.
Support Day in Date Field during extraction

Tasks To Complete:-

Add Missing Extractor/ Filler functions
Add Day Support for Date Field
Add some additional test post initial discussion
Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Squash Commits
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Siedlerchr · 2022-02-26T18:20:10Z

src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

-                         .ifPresent(month -> bibEntry.setMonth(month));
+
+            if (Objects.nonNull(calender)) {
+                if (StringUtils.isNotEmpty(dateFormat)) {


Use the method "StringUtil" class from JabRef (note the absence of the "s")

Siedlerchr · 2022-02-26T18:30:32Z

src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

-            String date = dates.get(0).trim();
+
+            String dateString = dates.get(0).trim();
+            var dateFormat = ((DublinCoreSchemaCustom) dcSchema).getDateAttributeFormatQualifier();


I still don't understand why it's necessary to add an additional attribute and modify the existing schema. This will probably break things.
Have you looked at JabRefs' Date.parse() it can already treat date ranges

Yeah it did break stuff in the Adobe Acrobat DC viewer( It couldn't find the seq entry for dc:date even though it was present )

As for Date.parse(), yeah you're right. I'll go with that then.
Edit : As was mentioned by the original method, the reason I wanted the format is due to the attached image ( while the rawValue is indeed 2000-05, the getElementsasString returns the date time value )

One way I can think of to get the rawValue is to override the dcSchema object's getUnqualifiedSequenceValueList ( and since the dcSchema object is already created and not instantiated by us, we can't use an anonymous class instantiation, I assume )

Ah now I see the issue. You need to adjust the Date parser https://www.baeldung.com/java-datetimeformatter

Try to add a pattern like uuuu-MM-dd'T'HH:mm:ssxxx not sure about the xx

I am currently testing this if this could work

Well, I get the heauristic approach part but I would like to know about whether the creation of the custom DcSchema class is fine or not. It uses FieldUtils because there aren't any setters for those fields.
I use this to get the raw value of the xmp date:dc li field, and not the getElementsAsString version ( which, as you said, provides the ISO8601 converted date, hence the defaults). While getElementsAsStrijg provides us the iso8601 date time string, the raw value is as is from the xmp data. Could be iso8601, yyyy-mm, yyyy, yyyy-mm-dd, etc. Since we're handling parsing anyway, we might as well disallow the standard parsing done by dcSchema.
In this case, we wouldn't need to assume/go about a heuristic approach. The only reason we needed that was because during fetching of date value from dcSchema, it would apply a DateConverter.toISO8601 on the rawValue and provide it to us, resulting in defaults for non-available parts of the date.

An Example:-
say we did (refer to XmpUtilWriterTest.testWriteXmp)

bibEntry.setDate(new Date(2000, 5));

when we convert this into xmp , this is how it looks

<?xpacket begin="ï»¿" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about=""> <dc:creator> <rdf:Seq> <rdf:li>Vladimir N. Vapnik</rdf:li> </rdf:Seq> </dc:creator> <dc:relation> <rdf:Bag> <rdf:li>bibtex/citationkey/WriteXMPTest</rdf:li> <rdf:li>bibtex/owner/Ich</rdf:li> </rdf:Bag> </dc:relation> <dc:coverage>coverageField</dc:coverage> <dc:identifier>10.1007/978-1-4757-3264-1</dc:identifier> <dc:language> <rdf:Bag> <rdf:li>English</rdf:li> <rdf:li> Japanese</rdf:li> </rdf:Bag> </dc:language> <dc:publisher> <rdf:Bag> <rdf:li>Springer Science + Business Media</rdf:li> </rdf:Bag> </dc:publisher> <dc:rights> <rdf:Alt> <rdf:li xml:lang="x-default">Right To X</rdf:li> </rdf:Alt> </dc:rights> <dc:source>JabRef</dc:source> <dc:title> <rdf:Alt> <rdf:li xml:lang="x-default">The Nature of Statistical Learning Theory</rdf:li> </rdf:Alt> </dc:title> <dc:date> <rdf:Seq> <rdf:li>2000-05</rdf:li> </rdf:Seq> </dc:date> <dc:format>application/pdf</dc:format> <dc:type> <rdf:Bag> <rdf:li>Book</rdf:li> </rdf:Bag> </dc:type> </rdf:Description> </rdf:RDF> </x:xmpmeta><?xpacket end="w"?>

Now, after parsing the xmp into the dsSchema, the raw value is 2005-05, as shown in the screenshot

The DateType Object ( which dc:date is ) , when a getElementsAsString() call is made, uses this function

@Override public String getStringValue() { return DateConverter.toISO8601(dateValue); }

and that's the value we get usually. After getting the raw value, this is what the parsing looks like

This is the raw output of the xml before any interpretation is happening.

If that is indeed the rawValue that's present as is in the xmp data, then with respect to the above example I've shown, I think we should parse it as is, without making assumptions.

Ah yes, I just checked that as well. So your approach looks good with the custom schema

Ah yes, I just checked that as well. So your approach looks good with the custom schema

Okay then, if everything else is good to go, I'll begin squashing commits, adding the "Support Date conversion from XMP to bibTex" in changelog. I don't think I need to add more tests honestly, since modifications to current Test Cases should cover that. There are no UI changes present. I guess the only thing remaining would be to run the Jabref application

You could add one test for the date thing now. Maybe in the Writer Tests
You also don't need to squash your commits. We do that usually on merge

Oh...I already squashed them...
Anyways here's a screenshot of it working

Sure, I'll add another test for that

Siedlerchr · 2022-02-26T18:31:29Z

src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

@@ -299,7 +351,19 @@ private void fillCreator(String creators) {
     */
    private void fillDate() {
        bibEntry.getFieldOrAlias(StandardField.DATE)
-                .ifPresent(publicationDate -> dcSchema.addUnqualifiedSequenceValue("date", publicationDate));
+                .ifPresent(publicationDate -> {


Using Jabref's inbuilt Date parsing should be sufficient.

jabref/src/main/java/org/jabref/model/entry/Date.java

Line 81 in 1eed985

public static Optional<Date> parse(String dateString) {

…For Day Field.

Siedlerchr · 2022-03-01T19:00:29Z

src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

+                            dateValue.getYear().ifPresent(year -> bibEntry.setField(StandardField.YEAR, Integer.toString(year)));
+                        });
+            } catch (RuntimeException e) {
+                LOGGER.error("Failed To Parse Date\n {}", ExceptionUtils.getStackTrace(e));


Is it really necessary to catch a RuntimeEception here? You have already optionals that will evaluate to empty if they cannot parse a date.
A Runtime Exception is an unchecked exception that should never be caught. An exception should always be as precise as possible.
https://www.baeldung.com/java-exceptions

Second, every logger takes an exception/throwable as second parameter. Stack trace etc are handled automatically

Siedlerchr · 2022-03-01T19:02:20Z

src/main/java/org/jabref/model/schema/DublinCoreSchemaCustom.java

+            FieldUtils.writeField(dublinCoreSchemaCustom, "container", dcSchema.getContainer(), true);
+            FieldUtils.writeField(dublinCoreSchemaCustom, "attributes", FieldUtils.readField(dcSchema, "attributes", true), true);
+        } catch (Exception e) {
+            LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));


StackTrace etc is handled automatically by the logger. Every Logger takes an exception/trowable as last argument

Suggested change

LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));

LOGGER.error("Error making custom DC Schema", e);

Siedlerchr · 2022-03-01T19:05:09Z

In general looks already good, just some code style improvements and it's okay from my side! 👍

Btw, as you are now into the XMP stuff, would you mind taking a look at this issue as well? #7925
But a fix should be a follow up PR then.

ThiloteE · 2022-03-01T19:11:07Z

If you need me for testing, just say when and what i have to do.

…ex for Date Field

addak · 2022-03-03T16:46:21Z

src/main/java/org/jabref/model/schema/DublinCoreSchemaCustom.java

        } catch (Exception e) {
-            LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));
+            LOGGER.error("Error making custom DC Schema. Using the default", e);
+            return dcSchema;


I believe it will never end up here. but just in case, I've left it here

addak · 2022-03-03T16:52:19Z

Please Review!

Btw, as you are now into the XMP stuff, would you mind taking a look at this issue as well? #7925
But a fix should be a follow up PR then.

Sure, I'll look into it once I wrap this one up.

@Siedlerchr I forgot to mention but since dc:Type is mapped to the entityType ( say article), It's already implemented and hence I haven't covered it.

@ThiloteE Please read the Issue description. Verify that those fields don't end up as bibtex/{fieldname}/{value} in XMP and that partial dates don't have some default value when XMP dc:date is converted to bibTex and that day is present if it is present in the XMP.
I've kinda tested it in the UI but I'm not really familiar with the application in terms of functionality. Would be quite helpful if you could confirm it works as intended.

Siedlerchr · 2022-03-03T17:24:01Z

src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

@@ -78,25 +81,17 @@ private void extractAuthor() {
     * property filled with jan will override this behavior and no data is lost. In the cases, where xmp is written by
     * another service, the assumption is, that the 1st January is not a publication date at all.
     */


Just saw this, I guess the javadoc needs to be updated to reflect the new behavior

Ah, you're right, and it looks I've forgotten to change the description. Will change that as part of the last task remaining

@Siedlerchr any idea which document I have to edit?
I know https://github.com/JabRef/user-documentation/tree/main/en contains the docs but I'm not sure which doc to edit. Some docs related to what I'm doing are:-
https://github.com/JabRef/user-documentation/blob/main/en/advanced/xmp.md
https://github.com/JabRef/user-documentation/blob/main/en/advanced/fields.md
and maybe this : https://github.com/JabRef/user-documentation/blob/main/en/collect/findunlinkedfiles.md

But I'm not sure which one I should edit

The xmp one should be the only relevant one. The findunlinked files is a different feature,

https://github.com/JabRef/user-documentation/blob/main/en/advanced/xmp.md

Okay I looked into the doc and there's no warning note, or any in particular, regarding JabRef's previous inability to convert day from XMP to bibtex

Siedlerchr

Thanks again!

CHANGELOG.md

koppor

In general: looks good. I have small nitpick comments 😇

src/main/java/org/jabref/model/entry/Date.java

CHANGELOG.md

src/main/java/org/jabref/model/schema/DublinCoreSchemaCustom.java

src/test/resources/org/jabref/logic/xmp/article_dublinCore.bib

src/main/java/org/jabref/logic/xmp/DublinCoreExtractor.java

Siedlerchr · 2022-03-09T17:46:27Z

Thanks again!

* upstream/main: Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion (#8531) Bump checkstyle from 9.3 to 10.0 (#8544) Fix online link detection in entry editor (#8514) Add some JavaDoc to Fetchers Support two argument form of \abx@aux@cite macro in DefaultAuxParser (#8549) Bump guava from 31.0.1-jre to 31.1-jre (#8543) Bump org.beryx.jlink from 2.24.4 to 2.25.0 (#8548) Bump postgresql from 42.3.2 to 42.3.3 (#8546) Bump richtextfx from 0.10.7 to 0.10.9 (#8547) Bump archunit-junit5-engine from 0.22.0 to 0.23.1 (#8545) Bump actions/checkout from 2 to 3 (#8542) Squashed 'buildres/csl/csl-styles/' changes from eb97405..8f69d4e Bump classgraph from 4.8.139 to 4.8.141 (#8535) Bump archunit-junit5-api from 0.22.0 to 0.23.1 (#8536) # Conflicts: # src/test/java/org/jabref/logic/xmp/XmpUtilReaderTest.java

* upstream/main: Update RemoteSetupTest.java, adding eq() function from mockito (#8561) readd encoding after merge Open office refactor finalization (formerly OObranch J cleanup) (#7795) Revert "Refine documentation (#8551)" (#8560) Refine documentation on logging (#8550) Revert "Refine documentation (#8551)" (#8559) Refine documentation (#8551) New Crowdin updates (#8557) New Crowdin updates (#8553) Fix missing metadata in BibDatabaseContext (#8556) Add encoding detection (and pin export to UTF-8) (#8506) Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion (#8531) Bump checkstyle from 9.3 to 10.0 (#8544) Fix online link detection in entry editor (#8514) Add some JavaDoc to Fetchers Support two argument form of \abx@aux@cite macro in DefaultAuxParser (#8549)

Siedlerchr reviewed Feb 26, 2022

View reviewed changes

Siedlerchr requested a review from btut February 26, 2022 18:22

Siedlerchr reviewed Feb 26, 2022

View reviewed changes

addak marked this pull request as ready for review February 28, 2022 04:53

addak force-pushed the fix/8491 branch from f7f031c to 149ad7f Compare February 28, 2022 05:57

addak changed the title ~~Add Missing Fillers/Extractors for Supported Fields and Supoort Day Conversion~~ Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion Feb 28, 2022

addak force-pushed the fix/8491 branch from 149ad7f to 8944aa9 Compare March 1, 2022 17:35

JabRef#8491 : Add Missing Setters/Fillers. Support XMP(DC) -> BibTex …

6d35ff5

…For Day Field.

Siedlerchr reviewed Mar 1, 2022

View reviewed changes

JabRef#8491 : Implement Suggestions. Refactor. Add Test for XMP->BibT…

2cf44b4

…ex for Date Field

addak force-pushed the fix/8491 branch from 8944aa9 to 2cf44b4 Compare March 3, 2022 16:42

addak commented Mar 3, 2022

View reviewed changes

Siedlerchr reviewed Mar 3, 2022

View reviewed changes

Siedlerchr approved these changes Mar 3, 2022

View reviewed changes

Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Mar 3, 2022

Siedlerchr reviewed Mar 3, 2022

View reviewed changes

CHANGELOG.md Outdated Show resolved Hide resolved

JabRef#8491 : Change changelog message and update method description

7345b32

koppor requested changes Mar 7, 2022

View reviewed changes

JabRef#8491 : Address Requested Changes

ad93e77

addak force-pushed the fix/8491 branch from 9af8533 to ad93e77 Compare March 9, 2022 17:36

Siedlerchr merged commit 7bc6766 into JabRef:main Mar 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531

Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531

addak commented Feb 26, 2022 •

edited

Loading

Siedlerchr Feb 26, 2022 •

edited

Loading

Siedlerchr Feb 26, 2022

addak Feb 26, 2022 •

edited

Loading

Siedlerchr Feb 26, 2022

Siedlerchr Feb 26, 2022

Siedlerchr Feb 26, 2022

addak Feb 28, 2022 •

edited

Loading

Siedlerchr Feb 28, 2022

addak Mar 1, 2022

Siedlerchr Mar 1, 2022

addak Mar 1, 2022 •

edited

Loading

Siedlerchr Feb 26, 2022

Siedlerchr Feb 26, 2022

Siedlerchr Mar 1, 2022

Siedlerchr Mar 1, 2022

Siedlerchr commented Mar 1, 2022

ThiloteE commented Mar 1, 2022

addak Mar 3, 2022

addak commented Mar 3, 2022 •

edited

Loading

Siedlerchr Mar 3, 2022

addak Mar 3, 2022

addak Mar 4, 2022

Siedlerchr Mar 4, 2022

addak Mar 6, 2022

Siedlerchr left a comment

koppor left a comment

Siedlerchr commented Mar 9, 2022

	LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e));
	LOGGER.error("Error making custom DC Schema", e);

Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531

Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531

Conversation

addak commented Feb 26, 2022 • edited Loading

Siedlerchr Feb 26, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

addak Feb 26, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

addak Feb 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

addak Mar 1, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Siedlerchr commented Mar 1, 2022

ThiloteE commented Mar 1, 2022

Choose a reason for hiding this comment

addak commented Mar 3, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Siedlerchr left a comment

Choose a reason for hiding this comment

koppor left a comment

Choose a reason for hiding this comment

Siedlerchr commented Mar 9, 2022

addak commented Feb 26, 2022 •

edited

Loading

Siedlerchr Feb 26, 2022 •

edited

Loading

addak Feb 26, 2022 •

edited

Loading

addak Feb 28, 2022 •

edited

Loading

addak Mar 1, 2022 •

edited

Loading

addak commented Mar 3, 2022 •

edited

Loading