-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion #8531
Conversation
.ifPresent(month -> bibEntry.setMonth(month)); | ||
|
||
if (Objects.nonNull(calender)) { | ||
if (StringUtils.isNotEmpty(dateFormat)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the method "StringUtil" class from JabRef (note the absence of the "s")
String date = dates.get(0).trim(); | ||
|
||
String dateString = dates.get(0).trim(); | ||
var dateFormat = ((DublinCoreSchemaCustom) dcSchema).getDateAttributeFormatQualifier(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why it's necessary to add an additional attribute and modify the existing schema. This will probably break things.
Have you looked at JabRefs' Date.parse() it can already treat date ranges
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it did break stuff in the Adobe Acrobat DC viewer( It couldn't find the seq entry for dc:date even though it was present )
As for Date.parse(), yeah you're right. I'll go with that then.
Edit : As was mentioned by the original method, the reason I wanted the format is due to the attached image ( while the rawValue is indeed 2000-05, the getElementsasString returns the date time value )
One way I can think of to get the rawValue is to override the dcSchema object's getUnqualifiedSequenceValueList ( and since the dcSchema object is already created and not instantiated by us, we can't use an anonymous class instantiation, I assume )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah now I see the issue. You need to adjust the Date parser https://www.baeldung.com/java-datetimeformatter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to add a pattern like uuuu-MM-dd'T'HH:mm:ssxxx
not sure about the xx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am currently testing this if this could work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I get the heauristic approach part but I would like to know about whether the creation of the custom DcSchema class is fine or not. It uses FieldUtils because there aren't any setters for those fields.
I use this to get the raw value of the xmp date:dc li field, and not the getElementsAsString version ( which, as you said, provides the ISO8601 converted date, hence the defaults). While getElementsAsStrijg provides us the iso8601 date time string, the raw value is as is from the xmp data. Could be iso8601, yyyy-mm, yyyy, yyyy-mm-dd, etc. Since we're handling parsing anyway, we might as well disallow the standard parsing done by dcSchema.
In this case, we wouldn't need to assume/go about a heuristic approach. The only reason we needed that was because during fetching of date value from dcSchema, it would apply a DateConverter.toISO8601 on the rawValue and provide it to us, resulting in defaults for non-available parts of the date.
An Example:-
say we did (refer to XmpUtilWriterTest.testWriteXmp)
bibEntry.setDate(new Date(2000, 5));
when we convert this into xmp , this is how it looks
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:creator>
<rdf:Seq>
<rdf:li>Vladimir N. Vapnik</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:relation>
<rdf:Bag>
<rdf:li>bibtex/citationkey/WriteXMPTest</rdf:li>
<rdf:li>bibtex/owner/Ich</rdf:li>
</rdf:Bag>
</dc:relation>
<dc:coverage>coverageField</dc:coverage>
<dc:identifier>10.1007/978-1-4757-3264-1</dc:identifier>
<dc:language>
<rdf:Bag>
<rdf:li>English</rdf:li>
<rdf:li> Japanese</rdf:li>
</rdf:Bag>
</dc:language>
<dc:publisher>
<rdf:Bag>
<rdf:li>Springer Science + Business Media</rdf:li>
</rdf:Bag>
</dc:publisher>
<dc:rights>
<rdf:Alt>
<rdf:li xml:lang="x-default">Right To X</rdf:li>
</rdf:Alt>
</dc:rights>
<dc:source>JabRef</dc:source>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">The Nature of Statistical Learning Theory</rdf:li>
</rdf:Alt>
</dc:title>
<dc:date>
<rdf:Seq>
<rdf:li>2000-05</rdf:li>
</rdf:Seq>
</dc:date>
<dc:format>application/pdf</dc:format>
<dc:type>
<rdf:Bag>
<rdf:li>Book</rdf:li>
</rdf:Bag>
</dc:type>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta><?xpacket end="w"?>
Now, after parsing the xmp into the dsSchema, the raw value is 2005-05, as shown in the screenshot
The DateType Object ( which dc:date is ) , when a getElementsAsString() call is made, uses this function
@Override
public String getStringValue()
{
return DateConverter.toISO8601(dateValue);
}
and that's the value we get usually. After getting the raw value, this is what the parsing looks like
This is the raw output of the xml before any interpretation is happening.
If that is indeed the rawValue that's present as is in the xmp data, then with respect to the above example I've shown, I think we should parse it as is, without making assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I just checked that as well. So your approach looks good with the custom schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I just checked that as well. So your approach looks good with the custom schema
Okay then, if everything else is good to go, I'll begin squashing commits, adding the "Support Date conversion from XMP to bibTex" in changelog. I don't think I need to add more tests honestly, since modifications to current Test Cases should cover that. There are no UI changes present. I guess the only thing remaining would be to run the Jabref application
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add one test for the date thing now. Maybe in the Writer Tests
You also don't need to squash your commits. We do that usually on merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -299,7 +351,19 @@ private void fillCreator(String creators) { | |||
*/ | |||
private void fillDate() { | |||
bibEntry.getFieldOrAlias(StandardField.DATE) | |||
.ifPresent(publicationDate -> dcSchema.addUnqualifiedSequenceValue("date", publicationDate)); | |||
.ifPresent(publicationDate -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Jabref's inbuilt Date parsing should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static Optional<Date> parse(String dateString) { |
dateValue.getYear().ifPresent(year -> bibEntry.setField(StandardField.YEAR, Integer.toString(year))); | ||
}); | ||
} catch (RuntimeException e) { | ||
LOGGER.error("Failed To Parse Date\n {}", ExceptionUtils.getStackTrace(e)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really necessary to catch a RuntimeEception here? You have already optionals that will evaluate to empty if they cannot parse a date.
A Runtime Exception is an unchecked exception that should never be caught. An exception should always be as precise as possible.
https://www.baeldung.com/java-exceptions
Second, every logger takes an exception/throwable as second parameter. Stack trace etc are handled automatically
FieldUtils.writeField(dublinCoreSchemaCustom, "container", dcSchema.getContainer(), true); | ||
FieldUtils.writeField(dublinCoreSchemaCustom, "attributes", FieldUtils.readField(dcSchema, "attributes", true), true); | ||
} catch (Exception e) { | ||
LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StackTrace etc is handled automatically by the logger. Every Logger takes an exception/trowable as last argument
LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e)); | |
LOGGER.error("Error making custom DC Schema", e); |
In general looks already good, just some code style improvements and it's okay from my side! 👍 Btw, as you are now into the XMP stuff, would you mind taking a look at this issue as well? #7925 |
If you need me for testing, just say when and what i have to do. |
…ex for Date Field
} catch (Exception e) { | ||
LOGGER.error("Error making custom DC Schema\n {}", ExceptionUtils.getStackTrace(e)); | ||
LOGGER.error("Error making custom DC Schema. Using the default", e); | ||
return dcSchema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it will never end up here. but just in case, I've left it here
Please Review!
Sure, I'll look into it once I wrap this one up. @Siedlerchr I forgot to mention but since dc:Type is mapped to the entityType ( say article), It's already implemented and hence I haven't covered it. @ThiloteE Please read the Issue description. Verify that those fields don't end up as bibtex/{fieldname}/{value} in XMP and that partial dates don't have some default value when XMP dc:date is converted to bibTex and that day is present if it is present in the XMP. |
@@ -78,25 +81,17 @@ private void extractAuthor() { | |||
* property filled with jan will override this behavior and no data is lost. In the cases, where xmp is written by | |||
* another service, the assumption is, that the 1st January is not a publication date at all. | |||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw this, I guess the javadoc needs to be updated to reflect the new behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right, and it looks I've forgotten to change the description. Will change that as part of the last task remaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Siedlerchr any idea which document I have to edit?
I know https://github.com/JabRef/user-documentation/tree/main/en contains the docs but I'm not sure which doc to edit. Some docs related to what I'm doing are:-
https://github.com/JabRef/user-documentation/blob/main/en/advanced/xmp.md
https://github.com/JabRef/user-documentation/blob/main/en/advanced/fields.md
and maybe this : https://github.com/JabRef/user-documentation/blob/main/en/collect/findunlinkedfiles.md
But I'm not sure which one I should edit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The xmp one should be the only relevant one. The findunlinked files is a different feature,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/JabRef/user-documentation/blob/main/en/advanced/xmp.md
Okay I looked into the doc and there's no warning note, or any in particular, regarding JabRef's previous inability to convert day from XMP to bibtex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general: looks good. I have small nitpick comments 😇
src/main/java/org/jabref/model/schema/DublinCoreSchemaCustom.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jabref/model/schema/DublinCoreSchemaCustom.java
Outdated
Show resolved
Hide resolved
src/main/java/org/jabref/model/schema/DublinCoreSchemaCustom.java
Outdated
Show resolved
Hide resolved
Thanks again! |
* upstream/main: Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion (#8531) Bump checkstyle from 9.3 to 10.0 (#8544) Fix online link detection in entry editor (#8514) Add some JavaDoc to Fetchers Support two argument form of \abx@aux@cite macro in DefaultAuxParser (#8549) Bump guava from 31.0.1-jre to 31.1-jre (#8543) Bump org.beryx.jlink from 2.24.4 to 2.25.0 (#8548) Bump postgresql from 42.3.2 to 42.3.3 (#8546) Bump richtextfx from 0.10.7 to 0.10.9 (#8547) Bump archunit-junit5-engine from 0.22.0 to 0.23.1 (#8545) Bump actions/checkout from 2 to 3 (#8542) Squashed 'buildres/csl/csl-styles/' changes from eb97405..8f69d4e Bump classgraph from 4.8.139 to 4.8.141 (#8535) Bump archunit-junit5-api from 0.22.0 to 0.23.1 (#8536) # Conflicts: # src/test/java/org/jabref/logic/xmp/XmpUtilReaderTest.java
* upstream/main: Update RemoteSetupTest.java, adding eq() function from mockito (#8561) readd encoding after merge Open office refactor finalization (formerly OObranch J cleanup) (#7795) Revert "Refine documentation (#8551)" (#8560) Refine documentation on logging (#8550) Revert "Refine documentation (#8551)" (#8559) Refine documentation (#8551) New Crowdin updates (#8557) New Crowdin updates (#8553) Fix missing metadata in BibDatabaseContext (#8556) Add encoding detection (and pin export to UTF-8) (#8506) Add Missing Fillers/Extractors for Supported Fields and Support Day Conversion (#8531) Bump checkstyle from 9.3 to 10.0 (#8544) Fix online link detection in entry editor (#8514) Add some JavaDoc to Fetchers Support two argument form of \abx@aux@cite macro in DefaultAuxParser (#8549)
Changes:-
Fixes #8491
Tasks To Complete:-
CHANGELOG.md
described in a way that is understandable for the average user (if applicable)