jabref-meta storage in bib file should be improved (by switching to embedded JSON) #10371

Open
koppor opened this issue Sep 11, 2023 · 7 comments
Labels
📍 Assigned Assigned by assign-issue-action (or manually assigned) 📌 Pinned
@koppor
Member

koppor commented Sep 11, 2023

Context

When I saw that diff, I thought something was really wrong:

[Image not preserved: the diff, with two semicolon positions marked 1 and 2]

The semicolon at position 1 indicates that multiple metadata items can be written into a single @Comment. This became clear to me only today (and not back in 2016, see #960). This would be great, as it would minimize the number of @Comment entries. However, the saveActions also use ; as a delimiter (position 2).

The "feature" of not merging the meta fields has been present for a long time. See, e.g., the old issue report #250.

Thus, a straightforward merge is most probably not possible.

Code hint: Separation according to ; is done at org.jabref.logic.importer.util.MetaDataParser#getNextUnit
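To illustrate why the double use of ; blocks a simple merge, here is a minimal sketch. It is written in Python for brevity and is not JabRef's code: `split_units` is a hypothetical, simplified stand-in for the splitting done in `getNextUnit` (split at every unescaped ;), and `databaseType:bibtex;` is only used as an illustrative second metadata item.

```python
def split_units(value: str) -> list[str]:
    """Split at every unescaped ';' (a backslash escapes the next character)."""
    units, current, escaped = [], [], False
    for ch in value:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ";":
            units.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        units.append("".join(current))
    return units


# Two metadata items naively merged into one @Comment body.
save_actions = "saveActions:enabled;date[normalize_date]pages[normalize_page_numbers];"
other_item = "databaseType:bibtex;"  # illustrative second item
units = split_units(save_actions + other_item)
print(units)
```

The result is a flat list of three units. Nothing in it says that the second unit is still part of saveActions while the third starts a new item, so the reader would need per-key knowledge to regroup them.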


Call for new metadata storage

Single JSON in @comment field

Example:

@Comment{jabref-meta-0.1.0
{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}
}

Content:

{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}
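A minimal sketch of how such a comment could be read, assuming the header has the form `jabref-meta-<version>` as in the example. It is Python for brevity and not JabRef's API: `read_json_meta` and the header pattern are hypothetical. The body is delimited by plain brace counting, as a BibTeX parser would do, so a JSON string containing unbalanced braces would not survive this sketch.

```python
import json
import re

# Hypothetical header pattern, derived from the example above.
HEADER = re.compile(r"@comment\s*\{\s*jabref-meta-(\d+\.\d+\.\d+)", re.IGNORECASE)


def read_json_meta(bib_text: str):
    """Return (version, parsed JSON) of the first jabref-meta comment, or None."""
    match = HEADER.search(bib_text)
    if match is None:
        return None
    start = match.end()
    depth = 1  # the opening brace of @Comment{ is already consumed
    for i in range(start, len(bib_text)):
        ch = bib_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # closing brace of the @Comment itself
                return match.group(1), json.loads(bib_text[start:i])
    raise ValueError("unbalanced braces in jabref-meta comment")


SAMPLE = """@Comment{jabref-meta-0.1.0
{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}
}
"""

version, meta = read_json_meta(SAMPLE)
print(version, meta["saveActions"]["date"])
```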

Decision outcome: Use "Single JSON in @comment field"


Migration path:

  • v6.0 can read and write both formats
    • when reading, the new format "wins" (if both exist)
  • v7.0 can read both formats, but writes only the new format
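The migration rules can be sketched as two small functions. This is Python for brevity, with hypothetical names. It assumes that "wins" means the new format replaces the old one wholesale (not merged key by key), and that versions before 6 write only the old format.

```python
def choose_meta(legacy_meta, json_meta):
    """Reading: the new JSON format wins whenever it is present."""
    return json_meta if json_meta is not None else legacy_meta


def formats_to_write(major_version: int) -> list[str]:
    """Writing: which formats a given major version emits."""
    if major_version >= 7:
        return ["json"]            # v7.0: only the new format
    if major_version == 6:
        return ["legacy", "json"]  # v6.0: both formats
    return ["legacy"]              # assumption: older versions know only the old format


print(choose_meta({"source": "legacy"}, {"source": "json"}))
print(formats_to_write(6), formats_to_write(7))
```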

After this is implemented, we can work on #8701


ADR

Single JSON in @comment field

  • Good, because a single @Comment element is enough
  • Good, because a JSON parser can be used directly
  • Good, because we can nest elements in the JSON without the need for a custom format
  • Neutral, because JSON is nested in BibTeX
  • Bad, because syntax highlighting won't work
  • Bad, because the meta format changes
  • Bad, because it looks "hacky"

Multiple JSON

Each preference could have a separate JSON nesting.

  • Bad, because a first lookup would use BibTeX data and a second lookup would use JSON. The preferences should be in a consistent format.

BibTeX

Example (From JabRef#232)

old:

@Comment{jabref-meta: saveActions:enabled;
date[normalize_date]
pages[normalize_page_numbers]
month[normalize_month]
;}

new:

@JabRef{saveActions,
  state = {enabled},
  date = {normalize_date, action2},
  pages = {normalize_page_numbers},
  month = {normalize_month}
}
  • Good, because it feels natural
  • Good, because no additional parsing logic needs to be implemented
  • Good, because we currently have only one level of key/value pairs for the metadata (to be checked)
  • Bad, because even a nested list (e.g., normalize_date, action2) is a custom format
  • Bad, because multiple elements have to be used: one for each metadata key
  • Bad, because it does not allow nesting of properties
  • Bad, because other tools might treat these entries specially
  • Bad, because "old" JabRef versions will treat these entries as "normal" entries

@comment and then nested

JabRef v5.9 (and before) used that format.

  • Good, because arbitrary content can be used
  • Bad, because the parsing logic needs to be written for the content inside

JSON at the end of the file

New entries always start with @. Anything outside the “argument” of a “command” starting
with an @ is considered as a comment. This gives an easy way to comment a given entry: just
remove the initial @. As usual when a language allows comments, don’t hesitate to use them so
that you have a clean, ordered, and easy-to-maintain database. Conversely, anything starting
with an @ is considered as being a new entry

@Article{demo,
   note={just an example article to illustrate the **previous** entry}
}

// jabref-meta-0.1.0
{
  "saveActions" :  {
   "state": true,
   "date": ["normalize_date", "action2"],
   "pages" : ["normalize_page_numbers"],
   "month" : ["normalize_month"]
  }
}
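A minimal sketch of reading this variant, assuming the marker line `// jabref-meta-<version>` is the last thing before the JSON and the JSON runs to the end of the file. It is Python for brevity; `split_trailing_meta` is a hypothetical name, not part of JabRef.

```python
import json

MARKER = "// jabref-meta-"  # marker line taken from the example above


def split_trailing_meta(bib_text: str):
    """Return (entries text, version, parsed JSON); version and JSON are None if absent."""
    pos = bib_text.rfind(MARKER)
    if pos == -1:
        return bib_text, None, None
    header, _, payload = bib_text[pos:].partition("\n")
    version = header[len(MARKER):].strip()
    return bib_text[:pos], version, json.loads(payload)


SAMPLE = """@Article{demo,
   note={just an example article}
}

// jabref-meta-0.1.0
{
  "saveActions" :  {
   "state": true,
   "date": ["normalize_date", "action2"]
  }
}
"""

entries, version, meta = split_trailing_meta(SAMPLE)
print(version, meta["saveActions"]["date"])
```

Because anything outside an @ entry counts as a comment, the trailing JSON is ignored by BibTeX, but an @ inside a JSON string could be misread as the start of an entry.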
@Siedlerchr
Member

BibDesk on macOS stores its groups in Apple's plist XML format:

[Screenshot not preserved]

@koppor
Member Author

koppor commented Sep 20, 2023

@koppor koppor changed the title jabref-meta storage in bib file should be improved jabref-meta storage in bib file should be improved (by switching to embedded JSON) Jul 3, 2024
@ThiloteE ThiloteE added this to the 6.0 milestone Sep 7, 2024
@leaf-soba
Contributor

leaf-soba commented Sep 11, 2024

Sorry, I'm new here and I want to work on this issue. I tried to break it into small steps; please check whether I understand the issue correctly.

  1. Write a unit test whose input is the example in "Single JSON in @comment field".
    • I don't know the exact expected output of the unit test yet, but I'll try to figure it out later.
@Comment{jabref-meta-0.1.0
{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}
}
  2. Update MetaDataParser#getNextUnit to handle the new JSON format from the unit test case.
  3. Write the logic to parse, read, and write the new JSON format.
    • I didn't find the proper place to put this logic. Maybe I should put it in MetaDataSerializer or MetaDataParser?
    • I also haven't found the existing code that reads @Comment yet. Maybe it is in BibtexDatabaseWriter?
  4. Add more corner cases to the unit tests for this change.

@koppor
Member Author

koppor commented Oct 30, 2024

> 1. write a unit test input is the Example in `Single JSON in @comment field`.

Yes

>    * I don't know the expected output exactly in unit test now, but I'll try to figure it out later.

The JSON content itself. Maybe the Gson library is your friend. I had good experiences with it in the HTTP server part.

> 2. Update `MetaDataParser#getNextUnit` to handle the new JSON format in unit test case

The place is `org.jabref.logic.importer.fileformat.BibtexParser#parseJabRefComment`.

> 3. Write logic code to parse, read and write new JSON format.

The whole MetaDataParser can be "deleted" and replaced by new loading from JSON. I think it is JSON -> DTO -> metadata, maybe also directly from JSON to MetaData. ("Deleted" is not quite true, because JabRef should still be able to read "old" files. In version 6, both formats are read and written, with the new format taking precedence; in version 7, the old metadata is not written any more.)

>    * I didn't find the proper place to put these logic code,  maybe I should put them in `MetaDataSerializer`, `MetaDataParser`?

  • Reading: See above.
  • Writing: org.jabref.logic.exporter.BibDatabaseWriter#writeMetaData

>    * And I didn't find the old code to read `@Comment` in this step now, maybe in `BibtexDatabaseWriter`?

See above.

There will be many unit tests for that.
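As a rough illustration of the JSON -> DTO step and of what a round-trip test could check, here is a minimal sketch. It is Python for brevity, not the Gson/Java code that would be used in JabRef. `SaveActionsDto` and both helper functions are hypothetical names, and treating every key other than "state" as a field with a list of formatters is an assumption based on the example JSON.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SaveActionsDto:
    """Hypothetical DTO for the "saveActions" object of the example JSON."""
    state: bool = False
    formatters: dict[str, list[str]] = field(default_factory=dict)


def save_actions_from_json(text: str) -> SaveActionsDto:
    raw = json.loads(text)["saveActions"]
    # Assumption: every key except "state" maps a field name to its formatters.
    formatters = {key: list(value) for key, value in raw.items() if key != "state"}
    return SaveActionsDto(state=bool(raw.get("state", False)), formatters=formatters)


def save_actions_to_json(dto: SaveActionsDto) -> str:
    payload = {"state": dto.state, **dto.formatters}
    return json.dumps({"saveActions": payload}, indent=2)


EXAMPLE = """{
  "saveActions" :
  {
    "state": true,
    "date": ["normalize_date", "action2"],
    "pages" : ["normalize_page_numbers"],
    "month" : ["normalize_month"]
  }
}"""

dto = save_actions_from_json(EXAMPLE)
round_trip = json.loads(save_actions_to_json(dto))
print(dto.state, sorted(dto.formatters))
```

A unit test along these lines would compare the parsed structures rather than the raw strings, so that whitespace differences in the written JSON do not matter.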

@leaf-soba
Contributor

OK, it is clear now. Please assign it to me.

@koppor
Member Author

koppor commented Oct 30, 2024

/assign @leaf-soba

Contributor

👋 Hey @leaf-soba, thank you for your interest in this issue! 🎉

We're excited to have you on board. Start by exploring our Contributing guidelines, and don't forget to check out our workspace setup guidelines to get started smoothly.

In case you encounter failing tests during development, please check our developer FAQs!

Having any questions or issues? Feel free to ask here on GitHub. Need help setting up your local workspace? Join the conversation on JabRef's Gitter chat. And don't hesitate to open a (draft) pull request early on to show the direction it is heading towards. This way, you will receive valuable feedback.

Happy coding! 🚀

⏳ Please note, you will be automatically unassigned if the issue isn't closed within 30 days (by 29 November 2024). A maintainer can also add the "📌 Pinned" label to prevent automatic unassignment.
