ParsingMetadataMD2JSON

A simple parser that converts a Markdown metadata file (M_README.md), whose identifiers resemble Dublin Core terms, into a simple JSON file for further processing. The JSON output is inspired by the ZENODO.json schema; see also the ZENODO developers guide. Metadata (data about data) are crucial for finding and understanding the data in your project tree. The generated JSON files can be used for further processing, e.g. to build a database catalog of your files or to supply additional metadata to a public repository. Feel free to adapt the parser to your needs.

Getting Started

Follow these instructions to run the application ParsingMetadataMD2JSON.

Prerequisites

Requirements for the software:

  • Python 3 with the standard-library modules sys, pathlib, re, and json
  • Optional: Jupyter for interactivity

Installing

  • clone the repository
    git clone https://github.com/Bondoki/ParsingMetadataMD2JSON

Running

  • run the application with the sample file M_Dataset_README_Example.md
    python3 ParsingMetadataMD2JSON.py M_Dataset_README_Example.md
  • this should generate a new file M_Dataset_README_Example.json and print a success message:
    SUCCESS: M_Dataset_README_Example.md parsed to M_Dataset_README_Example.json
  • alternatively, open the Jupyter notebook ParsingMetadataMD2JSON.ipynb with
    jupyter-lab ParsingMetadataMD2JSON.ipynb
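
Once several JSON files have been generated, they could be gathered into a simple catalog of the project tree. The following is a minimal sketch of that idea, not part of the repository: the helper name build_catalog is hypothetical, and it assumes that the generated files follow the M_*.json naming of the sample and contain valid JSON. It makes no assumption about the key names inside each file.

```python
import json
from pathlib import Path


def build_catalog(root):
    """Hypothetical helper: map every metadata JSON file below root to its content."""
    root = Path(root)
    catalog = {}
    # ASSUMPTION: generated files are named like the sample, i.e. M_*.json.
    for json_path in sorted(root.rglob("M_*.json")):
        with json_path.open(encoding="utf-8") as handle:
            # Key the catalog by the path relative to root, with forward slashes.
            catalog[json_path.relative_to(root).as_posix()] = json.load(handle)
    return catalog
```

Calling build_catalog(".") from the top of a project tree would return one dictionary entry per metadata file, which could then be written to a database or a single index file.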

Metadata fields

The following keywords will be parsed and converted:

| Keyword | Description |
| --- | --- |
| Title | Descriptive name of the paper/project/thesis/dataset. |
| Creator | A consecutive list of the names of those who created the resource and are primarily responsible for it. |
| Creator.ORCID | Additional information: the ORCID identifier of the Creator. |
| Creator.Email | Additional information: the email address of the Creator. |
| Publisher | The department/institute responsible for making the resource available. |
| Contributor | A consecutive list of the names of those who contributed to the resource; secondary to the Creators. |
| Contributor.ORCID | Additional information: the ORCID identifier of the Contributor. |
| Contributor.Email | Additional information: the email address of the Contributor. |
| Description | A textual description of the content of the resource. |
| Subject | Phrases/keywords describing the content of the resource. |
| Date | A date associated with the creation or availability of the resource. Recommended format: YYYY-MM-DD. |
| Language | The language of the resource, preferably as a BCP 47 language tag. |
| Format | The data format, used to identify the software and possibly hardware needed to display or operate the resource. For a list of MIME types see here. |
| Type | The category of the resource, e.g. Collection, Dataset, Event, Image, Experiment, Simulation, Report, Text, Draft. See also the DCMI Type Vocabulary. |
| Coverage | Temporal coverage, typically the period over which the data were acquired. |
| Source | Information about a second resource from which the present resource is derived, if applicable. |
| Relation | The relationship of the source to the present resource, e.g. IsVersionOf, IsReplacedBy, IsPartOf, IsReferencedBy; see the Qualified Dublin Core Terms. |
| Identifier | A unique identifier of the resource, e.g. DOI, ISBN, number. |
| Method | A reference to your (post-)processing tools/methods, e.g. a URL or git hash, as a relation. |
| Rights | A rights management statement for the resource, e.g. the license for publishing and sharing. |
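
To illustrate how such keywords could be turned into a JSON structure, here is a minimal sketch. It is not the repository's implementation: the helper parse_metadata is hypothetical, and it assumes one "Keyword: value" pair per line, which may differ from the actual layout of an M_README.md file. The output key names ("name", "orcid", "email") are likewise chosen for illustration only. Repeated Creator or Contributor lines build a list, and a following .ORCID or .Email line is attached to the most recent person.

```python
import json
import re

# Keywords taken from the table above.
PERSON_KEYS = {"Creator", "Contributor"}
SIMPLE_KEYS = {
    "Title", "Publisher", "Description", "Subject", "Date", "Language",
    "Format", "Type", "Coverage", "Source", "Relation", "Identifier",
    "Method", "Rights",
}

# ASSUMPTION: one "Keyword: value" or "Keyword.Sub: value" pair per line.
LINE_RE = re.compile(
    r"^\s*(?P<key>[A-Za-z]+)(?:\.(?P<sub>[A-Za-z]+))?\s*:\s*(?P<value>.+?)\s*$"
)


def parse_metadata(text):
    """Hypothetical helper: turn 'Keyword: value' lines into a dictionary."""
    metadata = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headings, blank lines, and free text
        key, sub, value = match.group("key", "sub", "value")
        if key in PERSON_KEYS:
            people = metadata.setdefault(key, [])
            if sub is None:
                people.append({"name": value})  # start a new person entry
            elif sub in ("ORCID", "Email") and people:
                people[-1][sub.lower()] = value  # attach to the latest person
        elif key in SIMPLE_KEYS and sub is None:
            metadata[key] = value
    return metadata


# Made-up sample input, for illustration only.
sample = """Title: Example Dataset
Creator: Jane Doe
Creator.ORCID: 0000-0000-0000-0000
Date: 2024-01-31
"""
print(json.dumps(parse_metadata(sample), indent=2))
```

Lines that do not match a known keyword are ignored in this sketch; a stricter variant could report them instead.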

Authors

License

This project is licensed under the Unlicense.
