diff --git a/docs/DMP1.bak.odt b/DMPDocs/DMP1.bak.odt similarity index 100% rename from docs/DMP1.bak.odt rename to DMPDocs/DMP1.bak.odt diff --git a/docs/DMP1.odt b/DMPDocs/DMP1.odt similarity index 100% rename from docs/DMP1.odt rename to DMPDocs/DMP1.odt diff --git a/docs/DataManagmentplan.docx b/DMPDocs/DataManagmentplan.docx similarity index 100% rename from docs/DataManagmentplan.docx rename to DMPDocs/DataManagmentplan.docx diff --git a/DMPDocs/bmbf-en.xml b/DMPDocs/bmbf-en.xml new file mode 100644 index 0000000..f22c9de --- /dev/null +++ b/DMPDocs/bmbf-en.xml @@ -0,0 +1,310 @@ +
+
+

Data Management Plan (Beta test)

+

Project name: $_PROJECT

+

Funding body: Bundesministerium für Bildung und Forschung (Federal Ministry of Education and Research, BMBF)

+

Funding programme:

+

Funding reference number (FKZ): $_DMPVERSION

+ +

Project coordinator: $_USERNAME

+

Data management contact person: $_DATAOFFICER

+ +

Contact: $_EMAIL

+

Project description: + + +

+ The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data will be used not only to understand principles but also to inform users about the provenance of the analyzed data. Stakeholders must also be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the next section. + +

+ +

+ + The $_PROJECT will collect and/or generate the following types of raw data: $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEoMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA data, which are related to $_STUDYOBJECT. In addition, the raw data will be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis steps. #if$_DATAPLANT These pipelines will be + tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise + of the DataPLANT consortium#endif$_DATAPLANT. + +

+

+

+

Date of creation:

+

Date of last modification:

+

Applicable requirements:

+ +

#if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. #endif$_EU To best profit from open data, it is necessary not only to store the data but also to make them Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; however, we also consider the need to protect individual data sets. #endif$_PROTECT + +

+

#if$_DATAPLANT By implementing DataPLANT, researchers can ensure that all relevant guidelines and requirements related to data management are met, which leads to higher quality and reliability of the research data.#endif$_DATAPLANT +

+ + + +

Data collection

+ +

Public data will be extracted as described in the previous paragraph. For the $_PROJECT, specific data sets will be generated by the consortium partners.

+ + +

+ Data of different types or representing different domains will be generated using unique approaches. For example: +

+ + + #if$_PREVIOUSPROJECTS +

Data from previous projects such as $_PREVIOUSPROJECTS will be considered.

+ #endif$_PREVIOUSPROJECTS + +

We expect to generate $_RAWDATA GB of raw data and up to $_DERIVEDDATA GB of processed data.

+

+ +

+ + +

Data storage:

+ +

+ + #if$_DATAPLANT In DataPLANT, data storage relies on the Annotated Research Context (ARC). ARCs are password-protected, so before any data can be obtained or samples generated, users need to + authenticate. #endif$_DATAPLANT + +

+ + +

+ + Online platforms will be protected by vulnerability scanning, two-factor authentication and daily automatic backups allowing immediate recovery. All partners holding confidential project data will use secure platforms with automatic backups and secure offsite copies. + #if$_DATAPLANT Once DataHUB and ARCs have been set up in DataPLANT, data security will be enforced. This comprises secure storage, and passwords and usernames are generally transferred via separate secure media.#endif$_DATAPLANT + +

+ +

+

The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public repositories. Subsequent costs are then borne by the operators of these repositories.

+

+ + Additionally, costs for storage after publication are incurred by the end-point repositories (e.g. ENA); they are covered by the operating budgets of these repositories and are not charged to the $_PROJECT or its members. + +

+ + + +

+ +

Files will be named according to the following standard:

+

+ + Data variables will be assigned standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers. + +

+ +

+

Data documentation

+

+ We will use the Investigation, Study, Assay (ISA) specification for metadata creation. #if$_RNASEQ|$_GENOMIC For specific data (e.g., RNASeq or genomic data), we use metadata templates from the end-point repositories. #if$_MINSEQE The Minimum Information about a high-throughput Nucleotide SEQuencing Experiment (MINSEQE) will also be used. #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC + + The following metadata/minimum information standards will be used to collect metadata: + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput Nucleotide SEQuencing Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction Experiment),#endif$_MIMIX + #endif$_PROTEOMIC + #if$_METABOLOMIC #if$_METABOLIGHTS MetaboLights-compliant submission standards will be used for metabolomic data where this is accepted by the 
consortium partners.#issuewarning Some metabolomics partners do not consider MetaboLights an accepted standard.#endissuewarning #endif$_METABOLIGHTS #endif$_METABOLOMIC As part of the plant research community, we use #if$_MIAPPE MIAPPE for phenotyping data in the broadest sense, but we will also rely on #endif$_MIAPPE specific SOPs for additional + annotations #if$_DATAPLANT that consider advanced DataPLANT annotation and ontologies. #endif$_DATAPLANT + + +

+

+ Some metadata may still be missing from the documentation provided by the experimental scientists and the data officer. #if$_DATAPLANT In that case, raw data identifiers and parsers provided by DataPLANT will be + used to extract metadata directly from the raw data files. The metadata collected from the raw data files can also be used to validate previously collected metadata and correct any mistakes. + #endif$_DATAPLANT We foresee using #if$_RNASEQ|$_GENOMIC e.g.#if$_MINSEQE MINSEQE for sequencing data and#endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-compatible forms for metabolites as + well as MIAPPE for phenotyping-like data. The latter will allow the integration of data across projects and ensure that established and tested protocols are reused. Additionally, we will enrich the data sets with ontology + terms from free and open ontologies. New ontology terms may also be created and canonized during the $_PROJECT.

+ + +

Legal and ethical aspects

+

+

+ + At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since these are plant data, no ethics committee is needed; however, due diligence regarding plant resource + benefit sharing will be considered. #issuewarning Check this point and enter any due diligence measures here (see the Nagoya Protocol). It is currently still open whether digital sequence information will also fall under the Nagoya Protocol. + In any case, if you use material that does not originate from your (partner) country and characterize it physically or biochemically (e.g., metabolites, proteome, RNASeq), this may be a Nagoya-relevant action unless + the material comes from, e.g., the US (not a party) or Ireland (not signed; still contact them); other laws may nevertheless apply. #endissuewarning + +

+

+ + The only personal data that will potentially be stored are the submitters' names and affiliations in the data set metadata. In addition, personal data will be collected for dissemination and communication + activities using specific methods and procedures developed by the $_PROJECT partners to comply with data protection rules. #issuewarning You need to inform people, and preferably obtain WRITTEN consent, when you store e-mail addresses and + names or even pseudonyms such as Twitter handles. We are very sorry about these issues; we did not invent them. #endissuewarning + +

+ + +

+ +

Data Sharing

+

+

+ + If data are shared only within the consortium (e.g., because they are not yet final or are undergoing IP checks), they are hosted internally, and a username and password are required for access (see also our GDPR rules). + Once data are made public in final EU or US repositories, fully anonymous access is normally allowed. This is also the case for ENA, and it is in line with GDPR requirements. + +

+

+ There will be no restrictions once the data is made public. +

+

+ + The data will initially benefit the $_PROJECT partners, but will also be made available to selected stakeholders closely involved in the project, and then to the scientific community working on $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also use the data after publication. The data will be disseminated according to the $_PROJECT's dissemination and communication plan#if$_DATAPLANT, which aligns with the DataPLANT platform or other means#endif$_DATAPLANT. + +

+

+

Data preservation

+

+ +

+ We expect to generate about $_RAWDATA GB of raw data. The derived data will amount to about $_DERIVEDDATA GB. +

+ + +

+ #if$_DATAPLANT As the $_PROJECT is closely aligned with DataPLANT, the ARC converter and DataHUB will be used to identify the end-point repositories and upload the data to them automatically. #endif$_DATAPLANT + +

+

+ + Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, data that can be stored in + international discipline-specific repositories using specialized technologies will be deposited there: +

+ #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC

+ + +

+ #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC

+ +

+ #if$_TRANSCRIPTOMIC For transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC

+ +

+ #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE

+ +

+ #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC

+

+ #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC

+ + #if$_OTHEREP In addition, $_OTHEREP will be used to store data, and the data will be processed there as well.#endif$_OTHEREP + +

+

+ + Submission is free of charge, and the goal of the repositories (at least of ENA) is to obtain as much data as possible. Therefore, special arrangements are neither necessary nor useful. Catch-all repositories are not required. + #if$_DATAPLANT For DataPLANT, this has been agreed upon. #endif$_DATAPLANT #issuewarning If no data management platform such as DataPLANT is used, you need to find an appropriate repository to store or archive your data after publication. #endissuewarning + +

+ +

+
+
\ No newline at end of file diff --git a/docs/dmp1.html b/DMPDocs/dmp1.html similarity index 100% rename from docs/dmp1.html rename to DMPDocs/dmp1.html diff --git a/docs/dmp1.html.bak b/DMPDocs/dmp1.html.bak similarity index 100% rename from docs/dmp1.html.bak rename to DMPDocs/dmp1.html.bak diff --git a/docs/guide.html b/DMPDocs/guide.html similarity index 100% rename from docs/guide.html rename to DMPDocs/guide.html diff --git a/docs/guide.odt b/DMPDocs/guide.odt similarity index 100% rename from docs/guide.odt rename to DMPDocs/guide.odt diff --git a/docs/proposal.html b/DMPDocs/proposal.html similarity index 100% rename from docs/proposal.html rename to DMPDocs/proposal.html diff --git a/docs/proposal.odt b/DMPDocs/proposal.odt similarity index 100% rename from docs/proposal.odt rename to DMPDocs/proposal.odt diff --git a/docs/proposalh2020.html b/DMPDocs/proposalh2020.html similarity index 100% rename from docs/proposalh2020.html rename to DMPDocs/proposalh2020.html diff --git a/docs/test.html b/DMPDocs/test.html similarity index 100% rename from docs/test.html rename to DMPDocs/test.html diff --git a/css/bootstrap.min.css.map b/css/bootstrap.min.css2106.map similarity index 100% rename from css/bootstrap.min.css.map rename to css/bootstrap.min.css2106.map diff --git a/css/bootstrap.min.css b/css/bootstrap.min2106.css similarity index 100% rename from css/bootstrap.min.css rename to css/bootstrap.min2106.css diff --git a/css/bs5-intro-tour.css b/css/bs5-intro-tour2106.css similarity index 97% rename from css/bs5-intro-tour.css rename to css/bs5-intro-tour2106.css index 4c19e22..01d66cc 100644 --- a/css/bs5-intro-tour.css +++ b/css/bs5-intro-tour2106.css @@ -14,8 +14,8 @@ body.tour-active-element { .popover-tour { padding: 5px 10px; - min-width: 200px; - width: 33vw; + min-width: 400px; + max-height: 80%; max-width: 50%; font-size: 14px; z-index: 1051; diff --git a/css/bs5-intro-tour.css.map b/css/bs5-intro-tour2106.css.map similarity index 100% rename from 
css/bs5-intro-tour.css.map rename to css/bs5-intro-tour2106.css.map diff --git a/css/custom.css b/css/custom2106.css similarity index 94% rename from css/custom.css rename to css/custom2106.css index 18621b9..c5fcef3 100644 --- a/css/custom.css +++ b/css/custom2106.css @@ -49,7 +49,7 @@ .subtitle{padding-top:0pt;color:#666666;font-size:15pt;padding-bottom:16pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2; text-align: justify; text-align-last: left;} p{margin:0;font-size:12pt;font-family:"Arial"; white-space: normal} - h1{padding-top:20pt;font-size:20pt;padding-bottom:6pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2; text-align: justify; text-align-last: left;} + h2{padding-top:18pt;font-size:16pt;padding-bottom:6pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2; text-align: justify; text-align-last: left;} h3{padding-top:8pt;color:#434343;font-size:14pt;padding-bottom:4pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2; text-align-last: left;} h4{padding-top:8pt;color:#666666;font-size:12pt;padding-bottom:4pt;font-family:"Arial";line-height:1.15;page-break-after:avoid;orphans:2;widows:2; text-align: justify; text-align-last: left;} diff --git a/index.html b/index.html index 5c4433a..3ad0669 100644 --- a/index.html +++ b/index.html @@ -1,5756 +1,7189 @@ - - - - - - - - - DataPLAN, generate a data management plan DMP easily - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + - -
-
-
- - Logo - - -
- - - - - - - - - -
-
-
- -
-
-
-
- -
- - -
-
- - - - - - - - -
-
- - - - - - - - - -
- - - -
-
-
-
-
-
-
-
- -

1 Basic Information:

- -
-

1.1 What is the project name or acronym?

- - - - -
- - -
- - - - - - - -
- -
- - - - -
-
-

- Who is most likely to benefit from the data? - - -

- - - - -
-
-

1.3 Other - - - DMP Metadata

- - - - -
-
- - - - -
- -
- - - - -
- - -
- - - - -
- -

1.4 Please select from the following options

-
-
- - -
-
- - - -
-
- - -
- - - - -
- - -
- - - -
- - - -
- - -
-
- - - - -
- -

Where will you submit your data as endpoints?

-
- - -
- -
- - -
-
- - -
- -
- - -
+ } + //warning_text_location.scrollIntoView(); + } - -
- - -
+ /** + * Edit checked/unchecked placeholders in the file. + * @param {string} right_range - the extended keyword (including #if and #endif) that needs to be replaced. + * + */ -
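The JSDoc above says that checked/unchecked placeholders are replaced together with their `#if`/`#endif` markers. The templates in this diff use `#if$_KEY ... #endif$_KEY`, negated `#if!$_KEY ... #endif!$_KEY`, OR conditions such as `#if$_RNASEQ|$_GENOMIC`, and nested blocks. The following is a minimal, string-based sketch of how such blocks could be resolved. It assumes the selected options are available as a `Set` of names without the `$_` prefix. `resolveConditionals` and `conditionHolds` are hypothetical names and are not taken from `index.html`, which appears to work on DOM selection ranges instead.

```javascript
// Hypothetical sketch (not the index.html implementation): resolve template
// blocks such as "#if$_DATAPLANT ... #endif$_DATAPLANT",
// "#if!$_DATAPLANT ... #endif!$_DATAPLANT" and
// "#if$_RNASEQ|$_GENOMIC ... #endif$_RNASEQ|$_GENOMIC".

// True if any option in "$_A|$_B" is selected; inverted for "#if!".
function conditionHolds(negated, condition, selected) {
  const anySelected = condition
    .split("|")
    .some((token) => selected.has(token.replace(/^\$_/, "")));
  return negated ? !anySelected : anySelected;
}

// `selected` is assumed to be a Set of option names without the "$_" prefix.
function resolveConditionals(text, selected) {
  // Groups: optional "!", the condition, and the body up to the matching
  // "#endif" with the same "!" and condition. The lookahead stops
  // "#endif$_RNASEQ" from matching the start of "#endif$_RNASEQ|$_GENOMIC".
  const block = /#if(!?)(\$_[\w$|-]+)([\s\S]*?)#endif\1\2(?![\w$|-])/g;
  return text.replace(block, (_match, bang, condition, body) =>
    conditionHolds(bang === "!", condition, selected)
      ? resolveConditionals(body, selected) // keep the body, resolve nested blocks
      : "" // drop the whole block, markers included
  );
}

// Usage
const sentence =
  "data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION" +
  "#if$_VVISUALIZATION, integration and visualization#endif$_VVISUALIZATION are necessary";
console.log(resolveConditionals(sentence, new Set(["VVISUALIZATION"])));
// → "data collection, integration and visualization are necessary"
```

Because the closing marker must repeat the exact condition, an outer `#if$_RNASEQ|$_GENOMIC` block is matched as a whole, including any inner `#if$_MINSEQE` block. The inner block is then resolved by the recursive call.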
- - -
- - -
- - -
-
- - - -
- - - - -
- - - - -
- - - - -
- - - - -
- - - - - -
- - -
-

2. What kind of data will you handle?

-
- - -
-
- - -
-
- - -
-
- - -
-
- - -
- -
- - - -
- -
- - -
- -
- - -
-
- - -
-
- - -
-
- - -
-
- - -
- - - -
- - - - - -
-
-
-
-
-
-
- -

3. How much data are you likely to generate?

- -
- -
- - GB - - -
-
- -
- -
- - GB - - -
-
- -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-

4. Are any of the following standards relevant to your project?

- -
- - -
-
- - -
-
- - -
- - -

4.1 Will you adhere to any high level metadata submission standards?

-
- - -
-
- - -
- -

4.2 When will you make your data public?

- -
- - -
- - -
- - -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- -

5. Do you intend to use data visualization in your project?

-
- - - - - -
-
- - - - - - - -
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- -
-
-
The project aim should be part of a sentence.
- -
-
-

Example 1: aims at creating a computational model of carbon and water flow within a whole plant architecture

-
-

Example 2: aims at generating data management plan with minimal effort and making the data as open as possible

-
-
- -
-
-
The project object is the target of the study.
- -
-
-

Example 1: carbon and water flow in plants

-
-

Example 2: data management plan

-
-
-
-
-
Here is the space for an additional sentence.
- -
-
-

Example 1: Industry, politicians and students can also use the data for different purposes.

-
-

Example 2: The data acquired in the project can be used by a wide range of people with different purpose.

-
-
- - -
-
-
- -
-
-

Information in this section is used only in the DMP metadata, not in the document.

- -
-
-
-
-
- -
-
-

Data officers are also known as data stewards or curators.

- -
-
- -
-
-
- -
-
-

software that legally remains the property of the organization, group, or individual who created it. -

- -
-
- - -
-
-

User-defined template

- - -
- -
-

- You can click the dotted box to start editing.
- Click the grey buttons to reuse templates.
- Click submit when you have finished. -
- - - - - - -

-
-
-
+ // Remove all existing selection ranges first (very important; it took a long time to find this function). + const selection = window.getSelection(); + selection.removeAllRanges(); + // all objects whose first class name is the option name + const class_name = Object.keys(options)[0]; + //verbose console.log(class_name); + const checked = options[class_name]["checked"]; + const unchecked = options[class_name]["unchecked"]; - - -
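The fragment above looks up the `checked` and `unchecked` texts for an option. The plain placeholders in the templates, such as `$_PROJECT`, `$_RAWDATA` or `$_CLONED-DNA`, must also be filled with the user's answers. The following is a minimal sketch of that step. It assumes that conditional blocks have already been resolved and that the answers come as a plain object keyed by placeholder name. `fillPlaceholders` is a hypothetical helper, not the code in `index.html`. Unknown placeholders are left untouched so that missing answers stay visible in the output.

```javascript
// Hypothetical sketch: replace "$_NAME" placeholders with answers.
// Assumes conditional "#if$_..." blocks were already resolved, so every
// remaining "$_NAME" is a plain placeholder.
function fillPlaceholders(text, values) {
  // A name starts with a letter; a hyphen is only accepted before an
  // upper-case/digit segment, so "$_CLONED-DNA" is one placeholder while
  // "$_PROJECT-specific" is "$_PROJECT" followed by "-specific".
  const placeholder = /\$_([A-Za-z][A-Za-z0-9_]*(?:-[A-Z0-9]+)*)/g;
  return text.replace(placeholder, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match // leave unknown placeholders visible
  );
}

// Usage
console.log(
  fillPlaceholders("We expect $_RAWDATA GB of raw data for $_PROJECT.", {
    RAWDATA: 500,
    PROJECT: "the example project",
  })
);
// → "We expect 500 GB of raw data for the example project."
```

The replacement uses a callback rather than a replacement string, so a `$` inside a user's answer is inserted literally and is not treated as a special replacement pattern.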
-
-
-
-
-
-
- - -
Data Management Plan of the H2020 Project $_PROJECT
- -
- - - -
- - - - - - - - - - - - - - - - - - - - - - - - -

Action Number:

$_PROJECT

Action Acronym:

$_PROJECT

Action Title:

$_PROJECT

Date:

DMP version:

$_DMPVERSION

- -
-
- -
-
- - -

1    Introduction

-

#if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. #endif$_EU To best profit from open data, it is necessary not only to store the data but also to make them Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; however, we also consider the need to protect individual data sets. #endif$_PROTECT - -

-

- - The aim of this document is to provide guidelines on the principles of data management in the $_PROJECT and to specify which types of data will be stored. This will be achieved by using the responses to the EU questionnaire on Data Management Plans (DMPs) as the DMP - document. - -

-

- - The detailed DMP states how data will be handled during and after the project. The $_PROJECT DMP is prepared according to the Horizon 2020 and Horizon Europe online manual. #if$_UPDATE It will be updated, or its validity checked, several times during the - $_PROJECT project. At the very least, this will happen at month $_UPDATEMONTH. #endif$_UPDATE - -

-

2    Data Management Plan EU Template

- -

2.1    Data Summary

-

What is the purpose of the data collection/generation and its relation to the objectives of the project?

-

- - - The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data will be used not only to understand principles but also to inform users about the provenance of the analyzed data. Stakeholders must also be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and well annotated with metadata using open standards, as laid out in the next section. -

What types and formats of data will the project generate/collect?

- -

-

- - The $_PROJECT will collect and/or generate the following types of raw data: $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEoMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA data, which are related to $_STUDYOBJECT. In addition, the raw data will be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis steps. #if$_DATAPLANT These pipelines will be - tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise - of the DataPLANT consortium#endif$_DATAPLANT. - -

-

-

Will you re-use any existing data and how?

-

- - The project builds on existing data sets and relies on them. #if$_RNASEQ For example, without a proper genomic reference it is very difficult to analyze next-generation sequencing (NGS) data sets.#endif$_RNASEQ It is also important to include existing data sets on the expression and metabolic behavior of the $_STUDYOBJECT, as well as existing background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. - Genomic references can be gathered from reference databases for genomes and sequences, such as those of the US National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data - Bank of Japan (DDBJ). Furthermore, prior 'unstructured' data in the form of publications and the data contained therein will be used for decision making. - -

- - -

What is the origin of the data?

-

Public data will be extracted as described in the previous paragraph. For the $_PROJECT, specific data sets will be generated by the consortium partners.

- - -

- Data of different types or representing different domains will be generated using unique approaches. For example: -

- - - #if$_PREVIOUSPROJECTS -

Data from previous projects such as $_PREVIOUSPROJECTS will be considered.

- #endif$_PREVIOUSPROJECTS -

What is the expected size of the data?

-

We expect to generate $_RAWDATA GB of raw data and up to $_DERIVEDDATA GB of processed data.

-

-

To whom might it be useful ('data utility')?

-

- - The data will initially benefit the $_PROJECT partners, but will also be made available to selected stakeholders closely involved in the project, and then to the scientific community working on $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also use the data after publication. The data will be disseminated according to the $_PROJECT's dissemination and communication plan#if$_DATAPLANT, which aligns with the DataPLANT platform or other means#endif$_DATAPLANT. - -

-

- - - -

-

2.2    FAIR data

-

Making data findable, including provisions for metadata

-

- - Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital - Object Identifiers)? - -

-

- - All datasets will be associated with unique identifiers and annotated with metadata.#if$_MIAPPE The $_PROJECT will rely on community standards plus additional recommendations applicable in plant science, such as the Minimum Information About a Plant Phenotyping Experiment (MIAPPE). #endif$_MIAPPE - Unlike cross-domain minimal sets such as Dublin Core, which mostly define the submitter and the general type of data, plant-specific standards enable reuse by other researchers because they define properties of the plant (see the preceding section). However, minimal cross-domain annotations also remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). - #endif$_DATAPLANT #if$_OTHERSTANDARDS $_OTHERSTANDARDINPUT #endif$_OTHERSTANDARDS - -

-

What naming conventions do you follow?

-

- - Data variables will be assigned standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers. - -

-

Will search keywords be provided that optimize possibilities for re-use?

-

- - Keywords about the experiment and consortium will be included, as well as an abstract about the data, where useful. In addition, certain keywords can be auto-generated from dense metadata and its - underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with standardized DataPLANT ontologies where existing ontologies do not yet include such terms. - #endif$_DATAPLANT - -

-

Do you provide clear version numbers?

-

- - To maintain data integrity and facilitate reanalysis, data sets will be assigned version numbers where this is useful (raw data, by contrast, must not be changed; they are considered - immutable and will not receive version numbers). #if$_DATAPLANT This is automatically supported by the Git-based DataPLANT ARC infrastructure. #endif$_DATAPLANT - -
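The rule that raw data stay unchanged and unversioned can be checked mechanically. The following is a minimal Node.js sketch. It assumes a manifest that maps raw-data file paths to SHA-256 checksums recorded when the files were first stored. The manifest format and the function names are hypothetical. The Git-based ARC infrastructure mentioned above may already provide its own integrity checks.

```javascript
// Hypothetical sketch: verify that raw data files are still byte-identical to
// the versions recorded at ingestion time. Node.js built-ins only.
const crypto = require("crypto");
const fs = require("fs");

// SHA-256 of a file as a hex string. Reads the whole file into memory,
// which is fine for a sketch; large raw files would be streamed instead.
function sha256OfFile(filePath) {
  return crypto.createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// Build a manifest { path: checksum } for a list of raw data files.
function buildManifest(filePaths) {
  const manifest = {};
  for (const filePath of filePaths) {
    manifest[filePath] = sha256OfFile(filePath);
  }
  return manifest;
}

// Return the paths whose current checksum no longer matches the manifest.
function findModifiedFiles(manifest) {
  return Object.entries(manifest)
    .filter(([filePath, expected]) => sha256OfFile(filePath) !== expected)
    .map(([filePath]) => filePath);
}
```

In use, `buildManifest` is called once when the raw data are stored, and the result is saved next to the data. `findModifiedFiles` can then be run at any later point. An empty result means that no raw file has changed.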

-

- What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how. -

-

- We will use the Investigation, Study, Assay (ISA) specification for metadata creation. #if$_RNASEQ|$_GENOMIC For specific data (e.g., RNASeq or genomic data), we use metadata templates from the end-point repositories. #if$_MINSEQE The Minimum Information about a high-throughput Nucleotide SEQuencing Experiment (MINSEQE) will also be used. - - #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC #if$_METABOLOMIC MetaboLights-compliant submission standards will be used for metabolomic data where this is accepted by the consortium partners.#issuewarning Some metabolomics partners do not consider MetaboLights - an accepted standard.#endissuewarning#endif$_METABOLOMIC As part of the plant research community, we use #if$_MIAPPE MIAPPE for phenotyping data in the broadest sense, but we will also rely on #endif$_MIAPPE specific SOPs for additional - annotations #if$_DATAPLANT that consider advanced DataPLANT annotation and ontologies. #endif$_DATAPLANT - -

- - -

Making data openly accessible

-

- - Which data produced and/or used in the project will be made openly available as the default? If certain datasets cannot be shared (or need to be shared under restrictions), we explain why, clearly separating - legal and contractual reasons from voluntary restrictions. - -

-

- - By default, all data sets from the $_PROJECT will be shared with the community and made openly available. However, before the data are released, all partners will be given the opportunity to check for potential IP (according to the consortium agreement and background IP rights). #if$_INDUSTRY This applies in particular to data pertaining to industry. #endif$_INDUSTRY IP protection will be prioritized for datasets that offer the potential for exploitation. - -

-

- - Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons - for opting out. - -

-

-

How will the data be made accessible (e.g. by deposition in a repository)?

-

-

- - #if!$_DATAPLANT Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, data that can be stored in - international discipline-specific repositories using specialized technologies will be stored and processed there (sequencing data at the #if$_NCBI US national center: NCBI;#endif$_NCBI #if$_GEO Gene Expression Omnibus: GEO;#endif$_GEO European Bioinformatics Institute (EBI) archives: #if$_ENA European Nucleotide Archive: ENA;#endif$_ENA #if$_ARRAYEXPRESS Functional Genomics Data Archive: ArrayExpress;#endif$_ARRAYEXPRESS - #if$_PRIDE proteome database: PRIDE;#endif$_PRIDE #if$_METABOLIGHTS metabolomic database: MetaboLights;#endif$_METABOLIGHTS #if$_OTHEREP and $_OTHEREP #endif$_OTHEREP). #endif!$_DATAPLANT - -

-

- - Specialized repositories will be used where appropriate, such as INSDC (GenBank, ENA, DDBJ) for nucleotide sequence data, PIR/UniProt/SWISS-PROT for proteins, PDB for protein structures, GEO for transcriptomics, PRIDE for proteomics data, and METLIN for metabolomics data. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, assigned a digital object identifier (DOI). #if$_DATAPLANT Whole datasets will also be wrapped into an ARC with allocated DOIs. The ARC and the converters provided by DataPLANT will ensure that the upload into the endpoint repositories is fast and easy. - #endif$_DATAPLANT - -

-

-

-

What methods or software tools are needed to access the data?

-

#if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY

-

- - #if!$_PROPRIETARY No specialized software will be needed to access the data, usually just a modern web browser. Access will be possible through web interfaces. For processing the raw data once obtained, common open-source software can be used. #endif!$_PROPRIETARY - -

-

- - #if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC Commander, and the DataPLAN DMP tool. #endif$_DATAPLANT - -

-

Is documentation about the software needed to access the data included?

-

- - #if$_DATAPLANT DataPLANT resources are well described, and a setup guide is provided on their GitHub project pages. #endif$_DATAPLANT Documentation for all external software will be copied locally and stored alongside the software. - -

-

Is it possible to include the relevant software (e.g. in open-source code)?

-

As stated above, the $_PROJECT will use publicly available, open-source, well-documented and certified software#if$_PROPRIETARY, except for $_PROPRIETARY#endif$_PROPRIETARY.

-

-

- Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories that support open access, where possible. -

-

- - As noted above, specialized repositories will be used for common data types. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, assigned a digital object identifier (DOI).#if$_DATAPLANT Whole datasets will also be wrapped into ARCs with allocated DOIs.#endif$_DATAPLANT - -

-

Have you explored appropriate arrangements with the identified repository?

-

- - Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. Therefore, arrangements are neither necessary nor useful. Catch-all repositories are not required#if$_DATAPLANT, and this has been confirmed for data associated with DataPLANT#endif$_DATAPLANT. #issuewarning if no data management platform such as DataPLANT is used, then you need to find an appropriate repository to store or archive your data after publication. #endissuewarning - -

-

If there are restrictions on use, how will access be provided?

-

There are no restrictions beyond the IP screening described above, which is in line with European open data policies.

-

- - -

Is there a need for a data access committee?

-

There is no need for a data access committee.

-

Are there well described conditions for access (i.e. a machine-readable license)?

-

Yes. Where possible, machine-readable license statements (e.g. in CC REL) will be used for data not submitted to specialized repositories such as ENA.
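A machine-readable license is license information that software can parse directly instead of relying on free text. The following is a minimal sketch, not a prescribed format. It assumes a JSON-LD serialization (CC REL is more often embedded as RDFa in web pages) and a hypothetical helper `ccRelStatement`. It states a data set's license with the CC REL terms `cc:license` and `cc:attributionName`. The use of `dct:title` from DCMI Metadata Terms and the example values are also assumptions.

```javascript
// Hypothetical helper: a machine-readable license statement for a data set,
// using CC REL terms serialized as JSON-LD. CC REL is more often embedded as
// RDFa in HTML; using dct:title (DCMI Metadata Terms) here is an assumption.
function ccRelStatement({ title, attributionName, licenseUrl }) {
  return {
    "@context": {
      cc: "http://creativecommons.org/ns#",
      dct: "http://purl.org/dc/terms/",
    },
    "dct:title": title,
    "cc:attributionName": attributionName,
    // The license is referenced as a resource (IRI), not as a plain string.
    "cc:license": { "@id": licenseUrl },
  };
}

// Example values are hypothetical.
const statement = ccRelStatement({
  title: "Phenotypic measurements, field trial 1",
  attributionName: "Example consortium",
  licenseUrl: "https://creativecommons.org/licenses/by/4.0/",
});
console.log(JSON.stringify(statement, null, 2));
```

A statement like this could be stored next to the data files or embedded in a landing page. Harvesters could then read the license terms without human interpretation.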

-

How will the identity of the person accessing the data be ascertained?

-

- - While data are shared only within the consortium (e.g. because the datasets are not yet finished or are undergoing IP checks), they will be hosted internally and a username and password will be required for access (see GDPR rules). When the data are made public in EU or US repositories, completely anonymous access is normally allowed. This is also the case for ENA, and both access modes are in line with GDPR requirements. - -

-

- - #if$_DATAPLANT Currently, data management relies on the annotated research context (ARC). It is password protected, so before any data or samples can be obtained, user authentication is required. - #endif$_DATAPLANT - -

-

Making data interoperable

-

- - Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organizations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)? - -

-

- - Whenever possible, data will be stored in common, openly defined formats, including all the metadata necessary to interpret and analyze the data in a biological context. By default, no proprietary formats will be used. However, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word processors but will be shared as PDF. - -

-

What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?

-

- - As noted above, we foresee using minimal standards such as #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MinSEQe for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-compatible forms for metabolites#if$_MIAPPE and MIAPPE for phenotyping-like data#endif$_MIAPPE. These minimal information standards will allow the integration of data across projects and their reuse according to established and tested protocols. We will also use ontology terms to enrich the data sets, relying on free and open ontologies where possible. Additional ontology terms might be created and formalized during the $_PROJECT. - -

-

Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?

-

- - Open ontologies will be used where they are mature. As stated above, some ontologies and controlled vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT - -

-

In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?

-

Common and open ontologies will be used, so this question does not apply.

-

Increase data reuse (by clarifying licences)

-

-

How will the data be licensed to permit the widest re-use possible?

-

Open licenses, such as Creative Commons (CC), will be used whenever possible.

- - -

- - When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible. - -

-

- - #if$_early The data will be published as soon as possible to guarantee reusability. #endif$_early #if$_ipissue IP issues will be checked before publication. #endif$_ipissue All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements#if$_GENOMIC, such as those initiated in Fort Lauderdale and set forth by the Toronto International Data Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are complete. - -

-

Are the data produced and/or used in the project usable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.

-

There will be no restrictions once the data are made public.

-

How long is it intended that the data remains re-usable?

-

The data will be made available for many years#if$_DATAPLANT and ideally indefinitely after the end of the project#endif$_DATAPLANT.

-

- Data submitted to repositories (as detailed above), e.g. ENA or PRIDE, will be subject to the storage regulations of those repositories. -

-

Are data quality assurance processes described?

-

- - The data will be checked and curated. #if$_DATAPLANT Furthermore, data will be quality controlled (QC) using automatic procedures as well as manual curation.#endif$_DATAPLANT - -

-

2.3    Allocation of resources

-

What are the costs for making data FAIR in your project?

-

The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public repositories. Subsequent costs are then borne by the operators of these repositories.

-

- - In addition, the costs of storage after publication are incurred by the end-point repositories (e.g. ENA). They are not charged to the $_PROJECT or its members but are covered by the operating budgets of these repositories. - -

-

- How will these be covered? Note that costs related to open access to research data are eligible as part of the Horizon 2020 or Horizon Europe grant (if compliant with the Grant Agreement conditions). -

-

The costs borne by the $_PROJECT are covered by the project funding. Pre-existing structures#if$_DATAPLANT, such as the infrastructure, tools, and knowledge developed in the DataPLANT consortium,#endif$_DATAPLANT will also be used.

-

Who will be responsible for data management in your project?

-

The responsible person will be $_DATAOFFICER of the $_PROJECT.

-

Are the resources for long term preservation discussed (costs and potential value, who decides and how/what data will be kept and for how long)?

-

- - The data officer #if$_PARTNERS or $_PARTNERS #endif$_PARTNERS will ultimately decide on the strategy for preserving data that are not submitted to end-point subject-area repositories #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT when the project ends. This will be in line with EU guidelines, institutional policies, and data sharing based on EU and international standards. - -

-

2.4    Data security

-

What provisions are in place for data security (including data recovery as well as secure storage and transfer of sensitive data)?

-

- - Online platforms will be protected by vulnerability scanning, two-factor authentication and daily automatic backups allowing immediate recovery. All partners holding confidential project data will use secure platforms with automatic backups and secure offsite copies. #if$_DATAPLANT As the DataHUB and ARCs were developed in DataPLANT, DataPLANT's data security measures will apply to them. These comprise secure storage, and usernames and passwords are generally transferred via separate secure channels.#endif$_DATAPLANT - -

-

Is the data safely stored in certified repositories for long term preservation and curation?

-

- - Wherever certified repositories exist, they will be used as end-point repositories. #if$_RNASEQ Transcriptomics and gene sequence data will also be made available upon publication via the standard repositories ENA/SRA, #endif$_RNASEQ #if$_METABOLOMIC metabolite data via e.g. MetaboLights (and/or national repositories such as those of the German NFDI or the French INRAE), #endif$_METABOLOMIC #if$_PROTEOMIC proteomics data via e.g. PRIDE/ProteomeXchange#endif$_PROTEOMIC. In addition, the national resources will keep the data safe after the project ends. #if$_DATAPLANT However, repositories such as ProteomeXchange do not support in-depth plant-specific metadata; hence, ARCs will be maintained to ensure the reusability of plant-specific metadata. #endif$_DATAPLANT - -

-

2.5    Ethical aspects

-

- - Are there any ethical or legal issues that can have an impact on data sharing? These can also be discussed in the context of an ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action (DoA). - -

-

- - At the moment, we do not anticipate ethical or legal issues with data sharing. Since the data concern plants, no ethics committee is needed; however, we will diligently follow the Nagoya Protocol on access and benefit-sharing (🡺 see Nagoya Protocol). #issuewarning Check this point and enter any due-diligence measures here. At the moment, it is still open whether digital sequence information will also fall under the Nagoya Protocol. In any case, if you use material that does not originate from your (partner) country and characterize it physically (e.g. metabolites, proteome) or biochemically (e.g. RNASeq), this might be a Nagoya-relevant activity, unless the material comes from, e.g., the US (non-party) or Ireland (not signed, but contact them anyway); other laws might apply, though. #endissuewarning - -

-

Is informed consent for data sharing and long term preservation included in questionnaires dealing with personal data?

-

- - The only personal data that will potentially be stored are the submitters' names and affiliations in the data set metadata. In addition, personal data will be collected for dissemination and communication activities, using specific methods and procedures developed by the $_PROJECT partners to ensure data protection. #issuewarning you need to inform people, and preferably obtain WRITTEN consent, that you store e-mail addresses and names or even pseudonyms such as Twitter handles; we are very sorry about these issues, we didn't invent them #endissuewarning - -

-

2.6    Other issues

-

Do you make use of other national/funder/sectorial/departmental procedures for data management? If yes, which ones?

-

Yes, the $_PROJECT will use common Research Data Management (RDM) tools#if$_DATAPLANT and in particular resources developed by the NFDI of Germany#endif$_DATAPLANT.

-

-

-

3     Annexes

-

-

3.1     Abbreviations

-

- #if$_DATAPLANT

ARC Annotated Research Context

#endif$_DATAPLANT - -

CC Creative Commons

-

CC REL Creative Commons Rights Expression Language

-

DDBJ DNA Data Bank of Japan

-

DMP Data Management Plan

-

DoA Description of Action

-

DOI Digital Object Identifier

-

EBI European Bioinformatics Institute

-

ENA European Nucleotide Archive

-

EU European Union

-

FAIR Findable Accessible Interoperable Reusable

-

GDPR General Data Protection Regulation (of the EU)

-

IP Intellectual Property

-

ISO International Organization for Standardization

-

MIAMET Minimal Information about Metabolite experiment

-

MIAPPE Minimal Information about Plant Phenotyping Experiment

-

MinSEQe Minimum Information about a high-throughput Sequencing Experiment

-

NCBI National Center for Biotechnology Information

-

NFDI National Research Data Infrastructure (of Germany)

-

NGS Next Generation Sequencing

-

RDM Research Data Management

-

RNASeq RNA Sequencing

-

SOP Standard Operating Procedures

-

SRA Sequence Read Archive

- #if$_DATAPLANT

SWATE Swate Workflow Annotation Tool for Excel

#endif$_DATAPLANT -

ONP Oxford Nanopore

-

qRTPCR quantitative real time polymerase chain reaction

-

WP Work Package

-

-

-

-

-

-
-
-
-
-
-
-
- -
-
-
- - -
Data Management Plan of the Horizon Europe Project $_PROJECT
- -
- - - -
- - - - - - - - - - - - - - - - - - - - - - - - -

Action Number:

$_PROJECT

Action Acronym:

$_PROJECT

Action Title:

$_PROJECT

Date:

17 March 2023

DMP version:

$_DMPVERSION

- -
-
- -
- - - -

Introduction

-

#if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. #endif$_EU To benefit fully from open data, it is necessary not only to store data but also to make them Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; however, we also consider the need to protect individual data sets. #endif$_PROTECT - -

-

- - This document sets out the principles guiding data management in the $_PROJECT and describes which data will be stored, using the responses to the EU questionnaire on Data Management Plans (DMP) as the DMP document. - -

-

- - The detailed DMP describes how data will be handled during and after the project. The $_PROJECT DMP follows the Horizon Europe requirements and the Horizon Europe Online Manual. #if$_UPDATE It will be updated, or its validity checked, several times during the $_PROJECT, at least at month $_UPDATEMONTH. #endif$_UPDATE

- -

1.    Data Summary

-

Will you re-use any existing data and what will you re-use it for? State the reasons if re-use of any existing data has been considered but discarded.

-

- - - The project builds on and relies on existing data sets. #if$_RNASEQ For instance, without a proper genomic reference it is very difficult to analyze NGS data sets.#endif$_RNASEQ It is also important to include existing data sets on the expression and metabolic behaviour of $_STUDYOBJECT, as well as existing characterizations and the background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. Genomic references can simply be gathered from reference databases for genomes/sequences, such as the National Center for Biotechnology Information: NCBI (US); European Bioinformatics Institute: EBI (EU); DNA Data Bank of Japan: DDBJ (JP). Furthermore, prior 'unstructured' data in the form of publications and the data contained therein will be used for decision making. -

What types and formats of data will the project generate or re-use?

- - -

- - The $_PROJECT will collect and/or generate the following types of raw data: $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEoMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA data related to $_STUDYOBJECT. In addition, the raw data will also be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis parts. #if$_DATAPLANT These pipelines will be tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise in the DataPLANT consortium#endif$_DATAPLANT. - -

- -

What is the purpose of the data generation or re-use and its relation to the objectives of the project?

-

- - The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also to trace the provenance of the analyzed data. Stakeholders must also be informed about the provenance of data. It is therefore necessary to ensure that the data are well generated and also well annotated with metadata using open standards, as laid out in the next section. - -

- - -

What is the expected size of the data that you intend to generate or re-use?

-

We expect to generate about $_RAWDATA GB of raw data. The derived data will amount to about $_DERIVEDDATA GB.

- -

What is the origin/provenance of the data, either generated or re-used?

-

Public data will be extracted as described in the previous paragraph. For the $_PROJECT, specific data sets will be generated by the consortium partners.

- - -

- Data of different types or representing different domains will be generated using unique approaches. For example: -

- + } - #if$_PREVIOUSPROJECTS -

Data from previous projects such as $_PREVIOUSPROJECTS will be considered.

- #endif$_PREVIOUSPROJECTS - -

To whom might it be useful ('data utility'), outside your project?

-

- - The data will initially benefit the $_PROJECT partners, but will also be made available to selected stakeholders closely involved in the project and then to the scientific community working on $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also use the data after publication. The data will be disseminated according to the $_PROJECT's dissemination and communication plan#if$_DATAPLANT, which is aligned with the DataPLANT platform and other channels#endif$_DATAPLANT. - -

-

- - $_DATAUTILITY - -

-

2    FAIR data

-

2.1. Making data findable, including provisions for metadata

-

- - Will data be identified by a persistent identifier? - -

-

- All data sets will receive unique identifiers, and they will be annotated with metadata. - -

- -

- Will rich metadata be provided to allow discovery? What metadata will be created? What disciplinary or general standards will be followed? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how. - -

-

- - #if$_MIAPPE The $_PROJECT will rely on community standards plus additional recommendations relevant to plant science, e.g. those of the Minimum Information About a Plant Phenotyping Experiment (MIAPPE). #endif$_MIAPPE Unlike cross-domain minimal sets such as Dublin Core, which mostly describe the submitter and the general type of data (e.g. images), such domain-specific standards allow reuse by other researchers because they also describe properties of the plant (see the preceding section). Nevertheless, minimal cross-domain annotations will also be part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow individual releases to be tagged with a Digital Object Identifier (DOI). #endif$_DATAPLANT #if$_OTHERSTANDARDS $_OTHERSTANDARDINPUT #endif$_OTHERSTANDARDS - -

-

Will search keywords be provided in the metadata to optimize the possibility for discovery and then potential re-use?

-

- - Keywords about the experiment and the consortium will be included, as well as an abstract about the data where useful. In addition, certain keywords can be auto-generated from dense metadata and the underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with standardized DataPLANT ontologies, which are extended where existing ontologies do not yet cover the variables. #endif$_DATAPLANT - -

-

Will metadata be offered in such a way that it can be harvested and indexed?

-

- - To maintain data integrity and to be able to re-analyze data, data sets will receive version numbers where this is useful (raw data, by contrast, must not be changed; they are considered immutable and will not receive a version number). #if$_DATAPLANT This is supported automatically by the Git-based ARC infrastructure of DataPLANT. #endif$_DATAPLANT - - - Data variables will be given standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers. -
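The versioning rule stated above can be modelled in a few lines: raw data are immutable, and derived data are versioned. The following is a minimal sketch under stated assumptions, not project tooling. The helper `withNewVersion` and the record fields `name`, `kind` and `version` are hypothetical. In a Git-based ARC setup, the history would come from commits rather than from an explicit counter.

```javascript
// Hypothetical model of the versioning rule: raw data sets are immutable and
// never versioned, while derived data sets get a new version on each change.
// The record fields (name, kind, version) are assumptions for illustration.
function withNewVersion(dataset) {
  if (dataset.kind === "raw") {
    throw new Error(`Raw data set "${dataset.name}" is immutable and is not versioned.`);
  }
  // Return a new record rather than mutating the old one, so that earlier
  // versions stay available for re-analysis.
  return { ...dataset, version: (dataset.version ?? 0) + 1 };
}

const normalized = { name: "rnaseq-counts-normalized", kind: "derived", version: 1 };
console.log(withNewVersion(normalized).version); // 2
console.log(normalized.version); // 1 (unchanged)
```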

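- + /** + * Match a placeholder and then modify it. + * @param {Range} in_range - the range covering the opening #if keyword; the text up to the matching #endif keyword is replaced. + * @param {string} before - id of the check-box element linked to this placeholder. + * + */ - -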
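The JSDoc above documents `match_mod`. It resolves one conditional block in the rendered page by working on DOM `Range`s and `window.find`. The template convention itself can be illustrated on plain strings: a `#if$_KEY … #endif$_KEY` block is kept or dropped depending on a user choice, `!` negates the condition, and `$_KEY` placeholders are filled in. The following is a minimal, hypothetical sketch, not the tool's code. The function name `fillTemplate` and the `vars` object are assumptions. So is reading `|` in keys such as `$_RNASEQ|$_GENOMIC` as a logical OR.

```javascript
// Hypothetical, string-based sketch of the DMP template conventions; the real
// match_mod works on DOM Ranges in the rendered page instead.
//   #if$_KEY ... #endif$_KEY      keep the block if KEY is truthy
//   #if!$_KEY ... #endif!$_KEY    keep the block if KEY is falsy
//   #if$_A|$_B ... #endif$_A|$_B  assumed: keep the block if A or B is truthy
//   $_KEY                         replaced by the value of KEY, if defined
function fillTemplate(template, vars) {
  // A key such as "$_RNASEQ|$_GENOMIC" counts as set if any part is truthy.
  const isSet = (key) =>
    key.split("|").some((part) => Boolean(vars[part.replace(/^\$_/, "")]));

  const resolveBlocks = (text) => {
    // The closing tag must repeat the opening "!" and key exactly, mirroring
    // the "#endif" + text_after_if lookup in match_mod. The lookahead stops a
    // block opened with $_RNASEQ from closing at "#endif$_RNASEQ|$_GENOMIC".
    const block = /#if(!?)(\$_[A-Za-z0-9_$|]+)([\s\S]*?)#endif\1\2(?![A-Za-z0-9_$|])/g;
    return text.replace(block, (_match, negated, key, body) => {
      const keep = negated ? !isSet(key) : isSet(key);
      // Nested blocks are resolved only inside bodies that are kept.
      return keep ? resolveBlocks(body) : "";
    });
  };

  // Placeholders are filled only after dropped blocks have been removed.
  return resolveBlocks(template).replace(/\$_([A-Za-z0-9_]+)/g, (placeholder, name) =>
    vars[name] !== undefined ? String(vars[name]) : placeholder
  );
}

const snippet =
  "#if$_DATAPLANT Pipelines will be tracked in the DataPLANT ARC.#endif$_DATAPLANT" +
  "#if!$_DATAPLANT Pipelines will be tracked on the $_PROJECT platform.#endif!$_DATAPLANT";
console.log(fillTemplate(snippet, { DATAPLANT: false, PROJECT: "MyProject" }).trim());
// prints "Pipelines will be tracked on the MyProject platform."
```

Conditional blocks are resolved before placeholders are substituted, so placeholders inside dropped blocks are never filled in. This fits blocks such as the `$_PROPRIETARY` one, which uses the same key both as a condition and as a value.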

2.2.    Making data accessible

-

Repository

-

Will the data be deposited in a trusted repository?

- -

-

- - #if!$_DATAPLANT Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, data that can be stored in international, discipline-related repositories using specialized technologies (sequencing data at the #if$_NCBI US National Center for Biotechnology Information: NCBI;#endif$_NCBI #if$_GEO Gene Expression Omnibus: GEO;#endif$_GEO European Bioinformatics Institute (EBI) archives: #if$_ENA European Nucleotide Archive: ENA;#endif$_ENA #if$_ARRAYEXPRESS Functional Genomics Data Archive: ArrayExpress;#endif$_ARRAYEXPRESS #if$_PRIDE Proteome database: PRIDE;#endif$_PRIDE #if$_METABOLIGHTS metabolomic database: MetaboLights;#endif$_METABOLIGHTS #if$_OTHEREP and $_OTHEREP #endif$_OTHEREP ) will be stored and processed there. #endif!$_DATAPLANT - -

-

- - Specialized repositories will be used where appropriate, such as INSDC (GenBank, ENA, DDBJ) for nucleotide sequence data, PIR/UniProt/SWISS-PROT for proteins, PDB for protein structures, GEO for transcriptomics, PRIDE for proteomics data, and METLIN for metabolomics data. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, assigned a digital object identifier (DOI). #if$_DATAPLANT Whole datasets will also be wrapped into ARCs with allocated DOIs. The ARC and the converters provided by DataPLANT will make uploading to the end-point repositories fast and easy. #endif$_DATAPLANT - -

-

- - -

Have you explored appropriate arrangements with the identified repository where your data will be deposited?

-

- - Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. Therefore, arrangements are neither necessary nor useful. Catch-all repositories are not required. #if$_DATAPLANT For DataPLANT, this has been agreed upon, as all the omics repositories of the International Nucleotide Sequence Database Collaboration (INSDC) will be used. #endif$_DATAPLANT #issuewarning if no data management platform such as DataPLANT is used, then you need to find an appropriate repository to store or archive your data after publication. #endissuewarning - -

- -

Does the repository ensure that the data is assigned an identifier? Will the repository resolve the identifier to a digital object?

-

- - #if!$_DATAPLANT Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. In addition, data that can be stored in international, discipline-related repositories using specialized technologies (sequencing data at the #if$_NCBI US National Center for Biotechnology Information: NCBI;#endif$_NCBI #if$_GEO Gene Expression Omnibus: GEO;#endif$_GEO European Bioinformatics Institute (EBI) archives: #if$_ENA European Nucleotide Archive: ENA;#endif$_ENA #if$_ARRAYEXPRESS Functional Genomics Data Archive: ArrayExpress;#endif$_ARRAYEXPRESS #if$_PRIDE Proteome database: PRIDE;#endif$_PRIDE #if$_METABOLIGHTS metabolomic database: MetaboLights;#endif$_METABOLIGHTS #if$_OTHEREP and $_OTHEREP #endif$_OTHEREP ) will be stored and processed there. #endif!$_DATAPLANT#if$_DATAPLANT The ARC and the converters provided by DataPLANT will make uploading to the end-point repositories fast and easy.#endif$_DATAPLANT - -

-

- - As noted above, specialized repositories such as SRA/ENA and PRIDE/ProteomeXchange are the most common ones and will be used when appropriate. Unstructured, less standardized data (e.g. experimental phenotypic measurements) will be annotated with metadata and, once complete, assigned a digital object identifier (DOI). #if$_DATAPLANT In addition, whole data sets wrapped into ARCs will receive DOIs. #endif$_DATAPLANT - -

-

Data:

-

Will all data be made openly available? If certain datasets cannot be shared (or need to be shared under restricted access conditions), explain why, clearly separating legal and contractual reasons from intentional restrictions. Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if opening their data goes against their legitimate interests or other constraints as per the Grant Agreement.

-

- - By default, all data sets from the $_PROJECT will be shared with the community and made openly available. However, before the data are released, all partners will be given an opportunity to check for potential IP (according to the consortium agreement and background IP rights). #if$_INDUSTRY This applies in particular to data pertaining to industry. #endif$_INDUSTRY IP protection will be prioritized for datasets that offer the potential for exploitation. - -

-

- - Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out. - -

+ function match_mod(in_range, before) { -

If an embargo is applied to give time to publish or seek protection of the intellectual property (e.g. patents), specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.

-

- - #if$_early The data will be published as soon as possible to guarantee reusability. #endif$_early #if$_ipissue IP issues will be checked before publication. #endif$_ipissue All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements#if$_GENOMIC, such as those initiated in Fort Lauderdale and set forth by the Toronto International Data Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are complete. - -

- -

Will the data be accessible through a free and standardized access protocol? -

-

- - #if$_DATAPLANT DataPLANT stores data in ARCs, which are Git repositories. The DataHUB, a GitLab instance, shares data and metadata. The Git and web (HTTP) protocols are open and freely accessible. In addition, #endif$_DATAPLANT Zenodo and the end-point repositories will be used for access. In general, web-based access protocols are free and standardized. - -

+ const text_after_if = (in_range.toString().slice(3)); + //verbose console.log("match_mod function: check_box_item: " + before); -

If there are restrictions on use, how will access be provided to the data, both during and after the end of the project? -

-

- - There are no restrictions, beyond the aforementioned IP checks, which are in line with e.g. European open data policies. - -

+ const parent_node = document.createElement("span"); + parent_node.classList.add("text-primary"); + parent_node.setAttribute("onmouseover", 'const e = document.getElementById("' + before + '") ;e.parentElement.classList.add("border-highlight");e.focus({preventScroll: true})'); - -

How will the identity of the person accessing the data be ascertained? -

-

- - While data are shared only within the consortium (e.g. because they are not yet finished or are undergoing IP checks), they are hosted internally and a username and password are required for access (see also our GDPR rules). When data are made public in EU or US end-point repositories, completely anonymous access is normally allowed. This is also the case for ENA, and both access modes are in line with GDPR requirements. #if$_DATAPLANT Currently, data management relies on the annotated research context (ARC). ARCs are password protected, so before any data or samples can be obtained, user authentication is required. #endif$_DATAPLANT - -

-

Is there a need for a data access committee (e.g. to evaluate/approve access requests to personal/sensitive data)? -

-

- - There is no need for a data access committee. + parent_node.setAttribute("onmouseleave", 'document.getElementById("' + before + '").parentElement.classList.remove("border-highlight"); '); + parent_node.setAttribute("onclick", 'document.getElementById("' + before + '").parentElement.scrollIntoView();document.getElementById("' + before + '").focus(); '); + parent_node.setAttribute("name", before + "-to-"); + let text_range = document.createRange(); + text_range.setStart(in_range.endContainer, in_range.endOffset); - -

- - - -

- - Metadata: - -

-

- - Will metadata be made openly available and licenced under a public domain dedication CC0, as per the Grant Agreement? If not, please clarify why. Will metadata contain information to enable the user to access the data? - -

-

- - Yes. Where possible, machine-readable license statements (e.g. in CC REL) will be used for data not submitted to specialized repositories such as ENA. - -

-

- - How long will the data remain available and findable? Will metadata be guaranteed to remain available after data is no longer available? - -

-

- -

The data will be made available for many years#if$_DATAPLANT and ideally indefinitely after the end of the project#endif$_DATAPLANT. - In any case, data submitted to repositories (as detailed above), e.g. ENA or PRIDE, will be subject to the storage regulations of those repositories. -

- -

-

- - Will documentation or reference about any software be needed to access or read the data be included? Will it be possible to include the relevant software (e.g. in open source code)? - -

-

- - #if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY - #if!$_PROPRIETARY No specialized software will be needed to access the data, usually just a modern web browser. Access will be possible through web interfaces. For processing the raw data once obtained, common open-source software can be used. #endif!$_PROPRIETARY - #if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC Commander, and the DMP tool, which are not required to access the data but make working with it more convenient. #endif$_DATAPLANT #if$_DATAPLANT DataPLANT resources are well described, and their setup is documented on their GitHub project pages. #endif$_DATAPLANT - As stated above, we use publicly available, open-source, well-documented and certified software#if$_PROPRIETARY, except for $_PROPRIETARY#endif$_PROPRIETARY. - -

- -

2.3. Making data interoperable

-

- - What data and metadata vocabularies, standards, formats or methodologies will you follow to make your data interoperable to allow data exchange and re-use within and across disciplines? Will you follow community-endorsed interoperability best practices? Which ones? - -

-

- - - As noted above, we foresee using minimal standards such as #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MinSEQe for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights-compatible forms for metabolites#if$_MIAPPE and MIAPPE for phenotyping-like data#endif$_MIAPPE. The minimal information standards will allow the integration of data across projects and their reuse according to established and tested protocols. - Specialized repositories will be used for common data types. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, allocated a digital object identifier (DOI).#if$_DATAPLANT The whole datasets will also be wrapped into an ARC with allocated DOIs.#endif$_DATAPLANT - Whenever possible, data will be stored in common and openly defined formats including all the necessary metadata to interpret and analyze data in a biological context. By default, no proprietary formats will be used. However, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word processors but will be shared as PDF. - Open ontologies will be used where they are mature. As stated above, some ontologies and controlled vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT + window.find("#endif" + text_after_if); + const endif_selection = window.getSelection(); + let end_if_range = endif_selection.getRangeAt(0); + in_range.deleteContents(); + text_range.setEnd(end_if_range.startContainer, end_if_range.startOffset); + const doc_frag = text_range.cloneContents(); + parent_node.appendChild(doc_frag); + //verbose console.log("mach_mod function: parent_node: " + parent_node.innerHTML); - -

+ end_if_range.deleteContents(); + end_if_range.detach(); + text_range.deleteContents(); + in_range.insertNode(parent_node); + text_range.detach(); + //verbose console.log("in_range is :" + in_range.toString()); + } -
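The `#if$_NAME ... #endif$_NAME` markers used throughout this template (including negated `#if!$_NAME` blocks and compound conditions such as `#if$_RNASEQ|$_GENOMIC`) are resolved in the browser by the `window.find`/`Range` code in this diff. As a rough sketch of the intended semantics only, the hypothetical `resolveTemplate` function below does the same on a plain string: a block is kept when its condition is selected (or, for a negated block, when it is not), nested blocks are resolved in later passes, and `$_NAME` placeholders are then filled from a values object. It assumes that condition names consist of letters, digits and hyphens and that `|` means logical OR; it is not the tool's actual implementation.

```javascript
// Hypothetical string-based sketch of the template semantics; not the DMP tool's DOM code.
// Matches "#if$_A ... #endif$_A", "#if!$_A ... #endif!$_A" and "#if$_A|$_B ... #endif$_A|$_B".
// The backreferences \1 and \2 force the closing marker to repeat the opening condition exactly,
// and the lookaheads stop "$_RNASEQ" from matching a prefix of a longer name.
const BLOCK =
  /#if(!?)\$_([A-Za-z0-9-]+(?:\|\$_[A-Za-z0-9-]+)*)(?![A-Za-z0-9|-])([\s\S]*?)#endif\1\$_\2(?![A-Za-z0-9|-])/g;
const PLACEHOLDER = /\$_([A-Za-z0-9-]+)/g;

function resolveTemplate(template, selected, values = {}) {
  let text = template;
  let previous;
  // Repeat until stable so that nested blocks exposed by an outer block are resolved too.
  do {
    previous = text;
    text = text.replace(BLOCK, (_match, negated, condition, body) => {
      // Assumption: "|" means logical OR between the listed options.
      const names = condition.split("|$_");
      const isSelected = names.some((name) => selected.has(name));
      const keep = negated === "!" ? !isSelected : isSelected;
      return keep ? body : "";
    });
  } while (text !== previous);
  // Substitute known placeholders; unknown ones are left untouched.
  return text.replace(PLACEHOLDER, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}
```

For example, resolving `"data collection#if!$_VVISUALIZATION and integration#endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization#endif$_VVISUALIZATION"` with `new Set(["VVISUALIZATION"])` yields `"data collection, integration and visualization"`. The repeated `replace` pass is a simple way to handle nesting without writing a parser; a DOM-based tool has to do the equivalent with ranges, as the code above does.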

- - In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies? Will you openly publish the generated ontologies or vocabularies to allow reusing, refining or extending them? - -

-

- - Common and open ontologies will be used. In fact, open biomedical ontologies will be used where they are mature. As stated in the previous question, sometimes ontologies and controlled vocabularies might have to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the DataPLANT biology ontology (DPBO) developed in DataPLANT. #endif$_DATAPLANT Ontology repositories such as the OBO Foundry will be used to publish the ontologies. #if$_DATAPLANT The DPBO is also published on GitHub: https://github.com/nfdi4plants/nfdi4plants_ontology#endif$_DATAPLANT - -

-

- - Will your data include qualified references to other data (e.g. other data from your project, or datasets from previous research)? - -

-

- - References to other data will be made in the form of DOIs and ontology terms. - -

- -

2.4. Increase data re-use

-

How will you provide documentation needed to validate data analysis and facilitate data re-use (e.g. readme files with information on methodology, codebooks, data cleaning, analyses, variable definitions, units of measurement, etc.)?

- -

- - The documentation will be provided in the form of ISA (Investigation, Study, Assay) and CWL (Common Workflow Language). #if$_DATAPLANT Here, the $_PROJECT will build on the ARC container, which includes all the data, metadata, and documentation. #endif$_DATAPLANT - -

-

Will your data be made freely available in the public domain to permit the widest re-use possible? Will your data be licensed using standard reuse licenses, in line with the obligations set out in the Grant Agreement? -

- -

- - Yes, our data will be made freely available in the public domain to permit the widest re-use possible. Open licenses, such as Creative Commons (CC), will be used whenever possible. - -

-

Will the data produced in the project be useable by third parties, in particular after the end of the project? -

- -

- - There will be no restrictions once the data is made public. - -

-

Will the provenance of the data be thoroughly documented using the appropriate standards? Describe all relevant data quality assurance processes. -

-

- - The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also to trace the provenance of the data analysis. Stakeholders must also be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and also well annotated with metadata using open standards, as laid out in the next section. - -

+ function check_uncheck(checked, callback1, callback2, check_or_uncheck) { -

Describe all relevant data quality assurance processes. Further to the FAIR principles, DMPs should also address research outputs other than data, and should carefully consider aspects related to the allocation of resources, data security and ethical aspects. -

- -

- - The data will be checked and curated through data collection protocols, personnel training, data cleaning, data analysis, and quality control. #if$_DATAPLANT Furthermore, data will be analyzed for quality control (QC) problems using automatic procedures as well as by manual curation. #endif$_DATAPLANT All data quality assurance processes, including the data collection protocol, data cleaning procedures, data analysis techniques, and quality control measures, will be documented. This documentation will be kept for future reference and made available to stakeholders upon request. + for (let i = 0; i < checked.length; i++) { - -

-

3    Other research outputs

-

In addition to the management of data, beneficiaries should also consider and plan for the management of other research outputs that may be generated or re-used throughout their projects. Such outputs can be either digital (e.g. software, workflows, protocols, models, etc.) or physical (e.g. new materials, antibodies, reagents, samples, etc.). + var check_box_item = checked[i]; + ////verbose console.log(check_box ); + let name = check_box_item.split('_')[1]; + // here the ________ should not match anything, make sure not to use it as a real placeholder + let match_name = "_________"; + try { + match_name = prefix.concat(name.toUpperCase()); + //verbose console.log("check for loop, starts " + match_name + " " + checked.length); + } catch (e) { } -

-

- - In the current data management plan, any digital output including but not limited to software, workflows, protocols, models, documents, templates, notebooks are all treated as data. Therefore, all aforementioned digital objects are already described in detail. For the non-digital objects, the data management plan will be closely connected to the digitalisation of the physical objects. #if$_DATAPLANT The $_PROJECT will build a workflow which connects the ARC with an electronic lab notebook in order to also manage the physical objects. #endif$_DATAPLANT - -

-

Beneficiaries should consider which of the questions pertaining to FAIR data above, can apply to the management of other research outputs, and should strive to provide sufficient detail on how their research outputs will be managed and shared, or made available for re-use, in line with the FAIR principles. -

-

- - Open licenses, such as Creative Commons (CC), will be used whenever possible, including for the other digital objects. + // this is very tedious but necessary, because the find and replace of multiple character words in DOM is not trivial. - -

- + if (window.find) //for chrome, firefox + { - + let ns = window.find(match_name, aCaseSensitive = 0, aBackwards = 0, aWrapAround = 1, + aWholeWord = 0, aSearchInFrames = is_firefox, aShowDialog = 0); + while (ns) { + // this is very tedious but necessary, because the find and replace of multiple character words in DOM is not trivial. + //verbose console.log("found"); -

4.    Allocation of resources

-

What will the costs be for making data or other research outputs FAIR in your project (e.g. direct and indirect costs related to storage, archiving, re-use, security, etc.)?

-

The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public repositories. Subsequent costs are then borne by the operators of these repositories.

-

- - Additionally, costs for post-publication storage are incurred by end-point repositories (e.g. ENA); they are not charged to the $_PROJECT or its members but are covered by the operating budgets of these repositories. - -

-

- How will these be covered? Note that costs related to research data/output management are eligible as part of the Horizon Europe grant (if compliant with the Grant Agreement conditions) -

-

The costs borne by the $_PROJECT are covered by the project funding. Pre-existing structures#if$_DATAPLANT, such as the structures, tools, and knowledge established in the DataPLANT consortium,#endif$_DATAPLANT will also be used.

-

Who will be responsible for data management in your project?

-

The responsible person will be $_DATAOFFICER of the $_PROJECT.

-

How will long term preservation be ensured? Discuss the necessary resources to accomplish this (costs and potential value, who decides and how, what data will be kept and for how long)?

-

- - The data officer #if$_PARTNERS or $_PARTNERS #endif$_PARTNERS will ultimately decide on the strategy to preserve data that are not submitted to end-point subject area repositories #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT when the - project ends. This will be in line with EU guidelines, institutional policies, and data sharing practices based on EU and international standards. - -

-

5.    Data security

-

What provisions are or will be in place for data security (including data recovery as well as secure storage/archiving and transfer of sensitive data)?

-

- - Online platforms will be protected by vulnerability scanning, two-factor authentication and daily automatic backups allowing immediate recovery. All partners holding confidential project data will use secure platforms with automatic backups and secure offsite copies. - #if$_DATAPLANT As DataHUB and ARCs have been developed in DataPLANT, data security will be enforced there. This comprises secure storage, and passwords and usernames are generally transferred via separate secure channels.#endif$_DATAPLANT - -

-

Will the data be safely stored in trusted repositories for long term preservation and curation?

-

- - Wherever there are certified repositories, these will be used as end-point repositories. #if$_RNASEQ Transcriptomics data and gene sequence data will also be made available upon publication via the standard repositories - ENA/SRA, #endif$_RNASEQ #if$_METABOLOMIC metabolite data in e.g. Metabolights (and/or nationwide repositories like the German NFDI or the French INRAe), #endif$_METABOLOMIC #if$_PROTEOMIC proteomics data in - e.g. PRIDE/ProteomeXchange #endif$_PROTEOMIC. In addition, national resources will keep the data safe after the project ends. #if$_DATAPLANT Furthermore, databases such as ProteomeXchange - do not support deep plant-specific metadata; hence ARCs will be maintained to ensure the reusability of plant-specific metadata. #endif$_DATAPLANT - -

-

6.    Ethics

-

- - Are there, or could there be, any ethics or legal issues that can have an impact on data sharing? These can also be discussed in the context of the ethics review. If relevant, include references to ethics deliverables and ethics chapter in the Description of the Action (DoA). - -

-

- - At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since these are plant data, there is no need for an ethics committee; however, due diligence regarding plant resource benefit - sharing is considered (🡺see Nagoya Protocol). #issuewarning You have to check this and enter any due diligence here; at the moment it is still open whether sequence information will also fall under Nagoya. In any case - if you use material not from your (partner) country and characterize it physically (e.g. metabolites, proteome) or biochemically (e.g. RNASeq), this might represent a Nagoya-relevant action unless the material is from e.g. - the US (not a party) or Ireland (not signed, still contact them), although other laws might apply. #endissuewarning - -

-

Will informed consent for data sharing and long term preservation be included in questionnaires dealing with personal data?

-

- - The only personal data that will potentially be stored are the submitters' names and affiliations in the data metadata. In addition, personal data will be collected for dissemination and communication - activities using specific methods and procedures developed by the $_PROJECT partners to comply with data protection regulations. #issuewarning You need to inform people and preferably obtain WRITTEN consent that you store emails and - names or even pseudonyms such as Twitter handles; we are very sorry about these issues, we didn’t invent them. #endissuewarning - -

-

7.    Other issues

-

Do you, or will you, make use of other national/funder/sectorial/departmental procedures for data management? If yes, which ones (please list and briefly describe them)?

-

Yes, the $_PROJECT will use common Research Data Management (RDM) tools#if$_DATAPLANT and in particular resources developed by the NFDI of Germany#endif$_DATAPLANT.

-

-

-

3     Annexes

-

-

3.1     Abbreviations

-

- #if$_DATAPLANT

ARC Annotated Research Context

#endif$_DATAPLANT - -

CC Creative Commons

-

CC REL Creative Commons Rights Expression Language

-

DDBJ DNA Data Bank of Japan

-

DMP Data Management Plan

-

DoA Description of Action

-

DOI Digital Object Identifier

-

EBI European Bioinformatics Institute

-

ENA European Nucleotide Archive

-

EU European Union

-

FAIR Findable Accessible Interoperable Reusable

-

GDPR General Data Protection Regulation (of the EU)

-

IP Intellectual Property

-

ISO International Organization for Standardization

-

MIAMET Minimal Information about a Metabolomics Experiment

-

MIAPPE Minimal Information about Plant Phenotyping Experiment

-

MinSEQe Minimum Information about a high-throughput Sequencing Experiment

-

NCBI National Center for Biotechnology Information

-

NFDI National Research Data Infrastructure (of Germany)

-

NGS Next Generation Sequencing

-

RDM Research Data Management

-

RNASeq RNA Sequencing

-

SOP Standard Operating Procedures

-

SRA Sequence Read Archive

- #if$_DATAPLANT

SWATE Swate Workflow Annotation Tool for Excel

#endif$_DATAPLANT -

ONP Oxford Nanopore

-

qRTPCR quantitative real time polymerase chain reaction

-

WP Work Package

-

-

-

-

-

- -
-
-
-
-
-
-
+ let remove_range = window.getSelection().getRangeAt(0).cloneRange(); + let left_range = window.getSelection().getRangeAt(0).cloneRange(); + let right_range = window.getSelection().getRangeAt(0).cloneRange(); + let original_range = window.getSelection().getRangeAt(0).cloneRange(); + let if_range = window.getSelection().getRangeAt(0).cloneRange(); -
Data Management Plan of the DFG Project $_PROJECT
-

- 1.    Data description -

-

- 1.1    Introduction -

- - #if$_EU -

The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. - #endif$_EU To best profit from open data, it is necessary not only to store data but also to make data Findable, Accessible, Interoperable and Reusable (FAIR). #if$_PROTECT Open and FAIR data management, however, also considers the need - to protect individual data sets. #endif$_PROTECT - -

-

- - The aim of this document is to describe the principles guiding data management in the $_PROJECT and which data will be stored. It uses the responses to the DFG Data Management Plan (DMP) checklist - to generate a DMP document. - -

-

- - The detailed DMP describes how data will be handled during and after the project. The $_PROJECT DMP is adapted to the DFG data management checklist. #if$_UPDATE It will be updated and its validity - checked several times during the $_PROJECT project. At the very least, this will happen at month $_UPDATEMONTH. #endif$_UPDATE - -

-

- 1.2    How does your project generate new data? -

-

- Data of different types or of different domains will be generated differently. For example: -

-
    - #if$_RNASEQ -
  • - Short-read sequencing will either be performed in-house or outsourced, and the raw data will be received. -

  • - #endif$_RNASEQ #if$_METABOLOMIC -
  • - Metabolomic data will mostly be generated using chromatography coupled to mass spectrometry and enzyme platforms. -

  • - #endif$_METABOLOMIC #if$_PROTEOMIC -
  • -

    - Proteomic data will be generated using an EU platform that is in line with community standards. -

    -
  • - #endif$_PROTEOMIC - - #if$_IMAGE -
  • -

    - Image data will be generated using equipment (cameras, scanners, and microscopes) or software. Original images, which contain metadata such as EXIF information, will be archived. -

    -
  • - #endif$_IMAGE - - #if$_GENOMIC -
  • -

    - Genomic data will be created from sequencing data. The sequencing data will be generated with Next Generation Sequencing (NGS) equipment#if$_PARTNERS or obtained from partners#endif$_PARTNERS. The sequencing data will then be processed to obtain the genomic data. -

    -
  • - #endif$_GENOMIC - - #if$_GENETIC -
  • -

    - Genetic data will be generated by using Next Generation Sequencing (NGS) equipment. -

    -
  • - #endif$_GENETIC - - #if$_TARGETED -
  • -

    - Targeted assay data (e.g. glucose and fructose content) will be generated using specific equipment or experiments. The procedure will be fully documented in the lab book. -

    -
  • - #endif$_TARGETED - - - #if$_MODELS -
  • -

    - Model data will be generated by software simulations. The complete workflow, including the environment, runtime, parameters and results, will be documented and archived. -

    -
  • - #endif$_MODELS - - #if$_CODE -
  • -

    - The code will be written by programmers. -

    -
  • - #endif$_CODE - - #if$_EXCEL -
  • -

    - The Excel data will be generated by experimentalists or data analysts by using Office or open-source software. -

    -
  • - #endif$_EXCEL - - #if$_CLONED-DNA -
  • -

    - The cloned DNA data will be generated by using a sequencing tool. -

    -
  • - #endif$_CLONED-DNA - - #if$_PHENOTYPIC -
  • -

    - Phenotypic data will be generated using phenotyping platforms. -

    -
  • - #endif$_PHENOTYPIC -
- -

- - The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also to trace the provenance of the data analysis. Stakeholders must also be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and also well annotated with metadata using open standards, as laid out in the next section. - -

+ let extended_begin_node; + let extended_begin_focus; + let extended_end_node; + let extended_end_focus; + let begin_node; + let begin_focus; -

- Public data will be extracted as described in paragraph 1.3. For the $_PROJECT, specific data sets will be generated by the consortium partners. -

-

- 1.3    Is existing data reused? -

-

- - The project builds on existing data sets and relies on them. #if$_RNASEQ For instance, without a proper genomic reference it is very difficult to analyze NGS data sets.#endif$_RNASEQ It is also important to include existing data sets on the expression and metabolic behaviour of $_STUDYOBJECT, as well as existing characterizations and the background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. Genomic references can simply be gathered from reference databases for genomes/sequences, such as the National Center for Biotechnology Information: NCBI (US); European Bioinformatics Institute: EBI (EU); DNA Data Bank of Japan: DDBJ (JP). Furthermore, prior 'unstructured' data in the form of publications and data contained therein will be used for decision making. - -

+ if ((window.getSelection().anchorOffset === 0) && (window.getSelection().anchorNode.previousSibling === null)) { + begin_node = window.getSelection().anchorNode; + begin_focus = window.getSelection().anchorOffset; + console.log("begin extended through focus " + begin_node + " end focus is " + begin_focus + " anchor case no.1"); + } else if (window.getSelection().anchorOffset < 3) { -

- 1.4    Which data types (in terms of data formats like image data, text data or measurement data) arise in your project and in what way are they further processed? -

-

- - We foresee that the following data about $_STUDYOBJECT will be collected and generated at the very least: $_PHENOTYPIC, $_GENETIC, $_GENOMIC, $_METABOLOMIC, $_RNASEQ, $_IMAGE, $_PROTEOMIC, $_TARGETED, - $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA and result data. Furthermore, data derived from the original raw data sets will also be collected. This is important, as different analytical pipelines - might yield different results or include - - ad-hoc - - data analysis parts#if$_DATAPLANT and these pipelines will be tracked in the DataPLANT ARC#endif$_DATAPLANT. Therefore, specific care will be taken to document and archive these resources (including the - analytic pipelines) as well#if$_DATAPLANT, relying on the vast expertise in the DataPLANT consortium#endif$_DATAPLANT. - -

+ begin_node = window.getSelection().anchorNode.previousSibling; + begin_focus = window.getSelection().anchorNode.previousSibling.length + window.getSelection().anchorOffset - 3; + //verbose console.log("anchor case 2 begin extended through focus " + begin_node); + } else { - 1.5    To what extent do these arise or what is the anticipated data volume? + begin_node = window.getSelection().anchorNode; -

- We expect to generate raw data in the range of $_RAWDATA GB. The size of the derived data will be about $_DERIVEDDATA GB. -

+ begin_focus = window.getSelection().anchorOffset - 3; + //verbose console.log("anchor case 3 begin extended through node " + begin_node.data); -

- 2.    Documentation and data quality -

-

- 2.1.    What approaches are being taken to describe the data in a comprehensible manner (such as the use of available metadata, documentation standards or ontologies)? -

+ } + if ((window.getSelection().focusOffset == window.getSelection().focusNode.length) && (window.getSelection().focusNode.nextSibling === null)) { + extended_end_node = window.getSelection().focusNode; + extended_end_focus = window.getSelection().focusOffset; + //verbose console.log("focus case 1 end extended through focus " + extended_end_node.data); -

- - As noted above, we foresee using minimal standards such as #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MinSEQe for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights-compatible forms for metabolites#if$_MIAPPE and MIAPPE for phenotyping-like data#endif$_MIAPPE. The minimal information standards will allow the integration of data across projects and their reuse according to established and tested protocols. - Specialized repositories will be used for common data types. Unstructured and less standardized data (e.g., experimental phenotypic measurements) will be annotated with metadata and, once complete, allocated a digital object identifier (DOI).#if$_DATAPLANT The whole datasets will also be wrapped into an ARC with allocated DOIs.#endif$_DATAPLANT - Whenever possible, data will be stored in common and openly defined formats including all the necessary metadata to interpret and analyze data in a biological context. By default, no proprietary formats will be used. However, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word processors but will be shared as PDF. -

- We will use the Investigation, Study, Assay (ISA) specification for metadata creation. #if$_RNASEQ|$_GENOMIC For specific data (e.g., RNASeq or genomic data), we use metadata templates from the end-point repositories. #if$_MINSEQE The Minimum Information About a Next-generation Sequencing Experiment (MinSEQe) will also be used. - - #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC #if$_METABOLOMIC Metabolights-submission-compliant standards will be used for metabolomic data where this is accepted by the consortium partners.#issuewarning some Metabolomics partners consider Metabolights - not an accepted standard#endissuewarning#endif$_METABOLOMIC As part of the plant research community, we use #if$_MIAPPE MIAPPE for phenotyping data in the broadest sense, but we will also rely on #endif$_MIAPPE specific SOPs for additional - annotations#if$_DATAPLANT that consider advanced DataPLANT annotation and ontologies#endif$_DATAPLANT. - -

+ } else if (window.getSelection().focusOffset == window.getSelection().focusNode.length) { + extended_end_node = window.getSelection().focusNode.nextSibling; + extended_end_focus = 1; + //verbose console.log("focus case 2 end extended through next node "); - #if$_OTHERSTANDARDS Other standards will also be used, such as $_OTHERSTANDARDINPUT. #endif$_OTHERSTANDARDS - -

+ } else { + extended_end_node = window.getSelection().focusNode; + extended_end_focus = window.getSelection().focusOffset + 1; + //verbose console.log("focus case 3 end extended through focus " + extended_end_node.data); + } + right_range.setEnd(extended_end_node, extended_end_focus); - -

- - Open ontologies will be used where they are mature. As stated above, some ontologies and controlled vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT - Keywords about the experiment and the general consortium will be included, as well as an abstract about the data, where useful. In addition, certain keywords can be auto-generated from dense metadata and - its underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with standardized DataPLANT ontologies that are supplemented where the ontology does not yet include the variables. - #endif$_DATAPLANT - -

-

- - In fact, open biomedical ontologies will be used where they are mature. As stated in the previous question, sometimes ontologies and controlled vocabularies might have to be extended. #if$_DATAPLANT Here, the - $_PROJECT will build on the advanced ontologies developed in DataPLANT. #endif$_DATAPLANT - -

-

- 2.2    What measures are being adopted to ensure high data quality? -

+ if_range.setStart(begin_node, begin_focus); + let if_text = if_range.toString(); + const if_cha = if_text.slice(-match_name.length - 1, -match_name.length); + const right_text = right_range.toString(); -

- - The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also to trace the provenance of the data analysis. Stakeholders must also be informed about the provenance of the data. It is therefore necessary to ensure that the data are well generated and also well annotated with metadata using open standards. - - Data variables will be allocated standard names. For example, genes, proteins and metabolites will be named according to approved nomenclature and conventions. These will also be linked to functional ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by humans. Plant names will include traditional names, binomials, and all strain/cultivar/subspecies/variety identifiers. - - -

+ const right_cha = right_text.slice(-1); -

- - To maintain data integrity and to be able to re-analyze data, data sets will get version numbers where this is useful (e.g. raw data must not be changed; they will not get a version number and are considered - immutable). #if$_DATAPLANT This is automatically supported by the DataPLANT ARC Git infrastructure. #endif$_DATAPLANT - -
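The rule that raw data are immutable while derived data are versioned can be checked mechanically: record a checksum for every raw file once, then compare against it before each analysis. A minimal sketch, assuming Node.js and raw files already loaded into a `Map` from path to contents (a real setup would read them from disk); `buildManifest` and `findModifiedRawFiles` are hypothetical names, not part of DataPLANT or ARC tooling, whose Git-based infrastructure is described above as supporting this automatically.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 digest of a file's contents (Buffer or string).
function sha256(data) {
  return createHash("sha256").update(data).digest("hex");
}

// Record one digest per raw file at the time the raw data are first stored.
// `files` is a Map from relative path to file contents (an assumption of this sketch).
function buildManifest(files) {
  const manifest = {};
  for (const [filePath, data] of files) {
    manifest[filePath] = sha256(data);
  }
  return manifest;
}

// Return the paths whose contents changed, or which disappeared, since the manifest was built.
function findModifiedRawFiles(manifest, files) {
  const modified = [];
  for (const [filePath, digest] of Object.entries(manifest)) {
    if (!files.has(filePath) || sha256(files.get(filePath)) !== digest) {
      modified.push(filePath);
    }
  }
  return modified;
}

// Usage: build the manifest once, then re-check before each analysis run.
const raw = new Map([
  ["raw/sample1.fastq", "@r1\nACGT\n+\nIIII\n"],
  ["raw/sample2.fastq", "@r2\nTTGA\n+\nIIII\n"],
]);
const manifest = buildManifest(raw);
raw.set("raw/sample2.fastq", "@r2\nTTGC\n+\nIIII\n"); // simulated accidental edit
console.log(findModifiedRawFiles(manifest, raw)); // [ 'raw/sample2.fastq' ]
```

Storing only digests keeps the manifest small enough to archive next to the data; any non-empty result signals that a supposedly immutable raw file needs investigation rather than a new version number.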

-

- - As mentioned above, we foresee using e.g. #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MinSEQe for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights-compatible forms for - metabolites#if$_MIAPPE as well as MIAPPE for phenotyping-like data#endif$_MIAPPE. These standards will thus allow the integration of data across projects and ensure reuse according to established and tested protocols. - Additionally, we will enrich the data sets with terms from free and open ontologies. New ontology terms might also be created and canonized during the $_PROJECT. - -

-

- 2.3    Are quality controls in place and if so, how do they operate? -

-

- - The data will be checked and curated throughout the project period. #if$_DATAPLANT Furthermore, data will be analyzed for quality control (QC) problems using automatic procedures as well as by manual curation. - #endif$_DATAPLANT PhD students and lab professionals will be responsible for the first-line quality control. Afterwards, the data will be checked and annotated by $_DATAOFFICER. #if$_RNASEQ|$_GENOMIC - FastQC will be run on the base-called reads. #endif$_RNASEQ|$_GENOMIC Before publication, the data will be checked again. - -

+ const left_text = left_range.toString(); -

- 2.4    Which digital methods and tools (e.g. software) are required to use the data? -

-

- The $_PROJECT will use common Research Data Management (RDM) tools#if$_DATAPLANT and in particular resources developed by the NFDI of Germany#endif$_DATAPLANT. -

-

- #if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY -

-

- - #if!$_PROPRIETARY No specialized software will be needed to access the data, usually just a modern browser. Access will be possible through web interfaces. For data processing after obtaining raw data, - typical open-source software can be used. As no proprietary software is needed, no documentation needs to be provided. #endif!$_PROPRIETARY - -

-

- - #if$_DATAPLANT However, DataPLANT resources are well described, and their setup is documented on their GitHub project pages. - #endif$_DATAPLANT - -

-

- - #if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC Commander, and the DMP tool, which can make interaction with the data more convenient but are not required to access it. - #endif$_DATAPLANT - -

- As stated above, we use publicly available, well-documented and certified open-source software #if$_PROPRIETARY except for $_PROPRIETARY #endif$_PROPRIETARY. -

-

- 3.    Storage and technical archiving during the project -

+ place_holder = /\#if\$\_.{0,10}\$\_\w*\b/.exec(left_text); + //console.log("left_text is: "+ left_text +" place holder: " + place_holder); + //verbose console.log("if_cha: " + if_cha); -

- 3.1    How is the data to be stored and archived throughout the project duration? -

-

- - Wherever certified repositories exist, they will be used as end-point repositories. #if$_RNASEQ Transcriptomics and gene sequence data will also be made available upon publication via the standard repositories - ENA/SRA, #endif$_RNASEQ #if$_METABOLOMIC metabolite data via e.g. MetaboLights (and/or national repositories such as those of the German NFDI or the French INRAE), #endif$_METABOLOMIC #if$_PROTEOMIC proteomics data via - e.g. PRIDE/ProteomeXchange #endif$_PROTEOMIC. In addition, the national resource will keep the data safe after the project ends. #if$_DATAPLANT Furthermore, databases such as ProteomeXchange - do not support deep plant-specific metadata; hence, ARCs will be maintained to ensure reusability. #endif$_DATAPLANT - -
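The assignment of data types to end-point repositories described in this section can be expressed as a simple lookup. The sketch below is hypothetical. The names `ENDPOINT_REPOSITORIES`, `selectRepositories` and `NO_DEDICATED_REPOSITORY` are illustrative, and the keys only mirror the template's RNASEQ, METABOLOMIC and PROTEOMIC options. The fallback reflects the plan's statement that less standardized data are annotated with metadata and given a DOI.

```javascript
// Hypothetical lookup: data-type options mapped to the end-point repositories named in this plan.
const ENDPOINT_REPOSITORIES = new Map([
  ["RNASEQ", ["ENA/SRA"]],
  ["METABOLOMIC", ["MetaboLights"]],
  ["PROTEOMIC", ["PRIDE/ProteomeXchange"]],
]);

// Assumed handling for data without a dedicated repository: metadata annotation plus a DOI.
const NO_DEDICATED_REPOSITORY = "annotate with metadata and assign a DOI";

// Returns the de-duplicated targets for the selected data types, in first-seen order.
function selectRepositories(selectedTypes) {
  const targets = new Set();
  for (const type of selectedTypes) {
    const repositories = ENDPOINT_REPOSITORIES.get(type) ?? [NO_DEDICATED_REPOSITORY];
    for (const repository of repositories) targets.add(repository);
  }
  return [...targets];
}

console.log(selectRepositories(["RNASEQ", "PROTEOMIC", "PHENOTYPIC"]));
```

A `Map` is used instead of a plain object so that unexpected keys (e.g. `"toString"`) cannot pick up inherited properties.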

-

- Data will be made available for many years#if$_DATAPLANT and potentially indefinitely after the end of the project#endif$_DATAPLANT. -

-

- In any case, data submitted to international - discipline-related repositories that use specialized technologies (as detailed above), e.g. ENA or PRIDE, would be subject to local data storage regulations. -

-

- 3.2    What is in place to secure sensitive data throughout the project duration (access and usage rights)? -

-

- - #if$_DATAPLANT In DataPLANT, data management relies on the Annotated Research Context (ARC). ARCs are password-protected, so authentication must - take place before any data can be obtained or samples generated. #endif$_DATAPLANT - -

-

- - If data is shared only within the consortium, because it is not yet finalized or is undergoing IP checks, it is hosted internally, and a username and password are required (see also our GDPR rules). - Once data is made public in final EU or US repositories, completely anonymous access is normally allowed. This is also the case for ENA, and such access is in line with GDPR requirements. - -

-

- There will be no restrictions once the data is made public. -

+ if (if_cha == "|" && check_or_uncheck == "check") { + if (window.getSelection().anchorOffset < 13 && window.getSelection().anchorNode.previousSibling !== undefined) { + extended_begin_node = window.getSelection().anchorNode.previousSibling; + extended_begin_focus = window.getSelection().anchorNode.previousSibling.length + window.getSelection().anchorOffset - 13; + //verbose console.log("begin extended through focus " + extended_begin_node); -

- 4.    Legal obligations and conditions -

-

- 4.1    What are the legal specifics associated with the handling of research data in your project? -

-

- - At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, since these are plant data, no ethics committee approval is needed; however, due diligence regarding plant resource - benefit sharing is considered (🡺 see the Nagoya Protocol). #issuewarning You have to check this and enter any due diligence measures here. At the moment, it is still pending whether sequence information will also fall under the Nagoya Protocol. - In any case, if you use material that does not originate from your (partner) country and characterize it physically (e.g. metabolites, proteome) or biochemically (e.g. RNASeq), this might represent a Nagoya-relevant action, unless - the material comes from, e.g., the US (not a party) or Ireland (not signed; still contact them), although other laws might apply. #endissuewarning - -

-

- - The only personal data that will potentially be stored are the submitter's name and affiliation in the metadata of the data. In addition, personal data will be collected for dissemination and communication - activities, using specific methods and procedures developed by the $_PROJECT partners to comply with data protection rules. #issuewarning You need to inform people, and preferably obtain WRITTEN consent, if you store e-mail addresses, - names or even pseudonyms such as Twitter handles. We are very sorry about these issues; we didn't invent them. #endissuewarning - -

-

- 4.2    Do you anticipate any implications or restrictions regarding subsequent publication or accessibility? -

-

- - Once data is transferred to the $_PROJECT platform#if$_DATAPLANT and ARCs have been generated in DataPLANT#endif$_DATAPLANT, data security measures will be applied. These comprise secure storage and the use of - usernames and passwords, which are generally transmitted via separate secure channels. - -

+ } else { -

- 4.3    What is in place to consider aspects of use and copyright law as well as ownership issues? -

-

- Open licenses, such as Creative Commons (CC), will be used whenever possible. -

-

- 4.4    Are there any significant research codes or professional standards to be taken into account? -

+ extended_begin_node = window.getSelection().anchorNode; -

- - Whenever possible, data will be stored in common and openly defined formats, including all the metadata necessary to interpret and analyze the data in a biological context. By default, no proprietary formats will - be used; however, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, - text documents might be edited in word processors but will be shared as PDF. - -

+ extended_begin_focus = window.getSelection().anchorOffset - 13; + //verbose console.log("begin extended through node " + extended_begin_node.data); -

- 5.    Data exchange and long-term data accessibility -

+ } -

- 5.1    Which data sets are especially suitable for use in other contexts? -

-

- - The data will be useful for the $_PROJECT partners, the scientific community working on $_STUDYOBJECT, and the general public interested in $_STUDYOBJECT. Hence, the $_PROJECT also strives to collect the data - that have been disseminated and potentially to advertise them#if$_DATAPLANT, e.g. through the DataPLANT platform or other means,#endif$_DATAPLANT if they are not already included in a publication, which is the most - likely form of dissemination. - -

+ left_range.setStart(extended_begin_node, extended_begin_focus); + const left_text = left_range.toString(); -

- 5.2    Which criteria are used to select research data to make it available for subsequent use by others? -

+ //verbose console.log("left_text: " + left_text); + place_holder = /\#if\$\_.{0,10}\$\_\w*\b/.exec(left_text); + //verbose console.log("place holder: " + place_holder); + try { selection.removeAllRanges(); } catch (e) { -

- - By default, all data sets from the $_PROJECT will be shared with the community and made openly available. However, this will happen only after partners have had the opportunity to check for IP protection (according to - agreements and background rights). #if$_INDUSTRY This applies in particular to data pertaining to industry. #endif$_INDUSTRY All partners will check the data sets for IP protection, and due diligence will be - exercised. - -

-

- - Note that in multi-beneficiary projects, specific beneficiaries may also keep their data closed if the consortium agreement contains relevant provisions and these are in line with the - reasons for opting out. - -

-

- 5.3    Are you planning to archive your data in a suitable infrastructure? -

-

- #if$_DATAPLANT As the $_PROJECT is closely aligned with DataPLANT, the ARC converter and DataHUB will be used to identify the appropriate end-point repositories and to upload the data to them automatically. #endif$_DATAPLANT - -

-

- - #if!$_DATAPLANT Data will be made available via the $_PROJECT platform using a user-friendly front end that allows data visualization. Besides this, it will be ensured that data that can be stored in - international discipline-related repositories using specialized technologies (sequencing data at the #if$_NCBI US national center: NCBI;#endif$_NCBI #if$_GEO Gene Expression Omnibus: GEO;#endif$_GEO European Bioinformatics Institute (EBI) archives: #if$_ENA European Nucleotide Archive: ENA;#endif$_ENA #if$_ARRAYEXPRESS Functional Genomics Data Archive: ArrayExpress;#endif$_ARRAYEXPRESS - #if$_PRIDE proteomics database: PRIDE;#endif$_PRIDE #if$_METABOLIGHTS metabolomics database: MetaboLights;#endif$_METABOLIGHTS #if$_OTHEREP and $_OTHEREP #endif$_OTHEREP ) are stored and processed there. The ARC and the converters provided by DataPLANT will ensure that uploading to the end-point repositories is fast and easy.#endif!$_DATAPLANT - -

-

- - As noted above, specialized repositories such as SRA/ENA and PRIDE/ProteomeXchange are the most common ones and will be used when appropriate. Unstructured, less standardized data (e.g. experimental - phenotypic measurements) will be annotated with metadata and, once complete, assigned a digital object identifier (DOI). #if$_DATAPLANT The complete data sets wrapped into ARCs will receive DOIs as well. - #endif$_DATAPLANT - -

-

- - Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. Therefore, prior arrangements are neither necessary nor useful. Catch-all repositories are not required. - #if$_DATAPLANT For DataPLANT, this has been agreed upon. #endif$_DATAPLANT #issuewarning If no data management platform such as DataPLANT is used, you need to find an appropriate repository to store or archive your data after publication. #endissuewarning - -

-

- 5.4    If so, how and where? Are there any retention periods? -

-

- There are no restrictions, beyond the aforementioned IP checks, which are in line with e.g. European open data policies. -

-

- - The $_PARTNERS decide on the preservation of data not submitted to end-point subject-area repositories #if$_DATAPLANT or ARCs in DataPLANT#endif$_DATAPLANT after the project ends. This will be in line with EU - institute policies and with data sharing based on EU and international standards. - -

-

- 5.5    When is the research data available for use by third parties? -

-

- - #if$_early The data will be published as soon as possible to guarantee reusability. #endif$_early #if$_ipissue In general, IP issues will be checked first. #endif$_ipissue All consortium partners will be - encouraged to make data openly available prior to publication and/or under pre-publication agreements#if$_GENOMIC, such as those initiated in Fort Lauderdale and set forth by the Toronto International Data - Release Workshop#endif$_GENOMIC. - -

+ }; + window.find(place_holder); + if_range = window.getSelection().getRangeAt(0); + if_text = if_range.toString(); -

- 6.    Responsibilities and resources -

+ //verbose console.log(" selection updated because of \| : " + window.getSelection().focusNode.data + "________" + if_text); -

- 6.1    Who is responsible for adequate handling of the research data (description of roles and responsibilities within the project)? -

-

- $_DATAOFFICER will be responsible for the data as data officer. - The person(s) responsible for the data (data officer#if$_PARTNERS or $_PARTNERS #endif$_PARTNERS) decide on the preservation of data not submitted to end-point subject-area repositories #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT after the - end of the project. This will be in line with EU institute policies and with data sharing based on EU and international standards. -

-

- 6.2    Which resources (costs; time or other) are required to implement adequate handling of research data within the project? -

-

- The costs comprise data curation, #if$_DATAPLANT ARC consistency checks, #endif$_DATAPLANT and maintenance on the side of the $_PROJECT. -

-

- - Additionally, last-level storage costs are incurred by end-point repositories (e.g. ENA); they are not charged to the $_PROJECT or its members but are covered by the operating budgets of these repositories. - -

-

- A large part of the costs is covered by the $_PROJECT #if$_DATAPLANT and by the structures, tools and knowledge provided by the DataPLANT consortium #endif$_DATAPLANT. -

-

- 6.3    Who is responsible for curating the data once the project has ended? -

-

- As applicable, $_DATAOFFICER, who is responsible for ongoing data maintenance, will also take care of the data after the end of the $_PROJECT. #if$_DATAPLANT In some cases, DataPLANT, as an external data archive, may provide such services. #endif$_DATAPLANT -

- -

- -

-

- -

-

- 7     - Annexes -

-

- -

-

- 7.1     - - Abbreviations -

-

- -

-#if$_DATAPLANT

ARC Annotated Research Context

#endif$_DATAPLANT - -

CC Creative Commons

-

CC REL Creative Commons Rights Expression Language

-

DDBJ DNA Data Bank of Japan

-

DMP Data Management Plan

-

DoA Description of Action

-

DOI Digital Object Identifier

-

EBI European Bioinformatics Institute

-

ENA European Nucleotide Archive

-

EU European Union

-

FAIR Findable, Accessible, Interoperable, Reusable

-

GDPR General Data Protection Regulation (of the EU)

-

IP Intellectual Property

-

ISO International Organization for Standardization

-

MIAMET Minimal Information about Metabolite experiment

-

MIAPPE Minimum Information About a Plant Phenotyping Experiment

-

MinSEQe Minimum Information about a high-throughput Sequencing Experiment

-

NCBI National Center for Biotechnology Information

-

NFDI National Research Data Infrastructure (of Germany)

-

NGS Next Generation Sequencing

-

RDM Research Data Management

-

RNASeq RNA Sequencing

-

SOP Standard Operating Procedures

-

SRA Sequence Read Archive

- #if$_DATAPLANT

SWATE Swate Workflow Annotation Tool for Excel

#endif$_DATAPLANT -

ONP Oxford Nanopore

-

qRTPCR quantitative real time polymerase chain reaction

-

WP Work Package

-

-

-

-

-

-
-
-
-
-

- - Practical Data Management Guide of the $_PROJECT - -

-

- -

- - This practical data management guide for the $_PROJECT should be considered a minimum description, leaving flexibility to include additional actions specific to a domain or required by national or local - legislation.#if$_EU The $_PROJECT will follow the EU FAIR principles. #endif$_EU - -

-
-

- - The practical data management guide of the $_PROJECT aims to provide a complete walkthrough for the researcher. Its contents are customized based on the user input in the Data Management Plan - Generator (DMPG), and its practices are tailored to the relevant legal, ethical, standardization and funding body requirements. These practices cover all steps of the data - management life cycle: - -

-
-
    -
  1. -

    - : " + left_text); + callback2(left_range, check_box_item); + + } else { + + //verbose console.log("_____other cases"); + //verbose console.log("last character is: ", right_cha); + callback1(if_range, check_box_item); + if_range.detach(); + } + + //window.find("#endif"); + + if_range.detach(); + ns = window.find(match_name, aCaseSensitive = 0, aBackwards = 0, aWrapAround = 1, + aWholeWord = 0, aSearchInFrames = is_firefox, aShowDialog = 0); + } + } + + + //else if (document.body.createTextRange) //for ie etc. + //{ + // let range_if = document.body.createTextRange(); + // while (range_if.findText(match_name)) { + // range_if.pasteHTML('atest'); + // } + //} + + //verbose console.log("check for loop, ends " + match_name); + + } + } + + + + /** + * match a place holder and then remove it. + * @param {string} right_range - the extended keyword (includes #if and #endif) need to be removed. + * + */ + function match_remove(right_range, check_box_item) { + + const text_after_if = (right_range.toString().slice(3)); + //verbose console.log("mach_remove function: after_endif: " + text_after_if); + + window.find("#endif" + text_after_if); + + right_range.setEnd(window.getSelection().focusNode, window.getSelection().focusOffset); + right_range.deleteContents(); + right_range.detach(); + + } + + function HtmlEncode(s) { + var el = document.createElement("div"); + el.innerText = el.textContent = s; + s = el.innerHTML; + return s; + } + + /** + * replace a keyword by its formal meaning. + * @param {string} before - the keyword need to be replaced. + * @param {string} after - the formal text represented by the keyword. 
+ */ + function find_replace(option) { + + const selection = window.getSelection(); + selection.removeAllRanges(); + + const before = Object.keys(option)[0]; + const after = option[before]; + //verbose console.log("before: " + before + ", after: " + after); + + if (window.find) //for chrome, firefox + { + + //verbose console.log("window.find is: " + window.find); + while (window.find(before, aCaseSensitive = 0, aBackwards = is_firefox, aWrapAround = is_firefox, + aWholeWord = 0, aSearchInFrames = 0, aShowDialog = 0)) { + + let last_letter; + const w_selection = window.getSelection(); + const range = w_selection.getRangeAt(0); + w_selection.focusNode; + if (w_selection.focusNode.length == w_selection.focusOffset) { + //verbose console.log("first if : " + w_selection.focusNode.length); + last_letter = ""; + + } else { + range.setEnd(w_selection.focusNode, w_selection.focusOffset + 1); + last_letter = range.toString().slice(-1); + } + if ((w_selection.anchorOffset == 0) && (w_selection.anchorNode.previousSibling === null)) { + + first_letter = ""; + } else if ((w_selection.anchorOffset == 0)) { + const old_offset = w_selection.anchorOffset; + //verbose console.log("no enough range"); + range.setStart(w_selection.anchorNode.previousSibling, w_selection.anchorNode.previousSibling.length - 1); + first_letter = range.toString().slice(0, 1); + } else { + //verbose console.log("enough range" + w_selection.anchorOffset); + range.setStart(w_selection.anchorNode, w_selection.anchorOffset - 1); + first_letter = range.toString().slice(0, 1); + } + + if (/[,]/.test(last_letter)) { + //verbose console.log(range.toString(), "last letter is comma: " + last_letter); + range.deleteContents(); + //verbose console.log(encodeURIComponent(after)); + let newText = document.createElement("span"); + + newText.setAttribute("class", "text-warning text-center"); + newText.setAttribute("name", before.split("_")[1] + "-to-"); + newText.setAttribute("onmouseover", 'const e = 
document.getElementsByName("' + before + '"); e[0].classList.add("border-highlight"); e[0].focus({preventScroll:true}) '); + newText.setAttribute("onmouseleave", 'document.getElementsByName("' + before + '")[0].classList.remove("border-highlight"); '); + newText.setAttribute("onclick", 'document.getElementsByName("' + before + '")[0].parentElement.scrollIntoView();document.getElementsByName("' + before + '")[0].focus(); '); + newText.innerHTML = " " + first_letter + HtmlEncode(after) + last_letter; + range.insertNode(newText); + + } else if (/[a-zA-Z\|\$\!]/.test(last_letter)) { + //verbose console.log(range.toString(), "last letter " + last_letter); + + } else if (/[a-zA-Z\|\$\!]/.test(first_letter)) { + //verbose console.log(range.toString(), "first letter " + first_letter); + } else { + //verbose console.log(range.toString(), "deleted with last letter: " + last_letter); + range.deleteContents(); + //verbose console.log(encodeURIComponent(after)); + let newText = document.createElement("span"); + + newText.setAttribute("class", "text-warning text-center"); + newText.setAttribute("name", before.split("_")[1] + "-to-"); + newText.setAttribute("onmouseover", 'const e = document.getElementsByName("' + before + '"); e[0].classList.add("border-highlight"); e[0].focus({preventScroll:true}) '); + newText.setAttribute("onmouseleave", 'document.getElementsByName("' + before + '")[0].classList.remove("border-highlight"); '); + newText.setAttribute("onclick", 'document.getElementsByName("' + before + '")[0].parentElement.scrollIntoView();document.getElementsByName("' + before + '")[0].focus(); '); + + // newText.innerHTML =" "+ first_letter + HtmlEncode(after) + last_letter+ " "; + newText.innerHTML = first_letter + HtmlEncode(after) + last_letter; + + range.insertNode(newText); + + } + + } + } else if (document.createTextRange) //for ie etc. 
+ { + var range = document.createTextRange(); + while (range.findText(before)) { + range.pasteHTML(after); + } + } + + } + + //progress-bar + var x = 0; + + window.addEventListener('beforeunload', function (e) { + // Cancel the event + e.preventDefault(); // If you prevent default behavior in Mozilla Firefox prompt will always be shown + // Chrome requires returnValue to be set + e.returnValue = ''; + + }); + + window.addEventListener('pagehide', (event) => { + scroll_y = document.getElementById("split-0").scrollTop; + window.localStorage.setItem("scroll_y", scroll_y); + console.log("scroll_y page hide to cache is " + scroll_y); + }); + + // initialization + window.addEventListener("load", function () { + cookie_modal = new bootstrap.Modal(document.getElementById('cookie_modal'), { + keyboard: false + }) + if (cached_template == null) { + cached_template = doc_name; + } else if (cached_template == "user-defined") { + //doc_name = "user-defined"; + document.getElementById("user-defined").innerHTML = window.localStorage.getItem("fulltext_user"); + } + if (window.localStorage.getItem("cookie_consent") !== "yes") { + + cookie_modal.show(); + } + var toast_el_list = [].slice.call(document.querySelectorAll('.toast')); + toast_list = toast_el_list.map(function (toast_el) { + return new bootstrap.Toast(toast_el, { animation: true, dely: 0 }) + }); + window.localStorage.setItem("user_info", "visited"); + saved_a = update_saved_json(saved_a), + updateSavedAnswers(); + load_dmp(reload_answers, name = cached_template); + //document.getElementById("split-0").scrollTop= scroll_y; + + //verbose console.log("current url is: " + window.location.href); + const urls = window.location.href.split("?"); + if (urls.length > 1) { + + fetch(urls[1]) + .then(response => response.json()) + .then(json => { + saved_a = json; + updateSavedAnswers(); + load_dmp(reload_answers, name = cached_template); + }) + + } + const urlroute = window.location.href.split("#"); + switch (urlroute.slice(-1)[0]) 
{ + case "tutorial": + tour.show(); + break; + case "Tutorial": + tour.show(); + break; + case "t": + tour.show(); + break; + case "Horizon2020": + load_dmp(reload_answers, name = "dmp1"); + break; + case "HorizonEurope": + load_dmp(reload_answers, name = "horizon_europe"); + break; + case "DFG": + load_dmp(reload_answers, name = "dfg-dmp"); + break; + case "dfg": + load_dmp(reload_answers, name = "dfg-dmp"); + break; + case "bmbf": + load_dmp(reload_answers, name = "bmbf-dmp"); + break; + case "PracticeGuide": + load_dmp(reload_answers, name = "practical-guide"); + break; + + + + } + var cache_management = new bootstrap.Offcanvas(document.getElementById('cache_management')); + var upload_json1 = document.getElementById("upload"); + upload_json1.addEventListener("change", open_storage_modal, false); + var upload_json2 = document.getElementById("upload_menu"); + upload_json2.addEventListener("change", open_storage_modal, false); + + + + function open_storage_modal() { + + const list1 = document.getElementById("current_input"); + current_input_answer = JSON.parse(JSON.stringify(saved_a)); + + current_input_answer["update"]["storage"][0]["answer"]["replace"] = JSON.parse(JSON.stringify(saved_a["replace"])); + current_input_answer["update"]["storage"][0]["answer"]["checkbox"] = JSON.parse(JSON.stringify(saved_a["checkbox"])); + list1.innerHTML = ""; + uploaded_origin = 0; + current_origin = 0; + + document.getElementById("current_answer_link").innerHTML = "Current input"; + document.getElementById("uploaded_answer_link").innerHTML = "Uploaded input"; + for (let [key, value] of Object.entries(saved_a["update"]["storage"])) { + + + const new_option = document.createElement("li"); + const option_link = document.createElement("a"); + + option_link.setAttribute("class", "dropdown-item"); + option_link.setAttribute("href", "#"); + option_link.setAttribute("onclick", "assign_current_input(this)"); + option_link.setAttribute("data-option-no", key); + if (key == 0) { + 
option_link.innerHTML = "Default cache"; + new_option.appendChild(option_link); + list1.appendChild(new_option); + } else if (value["answer"]["timestamp"] != undefined) { + + option_link.innerHTML = "Slot " + key + " " + value["answer"]["timestamp"]; + new_option.appendChild(option_link); + list1.appendChild(new_option); + + } + } + + let a2; + try { + upload_json1.files[0].text().then(text => (a2 = JSON.parse(text))).then(a2 => (uploaded_input_answer = a2, update_storage_dropdown(a2), cache_management.show(), + update_table())); + + } catch (e) { + upload_json2.files[0].text().then(text => (a2 = JSON.parse(text))).then(a2 => (uploaded_input_answer = a2, update_storage_dropdown(a2), cache_management.show(), + update_table())); + } + + this.value = null; + return false; + + } + parseText(); + const dmp1_vis = document.getElementById("dmp1").querySelector("#vis").childNodes[0].childNodes[1]; + // Options for the observer (which mutations to observe) + const config = { attributes: true, childList: false, subtree: false }; + + // Callback function to execute when mutations are observed + const callback = (mutationList, observer) => { + for (const mutation of mutationList) { + var canvas = document.createElement('canvas'); + canvas.setAttribute("width", "600px"); + canvas.setAttribute("height", "600px"); + + var ctx = canvas.getContext('2d'); + // instead of using raw xml text, XMLSerializer can produce better html text + var data = (new XMLSerializer()).serializeToString(document.getElementById("dmp1").querySelector("#vis").childNodes[0]); + var DOMURL = window.URL || window.webkitURL || window; + var img = new Image(); + var svg = new Blob([data], { type: 'image/svg+xml;charset=utf-8' }); + var url = DOMURL.createObjectURL(svg); + var png = new Image(); + img.onload = function () { + ctx.drawImage(img, 0, 0, canvas.width, canvas.height); + DOMURL.revokeObjectURL(url); + const png_url = canvas.toDataURL('image/png'); + png.src = png_url; + } + + img.src = url; + try { 
+ document.getElementById("doc3").querySelector("#vis").innerHTML = ""; + } catch (error) { + + } + + if (mutation.type === 'childList') { + //verbose console.log('A child node has been added or removed.'); + try { + document.getElementById("doc3").querySelector("#vis").appendChild(png); + png.setAttribute("style", "max-width:100%"); + png.setAttribute("class", "mx-auto d-block"); + } catch (error) { + + } + + } else if (mutation.type === 'attributes') { + //verbose console.log(`The ${mutation.attributeName} attribute was modified.`); + try { + document.getElementById("doc3").querySelector("#vis").appendChild(png); + png.setAttribute("style", "max-width:100%"); + png.setAttribute("class", "mx-auto d-block"); + } catch (error) { + + } + } else if (mutation.type === 'subtree') { + //verbose console.log(`subtree`); + try { + document.getElementById("doc3").querySelector("#vis").appendChild(png); + png.setAttribute("style", "max-width:100%"); + png.setAttribute("class", "mx-auto d-block"); + } catch (error) { + + } + } + } + + }; + + // Create an observer instance linked to the callback function + const observer = new MutationObserver(callback); + + // Start observing the target node for configured mutations + + + observer.observe(dmp1_vis, config); + + + }); + + + + function update_saved_json(answers) { + try { + answers["update"]; + answers["update"]["storage"]; + answers["update"]["storage"][0]; + } catch (e) { + answers.update = {}; + answers.update.storage = [ + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + + ]; + answers["update"]["storage"][0]["answer"]["replace"] = answers["replace"]; + answers["update"]["storage"][0]["answer"]["checkbox"] = answers["checkbox"]; + }; + if (answers["update"]["storage"].length < 5) { + answers["update"]["storage"] = [ + { "answer": [], "name": [], }, + { "answer": [], "name": [], 
}, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + + ]; + answers["update"]["storage"][0]["answer"]["replace"] = answers["replace"]; + answers["update"]["storage"][0]["answer"]["checkbox"] = answers["checkbox"]; + + + } else if (answers["update"]["storage"].length == 5) { + answers["update"]["storage"].push({ "answer": [], "name": [], }); + answers["update"]["storage"][5] = answers["update"]["storage"][4]; + answers["update"]["storage"][4] = answers["update"]["storage"][3]; + answers["update"]["storage"][3] = answers["update"]["storage"][2]; + answers["update"]["storage"][2] = answers["update"]["storage"][1]; + answers["update"]["storage"][1] = answers["update"]["storage"][0]; + answers["update"]["storage"][0]["answer"]["replace"] = answers["replace"]; + answers["update"]["storage"][0]["answer"]["checkbox"] = answers["checkbox"]; + + } else { + answers["update"]["storage"][0]["answer"]["replace"] = answers["replace"]; + answers["update"]["storage"][0]["answer"]["checkbox"] = answers["checkbox"]; + + } + + + + return answers; + } + function update_storage_dropdown(a2) { + const list2 = document.getElementById("uploaded_input"); + list2.innerHTML = ""; + + uploaded_input_all = update_saved_json(a2); + for (let [key, value] of Object.entries(uploaded_input_all["update"]["storage"])) { + + const new_option = document.createElement("li"); + const option_link = document.createElement("a"); + + + option_link.setAttribute("class", "dropdown-item"); + option_link.setAttribute("href", "#"); + option_link.setAttribute("onclick", "assign_uploaded_input(this)"); + option_link.setAttribute("data-option-no", key); + if (key == 0) { + option_link.innerHTML = "Default cache"; + new_option.appendChild(option_link); + list2.appendChild(new_option); + + + } else if (value["answer"]["timestamp"] === undefined) { + + + + } else { + option_link.innerHTML = "Slot " + key + " " + 
value["answer"]["timestamp"]; + new_option.appendChild(option_link); + list2.appendChild(new_option); + + } + + + + + + + + + + } + // at the initiation, the top level answer is used. + + + + + + } + + function assign_current_input(element) { + let a = saved_a; + element.parentElement.parentElement.previousElementSibling.innerHTML = element.innerHTML.split(",")[0]; + const no = element.getAttribute("data-option-no"); + current_origin = no; + + update_table(); + saved_a["replace"] = saved_a["update"]["storage"][current_origin]["answer"]["replace"]; + + } + + + function assign_uploaded_input(element) { + const no = element.getAttribute("data-option-no"); + element.parentElement.parentElement.previousElementSibling.innerHTML = element.innerHTML.split(",")[0]; + uploaded_origin = no; + uploaded_input_answer = uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]; + update_table(); + + + } + + function update_table() { + + const compare_list = compare_answers(saved_a["update"]["storage"][current_origin]["answer"], uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]); + + const body = document.getElementById("cache_management_modal_body"); + const table = document.createElement("div"); + + for (let [key, value] of Object.entries(saved_a["update"]["storage"][current_origin]["answer"]["replace"])) { + + + + const row = document.createElement("div"); + + const key_pool = compare_list["4"]; + + + + + + + row.setAttribute("class", "row border-bottom border border-1 mb-1"); + row.setAttribute("name", "input_table_row"); + const col1 = document.createElement("div"); + col1.setAttribute("class", "col-3"); + col1.innerHTML = key.toUpperCase().split("_")[1]; + const col2 = document.createElement("div"); + col2.innerHTML = value; + col2.setAttribute("class", "col-4"); + const col3 = document.createElement("div"); + col3.setAttribute("class", "col-4"); + col3.innerHTML = uploaded_input_answer["replace"][key]; + const col4 = document.createElement("div"); 
+ col4.setAttribute("class", "col-1"); + + const col4_button = document.createElement("input"); + + col4_button.setAttribute("class", "form-check-input"); + col4_button.setAttribute("type", "checkbox"); + col4_button.setAttribute("aria-label", "..."); + col4_button.setAttribute("id", "storage_checkbox_" + key); + //verbose console.log(key); + + + + col4_button.addEventListener("change", function () { + if (this.checked) { + saved_a["update"]["storage"][current_origin]["answer"]["replace"][key] = + uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["replace"][key]; + row.setAttribute("style", "background-color: #f0f5e6"); + col2.innerHTML = uploaded_input_answer["replace"][key]; + + saved_a["replace"][key] = saved_a["update"]["storage"][current_origin]["answer"]["replace"][key]; + + const now = new Date(); + saved_a["update"]["storage"][current_origin]["answer"]["timestamp"] = now.toLocaleString(); + load_dmp(reload_answers, name = cached_template); + } else { + saved_a["update"]["storage"][current_origin]["answer"]["replace"][key] = current_input_answer["update"]["storage"][current_origin]["answer"]["replace"][key]; + row.setAttribute("style", "background-color: #E6A5B0"); + col2.innerHTML = value; + saved_a["replace"][key] = saved_a["update"]["storage"][current_origin]["answer"]["replace"][key]; + load_dmp(reload_answers, name = cached_template); + + + + } + } + + ); + + + if (Object.values(key_pool).includes(key)) { + row.setAttribute("style", "background-color: #f0f5e6") + + } else { + col4.append(col4_button); + row.setAttribute("style", "background-color: #E6A5B0"); + } + + row.append(col1, col2, col3, col4); + value == "" ? 
console.log("empty input") : table.append(row); + + + + } + + for (let [key, value] of Object.entries(saved_a["update"]["storage"][current_origin]["answer"]["checkbox"])) { + + + + const row = document.createElement("div"); + + const key_pool = compare_list["5"]; + + + + row.setAttribute("class", "row border-bottom border border-1 mb-1"); + row.setAttribute("name", "input_table_row"); + const col1 = document.createElement("div"); + col1.setAttribute("class", "col-3"); + col1.innerHTML = key.toUpperCase().split("_")[1]; + const col2 = document.createElement("div"); + col2.innerHTML = JSON.stringify(value, null, "\t"); + col2.setAttribute("class", "col-4"); + const col3 = document.createElement("div"); + col3.setAttribute("class", "col-4"); + col3.innerHTML = JSON.stringify(uploaded_input_answer["checkbox"][key], null, "\t"); + const col4 = document.createElement("div"); + col4.setAttribute("class", "col-1"); + + const col4_button = document.createElement("input"); + + col4_button.setAttribute("class", "form-check-input"); + col4_button.setAttribute("type", "checkbox"); + col4_button.setAttribute("aria-label", "..."); + col4_button.setAttribute("id", "storage_checkbox_checkbox_" + key); + //verbose console.log(key); + + + + col4_button.addEventListener("change", function () { + if (this.checked) { + saved_a["update"]["storage"][current_origin]["answer"]["checkbox"][key] = + uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["checkbox"][key]; + row.setAttribute("style", "background-color: #f0f5e6"); + col2.innerHTML = JSON.stringify(uploaded_input_answer["checkbox"][key], null, "\t"); + this.checked = true; + saved_a["checkbox"][key] = saved_a["update"]["storage"][current_origin]["answer"]["checkbox"][key]; + + const now = new Date(); + saved_a["update"]["storage"][current_origin]["answer"]["timestamp"] = now.toLocaleString(); + load_dmp(reload_answers, name = cached_template); + } else { + 
saved_a["update"]["storage"][current_origin]["answer"]["checkbox"][key] = current_input_answer["update"]["storage"][current_origin]["answer"]["checkbox"][key]; + row.setAttribute("style", "background-color: #E6A5B0"); + col2.innerHTML = JSON.stringify(value, null, "\t"); + saved_a["checkbox"][key] = saved_a["update"]["storage"][current_origin]["answer"]["checkbox"][key]; + load_dmp(reload_answers, name = cached_template); + + + } + } + + ); + + + if (Object.values(key_pool).includes(key)) { + row.setAttribute("style", "background-color: #f0f5e6") + + } else { + col4.append(col4_button); + row.setAttribute("style", "background-color: #E6A5B0"); + } + + row.append(col1, col2, col3, col4); + value == "" ? console.log("empty input") : table.append(row); + + + + } + try { document.querySelectorAll('[name="input_table_row"]').forEach(e => e.remove()); } + catch (e) { }; + body.append(table); + + } + + function overwrite_all() { + + toast_list[2].show(); + saved_a = uploaded_input_all; + updateSavedAnswers(); + load_dmp(reload_answers, name = cached_template); + + ////verbose console.log(JSON.stringify(saved_a)); + + + } + function overwrite_current() { + const now = new Date(); + saved_a["replace"] = uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["replace"]; + saved_a["update"]["storage"][current_origin]["answer"]["replace"] = uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["replace"]; + saved_a["update"]["storage"][current_origin]["answer"]["timestamp"] = uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["timestamp"]; + + saved_a["checkbox"] = uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["checkbox"]; + saved_a["update"]["storage"][current_origin]["answer"]["checkbox"] = uploaded_input_all["update"]["storage"][uploaded_origin]["answer"]["checkbox"]; + saved_a["update"]["storage"][current_origin]["answer"]["timestamp"] = now.toLocaleString(); + load_dmp(reload_answers, name = cached_template); + 
syn_load_cache(); + toast_list[1].show(); + + ////verbose console.log(JSON.stringify(saved_a)); + + + } + function syn_load_cache() { + + const cached_timestamp = saved_a["update"]["storage"].map((e) => e["answer"]); + const saved_timestamp = document.querySelectorAll('[name="load_from_cache"]'); + + saved_timestamp.forEach((e, index) => { + if (cached_timestamp[index + 1]["timestamp"] != undefined) { e.innerHTML = cached_timestamp[index + 1]["timestamp"] } else { + e.innerHTML = "Empty Slot"; + + } + }); + + + + } + + + function copy_doc() { + + const compared_list = compare_replace(temp_a, saved_a); + let warning_element_children = Array.from(Array(99).keys()); + + if ((compared_list[2] == compared_list[0]) && (compared_list[3] == compared_list[1])) { + issue_warning_show(); + warning_element_children = document.getElementById("warningText").children; + if (warning_element_children[0] == undefined) { + dmp_update(saved_a); + copy_doc1(); + } else { + update_reminder(compared_list, warning_element_children); + } + + } else { + + update_reminder(compared_list, warning_element_children); + } + + } + + function copy_doc1() { + try { + document.querySelector(".tour-exit").click(); + + } catch (e) { }; + parseText(); + const text = document.getElementById("doc3"); + const text_start = text.querySelector("#text_start"); + + const selection = window.getSelection(); + const range = document.createRange(); + range.selectNodeContents(text); + selection.removeAllRanges(); + //range.setStart(text_start,0); + selection.addRange(range); + try { + + + + document.execCommand("copy"); // Security exception may be thrown by some browsers. 
+ toast_list[0].show(); + + + + + } catch (error) { + console.warn("Copy to clipboard failed.", error); + + + return false; + } + + } + + function printDiv() { + const compared_list = compare_replace(temp_a, saved_a); + let warning_element_children = Array.from(Array(99).keys()); + + if ((compared_list[2] == compared_list[0]) && (compared_list[3] == compared_list[1])) { + issue_warning_show(); + warning_element_children = document.getElementById("warningText").children; + if (warning_element_children[0] == undefined) { + try { + dmp_update(saved_a); + return printDiv1("doc3"); // Security exception may be thrown by some browsers. + const selection = window.getSelection(); + + } catch (error) { + console.warn("print failed.", error); + return false; + } + } else { + update_reminder(compared_list, warning_element_children); + } + + } else { + + update_reminder(compared_list, warning_element_children); + } + + } + + function save_to_cache(id, name = " ") { + if (saved_a["update"]["storage"].length < 5) { + saved_a["update"]["storage"] = [ + { "answer": { "replace": saved_a["replace"], "checkbox": saved_a["checkbox"] }, "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + { "answer": [], "name": [], }, + + + + ]; + + + } else if (saved_a["update"]["storage"].length == 5) { + saved_a["update"]["storage"].push({ "answer": [], "name": [], }); + saved_a["update"]["storage"][5] = saved_a["update"]["storage"][4]; + saved_a["update"]["storage"][4] = saved_a["update"]["storage"][3]; + saved_a["update"]["storage"][3] = saved_a["update"]["storage"][2]; + saved_a["update"]["storage"][2] = saved_a["update"]["storage"][1]; + saved_a["update"]["storage"][1] = saved_a["update"]["storage"][0]; + saved_a["update"]["storage"][0] = { "answer": { "replace": saved_a["replace"], "checkbox": saved_a["checkbox"] }, "name": [], }; + } + if (saved_a["update"]["storage"][id]["answer"].length != 0) { + 
let confirm_overwrite = confirm("There is already a saved answer, do you want to overwrite saved answer here?"); + if (confirm_overwrite) { + save_to_cache1(id, name); + toast_list[3].show(); + } else { + toast_list[4].show(); + } + + + } else { + + save_to_cache1(id, name); + } + + + + } + + function save_to_cache1(id, name = " ") { + const now = new Date(); + saved_a["update"]["storage"][id]["answer"] = { "replace": JSON.parse(JSON.stringify(saved_a["replace"])), "checkbox": JSON.parse(JSON.stringify(saved_a["checkbox"])), "timestamp": now.toLocaleString() }; + saved_a["update"]["storage"][id]["name"] = name; + trim_saved_a(saved_a); + window.localStorage.setItem("saved_a", JSON.stringify(saved_a)); + let update_element = document.querySelector("#load_from_cache_menu_" + id); + update_element.innerHTML = now.toLocaleString(); + toast_list[3].show(); + } + + function load_from_cache(id) { + if (Object.keys(saved_a["update"]["storage"][id]["answer"]).length > 0) { + saved_a["replace"] = saved_a["update"]["storage"][id]["answer"]["replace"] + saved_a["checkbox"] = saved_a["update"]["storage"][id]["answer"]["checkbox"]; + load_dmp(reload_answers, doc_name); + toast_list[5].show(); + } else { + toast_list[7].show(); + } + + + } + function dmp_update(saved_a) { + + if (saved_a["update"]["timeline"].length == 0) { + clear_dmp_updates(); + document.getElementById("dmp-update-modal").click(); + + + } else { + clear_dmp_updates(); + document.getElementById("dmp-update-modal").click(); + saved_a["update"]["timeline"].forEach( + function (e) { + add_date_div(e["name"], e["date"]) + + } + + + ) + + + } + + + + + } + + function clear_dmp_updates() { + + document.querySelector("#dmp-update-content").innerText = ""; + + try { datepicker_list.forEach(e => e.remove()); } + catch (e) { } + datepicker_list = []; + + } + + + function slist(target) { + + + let items = target.getElementsByTagName("li"), current = null; + + + for (let i of items) { + + // Drag start + i.ondragstart = 
(ev) => { + current = i; + for (let it of items) { + if (it != current) { it.classList.add("hint"); } + } + }; + + // Drag enter + i.ondragenter = (ev) => { + if (i != current) { + setTimeout(() => { i.classList.add("active"), "10" }); + //verbose console.log("entering and adding active"); + } + + }; + + + // drag leave + i.ondragleave = () => { + i.classList.remove("active"); + //verbose console.log("leaving and removing active"); + }; + + // drag end + i.ondragend = () => { + for (let it of items) { + it.classList.remove("hint"); + it.classList.remove("active"); + } + }; + + // drag over stop default + i.ondragover = (evt) => { evt.preventDefault(); }; + [].slice.call(i.children).forEach((e) => e.ondragover = (evt) => { + evt.preventDefault(); + } + ); + + + // when drop, insert the element + i.ondrop = (evt) => { + evt.preventDefault(); + if (i != current) { + let currentpos = 0, droppedpos = 0; + for (let it = 0; it < items.length; it++) { + if (current == items[it]) { currentpos = it; } + if (i == items[it]) { droppedpos = it; } + } + if (currentpos < droppedpos) { + i.parentNode.insertBefore(current, i.nextSibling); + } else { + i.parentNode.insertBefore(current, i); + } + } + }; + } + } + + + + function add_date_div(name = "", date1 = Date()) { + + const parent = document.querySelector("#dmp-update-content"); + const length = datepicker_list.length; + let reminder_row = document.createElement("li"); + reminder_row.setAttribute("class", "input-group justify-content-between"); + reminder_row.setAttribute("draggable", "true"); + const id = length + 1; + const reminder_id = "reminder" + id; + reminder_row.setAttribute("id", reminder_id); + //reminder_row.innerText=reminder_id; + + const svg = document.createElement("svg"); + svg.innerHTML = 'draggable' + + const name_input = document.createElement("input"); + name_input.setAttribute("id", "name_" + reminder_id); + name_input.setAttribute("placeholder", "name your reminder"); + name_input.setAttribute("name", 
"update_name_input"); + name_input.setAttribute("onfocusin", "this.select()"); + name_input.setAttribute("class", "bg-transparent border-white") + name_input.setAttribute("onfocusout", 'datepicker_list[' + length + ']["name"]= this.value'); + name_input.setAttribute("value", name); + + const date_input = document.createElement("input"); + date_input.setAttribute("type", "date"); + date_input.setAttribute("id", "update_date_input" + reminder_id); + date_input.setAttribute("name", "update_date_input"); + date_input.setAttribute("placeholder", "click to select date"); + date_input.setAttribute("data-list", length); + date_input.setAttribute("value", date1); + date_input.setAttribute("class", "bg-transparent border-white") + date_input.setAttribute("onfocusout", 'datepicker_list[' + length + ']["date"]= this.value'); + + const delete_button = document.createElement("button"); + + delete_button.setAttribute("class", "btn-close m-2"); + delete_button.setAttribute("onclick", "this.parentElement.remove();datepicker_list[" + length + "]=null"); + + + reminder_row.appendChild(svg); + reminder_row.appendChild(name_input); + reminder_row.appendChild(date_input); + reminder_row.appendChild(delete_button); + parent.appendChild(reminder_row); + + + slist(document.getElementById("dmp-update-content")); + datepicker_list.push({ "name": name_input.value, "date": date_input.value, }); + + } + + function datepicker_list_to_JSON() { + + datepicker_list = []; + [].slice.call(document.querySelectorAll("#dmp-update-content li")).forEach((e) => datepicker_list.push({ "name": e.children[1].value, "date": e.children[2].value, })); + + const filtered_list = datepicker_list.filter(function (i) { return i != null; }); + + + try { saved_a["update"]["timeline"] = filtered_list; } + catch (e) { saved_a["update"]["timeline"] = [] } + trim_saved_a(saved_a); + + window.localStorage.setItem("saved_a", JSON.stringify(saved_a)); + generate_vcal(saved_a); + 
document.querySelector("#update_date_close").click(); + } + + function generate_vcal(saved_a) { + + const vcal_begin = ` +BEGIN:VCALENDAR +VERSION:2.0 +PRODID:-//DataPLAN//NONSGML Reminder of DMP update//EN +CALSCALE:GREGORIAN +METHOD:PUBLISH`; + const vcal_end = "END:VCALENDAR"; + const filtered_list = datepicker_list.filter(function (i) { return i != null; }); + let vcal = vcal_begin + " "; + now = new Date(); + filtered_list.forEach(function callback(e, index) { + const cal_date = new Date(e["date"]); + const current_time = `${cal_date.getFullYear()}${String(cal_date.getMonth() + 1).padStart(2, '0')}${String(cal_date.getDate() + 1).padStart(2, '0')}T${String(cal_date.getHours() + 1).padStart(2, '0')}${String(cal_date.getMinutes() + 1).padStart(2, '0')}${String(cal_date.getSeconds() + 1).padStart(2, '0')}Z`; + vcal += ` +BEGIN:VEVENT +DTSTART:${cal_date.getFullYear()}${String(cal_date.getMonth() + 1).padStart(2, '0')}${String(cal_date.getDate()).padStart(2, '0')}T080000Z +DTEND:${cal_date.getFullYear()}${String(cal_date.getMonth() + 1).padStart(2, '0')}${String(cal_date.getDate() + 1).padStart(2, '0')}T110000Z +DTSTAMP:${current_time} +UID:${current_time}@nfdi4plants.org +CREATED:${cal_date.getFullYear()}${String(cal_date.getMonth() + 1).padStart(2, '0')}${String(cal_date.getDate()).padStart(2, '0')}T000000Z +DESCRIPTION:Name of the reminder: ${e["name"]}; +LAST-MODIFIED:${cal_date.getFullYear()}${String(cal_date.getMonth() + 1).padStart(2, '0')}${String(cal_date.getDate()).padStart(2, '0')}T000000Z +SEQUENCE:0 +STATUS:CONFIRMED +SUMMARY:No. 
${index + 1} Reminder to write DMP of ${saved_a["replace"]["$_PROJECT"]} (created by DataPLAN) +TRANSP:OPAQUE +END:VEVENT`; + + } + + ); + + vcal += vcal_end; + + var blob = new Blob([vcal], { + type: "text/plain;charset=utf-8;", + }); + if (filtered_list.length != 0) { // no need to download file without calendar event + saveAs(blob, "calendar.ics"); + + } + + return vcal; + } + + function printDiv1(id = "doc3") { + var divContents = document.getElementById(id).innerHTML; + var a = window.open('', '', 'height=800, width=600'); + a.document.write(''); + a.document.write(' '); + a.document.write(divContents); + a.document.write(''); + a.document.close(); + a.print(); + } + + var reminders; + function update_reminder(compared_list, warning_element_children) { + const question_n = compared_list[1] + compared_list[0] - compared_list[2] - compared_list[3]; + const issue_n = warning_element_children.length; + let question_reminder, + issue_reminder; + switch (question_n) { + case 0: + question_reminder = "Congrats, all questions are answered."; + break; + case 1: + question_reminder = "Only 1 question is left to be answered.
    Please click the \"next\" button to finish answering the questions."; + break; + default: + question_reminder = "There are " + String(question_n) + " questions not answered.
    Please click the \"next\" button to finish answering the questions."; + + } + switch (issue_n) { + case 0: + issue_reminder = "Congrats, all warnings are resolved."; + break; + case 1: + issue_reminder = "Only 1 issue left."; + break; + // 99 means that there are questions not answered. + case 99: + issue_reminder = " "; + break; + default: + issue_reminder = "There are " + String(issue_n) + " issues not resolved." + ' If you are aware of all the issues and you want to remove all the warnings, please click: '; + + } + reminders = [{ + title: "Question reminders", + content: question_reminder + "
    " + issue_reminder + + ` +

    If you need an unchecked or unfinished DMP,
    you can also or or it anyway. + ` + + // + } + ]; + + for (const [index, element] of compared_list[4].entries()) { + const ele_id = "form_" + element.split("_")[1]; + + + //verbose console.log("element id is " + ele_id); // , + + const id_and_content = { + id: String(ele_id.toLowerCase()), + content: "this question was default value or empty, please revise it " + }; + reminders.push(id_and_content); + //verbose console.log("id_is " + Object.values(id_and_content)); + + } + + //verbose console.log(compared_list[5].entries()); + for (const [index, element] of compared_list[5].entries()) { + const ele_id = "form_" + element; + + //const ele = document.getElementById(ele_id); + //verbose console.log("element id is " + ele_id); // , + + const id_and_content = { + id: String(ele_id.toLowerCase()), + content: "this question was default value or empty, please revise it " + }; + reminders.push(id_and_content); + //verbose console.log("id_is " + Object.values(id_and_content)); + + } + let reminder = new Tour(reminders); + reminder.show(); + return reminders; + } + + function tab_next(ele) { + var total_list = document.querySelectorAll('form[class~="show"] input:not([class~="d-none"])'); + //var list = Array.from(total_list).filter( single => single.tabIndex >= "0" ); + var index = Array.from(total_list).indexOf(ele); + return total_list[index + 1] || total_list[0]; + + + + } + + function set_scrollbar_marker(scroll_marker, elements) { + scroll_marker = document.querySelector('.scroll-marker'); + const doc = document.getElementById("split-0"); + let doc_height = doc.offsetHeight; + let doc_top = doc.scrollHeight; + + + + elements.forEach(e => { + let e_top = e.offsetTop; + let e_bot = e_top + e.offsetHeight; + let parent_top = e.offsetParent.offsetTop; + let marker_top = Math.ceil((e_top + parent_top) / doc_top * doc_height); + let marker_bot = Math.ceil((e_bot + parent_top) / doc_top * doc_height); + let marker_e = document.createElement("span"); + 
marker_e.style.backgroundColor = "#E08F9C"; + marker_e.style.top = marker_top + "px"; + marker_e.style.height = (marker_bot - marker_top) + "px"; + marker_e.setAttribute("name", "scroll-bar-span"); + scroll_marker.appendChild(marker_e); + + } + + ) + + } + + + + function export2word(element, filename = '') { + parseText(); + const compared_list = compare_replace(temp_a, saved_a); + let warning_element_children = Array.from(Array(99).keys()); + + if ((compared_list[2] == compared_list[0]) && (compared_list[3] == compared_list[1])) { + issue_warning_show(); + warning_element_children = document.getElementById("warningText").children; + if (warning_element_children[0] == undefined) { + try { + dmp_update(saved_a); + return export2word1(element, filename); // Security exception may be thrown by some browsers. + const selection = window.getSelection(); + + } catch (error) { + console.warn("export doc failed.", error); + return false; + } + } else { + update_reminder(compared_list, warning_element_children); + } + + } else { + + update_reminder(compared_list, warning_element_children); + } + } + + + + function export2word1(element, filename = '') { + var preHtml = "Export HTML To Doc"; + var postHtml = ""; + var html = preHtml + document.getElementById(element).innerHTML + postHtml; + + var blob = new Blob(['\ufeff', html], { + type: 'application/msword' + }); + + // Specify link url + var url = 'data:application/vnd.ms-word;charset=utf-8,' + encodeURIComponent(html); + + // Specify file name + filename = filename ? 
filename + '.doc' : 'document.doc'; + + // Create download link element + var downloadLink = document.createElement("a"); + + document.body.appendChild(downloadLink); + + if (navigator.msSaveOrOpenBlob) { + navigator.msSaveOrOpenBlob(blob, filename); + } else { + // Create a link to the file + downloadLink.href = url; + + // Setting the file name + downloadLink.download = filename; + + //triggering the function + downloadLink.click(); + } + + document.body.removeChild(downloadLink); + } + + + + +
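`generate_vcal` above builds `DTSTAMP` and `UID` from `getDate() + 1`, `getHours() + 1` and similar expressions. This shifts every component by one and can produce impossible values such as day 32. It also appears to append `END:VCALENDAR` directly after the last `END:VEVENT` with no line break, although RFC 5545 specifies CRLF-terminated lines. The following is a minimal sketch of a reminder calendar builder that avoids these issues. It assumes each reminder is a `{ name, date }` object whose `date` is the `YYYY-MM-DD` value of an `<input type="date">`. The helper names (`toICalUTC`, `nextDay`, `escapeText`, `buildReminderCalendar`) are hypothetical and are not part of DataPLAN. Lines longer than 75 octets are not folded here.

```javascript
// Hypothetical helpers sketching RFC 5545 output; not the DataPLAN implementation.

// Zero-pad a number to two digits.
const pad2 = (n) => String(n).padStart(2, "0");

// Format a Date as an iCalendar UTC DATE-TIME, e.g. 20240131T080509Z.
function toICalUTC(date) {
  return (
    String(date.getUTCFullYear()) +
    pad2(date.getUTCMonth() + 1) +
    pad2(date.getUTCDate()) +
    "T" +
    pad2(date.getUTCHours()) +
    pad2(date.getUTCMinutes()) +
    pad2(date.getUTCSeconds()) +
    "Z"
  );
}

// Return the day after a "YYYY-MM-DD" string, rolling over months and years.
function nextDay(isoDay) {
  const d = new Date(isoDay + "T00:00:00Z"); // parse explicitly as UTC midnight
  d.setUTCDate(d.getUTCDate() + 1);
  return d.toISOString().slice(0, 10);
}

// Escape TEXT values (backslash, semicolon, comma, newline) as RFC 5545 requires.
function escapeText(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/;/g, "\\;")
    .replace(/,/g, "\\,")
    .replace(/\r?\n/g, "\\n");
}

// Build an iCalendar document with one all-day event per reminder.
function buildReminderCalendar(reminders, projectName, now = new Date()) {
  const stamp = toICalUTC(now);
  const lines = [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//DataPLAN//NONSGML Reminder of DMP update//EN",
    "CALSCALE:GREGORIAN",
    "METHOD:PUBLISH",
  ];
  reminders
    .filter((r) => r != null && r.date) // skip deleted or undated rows
    .forEach((r, index) => {
      lines.push(
        "BEGIN:VEVENT",
        `UID:${stamp}-${index}@nfdi4plants.org`, // unique per event
        `DTSTAMP:${stamp}`,
        `DTSTART;VALUE=DATE:${r.date.replace(/-/g, "")}`,
        `DTEND;VALUE=DATE:${nextDay(r.date).replace(/-/g, "")}`, // DTEND is exclusive
        `SUMMARY:${escapeText(`No. ${index + 1} Reminder to write DMP of ${projectName}`)}`,
        `DESCRIPTION:${escapeText(`Name of the reminder: ${r.name}`)}`,
        "END:VEVENT"
      );
    });
  lines.push("END:VCALENDAR");
  return lines.join("\r\n") + "\r\n"; // CRLF after every line, including the last
}
```

On the page, `datepicker_list` and `saved_a["replace"]["$_PROJECT"]` could be passed as `reminders` and `projectName`, and the returned string wrapped in a `Blob` for `saveAs` as before. Treating the reminder date as a plain string also avoids the local-timezone shift that `new Date("YYYY-MM-DD").getDate()` can introduce.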
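The DMP template text on this page switches passages on and off with several kinds of marker:

- plain markers such as `#if$_DATAPLANT text #endif$_DATAPLANT`;
- negated markers such as `#if!$_VVISUALIZATION text #endif!$_VVISUALIZATION`;
- combined markers such as `#if$_RNASEQ|$_GENOMIC text #endif$_RNASEQ|$_GENOMIC`.

The code that evaluates these markers is not part of this section, so the following is only a hypothetical sketch of how they could be rendered. It assumes that flags are a plain object keyed by the name without the `$_` prefix, that `|` means "any of these flags is set", and that blocks are not nested. `renderConditionals` and `CONDITIONAL` are illustrative names.

```javascript
// Hypothetical renderer for #if / #if! ... #endif / #endif! markers.
// NAME may contain A-Z, 0-9, "_" and "-" (e.g. $_CLONED-DNA); several names
// may be joined with "|" (e.g. $_RNASEQ|$_GENOMIC). Nesting is not supported.
const CONDITIONAL =
  /#if(!?)((?:\$_[A-Z0-9_-]+\|?)+)(?![A-Z0-9_-])([\s\S]*?)#endif\1\2(?![A-Z0-9_-])/g;

function renderConditionals(template, flags) {
  return template.replace(CONDITIONAL, (_match, bang, names, body) => {
    // Assumption: "|" means the block is active if any listed flag is set.
    const active = names
      .split("|")
      .filter((n) => n.length > 0)
      .some((n) => Boolean(flags[n.slice(2)])); // strip the "$_" prefix
    const keep = bang === "!" ? !active : active; // "!" inverts the condition
    return keep ? body : "";
  });
}
```

The `\1\2` backreferences require each closing marker to repeat the exact opening condition, including the `!`. This matches how paired markers are written in the template text. The trailing lookaheads prevent a name like `$_EU` from matching the prefix of a longer name.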

    +
    +
    + + Logo + + +
    + + + + + + + + + +
    +
    +
    + +
    +
    +
    +
    + +
    + + +
    +
    + + + + + + + + +
    +
    + + + + + + + + + +
    + + + +
    +
    +
    +
    +
    +
    +
    +
    + +

    1 Basic Information:

    + +
    +

    1.1 What is the project name or acronym?

    + + + + +
    + + +
    + + + + + + + +
    + +
    + + + + +
    +
    +

    + Who is the + most likely to benefit from the data? + + + +

    + + + + +
    +
    +

    1.3 Other + + + + DMP Metadata

    + + + + +
    +
    + + + + +
    + +
    + + + + +
    + + +
    + + + + +
    + +
    + + + + +
    + +
    + + + + +
    + +
    + + + + +
    + +

    1.4 Please select from the following options

    +
    +
    + + +
    +
    + + + +
    +
    + + +
    +
    + + + + +
    +
    + + +
    +
    + + + +
    +
    + + + +
    +
    + + +
    +
    + + + + + + + +
    + + + + +
    + + + + +
    + + + + +
    + + + + +
    + + + + + +
    + + +
    +

    2. What kind of data will you handle?

    + +
    + + +
    + +
    + + +
    + +
    + + +
    + +
    + + +
    + +
    + + +
    + + +
    + + + +
    + +
    + + +
    + + +
    + + +
    + +
    + + +
    + +
    + + +
    + + +
    + + +
    +
    + + +
    +
    + + +
    + + + + +
    + + + + + +
    +
    +

    2.1 Which repositories (endpoints) will you submit your data to?

    + + +
    + + +
    +
    + + +
    +
    + + +
    +
    + + +
    +
    + + +
    + +
    + + +
    + +
    + + +
    + +
    + + +
    + +
    + + +
    + +
    + + + +
    + + +
    + + +
    + +
    + + +
    + + + + +
    + + +
    + +
    + + +
    + + + + + +
    + + +
    +
    +
    + + + + +

    3. How much data are you likely to generate?

    + +
    + +
    + + GB + + +
    +
    + +
    + +
    + + GB + + +
    +
    + +
    +
    +
    +
    +
    +
    +
    + +

    4. Are any of the following standards relevant to your project? +

    +
    + +
    + + + +
    + +
    + + +
    + +
    + + +
    +
    + + +
    + +
    + + +
    +
    + + +
    +
    + + +
    +
    + + +
    +
    + + +
    +
    + + +
    +
    + + +
    + + + +
    + + +
    + + +
    + + +
    + +
    + + +
    + +
    + + +
    +
    + +

    4.1 Will you adhere to any high-level metadata submission standards?

    +
    + + +
    +
    + + +
    + +

    4.2 Project data will be published:

    + +
    + + +
    + + +
    + + +
    + +
    + + +
    + +
    + + +
    + +
    + + +
    +
    + + +
    + +

    4.4 Will you follow national standards or archive data in national infrastructures?

    +
    + + +
    +
    + + +
    +
    + + +
    + + + + +

    5. Do you intend to use data visualization in your project?

    +
    + + + + + +
    +
    + + + + + + + +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +
    +
    +
    The project aim should be written as part of a sentence.
    + +
    +
    +

    Example 1: aims at creating a computational model of carbon and water flow within a whole plant + architecture

    +
    +

    Example 2: aims at generating data management plans with minimal effort and making the data as + open as possible

    +
    +
    + +
    +
    +
    The study object is the target of the project.
    + +
    +
    +

    Example 1: carbon and water flow in plants

    +
    +

    Example 2: data management plan

    +
    +
    +
    +
    +
    Here is space for an additional sentence.
    + +
    +
    +

    Example 1: Industry, politicians and students can also use the data for different purposes.

    +
    +

    Example 2: The data acquired in the project can be used by a wide range of people for different + purposes.

    +
    +
    + + +
    +
    +
    + +
    +
    +

    Information in this section is used only in the DMP metadata, not in the document.

    + +
    +
    +
    +
    +
    + +
    +
    +

    Data officers are also known as data stewards or data curators.

    + +
    +
    + +
    +
    +
    + +
    +
    +

    Software that legally remains the property of the organization, group, or individual who created it. +

    + +
    +
    + + +
    +
    +

    User-defined template +

    + + +
    + +
    +

    + You can click the dotted box to start editing.
    + Click the grey buttons to reuse templates.
    + Click submit when you have finished. +
    + + + + + + +

    +
    +
    +
    + + + + + + + + + + + + + + +
    +
    +
    +
    +
    +
    +
    +
    + + +
    +

    Data Management Plan of the H2020 Project $_PROJECT

    +
    + +
    + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Action Number:

    +
    +

    $_FUNDINGPROGRAMME

    +
    +

    Action Acronym:

    +
    +

    $_PROJECT

    +
    +

    Action Title:

    +
    +

    $_PROJECT

    +
    +

    Creation Date:

    +
    +

    $_CREATIONDATE

    +
    +

    Modification Date:

    +
    +

    $_MODIFICATIONDATE

    +
    +

    DMP version:

    +
    +

    $_DMPVERSION

    +
    + +
    +
    + +
    +
    +
    + +

    1    Introduction

    +

    #if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. + #endif$_EU To best profit from open data, it is necessary not only to store the data but also to make them + Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; + however, we also consider the need to protect individual data sets. #endif$_PROTECT + +

    +

    + + The aim of this document is to provide guidelines on the principles of data management in the + $_PROJECT and to specify which types of data will be stored. This will be achieved by using the + responses to the EU questionnaire on Data Management Plans (DMP) as a DMP + document. + +

    +

    + + The detailed DMP states how data will be handled during and after the project. The $_PROJECT DMP is + prepared according to the Horizon 2020 and Horizon Europe online manuals. #if$_UPDATE It will be + updated and its validity checked several times during the + $_PROJECT project. At the very least, this will happen at month $_UPDATEMONTH. + #endif$_UPDATE + +

    +

    2    Data Management Plan EU Template

    + +

    2.1    Data Summary

    +

    What is the purpose of the data collection/generation and its relation to the + objectives of the project?

    +

    + + + The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION + and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization + #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary + #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely + necessary #endif!$_DATAPLANT because the data are used not only to understand principles but also + to trace the provenance of data analyses. Stakeholders must also be informed + about the provenance of the data. It is therefore necessary to ensure that the data are well generated + and well annotated with metadata using open standards, as laid out in the next section. +

    What types and formats of data will the project + generate/collect?

    + +

    +

    + + The $_PROJECT will collect and/or generate the following types of raw data: $_GENETIC, $_GENOMIC, + $_TRANSCRIPTOMIC, $_RNASEQ, $_METABOLOMIC, $_PROTEOMIC, $_PHENOTYPIC, $_TARGETED, $_IMAGE, $_MODELS, + $_CODE, $_EXCEL, $_CLONED-DNA data related to $_STUDYOBJECT. In addition, the raw data + will also be processed and modified using analytical pipelines, which may yield different results or + include ad hoc data analysis parts. #if$_DATAPLANT These pipelines will be + tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive + these resources (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise + in the DataPLANT consortium#endif$_DATAPLANT. + +

    +

    +

    Will you re-use any existing data and how?

    +

    + + The project builds on existing data sets and relies on them. #if$_RNASEQ|$_GENOMIC For example, + without a proper genomic reference it is very difficult to analyze next-generation sequencing (NGS) + data sets.#endif$_RNASEQ|$_GENOMIC It is also important to include existing data sets on the + expression and metabolic behavior of the $_STUDYOBJECT, as well as existing background knowledge#if$_PARTNERS of the partners#endif$_PARTNERS. + Genomic references can be gathered from reference databases for genomes and sequences, such as the US + National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA + Data + Bank of Japan (DDBJ). Furthermore, prior 'unstructured' data in the form of publications, and the data + contained therein, will be used for decision making. + +

    + + +

    What is the origin of the data?

    +

    Public data will be extracted as described in the previous paragraph. For the + $_PROJECT, specific data sets will be generated by the consortium partners.

    + + +

    + Data of different types or representing different domains will be generated using + distinct approaches. For example: +

    +
      + + + #if$_GENETIC +
    • +

      + Genetic data will be generated from crosses and breeding experiments and + will include recombination frequencies and crossover events that position genetic markers and + quantitative trait loci, which can be associated with physical genomic markers/variants. + +

      +
    • + #endif$_GENETIC + + #if$_GENOMIC +
    • +

      + Genomic data will be created from sequencing data, which will be processed to + identify genes, regulatory elements, transposable elements, and physical markers such as + SNPs, microsatellites and structural variants. +

      +
    • + #endif$_GENOMIC + + #if$_CLONED-DNA +
    • +

      + The documentation of the origin and assembly of cloned DNA will include (a) the source of the original + vector sequence, with an Addgene reference where available, and the source of the insert DNA (e.g., + amplification by PCR from a given sample, or obtained from an existing library), (b) the cloning + strategy (e.g., restriction endonuclease digests/ligation, PCR, TOPO cloning, Gibson + assembly, LR recombination), and (c) the verified DNA sequence of the final recombinant vector. + +

      +
    • + #endif$_CLONED-DNA + + #if$_TRANSCRIPTOMIC +
    • +

      + Methods of transcriptomic data collection will be selected from microarrays, + quantitative PCR, Northern blotting, RNA immunoprecipitation and fluorescence in situ + hybridization. RNA-Seq data will be collected using separate methods. +

      +
    • + #endif$_TRANSCRIPTOMIC + + #if$_RNASEQ +
    • +

      + RNA sequencing data will be generated using short-read or long-read platforms, + either in house or outsourced to academic facilities or commercial services, and the raw + data will be processed using established bioinformatics pipelines. +

      +
    • + #endif$_RNASEQ + #if$_METABOLOMIC +
    • +

      + Metabolomic data will be generated by coupled chromatography and mass + spectrometry using targeted or untargeted approaches. +

      +
    • + #endif$_METABOLOMIC #if$_PROTEOMIC +
    • +

      + Proteomic data will be generated using coupled chromatography and mass + spectrometry for the analysis of protein abundance and protein identification, as well as + additional techniques for structural analysis, the identification of post-translational + modifications and the characterization of protein interactions. +

      +
    • + #endif$_PROTEOMIC + + #if$_PHENOTYPIC +
    • +

      + Phenotypic data will be generated using phenotyping platforms and annotated with + corresponding ontologies. They will include the number/size of organs such as leaves, flowers and buds, + the size of the whole plant, stem/root architecture (e.g., number of lateral branches/roots), + organ structures/morphologies, and quantitative metrics such as color, turgor and health/nutrition + indicators, among others. +

      +
    • + #endif$_PHENOTYPIC + + + #if$_TARGETED +
    • +

      + Targeted assay data (e.g., glucose and fructose concentrations or + production/utilization rates) will be generated using specific equipment and methods that + are fully documented in the laboratory notebook. +

      +
    • + #endif$_TARGETED + + + #if$_IMAGE +
    • +

      + Image data will be generated by equipment such as cameras, scanners, and + microscopes combined with software. Original images, including embedded metadata such as EXIF + photo information, will be archived. +

      +
    • + #endif$_IMAGE + + #if$_MODELS +
    • +

      + Model data will be generated by using software simulations. The complete + workflow, which includes the environment, runtime, parameters, and results, will be + documented and archived. +

      +
    • + #endif$_MODELS + + #if$_CODE +
    • +

      + Computer code will be produced by programmers. +

      +
    • + #endif$_CODE + + #if$_EXCEL +
    • +

      + Excel data will be generated by data analysts using Microsoft Office or + open-source spreadsheet software. +

      +
    • + #endif$_EXCEL + + + + +
    + + + #if$_PREVIOUSPROJECTS +

    Data from previous projects such as $_PREVIOUSPROJECTS will be considered. +

    + #endif$_PREVIOUSPROJECTS +

    What is the expected size of the data?

    +

    We expect to generate $_RAWDATA GB of raw data and up to $_DERIVEDDATA GB of + processed data.

    +

    +

    To whom might it be useful ('data utility')?

    +

    + + The data will initially benefit the $_PROJECT partners, but will also be made available to selected + stakeholders closely involved in the project, and then to the scientific community working on + $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also + use the data after publication. The data will be disseminated according to the $_PROJECT's + dissemination and communication plan#if$_DATAPLANT, which aligns with the DataPLANT platform or other + means#endif$_DATAPLANT. + +

    +

    + + + +

    +

    2.2    FAIR data

    +

    Making data findable, including provisions for + metadata

    +

    + + Are the data produced and/or used in the project discoverable with metadata, identifiable and + locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers + such as Digital + Object Identifiers)? + +

    +

    + + All datasets will be associated with unique identifiers and will be annotated with metadata. We will + use the Investigation, Study, Assay (ISA) specification for metadata creation. The $_PROJECT will rely + on community standards plus additional recommendations applicable in the plant sciences, such as the + #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping + Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: + Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: + Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing + Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray + Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction Experiment),#endif$_MIMIX + #endif$_PROTEOMIC + + + + Unlike cross-domain minimal sets such as Dublin Core, which + mostly define the submitter and the general type of
    data, these specific standards allow reuse by other researchers by + defining properties of the plant (see the preceding section). However, minimal cross-domain + annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also + remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow + individual releases to be tagged with a Digital Object Identifier (DOI). + #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered + to. #endif$_OTHERSTANDARDS + +

    +

    What naming conventions do you follow?

    +

    + + Data variables will be allocated standard names. For example, genes, proteins and metabolites will + be named according to approved nomenclature and conventions. These will also be linked to functional + ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by + humans. Plant names will include traditional names, binomials, and all + strain/cultivar/subspecies/variety identifiers. + +

    +

    Will search keywords be provided that optimize possibilities for + re-use?

    +

    + + Keywords about the experiment and consortium will be included, as well as an abstract about the + data, where useful. In addition, certain keywords can be auto-generated from dense metadata and their + underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with standardized + DataPLANT ontologies that are provided where existing ontologies do not yet include such variables. + #endif$_DATAPLANT + +

    +

    Do you provide clear version numbers?

    +

    + + To maintain data integrity and facilitate reanalysis, data sets will be allocated version numbers + where this is useful (e.g., raw data are considered immutable; they must not be changed and will + therefore not receive a version number). #if$_DATAPLANT This is automatically supported by the Git-based DataPLANT ARC infrastructure. + #endif$_DATAPLANT + +

    +

    + What metadata will be created? In case metadata standards do not exist in your + discipline, please outline what type of metadata will be created and how. +

    +

    + We will use the Investigation, Study, Assay (ISA) specification for metadata creation. + #if$_RNASEQ|$_GENOMIC For specific data (e.g., RNASeq or genomic data), we will use metadata templates + from the end-point repositories. #if$_MINSEQE The Minimum Information about a high-throughput + SEQuencing Experiment (MINSEQE) will also be used. #endif$_MINSEQE + #endif$_RNASEQ|$_GENOMIC + + The following metadata/minimum information standards will be used to collect metadata: + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: + Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: + Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing + Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray + Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction Experiment),#endif$_MIMIX + #endif$_PROTEOMIC + #if$_METABOLOMIC #if$_METABOLIGHTS MetaboLights submission-compliant standards will be used for + metabolomic data where
    this is accepted by the consortium partners.#issuewarning Some metabolomics + partners do not consider MetaboLights an accepted standard.#endissuewarning #endif$_METABOLIGHTS + #endif$_METABOLOMIC As part of the plant research community, we use #if$_MIAPPE MIAPPE for phenotyping + data in the broadest sense, but we will also rely on #endif$_MIAPPE specific SOPs for additional + annotations #if$_DATAPLANT that take into account advanced DataPLANT annotations and ontologies. + #endif$_DATAPLANT + + +

    + + +

    Making data openly accessible

    +

    + + Which data produced and/or used in the project will be made openly available as the default? If + certain datasets cannot be shared (or need to be shared under restrictions), we explain why, clearly + separating + legal and contractual reasons from voluntary restrictions. + +

    +

    + + By default, all data sets from the $_PROJECT will be shared with the community and made openly + available. However, before the data are released, all partners will be given the opportunity to check + for potential IP (according to the consortium agreement and background IP rights). #if$_INDUSTRY + This applies in particular to data pertaining to industry partners. #endif$_INDUSTRY IP protection will + be prioritized for datasets that offer the potential for exploitation. + +

    +

    + + Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their + data closed if relevant provisions are made in the consortium agreement and are in line with the + reasons + for opting out. + +

    +

    +

    How will the data be made accessible (e.g. by deposition in a + repository)?

    +

    +

    + + Data will be made available via the $_PROJECT platform using a user-friendly front end that allows + data visualization. In addition, we will ensure that data suitable for + international, discipline-related repositories that use specialized technologies are deposited there: + + + +

    + #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC +

    + +

    + #if$_TRANSCRIPTOMIC For transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

    + +

    + #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE + #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE +

    + +

    + #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT + #endif$_METABOLOMIC +

    +

    + #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB + PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of + Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

    + +

    + #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & + Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

    + + #if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as + well.#endif$_OTHEREP + +

    +

    + + Unstructured and less standardized data (e.g., experimental phenotypic measurements) will + be annotated with metadata and, once complete, allocated a digital object identifier (DOI). + #if$_DATAPLANT Whole datasets will also be wrapped into ARCs with allocated DOIs. The ARC and the + converters provided by DataPLANT will make uploading to the end-point repositories fast + and easy. + #endif$_DATAPLANT + +

    +

    +

    +

    What methods or software tools are needed to access the data?

    +

    #if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. + #endif$_PROPRIETARY

    +

    + + #if!$_PROPRIETARY No specialized software will be needed to access the data, just a modern browser. + Access will be possible through web interfaces. For data processing after obtaining raw data, + typical + open-source software can be used. #endif!$_PROPRIETARY + +

    +

    + + #if$_DATAPLANT DataPLANT offers tools such as the open-source Swate plugin for Excel, the ARC + Commander (arcCommander) and DataPLAN. + #endif$_DATAPLANT + +

    +

    Is documentation about the software needed to access the data + included?

    +

    + + #if$_DATAPLANT DataPLANT resources are well described, and setup guides are + provided on the GitHub project pages. #endif$_DATAPLANT + All external software documentation will be duplicated locally and stored alongside the software. + +

    +

    Is it possible to include the relevant software (e.g. in open-source + code)?

    +

    As stated above, the $_PROJECT will use publicly available open-source and + well-documented certified software #if$_PROPRIETARY except for $_PROPRIETARY + #endif$_PROPRIETARY.

    +

    +

    + Where will the data and associated metadata, documentation and code be deposited? + Preference should be given to certified repositories that support open access, where + possible. +

    +

    + + As noted above, specialized repositories will be used for common data types. Unstructured and + less standardized data (e.g., experimental phenotypic measurements) will be annotated with + metadata and, once complete, allocated a digital object identifier (DOI).#if$_DATAPLANT Whole + datasets will also be wrapped into ARCs with allocated DOIs.#endif$_DATAPLANT + +

    +

    Have you explored appropriate arrangements with the identified + repository?

    +

    + + Submission is free of charge, and the goal of these repositories (at least of ENA) is to obtain as much data as possible. + Therefore, arrangements are neither necessary nor useful. Catch-all repositories are not required#if$_DATAPLANT, and this has been confirmed for data associated with DataPLANT#endif$_DATAPLANT. + #issuewarning If no data management platform such as DataPLANT is used, you need to find an + appropriate repository to store or archive your data after publication. #endissuewarning + +

    +

    If there are restrictions on use, how will access be provided?

    +

    There are no restrictions beyond the IP screening described above, which is + in line with European open data policies.

    +

    + + +

    Is there a need for a data access committee?

    +

    There is no need for a data access committee.

    +

    Are there well described conditions for access (i.e. a machine-readable + license)?

    +

    Yes. Where possible, machine-readable license expressions (e.g., CC REL) will be used for data not submitted to + specialized repositories such as ENA.

    +

    How will the identity of the person accessing the data be ascertained? +

    +

    + + Where data are shared only within the consortium (e.g., because the datasets are not yet finished or are + undergoing IP checks), the data will be hosted internally and a username and password will be + required for access (see GDPR rules). When the data are made public in EU or US repositories, + completely anonymous access is normally allowed. This is also the case for ENA, and both access modes are in + line with GDPR requirements. + +

    +

    + + #if$_DATAPLANT Currently, data management relies on the Annotated Research Context (ARC). ARCs are + password protected, so user authentication is required before any data or samples can be obtained. + #endif$_DATAPLANT + +

    +

    Making data interoperable

    +

    + + Are the data produced in the project interoperable, that is allowing data exchange and re-use + between researchers, institutions, organizations, countries, etc. (i.e. adhering to standards for + formats, as much + as possible compliant with available (open) software applications, and in particular facilitating + re-combinations with different datasets from different origins)? + +

    +

    + + Whenever possible, data will be stored in common and openly defined formats including all the + necessary metadata to interpret and analyze data in a biological context. By default, no proprietary + formats will be + used. However, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as + intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In + addition, text + documents might be edited in word processors but will be shared as PDF. + +

    +

    What data and metadata vocabularies, standards or methodologies will you + follow to make your data interoperable?

    +

    + + As noted above, we foresee using minimal standards such as #if$_RNASEQ|$_GENOMIC #if$_MINSEQE + MINSEQE for sequencing data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-compatible + forms for metabolites#if$_MIAPPE and MIAPPE for phenotyping-like data#endif$_MIAPPE. The minimal information standards will allow + the integration of data across projects, and its reuse according to established and tested + protocols. We will also use + ontological terms to enrich the data sets, relying on free and open ontologies where possible. + Additional ontology terms might be created and canonized during the $_PROJECT. + +

    +

    Will you be using standard vocabularies for all data types present in your + data set, to allow inter-disciplinary interoperability?

    +

    + + Open ontologies will be used where they are mature. As stated above, some ontologies and controlled + vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the + advanced ontologies developed in DataPLANT. #endif$_DATAPLANT + +

    +

    In case it is unavoidable that you use uncommon or generate project specific + ontologies or vocabularies, will you provide mappings to more commonly used ontologies?

    +

    Common and open ontologies will be used, so this question does not + apply.

    +

    Increase data reuse (by clarifying licences) +

    +

    +

    How will the data be licensed to permit the widest re-use possible? +

    +

    Open licenses, such as Creative Commons (CC), will be used whenever + possible.

    + + +

    + + When will the data be made available for re-use? If an embargo is sought to give time to publish or + seek patents, specify why and how long this will apply, bearing in mind that research data should be + made + available as soon as possible. + +

    +

    + + + #if$_early Some raw data will be made public as soon as they are collected and processed.#endif$_early + #if$_beforepublication Relevant processed datasets will be made public when the research findings + are published.#endif$_beforepublication #if$_endofproject At the end of the project, all data + without an embargo period will be published.#endif$_endofproject #if$_embargo Data subject to an + embargo period will not be publicly accessible until the embargo period ends.#endif$_embargo #if$_request Data will be made available upon request, allowing controlled + sharing while ensuring responsible use.#endif$_request #if$_ipissue IP issues will be checked + before publication. #endif$_ipissue All consortium partners will be + encouraged to make data available before publication, openly and/or under pre-publication + agreements#if$_GENOMIC such as those started in Fort Lauderdale and set forth by the Toronto + International Data Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are + complete. + + +

    +

    Are the data produced and/or used in the project usable by third parties, in + particular after the end of the project? If the re-use of some data is restricted, explain + why.

    +

    There will be no restrictions once the data are made public.

    +

    How long is it intended that the data remains re-usable?

    +

    The data will be made available for many years#if$_DATAPLANT and ideally + indefinitely after the end of the project#endif$_DATAPLANT.

    +

    + Data submitted to repositories (as detailed above), e.g., ENA or PRIDE, will be subject to + the data storage regulations of the respective repository. +

    +

    Are data quality assurance processes described?

    +

    + + The data will be checked and curated. #if$_DATAPLANT Furthermore, data will be quality controlled + (QC) using automatic procedures as well as manual curation#endif$_DATAPLANT. + +

    +

    2.3    Allocation of + resources

    +

    What are the costs for making data FAIR in your project?

    +

    The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC + consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public + repositories. Subsequent costs are then borne by the operators of these repositories.

    +

    + + Additionally, costs for storage after publication are incurred by end-point repositories (e.g., ENA) + and are covered by the operating budgets of these repositories, not charged to the $_PROJECT or its + members. + +

    +

    + How will these be covered? Note that costs related to open access to research data are + eligible as part of the Horizon 2020 or Horizon Europe grant (if compliant with the Grant Agreement + conditions). +

    +

    The costs borne by the $_PROJECT are covered by the project funding. + Pre-existing structures #if$_DATAPLANT such as structures, tools, and knowledge laid down in the + DataPLANT consortium#endif$_DATAPLANT will also be used.

    +

    Who will be responsible for data management in your project?

    +

    The responsible person will be $_DATAOFFICER of the $_PROJECT.

    +

    Are the resources for long term preservation discussed (costs and potential + value, who decides and how/what data will be kept and for how long)?

    +

    + + The data officer #if$_PARTNERS or $_PARTNERS #endif$_PARTNERS will ultimately decide on the + strategy to preserve data that are not submitted to end-point subject area repositories + #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT when the + project ends. This will be in line with EU guidelines, institute policies, and data sharing based on + EU and international standards. + +

    +

    2.4    Data security

    +

    What provisions are in place for data security (including data recovery as + well as secure storage and transfer of sensitive data)?

    +

    + + Online platforms will be protected by vulnerability scanning, two-factor authentication and daily + automatic backups allowing immediate recovery. All partners holding confidential project data will use + secure platforms with automatic backups and offsite secure copies. + #if$_DATAPLANT Where the DataHUB and ARCs developed in DataPLANT are used, data security will be enforced. + This comprises secure storage, and passwords and usernames are generally transferred via + separate secure media.#endif$_DATAPLANT + +

    +

    Is the data safely stored in certified repositories for long term + preservation and curation?

    +

    + + Data will be made available via the $_PROJECT platform using a user-friendly front end that allows + data visualization. In addition, we will ensure that data suitable for international, + discipline-related repositories that use specialized technologies are deposited there: + + + +

    + #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC +

    + +

    + #if$_TRANSCRIPTOMIC For transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

    + +

    + #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE + #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE +

    + +

    + #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT + #endif$_METABOLOMIC +

    +

    + #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB + PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of + Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

    +

    + #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & + Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

    + #if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be processed there as + well.#endif$_OTHEREP + +

    +

    2.5    Ethical aspects

    +

    + + Are there any ethical or legal issues that can have an impact on data sharing? These can also be + discussed in the context of an ethics review. If relevant, include references to ethics deliverables + and + ethics chapter in the Description of the Action (DoA). + +

    +

    + + At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, + since only plant data are involved, no ethics committee is needed, + although we will diligently follow the Nagoya Protocol on access and benefit-sharing. #issuewarning + You have to check this and enter any due diligence measures here. At the moment, it is still open whether the Nagoya Protocol + (🡺see Nagoya Protocol) + will also cover sequence information. In any case, + if you use material not from your (partner) country and characterize it physically (e.g., + metabolites, proteome) or biochemically (e.g., RNASeq), this might represent a Nagoya-relevant action + unless the material is from, e.g., the + US (non-party) or Ireland (not signed; still contact them), but other laws might apply. + #endissuewarning + +

    +

    Is informed consent for data sharing and long term preservation included in + questionnaires dealing with personal data?

    +

    + + The only personal data that will potentially be stored are the submitter's name and affiliation in the + metadata. In addition, personal data will be collected for dissemination and communication + activities using specific methods and procedures developed by the $_PROJECT partners to comply with + data protection rules. #issuewarning You need to inform people, and preferably obtain WRITTEN consent, when you store + emails and + names or even pseudonyms such as Twitter handles. We are very sorry about these issues; we didn’t + invent them. #endissuewarning + +

    +

    2.6    Other issues

    +

    Do you make use of other national/funder/sectorial/departmental procedures + for data management? If yes, which ones?

    +

    Yes, the $_PROJECT will use common Research Data Management (RDM) tools such as #if$_DATAPLANT|$_NFDI resources developed by the NFDI in Germany,#endif$_DATAPLANT|$_NFDI #if$_FRENCH infrastructure developed by INRAE in France, #endif$_FRENCH #if$_EOSC and cloud services provided by the EOSC (European Open Science Cloud)#endif$_EOSC. +

    +

    +

    +

    3     + Annexes

    +

    +

    3.1     Abbreviations

    +

    + #if$_DATAPLANT

    ARC Annotated Research Context +

    #endif$_DATAPLANT + +

    CC Creative Commons

    +

    CC REL Creative Commons Rights Expression + Language

    +

    DDBJ DNA Data Bank of Japan

    +

    DMP Data Management Plan

    +

    DoA Description of Action

    +

    DOI Digital Object Identifier

    +

    EBI European Bioinformatics Institute

    +

    ENA European Nucleotide Archive

    +

    EU European Union

    +

    FAIR Findable Accessible Interoperable + Reusable

    +

    GDPR General Data Protection Regulation (of the + EU)

    +

    IP Intellectual Property

    +

    ISO International Organization for + Standardization

    +

    MIAMET Minimal Information about Metabolite + experiment

    +

    MIAPPE Minimum Information About a Plant Phenotyping + Experiment

    +

    MinSEQe Minimum Information about a high-throughput + Sequencing Experiment

    +

    NCBI National Center for Biotechnology + Information

    +

    NFDI National Research Data Infrastructure (of + Germany)

    +

    NGS Next Generation Sequencing

    +

    RDM Research Data Management

    +

    RNASeq RNA Sequencing

    +

    SOP Standard Operating Procedures

    +

    SRA Sequence Read Archive

    + #if$_DATAPLANT

    SWATE Swate Workflow Annotation Tool + for Excel

    #endif$_DATAPLANT +

    ONP Oxford Nanopore

    +

    qRTPCR quantitative real-time polymerase chain reaction

    +

    WP Work Package

    +

    +

    +

    +

    +

    +
    +
    +
    +
    +
    +
    +
    + +
    +
    +
    + +
    +
    +
    + + +
    +

    Data Management Plan of the Horizon Europe Project $_PROJECT

    +
    + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    +

    Action Number:

    +
    +

    $_FUNDINGPROGRAMME

    +
    +

    Action Acronym:

    +
    +

    $_PROJECT

    +
    +

    Action Title:

    +
    +

    $_PROJECT

    +
    +

    Creation Date:

    +
    +

    $_CREATIONDATE

    +
    +

    Modification Date:

    +
    +

    $_MODIFICATIONDATE

    +
    + +
    +
    + +
    + +
    + +

    Introduction

    +

    #if$_EU The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. + #endif$_EU To benefit fully from open data, it is necessary not only to store data but also to make data + Findable, Accessible, Interoperable, and Reusable (FAIR).#if$_PROTECT We support open and FAIR data; + however, we also consider the need to protect individual data sets. #endif$_PROTECT + +

    +

    + + This document sets out the principles guiding data management in the + $_PROJECT and describes which data will be stored, using the responses to the EU + Data Management Plan (DMP) questionnaire as the DMP document. + +

    +

    + + The DMP describes how data will be handled during and after the project. The $_PROJECT DMP + follows the requirements of Horizon Europe and the Horizon Europe Online Manual. #if$_UPDATE It will be + updated and its validity checked several times during the + $_PROJECT, at least at month $_UPDATEMONTH. + #endif$_UPDATE +

    + +

    1.    Data Summary

    +

    Will you re-use any existing data and what will you re-use it for? State the + reasons if re-use of any existing data has been considered but discarded.

    +

    + + + The project builds on and relies on existing data sets. #if$_RNASEQ For instance, without a + proper genomic reference, it is very difficult to analyze NGS data sets.#endif$_RNASEQ It is also + important to include existing data sets on the expression and metabolic behaviour of $_STUDYOBJECT, + as well as existing characterizations and background knowledge#if$_PARTNERS of the + partners#endif$_PARTNERS. Genomic references can simply be obtained from reference databases for + genomes/sequences, such as the National Center for Biotechnology Information (NCBI, US), the European + Bioinformatics Institute (EBI, EU), and the DNA Data Bank of Japan (DDBJ, JP). Furthermore, prior + 'unstructured' data in the form of publications, and the data contained therein, will be used for decision + making. + +

    +

    What types and formats of data will the project generate or re-use? +

    + + +

    + + The $_PROJECT will collect and/or generate the following types of raw data: $_PHENOTYPIC, + $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEoMIC, $_TARGETED, $_MODELS, $_CODE, + $_EXCEL, $_CLONED-DNA data related to $_STUDYOBJECT. In addition, the raw data will also + be processed and modified using analytical pipelines, which may yield different results or include + ad hoc data analysis steps. #if$_DATAPLANT These pipelines will be tracked in the DataPLANT + ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources + (including the analytical pipelines) as well#if$_DATAPLANT, relying on the expertise in the DataPLANT + consortium#endif$_DATAPLANT. + +

    + +

    What is the purpose of the data generation or re-use and its relation to the + objectives of the project?

    +

    + + The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION + and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization + #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary + #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely + necessary #endif!$_DATAPLANT because the data are used not only to understand underlying principles but also + to trace the provenance of data analysis results. Stakeholders must also be informed + about the provenance of the data. It is therefore necessary to ensure that the data are carefully generated + and well annotated with metadata using open standards, as laid out in the next section. + +

    + + +

    What is the expected size of the data that you intend to generate or + re-use?

    +

    We expect to generate approximately $_RAWDATA GB of raw data. The size + of the derived data will be about $_DERIVEDDATA GB.

    + +

    What is the origin/provenance of the data, either generated or + re-used?

    +

    Public data will be extracted as described in the previous paragraph. For the + $_PROJECT, specific data sets will be generated by the consortium partners.

    + + +

    + Data of different types or representing different domains will be generated using + domain-specific approaches, for example: +

    +
      + + + #if$_GENETIC +
    • +

      + Genetic data will be generated from crosses and breeding experiments and + will include recombination frequencies and crossover events that position genetic markers and + quantitative trait loci (QTL), which can be associated with physical genomic markers/variants. + +

      +
    • + #endif$_GENETIC + + #if$_GENOMIC +
    • +

      + Genomic data will be created from sequencing data, which will be processed to + identify genes, regulatory elements, transposable elements, and physical markers such as + SNPs, microsatellites and structural variants. +

      +
    • + #endif$_GENOMIC + + #if$_CLONED-DNA +
    • +

      + Documentation of the origin and assembly of cloned DNA will include (a) the source of the original + vector sequence, with an Addgene reference where available, and the source of the insert DNA (e.g., + amplification by PCR from a given sample, or obtained from an existing library), (b) the cloning + strategy (e.g., restriction endonuclease digests/ligation, PCR, TOPO cloning, Gibson + assembly, LR recombination), and (c) the verified DNA sequence of the final recombinant vector. + +

      +
    • + #endif$_CLONED-DNA + + #if$_TRANSCRIPTOMIC +
    • +

      + Methods of transcriptomics data collection will be selected from microarrays, + quantitative PCR, Northern blotting, RNA immunoprecipitation, and fluorescence in situ + hybridization. RNA-Seq data will be collected using separate methods. +

      +
    • + #endif$_TRANSCRIPTOMIC + + #if$_RNASEQ +
    • +

      + RNA sequencing data will be generated using short-read or long-read platforms, + either in-house or outsourced to academic facilities or commercial services, and the raw + data will be processed using established bioinformatics pipelines. +

      +
    • + #endif$_RNASEQ + #if$_METABOLOMIC +
    • +

      + Metabolomic data will be generated by coupled chromatography and mass + spectrometry using targeted or untargeted approaches. +

      +
    • + #endif$_METABOLOMIC #if$_PROTEOMIC +
    • +

      + Proteomic data will be generated using coupled chromatography and mass + spectrometry for the analysis of protein abundance and protein identification, as well as + additional techniques for structural analysis, the identification of post-translational + modifications and the characterization of protein interactions. +

      +
    • + #endif$_PROTEOMIC + + #if$_PHENOTYPIC +
    • +

      + Phenotypic data will be generated using phenotyping platforms and + corresponding ontologies, and will include the number/size of organs such as leaves, flowers, and buds, + the size of the whole plant, stem/root architecture (number of lateral branches/roots, etc.), + organ structures/morphologies, and quantitative metrics such as color, turgor, and health/nutrition + indicators, among others. +

      +
    • + #endif$_PHENOTYPIC + + + #if$_TARGETED +
    • +

      + Targeted assay data (e.g. glucose and fructose concentrations or + production/utilization rates) will be generated using specific equipment and methods that + are fully documented in the laboratory notebook. +

      +
    • + #endif$_TARGETED + + + #if$_IMAGE +
    • +

      + Image data will be generated using equipment such as cameras, scanners, and + microscopes combined with software. Original images, which contain metadata such as EXIF + photo information, will be archived. +

      +
    • + #endif$_IMAGE + + #if$_MODELS +
    • +

      + Model data will be generated by using software simulations. The complete + workflow, which includes the environment, runtime, parameters, and results, will be + documented and archived. +

      +
    • + #endif$_MODELS + + #if$_CODE +
    • +

      + Computer code will be produced by programmers. +

      +
    • + #endif$_CODE + + #if$_EXCEL +
    • +

      + Spreadsheet (Excel) data will be generated by data analysts using Microsoft Excel or + open-source spreadsheet software. +

      +
    • + #endif$_EXCEL + + + + +
    + + + #if$_PREVIOUSPROJECTS +

    Data from previous projects such as $_PREVIOUSPROJECTS will be considered. +

    + #endif$_PREVIOUSPROJECTS + +

    To whom might it be useful ('data utility'), outside your project?

    +

    + + The data will initially benefit the $_PROJECT partners but will also be made available to selected + stakeholders closely involved in the project and, subsequently, to the scientific community working on + $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also + use the data after publication. The data will be disseminated according to the $_PROJECT's + dissemination and communication plan#if$_DATAPLANT, which is aligned with the DataPLANT platform and other + channels#endif$_DATAPLANT. + +

    +

    + + $_DATAUTILITY + +

    +

    2    FAIR data

    +

    2.1. Making data findable, including provisions for metadata

    +

    + + Will data be identified by a persistent identifier? + +

    +

    + All data sets will receive unique identifiers, and they will be annotated with + metadata. + +

    + +

    + Will rich metadata be provided to allow discovery? What metadata will be created? What + disciplinary or general standards will be followed? In case metadata standards do not exist in your + discipline, please outline what type of metadata will be created and how. + +

    +

    + + All datasets will be associated with unique identifiers and will be annotated with metadata. We will + use the Investigation, Study, Assay (ISA) specification for metadata creation. The $_PROJECT will rely + on community standards plus additional recommendations applicable in plant science, such as the + #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping + Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: + Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: + Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing + Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray + Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction Experiment),#endif$_MIMIX + #endif$_PROTEOMIC + + + + These specific standards, unlike cross-domain minimal sets such as Dublin Core, which + mostly define the submitter and the general type of
data, allow reuse by other researchers because they + define properties of the plant (see the preceding section). However, minimal cross-domain + annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also + remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow + individual releases to be tagged with a Digital Object Identifier (DOI). + #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered + to. #endif$_OTHERSTANDARDS + +

    +

    Will search keywords be provided in the metadata to optimize the possibility + for discovery and then potential re-use?

    +

    + + Keywords about the experiment and the consortium will be included, as well as an abstract + about the data where useful. In addition, certain keywords can be auto-generated from dense + metadata and the + underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with standardized + DataPLANT ontologies, which are extended where existing ontologies do not yet cover the variables. + #endif$_DATAPLANT + +

    +

    Will metadata be offered in such a way that it can be harvested and + indexed?

    +

    + + To maintain data integrity and to be able to re-analyze data, data sets will receive version numbers + where this is useful (raw data, for example, are considered immutable: they must not be changed and + therefore do not receive a version number). #if$_DATAPLANT This is supported automatically by DataPLANT's Git-based ARC infrastructure. + #endif$_DATAPLANT + + + Data variables will be assigned standard names. For example, genes, proteins and metabolites will + be named according to approved nomenclature and conventions. These will also be linked to functional + ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by + humans. Plant names will include traditional names, binomials, and all + strain/cultivar/subspecies/variety identifiers. +

    + + + +

    2.2.    Making data accessible

    +

    Repository

    +

    Will the data be deposited in a trusted repository?

    + +

    + + Data will be made available via the $_PROJECT platform through a user-friendly front end that allows + data visualization. In addition, data that can be stored in + international discipline-specific repositories, which use specialized technologies, will be deposited there: + + + +

    + #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC +

    + +

    + #if$_TRANSCRIPTOMIC For transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

    + +

    + #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE + #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE +

    + +

    + #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT + #endif$_METABOLOMIC +

    +

    + #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB + PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of + Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

    +

    + #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & + Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

    + #if$_OTHEREP In addition, $_OTHEREP will be used to store and process data.#endif$_OTHEREP + +

    + + +

    Have you explored appropriate arrangements with the identified repository + where your data will be deposited?

    +

    + + Submission is free of charge, and it is the goal of these repositories (at least of ENA) to obtain as much data as possible. + Therefore, specific arrangements are neither necessary nor useful. Catch-all repositories are not required. + #if$_DATAPLANT For DataPLANT, this has been agreed upon, as the repositories of the + International Nucleotide Sequence Database Collaboration (INSDC) will be used. #endif$_DATAPLANT + #issuewarning If no data management platform such as DataPLANT is used, you need to find an + appropriate repository to store or archive your data after publication. #endissuewarning + +

    + +

    Does the repository ensure that the data is assigned an identifier? Will the + repository resolve the identifier to a digital object?

    +

    + + Data will be stored in the following repositories: + + + +

    + #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC +

    + +

    + #if$_TRANSCRIPTOMIC For transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

    + +

    + #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE + #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE +

    + +

    + #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT + #endif$_METABOLOMIC +

    +

    + #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB + PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of + Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

    +

    + #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & + Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

    + #if$_OTHEREP In addition, $_OTHEREP will be used to store and process data.#endif$_OTHEREP Unstructured, less standardized data (e.g. experimental + phenotypic measurements) will be annotated with metadata and, once complete, given a digital object + identifier (DOI)#if$_DATAPLANT, and the complete data sets wrapped into an ARC will receive DOIs as well#endif$_DATAPLANT. + +

    + These repositories are the most appropriate for the respective data types. + +

    Data:

    +

    Will all data be made openly available? If certain datasets cannot be shared + (or need to be shared under restricted access conditions), explain why, clearly separating legal and + contractual reasons from intentional restrictions. Note that in multi-beneficiary projects it is + also possible for specific beneficiaries to keep their data closed if opening their data goes + against their legitimate interests or other constraints as per the Grant Agreement.

    +

    + + By default, all data sets from the $_PROJECT will be shared with the community and made openly + available. However, before the data are released, all partners will be given the opportunity to check + for potential IP (according to the consortium agreement and background IP rights). #if$_INDUSTRY + This applies in particular to data pertaining to industry partners. #endif$_INDUSTRY IP protection will + be prioritized for datasets that offer the potential for exploitation. + +

    +

    + + Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their + data closed if relevant provisions are made in the consortium agreement and are in line with the + reasons + for opting out. + +

    + +

    If an embargo is applied to give time to publish or seek protection of the + intellectual property (e.g. patents), specify why and how long this will apply, bearing in mind that + research data should be made available as soon as possible.

    +

    + + #if$_early Some raw data are made public as soon as they are collected and processed.#endif$_early + #if$_beforepublication Relevant processed datasets are made public when the research findings are + published.#endif$_beforepublication #if$_endofproject At the end of the project, all data without an + embargo period will be published.#endif$_endofproject #if$_embargo Data subject to an + embargo period will not be publicly accessible until the embargo period ends.#endif$_embargo + #if$_request Data are made available upon request, allowing controlled sharing while ensuring + responsible use.#endif$_request #if$_ipissue IP issues will be checked before publication. + #endif$_ipissue All consortium partners will be + encouraged to make data available before publication, openly and/or under pre-publication agreements#if$_GENOMIC, + such as those initiated in Fort Lauderdale and set forth by the Toronto International Data Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are + complete. + +

    + +

    Will the data be accessible through a free and standardized access protocol? +

    +

    + + #if$_DATAPLANT DataPLANT stores data in the ARC, which is a Git repository. The DataHUB, a GitLab instance, shares data and + metadata. The Git and web (HTTP) protocols are open and freely accessible. + In addition, #endif$_DATAPLANT Zenodo and the endpoint repositories will also be used for access. In + general, web-based protocols are free and standardized. + +

    + +

    If there are restrictions on use, how will access be provided to the data, + both during and after the end of the project? +

    +

    + + There are no restrictions, beyond the aforementioned IP checks, which are in line with e.g. European + open data policies. + +

    + + +

    How will the identity of the person accessing the data be ascertained? +

    +

    + + If data are shared only within the consortium (e.g. because they are not yet finalized or are under IP + review), they are hosted internally, and a username and password will be required (see also our GDPR + rules). When data are made public in the final EU or US repositories, completely anonymous + access is normally allowed. This is also the case for ENA, and both approaches are in line with GDPR + requirements. + #if$_DATAPLANT Currently, data management relies on the Annotated Research Context (ARC). It is + password-protected, so authentication is required before any data can be obtained or samples + generated. #endif$_DATAPLANT + + +

    +

    Is there a need for a data access committee (e.g. to evaluate/approve access + requests to personal/sensitive data)? +

    +

    + + Consequently, there is no need for a committee. + + +

    + + + + + +

    + + Metadata: + +

    +

    + + Will metadata be made openly available and licenced under a public domain dedication CC0, as per the + Grant Agreement? If not, please clarify why. Will metadata contain information to enable the user to + access the data? + +

    +

    + + Yes, where possible. For example, CC REL will be used for data not submitted to specialized repositories + such as ENA. + +

    +

    + + How long will the data remain available and findable? Will metadata be guaranteed to remain + available after data is no longer available? + +

    +

    + +

    The data will be made available for many years#if$_DATAPLANT and + ideally indefinitely after the end of the project#endif$_DATAPLANT. + In any case, data submitted to repositories (as detailed above), e.g. ENA or PRIDE, will be + subject to the local data storage regulations of those repositories. + +

    + + +

    + +

    + + Will documentation or reference about any software needed to access or read the data be included? + Will it be possible to include the relevant software (e.g. in open source code)? + +

    +

    + + #if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. #endif$_PROPRIETARY + #if!$_PROPRIETARY No specialized software will be needed to access the data; usually a modern + browser is sufficient. Access will be possible through web interfaces. For data processing after obtaining raw + data, typical open-source software can be used. #endif!$_PROPRIETARY + #if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC + Commander, and the DMP tool, which are not required for data access but can make interaction with the data more + convenient. #endif$_DATAPLANT #if$_DATAPLANT These DataPLANT resources are well described, and + their setup is documented on their GitHub project pages. #endif$_DATAPLANT + As stated above, we use publicly available, open-source, well-documented, and certified software#if$_PROPRIETARY, except for $_PROPRIETARY#endif$_PROPRIETARY. + +

    + +

    2.3. Making data interoperable

    +

    + + What data and metadata vocabularies, standards, formats or methodologies will you follow to make + your data interoperable to allow data exchange and re-use within and across disciplines? Will you + follow community-endorsed interoperability best practices? Which ones? + +

    +

    + + + As noted above, we foresee using minimal standards such as the #if$_PHENOTYPIC #if$_MIAPPE MIAPPE + (Minimum Information About a Plant Phenotyping Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: + Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: + Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing + Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray + Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction Experiment),#endif$_MIMIX + #endif$_PROTEOMIC + + + + These specific standards, unlike cross-domain minimal sets such as Dublin Core, which + mostly define the submitter and the general type of data, allow reuse by other researchers because they + define properties of the plant (see the preceding section). 
However, minimal cross-domain + annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also + remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow + individual releases to be tagged with a Digital Object Identifier (DOI). + #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT are also adhered + to. #endif$_OTHERSTANDARDS + +

    + Whenever possible, data will be stored in common and openly defined formats, including all the necessary + metadata to interpret and analyze data in a biological context. By default, no proprietary formats will be + used. However, Microsoft Excel files (Office Open XML, ISO/IEC 29500-1:2016) may be used as intermediates by + the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text documents may be + edited in word processors but will be shared as PDF. + Open ontologies will be used where they are mature. As stated above, some ontologies and controlled + vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the advanced + ontologies developed in DataPLANT. #endif$_DATAPLANT + + + +

    + + + +

    + + In case it is unavoidable that you use uncommon or generate project specific ontologies or + vocabularies, will you provide mappings to more commonly used ontologies? Will you openly publish + the generated ontologies or vocabularies to allow reusing, refining or extending them? + +

    +

    + + Common and open ontologies will be used; in particular, open biomedical ontologies will be used where they + are mature. As stated in the previous question, ontologies and controlled vocabularies + might sometimes have to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the DataPLANT Biology + Ontology (DPBO) developed in DataPLANT. #endif$_DATAPLANT Ontology resources such as the OBO Foundry + will be used to publish the ontology. #if$_DATAPLANT The DPBO is also published on GitHub at + https://github.com/nfdi4plants/nfdi4plants_ontology. #endif$_DATAPLANT + + +

    + +

    + + Will your data include qualified references to other data (e.g. other data from your project, or + datasets from previous research)? + +

    +

    + + References to other data will be made in the form of DOIs and ontology terms. + +

    + +

    2.4. Increase data re-use

    +

    How will you provide documentation needed to validate data analysis and + facilitate data re-use (e.g. readme files with information on methodology, codebooks, data cleaning, + analyses, variable definitions, units of measurement, etc.)?

    + +

    + + The documentation will be provided in the form of ISA (Investigation, Study, Assay) metadata and CWL (Common + Workflow Language) workflow descriptions. #if$_DATAPLANT Here, the $_PROJECT will build on the ARC container, which + includes all the data, metadata, and documentation. #endif$_DATAPLANT + +

    +

    Will your data be made freely available in the public domain to permit the + widest re-use possible? Will your data be licensed using standard reuse licenses, in line with the + obligations set out in the Grant Agreement? +

    + +

    + + Yes, our data will be made freely available in the public domain to permit the widest re-use + possible. Open licenses, such as Creative Commons (CC), will be used whenever possible. + +

    +

    Will the data produced in the project be useable by third parties, in + particular after the end of the project? +

    + +

    + + There will be no restrictions once the data is made public. + +

    +

    Will the provenance of the data be thoroughly documented using the + appropriate standards? Describe all relevant data quality assurance processes. +

    + +

    + + The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION + and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization + #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary + #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely + necessary #endif!$_DATAPLANT because the data are used not only to understand underlying principles but also + to trace the provenance of data analysis results. Stakeholders must also be informed + about the provenance of the data. It is therefore necessary to ensure that the data are carefully generated + and well annotated with metadata using open standards, as laid out in Section 2.1. + +

    + +

    Describe all relevant data quality assurance processes. Further to the FAIR + principles, DMPs should also address research outputs other than data, and should carefully consider + aspects related to the allocation of resources, data security and ethical aspects. +

    + +

    + + The data will be checked and curated using data collection protocols, personnel training, data + cleaning, data analysis, and quality control. #if$_DATAPLANT Furthermore, data will be analyzed for + quality control (QC) problems using automatic procedures as well as by manual curation. + #endif$_DATAPLANT All data quality assurance processes, including the data collection + protocol, data cleaning procedures, data analysis techniques, and quality control measures, will be documented. This + documentation will be kept for future reference and made available to stakeholders upon + request. + + +

    +

    3.    Other research outputs

    +

    In addition to the management of data, beneficiaries should also consider and + plan for the management of other research outputs that may be generated or re-used throughout their + projects. Such outputs can be either digital (e.g. software, workflows, protocols, models, etc.) or + physical (e.g. new materials, antibodies, reagents, samples, etc.). + +

    +

    + + In the current data management plan, any digital output, including but not limited to software, + workflows, protocols, models, documents, templates, and notebooks, is treated as data. Therefore, + all aforementioned digital objects are already described in detail. For the non-digital objects, the + data management plan will be closely connected to the digitalisation of the physical objects. + #if$_DATAPLANT The $_PROJECT will build a workflow that connects the ARC with an electronic lab + notebook in order to also manage the physical objects. #endif$_DATAPLANT + + +

    + +

    Beneficiaries should consider which of the questions pertaining to FAIR data + above, can apply to the management of other research outputs, and should strive to provide + sufficient detail on how their research outputs will be managed and shared, or made available for + re-use, in line with the FAIR principles. + +

    +

    + + Open licenses, such as Creative Commons (CC), will be used whenever possible, including for the other digital + objects. + + +

    + + + + + +

    4.    Allocation of resources

    +

    What will the costs be for making data or other research outputs FAIR in + your project (e.g. direct and indirect costs related to storage, archiving, re-use, security, + etc.)?

    +

    The $_PROJECT will bear the costs of data curation, #if$_DATAPLANT ARC + consistency checks, #endif$_DATAPLANT and data maintenance/security before transfer to public + repositories. Subsequent costs are then borne by the operators of these repositories.

    +

    + + Additionally, costs for post-publication storage are incurred by end-point repositories (e.g. ENA); + these are not charged to the $_PROJECT or its members but are covered by the operating budgets of these + repositories. + +

    +

    + How will these be covered? Note that costs related to research data/output management + are eligible as part of the Horizon Europe grant (if compliant with the Grant Agreement + conditions) +

    +

    The costs borne by the $_PROJECT are covered by the project funding. + Pre-existing structures#if$_DATAPLANT, such as the infrastructure, tools, and knowledge established in the + DataPLANT consortium,#endif$_DATAPLANT will also be used.

    +

    Who will be responsible for data management in your project?

    +

    The responsible person will be $_DATAOFFICER of the $_PROJECT.

    +

    How will long term preservation be ensured? Discuss the necessary resources + to accomplish this (costs and potential value, who decides and how, what data will be kept and for + how long)?

    +

    + + The data officer #if$_PARTNERS or $_PARTNERS #endif$_PARTNERS will ultimately decide on the + strategy to preserve data that are not submitted to end-point subject area repositories + #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT when the + project ends. This will be in line with EU guidelines, institutional policies, and data sharing based on + EU and international standards. + +

    +

    5.    Data security

    +

    What provisions are or will be in place for data security (including data + recovery as well as secure storage/archiving and transfer of sensitive data)?

    +

    + + Online platforms will be protected by vulnerability scanning, two-factor authentication and daily + automatic backups allowing immediate recovery. All partners holding confidential project data will use + secure platforms with automatic backups and offsite secure copies. + #if$_DATAPLANT Once ARCs have been generated in the DataPLANT DataHUB, data security measures will be enforced. + This comprises secure storage; usernames and passwords are generally transferred via + separate secure media.#endif$_DATAPLANT + +

    +

    Will the data be safely stored in trusted repositories for long term + preservation and curation?

    +

    + + Data will be made available via the $_PROJECT platform using a user-friendly front end that allows + data visualization. In addition, it will be ensured that data which can be stored in + international, discipline-related repositories using specialized technologies are deposited in them: + + + +

    + #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC +

    + +

    + #if$_TRANSCRIPTOMIC For transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

    + +

    + #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE + #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE +

    + +

    + #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT + #endif$_METABOLOMIC +

    +

    + #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB + PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of + Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

    +

    + #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & + Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

    + + #if$_OTHEREP In addition, $_OTHEREP will be used to store and + process data.#endif$_OTHEREP + +

    +

    6.    Ethics

    +

    + + Are there, or could there be, any ethics or legal issues that can have an impact on data sharing? + These can also be discussed in the context of the ethics review. If relevant, include references to + ethics deliverables and ethics chapter in the Description of the Action (DoA). + +

    +

    + + At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, + since these are plant data, there is no need for an ethics committee; however, due diligence regarding plant + resource benefit sharing is considered. #issuewarning You have to check this and enter any due diligence measures here. At the + moment, it is still open whether sequence information will also fall under the Nagoya Protocol (🡺 see Nagoya Protocol). In + any case, if you use material that does not originate from your (partner) country and characterize it physically (e.g., + metabolites, proteome) or biochemically (e.g., RNA-Seq), this might represent a Nagoya-relevant action, + unless the material comes from, e.g., the + US (non-party) or Ireland (not signed; still contact them); note that other laws might still apply. + #endissuewarning + +

    +

    Will informed consent for data sharing and long term preservation be + included in questionnaires dealing with personal data?

    +

    + + The only personal data that will potentially be stored are the submitters' names and affiliations in the + data's metadata. In addition, personal data will be collected for dissemination and communication + activities using specific methods and procedures developed by the $_PROJECT partners to comply with + data protection regulations. #issuewarning You need to inform people, and preferably obtain WRITTEN consent, when you store + e-mail addresses and + names or even pseudonyms such as Twitter handles. We are very sorry about these issues; we did not + invent them. #endissuewarning + +

    +

    7.    Other issues

    +

    Do you, or will you, make use of other national/funder/sectorial/departmental + procedures for data management? If yes, which ones (please list and briefly describe them)? +

    +

    Yes, the $_PROJECT will use common Research Data Management (RDM) tools such as #if$_DATAPLANT|$_NFDI resources developed by the NFDI of Germany,#endif$_DATAPLANT|$_NFDI #if$_FRENCH infrastructure developed by INRAE in France, #endif$_FRENCH #if$_EOSC and cloud services provided by the EOSC (European Open Science Cloud)#endif$_EOSC. +

    +

    +

    +

    3     + Annexes

    +

    +

    3.1     Abbreviations

    +

    + #if$_DATAPLANT

    ARC Annotated Research Context +

    #endif$_DATAPLANT + +

    CC Creative Commons

    +

    CC REL Creative Commons Rights Expression + Language

    +

    DDBJ DNA Data Bank of Japan

    +

    DMP Data Management Plan

    +

    DoA Description of Action

    +

    DOI Digital Object Identifier

    +

    EBI European Bioinformatics Institute

    +

    ENA European Nucleotide Archive

    +

    EU European Union

    +

    FAIR Findable Accessible Interoperable + Reusable

    +

    GDPR General Data Protection Regulation (of the + EU)

    +

    IP Intellectual Property

    +

    ISO International Organization for + Standardization

    +

    MIAMET Minimal Information about Metabolite + experiment

    +

    MIAPPE Minimum Information About a Plant Phenotyping + Experiment

    +

    MinSEQe Minimum Information about a high-throughput + Sequencing Experiment

    +

    NCBI National Center for Biotechnology + Information

    +

    NFDI National Research Data Infrastructure (of + Germany)

    +

    NGS Next Generation Sequencing

    +

    RDM Research Data Management

    +

    RNASeq RNA Sequencing

    +

    SOP Standard Operating Procedures

    +

    SRA Sequence Read Archive

    + #if$_DATAPLANT

    SWATE Swate Workflow Annotation Tool + for Excel

    #endif$_DATAPLANT +

    ONP Oxford Nanopore

    +

    qRTPCR quantitative real-time + polymerase chain reaction

    +

    WP Work Package

    +

    +

    +

    +

    +

    + +
    +
    +
    +
    +
    +
    +
    +
    +
    + +
    Data Management Plan of the DFG Project $_PROJECT
    +
    +

    + 1.    Data description +

    +

    + 1.1    Introduction +

    + + #if$_EU +

    The $_PROJECT is part of the Open Data Initiative (ODI) of the EU. + #endif$_EU To benefit fully from open data, it is necessary not only to store data but also to make them + Findable, Accessible, Interoperable and Reusable (FAIR). #if$_PROTECT Open and FAIR data management, however, + also takes into account the need + to protect individual data sets. #endif$_PROTECT + +

    +

    + + The aim of this document is to set out the principles guiding data management in the + $_PROJECT and to describe which data will be stored. The document was generated from the responses to the DFG Data Management Plan (DMP) + checklist. + +

    +

    + + This detailed DMP describes how data will be handled during and after the project. The $_PROJECT DMP + is structured according to the DFG data management checklist. #if$_UPDATE It will be updated, or its + validity + checked, several times during the $_PROJECT. At the very least, this will happen in month + $_UPDATEMONTH. #endif$_UPDATE + +

    + +

    + 1.2    How does your project generate new data? +

    +

    + Data of different types or from different domains will be generated using different methods. For + example: +

    +
      + #if$_TRANSCRIPTOMIC +
    • +

      + Methods of transcriptomics data collection will be selected from microarrays, + quantitative PCR, Northern blotting, RNA immunoprecipitation, and fluorescence in situ + hybridization. RNA-Seq data will be collected using separate methods. +

      +
    • + #endif$_TRANSCRIPTOMIC + #if$_RNASEQ +
    • +

      + RNA sequencing data will be generated using short-read or long-read platforms, + either in-house or outsourced to academic facilities or commercial services, and the raw + data will be processed using established bioinformatics pipelines. +

      +
    • + #endif$_RNASEQ + + #if$_METABOLOMIC +
    • +

      + Metabolomic data will be generated mostly using chromatography coupled to mass + spectrometry and enzyme-based platforms. +

      +
    • + #endif$_METABOLOMIC #if$_PROTEOMIC +
    • +

      + Proteomic data will be generated using an EU platform that is in line with + community standards. +

      +
    • + #endif$_PROTEOMIC + + #if$_IMAGE +
    • +

      + Image data will be generated using equipment (cameras, scanners, and + microscopes) or software. Original images, which contain metadata such as EXIF + information, will be archived. +

      +
    • + #endif$_IMAGE + + #if$_GENOMIC +
    • +

      + Genomic data will be derived from sequencing data. The sequencing data will be + collected using Next Generation Sequencing (NGS) equipment#if$_PARTNERS or obtained from + partners#endif$_PARTNERS. The sequencing data will then be processed to obtain the genomic + data. +

      +
    • + #endif$_GENOMIC + + #if$_GENETIC +
    • +

      + Genetic data will be generated by using Next Generation Sequencing (NGS) + equipment. +

      +
    • + #endif$_GENETIC + + #if$_TARGETED +
    • +

      + Targeted assay data (e.g. glucose and fructose content) will be generated using + specific equipment or experiments. The procedure will be fully documented in the lab book. + +

      +
    • + #endif$_TARGETED + + + #if$_MODELS +
    • +

      + Model data will be generated by software simulations. The complete workflow, + including the environment, runtime, parameters, and results, will be documented and + archived. +

      +
    • + #endif$_MODELS + + #if$_CODE +
    • +

      + Code will be developed by programmers. +

      +
    • + #endif$_CODE + + #if$_EXCEL +
    • +

      + Excel data will be generated by experimentalists or data analysts using + Microsoft Office or open-source software. +

      +
    • + #endif$_EXCEL + + #if$_CLONED-DNA +
    • +

      + The cloned DNA data will be generated by using a sequencing tool. +

      +
    • + #endif$_CLONED-DNA + + #if$_PHENOTYPIC +
    • +

      + Phenotypic data will be generated using phenotyping platforms. +

      +
    • + #endif$_PHENOTYPIC +
    + +

    + + The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data collection#if!$_VVISUALIZATION + and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, integration and visualization + #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC structure are absolutely necessary + #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data management process are absolutely + necessary #endif!$_DATAPLANT because the data are used not only to understand underlying principles but also + to trace the provenance of data analysis results. Stakeholders must also be informed + about the provenance of data. It is therefore necessary to ensure that the data are well generated + and also well annotated with metadata using open standards, as laid out in the next section. + +

    + +

    + Public data will be extracted as described in paragraph 1.3. For the $_PROJECT, + specific data sets will be generated by the consortium partners. +

    + +

    + 1.3    Is existing data reused? +

    +

    + + The project builds on and relies on existing data sets. #if$_RNASEQ For instance, without a + proper genomic reference it is very difficult to analyze NGS data sets.#endif$_RNASEQ It is also + important to include existing data sets on the expression and metabolic behaviour of $_STUDYOBJECT, + as well as existing characterizations and the background knowledge#if$_PARTNERS of the + partners#endif$_PARTNERS. Genomic references can simply be gathered from reference databases for + genomes/sequences, like the National Center for Biotechnology Information: NCBI (US); European + Bioinformatics Institute: EBI (Europe); DNA Data Bank of Japan: DDBJ (JP). Furthermore, prior + 'unstructured' data in the form of publications and data contained therein will be used for decision + making. + +

    + +

    + 1.4    Which data types (in terms of data formats like image data, + text data or measurement data) arise in your project and in what way are they further + processed? +

    +

    + + We foresee that at least the following data about $_STUDYOBJECT will be collected and generated: $_PHENOTYPIC, $_GENETIC, $_GENOMIC, $_METABOLOMIC, $_RNASEQ, $_IMAGE, $_PROTEOMIC, + $_TARGETED, + $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA and result data. Furthermore, data derived from the original + raw data sets will also be collected. This is important, as different analytical pipelines + might yield different results or include + + ad-hoc + + data analysis steps#if$_DATAPLANT, and these pipelines will be tracked in the DataPLANT + ARC#endif$_DATAPLANT. Therefore, specific care will be taken to document and archive these + resources (including the + analytical pipelines) as well#if$_DATAPLANT, relying on the extensive expertise in the DataPLANT consortium#endif$_DATAPLANT. + +

    + + 1.5    To what extent do these arise or what is the anticipated data + volume? + +

    + We expect to generate raw data in the range of $_RAWDATA GB. The size of the + derived data will be about $_DERIVEDDATA GB. +

    + +

    + 2.    Documentation and data quality +

    +

    + 2.1.    What approaches are being taken to describe the data in a + comprehensible manner (such as the use of available metadata, documentation standards or + ontologies)? +

    + +

    +

    + + All datasets will be associated with unique identifiers and annotated with metadata. We will + use the Investigation, Study, Assay (ISA) specification for metadata creation. The $_PROJECT will rely + on community standards plus additional recommendations applicable in plant science, such as + #if$_PHENOTYPIC #if$_MIAPPE MIAPPE (Minimum Information About a Plant Phenotyping + Experiment),#endif$_MIAPPE #endif$_PHENOTYPIC + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: + Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: + Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing + Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray + Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction eXperiment),#endif$_MIMIX + #endif$_PROTEOMIC + + + + Unlike cross-domain minimal sets such as Dublin Core, which + mostly define the submitter and the general type of 
data, these specific standards allow reuse by other researchers by + defining properties of the plant (see the preceding section). However, minimal cross-domain + annotations #if$_DUBLINCORE Dublin Core,#endif$_DUBLINCORE #if$_MARC21 MARC 21,#endif$_MARC21 also + remain part of the $_PROJECT. #if$_DATAPLANT The core integration with DataPLANT will also allow + individual releases to be tagged with a Digital Object Identifier (DOI). + #endif$_DATAPLANT #if$_OTHERSTANDARDS Other standards such as $_OTHERSTANDARDINPUT will also be adhered + to. #endif$_OTHERSTANDARDS + +

    +

    + +

    + + Open ontologies will be used where they are mature. Some ontologies and controlled + vocabularies might need to be extended. #if$_DATAPLANT Here, the $_PROJECT will build on the + advanced ontologies developed in DataPLANT. #endif$_DATAPLANT + Keywords about the experiment and the general consortium will be included, as well as an abstract + about the data, where useful. In addition, certain keywords can be auto-generated from dense + metadata and + their underlying ontologies. #if$_DATAPLANT Here, DataPLANT strives to complement these with + standardized DataPLANT ontologies, which are extended where existing ontologies do not yet include the + variables. + #endif$_DATAPLANT + +

    +


    +

    + 2.2    What measures are being adopted to ensure high data + quality? +

    + +

    + + The $_PROJECT has the following aim: $_PROJECTAIM. Therefore, data + collection#if!$_VVISUALIZATION and integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, + integration and visualization #endif$_VVISUALIZATION #if$_DATAPLANT using the DataPLANT ARC + structure are absolutely necessary #endif$_DATAPLANT #if!$_DATAPLANT through a standardized data + management process are absolutely necessary #endif!$_DATAPLANT because the data are used not only to + understand underlying principles but also to trace the provenance of data analysis results. + Stakeholders must also be informed about the provenance of data. It is therefore necessary to ensure + that the data are well generated and also well annotated with metadata using open standards. + + Data variables will be allocated standard names. For example, genes, proteins and metabolites will + be named according to approved nomenclature and conventions. These will also be linked to functional + ontologies where possible. Datasets will also be named in a meaningful way to ensure readability by + humans. Plant names will include traditional names, binomials, and all + strain/cultivar/subspecies/variety identifiers. + + + +

    + +

    + + To maintain data integrity and to be able to re-analyze data, data sets will be assigned version numbers + where this is useful. Raw data, by contrast, must not be changed: they are + considered + immutable and will not receive a version number. #if$_DATAPLANT Versioning is automatically supported by the Git-based ARC infrastructure of DataPLANT. + #endif$_DATAPLANT + +

    +

    + + As mentioned above, we foresee using, e.g., #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MinSEQe for sequencing + data and #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-compatible forms for + metabolites#if$_MIAPPE as well as MIAPPE for phenotyping-like data#endif$_MIAPPE. These standards will + allow the integration of data across projects and ensure that established and tested + protocols are reused. + Additionally, we will enrich the data sets with terms from free and open + ontologies. Furthermore, new ontology terms might be created and canonized during the + $_PROJECT. + +

    +

    + 2.3    Are quality controls in place and if so, how do they + operate? +

    + +

    + + The data will be checked and curated throughout the project period. #if$_DATAPLANT Furthermore, data + will be analyzed for quality control (QC) problems using automatic procedures as well as by manual + curation. + #endif$_DATAPLANT PhD students and laboratory staff will be responsible for first-line quality + control. Afterwards, the data will be checked and annotated by $_DATAOFFICER. #if$_RNASEQ|$_GENOMIC + FastQC will be run on the base-called sequencing reads. #endif$_RNASEQ|$_GENOMIC Before publication, the data + will be checked again. + +

    + +

    + 2.4    Which digital methods and tools (e.g. software) are required + to use the data? +

    +

    + The $_PROJECT will use common Research Data Management (RDM) tools such as #if$_DATAPLANT|$_NFDI resources developed by the NFDI of Germany,#endif$_DATAPLANT|$_NFDI #if$_FRENCH infrastructure developed by INRAE in France, #endif$_FRENCH #if$_EOSC and cloud services provided by the EOSC (European Open Science Cloud)#endif$_EOSC. +

    +

    + #if$_PROPRIETARY The $_PROJECT relies on the tool(s) $_PROPRIETARY. + #endif$_PROPRIETARY +

    +

    + + #if!$_PROPRIETARY No specialized software will be needed to access the data; usually, a modern + web browser is sufficient. Access will be possible through web interfaces. For processing the raw + data, + typical open-source software can be used. As no proprietary software is needed, no documentation + needs to be provided. #endif!$_PROPRIETARY + +

    +

    + + #if$_DATAPLANT DataPLANT resources are well described, and their setup is documented on + their GitHub project pages. + #endif$_DATAPLANT + +

    +

    + + #if$_DATAPLANT DataPLANT offers tools such as the open-source SWATE plugin for Excel, the ARC + Commander, and the DMP tool, which are not strictly required but make the interaction with data more + convenient. + #endif$_DATAPLANT + +

    + + As stated above, we use publicly available, open-source, well-documented, and certified + software#if$_PROPRIETARY, except for $_PROPRIETARY#endif$_PROPRIETARY. +

    + +

    + 3.    Storage and technical archiving the project +

    + +

    + 3.1    How is the data to be stored and archived throughout the + project duration? +

    +

    + + Data will be made available via the $_PROJECT platform using a user-friendly front end that allows + data visualization. In addition, it will be ensured that data which can be stored in + international, discipline-related repositories using specialized technologies are deposited in them: #if$_GENETIC #if$_GENBANK NCBI-GenBank,#endif$_GENBANK #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_GENETIC #if$_TRANSCRIPTOMIC|$_GENETIC #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO + NCBI-GEO,#endif$_GEO #endif$_TRANSCRIPTOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_BIOIMAGE + EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR + #endif$_IMAGE #if$_METABOLOMIC #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS + #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct + (molecular interactions),#endif$_INTACT #endif$_METABOLOMIC #if$_PROTEOMIC #if$_PRIDE + EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI + ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC #if$_PHENOTYPIC #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) + #endif$_edal #endif$_PHENOTYPIC + #if$_OTHEREP and $_OTHEREP will also be used to store and process + data.#endif$_OTHEREP + +

    +

    + Data will be made available for many years#if$_DATAPLANT and potentially indefinitely + after the end of the project#endif$_DATAPLANT. +

    +

    + In any case, data submitted to international, + discipline-related repositories that use specialized technologies (as detailed above), e.g. ENA + or PRIDE, would be subject to local data storage regulations. +

    +

    + 3.2    What is in place to secure sensitive data throughout the + project duration (access and usage rights)? +

    +

    + + #if$_DATAPLANT In DataPLANT, data management relies on the Annotated Research Context (ARC). ARCs are + password-protected, so authentication is required before any data can be obtained or samples generated. + #endif$_DATAPLANT + +

    +

    + + If data are only shared within the consortium, e.g. because they are not yet final or are undergoing IP + checks, they are hosted internally, and a username and password will be required (see also + our GDPR rules). + Once data are made public in final EU or US repositories, completely anonymous access is + normally allowed. This is also the case for ENA, and such access is in line with GDPR requirements. + +

    +

    + There will be no restrictions once the data is made public. +

    + +

    + 4.    Legal obligations and conditions +

    + +

    + 4.1    What are the legal specifics associated with the handling of + research data in your project? +

    + +

    + + At the moment, we do not anticipate ethical or legal issues with data sharing. In terms of ethics, + since these are plant data, there is no need for an ethics committee; however, due diligence regarding plant + resource benefit sharing is considered. #issuewarning You have to check this and enter any due diligence measures here. At the + moment, it is still open whether sequence information will also fall under the Nagoya Protocol (🡺 see Nagoya Protocol). In + any case, if you use material that does not originate from your (partner) country and characterize it physically (e.g., + metabolites, proteome) or biochemically (e.g., RNA-Seq), this might represent a Nagoya-relevant action, + unless the material comes from, e.g., the + US (non-party) or Ireland (not signed; still contact them); note that other laws might still apply. + #endissuewarning + +

    +

    + + The only personal data that will potentially be stored are the submitters' names and affiliations in the + data's metadata. In addition, personal data will be collected for dissemination and communication + activities using specific methods and procedures developed by the $_PROJECT partners to comply with + data protection regulations. #issuewarning You need to inform people, and preferably obtain WRITTEN consent, when you store + e-mail addresses and + names or even pseudonyms such as Twitter handles. We are very sorry about these issues; we did not + invent them. #endissuewarning + +

    +

    + 4.2    Do you anticipate any implications or restrictions regarding + subsequent publication or accessibility? +

    +

    + + Once data are transferred to the $_PROJECT platform#if$_DATAPLANT and ARCs have been generated in + DataPLANT#endif$_DATAPLANT, data security measures will be enforced. This comprises secure storage; + usernames and passwords are generally transferred via separate secure media. + +

    + +

    + 4.3    What is in place to consider aspects of use and copyright + law as well as ownership issues? +

    +

    + Open licenses, such as Creative Commons (CC), will be used whenever possible. +

    +

    + 4.4    Are there any significant research codes or professional + standards to be taken into account? +

    + +

    + + Whenever possible, data will be stored in common and openly defined formats, including all the + necessary metadata to interpret and analyze data in a biological context. By default, no proprietary + formats will + be used; however, Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as + intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. + In addition, + text documents might be edited in word processors but will be shared as PDF. + +

    + +

    + 5.    Data exchange and long-term data accessibility +

    + +

    + 5.1    Which data sets are especially suitable for use in other + contexts? +

    + +

    + + The data will be useful for the $_PROJECT partners, the scientific community working on + $_STUDYOBJECT, and the general public interested in $_STUDYOBJECT. Hence, the $_PROJECT also strives + to collect the data + that have been disseminated and potentially advertise them#if$_DATAPLANT, e.g. through the DataPLANT + platform or other means,#endif$_DATAPLANT if they are not included in a publication anyway, which is + the most + likely form of dissemination. + +

    + +

    + 5.2    Which criteria are used to select research data to make it + available for subsequent use by others? +

    + +

+ + By default, all data sets from the $_PROJECT will be shared with the community and made openly + available. However, this will only happen after partners have had the opportunity to check for IP protection + (according to + agreements and background rights). #if$_INDUSTRY This applies in particular to data pertaining to + industry. #endif$_INDUSTRY All partners will also strive to protect the IP of data sets, + which will + be checked with due diligence. +

    +

    + + Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their + data closed if relevant provisions are made in the consortium agreement and are in line with the + reasons for opting out. + +

    +

    + 5.3    Are you planning to archive your data in a suitable + infrastructure? +

    +

+ #if$_DATAPLANT As the $_PROJECT is closely aligned with DataPLANT, the ARC converter and DataHUB will be + used to identify suitable end-point repositories and to upload the data to them automatically. #endif$_DATAPLANT +

    + +

+ + Data will be made available via the $_PROJECT platform using a user-friendly front end that allows + data visualization. In addition, it will be ensured that data that can be stored in + international, discipline-specific repositories using specialized technologies are deposited there: + + +

    + #if$_GENETIC For genetic data: #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO #endif$_GENETIC +

    + +

    + #if$_TRANSCRIPTOMIC For Transcriptomic data: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

    + +

    + #if$_IMAGE For image data: #if$_BIOIMAGE EBI-BioImage Archive,#endif$_BIOIMAGE + #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE +

    + +

+ #if$_METABOLOMIC For metabolomic data: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (molecular interactions),#endif$_INTACT + #endif$_METABOLOMIC +

    +

+ #if$_PROTEOMIC For proteomics data: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE #if$_PDB + PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities of + Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

    + +

    + #if$_PHENOTYPIC For phenotypic data: #if$_edal e!DAL-PGP (Plant Genomics & + Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

+ #if$_OTHEREP In addition, $_OTHEREP will be used to store and process the + data.#endif$_OTHEREP +

    +

+ + Submission is free of charge, and it is the goal (at least of ENA) to obtain as much data as possible. + Therefore, prior arrangements are neither necessary nor useful. Catch-all repositories are not required. + #if$_DATAPLANT For DataPLANT, this has been agreed upon. #endif$_DATAPLANT #issuewarning If no data + management platform such as DataPLANT is used, you need to find an appropriate repository to store + or archive your data after publication. #endissuewarning +

    +

    + 5.4    If so, how and where? Are there any retention periods? + +

    +

    + There are no restrictions, beyond the aforementioned IP checks, which are in line with + e.g. European open data policies. +

    +

+ + The $_PARTNERS decides on the preservation of data not submitted to end-point subject area repositories + #if$_DATAPLANT or ARCs in DataPLANT#endif$_DATAPLANT after the project ends. This will be in line with EU + and institutional policies and with data sharing based on EU and international standards. +

    +

    + 5.5    When is the research data available for use by third + parties? +

    +

+ + #if$_early Some raw data is made public as soon as it is collected and processed.#endif$_early + #if$_beforepublication Relevant processed datasets are made public when the research findings are + published.#endif$_beforepublication #if$_endofproject At the end of the project, all data not subject to an + embargo period will be published.#endif$_endofproject #if$_embargo Data subject to an + embargo period will not be publicly accessible until the embargo period ends.#endif$_embargo + #if$_request Data is made available upon request, allowing controlled sharing while ensuring + responsible use.#endif$_request #if$_ipissue IP issues will be checked before publication. + #endif$_ipissue All consortium partners will be + encouraged to make data available before publication, openly and/or under pre-publication agreements#if$_GENOMIC, + such as those started in Fort Lauderdale and set forth by the Toronto International Data + Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are + complete. +

    + +

    + 6.    Responsibilities and resources +

    + +

    + 6.1    Who is responsible for adequate handling of the research + data (description of roles and responsibilities within the project)? +

    +

+ $_DATAOFFICER will be responsible as data officer. + The person(s) responsible for the data (data officer#if$_PARTNERS or $_PARTNERS #endif$_PARTNERS) + decide on the preservation of data not submitted to end-point subject area repositories + #if$_DATAPLANT or ARCs in DataPLANT #endif$_DATAPLANT after the + project ends. This will be in line with EU and institutional policies and with data sharing based on EU and + international standards. +

    +

    + 6.2    Which resources (costs; time or other) are required to + implement adequate handling of research data within the project? +

    +

+ The costs comprise data curation#if$_DATAPLANT, ARC consistency checks,#endif$_DATAPLANT + and maintenance on the $_PROJECT's side. +

    +

+ + Additionally, last-level storage costs are incurred by end-point repositories (e.g. ENA); they are not + charged to the $_PROJECT or its members but are covered by the operating budgets of these repositories. +

    +

+ A large part of the cost is covered by the $_PROJECT#if$_DATAPLANT and by the structures, + tools and knowledge established in the DataPLANT consortium#endif$_DATAPLANT. +

    +

    + 6.3    Who is responsible for curating the data once the project + has ended? +

    +

+ As applicable, $_DATAOFFICER, who is responsible for ongoing data maintenance, will also + take care of the data after the end of the $_PROJECT. #if$_DATAPLANT As an external data + archive, DataPLANT may provide such services in some cases. #endif$_DATAPLANT +

    + +

    + +

    +

    + +

    +

    + 7     + Annexes +

    +

    + +

    +

    + 7.1     + + Abbreviations +

    +

    + +

    + #if$_DATAPLANT

    ARC Annotated Research Context +

    #endif$_DATAPLANT + +

    CC Creative Commons

    +

ccREL Creative Commons Rights Expression + Language

    +

    DDBJ DNA Data Bank of Japan

    +

    DMP Data Management Plan

    +

    DoA Description of Action

    +

    DOI Digital Object Identifier

    +

    EBI European Bioinformatics Institute

    +

    ENA European Nucleotide Archive

    +

    EU European Union

    +

FAIR Findable, Accessible, Interoperable, + Reusable

    +

GDPR General Data Protection Regulation (of the + EU)

    +

    IP Intellectual Property

    +

    ISO International Organization for + Standardization

    +

    MIAMET Minimal Information about Metabolite + experiment

    +

MIAPPE Minimum Information About a Plant Phenotyping + Experiment

    +

    MinSEQe Minimum Information about a high-throughput + Sequencing Experiment

    +

    NCBI National Center for Biotechnology + Information

    +

    NFDI National Research Data Infrastructure (of + Germany)

    +

    NGS Next Generation Sequencing

    +

    RDM Research Data Management

    +

    RNASeq RNA Sequencing

    +

    SOP Standard Operating Procedures

    +

SRA Sequence Read Archive

    + #if$_DATAPLANT

    SWATE Swate Workflow Annotation Tool + for Excel

    #endif$_DATAPLANT +

    ONP Oxford Nanopore

    +

qRTPCR quantitative real-time + polymerase chain reaction

    +

    WP Work Package

    +

    +

    +

    +

    +

    +
    +
    +
    +
    +

    + + Practical Data Management Guide of the $_PROJECT + +

    +

    + +

+ + This practical data management guide of the $_PROJECT should be considered a minimum + description, leaving flexibility to include additional actions specific to a domain or required by + national or local + legislation.#if$_EU The $_PROJECT will follow the EU FAIR principles.  #endif$_EU  +

    +
    +

+ + This practical data management guide of the $_PROJECT aims to provide a complete + walkthrough for researchers. Its contents are customized based on the user input in the + Data Management Plan + Generator (DMPG) to fit the relevant legal, ethical, + standardization and funding body requirements. The practices cover all steps + of the data + management life cycle: +

    +
    +
      +
    1. +

      + text-decoration: none; vertical-align: baseline; - " - > - Data acquisition: - -

      -
    2. -
        -
      1. + Data acquisition: + +

        +
      2. +
          +
        1. text-decoration: none; vertical-align: baseline; - " - aria-level="2" - > -

          - +

          + vertical-align: baseline; - " - > - Data generation - -

          -
        2. -
        -
      -

      - + Data generation + +

      + +
    +

+

+ vertical-align: baseline; - " - > - Data should be generated by devices that are compatible with the open-source format. The $_STUDYOBJECT should be compliant to biodiversity protocols. The protocols used to collect $_PHENOTYPIC, - $_GENETIC, $_GENOMIC, $_METABOLOMIC, $_RNASEQ data about $_STUDYOBJECT will be stored#if$_DATAPLANT in the assays folder of ARC repositories.#endif$_DATAPLANT#if!$_DATAPLANT in a FAIR data - storage. #endif!$_DATAPLANT  - -

-
    -
1. + Data should be generated by devices that are compatible with open-source formats. The + $_STUDYOBJECT should comply with biodiversity protocols. The protocols used to collect + $_PHENOTYPIC, + $_GENETIC, $_GENOMIC, $_METABOLOMIC, $_RNASEQ data about $_STUDYOBJECT will be + stored#if$_DATAPLANT in the assays folder of ARC + repositories.#endif$_DATAPLANT#if!$_DATAPLANT in FAIR data + storage. #endif!$_DATAPLANT  +

    +
      +
    1. vertical-align: baseline; margin-left: 36pt; - " - aria-level="2" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Data collection - -

      -
    2. -
    -

    - + Data collection + +

    +
  2. +
+

+ vertical-align: baseline; - " - > - The data collection process is conducted by experimental scientists and stewarded by $_DATAOFFICER.#if$_DATAPLANT An electronic lab notebook will be used to ensure enough metadata is recorded and - guarantees that the data can be further reused.#endif$_DATAPLANT  - -

-
    -
1. + The data collection process is conducted by experimental scientists and stewarded by + $_DATAOFFICER.#if$_DATAPLANT An electronic lab notebook will be used to ensure that sufficient + metadata is recorded and + that the data can be reused.#endif$_DATAPLANT  +

    +
      +
    1. vertical-align: baseline; margin-left: 36pt; - " - aria-level="2" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Data Organization - -

      -
    2. -
    -

    - + Data Organization + +

    +
  2. +
+

+ vertical-align: baseline; - " - > - The data organization process is conducted by $_DATAOFFICER. The detailed organization method and procedure are reported to the PIs. #if$_DATAPLANT The data organization will profit from the - knowledge-base and data-base of DataPLANT, elastic search will be used to find better ways to organize the data. #endif$_DATAPLANT  - -

-
-
-
    -
1. + The data organization process is conducted by $_DATAOFFICER. The detailed organization + method and procedure are reported to the PIs. #if$_DATAPLANT The data organization will + benefit from the + knowledge base and database of DataPLANT; Elasticsearch will be used to find better ways + to organize the data. #endif$_DATAPLANT  +

    +
    +
    +
      +
    1. text-decoration: none; vertical-align: baseline; - " - aria-level="1" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Annotation - -

      -
    2. -
        -
      1. + Annotation + +

        +
      2. +
          +
        1. text-decoration: none; vertical-align: baseline; - " - aria-level="2" - > -

          - +

          + vertical-align: baseline; - " - > - Workflow documentation - -

          -
        2. -
        -
      -

      - + Workflow documentation + +

      + +
    +
+

+ vertical-align: baseline; - " - > - Because the data collection process is conducted by experimental scientists and stewarded by $_DATAOFFICER.#if$_DATAPLANT An electronic lab notebook was used to ensure enough metadata is recorded - and guarantees that the data can be further reused. The workflow can be retrieved from the electronic workbook by using the toolkits provided from the DataPLANT such as SWATE and arccommander. - #endif$_DATAPLANT  - -

-
    -
1. + The data collection process is conducted by experimental scientists and stewarded by + $_DATAOFFICER.#if$_DATAPLANT An electronic lab notebook is used to ensure that sufficient metadata + is recorded + and that the data can be reused. The workflow can be retrieved from the + electronic lab notebook using the tools provided by DataPLANT, such as SWATE and the + ARC Commander. + #endif$_DATAPLANT  +

    +
      +
    1. vertical-align: baseline; margin-left: 36pt; - " - aria-level="2" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Metadata completion - -

      -
    2. -
    -

    - + Metadata completion + +

    +
  2. +
+

+ vertical-align: baseline; - " - > - In case some of the metadata is still missing from the documentation from the experimental scientists and data officer. #if$_DATAPLANT Raw data identifier and parsers provided by DataPLANT will be - used to get meta data directly from the raw data file. The metadata collected from the raw data file can also be used to validate the metadata previously collected in case there are any mistakes. - #endif$_DATAPLANT We foresee using #if$_RNASEQ|$_GENOMIC e.g.#if$_MINSEQE MinSEQe for sequencing data and#endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC Metabolights compatible forms for metabolites as - well as MIAPPE for phenotyping like data. The latter will thus allow the integration of data across projects and safeguards that reuse established and tested protocols. Additionally, we will use ontology - terms to enrich the data sets relying on free and open ontologies. In addition, additional ontology terms might be created and be canonized during the $_PROJECT. - -

-
    -
    -
1. + Some of the metadata may still be missing from the documentation provided by the experimental + scientists and the data officer. #if$_DATAPLANT In that case, raw data identifiers and parsers provided by + DataPLANT will be + used to extract metadata directly from the raw data files. The metadata extracted from the raw + data files can also be used to validate the previously collected metadata and to detect + any mistakes. + #endif$_DATAPLANT We foresee using #if$_RNASEQ|$_GENOMIC e.g.#if$_MINSEQE MinSEQe for + sequencing data and#endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-compatible forms + for metabolites as + well as MIAPPE for phenotyping data. The latter will allow the integration of data + across projects and ensure that established and tested protocols are reused. Additionally, we + will enrich the data sets with terms from + free and open ontologies. New + ontology terms might also be created and canonized during the $_PROJECT. +

    +
      +
      +
    1. text-decoration: none; vertical-align: baseline; - " - aria-level="1" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Maintenance:  - -

      -
    2. -
    -
      -
    1. + Maintenance:  + +

      +
    2. +
    +
      +
    1. vertical-align: baseline; margin-left: 36pt; - " - aria-level="2" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Data storage - -

      -
    2. -
    -

    - + Data storage + +

    +
  2. +
+

+ vertical-align: baseline; - " - > - Raw data collected in previous steps are stored immediately by using#if$_DATAPLANT the infrastructure of DataPLANT #endif$_DATAPLANT #if!$_DATAPLANT in a secure infrastructure. ARC (Annotated - Research Context) is used as a container to store the raw data as well as metadata and workflow.#endif!$_DATAPLANT - -

-
    -
1. + Raw data collected in the previous steps are stored immediately#if$_DATAPLANT using the + infrastructure of DataPLANT. The ARC + (Annotated + Research Context) is used as a container to store the raw data together with their metadata and + workflows.#endif$_DATAPLANT#if!$_DATAPLANT in a secure infrastructure.#endif!$_DATAPLANT +

    +
      +
    1. vertical-align: baseline; margin-left: 36pt; - " - aria-level="2" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Data curation - -

      -
    2. -
    -

    - + Data curation + +

    +
  2. +
+

+ vertical-align: baseline; - " - > - #if$_DATAPLANT Data stored in ARC is curated regularly as long as there are needs for update or revision.#endif$_DATAPLANT #if!$_DATAPLANT Data is curated regularly as long as there are needs for - update or revision.#endif!$_DATAPLANT - -

-
-
-
    -
1. + #if$_DATAPLANT Data stored in ARCs is curated regularly as long as updates + or revisions are needed.#endif$_DATAPLANT #if!$_DATAPLANT Data is curated regularly as long as + updates or revisions are needed.#endif!$_DATAPLANT +

    +
    +
    +
      +
    1. text-decoration: none; vertical-align: baseline; - " - aria-level="1" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Publication and sharing - -

      -
    2. -
        -
      1. + Publication and sharing + +

        +
      2. +
          +
        1. text-decoration: none; vertical-align: baseline; - " - aria-level="2" - > -

          - +

          + vertical-align: baseline; - " - > - Data publishing - -

          -
        2. -
        -
      -

      - + Data publishing + +

      + +
    +
+

+ vertical-align: baseline; - " - > - #if$_RNASEQ Transcriptomics data and gene sequence data will be also made available upon publication via the standards ENA/SRA. #endif$_RNASEQ #if$_METABOLOMIC Metabolite data in e.g. Metabolights - (and/or Nationwide repositories like the German NFDI or the French INRAe). #endif$_METABOLOMIC #if$_PROTEOMIC and Proteomics data in e.g. Pride/Proteomexchange. #endif$_PROTEOMIC In addition, the - national resource will maintain safekeeping of data also after the project ends. #if$_DATAPLANT In addition, databases like e.g. Proteomexchange does not support deep plant-specific metadata; hence - ARCs will be maintained to ensure reusability. #endif$_DATAPLANT - -

-
    -
  1. +

+ + Data will be made available via the $_PROJECT platform using a user-friendly front + end that allows data visualization. In addition, it will be ensured that data that + can be stored in international, discipline-specific repositories using specialized technologies are deposited there: #if$_GENETIC #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_GENETIC #if$_TRANSCRIPTOMIC|$_GENETIC #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO + NCBI-GEO,#endif$_GEO #endif$_TRANSCRIPTOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_BIOIMAGE + EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR + #endif$_IMAGE #if$_METABOLOMIC #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS + #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct + (molecular interactions),#endif$_INTACT #endif$_METABOLOMIC #if$_PROTEOMIC #if$_PRIDE + EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI + ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC #if$_PHENOTYPIC #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) + #endif$_edal #endif$_PHENOTYPIC + + #if$_OTHEREP and $_OTHEREP, which will also be used to store and + process the data.#endif$_OTHEREP + +

    + +

    +
      +
    1. vertical-align: baseline; margin-left: 36pt; - " - aria-level="2" - > -

      - +

      + text-decoration: none; vertical-align: baseline; - " - > - Data sharing - -

      -
    2. -
    -

    - + Data sharing + +

    +
  2. +
+

+ vertical-align: baseline; - " - > - In case data is only shared within the consortium, if the data is not yet finished or under IP checks, the data is hosted internally, and the username and the password will be required (see also our GDPR - rules). In the case data is made public under final EU or US repositories, completely anonymous access is normally allowed. This is the case for ENA as well and both are in line with GDPR - requirements. - -

-

- + If data is only shared within the consortium because it is not yet finalized or is still under + IP checks, the data is hosted internally and a username and password are required + (see also our GDPR + rules). If data is made public in final EU or US repositories, completely + anonymous access is normally allowed. This is also the case for ENA, and both approaches are in line + with GDPR + requirements. +

+

+ vertical-align: baseline; - " - > - - - + + + vertical-align: baseline; - " - > - - -

-
-

-

- + + +

+
+

+

+ vertical-align: baseline; - " - > - Metadata focus timeline - -

-

- -
-
-
- - - - - - - - - - - - - + +
+ Metadata focus timeline + + +

+ +
+
+
+ + + + + + + + + + + + + - + - - - + + + - + - - - + + + - + - - - - - + + + + - + - - - + + + - + - - - + + + - + - - -
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - scope="col" - > -

- +

+ vertical-align: baseline; - " - > - Stages - -

-
+ Stages + +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - scope="col" - > -

- +

+ vertical-align: baseline; - " - > - Actions - -

-
+ Actions + +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - scope="col" - > -

- +

+ vertical-align: baseline; - " - > - Study - -

-

- + Study + +

+

+ vertical-align: baseline; - " - > - initialization - -

-
+ initialization + +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - scope="col" - > -

- +

+ vertical-align: baseline; - " - > - The metadata of study is created at the beginning of the project and updated continuously afterwards#if$_DATAPLANT, the input of the DMP generator created during the proposal - stage can be reused. #endif$_DATAPLANT  - -

-
+ The study metadata is created at the beginning of the project and + updated continuously afterwards#if$_DATAPLANT; the input to the DMP + generator made during the proposal + stage can be reused#endif$_DATAPLANT.  +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - scope="col" - > -

- +

+ vertical-align: baseline; - " - > - Sample - -

-

- + Sample + +

+

+ vertical-align: baseline; - " - > - Collection - -

-
+ Collection + +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - scope="col" - > -

- +

+ vertical-align: baseline; - " - > - The information used to identify exact samples are initiated before experiments and updated at assay creation stages. - -

-

+ The information used to identify individual samples is initiated before the + experiments and updated at the assay creation stage. +

+

+ vertical-align: baseline; - " - > - #if$_DATAPLANT The sample SWATE template will be used to document the sample metadata. A part of sample metadata which can be retrieved from the raw data will be updated - afterwards using the ARC parsers #endif$_DATAPLANT  - -

-
+ #if$_DATAPLANT The sample SWATE template will be used to document the + sample metadata. Sample metadata that can be retrieved from + the raw data will be updated + afterwards using the ARC + parsers. #endif$_DATAPLANT  +

+ +
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - > -

- +

+ vertical-align: baseline; - " - > - Assay - -

-

- + Assay + +

+

+ vertical-align: baseline; - " - > - Creation - -

-
+ Creation + +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - > -

- +

+ vertical-align: baseline; - " - > - Assay metadata must be collected as a daily routine during the experimental phrase. #if$_DATAPLANT A electronic lab notebooks will be used to guarantee the applicability and - correctness of the notebook content#endif$_DATAPLANT  - -

-
+ Assay metadata must be collected as a daily routine during the + experimental phase. #if$_DATAPLANT An electronic lab notebook will be + used to ensure the applicability and + correctness of the notebook content.#endif$_DATAPLANT  +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - > -

- +

+ vertical-align: baseline; - " - > - Computational Analysis - -

-
-
+ Computational Analysis + +

+
+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - > -

- +

+ vertical-align: baseline; - " - > - Workflow annotation will be conducted during the computational analysis phrase. #if$_DATAPLANT The workflow metadata will be stored in the assay folder of the - ARC.#endif$_DATAPLANT  - -

-
+ Workflow annotation will be conducted during the computational analysis + phase. #if$_DATAPLANT The workflow metadata will be stored in the assay + folder of the + ARC.#endif$_DATAPLANT  +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - > -

- +

+ vertical-align: baseline; - " - > - Results Sharing - -

-
+ Results Sharing + +

+
padding: 5pt 5pt 5pt 5pt; overflow: hidden; overflow-wrap: break-word; - " - > -

- +

+ vertical-align: baseline; - " - > - The metadata of results are collected after all modifications and should not be changed after publication. #if$_DATAPLANT Collection of result metadata before publication and - the conversion from ARC to the repositories will be taken care of by the ARC2REPO converter and done with minimal efforts. #endif$_DATAPLANT  - -

-
-
-
+ "> + Result metadata is collected after all modifications are complete and should + not be changed after publication. #if$_DATAPLANT The collection of result + metadata before publication and + the conversion from the ARC + to the repositories will be handled by the ARC2REPO converter with + minimal effort. #endif$_DATAPLANT  +

+ +

+
+
+

+ + + +

+
+

+

+ + Preferred formats for raw data + +

+

+

+ #if$_GENOMIC   +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

+ + extension_ident + +

+
+

+ + Format Name + +

+
+

+ .h5 +

+
+

+ + Hierarchical Data Format + +

+
+

+ + .bam + +

+
+

+ + compressed binary version of a SAM file + +

+
+

+ + .cram + +

+
+

+ + compressed columnar file format for storing biological sequences aligned to + a reference sequence + +

+
+

+ .fa +

+
+

+ + fasta + +

+
+

+ + .faa + +

+
+

+ + fasta + +

+
+

+ + .fas + +

+
+

+ + fasta + +

+
+

+ + .fasta + +

+
+

+ + fasta + +

+
+

+ + .fastq + +

+
+

+ + fastq + +

+
+

+ + .ffn + +

+
+

+ + fasta + +

+
+

+ + .fna + +

+
+

+ + fasta + +

+
+

+ .fq +

+
+

+ + fastq + +

+
+

+ + .frn + +

+
+

+ + fasta + +

+
+

+ + .sff + +

+
+

+ + Standard Flowgram Format (SFF) +

+
+
+

+ + #endif$_GENOMIC +

+
+

+ #if$_RNASEQ   +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

+ + .bam + +

+
+

+ + compressed binary version of a SAM file + +

+
+

+ + .cram + +

+
+

+ + compressed columnar file format for storing biological sequences aligned to + a reference sequence + +

+
+

+ .fa +

+
+

+ + fasta + +

+
+

+ + .faa + +

+
+

+ + fasta + +

+
+

+ + .fas + +

+
+

+ + fasta + +

+
+

+ + .fast5 + +

+
+

+ + HDF5 + +

+
+

+ + .fasta + +

+
+

+ + fasta + +

+
+

+ + .fastq + +

+
+

+ + fastq + +

+
+

+ + .ffn + +

+
+

+ + fasta + +

+
+

+ + .fna + +

+
+

+ + fasta + +

+
+

+ .fq +

+
+

+ + fastq + +

+
+

+ + .frn + +

+
+

+ + fasta + +

+
+

+ + .sff + +

+
+

+ + Standard Flowgram Format (SFF) +

+
+

+ + bas.h5 + +

+
+

+ + HDF5 + +

+
+

+ .h5 +

+
+

+ + Hierarchical Data Format + +

+
+
+

+ + #endif$_RNASEQ +

+
+

+  #if$_METABOLOMIC   +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

+ + .cdf + +

+
+

+ + netCDF (AIA/ANDI) interchange data format + +

+
+

+ + .cmp + +

+
+

+ + netCDF compare file + +

+
+

+ + .abf + +

+
+

+ + Axon Binary File + +

+
+

+ .d +

+
+

+ + Agilent + +

+
+

+ + .dat + +

+
+

+ + Chromtech, Finnigan, VG + +

+
+

+ + .idb + +

+
+

+ + MASSLAB binary file + +

+

- + style="font-size: 10pt; background-color: transparent; font-variant-numeric: normal; font-variant-east-asian: normal; vertical-align: baseline; "> + .jpf + +

+
+

+ + Mass Center Main Mass Spectrometry Data (JEOL USA, Inc.) + +

+
+

+ + .lcd + +

+
+

+ + Shimadzu LC Solution / Labsolutions Data File + +

+
+

+ + .mgf + +

+
+

+ + Mascot Generic File + +

+
+

+ + .raw + +

+
+

+ + Thermo Xcalibur, Micromass (Waters), PerkinElmer, Waters + +

+
+

+ + .scan + +

+
+

+ + a spectrum or a Total Ion Chromatogram (TIC) + +

+
+

+ + .wiff + +

+
+

+ + ABI/Sciex + +

+
+

+ + .xps + +

+
+

+ + Thermo Fisher Scientific K-Alpha+ spectrometer file + +

+
+

+ + cdf.cmp + +

+
+

+ + netCDF compare file + +

+
+
+

+ + #endif$_METABOLOMIC +

+
+

+  #if$_PROTEOMIC   +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+

+ + .baf + +

+
+

+ + Bruker + +

+
+

+ .d +

+
+

+ + Agilent + +

+
+

+ + .dat + +

+
+

+ + Chromtech, Finnigan, VG + +

+
+

+ + .fid + +

+
+

+ + Bruker + +

+
+

+ + .ita + +

+
+

+ + ION-TOF + +

+
+

+ + .itm + +

+
+

+ + ION-TOF + +

+
+

+ + .mgf  + +

+
+

+ + Mascot Generic File + +

+
+

+ .ms +

+
+

+ + Finnigan (Thermo) + +

+
+

+ + .ms2 + +

+
+

+ + Sequest MS/MS peak list + +

+
+

+ + .pkl + +

+
+

+ + Micromass peak list + +

+
+

+ + .qgd + +

+
+

+ + Shimadzu + +

+
+

+ + .qgd + +

+
+

+ + Shimadzu + +

+
+

+ + .raw + +

+
+

+ + Thermo Xcalibur, Micromass (Waters), PerkinElmer, Waters + +

+
+

+ + .raw + +

+
+

+ + Physical Electronics/ULVAC-PHI + +

+
+

+ + .sms + +

+
+

+ + Bruker/Varian + +

+
+

+ + .spc + +

+
+

+ + Shimadzu + +

+
+

+ + .splib  + +

+
+

+ + spectral library file + +

+
+

+ + .t2d + +

+
+

+ + ABI/Sciex + +

+
+

+ + .tdc + +

+
+

+ + Physical Electronics/ULVAC-PHI + +

+
+

+ + .wiff + +

+
+

+ + ABI/Sciex + +

+
+

+ + .xms + +

+
+

+ + Bruker/Varian + +

+
+

+ + .yep + +

+
+

+ + Bruker + +

+
+

+ + .dta + +

+
+

+ + Sequest MS/MS peak list

- - -

- - Preferred formats for raw data +

+

+ + .msp + +

+
+
+

+ + .nist + +

+
+
+
+

+ + #endif$_PROTEOMIC +
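The preferred-format tables amount to a lookup from file extension to format name. The following is a minimal Python sketch of such a lookup, assuming Python 3.8+; the names `PREFERRED_FORMATS` and `match_format` are hypothetical illustrations and are not part of the DMP generator or any DataPLANT tool. Only a subset of the listed extensions is included, and compressed files such as `.fastq.gz` are not recognized.

```python
from pathlib import PurePath
from typing import Dict, Optional

# Hypothetical subset of the preferred raw-data extensions from the tables,
# keyed by data type. Keys may be compound suffixes such as "bas.h5".
PREFERRED_FORMATS: Dict[str, Dict[str, str]] = {
    "genomic": {
        ".h5": "Hierarchical Data Format",
        ".bam": "compressed binary version of a SAM file",
        ".fa": "fasta",
        ".fasta": "fasta",
        ".fastq": "fastq",
        ".fq": "fastq",
    },
    "rnaseq": {
        ".fast5": "HDF5",
        "bas.h5": "HDF5",
        ".h5": "Hierarchical Data Format",
        ".fastq": "fastq",
        ".fq": "fastq",
    },
    "metabolomic": {
        ".cdf": "netCDF (AIA/ANDI) interchange data format",
        ".cmp": "netCDF compare file",
        "cdf.cmp": "netCDF compare file",
        ".mgf": "Mascot Generic File",
    },
    "proteomic": {
        ".mgf": "Mascot Generic File",
        ".ms2": "Sequest MS/MS peak list",
        ".dta": "Sequest MS/MS peak list",
        ".wiff": "ABI/Sciex",
    },
}


def match_format(filename: str, data_type: str) -> Optional[str]:
    """Return the format name for the longest matching suffix, or None."""
    name = PurePath(filename).name.lower()  # ignore directories and case
    table = PREFERRED_FORMATS[data_type]
    # Check longer suffixes first so "bas.h5" takes precedence over ".h5".
    for suffix in sorted(table, key=len, reverse=True):
        if name.endswith(suffix):
            return table[suffix]
    return None


if __name__ == "__main__":
    print(match_format("run1.bas.h5", "rnaseq"))   # HDF5
    print(match_format("reads.FQ", "genomic"))     # fastq
    print(match_format("notes.docx", "proteomic"))  # None
```

Checking longer suffixes first is a simple way to let compound extensions such as `bas.h5` or `cdf.cmp` win over the plain `.h5` or `.cmp` entries; a file that matches nothing could then be flagged for review by the data officer.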

+
+
+

+
+
+ +
+
+

+

+

Datenmanagementplan (Beta test)
+

+

+

Projektname: $_PROJECT

+

Forschungsförderer: Bundesministerium für Bildung und + Forschung

+

Förderprogramm: $_FUNDINGPROGRAMME

+

FKZ: $_DMPVERSION

+ +

Projektkoordinator: $_USERNAME

+

Kontaktperson Datenmanagement: $_DATAOFFICER

+ +

Kontakt: $_EMAIL

+

Projektbeschreibung: + + +

+ Das $_PROJECT verfolgt folgendes Ziel: $_PROJECTAIM. Daher sind Datenerhebung#if!$_VVISUALIZATION + und Integration #endif!$_VVISUALIZATION#if$_VVISUALIZATION, Integration und Visualisierung + #endif$_VVISUALIZATION#if$_DATAPLANT unter Verwendung der DataPLANT-ARC-Struktur absolut + notwendig,#endif$_DATAPLANT#if!$_DATAPLANT durch einen standardisierten + Datenmanagementprozess absolut notwendig,#endif!$_DATAPLANT da die Daten nicht nur zum + Verständnis grundlegender Prinzipien verwendet werden, sondern auch ihre Herkunft + nachvollziehbar sein muss. Auch Stakeholder müssen über die Herkunft der Daten + informiert werden. Es ist daher sicherzustellen, dass die Daten sorgfältig erzeugt und + umfassend mit Metadaten unter Verwendung offener Standards annotiert werden, wie im nächsten + Abschnitt dargelegt. + +

+ +

+ + Das $_PROJECT wird die folgenden Arten von Rohdaten sammeln und/oder generieren: + $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEOMIC, + $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA Daten, die sich auf $_STUDYOBJECT + beziehen. Zusätzlich werden die Rohdaten mit analytischen Pipelines verarbeitet und + modifiziert, die zu unterschiedlichen Ergebnissen führen oder Ad-hoc-Datenanalyseschritte + umfassen können. #if$_DATAPLANT Diese Pipelines werden im DataPLANT-ARC + nachverfolgt.#endif$_DATAPLANT Daher wird darauf geachtet, auch diese Ressourcen (einschließlich der + analytischen Pipelines) zu dokumentieren und zu archivieren#if$_DATAPLANT, unter Rückgriff + auf die Expertise im DataPLANT-Konsortium#endif$_DATAPLANT. + +

+

+

+

Erstellungsdatum: $_CREATIONDATE

+

Änderungsdatum: $_MODIFICATIONDATE

+

Zu beachtende Vorgaben:

+ +

#if$_EU Das $_PROJECT ist Teil der Open Data Initiative (ODI) der EU. + #endif$_EU Um optimal von offenen Daten zu profitieren, ist es notwendig, die Daten nicht nur zu + speichern, sondern sie auch auffindbar, zugänglich, interoperabel und wiederverwendbar (FAIR) zu + machen. #if$_PROTECT Wir unterstützen offene und FAIR-Daten, berücksichtigen jedoch auch die + Notwendigkeit, einzelne Datensätze zu schützen. #endif$_PROTECT + +

+

#if$_DATAPLANT Durch den Einsatz von DataPLANT können Forschende + sicherstellen, dass alle relevanten Richtlinien und Anforderungen im Zusammenhang mit dem + Datenmanagement eingehalten werden, was zu einer höheren Qualität und Zuverlässigkeit der + Forschungsdaten führt. #endif$_DATAPLANT +

+ + + +

Datenerhebung

+ +

+

Öffentliche Daten werden wie im vorherigen Absatz beschrieben + extrahiert. Für das $_PROJECT werden spezifische Datensätze von den Konsortialpartnern + generiert.

+ + +

+ Daten unterschiedlicher Typen oder aus verschiedenen Bereichen werden mit + jeweils spezifischen Ansätzen generiert. Zum Beispiel: +

+
    + + #if$_GENETIC +
  • +

    + Genetische Daten werden durch Kreuzungen und Zuchtexperimente + generiert und umfassen Rekombinationsfrequenzen und Crossover-Ereignisse, anhand derer + genetische Marker und Loci quantitativer Merkmale (QTL) kartiert werden können; diese lassen sich + mit physikalischen genomischen Markern/Varianten verknüpfen. +

    +
  • + #endif$_GENETIC + + #if$_GENOMIC +
  • +

    + Genomische Daten werden aus Sequenzdaten erstellt, die verarbeitet + werden, um Gene, regulatorische Elemente, transponierbare Elemente und physikalische + Marker wie SNPs, Mikrosatelliten und strukturelle Varianten zu identifizieren. - -

    -

    - #if$_GENOMIC   -

    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    -

    - - extension_ident - -

    -
    -

    - - Format Name - -

    -
    -

    - .h5 -

    -
    -

    - - Hierarchical Data Format - -

    -
    -

    - - .bam - -

    -
    -

    - - compressed binary version of a SAM file - -

    -
    -

    - - .cram - -

    -
    -

    - - compressed columnar file format for storing biological sequences aligned to a reference sequence - -

    -
    -

    - .fa -

    -
    -

    - - fasta - -

    -
    -

    - - .faa - -

    -
    -

    - - fasta - -

    -
    -

    - - .fas - -

    -
    -

    - - fasta - -

    -
    -

    - - .fasta - -

    -
    -

    - - fasta - -

    -
    -

    - - .fastq - -

    -
    -

    - - fastq - -

    -
    -

    - - .ffn - -

    -
    -

    - - fasta - -

    -
    -

    - - .fna - -

    -
    -

    - - fasta - -

    -
    -

    - .fq -

    -
    -

    - - fastq - -

    -
    -

    - - .frn - -

    -
    -

    - - fasta - -

    -
    -

    - - .sff - -

    -
    -

    - - sff-trim - -

    -
    -
    -

    - #endif$_GENOMIC -

    -
    -

    - #if$_RNASEQ   +

    +

  • + #endif$_GENOMIC + #if$_CLONED-DNA +
  • +

    + Die Angaben zu Herkunft und Aufbau der klonierten DNA umfassen (a) + die Quelle der ursprünglichen Vektorsequenz mit Addgene-Referenz, sofern verfügbar, + und die Quelle der Insert-DNA (z. B. Amplifikation durch PCR aus einer bestimmten + Probe oder aus einer vorhandenen Bibliothek), (b) die Klonierungsstrategie (z. B. + Restriktionsendonuklease-Verdau/Ligation, PCR, TOPO-Klonierung, Gibson-Assembly, + LR-Rekombination) und (c) die verifizierte DNA-Sequenz des finalen rekombinanten + Vektors. +

    +
  • + #endif$_CLONED-DNA + + #if$_TRANSCRIPTOMIC +
  • +

    + Transkriptomdaten werden mit Methoden erfasst, die aus + Mikroarrays, quantitativer PCR, Northern Blotting, RNA-Immunpräzipitation und + Fluoreszenz-in-situ-Hybridisierung ausgewählt werden. RNA-Seq-Daten werden mit separaten + Methoden erfasst. +

    +
  • + #endif$_TRANSCRIPTOMIC + #if$_RNASEQ +
  • +

    + RNA-Sequenzierung wird mit Short-Read- oder + Long-Read-Plattformen entweder intern durchgeführt oder an akademische Einrichtungen bzw. + kommerzielle Dienstleister ausgelagert; die Rohdaten werden mit etablierten + bioinformatischen Pipelines verarbeitet. +

    +
  • + #endif$_RNASEQ + #if$_METABOLOMIC +
  • +

    + Metabolomische Daten werden durch gekoppelte Chromatographie und + Massenspektrometrie unter Verwendung gezielter oder ungezielter Ansätze + generiert. +

    +
  • + #endif$_METABOLOMIC #if$_PROTEOMIC +
  • +

    + Proteomische Daten werden durch gekoppelte Chromatographie und + Massenspektrometrie zur Quantifizierung und Identifizierung von Proteinen sowie durch + zusätzliche Techniken zur Strukturanalyse, zur Identifizierung posttranslationaler + Modifikationen und zur Charakterisierung von Proteininteraktionen generiert. +

    +
  • + #endif$_PROTEOMIC + + #if$_PHENOTYPIC +
  • +

    + Phänotypische Daten werden mit Hilfe von Phänotypisierungsplattformen + und entsprechenden Ontologien generiert; sie umfassen Anzahl/Größe von Organen wie + Blättern, Blüten, Knospen usw., die Größe der gesamten Pflanze, + Stängel-/Wurzelarchitektur (Anzahl der Seitenzweige/Seitenwurzeln usw.), + Organstrukturen/Morphologien, quantitative Metriken wie Farbe, Turgor, + Gesundheits-/Nährstoffindikatoren und weitere. +

    +
  • + #endif$_PHENOTYPIC + + + + #if$_TARGETED +
  • +

    + Daten aus gezielten Assays (z. B. Glukose- und Fruktosekonzentrationen oder + Produktions-/Verbrauchsraten) werden mit spezifischen Geräten und Methoden generiert, + die im Laborbuch vollständig dokumentiert sind. +

    +
  • + #endif$_TARGETED + + #if$_IMAGE +
  • +

    + Bilddaten werden durch Geräte wie Kameras, Scanner und Mikroskope in + Kombination mit Software generiert. Originalbilder, die Metadaten wie + EXIF-Fotoinformationen enthalten, werden archiviert. +

    +
  • + #endif$_IMAGE + + #if$_MODELS +
  • +

    + Modelldaten werden durch Softwaresimulationen generiert. Der + vollständige Workflow, einschließlich der Umgebung, Laufzeit, Parameter und + Ergebnisse, wird dokumentiert und archiviert. +

    +
  • + #endif$_MODELS + + #if$_CODE +
  • +

    + Computercode wird von Programmierern erstellt. +

    +
  • + #endif$_CODE + + #if$_EXCEL +
  • +

    + Excel-Tabellen werden durch Ausfüllen spezifischer Dateien erstellt, + die Feldbeobachtungen oder andere digitale Erhebungen enthalten. +

    +
  • + #endif$_EXCEL + + + + +
+ + #if$_PREVIOUSPROJECTS +

Daten aus früheren Projekten wie $_PREVIOUSPROJECTS werden + berücksichtigt.

+ #endif$_PREVIOUSPROJECTS + +

Wir erwarten die Erzeugung von $_RAWDATA GB Rohdaten und bis zu + $_DERIVEDDATA GB verarbeiteten Daten.

+

+ +

+ + +

Datenspeicherung

+ +

+ + #if$_DATAPLANT In DataPLANT basiert die Datenspeicherung auf dem Annotated Research Context (ARC). + Dieser ist passwortgeschützt, daher muss vor dem Erhalt von Daten oder der Generierung von Proben + eine Authentifizierung erfolgen. #endif$_DATAPLANT + +

+ + +

+ + Online-Plattformen werden durch Schwachstellen-Scans, Zwei-Faktor-Authentifizierung und tägliche + automatische Backups geschützt, die eine sofortige Wiederherstellung ermöglichen. Alle Partner, die + vertrauliche Projektdaten vorhalten, nutzen sichere Plattformen mit automatischen Backups und sicheren + externen Kopien. + #if$_DATAPLANT DataHUB und ARCs wurden in DataPLANT entwickelt und setzen Datensicherheit durch. + Dies umfasst eine sichere Speicherung; Passwörter und Benutzernamen werden grundsätzlich + über separate, sichere Kanäle übermittelt. #endif$_DATAPLANT + +

+ +

+

Das $_PROJECT trägt die Kosten für die Datenkuratierung#if$_DATAPLANT, + ARC-Konsistenzprüfungen#endif$_DATAPLANT und die Datenwartung/-sicherheit + vor der Übertragung an öffentliche Repositorien. Nachfolgende Kosten werden dann von den + Betreibern dieser Repositorien getragen.

+ +

+ + Zusätzlich werden die Kosten für die Speicherung nach der Veröffentlichung von den + Endpunkt-Repositorien (z. B. ENA) aus deren Betriebsbudget getragen, nicht vom $_PROJECT oder seinen + Mitgliedern. + +

+ + Es wird sichergestellt, dass Daten nach Möglichkeit in internationalen, disziplinspezifischen Repositorien + gespeichert werden, die spezialisierte Technologien nutzen: + + + +

+ #if$_GENETIC Für genetische Daten: #if$_GENBANK + NCBI-GenBank,#endif$_GENBANK #if$_SRA NCBI-SRA,#endif$_SRA #if$_ENA EBI-ENA,#endif$_ENA + #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS #if$_GEO NCBI-GEO,#endif$_GEO + #endif$_GENETIC +

+ +

+ #if$_TRANSCRIPTOMIC Für Transkriptomdaten: #if$_SRA NCBI-SRA,#endif$_SRA + #if$_GEO NCBI-GEO,#endif$_GEO #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_TRANSCRIPTOMIC +

+ +

+ #if$_IMAGE Für Bilddaten: #if$_BIOIMAGE EBI-BioImage + Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR #endif$_IMAGE + +

+ +

+ #if$_METABOLOMIC Für Metabolomdaten: #if$_METABOLIGHTS + EBI-MetaboLights,#endif$_METABOLIGHTS #if$_METAWORKBENCH Metabolomics + Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct (Molecular + interactions),#endif$_INTACT #endif$_METABOLOMIC +

+

+ #if$_PROTEOMIC Für Proteomikdaten: #if$_PRIDE EBI-PRIDE,#endif$_PRIDE + #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI ChEBI (Chemical Entities + of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC +

+ +

+ #if$_PHENOTYPIC Für phänotypische Daten: #if$_edal e!DAL-PGP (Plant + Genomics & Phenomics Research Data Repository) #endif$_edal #endif$_PHENOTYPIC +

+ #if$_OTHEREP und $_OTHEREP werden auch verwendet, um Daten zu speichern, und die Daten werden + dort ebenfalls verarbeitet.#endif$_OTHEREP + +
+ + +

+ +

Die Dateibenennung erfolgt nach folgendem Standard:

+

+ + Datenvariablen werden mit Standardnamen versehen. Zum Beispiel werden Gene, Proteine und Metaboliten + gemäß anerkannten Nomenklaturen und Konventionen benannt. Diese werden nach Möglichkeit auch mit + funktionalen Ontologien verknüpft. Datensätze werden ebenfalls aussagekräftig benannt, um die Lesbarkeit + für Menschen zu gewährleisten. Pflanzennamen umfassen Trivialnamen, binäre (botanische) Namen und alle + Bezeichner für Stamm, Kultivar, Unterart bzw. Sorte. + +

+ +

+

Datendokumentation

+

+ Wir verwenden die ISA-Spezifikation (Investigation, Study, Assay) zur + Metadaten-Erstellung. #if$_RNASEQ|$_GENOMIC Für spezifische Daten (z. B. RNASeq oder genomische + Daten) verwenden wir Metadatenvorlagen der Endpunkt-Repositorien. #if$_MINSEQE Der Standard Minimum Information about a high-throughput SEQuencing Experiment + (MINSEQE) wird ebenfalls verwendet. #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC + + Die folgenden Metadaten-/Mindestinformationsstandards werden zur Sammlung von Metadaten verwendet: + #if$_GENOMIC|$_GENETIC #if$_MIXS MIxS (Minimum Information about any (X) Sequence),#endif$_MIXS + #if$_MIGSEU MigsEu (Minimum Information about a Genome Sequence: Eukaryote),#endif$_MIGSEU + #if$_MIGSORG MigsOrg (Minimum Information about a Genome Sequence: Organelle),#endif$_MIGSORG + #if$_MIMS MIMS (Minimum Information about a Metagenome Sequence),#endif$_MIMS + #if$_MIMARKSSPECIMEN MIMARKSSpecimen (Minimum Information about a Marker Gene Sequence: + Specimen),#endif$_MIMARKSSPECIMEN + #if$_MIMARKSSURVEY MIMARKSSurvey (Minimum Information about a Marker Gene Sequence: + Survey),#endif$_MIMARKSSURVEY + #if$_MISAG MISAG (Minimum Information about a Single Amplified Genome),#endif$_MISAG + #if$_MIMAG MIMAG (Minimum Information about a Metagenome-Assembled Genome),#endif$_MIMAG + #endif$_GENOMIC|$_GENETIC + #if$_TRANSCRIPTOMIC + #if$_MINSEQE MINSEQE (Minimum Information about a high-throughput SEQuencing + Experiment),#endif$_MINSEQE #endif$_TRANSCRIPTOMIC + #if$_TRANSCRIPTOMIC #if$_MIAME MIAME (Minimum Information About a Microarray + Experiment),#endif$_MIAME #endif$_TRANSCRIPTOMIC + #if$_IMAGE + #if$_REMBI REMBI (Recommended Metadata for Biological Images),#endif$_REMBI + #endif$_IMAGE + #if$_PROTEOMIC + #if$_MIAPE MIAPE (Minimum Information About a Proteomics Experiment),#endif$_MIAPE + #if$_MIMIX MIMIx (Minimum Information about a Molecular Interaction Experiment),#endif$_MIMIX + #endif$_PROTEOMIC + #if$_METABOLOMIC #if$_METABOLIGHTS MetaboLights-konforme Einreichungsstandards werden für + 
metabolomische Daten verwendet, wo dies von den Konsortialpartnern akzeptiert wird.#issuewarning + Einige Metabolomik-Partner betrachten MetaboLights nicht als akzeptierten Standard.#endissuewarning + #endif$_METABOLIGHTS #endif$_METABOLOMIC Als Teil der Pflanzenforschungsgemeinschaft verwenden wir + #if$_MIAPPE MIAPPE für Phänotypisierungsdaten im weitesten Sinne sowie + #endif$_MIAPPE spezifische SOPs für zusätzliche Annotationen#if$_DATAPLANT, die + fortgeschrittene DataPLANT-Annotationen und Ontologien berücksichtigen#endif$_DATAPLANT. + + + +

+

+ Falls noch Metadaten fehlen, werden diese von den experimentell arbeitenden + Wissenschaftlern und dem Datenbeauftragten dokumentiert. #if$_DATAPLANT Zusätzlich werden von DataPLANT bereitgestellte Rohdaten-Identifier und + Parser eingesetzt, um + Metadaten direkt aus der Rohdatei zu extrahieren. Die aus der Rohdatei gewonnenen Metadaten können + auch verwendet werden, um die zuvor gesammelten Metadaten zu validieren und Fehler aufzudecken. + #endif$_DATAPLANT Wir sehen beispielsweise vor, #if$_RNASEQ|$_GENOMIC #if$_MINSEQE MINSEQE für + Sequenzierungsdaten, #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC MetaboLights-kompatible + Formulare für Metaboliten sowie MIAPPE für phänotypische Daten zu verwenden. + Letzteres ermöglicht die Integration von Daten über Projekte hinweg und stellt sicher, dass + etablierte und getestete Protokolle wiederverwendet werden. Darüber hinaus werden wir + Ontologiebegriffe verwenden, um die Datensätze mit freien und offenen Ontologien anzureichern. + Zudem können während des $_PROJECT neue Ontologiebegriffe erstellt und kanonisiert + werden.

+ + +

Legitimität

+

+

+ + Derzeit erwarten wir keine ethischen oder rechtlichen Probleme beim Datenaustausch. Da es sich + um Pflanzendaten handelt, ist kein Ethikkomitee erforderlich; + beim Vorteilsausgleich (Benefit-Sharing) für die Nutzung pflanzlicher Ressourcen wird jedoch mit der gebotenen Sorgfalt vorgegangen. + #issuewarning Sie müssen dies hier überprüfen und jegliche Sorgfaltspflicht hier eintragen. Derzeit + ist offen, ob Nagoya (🡺siehe Nagoya-Protokoll) künftig auch Sequenzinformationen umfasst. + In jedem Fall gilt: Wenn Sie Material verwenden, das nicht aus Ihrem (Partner-)Land stammt, + und dieses physikalisch charakterisieren, z. B. Metaboliten, Proteom, biochemisch, RNASeq + usw., könnte dies eine Nagoya-relevante Handlung darstellen, es sei denn, es stammt z. B. aus + den USA (keine Vertragspartei), Irland (nicht unterzeichnet, trotzdem kontaktieren) usw.; es + könnten jedoch andere Gesetze gelten… #endissuewarning + +

+

+ + Die einzigen personenbezogenen Daten, die möglicherweise gespeichert werden, sind Name + und institutionelle Zugehörigkeit der einreichenden Person in den Metadaten. Darüber hinaus werden + personenbezogene Daten für Verbreitungs- und Kommunikationsaktivitäten gesammelt, wobei + spezifische, von den $_PROJECT-Partnern entwickelte Methoden und Verfahren verwendet werden, um + die Datenschutzvorgaben einzuhalten. #issuewarning Sie müssen informieren und + möglichst eine SCHRIFTLICHE Zustimmung dazu einholen, dass Sie E-Mail-Adressen und Namen oder sogar + Pseudonyme wie Twitter-Handles speichern. Wir entschuldigen uns sehr für diese Probleme, die + wir nicht erfunden haben. #endissuewarning + +

+ + +

+ +

Data Sharing

+

+

+ + Solange Daten nur innerhalb des Konsortiums geteilt werden, weil sie noch nicht fertig + sind oder sich in der IP-Prüfung befinden, werden sie intern gehostet, und für den Zugriff sind + Benutzername und Passwort erforderlich (siehe auch unsere GDPR-Regeln). + Wenn Daten in finalen EU- oder US-Repositorien öffentlich gemacht werden, ist + normalerweise ein vollständig anonymer Zugang erlaubt. Dies gilt auch für ENA, und + beide entsprechen den GDPR-Anforderungen. + + +

+

+ Es wird keine Einschränkungen geben, sobald die Daten öffentlich gemacht + werden. + + #if$_early Einige Rohdaten werden sofort nach ihrer Erfassung und Verarbeitung öffentlich + gemacht.#endif$_early #if$_beforepublication Relevante verarbeitete Datensätze werden + öffentlich gemacht, wenn die Forschungsergebnisse veröffentlicht + werden.#endif$_beforepublication #if$_endofproject Am Ende des Projekts werden alle Daten, + die keiner Sperrfrist unterliegen, veröffentlicht.#endif$_endofproject #if$_embargo Daten, die einer Sperrfrist + unterliegen, sind bis zum Ende der Sperrfrist nicht öffentlich zugänglich.#endif$_embargo + #if$_request Daten werden auf Anfrage verfügbar gemacht, was eine kontrollierte Weitergabe + ermöglicht und gleichzeitig eine verantwortungsvolle Nutzung sicherstellt.#endif$_request + #if$_ipissue IP-Probleme werden vor der Veröffentlichung überprüft. #endif$_ipissue Alle + Konsortialpartner werden ermutigt, + Daten vor der Veröffentlichung zugänglich zu machen, offen und/oder unter + Vorveröffentlichungsvereinbarungen#if$_GENOMIC, wie sie in Fort Lauderdale begonnen und + durch den Toronto International Data Release Workshop festgelegt wurden#endif$_GENOMIC. + Dies wird umgesetzt, sobald die IP-bezogenen Überprüfungen abgeschlossen + sind. + +

+ + +

+ + Die Daten kommen zunächst den $_PROJECT-Partnern zugute, dann aber auch ausgewählten, + eng in das Projekt eingebundenen Stakeholdern und schließlich der wissenschaftlichen + Gemeinschaft, die an $_STUDYOBJECT arbeitet. $_DATAUTILITY Darüber hinaus kann auch die + an $_STUDYOBJECT interessierte allgemeine Öffentlichkeit die Daten nach der + Veröffentlichung nutzen. Die Daten werden gemäß dem Verbreitungs- und Kommunikationsplan des + $_PROJECT verbreitet#if$_DATAPLANT, der sich mit der DataPLANT-Plattform oder anderen + Kanälen abstimmt#endif$_DATAPLANT. + + +

+

+

Datenerhalt

+

+ +

+ Wir erwarten, Rohdaten im Umfang von etwa $_RAWDATA GB zu + generieren. Die Größe der abgeleiteten Daten wird etwa $_DERIVEDDATA GB betragen. +

+ + +

+ #if$_DATAPLANT Da das $_PROJECT eng mit DataPLANT abgestimmt ist, werden der ARC-Konverter und + DataHUB verwendet, um die Endpunkt-Repositories zu finden und die Daten automatisch in die + Repositories hochzuladen. #endif$_DATAPLANT + + +

+

+ + Die Daten werden über die $_PROJECT-Plattform mit einer benutzerfreundlichen Oberfläche + verfügbar gemacht, die eine Datenvisualisierung ermöglicht. Die Endpunkt-Repositories sind: + #if$_GENETIC #if$_GENBANK NCBI-GenBank,#endif$_GENBANK + #if$_ENA EBI-ENA,#endif$_ENA #if$_ARRAYEXPRESS EBI-ArrayExpress,#endif$_ARRAYEXPRESS + #endif$_GENETIC #if$_TRANSCRIPTOMIC|$_GENETIC #if$_SRA NCBI-SRA,#endif$_SRA #if$_GEO + NCBI-GEO,#endif$_GEO #endif$_TRANSCRIPTOMIC|$_GENETIC #if$_TRANSCRIPTOMIC #if$_ARRAYEXPRESS + EBI-ArrayExpress,#endif$_ARRAYEXPRESS #endif$_TRANSCRIPTOMIC #if$_IMAGE #if$_BIOIMAGE + EBI-BioImage Archive,#endif$_BIOIMAGE #if$_IDR IDR (Image Data Resource),#endif$_IDR + #endif$_IMAGE #if$_METABOLOMIC #if$_METABOLIGHTS EBI-MetaboLights,#endif$_METABOLIGHTS + #if$_METAWORKBENCH Metabolomics Workbench,#endif$_METAWORKBENCH #if$_INTACT IntAct + (Molecular interactions),#endif$_INTACT #endif$_METABOLOMIC #if$_PROTEOMIC #if$_PRIDE + EBI-PRIDE,#endif$_PRIDE #if$_PDB PDB (Protein Data Bank archive),#endif$_PDB #if$_CHEBI + ChEBI (Chemical Entities of Biological Interest),#endif$_CHEBI #endif$_PROTEOMIC #if$_PHENOTYPIC #if$_edal e!DAL-PGP (Plant Genomics & Phenomics Research Data Repository) + #endif$_edal #endif$_PHENOTYPIC + + #if$_OTHEREP und $_OTHEREP werden auch verwendet, um Daten zu speichern, und die Daten werden + dort ebenfalls verarbeitet.#endif$_OTHEREP + + +

+

+ + Die Einreichung ist kostenlos, und es ist das Ziel (zumindest von ENA), so viele Daten wie + möglich zu erhalten. Daher sind Absprachen weder notwendig noch sinnvoll. + Catch-all-Repositories sind nicht erforderlich. + #if$_DATAPLANT Für DataPLANT wurde dies vereinbart. #endif$_DATAPLANT #issuewarning Wenn + keine Datenmanagementplattform wie DataPLANT verwendet wird, müssen Sie ein geeignetes + Repository finden, um Ihre Daten nach der Veröffentlichung zu speichern oder zu archivieren. + #endissuewarning + + +

+ +

+
+
+
+
+
+ Data management plan of $_PROJECT for BBSRC + +

+
-
+ #if$_OTHEREP and $_OTHEREP will also be used to store data and the data will be + processed there as well.#endif$_OTHEREP + +

Proprietary Data – Open public + data will be used whenever possible. +
  • Timeframes + #if$_early Some raw data is made public as soon as it is collected and processed.#endif$_early + #if$_beforepublication Relevant processed datasets are made public when the research findings + are published.#endif$_beforepublication #if$_endofproject At the end of the project, all data + not subject to an embargo period will be published.#endif$_endofproject #if$_embargo Data subject + to an embargo period will not be publicly accessible until the embargo + ends.#endif$_embargo #if$_request Data is made available upon request, allowing controlled + sharing while ensuring responsible use.#endif$_request #if$_ipissue IP issues will be checked + before publication. #endif$_ipissue All consortium partners will be + encouraged to make data available before publication, openly and/or under pre-publication + agreements#if$_GENOMIC, such as those started in Fort Lauderdale and set forth by the Toronto + International Data + Release Workshop#endif$_GENOMIC. This will be implemented as soon as IP-related checks are + complete. +
  • +
  • Format of the Final Dataset – Whenever possible, data + will be stored in common and openly defined formats including all the necessary metadata. By + default, no proprietary formats will be used. However, Microsoft Excel files (according to + ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some + ARC components#endif$_DATAPLANT. In addition, text documents might be edited in word processors + but will be shared as PDF. + + +
    +
    + a document template +
    +
    -
    - -

    Datenmanagementplan

    -

    Projektname: $_PROJECT

    -

    Forschungsförderer: Bundesministerium für Bildung und Forschung

    -

    Förderprogramm:

    -

    FKZ:

    -

    Primärforscher/Wissenschaftler:

    -

    ID Primärforscher/Wissenschaftler: $_USERNAME

    -

    Kontaktperson Datenmanagement: $_DATAOFFICER

    -

    ID Kontaktperson Datenmanagement:

    -

    Kontakt: $_EMAIL

    -

    Projektbeschreibung:

    -

    -

    Erstellungsdatum:

    -

    Änderungsdatum:

    -

    Zu beachtende Vorgaben:

    -

    -

    Datenspeicherung

    -

    -

    -

    Die Dateibenennung erfolgt nach folgendem Standard:

    -

    -

    Dateien werden in möglichst offenen, standardisierten Formaten gespeichert.

    -

    -

    Datendokumentation

    -

    -

    Folgende Dokumente werden erstellt:

    -

    Public data will be extracted as described in the previous paragraph. For $_PROJECT, specific data sets will be generated by the consortium partners.

    - #if$_RNASEQ -

    Short read sequencing will either be collected or outsourced and raw data will be received.

    - #endif$_RNASEQ#if$_METABOLOMIC -

    Metabolomic data will be generated using chromatography coupled to mass spectrometry and from enzyme platforms mostly.

    - #endif$_METABOLOMIC#if$_PROTEOMIC -

    proteomic data will be generated using an EU platform which are in line with community standards.

    - #endif$_PROTEOMIC -

    #if$_PREVIOUSPROJECTS data from previous projects such as $_PREVIOUSPROJECTS will be considered. #endif$_PREVIOUSPROJECTS

    -

    -

    Legitimität

    -

    Data Sharing

    -

    -

    Datenerhalt

    -

    -
    - -
    -
    - Data management plan of $_PROJECT for BBSRC
      - -
    • Data Areas and Data Types – The $_PROJECT will collect and/or generate the following types of raw data : $_PHENOTYPIC, $_GENETIC, $_IMAGE, $_RNASEQ, $_GENOMIC, $_METABOLOMIC, $_PROTEoMIC, $_TARGETED, $_MODELS, $_CODE, $_EXCEL, $_CLONED-DNA data which are related to $_STUDYOBJECT. In addition, the raw data will also be processed and modified using analytical pipelines, which may yield different results or include ad hoc data analysis parts. #if$_DATAPLANT These pipelines will be tracked in the DataPLANT ARC.#endif$_DATAPLANT Therefore, care will be taken to document and archive these resources (including the analytical pipelines) as well#if$_DATAPLANT relying on the expertise in the DataPLANT consortium#endif$_DATAPLANT.
    • - - We expect to generate raw data in the range of $_RAWDATA GB of data. The size of the derived data will be about $_DERIVEDDATA GB. -

      -
    • Standards and Metadata – We will use Investigation, Study, Assay (ISA) specification for metadata creation. #if$_RNASEQ|$_GENOMIC For specific data (e.g., RNASeq or genomic data), we use metadata templates from the end-point repositories. #if$_MINSEQE The Minimum Information About a Next-generation Sequencing Experiment (MinSEQe) will also be used. #endif$_MINSEQE #endif$_RNASEQ|$_GENOMIC #if$_METABOLOMIC Metabolights submission compliant standards will be used for metabolomic data where this is acccepted by the consortium partners.#issuewarning some Metabolomics partners considers Metabolights not an accepted standard#endissuewarning#endif$_METABOLOMIC As a part of plant research community, we use #if$_MIAPPE MIAPPE  for phenotyping data in the broadest sense, but we will also be rely on #endif$_MIAPPE specific SOPs for additional annotations #if$_DATAPLANT that consider advanced DataPLANT annotation and ontologies. #endif$_DATAPLANT
    • -
    • Reuse of published data – The project builds on existing data sets and relies on them. #if$_RNASEQ For example, without a proper genomic reference it is very difficult to analyze next-generation sequencing (NGS) data sets.#endif$_RNASEQ It is also important to include existing data-sets on the expression and metabolic behavior of the $_STUDYOBJECT, and on existing background knowledge. #if$_PARTNERS of the partners. #endif$_PARTNERS Genomic references can be gathered from reference databases for genomes/ and sequences, like the US National Center for Biotechnology Information: NCBI, European Bioinformatics Institute: EBI; DNA Data Bank of Japan: DDBJ. Furthermore, prior 'unstructured' data in the form of publications and data contained therein will be used for decision making.

      -
    • Secondary Use – The data will initially benefit the $_PROJECT partners, but will also be made available to selected stakeholders closely involved in the project, and then the scientific community working on $_STUDYOBJECT. $_DATAUTILITY In addition, the general public interested in $_STUDYOBJECT can also use the data after publication. The data will be disseminated according to the $_PROJECT's dissemination and communication plan, #if$_DATAPLANT which aligns with DataPLANT platform or other means#endif$_DATAPLANT
    • - -
    • Methods for Data Sharing – Specialized repositories will be used where appropriate, such as INSDC (GenBank, EBI, DDBJ) for nucleotide sequence data, PIR/UniProt/SWISS-PROT for proteins, PDB for protein structures, GEO for transcriptomics, PRIDE for proteomics data, and METLIN for metabolomics data. For unstructured and less standardized data (e.g., experimental phenotypic measurements), these will be annotated with metadata and if complete allocated a digital object identifier (DOI).  #if$_DATAPLANT Whole datasets will also be wrapped into an ARC with allocated DOIs. The ARC and the converters provided by DataPLANT will ensure that the upload into the endpoint repositories is fast and easy. #endif$_DATAPLANT
    • Proprietary Data – Open public data will be used whenever possible.
    • Timeframes #if$_early The data will be published as soon as possible to guarantee reusability. #endif$_early #if$_ipissue IP issues will be checked before publication. #endif$_ipissue All consortium partners will be encouraged to make data available before publication, openly and/or under pre-publication agreements #if$_GENOMIC such as those started in Fort Lauderdale and set forth by the Toronto International Data Release Workshop. #endif$_GENOMIC This will be implemented as soon as IP-related checks are complete.
    • Format of the Final Dataset – Whenever possible, data will be stored in common and openly defined formats including all the necessary metadata. By default, no proprietary formats will be used. However Microsoft Excel files (according to ISO/IEC 29500-1:2016) might be used as intermediates by the consortium#if$_DATAPLANT and by some ARC components#endif$_DATAPLANT. In addition, text files might be edited in text processor files, but will be shared as pdf. -
    -
    -
    -
    - a document template -
    -
    - - - - - - - - - - - - - + + + + + + + + + + + + \ No newline at end of file diff --git a/js/bootstrap.bundle.min.js b/js/bootstrap.bundle.min2106.js similarity index 100% rename from js/bootstrap.bundle.min.js rename to js/bootstrap.bundle.min2106.js diff --git a/js/bootstrap.bundle.min.js.map b/js/bootstrap.bundle.min2106.js.map similarity index 100% rename from js/bootstrap.bundle.min.js.map rename to js/bootstrap.bundle.min2106.js.map diff --git a/js/bs5-intro-tour.js b/js/bs5-intro-tour2106.js similarity index 100% rename from js/bs5-intro-tour.js rename to js/bs5-intro-tour2106.js diff --git a/js/cloud.min.js b/js/cloud.min2106.js similarity index 100% rename from js/cloud.min.js rename to js/cloud.min2106.js diff --git a/js/split.min.js b/js/split.min.js new file mode 100644 index 0000000..49eae33 --- /dev/null +++ b/js/split.min.js @@ -0,0 +1,3 @@ +/*! Split.js - v1.6.0 */ +!function(e,t){"object"==typeof exports&&"undefined"!=typeof module?module.exports=t():"function"==typeof define&&define.amd?define(t):(e=e||self).Split=t()}(this,(function(){"use strict";var e="undefined"!=typeof window?window:null,t=null===e,n=t?void 0:e.document,i=function(){return!1},r=t?"calc":["","-webkit-","-moz-","-o-"].filter((function(e){var t=n.createElement("div");return t.style.cssText="width:"+e+"calc(9px)",!!t.style.length})).shift()+"calc",s=function(e){return"string"==typeof e||e instanceof String},o=function(e){if(s(e)){var t=n.querySelector(e);if(!t)throw new Error("Selector "+e+" did not match a DOM element");return t}return e},a=function(e,t,n){var i=e[t];return void 0!==i?i:n},u=function(e,t,n,i){if(t){if("end"===i)return 0;if("center"===i)return e/2}else if(n){if("start"===i)return 0;if("center"===i)return e/2}return e},l=function(e,t){var i=n.createElement("div");return i.className="gutter gutter-"+t,i},c=function(e,t,n){var i={};return s(t)?i[e]=t:i[e]=r+"("+t+"% - "+n+"px)",i},h=function(e,t){var n;return(n={})[e]=t+"px",n};return function(r,s){if(void 
0===s&&(s={}),t)return{};var d,f,v,m,g,p,y=r;Array.from&&(y=Array.from(y));var z=o(y[0]).parentNode,b=getComputedStyle?getComputedStyle(z):null,E=b?b.flexDirection:null,S=a(s,"sizes")||y.map((function(){return 100/y.length})),L=a(s,"minSize",100),_=Array.isArray(L)?L:y.map((function(){return L})),w=a(s,"expandToMin",!1),k=a(s,"gutterSize",10),x=a(s,"gutterAlign","center"),C=a(s,"snapOffset",30),M=a(s,"dragInterval",1),U=a(s,"direction","horizontal"),O=a(s,"cursor","horizontal"===U?"col-resize":"row-resize"),D=a(s,"gutter",l),A=a(s,"elementStyle",c),B=a(s,"gutterStyle",h);function j(e,t,n,i){var r=A(d,t,n,i);Object.keys(r).forEach((function(t){e.style[t]=r[t]}))}function F(){return p.map((function(e){return e.size}))}function R(e){return"touches"in e?e.touches[0][f]:e[f]}function T(e){var t=p[this.a],n=p[this.b],i=t.size+n.size;t.size=e/this.size*i,n.size=i-e/this.size*i,j(t.element,t.size,this._b,t.i),j(n.element,n.size,this._c,n.i)}function N(e){var t,n=p[this.a],r=p[this.b];this.dragging&&(t=R(e)-this.start+(this._b-this.dragOffset),M>1&&(t=Math.round(t/M)*M),t<=n.minSize+C+this._b?t=n.minSize+this._b:t>=this.size-(r.minSize+C+this._c)&&(t=this.size-(r.minSize+this._c)),T.call(this,t),a(s,"onDrag",i)())}function q(){var e=p[this.a].element,t=p[this.b].element,n=e.getBoundingClientRect(),i=t.getBoundingClientRect();this.size=n[d]+i[d]+this._b+this._c,this.start=n[v],this.end=n[m]}function H(e){var t=function(e){if(!getComputedStyle)return null;var t=getComputedStyle(e);if(!t)return null;var n=e[g];return 0===n?null:n-="horizontal"===U?parseFloat(t.paddingLeft)+parseFloat(t.paddingRight):parseFloat(t.paddingTop)+parseFloat(t.paddingBottom)}(z);if(null===t)return e;if(_.reduce((function(e,t){return e+t}),0)>t)return e;var n=0,i=[],r=e.map((function(r,s){var o=t*r/100,a=u(k,0===s,s===e.length-1,x),l=_[s]+a;return o0&&i[r]-n>0){var o=Math.min(n,i[r]-n);n-=o,s=e-o}return s/t*100}))}function I(){var 
t=p[this.a].element,r=p[this.b].element;this.dragging&&a(s,"onDragEnd",i)(F()),this.dragging=!1,e.removeEventListener("mouseup",this.stop),e.removeEventListener("touchend",this.stop),e.removeEventListener("touchcancel",this.stop),e.removeEventListener("mousemove",this.move),e.removeEventListener("touchmove",this.move),this.stop=null,this.move=null,t.removeEventListener("selectstart",i),t.removeEventListener("dragstart",i),r.removeEventListener("selectstart",i),r.removeEventListener("dragstart",i),t.style.userSelect="",t.style.webkitUserSelect="",t.style.MozUserSelect="",t.style.pointerEvents="",r.style.userSelect="",r.style.webkitUserSelect="",r.style.MozUserSelect="",r.style.pointerEvents="",this.gutter.style.cursor="",this.parent.style.cursor="",n.body.style.cursor=""}function W(t){if(!("button"in t)||0===t.button){var r=p[this.a].element,o=p[this.b].element;this.dragging||a(s,"onDragStart",i)(F()),t.preventDefault(),this.dragging=!0,this.move=N.bind(this),this.stop=I.bind(this),e.addEventListener("mouseup",this.stop),e.addEventListener("touchend",this.stop),e.addEventListener("touchcancel",this.stop),e.addEventListener("mousemove",this.move),e.addEventListener("touchmove",this.move),r.addEventListener("selectstart",i),r.addEventListener("dragstart",i),o.addEventListener("selectstart",i),o.addEventListener("dragstart",i),r.style.userSelect="none",r.style.webkitUserSelect="none",r.style.MozUserSelect="none",r.style.pointerEvents="none",o.style.userSelect="none",o.style.webkitUserSelect="none",o.style.MozUserSelect="none",o.style.pointerEvents="none",this.gutter.style.cursor=O,this.parent.style.cursor=O,n.body.style.cursor=O,q.call(this),this.dragOffset=R(t)-this.end}}"horizontal"===U?(d="width",f="clientX",v="left",m="right",g="clientWidth"):"vertical"===U&&(d="height",f="clientY",v="top",m="bottom",g="clientHeight"),S=H(S);var X=[];function Y(e){var t=e.i===X.length,n=t?X[e.i-1]:X[e.i];q.call(n);var 
i=t?n.size-e.minSize-n._c:e.minSize+n._b;T.call(n,i)}return(p=y.map((function(e,t){var n,i={element:o(e),size:S[t],minSize:_[t],i:t};if(t>0&&((n={a:t-1,b:t,dragging:!1,direction:U,parent:z})._b=u(k,t-1==0,!1,x),n._c=u(k,!1,t===y.length-1,x),"row-reverse"===E||"column-reverse"===E)){var r=n.a;n.a=n.b,n.b=r}if(t>0){var s=D(t,U,i.element);!function(e,t,n){var i=B(d,t,n);Object.keys(i).forEach((function(t){e.style[t]=i[t]}))}(s,k,t),n._a=W.bind(n),s.addEventListener("mousedown",n._a),s.addEventListener("touchstart",n._a),z.insertBefore(s,i.element),n.gutter=s}return j(i.element,i.size,u(k,0===t,t===y.length-1,x),t),t>0&&X.push(n),i}))).forEach((function(e){var t=e.element.getBoundingClientRect()[d];t0){var i=X[n-1],r=p[i.a],s=p[i.b];r.size=t[n-1],s.size=e,j(r.element,r.size,i._b,r.i),j(s.element,s.size,i._c,s.i)}}))},getSizes:F,collapse:function(e){Y(p[e])},destroy:function(e,t){X.forEach((function(n){if(!0!==t?n.parent.removeChild(n.gutter):(n.gutter.removeEventListener("mousedown",n._a),n.gutter.removeEventListener("touchstart",n._a)),!0!==e){var i=A(d,n.a.size,n._b);Object.keys(i).forEach((function(e){p[n.a].element.style[e]="",p[n.b].element.style[e]=""}))}}))},parent:z,pairs:X}}})); +//# sourceMappingURL=split.min.js.map