diff --git a/docs/File-Formats.md b/docs/File-Formats.md
index 37592f25ebc..4760ff90039 100644
--- a/docs/File-Formats.md
+++ b/docs/File-Formats.md
@@ -21,6 +21,7 @@
 * [Study Tags file](#study-tags-file)
 * [Generic Assay](#generic-assay)
   * [Arm Level CNA Data](#arm-level-cna-data)
+  * [Mutational Signature Data](#mutational-signature-data)
 * [Resource Data](#resource-data)
 * [Custom namespace columns](#custom-namespace-columns)
 
@@ -1599,6 +1600,36 @@ Allowed values for Arm-level copy-number data are `Loss`, `Gain`, and `Unchanged
 
 Please find example file format here: [Meta file example](https://github.com/cBioPortal/cbioportal-frontend/blob/master/end-to-end-test/local/studies/lgg_ucsf_2014_test_generic_assay/meta_armlevel_CNA.txt) and [Data file example](https://github.com/cBioPortal/cbioportal-frontend/blob/master/end-to-end-test/local/studies/lgg_ucsf_2014_test_generic_assay/data_armlevel_CNA.txt)
 
+### Mutational Signature Data
+Mutational Signature data is a predefined subtype of Generic Assay Data. Setting `generic_assay_type: MUTATIONAL_SIGNATURE`
+in the meta file will make cBioPortal interpret the data as Mutational Signature data.
+
+#### Mutational Signature meta files
+The mutational signature meta files follow the same convention as the [Generic Assay Meta file](#generic-assay-meta-file),
+with some key differences:
+- `generic_assay_type` should be set to `MUTATIONAL_SIGNATURE`
+- `stable_id` values should end with `{datatype}_{identifier}`, where:
+  - `datatype` is one of `contribution`, `pvalue` or `matrix`
+  - `identifier` is consistent between files belonging to the same analysis
+  - Multiple signatures can be added to a single study, as long as they have different identifiers in their `stable_id`
+    (e.g., `contribution_SBS` and `contribution_DBS`)
+- In `generic_entity_meta_properties` the `NAME` value is required. The `DESCRIPTION` and `URL` values can be added
+  to display more information and link to external resources in the mutational signatures tab.
+
+#### Mutational Signature data files
+The mutational signature data files follow the same convention as the [Generic Assay Data file](#generic-assay-data-file).
+Each collection of mutational signatures can consist of up to three different data files, each with an accompanying meta file.
+- Signature contribution file (**required**)
+  - Data file containing the contribution of each signature-sample pair. Values are expected to satisfy 0 ≤ x ≤ 1.
+- Signature p-values file (optional)
+  - Data file containing p-values for each signature-sample pair. Values below 0.05 will be treated as significant.
+- Mutation matrix file (optional)
+  - Data file containing nucleotide changes of a sample. cBioPortal has specific visualisation options for single-base
+    substitutions (96 channels), double-base substitutions (72 channels) and insertions/deletions (83 channels),
+    following the signatures defined by [COSMIC](https://cancer.sanger.ac.uk/signatures/). Other channels can also
+    be used. Values are expected to be positive integers.
+
+
 ## Resource Data
 The resource data is used to capture resource data in patients, samples and studies. The resources will be represented by URLs with meta data. The types of resources include:
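
To make the Mutational Signature rules added in this diff concrete, here is a minimal Python sketch of the checks they imply: the `{datatype}_{identifier}` suffix on `stable_id`, contributions between 0 and 1, the 0.05 significance cut-off for p-values, and the requirement that every collection has a contribution file. All function names are hypothetical helpers and not part of cBioPortal's validator. The restriction of `identifier` to alphanumeric characters is an assumption made only to keep the pattern unambiguous.

```python
import re

# Assumption: the identifier is alphanumeric with no underscores, so the
# "{datatype}_{identifier}" suffix can be matched unambiguously.
_SUFFIX = re.compile(r"(?:^|_)(contribution|pvalue|matrix)_([A-Za-z0-9]+)$")


def parse_stable_id(stable_id):
    """Return (datatype, identifier), or None if the suffix does not match."""
    match = _SUFFIX.search(stable_id)
    if match is None:
        return None
    return match.group(1), match.group(2)


def valid_contribution(value):
    """Contribution values are expected to lie in the closed interval [0, 1]."""
    return 0.0 <= value <= 1.0


def is_significant(pvalue, threshold=0.05):
    """P-values strictly below the threshold are treated as significant."""
    return pvalue < threshold


def missing_contribution(stable_ids):
    """List identifiers that have files but lack the required contribution file."""
    seen = {}
    for stable_id in stable_ids:
        parsed = parse_stable_id(stable_id)
        if parsed is not None:
            datatype, identifier = parsed
            seen.setdefault(identifier, set()).add(datatype)
    return sorted(i for i, kinds in seen.items() if "contribution" not in kinds)
```

For example, `missing_contribution(["contribution_SBS", "pvalue_SBS", "matrix_DBS"])` flags `DBS`, because that collection has a mutation matrix but no contribution file.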