This repository contains the code used for the analyses in the paper "The effects of replication domains on the genome-wide UV-induced DNA damage and repair"
Yanchao Huang, Cem Azgari, Mengdie Yin, Yi-Ying Chiou, Laura A. Lindsey-Boltz, Aziz Sancar, Jinchuan Hu, Ogun Adebali
Nucleotide excision repair is the primary repair mechanism that removes UV-induced DNA lesions in placentals. If UV-induced lesions are left unrepaired, they might turn into mutations during DNA replication. Although the mutagenesis of pyrimidine dimers is reasonably well understood, the direct effects of replication fork progression on nucleotide excision repair are yet to be clarified. Here, we applied Damage-seq and XR-seq techniques and generated replication maps in synchronized UV-treated HeLa cells. The results suggested that ongoing replication stimulates local repair in both early and late replication domains by relaxing the surrounding chromatin. On the other hand, lesions on lagging-strand templates were repaired more slowly in late replication domains, probably due to the imbalanced sequence context. The asymmetric relative repair was in line with the strand bias of melanoma mutations, suggesting a role for exogenous damage, repair, and replication in the mutational strand asymmetry.
-
This workflow is prepared using the Snakemake workflow management system and conda.
-
To run the workflow, you should have conda installed for environment management. All other packages, including Snakemake, and their dependencies can be obtained automatically through the environments prepared for each step of the workflow. You can follow the conda installation steps from the link.
-
Initially, you should clone the repository and navigate into the directory:
git clone https://github.com/CompGenomeLab/replicationRepair.git
cd replicationRepair
-
Next, you should create a conda environment with the defined packages. Install mamba and create the environment using mamba:
conda install -c conda-forge mamba=0.25.0
mamba create -c bioconda -c conda-forge -c r -n repair snakemake=6.3.0
conda activate repair
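After activating the environment, it can help to confirm that it really provides the pinned Snakemake version before going further. The helper below is a minimal sketch meant to be sourced into bash; check_snakemake is a hypothetical name, not part of the repository, and it relies on snakemake --version printing only the version string, as Snakemake 6.x does.

```bash
# Hypothetical helper: verify the active environment provides the expected Snakemake.
# Usage: check_snakemake [expected_version]   (defaults to 6.3.0, as pinned above)
check_snakemake() {
  local expected="${1:-6.3.0}" found
  # snakemake --version prints just the version number, e.g. 6.3.0
  if ! found="$(snakemake --version 2>/dev/null)"; then
    echo "snakemake not found; is the repair environment active?" >&2
    return 1
  fi
  if [ "$found" != "$expected" ]; then
    echo "expected snakemake $expected, found $found" >&2
    return 1
  fi
  echo "snakemake $found OK"
}
```

Running check_snakemake right after conda activate repair should print the version it found, or explain what is wrong.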
-
Genome fasta file of hg19 can be downloaded and unzipped with the commands below:
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz
gunzip GRCh37.p13.genome.fa.gz
-
Then the fasta file should be renamed to genome_hg19.fa and moved to the resources/ref_genomes/hg19/ directory located in replicationRepair.
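The renaming and moving can also be scripted. This is a minimal sketch, assuming it is called with the path of the unzipped fasta and the path of the replicationRepair clone; place_genome is a hypothetical helper, not something the repository provides.

```bash
# Hypothetical helper: put the unzipped GENCODE hg19 fasta where the workflow expects it.
# Usage: place_genome <path/to/GRCh37.p13.genome.fa> <path/to/replicationRepair>
place_genome() {
  local fasta="$1" repo="$2"
  local dest_dir="$repo/resources/ref_genomes/hg19"
  # Fail early if the downloaded fasta is not where we were told
  if [ ! -f "$fasta" ]; then
    echo "fasta not found: $fasta" >&2
    return 1
  fi
  # Create the target directory if it does not exist yet
  mkdir -p "$dest_dir" || return 1
  # Rename while moving, so the workflow finds genome_hg19.fa
  mv "$fasta" "$dest_dir/genome_hg19.fa"
}
```

For example, place_genome GRCh37.p13.genome.fa . when run from inside replicationRepair.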
-
The XR-seq and Damage-seq pipeline should be cloned separately (outside of the replicationRepair directory) from the GitHub link.
-
For the sake of reproducibility, you should check out the v0.6.1 branch before running the pipeline.
-
Lastly, the content of the config.yaml file in xr-ds-seq-snakemake/config/ should be replaced with the content of config_xr_ds_seq.yaml in replicationRepair/config/.
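The checkout and the config replacement can be combined into one step. This sketch assumes xr-ds-seq-snakemake has already been cloned from the GitHub link (the URL is not repeated here) and that v0.6.1 exists in that clone; setup_xr_ds is a hypothetical helper.

```bash
# Hypothetical helper: pin the XR-seq/Damage-seq pipeline and install this repo's config.
# Usage: setup_xr_ds <path/to/xr-ds-seq-snakemake> <path/to/replicationRepair>
setup_xr_ds() {
  local pipeline="$1" repo="$2"
  local src_cfg="$repo/config/config_xr_ds_seq.yaml"
  # Switch the pipeline clone to the version used for the analyses
  git -C "$pipeline" checkout --quiet v0.6.1 || return 1
  if [ ! -f "$src_cfg" ]; then
    echo "missing $src_cfg" >&2
    return 1
  fi
  # Overwrite the pipeline's config.yaml with the one shipped in replicationRepair
  cp "$src_cfg" "$pipeline/config/config.yaml"
}
```

For example, with both clones side by side: setup_xr_ds xr-ds-seq-snakemake replicationRepair.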
- Simple somatic mutations of melanoma are publicly available in the ICGC Data Portal. You should download the simple_somatic_mutation.open.MELA-AU.tsv.gz file, rename it to melanoma.tsv.gz, and move it to the replicationRepair/resources/samples/mutation directory.
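A sketch of the rename-and-move step for the mutation file. The download from ICGC stays manual; the gzip integrity check is an added precaution, not something the workflow is known to require, and place_mutations is a hypothetical helper.

```bash
# Hypothetical helper: install the ICGC melanoma mutations as melanoma.tsv.gz.
# Usage: place_mutations <path/to/simple_somatic_mutation.open.MELA-AU.tsv.gz> <path/to/replicationRepair>
place_mutations() {
  local tsv_gz="$1" repo="$2"
  local dest_dir="$repo/resources/samples/mutation"
  if [ ! -f "$tsv_gz" ]; then
    echo "file not found: $tsv_gz" >&2
    return 1
  fi
  # Catch truncated or non-gzip downloads before they reach the workflow
  gzip -t "$tsv_gz" || return 1
  mkdir -p "$dest_dir" || return 1
  # Rename while moving, as the workflow expects melanoma.tsv.gz
  mv "$tsv_gz" "$dest_dir/melanoma.tsv.gz"
}
```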
This workflow is prepared according to the structure recommended by Snakemake:
- config/: contains the configuration files.
- logs/: contains the log files of each step. This folder will appear automatically when you run the workflow.
- report/: contains the description files of the figures, which are used in the reports.
- resources/: contains samples/, where the raw XR-seq and Damage-seq data are stored, and ref_genomes/, where the reference genome files are stored.
- results/: contains the generated files and figures.
- workflow/: contains envs/, where the environments are stored, rules/, where the Snakemake rules are stored, and scripts/, where the scripts used inside the rules are stored.
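Before launching anything, a quick pre-flight check can catch a misplaced input. This sketch only tests the directories of the structure described here (logs/ and results/ are skipped because they are created during the run) plus the two files placed manually during installation; check_layout is a hypothetical helper, and its path list is drawn from this README rather than from the workflow's rules.

```bash
# Hypothetical helper: report missing directories or manually placed inputs.
# Usage: check_layout <path/to/replicationRepair>   (returns non-zero if anything is missing)
check_layout() {
  local repo="$1" missing=0 d f
  # Directories shipped with the repository
  for d in config report resources workflow/envs workflow/rules workflow/scripts; do
    if [ ! -d "$repo/$d" ]; then
      echo "missing directory: $d" >&2
      missing=1
    fi
  done
  # Inputs placed by hand during installation
  for f in resources/ref_genomes/hg19/genome_hg19.fa \
           resources/samples/mutation/melanoma.tsv.gz; do
    if [ ! -f "$repo/$f" ]; then
      echo "missing file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```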
-
First, the XR-seq and Damage-seq pipeline should be run with the provided config files.
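The README does not spell out the command for this step. A minimal sketch, assuming xr-ds-seq-snakemake is launched the same way as replicationRepair itself (the flags are borrowed from the replicationRepair run command and may need adjusting for that pipeline); run_xr_ds is a hypothetical helper.

```bash
# Hypothetical helper: run the XR-seq/Damage-seq pipeline from its own directory.
# Usage: run_xr_ds <path/to/xr-ds-seq-snakemake> [cores]   (cores defaults to 64)
run_xr_ds() {
  local pipeline="$1" cores="${2:-64}"
  # Subshell keeps the caller's working directory unchanged
  (cd "$pipeline" && snakemake --cores "$cores" --use-conda --keep-going)
}
```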
-
After the pipeline has completed, the produced BED files should be copied to the appropriate directories.
- For XR-seq samples:
cp {path_to_dir}/xr-ds-seq-snakemake/results/processed_files/*_XR_plus.bed {path_to_dir}/replicationRepair/resources/samples/XR/
cp {path_to_dir}/xr-ds-seq-snakemake/results/processed_files/*_XR_minus.bed {path_to_dir}/replicationRepair/resources/samples/XR/
- For Damage-seq samples:
cp {path_to_dir}/xr-ds-seq-snakemake/results/processed_files/*_DS_plus.bed {path_to_dir}/replicationRepair/resources/samples/DS/
cp {path_to_dir}/xr-ds-seq-snakemake/results/processed_files/*_DS_minus.bed {path_to_dir}/replicationRepair/resources/samples/DS/
- For simulated samples:
cp {path_to_dir}/xr-ds-seq-snakemake/results/*/*/*_sim.bed {path_to_dir}/replicationRepair/resources/samples/sim/
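The copy commands can be wrapped so that missing destination directories are created and a pattern that matches no file is reported explicitly. This is a sketch; collect_beds and copy_matching are hypothetical helpers whose source and destination paths mirror the cp commands for the XR-seq, Damage-seq, and simulated samples.

```bash
# Hypothetical helper: copy existing files to a directory; fail if nothing was copied.
# Usage: copy_matching <dest_dir> <files...>
copy_matching() {
  local dest="$1"; shift
  local copied=0 f
  mkdir -p "$dest" || return 1
  for f in "$@"; do
    # An unmatched glob is passed through literally, so skip anything that is not a file
    [ -f "$f" ] || continue
    cp "$f" "$dest/" || return 1
    copied=$((copied + 1))
  done
  echo "$copied file(s) -> $dest"
  [ "$copied" -gt 0 ]
}

# Hypothetical helper: gather XR-seq, Damage-seq and simulated BED files.
# Usage: collect_beds <path/to/xr-ds-seq-snakemake> <path/to/replicationRepair>
collect_beds() {
  local pipeline="$1" repo="$2"
  local out="$pipeline/results/processed_files"
  local samples="$repo/resources/samples"
  copy_matching "$samples/XR" "$out"/*_XR_plus.bed "$out"/*_XR_minus.bed || return 1
  copy_matching "$samples/DS" "$out"/*_DS_plus.bed "$out"/*_DS_minus.bed || return 1
  copy_matching "$samples/sim" "$pipeline"/results/*/*/*_sim.bed
}
```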
You can run the workflow from the replicationRepair directory:
snakemake --cores 64 --use-conda --keep-going
Note: To run the workflow on the Slurm Workload Manager as a set of jobs, the --profile flag must be provided with the proper Slurm configuration file (config/slurm).
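Both ways of launching the workflow can be wrapped in one helper. This sketch assumes config/slurm is a Snakemake profile directory whose config.yaml carries the cluster submission and job settings; run_workflow is a hypothetical name, and repeating --use-conda and --keep-going next to --profile is a choice made here, not taken from the repository.

```bash
# Hypothetical helper: run replicationRepair locally or through the Slurm profile.
# Usage: run_workflow <path/to/replicationRepair> [local|slurm]
run_workflow() {
  local repo="$1" mode="${2:-local}"
  (
    # Subshell so the caller's working directory is left untouched
    cd "$repo" || exit 1
    if [ "$mode" = "slurm" ]; then
      # Submission settings come from the profile in config/slurm
      snakemake --profile config/slurm --use-conda --keep-going
    else
      snakemake --cores 64 --use-conda --keep-going
    fi
  )
}
```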
To generate detailed HTML report files, run the command below after the workflow has finished:
snakemake --report report.zip