
Epitope Prediction Pipeline (EPP) module designed to format and conduct biostatistics on NetMHCIIpan data


CPalmer3200/Epitope-Prediction-Pipeline


User manual .txt file for the Epitope Prediction Pipeline (EPP) module

  • Ensure EPP.py is located in /lib/site-packages (or the folder location is appended to sys.path)
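If EPP.py is kept outside site-packages, its folder can be appended to sys.path before importing the module. The following is a minimal sketch; the folder name `epp_module` is a hypothetical placeholder and should be replaced with the actual location of EPP.py.

```python
import sys
from pathlib import Path

# Hypothetical folder holding EPP.py; replace with the real location.
epp_folder = Path.home() / "epp_module"

# Append the folder only once so repeated runs do not duplicate the entry.
if str(epp_folder) not in sys.path:
    sys.path.append(str(epp_folder))

# With the folder on sys.path, `import EPP` can locate EPP.py.
```

The change lasts only for the current Python session, so the lines need to run before each `import EPP`.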

*Note - this module requires the use of two external web tools:

  1. IEDB MHC-II binding prediction tool (http://tools.iedb.org/mhcii/)
  2. MixMHC2pred (http://mixmhc2pred.gfellerlab.org/)

*The functions presented in this file are designed to be used in the order listed

Functions:

allele_types()

This function specifies the control and risk alleles and accepts two arguments. Argument 1 should be a list of
all control alleles used (example format: 'HLA-DPA1*01:01/DPAB*01:01', 'HLA-DRB1*01:01'). Argument 2 should be a list of
the risk alleles used, in the same format. The function returns one dictionary.

Example:

control_alleles = ['HLA-DPA1*01:01/DPAB*01:01', 'HLA-DRB1*01:01']
risk_alleles = [...]

allele_dictionary = EPP.allele_types(control_alleles, risk_alleles)
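Before calling allele_types(), it may help to confirm that both lists are populated and that no allele appears in both groups. The sketch below is not part of EPP: `check_allele_lists` is a hypothetical helper, the risk allele is a placeholder value, and the assumption that an allele should not be both control and risk is ours rather than the module's.

```python
def check_allele_lists(control_alleles, risk_alleles):
    """Raise ValueError if either list is empty or the two lists share an allele."""
    if not control_alleles or not risk_alleles:
        raise ValueError("Both allele lists must contain at least one allele.")
    # Alleles present in both groups would blur the control/risk comparison.
    overlap = sorted(set(control_alleles) & set(risk_alleles))
    if overlap:
        raise ValueError(f"Alleles listed as both control and risk: {overlap}")


control_alleles = ['HLA-DPA1*01:01/DPAB*01:01', 'HLA-DRB1*01:01']
risk_alleles = ['HLA-DRB1*04:01']  # placeholder value for illustration only

# Passes silently when the lists are non-empty and disjoint.
check_allele_lists(control_alleles, risk_alleles)
```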

*GENERATE DATA WITH THE IEDB MHC-II BINDING PREDICTION TOOL

iedb_format()

This function formats the output of the IEDB MHC-II prediction tool. It accepts three arguments: 1) the name of the
IEDB .csv output file, as a string, 2) a list of proteins in the SAME ORDER as they were submitted to the IEDB tool, 3) the allele
dictionary created by allele_types(). The function returns a dataframe and also creates a .txt file in the current
working directory titled 'HLAII_peptide_output.txt', which should be used as the input for MixMHC2pred.

Example:

protein_list = ['protein1', 'protein2', ...]

IEDB_dataframe = EPP.iedb_format('IEDB_output', protein_list, allele_dictionary)
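Since iedb_format() writes 'HLAII_peptide_output.txt' to the current working directory, a quick check can confirm the file is there before it is submitted to MixMHC2pred. This is a minimal sketch; `find_peptide_file` is a hypothetical helper and not part of EPP.

```python
from pathlib import Path


def find_peptide_file(filename="HLAII_peptide_output.txt"):
    """Return the file's path in the current working directory, or None if absent."""
    path = Path.cwd() / filename
    return path if path.is_file() else None


peptide_file = find_peptide_file()
if peptide_file is None:
    print("Peptide file not found; check the working directory used by iedb_format().")
else:
    print(f"Peptide file ready for MixMHC2pred: {peptide_file}")
```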

*GENERATE DATA WITH THE MIXMHC2PRED TOOL

merge_data()

This function formats the output of MixMHC2pred and assembles the IEDB and MixMHC2pred dataframes into one large dataframe
and a separate dictionary of dataframes (DfD). It accepts three arguments: 1) the IEDB dataframe created by iedb_format(),
2) the name of the MixMHC2pred output file, as a string, 3) the allele dictionary created by allele_types(). The function returns
two variables: 1) the master dataframe, 2) the dataframe dictionary.

Example:

merged_data, DfD = EPP.merge_data(IEDB_dataframe, 'MixMHC2pred_output', allele_dictionary)

pdif()

This is the protein differential immunogenicity factor (PDIF) function, which compares the mean adjusted binding values
between the control and risk HLA alleles. The function accepts two arguments: 1) the DfD dictionary generated by merge_data(),
2) the allele dictionary generated by allele_types(). The function returns two variables: 1) a dataframe of the statistical tests
conducted for each protein and allele subtype, 2) a dictionary of posthoc matrices.


Example:

pdif_stats, pdif_posthocs = EPP.pdif(DfD, allele_dictionary)

adif()

This is the allele differential immunogenicity factor (ADIF) function, which compares the mean adjusted binding values between
proteins for each HLA allele. The function accepts one argument: the merged_data dataframe assembled by merge_data(). The function
returns two variables: 1) a dataframe of the statistical tests performed, 2) a dictionary of posthoc matrices.


Example:

adif_stats, adif_posthocs = EPP.adif(merged_data)
