- Ensure EPP.py is located in your Python installation's site-packages folder (e.g. Lib/site-packages), or that the folder
containing it has been appended to sys.path.
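If EPP.py is kept outside site-packages, the snippet below shows one way to make it importable. It is a minimal sketch: the folder path is a hypothetical placeholder to replace with the real location of EPP.py.

```python
import sys
import sysconfig
from pathlib import Path

# site-packages folder of the running interpreter (where EPP.py could be copied instead).
print(sysconfig.get_paths()["purelib"])

# Hypothetical folder that holds EPP.py; replace with its real location.
epp_folder = Path.home() / "epp_module"

# Append only once so re-running the script does not keep growing sys.path.
if str(epp_folder) not in sys.path:
    sys.path.append(str(epp_folder))

# import EPP  # would now resolve if EPP.py is present in epp_folder
```

Appending to sys.path only affects the current Python session, so it has to run before `import EPP` in each script.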
*Note - this module requires two external web tools:
- IEDB MHC-II binding prediction tool (http://tools.iedb.org/mhcii/)
- MixMHC2pred (http://mixmhc2pred.gfellerlab.org/)
*The functions in this file are designed to be used in the order in which they are presented below.
The allele_types() function specifies the control and risk alleles and accepts two arguments. Argument 1 should be a list of
all the control alleles used (example format: 'HLA-DPA1*01:01/DPB1*01:01', 'HLA-DRB1*01:01'). Argument 2 should be a list of
the risk alleles used, in the identical format. The function returns one dictionary.
Example:
control_alleles = ['HLA-DPA1*01:01/DPB1*01:01', 'HLA-DRB1*01:01']
risk_alleles = [...]
allele_dictionary = EPP.allele_types(control_alleles, risk_alleles)
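Before calling allele_types(), it may help to check that every allele string follows the example format. The sketch below assumes only the two formats shown above (a single chain or an alpha/beta pair). build_allele_groups() and its allele-to-group mapping are hypothetical and are not the structure returned by EPP.allele_types(), which is not described here.

```python
import re

# One chain such as 'DRB1*01:01' or 'DPA1*01:01' (assumed pattern, not taken from EPP).
_CHAIN = r"D[PQR][AB]\d\*\d{2,3}:\d{2,3}"
# 'HLA-' + one chain, optionally followed by '/' + a second chain.
ALLELE_RE = re.compile(rf"HLA-{_CHAIN}(?:/{_CHAIN})?")


def build_allele_groups(control, risk):
    """Hypothetical helper: validate allele names and map each one to 'control' or 'risk'."""
    groups = {}
    for label, alleles in (("control", control), ("risk", risk)):
        for allele in alleles:
            if not ALLELE_RE.fullmatch(allele):
                raise ValueError(f"Unexpected allele format: {allele!r}")
            if groups.get(allele, label) != label:
                raise ValueError(f"{allele!r} is listed as both control and risk")
            groups[allele] = label
    return groups
```

For example, `build_allele_groups(['HLA-DRB1*01:01'], ['HLA-DRB1*15:01'])` returns `{'HLA-DRB1*01:01': 'control', 'HLA-DRB1*15:01': 'risk'}`.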
*GENERATE DATA WITH THE IEDB MHC-II BINDING PREDICTION TOOL
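Because iedb_format() relies on the proteins being listed in the same order as they were submitted, one option is to build the IEDB input FASTA and protein_list from the same ordered mapping. A minimal sketch; write_ordered_fasta(), the file name and the sequences are hypothetical placeholders.

```python
from pathlib import Path


def write_ordered_fasta(proteins, path):
    """Hypothetical helper: write proteins to a FASTA file and return their names in file order."""
    lines = []
    for name, sequence in proteins.items():  # dicts keep insertion order (Python 3.7+)
        lines.append(f">{name}")
        lines.append(sequence)
    Path(path).write_text("\n".join(lines) + "\n")
    return list(proteins)
```

Usage: `protein_list = write_ordered_fasta({'protein1': 'MKTAYIAK', 'protein2': 'GLSDGEWQ'}, 'iedb_input.fasta')`, then submit iedb_input.fasta to the IEDB tool and pass the same protein_list to iedb_format().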
The iedb_format() function formats the output of the IEDB MHC-II prediction tool. It accepts three arguments: 1) the name of the
IEDB .csv output file, as a string; 2) a list of the proteins in the SAME ORDER as they were submitted to the IEDB tool; 3) the
allele dictionary created by allele_types(). The function returns a dataframe and also creates a .txt file named
'HLAII_peptide_output.txt' in the current working directory, which should be used as the input for MixMHC2pred.
Example:
protein_list = ['protein1', 'protein2', ...]
IEDB_dataframe = EPP.iedb_format('IEDB_output', protein_list, allele_dictionary)
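The SAME ORDER requirement matters because the IEDB output refers to each submitted sequence by its position rather than by name. The sketch below illustrates that idea and the peptide export step. It assumes an IEDB column 'seq_num' holding the 1-based submission position, a 'peptide' column, and a MixMHC2pred input of one peptide per line; column names can differ between IEDB tool versions, and label_proteins_and_export() is hypothetical, not the internals of iedb_format().

```python
from pathlib import Path

import pandas as pd


def label_proteins_and_export(iedb_df, protein_list, out_path):
    """Hypothetical sketch: name each row's protein and write unique peptides, one per line."""
    df = iedb_df.copy()
    # Assumes 'seq_num' is the 1-based position of the sequence in the IEDB submission,
    # which is why protein_list must follow the submission order.
    df["protein"] = [protein_list[int(n) - 1] for n in df["seq_num"]]
    # Assumes MixMHC2pred reads a plain-text list of peptides, one per line.
    peptides = df["peptide"].drop_duplicates().tolist()
    Path(out_path).write_text("\n".join(peptides) + "\n")
    return df
```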
*GENERATE DATA WITH THE MIXMHC2PRED TOOL
The merge_data() function formats the output of MixMHC2pred and assembles the IEDB and MixMHC2pred dataframes into one master
dataframe and a separate dictionary of dataframes (DfD). It accepts three arguments: 1) the IEDB dataframe created by
iedb_format(), 2) the name of the MixMHC2pred output file, as a string, 3) the allele dictionary created by allele_types().
The function returns two variables: 1) the master dataframe, 2) the dataframe dictionary.
Example:
merged_data, DfD = EPP.merge_data(IEDB_dataframe, 'MixMHC2pred_output', allele_dictionary)
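Conceptually, this step joins two sets of predictions and then splits the result into smaller tables. A minimal sketch of that pattern with pandas, assuming both tables are in long format with 'peptide' and 'allele' columns and that the DfD is keyed by protein name; the real merge keys, score columns and DfD keys used by merge_data() are not documented here, and merge_and_split() is hypothetical.

```python
import pandas as pd


def merge_and_split(iedb_df, mix_df):
    """Hypothetical sketch: join the two predictions and split the result per protein."""
    # Inner join keeps only peptide/allele pairs scored by both tools.
    merged = iedb_df.merge(mix_df, on=["peptide", "allele"], how="inner")
    # One dataframe per protein, keyed by protein name.
    dfd = {protein: group.reset_index(drop=True)
           for protein, group in merged.groupby("protein")}
    return merged, dfd
```

An inner join is one design choice; `how="left"` would instead keep IEDB rows that have no MixMHC2pred score, leaving NaN in the MixMHC2pred columns.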
The pdif() function computes the protein differential immunogenicity factor (PDIF), comparing the mean adjusted binding values
between the control and risk HLA alleles. It accepts two arguments: 1) the DfD dictionary generated by merge_data(), 2) the
allele_dictionary generated by allele_types(). The function returns two variables: 1) a dataframe of the statistical tests
conducted for each protein and allele subtype, 2) a dictionary of post hoc matrices.
Example:
pdif_stats, pdif_posthocs = EPP.pdif(DfD, allele_dictionary)
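The kind of per-protein comparison PDIF describes can be sketched as follows. This document does not state which test pdif() uses; the sketch swaps in a two-sided Mann-Whitney U test from SciPy and assumes each protein's dataframe has an 'allele' column and a binding-value column (hypothetically named 'binding'), with allele_groups mapping each allele to 'control' or 'risk'. compare_control_vs_risk() is hypothetical, and splitting further by allele subtype would add one more grouping level.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def compare_control_vs_risk(dfd, allele_groups, value_col="binding"):
    """Hypothetical sketch: per protein, test control vs risk binding values."""
    rows = []
    for protein, df in dfd.items():
        group = df["allele"].map(allele_groups)
        control = df.loc[group == "control", value_col]
        risk = df.loc[group == "risk", value_col]
        if control.empty or risk.empty:
            continue  # a two-sample test needs values from both groups
        stat, p = mannwhitneyu(control, risk, alternative="two-sided")
        rows.append({"protein": protein, "control_mean": control.mean(),
                     "risk_mean": risk.mean(), "U": stat, "p": p})
    return pd.DataFrame(rows)
```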
The adif() function computes the allele differential immunogenicity factor (ADIF), comparing the mean adjusted binding values
between proteins for each HLA allele. It accepts one argument: the merged_data dataframe assembled by merge_data(). The function
returns two variables: 1) a dataframe of the statistical tests performed, 2) a dictionary of post hoc matrices.
Example:
adif_stats, adif_posthocs = EPP.adif(merged_data)
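A comparison across several proteins for each allele can be sketched with a Kruskal-Wallis test. This is a swapped-in choice: the document does not name the test adif() uses. The sketch assumes 'allele', 'protein' and a binding-value column (hypothetically named 'binding') in the merged dataframe, and compare_proteins_per_allele() is hypothetical. Post hoc matrices could then come from a pairwise test such as Dunn's (for example posthoc_dunn in scikit-posthocs), which is not shown and is not confirmed as EPP's method.

```python
import pandas as pd
from scipy.stats import kruskal


def compare_proteins_per_allele(merged, value_col="binding"):
    """Hypothetical sketch: per allele, test whether binding values differ across proteins."""
    rows = []
    for allele, df in merged.groupby("allele"):
        samples = [g[value_col].to_numpy() for _, g in df.groupby("protein")]
        if len(samples) < 2:
            continue  # Kruskal-Wallis needs at least two proteins
        stat, p = kruskal(*samples)
        rows.append({"allele": allele, "n_proteins": len(samples), "H": stat, "p": p})
    return pd.DataFrame(rows)
```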