Input data:
TFBSs intersected with genes in .bed
For each tissue:
Genes with Differential Expression
Desired Output:
TFs differentially expressed in certain tissues
Using pandas dataframe to load the .bed file
Using a dict to load the gene list of each tissue
Create a dataframe with the number of BSs of a TF in each tissue
Store mean number of BSs per tissue of each TF
# The "diffFactor" measures how far the number of BSs of a TF in a tissue is above that TF's mean across tissues.
Calculate diffFactor for each pair of tissue and TF, store it all in a pandas dataframe.
Keep only the top 5% of values.
Sort by diffFactor.
Save results.
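A minimal pandas sketch of the steps above. The column names ("tf", "gene"), the .bed column layout in load_bed, and the helper names are hypothetical, since the real layout of the intersected .bed is not recorded here. diffFactor is assumed to be the ratio count / mean; a difference (count - mean) would fit the description just as well.

```python
import pandas as pd

def load_bed(path, columns=("chrom", "start", "end", "tf", "gene")):
    # Assumed layout: tab-separated, no header, one row per TFBS-gene overlap.
    return pd.read_csv(path, sep="\t", header=None, names=list(columns))

def count_bs(bed, tissue_genes):
    # Rows: TF, columns: tissue, values: number of BSs of the TF in the
    # differentially expressed genes of that tissue.
    counts = {
        tissue: bed.loc[bed["gene"].isin(genes), "tf"].value_counts()
        for tissue, genes in tissue_genes.items()
    }
    return pd.DataFrame(counts).fillna(0).astype(int)

def diff_factor(counts):
    # Assumption: diffFactor = count / mean count of the same TF across tissues.
    mean_per_tf = counts.mean(axis=1)
    return counts.div(mean_per_tf, axis=0)

def top_pairs(factors, quantile=0.95):
    # One row per (TF, tissue) pair, keep the top 5%, sort by diffFactor.
    long = (
        factors.rename_axis("tf")
        .reset_index()
        .melt(id_vars="tf", var_name="tissue", value_name="diffFactor")
    )
    cutoff = long["diffFactor"].quantile(quantile)
    kept = long[long["diffFactor"] >= cutoff]
    return kept.sort_values("diffFactor", ascending=False).reset_index(drop=True)

if __name__ == "__main__":
    # Toy data standing in for the real .bed file and gene lists.
    bed = pd.DataFrame({
        "tf": ["A", "A", "A", "B", "B", "A"],
        "gene": ["g1", "g1", "g2", "g1", "g3", "g3"],
    })
    tissue_genes = {"TESTIS": {"g1", "g2"}, "LIVER": {"g3"}}
    result = top_pairs(diff_factor(count_bs(bed, tissue_genes)))
    print(result)
    # Saving would be something like:
    # result.to_csv("results.tsv", sep="\t", index=False)
```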
New measure of representation:
              WithBSToTF    ALL
TESTIS               127   1225
ALL-TISSUE           472  21000
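One way to read the table as a single number is a fold enrichment: the fraction of testis genes with a BS for the TF, divided by the same fraction over all genes. This is only a sketch of one possible measure; the function name is hypothetical.

```python
def fraction_with_bs(with_bs, total):
    # Fraction of genes in a set that have at least one BS for the TF.
    return with_bs / total

testis_fraction = fraction_with_bs(127, 1225)   # TESTIS row
all_fraction = fraction_with_bs(472, 21000)     # ALL-TISSUE row

# Values above 1 mean the TF is over-represented in the testis gene set.
fold_enrichment = testis_fraction / all_fraction

print(f"testis: {testis_fraction:.3f}")
print(f"all:    {all_fraction:.3f}")
print(f"fold:   {fold_enrichment:.2f}")
```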
Random simulation:
Select random sets of 1225 genes
Count number of genes with BS to TF
Build a distribution of these counts
Position the observed count (127) in the distribution
Use a Z score?
Use a chi-squared (χ²) test?
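A rough sketch of the simulation and of both candidate statistics, using only the counts from the table. It assumes the 1225 testis genes are a subset of the 21000 and that the 127 are among the 472. All function names are hypothetical. The χ² here is the plain Pearson statistic for a 2x2 table (1 degree of freedom, no continuity correction).

```python
import math
import random
import statistics

N_GENES = 21000      # ALL-TISSUE, ALL
N_WITH_BS = 472      # ALL-TISSUE, WithBSToTF
SAMPLE_SIZE = 1225   # TESTIS, ALL
OBSERVED = 127       # TESTIS, WithBSToTF

def simulate_counts(n_rounds, seed=0):
    # Each round: draw SAMPLE_SIZE genes without replacement and count
    # how many of them have a BS for the TF (1 = has BS, 0 = no BS).
    rng = random.Random(seed)
    population = [1] * N_WITH_BS + [0] * (N_GENES - N_WITH_BS)
    return [sum(rng.sample(population, SAMPLE_SIZE)) for _ in range(n_rounds)]

def z_score(observed, counts):
    # Distance of the observed count from the simulated mean, in standard deviations.
    return (observed - statistics.mean(counts)) / statistics.stdev(counts)

def empirical_p(observed, counts):
    # Share of random sets with a count at least as high as observed
    # (+1 in numerator and denominator so the estimate is never exactly 0).
    hits = sum(1 for c in counts if c >= observed)
    return (hits + 1) / (len(counts) + 1)

def chi_squared_2x2(a, b, c, d):
    # Pearson chi-squared for the table [[a, b], [c, d]].
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

if __name__ == "__main__":
    counts = simulate_counts(1000)
    print("simulated mean:", statistics.mean(counts))
    print("Z score:", z_score(OBSERVED, counts))
    print("empirical p:", empirical_p(OBSERVED, counts))

    # Rows: testis genes / remaining genes; columns: with BS / without BS.
    with_rest = N_WITH_BS - OBSERVED
    without_testis = SAMPLE_SIZE - OBSERVED
    without_rest = N_GENES - SAMPLE_SIZE - with_rest
    print("chi-squared:", chi_squared_2x2(OBSERVED, without_testis, with_rest, without_rest))
```

The Z score needs the simulated counts to be roughly normal; the empirical p-value does not, but it cannot go below 1 / (n_rounds + 1).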