- Implement the `regression_hops_sampler()` function. It returns a loader object whose batch size is always $1$. The returned loader serves regression training that does not consider the geometric structure, but each training set in the loader is still derived from `num_hops`, so the comparison with the later geometric pathway training is reasonably fair. (DONE)
- Normalized the data and fixed the possible NaN issue.
- Implemented `pathway_LassoCV.py` to compute the regression. All plots show GROUND TRUTH vs. PREDICTION and are stored in the `./results/Lasso` folder.
Example:
See details in `./regression_hop_sampler.ipynb` and `./pathway_LassoCV.ipynb`. Some result figures: `['ABL1'].pdf`, `['APAF1'].pdf`.
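The normalization and NaN fix noted above can be illustrated with a minimal sketch. It assumes the NaN values came from dividing by a zero standard deviation (a constant feature column), which is one common cause and not necessarily the one hit in this project. The function name `zscore_columns` and the `eps` threshold are hypothetical and not part of the repository.

```python
import math


def zscore_columns(rows, eps=1e-8):
    """Standardize each column of a list-of-lists matrix.

    Hypothetical helper: constant columns are mapped to 0.0 so that a
    zero standard deviation never produces a 0/0 division.
    """
    n_cols = len(rows[0])
    out = [[0.0] * n_cols for _ in rows]
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = math.sqrt(var)
        for i, v in enumerate(col):
            # Guard: with array libraries, 0/0 here would yield NaN
            out[i][j] = (v - mean) / std if std > eps else 0.0
    return out


# Toy data: the second column is constant (assumed failure case)
data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
normalized = zscore_columns(data)
```

With the guard in place, the constant column becomes all zeros and the first column is standardized as usual.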
- Implement the `cancer_data()` function, which handles the Breast Cancer data in `./BreastCancer/Data_RNASeq2.mat`. (DONE)
- Implement the `hops_sampler()` function, which samples the overall genomic/pathway information and returns a `dataloader` with the required `batch_size`. (DONE)
Example:
A detailed example is available in `./test_notebook/` and `./sampler_test.ipynb`.
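As a rough illustration of hop-based sampling, the sketch below collects the nodes within `num_hops` edges of each gene using breadth-first search and then groups the resulting neighborhoods into batches. This is an assumption about what hop sampling means here, not the actual `hops_sampler()` implementation; the helper names `k_hop_nodes` and `batches`, and the toy graph, are hypothetical.

```python
from collections import deque


def k_hop_nodes(adjacency, start, num_hops):
    """Return the set of nodes within num_hops edges of start (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == num_hops:
            continue  # do not expand past the hop limit
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen


def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    items = list(items)
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


# Toy undirected pathway graph; node names are placeholders
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
subgraphs = [sorted(k_hop_nodes(adjacency, g, num_hops=1))
             for g in sorted(adjacency)]
loader = list(batches(subgraphs, batch_size=2))
```

Setting `batch_size=1` in this sketch gives one neighborhood per batch, which mirrors the always-1 batch size described for the regression sampler.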
- The processing method for the plain-text raw data `./GenomicData/Gene_DATA/sourcePathway.txt`. (DONE)
- The utility class `data_fetch()` is DONE. `data_fetch` can exclude node types that are not needed.
Example:
`$(LIST)` is an optional input of list type, e.g., `['protein', 'abstract']`.
from utils.data_fetch import data_fetch
dffter = data_fetch(argv_list=['protein', 'abstract'])  # substitute your own $(LIST)
- Gained a clearer understanding of the dataset
- Obtained the node information and pattern information
- Learned regular expressions (regex)
- Converted the raw data from `.mat` to `.txt` for future use
- The link and node information is in `./GenomicData/Gene_DATA/sourcePathway.txt`
- The additional genomic activity features are stored in `./GenomicData/Data_RNASeq2.mat` (NEED TO PROCESS LATER)
The following regex patterns make extracting information from the source file quick and efficient.
// General node pattern
^([a-z]+)\s([\w\/\-()+]+)$
// Link pattern extraction
^([\w\/\-()+]+)\s([\w\/\-()+]+)\s([\w\>\|-]+)$
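The two patterns above can be applied line by line, for example with Python's `re` module. The sketch below is only an illustration: the sample lines and the edge label `-a>` are invented for the demo and may not match the real contents of `sourcePathway.txt`, and the function name `parse_lines` is hypothetical.

```python
import re

# Patterns copied from the text above
NODE_RE = re.compile(r"^([a-z]+)\s([\w\/\-()+]+)$")
LINK_RE = re.compile(r"^([\w\/\-()+]+)\s([\w\/\-()+]+)\s([\w\>\|-]+)$")


def parse_lines(lines):
    """Split lines into node records (type, name) and link records
    (source, target, edge type). Lines matching neither are skipped."""
    nodes, links = [], []
    for raw in lines:
        line = raw.strip()
        m = LINK_RE.match(line)
        if m:
            links.append(m.groups())
            continue
        m = NODE_RE.match(line)
        if m:
            nodes.append(m.groups())
    return nodes, links


# Invented sample lines, not taken from the real source file
sample = ["protein TP53", "protein MDM2", "TP53 MDM2 -a>", "# not a record"]
nodes, links = parse_lines(sample)
```

Node lines have two whitespace-separated fields and link lines have three, so a line can match at most one of the two patterns.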