PhenoMapper is an extension of Mapper Interactive, which is a web-based framework for interactive analysis and visualization of high-dimensional point cloud data built upon the mapper algorithm.
git clone git@github.com:tdavislab/PhenoMapper.git
cd PhenoMapper
python3 run.py
After running the above commands, open http://127.0.0.1:8000/ in a browser on the local machine to use Mapper Interactive (Chrome is recommended).
This software requires Kepler Mapper, scikit-learn, NetworkX, and Flask to run.
If you do not have these packages installed, please use the following commands to install them.
pip install scikit-learn
pip install networkx
pip install flask
pip install flask_assets
To perform linear regression, please also make sure you have statsmodels installed.
pip install statsmodels
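As a quick sanity check before launching the server, a short script along the following lines can report which packages are not importable. This is only a sketch: the helper name `missing_packages` is made up for illustration, and the list uses the usual import names (`kmapper` for Kepler Mapper, `sklearn` for scikit-learn), which differ from the names used with pip.

```python
from importlib.util import find_spec

# Import names (not pip names) of the packages mentioned above.
# "kmapper" is the import name of Kepler Mapper, "sklearn" of scikit-learn.
REQUIRED = ["kmapper", "sklearn", "networkx", "flask", "flask_assets"]
OPTIONAL = ["statsmodels"]  # only needed for linear regression


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list means every package in that group was found in the current Python environment.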
Please refer to the user guide for details on the command-line API.
To compute a mapper graph, point cloud data can be input in CSV format.
The following is an example of an input CSV file.
x,y,z
0.013,0.275,0.465
0.216,0.981,-0.424
0.804,-0.424,0.217
...
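To inspect such a file before loading it into the tool, it can be read with Python's standard `csv` module. The sketch below assumes a header row followed by purely numeric columns, as in the example above; the function name `load_point_cloud` and the file name in the comment are hypothetical and not part of PhenoMapper.

```python
import csv
import io


def load_point_cloud(source):
    """Read a CSV with a header row and numeric columns.

    Returns (column_names, rows), where rows is a list of lists of floats.
    """
    reader = csv.reader(source)
    header = next(reader)
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


# Inline data mirroring the example above. For a real file, pass
# open("points.csv", newline="") instead (the file name is hypothetical).
sample = io.StringIO(
    "x,y,z\n"
    "0.013,0.275,0.465\n"
    "0.216,0.981,-0.424\n"
    "0.804,-0.424,0.217\n"
)
columns, points = load_point_cloud(sample)
print(columns, len(points))
```

A `ValueError` from `float` here would indicate a non-numeric cell, which is worth resolving before computing a mapper graph.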
Using the command-line API, mapper graphs can be computed offline from a point cloud dataset.
The resulting mapper graph is saved in JSON format and can be loaded in the GUI.
The following is an example of a mapper graph file.
{
  "nodes": {
    "cube0_cluster0": [19, 21, 66],
    ...
  },
  "edges": {
    "cube0_cluster0": ["cube1_cluster0"],
    ...
  }
}
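A file with this layout can also be read outside the GUI. The following sketch parses the two top-level keys and collects the edges as undirected pairs. It assumes that each node maps to the row indices of its member points and that each entry under "edges" lists the neighbours of a node; the second node and its members are invented here for illustration.

```python
import json


def load_mapper_graph(text):
    """Parse a mapper graph JSON string into node members and undirected edges."""
    graph = json.loads(text)
    nodes = graph["nodes"]
    edges = set()
    for source, targets in graph["edges"].items():
        for target in targets:
            # Sort each pair so that (a, b) and (b, a) count as one edge.
            edges.add(tuple(sorted((source, target))))
    return nodes, edges


# The members of "cube1_cluster0" are made up for this example.
example = """
{
  "nodes": {"cube0_cluster0": [19, 21, 66], "cube1_cluster0": [21, 40]},
  "edges": {"cube0_cluster0": ["cube1_cluster0"]}
}
"""
nodes, edges = load_mapper_graph(example)
sizes = {name: len(members) for name, members in nodes.items()}
shared = set(nodes["cube0_cluster0"]) & set(nodes["cube1_cluster0"])
print(sizes, edges, shared)
```

In a mapper graph, two nodes are connected when their clusters share at least one point, which is why `shared` is non-empty for the connected pair in this example.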
In the tool, we provide machine learning modules for further exploring the input data. Currently, the tool supports linear regression, logistic regression, PCA, t-SNE, and feature selection (based on linear SVC) on either the entire population or a selected subgroup of nodes. Moreover, the tool is designed so that users can easily extend the framework with machine learning modules from the Python libraries scikit-learn and statsmodels.
Users should take it upon themselves to properly carry out the most appropriate statistical analysis of their datasets.
We list the model assumptions for each machine learning module. Before performing any exploratory analysis, please make sure the data meets the corresponding assumptions; otherwise, the fitted model might be less effective.
- Linear regression: We do not make any additional assumptions for linear regression. The input data should satisfy the following four basic assumptions.
  - Linearity: The relationship between the dependent variable y and the independent variables X is approximately linear.
  - Homoscedasticity: The variance of the residuals is the same for any value of X.
  - Independence: Observations should be independent of each other.
  - Normality: For any fixed value of X, y is normally distributed.
- Logistic regression:
  - The dependent variable y should be binary.
  - Independence: Observations should be independent of each other.
- PCA:
  - There is no unique variance, which means the total variance is equal to the common variance.
- t-SNE:
  - According to the scikit-learn documentation, it is highly recommended to use another dimensionality reduction method (e.g. PCA for dense data or TruncatedSVD for sparse data) to reduce the number of dimensions to a reasonable amount (e.g. 50) if the number of features is very high. (https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)
- Linear SVC:
  - The dependent variable y should be categorical.
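Some of these assumptions can be screened mechanically before a module is run. The sketch below covers only two of them: the binary target for logistic regression and the feature-count recommendation for t-SNE. The function name `precheck`, the module labels, and the default threshold of 50 (taken from the "e.g. 50" above) are illustrative choices, not part of the tool. Assumptions such as linearity or homoscedasticity need proper diagnostics and are not addressed here.

```python
def precheck(module, n_features, y=None, max_tsne_features=50):
    """Return human-readable warnings for a few of the assumptions listed above.

    `module` is a hypothetical label: "logistic_regression" or "tsne".
    """
    warnings = []
    if module == "logistic_regression":
        # The dependent variable should take exactly two distinct values.
        if y is None or len(set(y)) != 2:
            warnings.append(
                "Logistic regression expects a binary dependent variable y."
            )
    elif module == "tsne":
        # Mirrors the scikit-learn recommendation quoted above.
        if n_features > max_tsne_features:
            warnings.append(
                f"{n_features} features: consider reducing to about "
                f"{max_tsne_features} dimensions (e.g. with PCA) before t-SNE."
            )
    return warnings


print(precheck("logistic_regression", n_features=3, y=[0, 1, 2]))
print(precheck("tsne", n_features=200))
```

An empty list means none of the screened conditions was violated; it does not mean that all model assumptions hold.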
Under Novice user mode, new modules can be added by describing the module information in the file app/static/uploads/new_modules.json. Currently, the tool allows the addition of supervised and unsupervised learning algorithms that are available via scikit-learn.
Here is an example of adding a t-SNE module.
{
  "modules":
  [
    {
      "name": "TSNE",
      "function-name": "sklearn.manifold.TSNE",
      "function-parameters": {
        "n_components": 2
      },
      "module-type": "unsupervised_learning",
      "components": ["scatter plot"]
    }
  ]
}
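Before placing an entry like this in the configuration file, it can help to confirm that the dotted "function-name" actually resolves to an importable object. The sketch below is one way to do that with `importlib`; it is not necessarily how PhenoMapper loads modules internally. The helper names are hypothetical, and treating every key from the example above as required is an assumption.

```python
import importlib
import json

# Assumption: every key shown in the example entry is required.
REQUIRED_KEYS = {
    "name", "function-name", "function-parameters", "module-type", "components",
}


def resolve(dotted_name):
    """Import 'package.module.Attr' and return the attribute."""
    module_path, _, attribute = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_path), attribute)


def check_modules(config_text):
    """Return a list of problems found in a new_modules.json-style string."""
    problems = []
    for entry in json.loads(config_text)["modules"]:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"{entry.get('name', '?')}: missing keys {sorted(missing)}")
            continue
        try:
            resolve(entry["function-name"])
        except (ImportError, AttributeError, ValueError) as error:
            problems.append(
                f"{entry['name']}: cannot resolve {entry['function-name']} ({error})"
            )
    return problems


config = json.dumps({"modules": [{
    "name": "TSNE",
    "function-name": "sklearn.manifold.TSNE",
    "function-parameters": {"n_components": 2},
    "module-type": "unsupervised_learning",
    "components": ["scatter plot"],
}]})
print(check_modules(config))  # an empty list if scikit-learn is installed
```

The check only verifies that the name can be imported. It does not verify that the parameters under "function-parameters" are accepted by the resolved class.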
For expert users with programming experience, we provide a template function call_module_function in app/views.py. A new module can be added by directly modifying the function. It supports customizable and multistep analysis pipelines.
We also provide a template class app/static/js/new_module.js in JavaScript for creating new visual components using D3.js.
The styles of visual components are changed via the CSS file app/static/css/styles.css.
Pull requests are welcome.
Pheno-Mapper: An Interactive Toolbox for the Visual Exploration of Phenomics Data.
Youjia Zhou, Methun Kamruzzaman, Patrick Schnable, Bala Krishnamoorthy, Ananth Kalyanaraman, Bei Wang.
ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), 2021.