Completed updates to docs and to history.
jlparkI committed Aug 22, 2023
1 parent ab8f068 commit 79c3b27
Showing 7 changed files with 796 additions and 37 deletions.
9 changes: 7 additions & 2 deletions HISTORY.md
@@ -125,8 +125,8 @@ requirements file with numpy not yet installed.
 
 Updated dataset builder so that different batches with different
 xdim[1] are now accepted when building a dataset. This obviates
-the need to zero-pad data (although note that zero-padding is
-generally advisable for consistency).
+the need to zero-pad data (although note that zero-padding can
+still be used).
 
 ### Version 0.1.2.3
 
@@ -138,3 +138,8 @@ to cpu.
 Fixed a bug involving variable-length sequence inputs to
 FHT-conv1d kernels. Sped up the nan checking for dataset building.
 
+### Version 0.1.3.0
+
+Added sequence / graph averaging to all convolution kernels as
+an option. Added kernel PCA and clustering tools that do not require
+a fitted model as input.
5 changes: 4 additions & 1 deletion docs/auxiliary_tutorial.rst
@@ -2,13 +2,16 @@ Clustering and visualizing data
 ================================
 
 You can also use random features generated by xGPR for kernel PCA and for
-kernel k-means clustering. xGPR has a built-in tool for kernel PCA.
+clustering. xGPR has a built-in tool for kernel PCA.
 For kernel k-means, we have a tool that will generate the random features
 for your data for you, and then you just run k-means on the resulting features
 using your package of choice. This is (approximately) equivalent to running
 kernel k-means on the original data, but much faster, because there is no
 need to construct an N x N kernel matrix! Also, it enables you to use
 the graph and sequence kernels in xGPR for clustering or visualization.
+(It is also possible to use another algorithm aside from k-means to cluster
+the random features representations, although due to its scalability k-means
+is a popular choice.)
 
 In a future version of xGPR, we may add our own implementation of k-means.
 For now, though, we don't provide one, which means you'll have to use
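To make the recipe described in the hunk above concrete: generate the random-feature representation of your data with xGPR, then hand that matrix to an ordinary k-means implementation. The sketch below is illustrative only; `generate_random_features` is a hypothetical placeholder for whichever xGPR feature-generation tool you use, and only the NumPy and scikit-learn calls are taken as given.

```python
# A minimal sketch, not xGPR's own API: `generate_random_features` is a
# placeholder for the tool that maps raw data to an (N x num_rffs) matrix.
import numpy as np
from sklearn.cluster import KMeans


def cluster_with_random_features(generate_random_features, raw_data, n_clusters=5):
    """Approximate kernel k-means: map the data to random features, then
    run ordinary k-means on the mapped features."""
    features = np.asarray(generate_random_features(raw_data))
    # k-means on the N x num_rffs matrix scales linearly in N, so no
    # N x N kernel matrix is ever formed.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return labels
```

As the tutorial notes, any other clustering algorithm (e.g. DBSCAN or agglomerative clustering) can be applied to the same feature matrix in place of k-means.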
5 changes: 4 additions & 1 deletion docs/notebooks/examples.rst
Expand Up @@ -3,7 +3,8 @@ Examples / Tutorials

We've included 3 examples / tutorials to illustrate how you can use xGPR
for fixed-vector data, protein sequence data and small molecule data
as below:
as below. We've also included a fourth example to illustrate kernel PCA
and clustering with xGPR.

.. toctree::
:maxdepth: 1
@@ -13,3 +14,5 @@ as below:
    molecule_example
 
    sequence_example
+
+   kPCA_and_clustering.ipynb
455 changes: 423 additions & 32 deletions docs/notebooks/kPCA_and_clustering.ipynb

Large diffs are not rendered by default.

356 changes: 356 additions & 0 deletions test_data/msa_seqs.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion xGPR/__init__.py
@@ -1,6 +1,6 @@
 #Version number. Updated if generating a new release.
 #Otherwise, do not change.
-__version__ = "0.1.2.5"
+__version__ = "0.1.3.0"
 
 #Key imports.
 from .xGP_Regression import xGPRegression
1 change: 1 addition & 0 deletions xGPR/kernel_xpca.py
@@ -70,6 +70,7 @@ def __init__(self, num_rffs, hyperparams, dataset, n_components = 2,
         self.n_components = n_components
 
         #Initialize the kPCA model.
+        dataset.device = self.device
         self.z_mean = self.get_mapped_data_statistics(dataset)
         self.eigvecs = self.initialize_kpca(dataset)
         dataset.device = "cpu"
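The single added line moves the dataset onto the model's device before the feature-map statistics and eigenvectors are computed, and the final context line returns it to "cpu". A generic, hypothetical helper capturing that temporary-device pattern could look like the sketch below; `on_device` is illustrative and not part of xGPR, and it relies only on the `dataset.device` attribute already visible in the hunk.

```python
from contextlib import contextmanager


@contextmanager
def on_device(dataset, device):
    # Point the dataset at the target device for the duration of the block,
    # then always restore it to "cpu", even if an exception is raised.
    dataset.device = device
    try:
        yield dataset
    finally:
        dataset.device = "cpu"
```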
