Completed updates to docs and to history.
jlparkI committed Aug 22, 2023
1 parent ab8f068 commit 79c3b27
Showing 7 changed files with 796 additions and 37 deletions.
9 changes: 7 additions & 2 deletions HISTORY.md
@@ -125,8 +125,8 @@ requirements file with numpy not yet installed.
 
 Updated dataset builder so that different batches with different
 xdim[1] are now accepted when building a dataset. This obviates
-the need to zero-pad data (although note that zero-padding is
-generally advisable for consistency).
+the need to zero-pad data (although note that zero-padding can
+still be used).
 
 ### Version 0.1.2.3
 
@@ -138,3 +138,8 @@ to cpu.
 Fixed a bug involving variable-length sequence inputs to
 FHT-conv1d kernels. Sped up the nan checking for dataset building.
 
+### Version 0.1.3.0
+
+Added sequence / graph averaging to all convolution kernels as
+an option. Added kernel PCA and clustering tools that do not require
+a fitted model as input.
5 changes: 4 additions & 1 deletion docs/auxiliary_tutorial.rst
@@ -2,13 +2,16 @@ Clustering and visualizing data
 ================================
 
 You can also use random features generated by xGPR for kernel PCA and for
-kernel k-means clustering. xGPR has a built-in tool for kernel PCA.
+clustering. xGPR has a built-in tool for kernel PCA.
 For kernel k-means, we have a tool that will generate the random features
 for your data for you, and then you just run k-means on the resulting features
 using your package of choice. This is (approximately) equivalent to running
 kernel k-means on the original data, but much faster, because there is no
 need to construct an N x N kernel matrix! Also, it enables you to use
 the graph and sequence kernels in xGPR for clustering or visualization.
+(It is also possible to use another algorithm aside from k-means to cluster
+the random features representations, although due to its scalability k-means
+is a popular choice.)
 
 In a future version of xGPR, we may add our own implementation of k-means.
 For now, though, we don't provide one, which means you'll have to use
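To make the recipe described in the hunk above concrete: generate the random-feature representation of your data with xGPR, then hand that matrix to an ordinary k-means implementation. The sketch below is illustrative only; `generate_random_features` is a hypothetical placeholder for whichever xGPR feature-generation tool you use, and only the NumPy and scikit-learn calls are taken as given.

```python
# A minimal sketch, not xGPR's own API: `generate_random_features` is a
# placeholder for the tool that maps raw data to an (N x num_rffs) matrix.
import numpy as np
from sklearn.cluster import KMeans


def cluster_with_random_features(generate_random_features, raw_data, n_clusters=5):
    """Approximate kernel k-means: map the data to random features, then
    run ordinary k-means on the mapped features."""
    features = np.asarray(generate_random_features(raw_data))
    # k-means on the N x num_rffs matrix scales linearly in N, so no
    # N x N kernel matrix is ever formed.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return labels
```

As the tutorial notes, any other clustering algorithm (e.g. DBSCAN or agglomerative clustering) can be applied to the same feature matrix in place of k-means.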
5 changes: 4 additions & 1 deletion docs/notebooks/examples.rst
Expand Up @@ -3,7 +3,8 @@ Examples / Tutorials

We've included 3 examples / tutorials to illustrate how you can use xGPR
for fixed-vector data, protein sequence data and small molecule data
as below:
as below. We've also included a fourth example to illustrate kernel PCA
and clustering with xGPR.

.. toctree::
:maxdepth: 1
@@ -13,3 +14,5 @@ as below:
    molecule_example
 
    sequence_example
+
+   kPCA_and_clustering.ipynb
455 changes: 423 additions & 32 deletions docs/notebooks/kPCA_and_clustering.ipynb

Large diffs are not rendered by default.

356 changes: 356 additions & 0 deletions test_data/msa_seqs.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion xGPR/__init__.py
@@ -1,6 +1,6 @@
 #Version number. Updated if generating a new release.
 #Otherwise, do not change.
-__version__ = "0.1.2.5"
+__version__ = "0.1.3.0"
 
 #Key imports.
 from .xGP_Regression import xGPRegression
1 change: 1 addition & 0 deletions xGPR/kernel_xpca.py
@@ -70,6 +70,7 @@ def __init__(self, num_rffs, hyperparams, dataset, n_components = 2,
         self.n_components = n_components
 
         #Initialize the kPCA model.
+        dataset.device = self.device
         self.z_mean = self.get_mapped_data_statistics(dataset)
         self.eigvecs = self.initialize_kpca(dataset)
         dataset.device = "cpu"
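The single added line moves the dataset onto the model's device before the feature-map statistics and eigenvectors are computed, and the final context line returns it to "cpu". A generic, hypothetical helper capturing that temporary-device pattern could look like the sketch below; `on_device` is illustrative and not part of xGPR, and it relies only on the `dataset.device` attribute already visible in the hunk.

```python
from contextlib import contextmanager


@contextmanager
def on_device(dataset, device):
    # Point the dataset at the target device for the duration of the block,
    # then always restore it to "cpu", even if an exception is raised.
    dataset.device = device
    try:
        yield dataset
    finally:
        dataset.device = "cpu"
```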
