Dynamic Programming c++ implementation #9

npkamath · 2024-01-02T18:27:03Z

Description

Added a C++ implementation of the dynamic programming class. The header, .cpp file, dependencies, and requirements are all in the Dynp folder.

Motivation

Optimizes the dynp class to use significantly less time and memory by using the C++ Eigen and Intel TBB libraries.

Do Changes Introduce Breaking Changes

Yes.

Have you (if appropriate)

Updated changelog
Updated Documentation
Add tests
Added name to contributors

npkamath · 2024-01-02T18:28:14Z

I did not delete the pull_request file from .github; I misunderstood, edited it, and added it to my own root directory. The original file is unchanged.

b-butler

Add a dupin/detect/dynp.py file that creates a wrapper around the DynamicProgramming class.

Sorry, I don't have a more complete review, but I will continue it later this week. I wanted to go ahead and give you tasks that you could start on.

Also, let me know if you have questions on the new structure of the package.

src/dupininterface.cpp

src/dupin.cpp

b-butler · 2024-01-09T00:20:42Z

src/dupin.cpp

+using namespace std;
+using namespace Eigen;


Don't import the namespaces; scope names explicitly instead. It makes the code more readable.

Suggested change

-using namespace std;
-using namespace Eigen;

If you use std:: and Eigen:: then you don't need these lines.

src/dupin.cpp

src/dupin.h

b-butler · 2024-01-09T00:44:25Z

src/dupin.h

+      memo;
+
+  int num_bkps;          // Number of breakpoints to detect.
+  int num_parameters;    // Number of features in the dataset.


This should be known from the passed matrix, no?

Some of this was just for testing/setting back in ye old days of input.txt. I can just make it use data.size now.

Still to be done.

If we keep this, we should be more descriptive and use num_features and num_breakpoints.

Also, if possible, expand on the variable names throughout. It will help future developers work with the code.

b-butler · 2024-01-09T00:46:04Z

src/dupin.h

+    Eigen::VectorXd x; // z Independent variable (time steps).
+  };
+
+public:


Most of these should be private or protected.

Many of these methods either don't need to exist (the n_timesteps and n_parameters stuff) or should be private/protected. The constructor, setters, getters, and anything exported to Python should remain here.

src/dupin.h

npkamath · 2024-01-12T15:15:29Z

I still have the new function to add.

b-butler

Some more comments for now. Thanks for the quick work.

b-butler · 2024-01-15T20:35:03Z

src/dupin.h

+    void initialize(int n) {
+        length = n;
+        matrix.resize(n * (n + 1) / 2, 0.0);
+        row_indices.resize(n);
+        for (int row = 0; row < n; ++row) {
+            row_indices[row] = row * (2 * length - row + 1) / 2;
+        }
+    }


Do we need this function?

I think so, because without a global cost_matrix variable I would have to pass the entire cost_matrix by reference through the recursion.

I mean, can't we just initialize the matrix directly and discard the initialization function? Why do we create it and then size it?
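One way to read the question above is to size the packed storage once, at construction, so that no separate initialize step exists. A minimal sketch of that idea follows, written in Python rather than C++ to keep it short. The class name mirrors the header and the row-offset formula is the one from the hunk, but the _index helper, the get/set methods, and the offset + (col - row) element layout are assumptions, not the PR's code.

```python
class UpperTriangularMatrix:
    """Packed row-major storage for the upper triangle of an n x n matrix."""

    def __init__(self, n: int) -> None:
        # Everything is sized here; there is no separate initialize() call.
        self.length = n
        self.values = [0.0] * (n * (n + 1) // 2)
        # Offset of element (row, row) in the packed array. Row r is preceded
        # by rows of length n, n - 1, ..., n - r + 1, which sums to
        # r * (2n - r + 1) / 2 (same formula as in the hunk above).
        self.row_offsets = [row * (2 * n - row + 1) // 2 for row in range(n)]

    def _index(self, row: int, col: int) -> int:
        # Assumed layout: elements of a row are stored left to right,
        # starting at the diagonal.
        if not 0 <= row <= col < self.length:
            raise IndexError("expected 0 <= row <= col < length")
        return self.row_offsets[row] + (col - row)

    def get(self, row: int, col: int) -> float:
        return self.values[self._index(row, col)]

    def set(self, row: int, col: int, value: float) -> None:
        self.values[self._index(row, col)] = value
```

In C++ the same effect would come from doing this work in the constructor (for example in the member initializer list), so an object can never be observed in an unsized state.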

b-butler · 2024-01-15T21:27:56Z

src/dupin.h

+      memo;
+
+  int num_bkps;          // Number of breakpoints to detect.
+  int num_parameters;    // Number of features in the dataset.


Still to be done.

src/dupininterface.cpp

b-butler · 2024-01-15T21:44:26Z

src/dupin.h

+    Eigen::VectorXd x; // z Independent variable (time steps).
+  };
+
+public:


Many of these methods either don't need to exist (the n_timesteps and n_parameters stuff) or should be private/protected. The constructor, setters, getters, and anything exported to Python should remain here.

src/dupin.h

b-butler · 2024-01-15T21:45:23Z

src/dupin.cpp

+void DynamicProgramming::setCostMatrix(
+    const DynamicProgramming::UpperTriangularMatrix &value) {
+  cost_matrix = value;
+}


Should anyone be setting the full cost matrix?

Nope! Honestly, the setter functions aren't needed anymore. I can probably delete them and just make the getters actual functions rather than properties.

Resolved but not deleted.

src/dupin.cpp

src/dupin.h

b-butler

Some more comments as you address my other ones. (If you have resolved one of my points, mark it so on the comments page; it will be easier for me to know what you have handled.)

CMakeLists.txt

b-butler · 2024-01-18T22:54:20Z

dupin/detect/dynp.py

+        """
+        self.dynp.set_threads(num_threads)
+
+    def return_breakpoints(self) -> list:


This should be unnecessary.

Can you clarify which part?

I think you meant return_breakpoints; I removed it and just kept fit.

src/CMakeLists.txt

npkamath · 2024-01-19T07:15:01Z

The new commit should address most of the changes you requested. Let me know if the private/public structure works; otherwise I can rewrite some of the functions.

codecov · 2024-01-19T07:15:58Z

Codecov Report

Attention: Patch coverage is 0%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (5b132c8) to head (f51ff1c).
Report is 5 commits behind head on main.

❗ Current head f51ff1c differs from pull request most recent head 77e3e00. Consider uploading reports for the commit 77e3e00 to get more accurate results

Files	Patch %	Lines
dupin/detect/dynp.py	0.00%	9 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #9       +/-   ##
==========================================
- Coverage   76.08%   0.00%   -76.09%     
==========================================
  Files          18      19        +1     
  Lines        1104    1113        +9     
  Branches      234       0      -234     
==========================================
- Hits          840       0      -840     
- Misses        221    1113      +892     
+ Partials       43       0       -43

DomFijan · 2024-01-19T19:02:21Z

dupin/detect/dynp.py

+class DynP:
+    """Detects the change points in a time series.
+


If this class is supposed to be used by users, perhaps some additional documentation would be useful here, alongside a usage example.

Is that something that goes into the comments of this class or into the documentation/tutorial for dupin? I can add this in comments if needed.

Documentation here. A tutorial using DynP may be useful in the future.

Yes, so check, for example, the SweepDetector class in https://github.com/glotzerlab/dupin/blob/main/dupin/detect/detect.py
You might want to describe what this class is used for and which implementation it uses in 2-3 short sentences.

Additionally, if this class should be used differently from most of the other classes, you might want to add a very short example of how to use it. Not sure if that's the case. For example:

def compute(self, orientations):
    """Calculates the per-particle and global order parameter.

    Example::

        >>> orientations = np.array([[1, 0, 0, 0]] * 100)
        >>> director = np.array([1, 1, 0])
        >>> nematic = freud.order.Nematic(director)
        >>> nematic.compute(orientations)
        freud.order.Nematic(u=[...])

    Args:
        orientations (:math:`\left(N_{particles}, 4 \right)` :class:`numpy.ndarray`):
            Orientations to calculate the order parameter.
    """

DomFijan · 2024-01-19T19:07:07Z

dupin/detect/dynp.py

+        """Initialize the DynamicProgramming instance with given parameters."""
+        self.dynp = _DynP.DynamicProgramming(data, num_bkps, jump, min_size)
+
+    def set_num_threads(self, num_threads: int):


Long term, development-wise, if additional C++ methods are added we will most likely want num_threads to be controlled at the level of the whole module. Would preprocessor be a good place to have this? @b-butler, thoughts?

Adding this as a _DynP module function and exporting it to dupin/util.py is probably the best solution, in my opinion.

So I would just move this function to util.py, right? I would just have to import _DynP in util.py?

b-butler

The code is looking good. I tried to give a thorough review this time since I am going on parental leave tomorrow. No rush on getting this done, as I won't be looking at GitHub for a couple of weeks, likely.

Also, I switched the build system to https://scikit-build-core.readthedocs.io/en/latest/getting_started.html. If you have problems building the package look there.

We need tests. See example tests in tests/detect/test_detect.py.

b-butler · 2024-01-24T18:58:23Z

dupin/detect/dynp.py

+class DynP:
+    """Detects the change points in a time series.
+


Documentation here. A tutorial using DynP may be useful in the future.

dupin/detect/dynp.py

src/dupininterface.cpp

b-butler · 2024-01-24T19:03:23Z

src/dupin.h

+    void initialize(int n) {
+        length = n;
+        matrix.resize(n * (n + 1) / 2, 0.0);
+        row_indices.resize(n);
+        for (int row = 0; row < n; ++row) {
+            row_indices[row] = row * (2 * length - row + 1) / 2;
+        }
+    }


I mean, can't we just initialize the matrix directly and discard the initialization function? Why do we create it and then size it?

src/dupin.cpp

b-butler · 2024-01-24T19:48:57Z

src/dupin.h

+      memo;
+
+  int num_bkps;          // Number of breakpoints to detect.
+  int num_parameters;    // Number of features in the dataset.


If we keep this, we should be more descriptive and use num_features and num_breakpoints.

b-butler · 2024-01-24T19:49:31Z

src/dupin.h

+      memo;
+
+  int num_bkps;          // Number of breakpoints to detect.
+  int num_parameters;    // Number of features in the dataset.


Also, if possible, expand on the variable names throughout. It will help future developers work with the code.

dupin/detect/dynp.py

b-butler

@npkamath Thank you for all the hard work. This is probably the last thorough review I will give. I am happy to give it a quick glance if you'd like, though, and feel free to reach out to me.

@DomFijan is taking over the day-to-day management of the package and can help you with the finishing touches. @joaander is also a great resource for specific questions on coding practices. If either of them advises you differently than I did, feel free to disregard my prior comments.

I am approving so as not to block any future developments, but please don't merge until @DomFijan approves.

b-butler · 2024-03-05T15:22:42Z

dupin/detect/dynp.py

+    Methods
+    -------
+    __init__(self, data: np.ndarray, num_bkps: int, jump: int, min_size: int)
+        Initializes the DynamicProgramming instance with the time series data
+        and parameters.
+    set_num_threads(self, num_threads: int)
+        Sets the number of threads to be used for parallel computation.
+    fit(self, num_bkps: int) -> list
+        Calculates the cost matrix and identifies the optimal breakpoints in
+        the time series data.


These get documented by the method's docstring itself.

b-butler · 2024-03-05T15:23:54Z

dupin/detect/dynp.py

+    def __init__(
+        self, data: np.ndarray, num_bkps: int, jump: int, min_size: int
+    ):
+        """Initialize the DynamicProgramming instance with given parameters."""


If you document a class's __init__ method, you need to document its parameters here as well. The alternative is to document both parameters and attributes in the class's docstring.
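The second option mentioned above (everything in the class docstring, nothing on the constructor) could look roughly like the sketch below. It is a stand-in that only stores its arguments so it runs without the compiled extension; the real wrapper forwards them to _DynP.DynamicProgramming and takes a NumPy array. The description of num_bkps comes from the header comment, while the descriptions of jump and min_size are assumed meanings and should be checked against the C++ code.

```python
class DynP:
    """Detect change points in a time series using dynamic programming.

    Parameters
    ----------
    data : list of list of float
        Time series with one row per time step and one column per feature.
        (The real wrapper takes a numpy.ndarray.)
    num_bkps : int
        Number of breakpoints to detect.
    jump : int
        Spacing between candidate breakpoint positions (assumed meaning).
    min_size : int
        Minimum number of time steps in a segment (assumed meaning).
    """

    def __init__(self, data, num_bkps: int, jump: int, min_size: int):
        # Deliberately no docstring: the class docstring above already
        # documents every constructor parameter, so nothing is stated twice.
        # Stand-in body: store the arguments instead of calling the extension.
        self.data = data
        self.num_bkps = num_bkps
        self.jump = jump
        self.min_size = min_size
```

With this layout, help(DynP) shows the parameters once, and the Methods listing in the class docstring is not needed because each method carries its own docstring.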

b-butler · 2024-03-05T15:26:46Z

dupin/detect/dynp.py

+        Parameters
+        ----------
+        num_threads: int
+            The number of threads to use during computation. Default


This is not a parameter default. If you want to add a comment about the default parallelization it should go above this section. Also, the message should state that we use all available cores unless set otherwise.
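A docstring laid out as requested above might look like the following sketch: the note about default parallelization sits in the description, above the Parameters section, and the parameter entry itself claims no default. ThreadedDetector is a hypothetical stand-in class, and using os.cpu_count() to model "all available cores" is an assumption made only so the example runs; the real default comes from the C++ side.

```python
import os


class ThreadedDetector:
    """Hypothetical stand-in for a detector with a configurable thread count."""

    def __init__(self) -> None:
        # None means "not set": fall back to every available core.
        self._num_threads = None

    def set_num_threads(self, num_threads: int) -> None:
        """Set the number of threads used for parallel computation.

        By default, all available cores are used unless this method is
        called with a different value.

        Parameters
        ----------
        num_threads : int
            The number of threads to use during computation.
        """
        if num_threads < 1:
            raise ValueError("num_threads must be a positive integer")
        self._num_threads = num_threads

    @property
    def num_threads(self) -> int:
        """Effective thread count (assumption: os.cpu_count() models all cores)."""
        if self._num_threads is None:
            return os.cpu_count() or 1
        return self._num_threads
```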

b-butler · 2024-03-05T15:28:10Z

src/dupin.cpp

+using namespace std;
+using namespace Eigen;


Suggested change

-using namespace std;
-using namespace Eigen;

If you use std:: and Eigen:: then you don't need these lines.

b-butler · 2024-03-05T15:41:47Z

src/dupin.cpp

-  double slope = x_centered.dot(y_centered) / x_centered.squaredNorm();
-  double intercept = y_mean - slope * x_mean;
+    // everything till this line is functioning fine; I might be overcomplicating it
+    Eigen::MatrixXd regression_lines = (x_matrix.array().colwise() - x_mean).colwise() * slope.array() + intercept.transpose().array();


I think you just need to do something like x_matrix * slope.replicate(x.size(), 1) + intercept.replicate(x.size(), 1), though I am not too familiar with Eigen.
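For reference, the per-feature fit from the removed lines (slope from the centered dot product, intercept from the means) and the slope * x + intercept form suggested above can be checked without Eigen. This is a plain-Python sketch under the assumption that every feature column is fit against the same x; the function names fit_lines and regression_lines are hypothetical and do not appear in the PR.

```python
def fit_lines(x, data):
    """Least-squares line for every feature column of data.

    x    : list of float, one value per time step
    data : list of rows, one row per time step, one column per feature
    Returns (slopes, intercepts), each with one entry per feature.
    """
    n = len(x)
    x_mean = sum(x) / n
    x_centered = [xi - x_mean for xi in x]
    x_squared_norm = sum(xc * xc for xc in x_centered)

    slopes, intercepts = [], []
    for feature in range(len(data[0])):
        column = [row[feature] for row in data]
        y_mean = sum(column) / n
        # slope = x_centered . y_centered / |x_centered|^2
        dot = sum(xc * (yi - y_mean) for xc, yi in zip(x_centered, column))
        slope = dot / x_squared_norm
        slopes.append(slope)
        # intercept = y_mean - slope * x_mean
        intercepts.append(y_mean - slope * x_mean)
    return slopes, intercepts


def regression_lines(x, slopes, intercepts):
    """Evaluate slope * x + intercept for every time step and feature."""
    return [[s * xi + b for s, b in zip(slopes, intercepts)] for xi in x]
```

Note that this evaluates the line on the raw x values. If x is centered first, as in the expression in the hunk, the constant term would have to be y_mean rather than the intercept.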

b-butler · 2024-03-05T15:53:28Z

src/dupin.h

+  // Structure for storing linear regression parameters.
+  struct linear_fit_struct {


I think this would be a good idea, and hopefully not too much work as the code is pretty modular already.

b-butler · 2024-03-05T15:54:15Z

src/dupin.cpp

+  double x_mean = x.mean();
+  double y_mean = y.mean();


What were your thoughts @npkamath?

npkamath and others added 7 commits January 2, 2024 13:02

added Dynp class

bb10165

reverted extra spaces

d13dfc7

reverted spaces

5bbbaf2

removed read_input function

930430a

Merge remote-tracking branch 'refs/remotes/origin/main'

379897a

removed read_input function

e35cff1

Delete PULL_REQUEST_TEMPLATE.md

e8ccb7c

refactor: Remove Dynp directory

fccca35

Also format C++ files using clang-format.

b-butler requested changes Jan 9, 2024

fixed namespace, naming, and cleaned up comments

0c64950

npkamath requested a review from a team January 12, 2024 12:56

npkamath added 2 commits January 14, 2024 16:59

added dynp.py file, fixed constructors, added column_indices for fast…

fe6bac1

…er indexing

fixed index system

245a92f

b-butler requested changes Jan 15, 2024

reorganized class variables and added dynp.py functions and fit

0af1b08

b-butler requested changes Jan 18, 2024

fit function added with parameter input, removed whitespace with prec…

66e2104

…ommit, rearranged class to private

DomFijan reviewed Jan 19, 2024

b-butler added 7 commits January 24, 2024 10:29

Merge remote-tracking branch 'upstream/main'

a4cff6f

build: Switch to scikit-build-core.

9ccf1fe

ci: Add reusable workflow to install system packages

8da0fe0

Installs: - TBB - Eigen3 - Ninja

ci: Fix package installation by using composite action

943a6c0

Cannot use a reusable workflow as a step only as a job

ci: Correctly specify shell for custom action

98ee45c

ci: Fix action step name formatting

744b928

ci: Remove trailing ":"

b8dd0f2

b-butler added 4 commits January 24, 2024 13:10

ci: Update package manager caches before installing

c7e6920

ci: Fix apt-get package names

7771157

ci: Fix one last package name

f7287f8

ci: Yet another package name change

47aedc0

b-butler requested changes Jan 24, 2024

npkamath and others added 5 commits February 26, 2024 04:29

documentation and renaming added

4fd3a58

upper triangular restructured and dynp restructured

4dd27f8

upper triangular restructured and dynp restructured

f51ff1c

conditionals

c0afe6f

Co-authored-by: Brandon Butler <butlerbr@umich.edu>

naming and other cpp changes

77e3e00

b-butler approved these changes Mar 5, 2024
