Add new type PackedArray #275

Merged: 14 commits into NNPDF:master on Apr 25, 2024
Conversation

@janw20 (Collaborator) commented Apr 18, 2024

No description provided.

@cschwan (Contributor) commented Apr 18, 2024

PackedArray is possibly a bad name, and that would be my fault. If anyone can think of a better name, let us know!

The idea is to have a similar interface to the array class from ndarray, but without storing zeros, to keep down the file size, so PackedArray<T, D> will be the successor of SparseArray3<T>. The new type will compress a bit better and, more importantly, it supports arbitrary dimensions.

@cschwan self-requested a review on Apr 18, 2024 at 12:06
@cschwan self-assigned this on Apr 18, 2024
@alecandido (Member) commented:

> The idea is to have a similar interface to the array class from ndarray, but without storing zeros, to keep down the file size, so PackedArray<T, D> will be the successor of SparseArray3<T>. The new type will compress a bit better and, more importantly, it supports arbitrary dimensions.

Yes, I just read the first part, and it was not obvious that it was the sparse array, sorry.

Skimming through it, there were no docs, so I made the wrong inference.
I deleted the comment right after, as soon as I saw the Index and IndexMut implementations.

@cschwan (Contributor) commented Apr 18, 2024

No worries! We're happy you're interested and that you're looking at the code.

@alecandido (Member) commented:

Btw: what about just using SparseArray? (w/o the 3)

@cschwan (Contributor) commented Apr 18, 2024

The adjective 'sparse' never made sense: if you had to decide whether these arrays are sparse or dense, they're clearly dense.

Another way of explaining it is that the data structures for representing sparse matrices would clearly be inefficient for our use cases and would in fact need much more memory. There's usually a connected subset of the array that is non-zero, and when you integrate long enough this subset should even be simply connected.

@alecandido (Member) commented:

From your comment, I'd say that it is an optimized strategy for sparse matrix representation (as there is no single best strategy). I.e., your matrix has a significant number of default elements, and you can compress the representation by not writing all of them down.

However, names are relevant but not crucial. So, if packed sounds more appropriate, just keep that :)

@cschwan (Contributor) commented Apr 18, 2024

> From your comment, I'd say that it is an optimized strategy for sparse matrix representation (as there is no single best strategy). I.e., your matrix has a significant number of default elements, and you can compress the representation by not writing all of them down.

That's basically it: we first map the D-dimensional indices to one-dimensional ones and then store the non-zero (non-default) elements in a linearized vector. For zeros we just save where they are and how many of them are consecutive. Finally, there's an exception where it makes sense to explicitly store zeros, but in practice it shouldn't make a big difference.
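
A self-contained toy sketch of that scheme may help (purely illustrative, not the PackedArray code from this PR; here the gaps of zeros are recorded implicitly via the start index of each run of non-zero values):

```rust
// Toy sketch of the packing idea described above: multi-dimensional indices are
// mapped to a flat index, and only runs of non-zero values are stored together
// with the flat index where each run starts.

/// A run of consecutive non-zero values, starting at a flat index.
struct Run {
    start: usize,
    values: Vec<f64>,
}

struct ToyPacked {
    shape: Vec<usize>,
    runs: Vec<Run>,
}

impl ToyPacked {
    /// Map a multi-dimensional index to a flat, row-major index.
    fn flat_index(&self, index: &[usize]) -> usize {
        index
            .iter()
            .zip(&self.shape)
            .fold(0, |acc, (&i, &dim)| acc * dim + i)
    }

    /// Pack a dense, row-major slice, skipping zeros.
    fn pack(dense: &[f64], shape: Vec<usize>) -> Self {
        let mut runs: Vec<Run> = Vec::new();
        for (flat, &value) in dense.iter().enumerate() {
            if value == 0.0 {
                continue; // zeros are not stored; they show up as gaps between runs
            }
            if let Some(run) = runs.last_mut() {
                if run.start + run.values.len() == flat {
                    run.values.push(value); // extend the current run
                    continue;
                }
            }
            runs.push(Run { start: flat, values: vec![value] }); // start a new run
        }
        Self { shape, runs }
    }

    /// Look up an element; entries that are not stored read back as 0.0.
    fn get(&self, index: &[usize]) -> f64 {
        let flat = self.flat_index(index);
        for run in &self.runs {
            if flat >= run.start && flat < run.start + run.values.len() {
                return run.values[flat - run.start];
            }
        }
        0.0
    }
}

fn main() {
    // a 2 x 4 example with two separate blocks of non-zero entries
    let dense = [0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 3.0, 0.0];
    let packed = ToyPacked::pack(&dense, vec![2, 4]);
    assert_eq!(packed.get(&[0, 1]), 1.0);
    assert_eq!(packed.get(&[1, 2]), 3.0);
    assert_eq!(packed.get(&[0, 3]), 0.0);
    println!("stored {} runs of non-zero values", packed.runs.len()); // prints 2
}
```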

@janw20 (Collaborator, Author) commented Apr 18, 2024

After looking into the sparse matrix Wikipedia article, I think we're doing a kind of band matrix (at least for D = 2), which (according to Wikipedia) is a sparse matrix. So I think I would be fine with calling it SparseArray, also because that is more understandable and consistent with SparseArray3.

@janw20 (Collaborator, Author) commented Apr 18, 2024

I wrote 4 benchmarks with the criterion crate where 500 random elements are inserted into a SparseArray3 or a PackedArray of shape [40, 50, 30] or [10, 5, 7], and it seems like the performance of both is very similar, with PackedArray being about 5-10% faster, depending on the RNG seed.
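
A criterion benchmark along these lines could be set up roughly as follows. This is only a sketch of the setup described above, not the benchmarks actually committed in this PR; the import path of PackedArray and its constructor are assumptions.

```rust
// benches/packed_array.rs -- illustrative sketch only.
use criterion::{criterion_group, criterion_main, Criterion};
use rand::{rngs::StdRng, Rng, SeedableRng};

use pineappl::packed_array::PackedArray; // hypothetical path; may differ

fn insert_random(c: &mut Criterion) {
    c.bench_function("packed_array_insert_500", |b| {
        b.iter(|| {
            // fixed seed so that every iteration inserts the same elements
            let mut rng = StdRng::seed_from_u64(0);
            // shape [40, 50, 30] as mentioned above; the constructor is assumed
            let mut array = PackedArray::<f64, 3>::new([40, 50, 30]);
            for _ in 0..500 {
                let i: usize = rng.gen_range(0..40);
                let j: usize = rng.gen_range(0..50);
                let k: usize = rng.gen_range(0..30);
                array[[i, j, k]] = rng.gen();
            }
            // return the array so the work is not optimized away
            array
        });
    });
}

criterion_group!(benches, insert_random);
criterion_main!(benches);
```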

Commit messages from this pull request:

- In this way all necessary modifications are made uniformly (with late `zip` it's possible to forget the first/second `skip`/`rev`)
- This should be more efficient since Rust can calculate the final size and thus allocate only once
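
The second commit message refers to the fact that collecting from an iterator whose length is known lets Vec allocate its final size up front; a generic illustration (not the actual change from that commit):

```rust
fn main() {
    let data = vec![1_u32, 2, 3, 4];

    // Collecting from an iterator with an exact size hint lets the final Vec
    // be created with a single allocation.
    let doubled: Vec<u32> = data.iter().map(|x| x * 2).collect();

    // Pushing in a loop may reallocate several times as the Vec grows
    // (unless the capacity is reserved up front with Vec::with_capacity).
    let mut doubled_loop = Vec::new();
    for x in &data {
        doubled_loop.push(x * 2);
    }

    assert_eq!(doubled, doubled_loop);
}
```
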
@cschwan (Contributor) left a comment

The code is now in very good shape; I'm going to merge it into master.

@cschwan merged commit 1455e6b into NNPDF:master on Apr 25, 2024
@felixhekhorn (Contributor) commented:

Just for the sake of knowledge transfer: I think I understand the general idea (you're saving coordinates), but index_mut could profit from some more comments 🙈 (as this is clearly a crucial part).

@janw20 (Collaborator, Author) commented Apr 26, 2024

Hi Felix, do you mean /// documentation for the index_mut function, or just inline comments? Also, how is this normally done: should I commit to this pull request again, and afterwards it will be rebased again? Or should I open a new branch on the main repo?

@cschwan (Contributor) commented Apr 26, 2024

@janw20: I believe @felixhekhorn wants a better description of what's going on inside index_mut. In that case, document it with // inside the method, because the publicly available documentation (the one marked with ///) should only explain what the method does, not how. You can commit straight to master; let me know if you have permission problems.
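
As a generic illustration of that convention (not code from this PR):

```rust
/// Maps a two-dimensional index to a flat, row-major index.
///
/// Doc comments (`///`) end up in the generated documentation and should
/// describe *what* a function does.
pub fn flat_index(row: usize, col: usize, ncols: usize) -> usize {
    // Inline comments (`//`) stay in the source and can explain *how* it works:
    // skip `row` full rows of `ncols` elements, then move `col` columns further.
    row * ncols + col
}
```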

@janw20 (Collaborator, Author) commented Apr 26, 2024

I improved the documentation and committed it in 6a711bb to master.
