Where should we put our particles #18
Comments
Another solution would be to have two separate types: an "owning" molecule and a reference molecule. The owning molecule (called This allows users to manipulate particles through the molecules, without losing cache locality. |
I am not sure if I got that. Could you elaborate about cache locality? I can see however that the current implementation is a bit uncomfortable to use. Are there problems with something like this (using horrible naming): // method of System
/// Returns the particles of the molecule with index `id`
#[inline]
pub fn particles_in_molecule(&self, id: usize) -> &[Particle] {
// The range must end at first + size, not at size
let first = self.molecules[id].first();
let range = first .. first + self.molecules[id].size();
&self.particles[range]
} or using a // method of System
/// Returns the particles of the molecule `molecule`
#[inline]
pub fn particles_in_molecule(&self, molecule: &Molecule) -> &[Particle] {
// The range must end at first + size, not at size
let range = molecule.first() .. molecule.first() + molecule.size();
&self.particles[range]
} |
I had written a long comment on this issue some days ago, but forgot to post it. Here it is again!
This cannot be done, because But this gave me another idea: having a trait ParticleGroup {
fn particles(&self) -> &[Particle];
fn bonds(&self) -> &[Bond];
// maybe some other methods dealing with topology
}
// A molecule owning its atoms
struct Molecule {
particles: Vec<Particle>,
bonds: // ...
}
impl ParticleGroup for Molecule {
// ...
}
// The current system
struct System {
particles: Vec<Particle>,
molecules: // ...
}
impl ParticleGroup for System {
// ...
}
// A molecule with a reference to its atoms
struct MoleculeRef<'a> {
particles: &'a [Particle],
bonds: // ...
}
impl<'a> ParticleGroup for MoleculeRef<'a> {
// ...
}
impl System {
fn molecule<'a>(&'a self, i: usize) -> MoleculeRef<'a> {
// ...
}
} Then, we can use this trait in struct Energy;
impl Compute for Energy {
type Output = f64;
fn compute<P: ParticleGroup>(&self, particles: P) -> f64 {
// ...
}
}
struct CenterOfMass;
impl Compute for CenterOfMass {
type Output = Vector3D;
fn compute<P: ParticleGroup>(&self, particles: P) -> Vector3D {
// ...
}
} |
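To make the trait idea concrete, here is a minimal sketch of how an owning molecule and a borrowing molecule could share one computation. The `Particle` fields, the `[f64; 3]` stand-in for `Vector3D`, and passing the group by reference instead of by value are assumptions made for illustration, not the library's actual API.

```rust
/// Minimal stand-in for the real particle type (hypothetical fields).
#[derive(Clone, Debug, PartialEq)]
pub struct Particle {
    pub mass: f64,
    pub position: [f64; 3],
}

/// Anything that can hand out a contiguous slice of particles.
pub trait ParticleGroup {
    fn particles(&self) -> &[Particle];
}

/// A molecule owning its particles.
pub struct Molecule {
    pub particles: Vec<Particle>,
}

impl ParticleGroup for Molecule {
    fn particles(&self) -> &[Particle] {
        &self.particles
    }
}

/// A molecule borrowing its particles from somewhere else (e.g. a system).
pub struct MoleculeRef<'a> {
    pub particles: &'a [Particle],
}

impl<'a> ParticleGroup for MoleculeRef<'a> {
    fn particles(&self) -> &[Particle] {
        self.particles
    }
}

/// A computation that works on any particle group.
pub trait Compute {
    type Output;
    fn compute<P: ParticleGroup>(&self, group: &P) -> Self::Output;
}

pub struct CenterOfMass;

impl Compute for CenterOfMass {
    type Output = [f64; 3];

    fn compute<P: ParticleGroup>(&self, group: &P) -> [f64; 3] {
        let mut total_mass = 0.0;
        let mut com = [0.0; 3];
        for particle in group.particles() {
            total_mass += particle.mass;
            for k in 0..3 {
                com[k] += particle.mass * particle.position[k];
            }
        }
        // An empty group has zero mass and yields NaN here; a real
        // implementation would have to decide how to handle that case.
        for k in 0..3 {
            com[k] /= total_mass;
        }
        com
    }
}
```

With this shape, `CenterOfMass.compute(&molecule)` and `CenterOfMass.compute(&molecule_ref)` go through the same code path, whether or not the particles live inside a system.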
CPUs have multiple levels of memory available: slow memory with big capacity (the RAM), and fast memory with small capacity (the CPU L1, L2 and L3 caches). Accessing the L1 cache takes about 4 CPU cycles, while accessing RAM takes ~800 CPU cycles. But the RAM can hold as much as a TB of data, whereas the CPU L1 cache is only 32 KB on Intel Haswell. The CPU also prefetches some RAM into the cache when accessing memory, to improve the performance of subsequent memory accesses. For more information concerning CPU caches and how to use them, see:
Here the idea is that we are often accessing our I hope I've been clear, if not I'll try again!
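As a rough illustration of the layout question, the sketch below computes the same pair sum over a flat particle vector and over molecules that each own their particles. All names (`OwningMolecule`, `pair_sum_flat`, `pair_sum_nested`) are hypothetical, and no timing is claimed; the point is only that the flat version walks one contiguous allocation while the nested version has to go through one extra pointer per particle.

```rust
/// Stand-in particle holding only a position (hypothetical).
#[derive(Clone, Copy)]
pub struct Particle {
    pub position: [f64; 3],
}

fn distance(a: &Particle, b: &Particle) -> f64 {
    let mut d2 = 0.0;
    for k in 0..3 {
        let d = a.position[k] - b.position[k];
        d2 += d * d;
    }
    d2.sqrt()
}

/// Flat layout: all particles live in one contiguous allocation,
/// so the inner loop reads memory linearly.
pub fn pair_sum_flat(particles: &[Particle]) -> f64 {
    let mut sum = 0.0;
    for i in 0..particles.len() {
        for j in (i + 1)..particles.len() {
            sum += distance(&particles[i], &particles[j]);
        }
    }
    sum
}

/// Nested layout: every molecule owns a separate heap allocation.
pub struct OwningMolecule {
    pub particles: Vec<Particle>,
}

pub fn pair_sum_nested(molecules: &[OwningMolecule]) -> f64 {
    // Iterating over all pairs first requires gathering references
    // across the separate allocations: one extra indirection per particle.
    let all: Vec<&Particle> = molecules
        .iter()
        .flat_map(|molecule| molecule.particles.iter())
        .collect();
    let mut sum = 0.0;
    for i in 0..all.len() {
        for j in (i + 1)..all.len() {
            sum += distance(all[i], all[j]);
        }
    }
    sum
}
```

Both functions return the same value for the same particles; which one is faster in practice would need to be measured on real systems.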
This could work, but will only give access to the particles. My proposal above is a bit more abstract, and gives access to both particles and molecular connectivity (bonds, angles, ...). The other thing I would like to improve here is working with molecules outside of a System. For example, the
This is a bit dangerous, as there is no way to check if the molecule does come from this system or not. |
Thank you! So in essence, in the most performance critical parts you want to have all information that you need very close in memory? Your idea with // A molecule owning its atoms
struct Molecule {
particles: Vec<Particle>,
bonds: // ...
}
impl ParticleGroup for Molecule {
// ...
}
// The current system
struct System {
particles: Vec<Particle>,
molecules: // ...
} Is this what the "old way" would look like? After what you told me about the cache, I don't understand why there are
|
Yes.
The initial idea was to abstract the particles and the topology of the system. Also having interactions would be nice for computing forces and energy, but I think it would prevent us from using the ParticleGroup trait for structs other than a System: if you need all the particles, the topology and the interactions, you would be better off using a System containing only the particles you want.
What I call Molecule in this example is a new kind of molecule, let's say
Exactly. In this example, there are both a reference molecule type (like |
I see. The challenge will be to provide these functionalities in a structured -- in the best case, easy to use -- manner. To be honest, I have to look up the interfaces of a
I'd say yes, if the "user code" involves writing stuff like new moves, propagators etc. |
Definitely! The initial idea would have been very hard to write, but I feel that we can manage to do something with the more recent ideas. |
After reading your proposal again, I am not sure I fully understand struct System {
particles: Vec<Particle>,
molecules: Vec<MoleculeRef>
} where Anyways, I really like the idea of a trait as an interface to get the particles of a group (we should work on that asap, it would make things cleaner and more elegant). Maybe we could even add a function that gives access to positions directly (which from my understanding could not be a |
Roughly. We can not have struct Bonding {
range: Range<usize>,
bonds: Vec<(usize, usize)>,
angles: Vec<(usize, usize, usize)>,
// ...
}
struct Molecule {
particles: Vec<Particle>,
bonding: Bonding
}
struct System {
particles: Vec<Particle>,
bondings: Vec<Bonding>
}
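struct MoleculeRef<'a> {
particles: &'a [Particle],
bonding: &'a Bonding
}
impl System {
fn add_molecule(&mut self, mut molecule: Molecule) {
// Assuming `range` indexes into the system's particle vector:
// point it at the slots these particles are about to occupy
let start = self.particles.len();
molecule.bonding.range = start..start + molecule.particles.len();
self.particles.extend(molecule.particles);
self.bondings.push(molecule.bonding);
}
fn molecule(&self, id: usize) -> MoleculeRef {
let bonding = &self.bondings[id];
MoleculeRef {
// Range<usize> is not Copy, so clone it before indexing
particles: &self.particles[bonding.range.clone()],
bonding: bonding
}
}
}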
That is right. But this design can be improved if anyone has more ideas or suggestions.
I don't know if this would be worth it, as this would allocate new memory and make it slower. If the users need to collect the positions, they can do it by themselves. |
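One possible middle ground, sketched below under assumptions, is to expose positions through a lazy iterator: nothing is allocated unless the caller decides to collect. The `positions` method, the `System::new` constructor and the `[f64; 3]` position type are hypothetical and only serve the illustration.

```rust
/// Stand-in particle type (hypothetical fields).
pub struct Particle {
    pub name: String,
    pub position: [f64; 3],
}

pub struct System {
    particles: Vec<Particle>,
}

impl System {
    pub fn new(particles: Vec<Particle>) -> System {
        System { particles }
    }

    /// Lazily yields a reference to each position.
    /// No new vector is allocated by this call.
    pub fn positions<'a>(&'a self) -> impl Iterator<Item = &'a [f64; 3]> + 'a {
        self.particles.iter().map(|particle| &particle.position)
    }
}
```

Users who really need an owned list can still write `system.positions().cloned().collect::<Vec<_>>()` and pay for the allocation themselves. Note that this iterator still strides over whole `Particle` values in memory, so it does not give the cache behaviour of a dedicated positions array.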
I thought about the above again. From a "physical" point of view it makes a lot of sense to put all information of a particle together in a single struct, including position and velocities. Coming from Fortran (and a lot of legacy codes) I am used to seeing everything stored separately in a "struct of arrays" style, i.e. there is a vector with all particles' positions, one with all velocities, one with all masses, etc. As far as I understood what you wrote about the cache, this is a very effective way to store the data. For example, a great part of the time is spent on computing distances, where a lot of extra information is not needed (masses, particle names, IDs, velocities ...). I was wondering, did you think about using such a "struct of arrays" instead of "arrays of structs" approach when you designed |
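For readers less familiar with the two layouts, here is a small hand-written sketch of the difference. The `ParticleVec` name and the field set are assumptions for illustration; the point is that a distance computation in the struct-of-arrays version only touches the `positions` vector.

```rust
/// "Array of structs": one struct per particle, all fields side by side.
pub struct Particle {
    pub name: String,
    pub mass: f64,
    pub position: [f64; 3],
    pub velocity: [f64; 3],
}

/// "Struct of arrays": one vector per field, all of the same length.
#[derive(Default)]
pub struct ParticleVec {
    pub names: Vec<String>,
    pub masses: Vec<f64>,
    pub positions: Vec<[f64; 3]>,
    pub velocities: Vec<[f64; 3]>,
}

impl ParticleVec {
    /// Splits a particle into its fields and appends each to its own vector.
    pub fn push(&mut self, particle: Particle) {
        self.names.push(particle.name);
        self.masses.push(particle.mass);
        self.positions.push(particle.position);
        self.velocities.push(particle.velocity);
    }

    pub fn len(&self) -> usize {
        self.positions.len()
    }

    /// Distance between particles `i` and `j`. Only `positions` is read,
    /// so names, masses and velocities never enter the cache here.
    pub fn distance(&self, i: usize, j: usize) -> f64 {
        let (a, b) = (&self.positions[i], &self.positions[j]);
        let mut d2 = 0.0;
        for k in 0..3 {
            let d = a[k] - b[k];
            d2 += d * d;
        }
        d2.sqrt()
    }
}
```

The cost is visible in `push`: the per-particle API has to be rebuilt by hand on top of the separate vectors, which is the kind of boilerplate a code-generating macro could take over.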
It is funny that you are asking about this, because I have a proof-of-concept of using the upcoming macros 1.1 feature (stable in rustc 1.15) to generate the interface code to have all the cache niceties of using a struct of arrays, while retaining the API niceties of the current array of structs approach. I was inspired to look at this by this blog post, and it actually looks feasible. I will send the code to a branch here (it will only build using the beta compiler for a few more weeks), so that we can discuss the exact API. |
=) I also read that blog post, after I came across a vlog of Jonathan Blow where he discussed SoA vs AoS in his implementation of a programming language for games.
That's awesome! I am looking forward to seeing what you came up with. |
Currently, the particles are stored contiguously in a vector, and the molecules have a usize field pointing to the first particle of the molecule in this vector.
Pros of this design
Cons of this design
Making the particles owned by molecules is not a solution, because it makes it very hard to iterate over pairs in the whole system, and adds an extra layer of indirection.
A possible solution, with a POC implementation here, would be to store a pointer to the first atom in the molecule, together with the molecule size. This would make it easier to work with molecular code, but adds some unsafe usage. In particular, the pointers of all molecules must be updated when adding a new particle to the system, because the vector may allocate additional memory.
So is it worth it to add complexity in the library code in order to simplify the user code?
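For comparison with the raw-pointer idea, here is a minimal sketch of the index-based variant, where a molecule stores an offset and a size and is resolved to a slice on demand. The names `MoleculeSpan`, `add_molecule` and `molecule` are hypothetical. Because only indices are stored, nothing has to be patched when the particle vector reallocates, and no `unsafe` is needed.

```rust
/// Stand-in particle type (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct Particle {
    pub name: String,
}

/// Location of a molecule inside the system's particle vector.
#[derive(Clone, Copy)]
pub struct MoleculeSpan {
    first: usize,
    size: usize,
}

pub struct System {
    particles: Vec<Particle>,
    molecules: Vec<MoleculeSpan>,
}

impl System {
    pub fn new() -> System {
        System { particles: Vec::new(), molecules: Vec::new() }
    }

    /// Appends the particles as a new molecule and returns its index.
    pub fn add_molecule(&mut self, particles: Vec<Particle>) -> usize {
        let span = MoleculeSpan { first: self.particles.len(), size: particles.len() };
        // `extend` may move the whole buffer; stored indices stay valid.
        self.particles.extend(particles);
        self.molecules.push(span);
        self.molecules.len() - 1
    }

    /// Resolves a molecule to its particles, or `None` for an unknown index.
    pub fn molecule(&self, id: usize) -> Option<&[Particle]> {
        let span = self.molecules.get(id)?;
        self.particles.get(span.first..span.first + span.size)
    }
}
```

The trade-off is that user code always has to go through the system to reach a molecule's particles, which is the ergonomic cost the pointer-based proposal tries to avoid.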