Draft of potential masked array implementation. #849

Draft: wants to merge 2 commits into base: master

Conversation

andrei-papou
Contributor

There are two files:
src/ma/mod.rs - masked array implementation, all the types and traits live there.
tests/ma.rs - a couple of tests that demonstrate the potential public API of masked array.

The main idea is to have a Mask trait that is fairly generic and can be implemented not just by ArrayBase but also by, for example, a set of whitelisted/blacklisted indices or a set of whitelisted/blacklisted values.
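
To make the idea concrete, here is a minimal std-only sketch of what such a trait could look like. It is not the code from src/ma/mod.rs: the names `Mask`, `keeps`, `IndexBlacklist` and `masked_sum` are hypothetical, flat `usize` indices stand in for ndarray's multi-dimensional indexing, and `true` is assumed to mean "keep this element".

```rust
use std::collections::HashSet;

/// Hypothetical trait: reports whether the element at a flat index is kept.
pub trait Mask {
    fn keeps(&self, index: usize) -> bool;
}

/// Dense boolean mask: `true` keeps the element, out-of-range indices are dropped.
impl Mask for Vec<bool> {
    fn keeps(&self, index: usize) -> bool {
        self.get(index).copied().unwrap_or(false)
    }
}

/// Blacklist of indices: every listed index is dropped, all others are kept.
pub struct IndexBlacklist(pub HashSet<usize>);

impl Mask for IndexBlacklist {
    fn keeps(&self, index: usize) -> bool {
        !self.0.contains(&index)
    }
}

/// Sum of the kept elements, generic over the kind of mask.
pub fn masked_sum<M: Mask>(data: &[f64], mask: &M) -> f64 {
    data.iter()
        .enumerate()
        .filter(|(i, _)| mask.keeps(*i))
        .map(|(_, &x)| x)
        .sum()
}
```

Because `masked_sum` only depends on the trait, the same reduction works with a dense boolean mask or with an index blacklist.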

@andrei-papou andrei-papou mentioned this pull request Nov 13, 2020
@andrei-papou andrei-papou marked this pull request as draft November 16, 2020 12:07
@nilgoyette
Collaborator

Is the end goal of this PR (and further work) to be as close as possible to the masked arrays in numpy? I ask because I'm a long-time user of numpy, as is everyone in the company I work for, and nobody has ever used numpy.ma. I mean, we do use masked arrays all the time in medical imaging, mostly for ignoring invalid voxels (outside of the brain, etc.), but the tools in numpy.ma never had any appeal, I guess.

In brief, I see 3 problems with this (not your PR specifically, but the concept):

  1. AFAIK, it's mostly useless. This is my opinion and the result of a survey in my small company. I don't really believe it, but keep in mind that it's possible that we're simply ignorant of the right usage, or have simply never needed it.
  2. It forces the library to duplicate code. Like, mean for Array and MaskedArray, std-dev, etc. Maybe this is less of a hassle than I think.
  3. Most (all?) usages can be replaced with a select() or a Zip::from(arr).and(mask). That's what we're currently doing. In fact, I think that's the problem with numpy.ma: it's super easy to avoid. You simply use arr[mask > 0] and some other indexing tools, et voilà.

What I would gladly use is a Zip::masked function, like

Zip::from(&brain).and(&whatever).and(&mut out).masked(mask).par_apply(|&b, &w, o| {
    // No mask == false here, yay! But it's simply avoiding an if...
})

but this is somewhat irrelevant to the current discussion :)
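
For illustration, the behaviour that a `masked` step would provide can be sketched with plain slices and std iterators. This is not ndarray's `Zip` API: `for_each_masked` is a hypothetical helper, it is sequential rather than parallel, and it assumes that `true` in the mask means "process this element".

```rust
/// Hypothetical helper: visit `(input, output)` pairs only where `mask` is true.
pub fn for_each_masked<A, B, F>(input: &[A], output: &mut [B], mask: &[bool], mut f: F)
where
    F: FnMut(&A, &mut B),
{
    // All three operands must have the same length, as with a zip over arrays.
    assert_eq!(input.len(), output.len());
    assert_eq!(input.len(), mask.len());
    for ((a, b), &keep) in input.iter().zip(output.iter_mut()).zip(mask) {
        // The `if` that callers would otherwise write inside their closure.
        if keep {
            f(a, b);
        }
    }
}
```

The closure never sees an element whose mask entry is `false`, so output entries outside the mask keep their previous values.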

@bluss
Member

bluss commented Nov 28, 2020

I appreciate reading your sketch andrei, you're more productive than me, just having a go at a draft instead of trying to make something perfect.

I think it's been mentioned before yeah, the question whether to have masked arrays or masked operations on arrays. I dread the complexity of either. Thanks nilgoyette for the candid thoughts too.

I think we should start with masked operations. I think that's what a masked array type (if it were to exist) would need as a basis anyway. And it allows having a separate mask too - which should hopefully be more efficient (packed or sparse bitmap?)
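
As a rough sketch of that direction, a masked operation can take an ordinary slice plus a separately stored packed bitmap. Everything here is an assumption made for illustration: `BitMask` and `masked_mean` are hypothetical names, a set bit is taken to mean "include this element", and nothing like this exists in ndarray as far as this thread shows.

```rust
/// Hypothetical packed mask: one bit per element, 64 elements per `u64` word.
pub struct BitMask {
    words: Vec<u64>,
    len: usize,
}

impl BitMask {
    /// Create a mask of `len` elements, all initially excluded (bit = 0).
    pub fn new(len: usize) -> Self {
        BitMask { words: vec![0; (len + 63) / 64], len }
    }

    /// Mark element `i` as included.
    pub fn set(&mut self, i: usize) {
        assert!(i < self.len);
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Whether element `i` is included; out-of-range indices are excluded.
    pub fn get(&self, i: usize) -> bool {
        i < self.len && (self.words[i / 64] >> (i % 64)) & 1 == 1
    }
}

/// Masked operation on a plain slice: mean of the included elements,
/// or `None` if the mask selects nothing.
pub fn masked_mean(data: &[f64], mask: &BitMask) -> Option<f64> {
    let (sum, count) = data
        .iter()
        .enumerate()
        .filter(|(i, _)| mask.get(*i))
        .fold((0.0, 0usize), |(s, n), (_, &x)| (s + x, n + 1));
    if count == 0 { None } else { Some(sum / count as f64) }
}
```

The data stays an ordinary slice and the mask costs one bit per element, instead of the one byte per element of a `bool` array.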

3 participants