Add Parallel::with_min_len to limit splitting #1081

Merged
merged 4 commits into from
Oct 27, 2021

Conversation

Collaborator

@adamreichold adamreichold commented Oct 6, 2021

Splitting of parallel iterators working on ArrayView and Zip is currently not limited and continues until only one element is left, which can lead to excessive overhead for algorithms formulated in terms of these iterators.

Since these iterators are also not indexed, Rayon's generic IndexedParallelIterator::with_min_len does not apply. However, since the number of elements is known and currently checked against one to determine if another split is possible, it appears straightforward to replace this constant by a parameter and make it available to the user via a Parallel::with_min_len inherent method.
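To make the overhead concrete, here is a minimal sketch of the splitting rule described above. It is a toy model, not ndarray's or Rayon's code: `count_leaf_jobs` is a hypothetical helper, and it assumes that a piece is halved as long as it holds more than `min_len` elements.

```rust
/// Counts the leaf jobs that result from recursively halving `len` elements.
/// Hypothetical helper for illustration; assumes a piece is split only while
/// it holds more than `min_len` elements.
fn count_leaf_jobs(len: usize, min_len: usize) -> usize {
    assert_ne!(min_len, 0, "min_len must be at least one");
    if len > min_len {
        let mid = len / 2;
        // Both halves are split further, independently of each other.
        count_leaf_jobs(mid, min_len) + count_leaf_jobs(len - mid, min_len)
    } else {
        // Small enough: this piece becomes one job.
        1
    }
}
```

Under these assumptions, 1024 elements with a minimum length of one end up as 1024 single-element jobs, while a minimum length of 64 yields 16 jobs of 64 elements each.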

}
}

const DEFAULT_MIN_LEN: usize = 1;
Collaborator Author

It might be useful to make this default larger than one similar to the existing COLLECT_MAX_SPLITS.

Member

@bluss bluss Oct 25, 2021

To keep it generic, I think we have to keep it at one.

It must be possible to use [1, 2].par_iter() (in equivalent ndarray types) and get execution in two threads - we can't judge how heavy the computation is per element.

The max splits attacks it on the other end - it says there can be at most 1024 distinct jobs, i.e. at most 1024-fold parallelism.
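The two limits can be contrasted in a small toy model. This is only a sketch: `count_jobs` and its `depth_left` parameter are hypothetical, and the sketch assumes that the cap on splits acts as a limit on split depth, so that a depth of 10 gives the 2^10 = 1024 jobs mentioned above.

```rust
/// Counts jobs when splitting stops either because a piece is short enough
/// (`min_len`, the lower bound on job size) or because the split depth is
/// used up (`depth_left`, an assumed model of a cap on the number of jobs).
/// Hypothetical helper for illustration only.
fn count_jobs(len: usize, min_len: usize, depth_left: u32) -> usize {
    if len <= min_len || depth_left == 0 {
        1
    } else {
        let mid = len / 2;
        count_jobs(mid, min_len, depth_left - 1)
            + count_jobs(len - mid, min_len, depth_left - 1)
    }
}
```

In this model a two-element input with a minimum length of one still yields two jobs, which is the property the default of one is meant to preserve, while a million elements are capped at 1024 jobs by the depth limit rather than by the minimum length.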

Member

Maybe min len should even be zero, as default.

Collaborator Author

@adamreichold adamreichold Oct 25, 2021

> Maybe min len should even be zero, as default.

I don't think this would work, as it would imply that even a single-element producer could be split, which would mean splitting it into a single-element and a zero-element producer. Hence Rayon would go on splitting off empty tasks without end due to the way the return values of UnindexedProducer::split are used.

It might actually make sense to assert that the given min_len is never zero?
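The non-termination can be reproduced in a toy model of the split rule. The names `try_split` and `splits_until_done` are hypothetical and the halving logic is an assumption about how a producer divides its elements; the sketch only mirrors the `len <= min_len` check discussed here.

```rust
/// Splits a piece of `len` elements in half unless it holds at most
/// `min_len` elements. Returns the sizes of the two halves on a split.
fn try_split(len: usize, min_len: usize) -> Option<(usize, usize)> {
    if len <= min_len {
        None
    } else {
        let mid = len / 2;
        Some((mid, len - mid))
    }
}

/// Follows the larger half of every split and returns the number of splits
/// until splitting stops, or `None` if it is still splitting after `limit`
/// rounds (treated here as "does not terminate").
fn splits_until_done(mut len: usize, min_len: usize, limit: usize) -> Option<usize> {
    let mut splits = 0;
    while let Some((left, right)) = try_split(len, min_len) {
        splits += 1;
        if splits == limit {
            return None;
        }
        len = left.max(right);
    }
    Some(splits)
}
```

With a minimum length of zero, a single element splits into pieces of zero and one element, and the one-element piece is then split again in exactly the same way. With a minimum length of one, the same piece is never split.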

Member

Hm yes, assertion would actually be good

Collaborator Author

> Hm yes, assertion would actually be good

Done.

LSchueler pushed a commit to GeoStat-Framework/GSTools-Core that referenced this pull request Oct 12, 2021
This is a speed-up due to the call to `with_min_len` which however requires
indexed parallel iterators which `ndarray` provides only via `axis_iter` and
`axis_chunk_iter`, so this workaround is necessary until [1] is merged.

[1] rust-ndarray/ndarray#1081
}

impl<I> Parallel<I> {
pub fn with_min_len(self, min_len: usize) -> Self {
Member

The new method requires a doc comment.

Parallel isn't really a type we've thought a lot about - it's just a wrapper to put the implementations on. Now we're making it more prominent. Is that a good idea? How do we change the docs for this?

A few of our Parallel<I> iterators already have with_min_len from IndexedParallelIterator, and this method now overrides that since it's an inherent method. To be careful, this is a breaking change but it might be an ok change - if the semantics are sufficiently compatible?

Collaborator Author

@adamreichold adamreichold Oct 25, 2021

> The new method requires a doc comment.

Sorry, forgot about this and will add it.

> Parallel isn't really a type we've thought a lot about - it's just a wrapper to put the implementations on. Now we're making it more prominent. Is that a good idea? How do we change the docs for this?

This is admittedly somewhat "hackish" being a workaround for some parallel iterators not being indexed.

From an API perspective it would probably be nicer to attach this or a similar method to the first-level types wrapped by Parallel, i.e. ArrayView(Mut) and Zip. But I think this would imply that those would need to store the minimum split size just in case some code runs a parallel algorithm on them. On the other hand, this could be used to make the choice of minimum split size available to directly exposed parallel algorithms like Zip::par_map_collect or ArrayBase::par_map_inplace. (I think limiting the splitting is often useful for tuning parallel versions of numerical algorithms.)

> A few of our Parallel<I> iterators already have with_min_len from IndexedParallelIterator, and this method now overrides that since it's an inherent method. To be careful, this is a breaking change but it might be an ok change - if the semantics are sufficiently compatible?

I would say that the semantics are equivalent, i.e. I would probably copy Rayon's doc comment for the first issue. But I think the shadowing can be avoided by defining the method within the macros where they do not shadow any trait methods. (The min_len member is actually dead weight for indexed parallel wrappers because it is not used anywhere. A more intrusive change would be to use different parallel wrapper types for the different inner iterators but I am not sure this is worth the breakage.) ((If I fix this up to not shadow any trait methods, this should not be a breaking change, right?))
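The shadowing concern can be shown with plain Rust method resolution. This is a minimal sketch with made-up types: `HasMinLen` stands in for a trait such as Rayon's IndexedParallelIterator and `Wrapper` for a parallel wrapper type; neither is the real API.

```rust
/// Hypothetical stand-in for a trait that provides `with_min_len`.
trait HasMinLen: Sized {
    fn with_min_len(self, min_len: usize) -> String {
        format!("trait method, min_len = {}", min_len)
    }
}

/// Hypothetical stand-in for a wrapper type such as `Parallel<I>`.
struct Wrapper;

impl HasMinLen for Wrapper {}

impl Wrapper {
    /// Inherent method with the same name and receiver as the trait method.
    fn with_min_len(self, min_len: usize) -> String {
        format!("inherent method, min_len = {}", min_len)
    }
}

/// Method-call syntax picks the inherent method over the trait method.
fn call_by_method_syntax() -> String {
    Wrapper.with_min_len(4)
}

/// The trait method stays reachable through a fully qualified call.
fn call_trait_explicitly() -> String {
    HasMinLen::with_min_len(Wrapper, 4)
}
```

Existing callers that wrote `.with_min_len(n)` and got the trait method would silently switch to the inherent one, which is why defining the inherent method only where no trait method of that name exists avoids the breaking change.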

Member

Dead weight in Parallel is not the end of the world - it's only used temporarily in most cases. Adding fields to ArrayView is of course not something we can do.

Yeah, not shadowing would work to make it non-breaking.

If this method is too hard to find, it may be worth to call it out on the general doc - either the doc comment for Parallel or for ndarray::parallel?

Collaborator Author

Updated the PR to add the missing doc string (adapted from Rayon's with_min_len) and limited the method to those parallel wrappers where it does not shadow a trait method.

Collaborator Author

> If this method is too hard to find, it may be worth to call it out on the general doc - either the doc comment for Parallel or for ndarray::parallel?

Expanded the module-level doc-comment since this already deals with other aspects of the same point, i.e. some parallel iterators being indexed while others are not.

@bluss bluss added this to the 0.16.0 milestone Oct 25, 2021
Member

bluss commented Oct 25, 2021

Also, clippy is making some noise when it reaches new versions, and it's probably best to not care too much about it - not about warnings unrelated to the PR. We will get back to being clippy-clean (sooner if PRs that fix just clippy are posted).

On the topic of clippy, I don't think every warning needs fixing, I don't always agree with their auto-judgment.

Collaborator Author

> Also, clippy is making some noise when it reaches new versions, and it's probably best to not care too much about it - not about warnings unrelated to the PR. We will get back to being clippy-clean (sooner if PRs that fix just clippy are posted).
>
> On the topic of clippy, I don't think every warning needs fixing, I don't always agree with their auto-judgment.

Will push the Clippy fix onto a separate PR then.

@bluss bluss modified the milestones: 0.16.0, 0.15.4 Oct 25, 2021
..self
}
}
}
Member

I think we can deduplicate and write this one like this outside the macro - similar to the inherent method .size().

impl<Parts, D> Parallel<Zip<Parts, D>>
where
    D: Dimension,
{
    /// Sets the minimum number of elements desired to process in each thread. This will not be
    /// split any smaller than this length, but of course a producer could already be smaller to
    /// begin with.
    ///
    /// ***Panics*** if `min_len` is zero.
    pub fn with_min_len(self, min_len: usize) -> Self {
        assert_ne!(min_len, 0, "Minimum number of elements must at least be one to avoid splitting off empty tasks.");

        Self {
            min_len,
            ..self
        }
    }
}

Member

@bluss bluss Oct 27, 2021

I don't want to quibble over details, but rayon divides the work into "jobs" and submits them to the thread pool to work on. We can't limit the number of items processed per thread, only per job, with threads eating multiple jobs each, if they can. The difference might be useful for someone to understand.
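The distinction between jobs and threads can be sketched with the standard library alone. This is a deliberately simplified model, assuming a shared queue instead of Rayon's work-stealing scheduler; `run_jobs` is a hypothetical helper and each job is represented only by its element count.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `jobs` (each given as a number of elements) on `workers` threads and
/// returns, per thread, how many jobs and how many elements it handled.
/// Simplified model: a shared queue, not Rayon's work-stealing scheduler.
fn run_jobs(jobs: Vec<usize>, workers: usize) -> Vec<(usize, usize)> {
    let queue = Arc::new(Mutex::new(jobs));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut jobs_done = 0usize;
                let mut elements = 0usize;
                loop {
                    // The lock guard is dropped at the end of this statement.
                    let job = queue.lock().unwrap().pop();
                    match job {
                        Some(len) => {
                            jobs_done += 1;
                            elements += len;
                        }
                        None => break,
                    }
                }
                (jobs_done, elements)
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_jobs(vec![4; 8], 2)` hands eight jobs of four elements each to two threads. The job size is fixed at four, but how many jobs, and therefore how many elements, each thread ends up processing depends on scheduling.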

Oh I guess the wording comes from rayon, so it's their wording I'm complaining about..

Member

Not sure if they agree with the change, but I hope so rayon-rs/rayon#897

Collaborator Author

I agree on the inaccuracy of the wording. Will update the doc comment and move this outside of the macro.

Collaborator Author

Pushed as suggested.

Splitting of parallel iterators working on `ArrayView` and `Zip` is currently
not limited and continues until only one element is left, which can lead to
excessive overhead for algorithms formulated in terms of these iterators.

Since these iterators are also not indexed, Rayon's generic
`IndexedParallelIterator::with_min_len` does not apply. However, since the
number of elements is known and currently checked against one to determine if
another split is possible, it appears straightforward to replace this constant
by a parameter and make it available to the user via a `Parallel::with_min_len`
inherent method.
…ld not shadow a trait method

Additionally, the min_len field does not affect those parallel iterators which
are indexed as the splitting positions are chosen by Rayon which has its own
`with_min_len` method to achieve the same effect.
…rallelIterator::with_min_len in the module-level docs.
Member

bluss commented Oct 27, 2021

Thanks for working on this, should be useful for many.

@bluss bluss merged commit 72e6798 into rust-ndarray:master Oct 27, 2021
@adamreichold adamreichold deleted the with-min-len branch October 27, 2021 17:40