Implement N4409 on top of HPX #1141

Closed
47 tasks done
hkaiser opened this issue May 31, 2014 · 20 comments

Comments

@hkaiser
Member

hkaiser commented May 31, 2014

Implement N3989 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3989.html) on top of HPX. This would finally be the first step toward exposing parallel algorithms to application developers.

There is an updated version of the proposal document here (N4409): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4409.pdf.

As always, this implementation will add HPX-specific functionality. Several extensions are obvious right away:

  • Add a new execution policy, task_execution_policy, on top of what the proposal mandates. The difference from the already described parallel_execution_policy would be that all algorithms return a future<> representing the original result.
  • Add the possibility to specify HPX executors to be used with the par and task execution policies. This could be done by adding par(exec) and task(exec) as valid execution policy arguments.
  • Add functionality enabling the use of all algorithms for distributed use cases (see also Extend parallel algorithms to work with hpx::partitioned_vector et.al. #1338).
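The task_execution_policy idea in the first bullet can be illustrated with a minimal, self-contained sketch. It is a stand-in under stated assumptions, not the HPX API: the tags seq_policy and task_policy and the function sketch_reduce are hypothetical, and std::async plays the role that HPX's own futures and executors would play. The point is only the calling convention: one overload set, where the task flavor returns a future<> for the result instead of the result itself.

```cpp
#include <cassert>
#include <future>
#include <numeric>

// Hypothetical policy tags: stand-ins for the proposed execution policies,
// not the HPX or std types.
struct seq_policy {};
struct task_policy {};

// Synchronous flavor: blocks and returns the result directly.
template <typename It, typename T>
T sketch_reduce(seq_policy, It first, It last, T init)
{
    return std::accumulate(first, last, init);
}

// Task flavor: launches the same work asynchronously and returns a
// std::future<T> standing in for the eventual result.
template <typename It, typename T>
std::future<T> sketch_reduce(task_policy, It first, It last, T init)
{
    return std::async(std::launch::async,
        [first, last, init] { return std::accumulate(first, last, init); });
}
```

With this shape, a caller can start an algorithm, overlap other work with it, and call get() on the returned future only when the result is actually needed.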

Another possible extension needs more investigation:

  • Allow all algorithms to be invoked with sequences of futures, where the predicates/operators are invoked only after the corresponding future has become ready.

Here is the list of algorithms mandated by the proposal:

These were added by N4310:

These were added to C++20:

@hkaiser hkaiser added this to the 0.9.9 milestone May 31, 2014
@Syntaf Syntaf changed the title Implement N3989 on top of HPX Implement N4071 on top of HPX Jul 31, 2014
@hkaiser hkaiser modified the milestones: 0.9.10, 0.9.9 Sep 13, 2014
@hkaiser hkaiser changed the title Implement N4071 on top of HPX Implement N4310 on top of HPX Dec 28, 2014
@hkaiser hkaiser modified the milestones: 1.0.0, 0.9.10 Feb 25, 2015
@hkaiser hkaiser changed the title Implement N4310 on top of HPX Implement N4352 on top of HPX Mar 5, 2015
@hkaiser hkaiser changed the title Implement N4352 on top of HPX Implement N4409 on top of HPX May 11, 2015
@hkaiser hkaiser mentioned this issue Dec 7, 2015
hkaiser added a commit that referenced this issue Apr 22, 2016
@diehlpk
Member

diehlpk commented Jan 24, 2017

@hkaiser Could you please add a project description here https://github.com/STEllAR-GROUP/hpx/wiki/GSoC-2017-Project-Ideas

@taeguk
Member

taeguk commented Mar 25, 2017

I'm preparing for GSoC and have a question.
When implementing parallel algorithms, may I allocate additional memory to optimize or parallelize them?
For some algorithms, the implementation may differ depending on whether additional memory allocation is allowed or not.

@hkaiser
Member Author

hkaiser commented Mar 25, 2017

When implementing parallel algorithms, can I allocate additional memory for optimization or parallelization of algorithms?

That's a very good and controversial question. The maximum memory requirements for the parallel algorithms are not specified, only the computational complexity. While allocating memory itself does not change the computational complexity of an algorithm, an intermediate buffer often requires more data copying, which in turn may change the complexity.

So the first rule of the game is not to exceed the complexity requirements as specified (e.g. don't make an algorithm O(N) if it's supposed to be O(log N)). If an additional allocation does not (indirectly) change the complexity, please make sure that, even for large arrays, it does not blow the memory requirements out of proportion.

Generally, I'd suggest avoiding memory allocations as much as possible (in the first step) and using an implementation that does without them. I understand that additional allocations may improve an algorithm's performance, or even its complexity, but I'd like to make an implementation correct first before diving into possible optimizations.

I know I have not given a concrete answer to your question. I guess it's a case-by-case decision we'll have to make as we go.

@hkaiser
Member Author

hkaiser commented Mar 25, 2017

@taeguk Most of the missing algorithms are usually implemented based on a variation of a parallel scan. We already have a handful of algorithms based on a scan_partitioner (e.g. copy_if, remove_copy, inclusive_scan, etc.). I'd expect the rest of them to be easily implementable using this very same (and already existing) scan_partitioner. I'd suggest familiarizing yourself with how we have implemented those existing algorithms, as reusing the partitioner would significantly simplify implementing the missing ones.
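The count/scan/copy pattern behind scan-based algorithms such as copy_if can be sketched without HPX. This is not HPX's scan_partitioner, whose interface is not shown in this thread; scan_copy_if is a hypothetical helper that uses std::async for the per-chunk work and std::exclusive_scan to turn per-chunk match counts into output offsets. It assumes T is default-constructible and that the predicate is safe to call concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical scan-based parallel copy_if (not the HPX implementation).
// Phase 1: count matches per chunk in parallel.
// Phase 2: exclusive scan over the counts gives each chunk's output offset.
// Phase 3: each chunk copies its matches to its own offset in parallel.
template <typename T, typename Pred>
std::vector<T> scan_copy_if(std::vector<T> const& in, Pred pred,
    std::size_t chunks = 4)
{
    assert(chunks > 0);
    std::size_t const n = in.size();
    std::size_t const size = (n + chunks - 1) / chunks;

    std::vector<std::size_t> counts(chunks, 0);
    std::vector<std::future<void>> tasks;
    for (std::size_t c = 0; c != chunks; ++c)
    {
        std::size_t const b = std::min(n, c * size);
        std::size_t const e = std::min(n, b + size);
        tasks.push_back(std::async(std::launch::async, [&, b, e, c] {
            counts[c] = static_cast<std::size_t>(
                std::count_if(in.begin() + b, in.begin() + e, pred));
        }));
    }
    for (auto& t : tasks) t.get();

    // Offset of chunk c = number of matches in all chunks before it.
    std::vector<std::size_t> offsets(chunks, 0);
    std::exclusive_scan(counts.begin(), counts.end(), offsets.begin(),
        std::size_t(0));

    std::vector<T> out(offsets.back() + counts.back());
    tasks.clear();
    for (std::size_t c = 0; c != chunks; ++c)
    {
        std::size_t const b = std::min(n, c * size);
        std::size_t const e = std::min(n, b + size);
        // Chunks write to disjoint output ranges, so no locking is needed.
        tasks.push_back(std::async(std::launch::async, [&, b, e, c] {
            std::copy_if(in.begin() + b, in.begin() + e,
                out.begin() + offsets[c], pred);
        }));
    }
    for (auto& t : tasks) t.get();
    return out;
}
```

The middle step is the scan: once every chunk knows where its output starts, the final copy phase needs no further coordination between chunks, and the relative order of the selected elements is preserved.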

@msimberg
Contributor

Marked inplace_merge as done because #2978 was merged.

@victor-ludorum
Contributor

Hello @hkaiser! Many algorithms are already implemented, but I have made a list of the ones that are not implemented yet:
copy_backward
equal_range
is_permutation
lower_bound
upper_bound
move_backward
prev_permutation
next_permutation
nth_element
partition_point
pop_heap
push_heap
sort_heap
stable_sort
partial_sort

Since we already have one numeric algorithm (adjacent_difference), these three numeric algorithms could also be implemented:
accumulate
inner_product
partial_sum

As I have started learning about HPX, I have checked that these algorithms haven't been implemented yet. Is implementing these algorithms not important for parallelism and concurrency?

@hkaiser
Member Author

hkaiser commented Feb 18, 2018

@victor-ludorum the algorithms listed above are the ones specified for C++17.

accumulate, inner_product, and partial_sum are listed under different names (reduce, transform_reduce, and inclusive_scan).
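For reference, the standard pairs line up as follows. The C++17 versions relax the evaluation order (reduce and transform_reduce assume an associative and commutative operation, inclusive_scan an associative one), which is what allows them to run in parallel. The helper numeric_counterparts_agree is hypothetical and only uses the sequential standard-library overloads to check that each pair gives the same answer for integer sums and dot products.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Classic numeric algorithm -> C++17 counterpart:
//   accumulate    -> reduce            (may reorder: associative + commutative op)
//   inner_product -> transform_reduce  (same relaxation)
//   partial_sum   -> inclusive_scan    (associative op)
// For integer addition/multiplication every pair produces identical results.
inline bool numeric_counterparts_agree(std::vector<int> const& a,
    std::vector<int> const& b)
{
    assert(a.size() == b.size());
    bool const same_sum = std::accumulate(a.begin(), a.end(), 0) ==
        std::reduce(a.begin(), a.end(), 0);
    bool const same_dot =
        std::inner_product(a.begin(), a.end(), b.begin(), 0) ==
        std::transform_reduce(a.begin(), a.end(), b.begin(), 0);

    std::vector<int> ps(a.size()), is(a.size());
    std::partial_sum(a.begin(), a.end(), ps.begin());
    std::inclusive_scan(a.begin(), a.end(), is.begin());
    return same_sum && same_dot && ps == is;
}
```

With a floating-point or non-commutative operation the pairs may legitimately differ, since reduce and transform_reduce are allowed to regroup and reorder the operands.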

Somebody already tried to implement the heap algorithms, but that was abandoned (see #1914); feel free to revive that effort.

The algorithms related to sort are listed as not implemented in our list above (nth_element, partial_sort, etc.); feel free to take those on.

I'm not sure if it's possible to parallelize the permutation algorithms.

copy_backward and move_backward can easily be implemented on top of the existing copy and move algorithms (they require at least bidirectional iterators, so you could wrap the given iterators into reverse_iterator); alternatively, we'd need a separate (but similar to copy/move) implementation.
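A minimal sketch of the reverse_iterator approach, assuming bidirectional iterators and a destination range that does not overlap the source. copy_backward_via_reverse is a hypothetical name, and it calls std::copy sequentially; the idea is that a parallel copy could be handed the same reverse iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical helper: copy_backward expressed as a forward copy over
// reverse iterators. Assumes bidirectional iterators and non-overlapping
// source and destination ranges.
template <typename BidirIt1, typename BidirIt2>
BidirIt2 copy_backward_via_reverse(BidirIt1 first, BidirIt1 last,
    BidirIt2 d_last)
{
    auto r = std::copy(std::make_reverse_iterator(last),
        std::make_reverse_iterator(first),
        std::make_reverse_iterator(d_last));
    // r.base() points at the first element written, matching the
    // return value of std::copy_backward.
    return r.base();
}
```

Overlapping ranges, the main reason copy_backward exists, are deliberately left out: a forward copy over reverse iterators formally requires that the destination not start inside the source, and a parallel copy would not preserve the element-by-element order that makes copy_backward safe for overlap.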

I don't know what the difference between partition and partition_point is.
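For reference, the two standard algorithms differ as follows: std::partition reorders a range so that all elements satisfying a predicate precede those that do not, and returns the boundary; std::partition_point modifies nothing, requires a range that is already partitioned, and locates the boundary by binary search with O(log N) predicate applications. The helper split_evens_first is hypothetical and uses only the sequential standard-library versions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper showing both algorithms on the same predicate.
inline std::vector<int>::iterator split_evens_first(std::vector<int>& v)
{
    auto is_even = [](int x) { return x % 2 == 0; };

    // partition: rearranges v (evens first) and returns the boundary.
    auto boundary = std::partition(v.begin(), v.end(), is_even);

    // partition_point: read-only search, valid only on a partitioned range.
    assert(std::is_partitioned(v.begin(), v.end(), is_even));
    assert(std::partition_point(v.begin(), v.end(), is_even) == boundary);
    return boundary;
}
```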

@victor-ludorum
Contributor

Thanks @hkaiser sir! I will definitely work on these algorithms. So the numeric algorithms are already implemented; the remaining important algorithms can be implemented, I hope.

@hkaiser
Member Author

hkaiser commented Jul 30, 2020

@fjtapia Just out of curiosity: we still have a couple of algorithms missing that are related to sorting (partial_sort, partial_sort_copy, and nth_element). Do you happen to have implementations available for those? Even some initial code would be very helpful. Our plan is to finally have all of the algorithms as specified by C++20 in place, and these are the last missing pieces. Any help you could give would be most appreciated!

@fjtapia

fjtapia commented Jul 30, 2020 via email

@hkaiser
Member Author

hkaiser commented Jul 30, 2020

Hi Hartmut, glad to be in contact with you again. I will prepare the implementation of the functions and send it to you to examine, but it will be at the end of August because I am going on vacation in two days. Please send me a list of the functions to implement, whether they should be single-threaded or parallel, and any conditions that must be taken into consideration. Regards, Francisco

Francisco, thanks for getting back so quickly, and thanks for your interest in helping! As said in my initial message, we are missing implementations of the parallel versions of the following algorithms: partial_sort, partial_sort_copy, and nth_element (I believe I could derive partial_sort_copy from a partial_sort implementation myself, if needed, if that simplifies things). Also, we can always fall back to the std library versions for the sequential algorithms, so there is no need for you to look into those.
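Deriving partial_sort_copy from partial_sort could look roughly like the following sketch. partial_sort_copy_via_partial_sort is a hypothetical name, and it calls the sequential std::partial_sort; a parallel partial_sort would be dropped in at that call. It copies the input into a temporary buffer, so it needs O(N) extra memory, which is the kind of allocation the memory-usage guidance earlier in this thread asks to keep in proportion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical sketch: partial_sort_copy built on partial_sort.
// 1. Copy the input into a temporary buffer (O(N) extra memory).
// 2. partial_sort the buffer so its first k = min(N, M) elements are the
//    smallest ones, in ascending order.
// 3. Copy those k elements into the output; return the end of what was written.
template <typename InputIt, typename RandomIt>
RandomIt partial_sort_copy_via_partial_sort(InputIt first, InputIt last,
    RandomIt d_first, RandomIt d_last)
{
    using value_type = typename std::iterator_traits<InputIt>::value_type;
    std::vector<value_type> buf(first, last);

    auto const m = std::distance(d_first, d_last);
    assert(m >= 0);
    std::ptrdiff_t const k =
        std::min<std::ptrdiff_t>(static_cast<std::ptrdiff_t>(buf.size()), m);

    std::partial_sort(buf.begin(), buf.begin() + k, buf.end());
    return std::copy(buf.begin(), buf.begin() + k, d_first);
}
```

The values written match std::partial_sort_copy; only the relative order of equal elements may differ, which neither algorithm guarantees anyway.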

@hkaiser hkaiser added this to the 1.6.0 milestone Aug 4, 2020
@msimberg msimberg removed this from the 1.6.0 milestone Jan 5, 2021
@hkaiser
Member Author

hkaiser commented Nov 8, 2021

This has finally been done! Thanks to everybody who contributed to this task!

@hkaiser hkaiser closed this as completed Nov 8, 2021
@hkaiser hkaiser added this to the 1.8.0 milestone Nov 15, 2021

8 participants