Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #287 from insertinterestingnamehere/config
Purge Various Unused Config Options
Showing 123 changed files with 446 additions and 11,751 deletions.
@@ -0,0 +1,2 @@
6901dc07127f54c060ec4046e21d05ccd7f437ab
3ddc9da40f8b34565c90d17ef83a9ef95a9deb18
@@ -1,137 +1,60 @@
[![Build Status](https://travis-ci.org/Qthreads/qthreads.svg?branch=master)](https://travis-ci.org/Qthreads/qthreads)
QTHREADS
========

# WELCOME TO THE NEW HOME OF QTHREADS:
# https://github.com/sandialabs/qthreads

The Qthreads API is designed to make using large numbers of threads convenient and easy.
The Qthreads API also provides access to full/empty-bit (FEB) semantics,
where every word of memory can be marked either full or empty,
and a thread can wait for any word to attain either state.
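
To make the full/empty-bit idea concrete, here is a minimal sketch in standard C++ that emulates a single FEB-protected word with a mutex and a condition variable. It does not use the Qthreads API. `FebWord`, `write_ef`, `read_fe`, and `sum_through_feb` are hypothetical names chosen for illustration, and a real FEB implementation is not necessarily built this way.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical FebWord: one word of memory guarded by a full/empty bit.
// Emulated with a mutex + condition variable; this is NOT the Qthreads API.
class FebWord {
 public:
  // Wait until the word is empty, store v, and mark the word full.
  void write_ef(std::uint64_t v) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !full_; });
    value_ = v;
    full_ = true;
    cv_.notify_all();
  }

  // Wait until the word is full, take its value, and mark the word empty.
  std::uint64_t read_fe() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return full_; });
    full_ = false;
    cv_.notify_all();
    return value_;
  }

  bool is_full() {
    std::lock_guard<std::mutex> lock(m_);
    return full_;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::uint64_t value_ = 0;
  bool full_ = false;  // in this sketch every word starts out empty
};

// Producer/consumer through a single FEB word: each value 1..n is read once.
std::uint64_t sum_through_feb(std::uint64_t n) {
  FebWord word;
  std::uint64_t sum = 0;
  std::thread consumer([&] {
    for (std::uint64_t i = 0; i < n; ++i) sum += word.read_fe();
  });
  for (std::uint64_t i = 1; i <= n; ++i) word.write_ef(i);  // producer
  consumer.join();
  assert(!word.is_full());  // the last value has been consumed
  return sum;
}
```

`write_ef` blocks while the word is full, and `read_fe` blocks while it is empty. The producer and consumer therefore alternate, and every value is consumed exactly once.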

QTHREADS!
=========

Qthreads is essentially a library for spawning and controlling stackful coroutines:
threads with small (4-8k) stacks.
The exposed user API resembles that of OS threads;
however, the threads run entirely in user space and use their locked/unlocked status as part of their scheduling.
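
A stackful coroutine owns its own stack, so it can suspend from anywhere in its call chain and later resume where it left off. The sketch below illustrates the idea with the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) on a heap-allocated stack. This is an assumption for illustration, not a description of how Qthreads switches contexts internally. It uses a 64 KB stack instead of 4-8k to leave headroom, and `run_coroutine` is a hypothetical name.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

// Hypothetical sketch of a stackful coroutine built on POSIX ucontext.
static ucontext_t caller_ctx;  // where the coroutine yields back to
static ucontext_t coro_ctx;    // the coroutine's own saved context
static int yielded_value = 0;
static bool coro_finished = false;

// Coroutine body: runs on its own stack and yields 1, 2, 3.
static void coro_body() {
  for (int i = 1; i <= 3; ++i) {
    yielded_value = i;
    swapcontext(&coro_ctx, &caller_ctx);  // suspend; execution resumes here later
  }
  coro_finished = true;
  // Returning switches to uc_link, i.e. back to caller_ctx.
}

std::vector<int> run_coroutine() {
  std::vector<char> stack(64 * 1024);  // the coroutine's private stack
  coro_finished = false;
  int rc = getcontext(&coro_ctx);
  assert(rc == 0);
  (void)rc;
  coro_ctx.uc_stack.ss_sp = stack.data();
  coro_ctx.uc_stack.ss_size = stack.size();
  coro_ctx.uc_link = &caller_ctx;
  makecontext(&coro_ctx, coro_body, 0);

  std::vector<int> results;
  while (!coro_finished) {
    swapcontext(&caller_ctx, &coro_ctx);  // resume the coroutine
    if (!coro_finished) results.push_back(yielded_value);
  }
  return results;
}
```

Each `swapcontext` saves the current registers and stack pointer and switches to the other context. Everything happens in user space on a single OS thread.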

The qthreads API is designed to make using large numbers of threads convenient
and easy. The API maps well to both MTA-style threading and PIM-style
threading, and is still quite useful in a standard SMP context. The qthreads
API also provides access to full/empty-bit (FEB) semantics, where every word of
memory can be marked either full or empty, and a thread can wait for any word
to attain either state.

The library's metaphor is that there are many Qthreads and several "shepherds".
Shepherds generally map to specific processors or memory regions,
but this is not an explicit part of the API.
Qthreads are assigned to specific shepherds and are only allowed to migrate
when running on a scheduler that supports work stealing
or when migration is explicitly triggered via user APIs.
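
The shepherd and migration rules above can be modeled with a small, single-threaded simulation. This is a hypothetical sketch: `Shepherd` and `run_shepherds` are not Qthreads names, and the stealing policy is an assumption chosen for simplicity. Each shepherd runs qthreads from its own queue. Only when work stealing is enabled does an idle shepherd take a qthread from the back of the busiest other queue, i.e. migrate it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model of a shepherd: a queue of qthread IDs it owns,
// plus a record of which qthreads it actually ran.
struct Shepherd {
  std::deque<int> queue;
  std::vector<int> ran;
};

// Round-robin simulation: each round, every shepherd runs one qthread.
// With work stealing, an idle shepherd first steals from the longest queue.
void run_shepherds(std::vector<Shepherd>& sheps, bool work_stealing) {
  assert(!sheps.empty());
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (std::size_t s = 0; s < sheps.size(); ++s) {
      if (sheps[s].queue.empty() && work_stealing) {
        std::size_t victim = s;
        for (std::size_t v = 0; v < sheps.size(); ++v) {
          if (sheps[v].queue.size() > sheps[victim].queue.size()) victim = v;
        }
        if (victim != s) {  // victim's queue is strictly longer, so non-empty
          sheps[s].queue.push_back(sheps[victim].queue.back());  // migrate
          sheps[victim].queue.pop_back();
        }
      }
      if (!sheps[s].queue.empty()) {
        sheps[s].ran.push_back(sheps[s].queue.front());
        sheps[s].queue.pop_front();
        progressed = true;
      }
    }
  }
}
```

In this model, all six qthreads start on shepherd 0. Without stealing, shepherd 1 stays idle. With stealing, the work splits evenly, and shepherd 1 runs the qthreads it took from the back of shepherd 0's queue.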

The qthreads library on an SMP is essentially a library for spawning and
controlling coroutines: threads with small (4-8k) stacks. The threads are
entirely in user space and use their locked/unlocked status as part of their
scheduling.

The library's metaphor is that there are many qthreads and several "shepherds".
Shepherds generally map to specific processors or memory regions, but this is
not an explicit part of the API. Qthreads are assigned to specific shepherds
and do not generally migrate.

The API includes utility functions for making threaded loops, sorting, and similar operations convenient.

## Collaboration

Need help or interested in finding out more? Join us on our Slack channel: https://join.slack.com/t/qthreads/signup
Need help or interested in finding out more? Join us on our Slack channel: https://join.slack.com/t/Qthreads/signup

## Performance
## Compatibility

On a machine with approximately 2 GB of RAM, this library was able to spawn and
handle 350,000 qthreads. With some modifications (mostly to the stack size), it was
able to handle 1,000,000 qthreads. It may be able to do more, but swapping will
become an issue, and you may start to run out of address space.
Millions of Qthreads should run fine even on a machine with a modest amount of RAM.
Generally, the primary limit on the number of threads that can be spawned is memory use.

This library has been tested, and runs well, on a 64-bit machine. It is
occasionally tested on 32-bit machines, and has even been tested under Cygwin.
This library has been tested, and runs well, on 64-bit ARM and x86 machines.
32-bit versions of those architectures, as well as PowerPC-based architectures, may also work.

Currently, the only real limiting factor on the number of threads is the amount
of memory and address space you have available. For more than 2^32 threads, the
`thread_id` value will need to be made larger (or eliminated, as it is not
*required* for correct operation by the library itself).
This library is compatible with most Linux variants as well as OS X.
There is some preliminary support for BSD operating systems.
Windows is not currently supported.

For information on how to use qthread or qalloc, there is a lot of information
in the header files (`qthread.h` and `qalloc.h`), but the primary documentation is
the man pages.

## Building Qthreads

## FUTURELIB DOCUMENTATION (the 10-minute version)
Qthreads currently relies on autotools, so automake, autoconf, and libtool are required for building from source.
Hwloc is also highly recommended.

The most important functions in futurelib that a person is going to use are
`mt_loop` and `mt_loop_returns`. The `mt_loop` function is for parallel iterations
that do not return values, and the `mt_loop_returns` function is for parallel
iterations that DO return values. The distinction is not always so obvious.

The following compilers are supported and tested regularly:
- gcc 9 or later
- clang 11 or later
- icc (last supported release)
- icx 2023 or later
- aocc 4.2 or later
- acfl 24.04
- Apple clang 15.4 or later

`mt_loop` is used in a format like so:
```
mt_loop<...argtypelist..., looptype>
  (function, ...arglist..., startval, stopval, stepval);
```
The `stepval` argument is optional and defaults to 1.
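
As a rough mental model of `mt_loop_traits::Par` (fork every iteration, then wait), the sketch below uses one `std::thread` per iteration. It is a hypothetical plain-C++ stand-in, not futurelib. `par_loop` is an invented name, it passes only an `Iterator`-style argument, and real qthreads are far lighter than OS threads. The argument order mirrors the pattern above, with `step` defaulting to 1.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical par_loop: a plain-C++ stand-in for mt_loop_traits::Par.
// Calls f(i) for i = start, start + step, ... while i < stop, forking one
// OS thread per iteration and then waiting (joining) for all of them.
void par_loop(const std::function<void(int)>& f, int start, int stop,
              int step = 1) {
  assert(step > 0);
  std::vector<std::thread> workers;
  for (int i = start; i < stop; i += step) {
    workers.emplace_back(f, i);  // fork this iteration
  }
  for (std::thread& t : workers) {
    t.join();  // wait for every iteration to finish
  }
}
```

For example, `par_loop([&](int i) { array[i] = i; }, 0, 10);` fills a 10-element `array` with 0 through 9. Each iteration writes a different element, so the threads need no synchronization.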

Essentially, the template arguments (inside the `<>`) specify how to handle the
arguments to the parallel function and what kind of parallelism you want.
Options for `looptype` (i.e., the kind of parallelism) are:

- `mt_loop_traits::Par` - fork all iterations, then wait for them to finish
- `mt_loop_traits::ParNoJoin` - same as `Par`, but without the waiting
- `mt_loop_traits::Future` - a resource-constrained version of `Par` that limits
  the number of threads running at a given time
- `mt_loop_traits::FutureNoJoin` - same as `Future`, but without waiting for
  threads to finish
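
`Future` differs from `Par` mainly in bounding concurrency. One hypothetical way to sketch that bound in standard C++ uses a fixed set of workers that claim iteration numbers from a shared atomic counter, so no more than `max_threads` iterations run at once. The name `future_loop` and the `max_threads` parameter are assumptions, not futurelib's API.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical future_loop: a sketch of mt_loop_traits::Future semantics.
// Only max_threads workers exist, so at most max_threads iterations run at
// once; each worker claims the next iteration from a shared atomic counter.
void future_loop(const std::function<void(int)>& f, int start, int stop,
                 int step, unsigned max_threads) {
  assert(step > 0 && max_threads > 0);
  std::atomic<int> next{start};
  auto worker = [&] {
    for (int i = next.fetch_add(step); i < stop; i = next.fetch_add(step)) {
      f(i);
    }
  };
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < max_threads; ++t) pool.emplace_back(worker);
  for (std::thread& th : pool) th.join();  // wait for completion, as Future does
}
```

Handing out iterations dynamically, rather than splitting the range up front, keeps workers busy when some iterations take longer than others.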

The argtypelist is a list of conceptual types defining how the arguments to the
parallel function will be handled. Use one conceptual type per argument, in the
order the arguments will be passed. Valid conceptual types are:

- `Iterator` - the parallel function will be called with the current loop
  iteration number passed into this argument.
- `ArrayPtr` - the corresponding argument is a pointer to an array, and each
  iteration will be passed the value of `array[iteration]`.
- `Ref` - the corresponding argument will be passed as a reference.
- `Val` - the corresponding argument will be passed as a constant value
  (i.e., the same value will be passed to all iterations).

For example, this loop:
```
for (int i = 0; i < 10; i++) {
  array[i] = i;
}
```
would be written like so:
```
void assign(int &array_value, const int i) {
  array_value = i;
}

// Arguments: array for ArrayPtr, 0 for the Iterator slot, then start 0 and stop 10
mt_loop<ArrayPtr, Iterator, mt_loop_traits::Par>
  (assign, array, 0, 0, 10);
```

To configure and build from source you can run (in the source directory):

The `mt_loop_returns` variant adds the specification of what to do with the
return values. The pattern is like this:
```
mt_loop_returns<returnvaltype, ...argtypelist..., looptype>
  (retval, function, ...args..., start, stop, step);
```
The only difference is in the returnvaltype and the retval. The returnvaltype
can be either an `ArrayPtr` or a `Collect`. If it is an `ArrayPtr`, the loop
behaves like the following loop, except that the iterations run in parallel:
```
for (int i = start; i < stop; i += step) {
  retval[i] = function(args);
}
```
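
Here is a plain-C++ sketch of the `ArrayPtr` return behavior. It assumes each iteration receives only its iteration number. Every iteration writes its result into its own slot of `retval`, so the threads never touch the same element and no locking is needed. `loop_returns_array` is a hypothetical name, not part of futurelib.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch of mt_loop_returns with an ArrayPtr return type:
// iteration i stores function(i) in retval[i]. Each thread writes a distinct
// element, so no locking is needed.
void loop_returns_array(std::vector<long>& retval,
                        const std::function<long(int)>& function,
                        int start, int stop, int step = 1) {
  assert(step > 0 && start >= 0 && start <= stop);
  assert(static_cast<std::size_t>(stop) <= retval.size());
  std::vector<std::thread> workers;
  for (int i = start; i < stop; i += step) {
    workers.emplace_back([&retval, &function, i] { retval[i] = function(i); });
  }
  for (std::thread& t : workers) t.join();  // wait for all iterations
}
```

Iterations skipped by `step` leave their slots of `retval` untouched.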

Each return value will be stored in a separate entry in the retval array. The
`Collect` type is more interesting, and can be one of:

- `Collect<mt_loop_traits::Add>` - sums all of the return values in parallel
- `Collect<mt_loop_traits::Sub>` - subtracts all of the return values in
  parallel. Subtraction is not associative, so the result depends on the order
  in which partial results are combined; the answer may be nondeterministic.
- `Collect<mt_loop_traits::Mult>` - multiplies all of the return values in parallel
- `Collect<mt_loop_traits::Div>` - divides all of the return values in parallel.
  Division is not associative either, so the answer may be nondeterministic.
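
A common way to implement a parallel `Add` reduction is to give each worker a private partial sum and add the partials at the end. The sketch below shows this in standard C++; it is not necessarily how futurelib does it, and `collect_add` is a hypothetical name. Regrouping the terms this way is only safe because addition is associative and commutative. Applying the same trick to subtraction or division would change the answer depending on how the iterations are grouped.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical collect_add: a sketch of Collect<mt_loop_traits::Add>.
// Iterations are dealt round-robin to num_threads workers; each worker keeps
// a private partial sum, and the partials are added once all workers finish.
long collect_add(const std::function<long(int)>& function, int start, int stop,
                 int step, unsigned num_threads) {
  assert(step > 0 && num_threads > 0);
  std::vector<long> partial(num_threads, 0);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Worker t handles iterations start + (t + k * num_threads) * step.
      for (long i = start + static_cast<long>(t) * step; i < stop;
           i += static_cast<long>(num_threads) * step) {
        partial[t] += function(static_cast<int>(i));
      }
    });
  }
  for (std::thread& th : workers) th.join();
  return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Because each worker writes only its own `partial[t]`, the workers need no locking while the loop runs.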

For example, `Collect<mt_loop_traits::Add>` is roughly equivalent to the following loop:
```
for (int i = start; i < stop; i += step) {
  retval += function(args);
}
```

```
./autogen.sh # not necessary if you're building from a release tarball instead of directly from the GitHub repository
./configure
make -j
```