Merge pull request #287 from insertinterestingnamehere/config
Purge Various Unused Config Options
insertinterestingnamehere authored Sep 27, 2024
2 parents da06b06 + 9dcca1a commit 6495403
Showing 123 changed files with 446 additions and 11,751 deletions.
16 changes: 8 additions & 8 deletions .circleci/config.yml
@@ -40,8 +40,8 @@ jobs:
 image: ubuntu-2404:edge
 resource_class: arm.medium
 environment:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 steps:
 - checkout
 - run: |
@@ -50,8 +50,8 @@ jobs:
 sudo apt-get install -y autoconf automake libtool
 sudo apt-get install -y hwloc libhwloc-dev
 wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
-sudo apt-add-repository -y 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main'
-sudo apt-get install -y clang-18
+sudo apt-add-repository -y 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main'
+sudo apt-get install -y clang-19
 - run: |
 ./autogen.sh
 ./configure --enable-picky --with-scheduler=<< parameters.scheduler >> -with-topology=<< parameters.topology >>
@@ -73,8 +73,8 @@ jobs:
 image: ubuntu-2404:edge
 resource_class: arm.medium
 environment:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 CFLAGS: "-fsanitize=<< parameters.sanitizer >> -fno-sanitize-recover=all"
 CXXFLAGS: "-fsanitize=<< parameters.sanitizer >> -fno-sanitize-recover=all"
 LDFLAGS: "-fsanitize=<< parameters.sanitizer >> -fno-sanitize-recover=all"
@@ -88,8 +88,8 @@ jobs:
 sudo apt-get install -y autoconf automake libtool
 sudo apt-get install -y hwloc libhwloc-dev
 wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
-sudo apt-add-repository -y 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main'
-sudo apt-get install -y clang-18
+sudo apt-add-repository -y 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main'
+sudo apt-get install -y clang-19
 - run: |
 ./autogen.sh
 ./configure --enable-picky --with-scheduler=<< parameters.scheduler >> -with-topology=<< parameters.topology >>
30 changes: 15 additions & 15 deletions .cirrus.yml
@@ -120,33 +120,33 @@ arm_linux_clang_task:
 timeout_in: 5m
 matrix:
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 QTHREADS_SCHEDULER: nemesis
 QTHREADS_TOPOLOGY: no
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 QTHREADS_SCHEDULER: nemesis
 QTHREADS_TOPOLOGY: hwloc
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 QTHREADS_SCHEDULER: sherwood
 QTHREADS_TOPOLOGY: no
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 QTHREADS_SCHEDULER: sherwood
 QTHREADS_TOPOLOGY: hwloc
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 QTHREADS_SCHEDULER: distrib
 QTHREADS_TOPOLOGY: no
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 QTHREADS_SCHEDULER: distrib
 QTHREADS_TOPOLOGY: hwloc
 install_deps_script: |
@@ -157,9 +157,9 @@ arm_linux_clang_task:
 gpg --no-default-keyring --keyring ./tmp.gpg --export --output llvm-snapshot.gpg
 rm tmp.gpg
 cp llvm-snapshot.gpg /etc/apt/trusted.gpg.d/llvm-snapshot.gpg # This is for CI so no need to do something more complicated to restrict key use to a specific repo.
-apt-add-repository -y 'deb https://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-18 main'
-apt-add-repository -y 'deb https://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-18 main' # Something's buggy upstream but running this twice fixes it.
-apt-get install -y clang-18
+apt-add-repository -y 'deb https://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-19 main'
+apt-add-repository -y 'deb https://apt.llvm.org/bookworm/ llvm-toolchain-bookworm-19 main' # Something's buggy upstream but running this twice fixes it.
+apt-get install -y clang-19
 apt-get install -y autoconf automake libtool
 apt-get install -y hwloc libhwloc-dev
 build_script: |
2 changes: 2 additions & 0 deletions .git-blame-ignore-revs
@@ -0,0 +1,2 @@
+6901dc07127f54c060ec4046e21d05ccd7f437ab
+3ddc9da40f8b34565c90d17ef83a9ef95a9deb18
24 changes: 13 additions & 11 deletions .github/workflows/CI.yml
@@ -39,7 +39,7 @@ jobs:
 continue-on-error: true
 strategy:
 matrix:
-clang_version: [11, 12, 13, 14, 15, 16, 17]
+clang_version: [11, 12, 13, 14, 15, 16, 17, 18]
 scheduler: [nemesis, sherwood, distrib]
 topology: [hwloc, binders, no]
 include:
@@ -57,6 +57,8 @@ jobs:
 gcc_version: 13
 - clang_version: 17
 gcc_version: 13
+- clang_version: 18
+gcc_version: 13
 env:
 CC: clang-${{ matrix.clang_version }}
 CXX: clang++-${{ matrix.clang_version }}
@@ -249,8 +251,8 @@ jobs:
 topology: [hwloc, binders, no]
 use_libcxx: [false] # disable testing on libcxx since its effect seems very limited for now.
 env:
-CC: clang-18
-CXX: clang++-18
+CC: clang-19
+CXX: clang++-19
 CFLAGS: "-fsanitize=${{ matrix.sanitizer }} -fno-sanitize-recover=all"
 CXXFLAGS: ${{ matrix.use_libcxx && format('-stdlib=libc++ -fsanitize={0} -fno-sanitize-recover=all', matrix.sanitizer) || format('-fsanitize={0} -fno-sanitize-recover=all', matrix.sanitizer) }}
 LDFLAGS: "-fsanitize=${{ matrix.sanitizer }} -fno-sanitize-recover=all"
@@ -265,10 +267,10 @@ jobs:
 - name: install compiler
 run: |
 wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add - && break || sleep 1
-sudo apt-add-repository 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main' && break || sleep 1
-sudo apt-get install clang-18
+sudo apt-add-repository 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main' && break || sleep 1
+sudo apt-get install clang-19
 - if: ${{ matrix.use_libcxx }}
-run: sudo apt-get install libc++-18-dev libc++abi-18-dev
+run: sudo apt-get install libc++-19-dev libc++abi-19-dev
 - if: ${{ matrix.topology != 'no' }}
 run: |
 sudo apt-get install hwloc libhwloc-dev
@@ -297,8 +299,8 @@ jobs:
 - compiler: gcc
 use_libcxx: true
 env:
-CC: ${{ matrix.compiler == 'gcc' && 'gcc-14' || 'clang-18' }}
-CXX: ${{ matrix.compiler == 'gcc' && 'g++-14' || 'clang++-18' }}
+CC: ${{ matrix.compiler == 'gcc' && 'gcc-14' || 'clang-19' }}
+CXX: ${{ matrix.compiler == 'gcc' && 'g++-14' || 'clang++-19' }}
 CXXFLAGS: ${{ matrix.use_libcxx && '-stdlib=libc++' || '' }}
 QTHREADS_ENABLE_ASSERTS: ${{ matrix.use_asserts && '--enable-asserts' || '' }}
 steps:
@@ -309,10 +311,10 @@ jobs:
 - if: ${{ matrix.compiler == 'clang' }}
 run: |
 wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add - && break || sleep 1
-sudo apt-add-repository 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main' && break || sleep 1
-sudo apt-get install clang-18
+sudo apt-add-repository 'deb https://apt.llvm.org/jammy/ llvm-toolchain-jammy-19 main' && break || sleep 1
+sudo apt-get install clang-19
 - if: ${{ matrix.use_libcxx }}
-run: sudo apt-get install libc++-18-dev libc++abi-18-dev
+run: sudo apt-get install libc++-19-dev libc++abi-19-dev
 - if: ${{ matrix.topology != 'no' }}
 run: |
 sudo apt-get install hwloc libhwloc-dev
10 changes: 1 addition & 9 deletions README.affinity
@@ -14,15 +14,7 @@ shepherds are allocated. It is implemented as a colon-seperated list of
 cpustrings, where each cpustring is a shepherd. It is controlled by the
 environment variable QT_CPUBIND. See src/affinity/README.binders for more details.

-Failing that, other APIs are also supported, but since they provide less
-information, they're used primarily for querying the number of cores and
-pinning worker threads to them in a round-robin fashion (libnuma also supports
-pinning memory).
-
-Unless such an affinity library is available, worker threads are NOT pinned,
-especially on Linux. Unfortunately, the usual sched_setaffinity() function that
-Linux provides is not portable, and the differences in arguments across
-different versions of Linux are difficult to detect.
+Unless hwloc is available, worker threads are NOT pinned.

 There are several environment variables that can be used to control CPU
 affinity and parallelism. See the qthread_initialize() man page for details.
159 changes: 41 additions & 118 deletions README.md
@@ -1,137 +1,60 @@
-[![Build Status](https://travis-ci.org/Qthreads/qthreads.svg?branch=master)](https://travis-ci.org/Qthreads/qthreads)
+QTHREADS
+========

-# WELCOME TO THE NEW HOME OF QTHREADS:
-# https://github.com/sandialabs/qthreads
+The Qthreads API is designed to make using large numbers of threads convenient and easy.
+The Qthreads API also provides access to full/empty-bit (FEB) semantics,
+where every word of memory can be marked either full or empty,
+and a thread can wait for any word to attain either state.

-QTHREADS!
-=========
+Qthreads is essentially a library for spawning and controlling stackful coroutines:
+threads with small (4-8k) stacks.
+The exposed user API resembles OS threads,
+however the threads are entirely in user-space and use their locked/unlocked status as part of their scheduling.

-The qthreads API is designed to make using large numbers of threads convenient
-and easy. The API maps well to both MTA-style threading and PIM-style
-threading, and is still quite useful in a standard SMP context. The qthreads
-API also provides access to full/empty-bit (FEB) semantics, where every word of
-memory can be marked either full or empty, and a thread can wait for any word
-to attain either state.
+The library's metaphor is that there are many Qthreads and several "shepherds".
+Shepherds generally map to specific processors or memory regions,
+but this is not an explicit part of the API.
+Qthreads are assigned to specific shepherds and are only allowed to migrate
+when running on a scheduler that supports work stealing
+or when migration is explicitly triggered via user APIs.

-The qthreads library on an SMP is essentially a library for spawning and
-controlling coroutines: threads with small (4-8k) stacks. The threads are
-entirely in user-space and use their locked/unlocked status as part of their
-scheduling.
-
-The library's metaphor is that there are many qthreads and several "shepherds".
-Shepherds generally map to specific processors or memory regions, but this is
-not an explicit part of the API. Qthreads are assigned to specific shepherds
-and do not generally migrate.
-
-The API includes utility functions for making threaded loops, sorting, and
-similar operations convenient.
+The API includes utility functions for making threaded loops, sorting, and similar operations convenient.

 ## Collaboration

-Need help or interested in finding out more? Join us on our Slack channel: https://join.slack.com/t/qthreads/signup
+Need help or interested in finding out more? Join us on our Slack channel: https://join.slack.com/t/Qthreads/signup

-## Performance
+## Compatibility

-On a machine with approximately 2GB of RAM, this library was able to spawn and
-handle 350,000 qthreads. With some modifications (mostly in stack-size), it was
-able to handle 1,000,000 qthreads. It may be able to do more, but swapping will
-become an issue, and you may start to run out of address space.
+Millions of Qthreads should run fine even on a machine with a modest amount of RAM.
+Generally the primary limit to the number of threads that can be spawned is memory use.

-This library has been tested, and runs well, on a 64-bit machine. It is
-occasionally tested on 32-bit machines, and has even been tested under Cygwin.
+This library has been tested, and runs well, on 64-bit ARM and X-86 machines.
+32-bit versions of those architectures as well as PowerPC-based architectures may also work.

-Currently, the only real limiting factor on the number of threads is the amount
-of memory and address space you have available. For more than 2^32 threads, the
-thread_id value will need to be made larger (or eliminated, as it is not
-*required* for correct operation by the library itself).
+This library is compatible with most Linux variants as well as OSX.
+There is some preliminary support for BSD operating systems.
+Windows is not currently supported.

-For information on how to use qthread or qalloc, there is A LOT of information
-in the header files (qthread.h and qalloc.h), but the primary documentation is
-man pages.
+## Building Qthreads

-## FUTURELIB DOCUMENTATION (the 10-minute version)
+Qthreads currently relies on autotools, so automake, autoconf, and libtool are required for building from source.
+Hwloc is also highly recommended.

-The most important functions in futurelib that a person is going to use are
-mt_loop and mt_loop_returns. The mt_loop function is for parallel iterations
-that do not return values, and the mt_loop_returns function is for parallel
-iterations that DO return values. The distinction is not always so obvious.
+The following compilers are supported and tested regularly:
+- gcc 9 or later
+- clang 11 or later
+- icc (last supported release)
+- icx 2023 or later
+- aocc 4.2 or later
+- acfl 24.04
+- Apple clang 15.4 or later

-`mt_loop` is used in a format like so:
-```
-mt_loop<...argtypelist..., looptype>
-(function, ...arglist..., startval, stopval, stepval);
-```
-The "stepval" is optional, and defaults to 1.
-
-Essentially what you're doing is in the template setup (in the <>) you're
-specifying how to handle the arguments to the parallel functions and what kind
-of parallelism you want. Options for 'looptype' (i.e. the kind of parallelism)
-are:
-
-`mt_loop_traits::Par` - fork all iterations, wait for them to finish
-`mt_loop_traits::ParNoJoin` - same as Par, but without the waiting
-`mt_loop_traits::Future` - a resource-constrained version of par, will limit
-the number of threads running at a given time
-`mt_loop_traits::FutureNoJoin` - same as Future, but without waiting for
-threads to finish
-
-The argtypelist is a list of conceptual types defining how the arguments to the
-parallel function will be handled. Use one conceptual type per argument, in the
-order the arguments will be passed. Valid conceptual types are:
-
-Iterator - The parallel function will be called with the current loop
-iteration number passed into this argument.
-ArrayPtr - The corresponding argument is a pointer to an array, and each
-iteration will be passed the value of array[iteration]
-Ref - The corresponding argument will be passed as a reference.
-Val - The corresponding argument will be passed as a constant value
-(i.e. the same value will be passed to all iterations)
-
-For example, doing this:
-```
-for (int i = 0; i < 10; i++) {
-array[i] = i;
-}
-```
-Would be achieved like so:
-```
-void assign(int &array_value, const int i) {
-array_value = i;
-}
+To configure and build from source you can run (in the source directory):

-mt_loop<ArrayPtr, Iterator, mt_loop_traits::Par>
-(assign, array, 0, 0, 10);
-```
-The `mt_loop_returns` variant adds the specification of what to do with the
-return values. The pattern is like this:
-```
-mt_loop_returns<returnvaltype, ...argtypelist..., looptype>
-(retval, function, ...args..., start, stop, step);
-```
-The only difference is in the returnvaltype and the retval. The returnvaltype
-can be either an ArrayPtr or a Collect. If it is an ArrayPtr, the loop will
-behave similar to the following loop:
-```
-for (int i = start; i < stop; i += step) {
-retval[i] = function(args);
-}
-```
-Each return value will be stored in a separate entry in the retval array. The
-Collect type is more interesting, and can be either:
-
-`Collect<mt_loop_traits::Add>` - this sums all of the return values in
-parallel
-`Collect<mt_loop_traits::Sub>` - this subtracts all of the return values in
-parallel. Note that the answer may be nondeterministic.
-`Collect<mt_loop_traits::Mult>` - this multiplies all of the
-return values in parallel
-`Collect<mt_loop_traits::Div>` - this divides all of the
-return values in parallel. Note that the answer is nondeterministic.
-
-For example, `Collect<mt_loop_traits::Add>` is rougly equivalent to the following loop:
 ```
-for (int i = start; i < stop; i += step) {
-retval += function(args);
-}
+./autogen.sh # not necessary if you're building from a release tarball instead of directly form the github repository
+./configure
+make -j
 ```
