Support for RISC-V Vector Extension #3

Open: mschlaegl wants to merge 2 commits into main
Conversation

@mschlaegl

(corresponds to
https://sourceforge.net/p/png-mng/mailman/message/37402426/ on the
png-mng-implement mailing list and to pull request pnggroup/libpng#405
on glennrp/libpng)

Hello,

The long-awaited RISC-V Vector Extension (RVV) is finally in
ratification.[1] This pull request includes two changes:

  • INSTALL: added missing contrib content
    • Contains a simple fix in INSTALL but is necessary for
      applying the main patch
  • libpng: Add support for RISC-V Vector Extension
    • Support for RISC-V Vector Extension

It supports RVV versions: 0.7.1, 0.8, 0.9, 0.10 and 1.0.
Development and performance evaluation of the filter type
implementations (up, sub, avg and paeth) was done on an Allwinner D1
with RVV 0.7.1 using RVVRadar. Details can be found in an article
published on GitHub.[2]

The integration in libpng was done very similarly to that for other
architectures (ARM NEON, MIPS MSA, ...): there is a configure switch
"riscv-vector" which can be set to no/off, check, api, or yes/on, and
there is also run-time checking code (as for ARM NEON, MIPS MSA, ...).
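
For illustration, a minimal, self-contained sketch of that run-time
dispatch pattern, modelled on the existing NEON/MSA hooks. Every
identifier below (png_filters, png_have_rvv, PNG_RISCV_VECTOR_CHECK,
...) is a hypothetical stand-in, not the patch's actual API:

    /* Sketch only: with "check", the vector filters are installed only when
     * the CPU reports RVV at run time; otherwise the scalar C filters stay. */
    typedef struct
    {
       void (*filter_up)(unsigned char *row, const unsigned char *prev,
                         unsigned int n);
       /* ... likewise for sub, avg and paeth ... */
    } png_filters;

    extern int  png_have_rvv(void);                   /* run-time probe     */
    extern void png_filters_init_c(png_filters *f);   /* scalar fallback    */
    extern void png_filters_init_rvv(png_filters *f); /* RVV filter kernels */

    void
    png_filters_init(png_filters *f)
    {
    #ifdef PNG_RISCV_VECTOR_CHECK   /* i.e. configured with "check" */
       if (png_have_rvv())
       {
          png_filters_init_rvv(f);
          return;
       }
    #endif
       png_filters_init_c(f);       /* "off"/"api", or no RVV at run time */
    }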

More important are the differences:

  1. We introduce a separate configure switch "riscv-vector-compat".
  2. We use inline assembler instead of compiler intrinsics.

Both are done for the same reason: to support multiple RVV versions.

As mentioned above, RVV 1.0 is in ratification now, but there are
earlier draft versions which are already implemented in real, existing
hardware. One example is the Allwinner D1 with RVV 0.7.1, which was
used for performance evaluation and optimization of these filter
implementations.[2] We therefore decided to support the following RVV
versions: 0.7.1, 0.8, 0.9, 0.10 and 1.0.

Unfortunately, there is no easy way to detect the RVV version
supported by the toolchain. It was therefore necessary to add an
additional configure switch "riscv-vector-compat" with the following
behavior:

  • not set/off (default): RVV release 1.0
  • set to "0.10": RVV draft 0.10
  • set to "0.9": RVV draft 0.9
  • set to "0.8": RVV draft 0.8
  • set to "0.7.1": RVV draft 0.7.1

Furthermore, compiler intrinsics were not available for all supported
RVV versions. For this reason we had to use inline assembler instead
of compiler intrinsics.
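
As an illustration of the mnemonic-replacement approach described
here, a minimal sketch of a byte-copy kernel that targets either RVV
0.7.1 or RVV 1.0 from the same C source. The PNG_RVV_COMPAT_0_7_1
macro and the function are hypothetical stand-ins (the patch's actual
names may differ), and whether a toolchain accepts "v0" in the clobber
list depends on its level of RVV support:

    #include <stddef.h>

    #ifdef PNG_RVV_COMPAT_0_7_1
    /* RVV 0.7.1: element width comes from vtype; no tail/mask policy bits. */
    #  define RVV_SETVLI "vsetvli %0, %1, e8, m1\n\t"
    #  define RVV_LOAD   "vle.v v0, (%2)\n\t"
    #  define RVV_STORE  "vse.v v0, (%3)\n\t"
    #else
    /* RVV 1.0: width-encoded mnemonics and explicit tail/mask policy. */
    #  define RVV_SETVLI "vsetvli %0, %1, e8, m1, ta, ma\n\t"
    #  define RVV_LOAD   "vle8.v v0, (%2)\n\t"
    #  define RVV_STORE  "vse8.v v0, (%3)\n\t"
    #endif

    static void
    rvv_copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
    {
       while (n > 0)
       {
          size_t vl; /* elements handled this iteration, set by vsetvli */

          __asm__ volatile (RVV_SETVLI RVV_LOAD RVV_STORE
                            : "=&r" (vl)
                            : "r" (n), "r" (src), "r" (dst)
                            : "v0", "memory");
          dst += vl;
          src += vl;
          n   -= vl;
       }
    }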

The filter implementations for all supported RVV versions were tested
in the following setups:

 RVV     | Branch of              | Test system
 Version | riscv-gnu-toolchain[3] |
---------------------------------------------------------------------
 1.0     | "basic-rvv"            | qemu_sifive[4]
 0.10    | "rvv-intrinsic"        | qemu_sifive[4]
 0.9     | "rvv-0.9.x"            | Allwinner D1 (binary compatible)
 0.8     | "rvv-0.8.x"            | Allwinner D1 (binary compatible)
 0.7.1   | "rvv-0.7.1"            | Allwinner D1 (RVV 0.7.1)

References:

[1] https://riscv.org/announcements/2021/12/riscv-ratifies-15-new-specifications/
[2] https://github.com/mschlaegl/libpng_rvv-doc/blob/main/README.md
[3] https://github.com/riscv-collab/riscv-gnu-toolchain
[4] https://github.com/sifive/qemu/tree/rvv-1.0-upstream-v10

@ioquatix (Collaborator) commented Jul 27, 2022

This generally looks good to me. I have some questions.

What kind of deprecation strategy, if any, should we employ for the older RVV implementations? Is it logical to support the draft RVV implementations, i.e. what's the expected usage? Since there is a maintenance overhead, let's try to focus on the most value for the least effort.

Regarding run-time detection, I'd prefer that we make it as easy as possible (automatic) for users to get the best performance. I recently merged a change to do this for NEON, since NEON now appears to be standardised (compared to when it was first introduced several years ago). How much configuration do we need, and can we make the default "just work"?

Is there any way we can introduce CI testing for this architecture? This repo doesn't have any CI set up yet; I plan to do it over the next few days, but the number of permutations needed to test every configuration might be extensive.

@mschlaegl (Author)

Thanks!

The general motivation for libpng_rvv is to somehow push RVV in FOSS by showing its benefits. RVV is generally still very young. Support for it is not yet included in vanilla toolchains or kernels, but there are already patches waiting to be integrated. It seems that every project is waiting for another to integrate it. So the idea was to start somewhere where it brings benefits. :-)

Regarding your questions:

  • RVV Drafts/Maintenance: Short answer: supporting multiple RVV drafts is really simple and practically very useful in this case (= a low-hanging fruit ;-) ). In more detail:

    • To the best of my knowledge, there is no real hardware implementing RVV 1.0 yet, but there is hardware implementing older drafts. Such hardware is spreading more and more -- for example the (very affordable) Allwinner D1 (RVV 0.7.1). So to show the benefit of RVV we should support such targets.
    • For the algorithms implemented in libpng it is very easy to provide compatibility with different drafts, because it can be done by simple assembler mnemonic replacement. This is much simpler than, for example, handling different MSA versions in mips/filter_msa_intrinsics.c.
  • Runtime detection:
    I did some research on this and implemented the extension to behave very similarly to the other, similar extensions (assuming there is a reason why it is done this way).

    • Regarding Neon (only FYI): As far as I remember from my research, it was done this way for two reasons:
      1. Autodetection from /proc/cpuinfo was not reliable (this may no longer be true).
      2. Autodetection is only implemented for specific targets (Linux, Android), so a build with default settings for other targets may break (but I don't know if this is still such a big issue).
    • Regarding RVV: The behavior is as follows (from my tests; <VALUE> stands for --enable-riscv-vector=<VALUE>):
      * without v in march (e.g. CFLAGS=... -march=rv64g)
         * without --enable-riscv-vector -> ok (no riscv code included)
         * <VALUE>=off -> ok (no riscv code included)
         * <VALUE>=on -> ok (riscv intrinsics included -> build fails as expected)
         * <VALUE>=api -> ok (no riscv code included)
         * <VALUE>=check -> ok (no riscv code included)
      * with v in march (e.g. CFLAGS=... -march=rv64gv)
         * without --enable-riscv-vector -> ok (riscv intrinsics not included -> build ok)
         * <VALUE>=off -> ok (riscv code not included)
         * <VALUE>=on -> ok (riscv intrinsics included -> build ok)
         * <VALUE>=api -> ok (riscv intrinsics included -> build ok) -> autodetection at build time works as expected!
         * <VALUE>=check -> ok (riscv intrinsics and runtime check included -> build ok) -> autodetection at build time works as expected!
      
      If we would like behavior similar to your neon change, we could simply switch the default from "off" to "check".
      BUT: There is still the issue that neither /proc/cpuinfo nor the toolchain provides any information on which RVV version is available (though I expect this to change in the future), so the user has to select the RVV version anyway. A minimal presence-only probe is sketched after this list.
      In the worst case such a change would therefore lead to a default build containing invalid instructions.
      In general, I like your approach of activating optimizations by default (it would also eliminate things like this: buildroot libpng.mk L19 ;-) ), but for now I would prefer to keep it the way it is: expert users who know their setup regarding RVV activate the support manually.
      I plan to follow this up more closely in the future anyway. As soon as something changes, I will make a change proposal. Would that be acceptable to you?
  • CI: Not yet, but I've thought about this too, since I had to do all my tests manually. It's a good idea to automate things.
    Do you have any concrete plans for this?
    To test RVV, we would need at least an RVV-capable RISC-V toolchain to test the above configurations, e.g. by statically checking for RVV instructions and the auto-detection routine in the resulting .so.
    For runtime tests we would need a running virtual environment (e.g. qemu with buildroot), and we would need this for every supported architecture... I think this is overkill. What do you think?
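
A minimal sketch of what such a presence-only probe could look like on
Linux, under the assumption that the kernel exposes single-letter ISA
extensions as bits of AT_HWCAP (bit 'V' - 'A' for the vector
extension); png_have_rvv is a hypothetical name, and note that this
answers only whether V is present at all, not which draft version,
which is exactly the gap described above:

    #if defined(__riscv) && defined(__linux__)
    #include <sys/auxv.h>

    int
    png_have_rvv(void)
    {
       /* RISC-V Linux reports single-letter ISA extensions in AT_HWCAP. */
       return (getauxval(AT_HWCAP) & (1UL << ('V' - 'A'))) != 0;
    }
    #else
    int
    png_have_rvv(void)
    {
       return 0; /* no portable probe; fall back to the scalar filters */
    }
    #endif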

(Finally, in case it's not clear yet, I of course plan to continue taking care of the RVV part of libpng in the future.)

@mschlaegl (Author) commented Jul 27, 2022

Update/Correction: I took a closer look at 1fd891b (Add compile-time detection of neon instruction set.).
The default was changed to "detect", not "check" as I assumed above.
Still, I would keep "off" as the default for RVV, at least for now (see the argumentation above).

BTW: Your neon change only considers cmake; there are no corresponding changes in the rest of the build system, e.g. autotools (configure.ac, ...). Or am I missing something?

@ioquatix (Collaborator) commented Jul 27, 2022

> BTW: Your neon change only considers cmake; there are no corresponding changes in the rest of the build system, e.g. autotools (configure.ac, ...). Or am I missing something?

I need to think about this a bit more; the immediate need was cmake.

I don't think a runtime check is a good option: it's slow, it can be hard to implement, and it's an unreliable source of truth.

I would like to think about it from the user's point of view.

Most users want a default for (1) a development build (optimisations off) and (2) a deployment build (optimisations on). Most users don't care about the specific optimisations being performed, but usually there are some flags to control the specifics. For a "library", I think we want a convenient default for both (1) and (2) above.

Neon is established enough that handling (1) and (2) above should be fairly straightforward. By the way, (2) shouldn't mean "turn on all the optimisations", just as -O3 doesn't turn on all optimisations.

For people building the library, for a given platform on a given OS, we should be able to provide reasonably good defaults for a high-performance libpng.

Regarding your changes:

@@ -163,6 +163,30 @@ if(CMAKE_SYSTEM_PROCESSOR MATCHES "mipsel*" OR
endif()
endif()

# Set definitions and sources for RISC-V.
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^riscv*")
set(PNG_RISCV_VECTOR_POSSIBLE_VALUES check on off)
@ioquatix (Collaborator):

I think adding a detect option here makes sense: detect the best approach for the current compiler, and make it the default. If that's not possible, I accept that we can't do it; but without it, this won't be easy for users.

@jbowler commented Dec 29, 2023

It's not clear why this hasn't been merged. RISC-V is an important development in processor architecture, and failing to support it is a significant failure of libpng; why is that? This issue has been sitting in the queue for 18 months now. Yes, I understand code correctness is important, but honestly, why is this taking so long?

All three (or more) build systems are there and are used, so they have to be supported!

@ioquatix (Collaborator)

@jbowler do you have time to review this PR?

@jbowler left a review:

This is a source-code only review; I have not even done a "gh pr".

This is ONLY a review of the build system changes and is based on my understanding of glennrp/libpng, not libpng. As such the changes seem correct and complete (or at least sufficiently complete). Assuming existing CI tests pass (i.e. other architectures), the changes should be safe insofar as they won't break anything other than RISC-V itself.

The PR includes generated files (configure, Makefile.in). These should not be included; the maintainer regenerates them.

The PR does not include changes to scripts/pnglibconf.dfa. These should be provided to allow the hardware support to be disabled at compile (build/configure) time via pngusr.dfa or DFA_XTRA.

riscv/filter_vector_intrinsics.c includes <stdio.h> and <stdint.h>. <stdint.h> is not an ANSI-C header, so it must not be there (see the recent change to the MIPS files), and <stdio.h> is not available on standalone (unhosted) systems. Both seem to be spurious.

Reading /proc/cpuinfo sucks; it's not "widely available": the whole body of code is going to be #ifdef-disabled on Android (or maybe it will crash; that is what used to happen), MacOS, all the BSDs, all legacy Unix systems, etc.

Apparently the code won't work at all on most operating systems and, worse, it looks like the commit itself will cause compilation of libpng to fail on a RISC-V BSD. If my analysis is correct, this is a MUST-fix.

Bottom line: a commit should not break systems which currently build, and every system must build without additional configuration. So hardware optimizations must build correctly on every system where they default to on.

EDIT: stuff might build if a Linux guard is included in pngpriv.h, but I'm not sure whether this would be acceptable to the maintainer, and it's still the case that the "option" setting would cause a #error if I'm right.
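
A hedged sketch of the kind of pngpriv.h guard being asked for,
modelled on the existing ARM NEON logic; the macro and file names are
placeholders, not the patch's actual ones:

    /* Run-time RVV checking is only wired up on OSes where it is actually
     * implemented; elsewhere, asking for it is an explicit, early error
     * instead of a broken build. */
    #if defined(PNG_RISCV_VECTOR_OPT) && defined(PNG_RISCV_VECTOR_CHECK_SUPPORTED)
    #  if defined(__linux__)
    #     define PNG_RISCV_VECTOR_FILE "riscv/linux.c" /* /proc-based check */
    #  else
    #     error "No run-time RVV check is implemented for this OS;" \
                " set riscv-vector to on, off or api instead"
    #  endif
    #endif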

@jbowler commented Dec 29, 2023

> @jbowler do you have time to review this PR?

There are a few things in there I didn't notice before, so I requested changes. I'm not sure about my analysis; automated RISC-V CI tests would help, as they would allow automatic testing of RISC-V builds on supported operating systems: Linux, BSD (I assume RISC-V supports both), Windows (if supported), MacOS (if supported). As it is, unless you are running qemu for check-in tests yourself, it's only possible to do a cross-build (which would detect the issues I identified).

@ioquatix (Collaborator) commented Dec 29, 2023

Does GitHub Actions CI support RISC-V? Is it easy to run CI in QEMU?

@jbowler commented Dec 29, 2023

> Does GitHub Actions CI support RISC-V?

From the link you found and from https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration it looks like RISC-V support is a work in progress, but "self-hosted" runners might be provided by the RISC-V guys pretty soon. @mschlaegl might care to comment.

I've never implemented any of this stuff; I've just seen what happens to my own pull requests when it is implemented, and it struck me as pretty good news from a maintainer point of view. Sometimes tests fail for reasons apparently unrelated to the pull request itself, but maintainers can ignore the failures if necessary (so far as I could see from the requestor viewpoint).

I've seen CI tests set up for a mixture of operating systems: Ubuntu, Windows (MinGW), MacOS. I don't know about the underlying architectures used; for example, I couldn't tell if MacOS was ARM or x86, and I didn't see documentation of the Ubuntu arch. Ubuntu is certainly supported on multiple architectures that have hardware optimizations in libpng: x86, ARM, ARM64, apparently riscv64 (not riscv32), possibly not MIPS.

> Is it easy to run CI in QEMU?

I've assumed that it is docker-like, so there is always an emulator there (it's not running directly on real hardware). QEMU is just relevant for local testing; docker would work too (I've run docker on my x86 NAS to host an ARM implementation of a database).

Automating this stuff, however, is pretty much essential; when I was doing libpng development I was also doing testing on a subset of the systems, and that was, effectively, a full-time job. At that time Glenn was still alive and he was full time too. I don't know how much Cosmin has automated, but it doesn't use the github "continuous integration" stuff.

@ioquatix (Collaborator)

I am strongly in support of CI; it's more a matter of how we can implement it for RISC-V.

@dkurt commented Jan 18, 2024

In case you find it useful: I was able to build a GitHub runner for RISC-V. Step commands, as well as actions which depend on Node.js 20.x, work fine (I tried actions/checkout and actions/upload-artifact): https://github.com/dkurt/github_actions_riscv
