Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't perform unaligned writes #133

Closed
wants to merge 1 commit into from
Closed

Conversation

jrtc27
Copy link
Contributor

@jrtc27 jrtc27 commented Jul 24, 2017

Some architectures, such as SPARC, trap on unaligned memory accesses, and poke makes no guarantees about its behaviour if the address is not sufficiently aligned. Therefore we should use a temporary buffer if asked to write to an unaligned address.

@bgamari
Copy link
Contributor

bgamari commented Jul 24, 2017

Must we always do this? It feel like this should be conditional on the platform.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Jul 24, 2017

Yes, some platforms will let you get away with it (at a potential performance cost, especially if the kernel has to emulate it, such as on HP PA-RISC). GHC itself has a list of good and bad architectures (bewareLoadStoreAlignment in compiler/cmm/PprC.hs); I don't suppose there's an easy way to get that information?

@bgamari
Copy link
Contributor

bgamari commented Jul 24, 2017

Indeed some platforms deal quite poorly with unaligned accesses; however, X86, which constitutes the majority of our users, is not one of these. AFAIK the penalty (assuming you don't cross a cache line) is only a cycle or two. However, this patch imposes an additional branch on all writes, aligned or otherwise. Mis-predicted branches are expensive yet sadly GHC doesn't currently give you the ability to ensure that it's predicted correctly. This is why I say we should make this conditional.

GHC itself has a list of good and bad architectures (bewareLoadStoreAlignment in compiler/cmm/PprC.hs); I don't suppose there's an easy way to get that information?

Hmm, sadly I don't there there is. However, maybe this is something that we want to expose; perhaps unalignedAccessesAllowed :: Bool flag in, for instance, GHC.Constants?

Naturally, this wouldn't help you today, but at least it's moving in the right direction. As for today, I think we'll just need to have some #ifdefs, sadly.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Jul 24, 2017

That might be sensible, but for now perhaps we could just run it through the preprocessor and use a bunch of #if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__) ...?

@bgamari
Copy link
Contributor

bgamari commented Jul 24, 2017

Yep, I think that is the right answer for now.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Jul 25, 2017

Updated to make use of the preprocessor. I'm not sure whether foo_HOST_ARCH is the right variable to be checking, as from the point of view of the code being compiled, it cares about the host arch, but if the variable is being set based on what the installed (or just-built) GHC was configured for, then it should really be the target arch, as that's what GHC is producing code for. The foo_TARGET_ARCH macros are supposedly deprecated though according to the header.

@bgamari
Copy link
Contributor

bgamari commented Jul 27, 2017

Indeed I'm a bit skeptical that HOST_ARCH is correct here. What header in particular are you referring to?

@bgamari
Copy link
Contributor

bgamari commented Jul 27, 2017

I see, ghcplatform.h. Yes, I'm very skeptical of all of this. At very least it seems like the HOST_ARCH macros are misleadingly named. I'm looking into it

Regarding the CI failure, this is rather confusing and perhaps concerning. I can't reproduce them locally and presumably you can't either (right?). I'll also need to look into this.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Jul 27, 2017

I can reproduce the issues, but they happen locally with master too. There are two problems:

  1. With QuickCheck 2.10+, the test suite fails to compile, as QuickCheck added its own Arbitrary instance for CChar, but bytestring defines it itself. Commenting bytestring's instance out gives:

  2. Failing tests; I haven't investigated them, but they're not caused by my changes.

@bgamari
Copy link
Contributor

bgamari commented Jul 28, 2017

Indeed, I have a commit fixing QuickCheck issue. I'm much more concerned about the failing tests. I think we should get to the bottom of this before merging anything else.

@bgamari
Copy link
Contributor

bgamari commented Aug 1, 2017

Alright, I believe the spurious failures should now be fixed. Sorry for the wait, @jrtc27. Do you think you could rebase this?

@jrtc27
Copy link
Contributor Author

jrtc27 commented Aug 1, 2017

All tests now passing :)

@jrtc27
Copy link
Contributor Author

jrtc27 commented Aug 16, 2017

Ping; is this blocked on the host/target macros issue, or mergeable as-is?

@bgamari
Copy link
Contributor

bgamari commented Aug 16, 2017

We can merge as-is as far as I'm concerned. @dcoutts?

@jrtc27
Copy link
Contributor Author

jrtc27 commented Sep 1, 2017

Ping :)

Copy link
Contributor

@dcoutts dcoutts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. This is ok in that it works, but the extra with allocation is going to be terrible for performance.

BTW, what about ARM platforms?

This is another example of why I want new ghc primops specifically for safe unaligned writes, since the compiler knows the platform and can probably do a better job at generating safe and reasonably good code for unaligned writes. For example some platforms have specific CPU instructions for unaligned reads & writes.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Sep 1, 2017

Old AArch32 (pre-v6) can do weird things on unaligned accesses (e.g. rotating the byte value by how many bytes you are off from the aligned-down address by), but I believe it works. In theory I think AArch64 can choose not to support unaligned accesses, but in practice I doubt that is enforced (and if the hardware doesn't support it, the kernel likely emulates it). Currently the unregisterised C backend assumes AArch32 and AArch64 cannot perform unaligned accesses, hence why I did not whitelist it here, but since there have been no bug reports about Cabal (which I saw reliably crashing due to this on SPARC with my experimental native code-gen) not working on these platforms, it's probably actually ok. I don't want to make that call and break it though as I'm no expert on Arm :)

@jrtc27
Copy link
Contributor Author

jrtc27 commented Sep 1, 2017

And yes, this isn't going to be great for performance, but unless you can think of a better solution (while GHC doesn't let you do a pokeUnaligned), it's the price you have to pay for correctness.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Oct 15, 2017

Ping; is there anything blocking this? Did you want me to whitelist AArch32 and/or AArch64?

@bgamari
Copy link
Contributor

bgamari commented Oct 16, 2017 via email

@Mistuke
Copy link

Mistuke commented Oct 16, 2017

Currently the unregisterised C backend assumes AArch32 and AArch64 cannot perform unaligned accesses.

yeah and it's wrong, I remember asking why this was made so but.. Anyways, for the ARM targets you can just check for the macro __ARM_FEATURE_UNALIGNED instead. This always returns true for AArch64 and for AArch32 will return the correct value for the given compiler options. It's part of ACLE so both GCC and CLANG and other compliant compilers should define it.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Oct 16, 2017

Updated; is that what you had in mind?

@Mistuke
Copy link

Mistuke commented Oct 16, 2017

Yeah, although (defined(arm_HOST_ARCH) || defined(aarch64_HOST_ARCH)) is not needed since only Arm targets define __ARM_FEATURE_UNALIGNED so the check can be simplified :).

@jrtc27
Copy link
Contributor Author

jrtc27 commented Oct 16, 2017

Well, you'd hope so, but who knows; I just did that to be extra safe! I can simplify it if you want though.

@Mistuke
Copy link

Mistuke commented Oct 16, 2017

I'd say if something goes through the trouble of emulating __ARM_FEATURE_UNALIGNED while not being an ARM target, I'd say it deserves to fail :). Anyway, doesn't really matter, just wanted to mention it.

@jrtc27
Copy link
Contributor Author

jrtc27 commented Oct 20, 2017

So is everyone happy with this now? If so, could it be merged please? :)

@hvr
Copy link
Member

hvr commented Oct 20, 2017

@jrtc27 one last thing... would you mind adding a Changelog entry to the 0.10.10 entry? then I'll happily merge this one.

Some architectures, such as SPARC, trap on unaligned memory accesses,
and poke makes no guarantees about its behaviour if the address is not
sufficiently aligned. Therefore we should use a temporary buffer if
asked to write to an unaligned address, unless we know for sure the
architecture supports it without trapping.
@jrtc27
Copy link
Contributor Author

jrtc27 commented Nov 6, 2017

Rebased with a changelog entry (sorry for the delay, I completely forgot about this).

@jrtc27
Copy link
Contributor Author

jrtc27 commented Mar 14, 2018

Ping

@jrtc27
Copy link
Contributor Author

jrtc27 commented Jan 22, 2019

Ping? Seems this got forgotten about by everyone involved.

@Mistuke
Copy link

Mistuke commented Jan 22, 2019

@hvr anything outstanding here?

@chessai
Copy link
Member

chessai commented Oct 13, 2019

Ping

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is good to go. @hvr, @dcoutts could you merge?

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this has already been approved by @dcoutts and signed off by @hvr above. @cartazio could you possibly resolve a conflict in Changelog.md and merge?

@Bodigrim
Copy link
Contributor

@jrtc27 could you please rebase?

@fumieval
Copy link
Contributor

@Bodigrim I rebased the branch and created a new pull request as #232.

@chessai
Copy link
Member

chessai commented Jun 25, 2020

Closing in favour of #232

@chessai chessai closed this Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants