ffts_real only works for single-precision #59
I also get a segfault when trying to use the:
plan, no matter how big I allocate these buffers.
I modified the test.c file to use the plan
Try replacing _mm_store_ps with _mm_storeu_ps, as there is no guarantee that the memory is correctly aligned.
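A minimal sketch of the difference, not taken from FFTS itself: _mm_store_ps requires a 16-byte-aligned destination and faults otherwise, while _mm_storeu_ps accepts any address. The helper names `is_aligned16` and `store4` are hypothetical and exist only for illustration.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: nonzero if p is 16-byte (128-bit) aligned. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: store four floats from v to dst. */
static void store4(float *dst, __m128 v)
{
    if (is_aligned16(dst))
        _mm_store_ps(dst, v);   /* aligned store: faults on a misaligned dst */
    else
        _mm_storeu_ps(dst, v);  /* unaligned store: valid for any address */
}
```

In practice, you would simply use _mm_storeu_ps wherever alignment cannot be guaranteed. On recent CPUs it is typically about as fast as the aligned form when the address happens to be aligned.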
Check the pointers:
However, I tried what you said, and now it crashes here:
The value of
Also, the hardcoded
I have done very little testing with 2D or higher-rank transforms, but most of the problems are related to alignment, as the intrinsics used require strict 128-bit alignment. Casting to the double datatype is used here to simplify the following _mm_shuffle_pd intrinsics. We could have used _mm_load_ps (or _mm_loadu_ps) and then cast to the __m128d datatype using _mm_castps_pd. You could test with my benchFFTS, https://github.com/linkotec/benchFFTS. It supports tests for 2D, for example "bench_ffts -s ocb512x128".
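A sketch of the alternative described above, written for illustration and not copied from FFTS. The code loads two interleaved single-precision complex values with _mm_loadu_ps, which has no alignment requirement. It then reinterprets the register as __m128d with _mm_castps_pd, so each 64-bit lane holds one complex number, and swaps the two complex numbers with _mm_shuffle_pd. The function name `swap_complex_pair` is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: src holds (re0, im0, re1, im1);
 * dst receives (re1, im1, re0, im0). Neither pointer needs 16-byte alignment. */
static void swap_complex_pair(float *dst, const float *src)
{
    __m128  v  = _mm_loadu_ps(src);   /* unaligned load of 4 floats */
    __m128d vd = _mm_castps_pd(v);    /* bit reinterpretation only, no conversion */
    vd = _mm_shuffle_pd(vd, vd, 1);   /* lanes become { vd[1], vd[0] } */
    _mm_storeu_ps(dst, _mm_castpd_ps(vd));
}
```

The casts compile to no instructions. They only change how the compiler types the register, which is why viewing a pair of floats as one double is a cheap way to move whole complex numbers.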
Same thing, of course: (note the
LLDB gives the same as previously:
It looks like the problem is due to the "half spectrum" representation, which gives N/2 + 1 coefficients. I believe the constant coefficient is the one causing the +1. This is why the
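A small sketch of the buffer-size arithmetic for a real-to-complex transform. A real input of length n yields complex outputs at indices 0 through n/2 inclusive, i.e. n/2 + 1 values. For a 2D n0 x n1 real input, this sketch *assumes* that only the last (row-major, contiguous) dimension is halved, as in FFTW's r2c convention. That would make the output n0 * (n1/2 + 1) complex values, or n0 more than a naive n0 * n1 / 2. Whether FFTS uses exactly this layout is not confirmed here, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: complex outputs of a 1D real FFT of length n.
 * Indices 0..n/2 inclusive (DC term, and the Nyquist term when n is even). */
static size_t r2c_len_1d(size_t n)
{
    return n / 2 + 1;
}

/* Hypothetical helper, assuming only the last dimension is halved. */
static size_t r2c_len_2d(size_t n0, size_t n1)
{
    return n0 * r2c_len_1d(n1);
}

/* Floats needed for interleaved (re, im) storage of that output. */
static size_t r2c_floats_2d(size_t n0, size_t n1)
{
    return 2 * r2c_len_2d(n0, n1);
}
```

For the 512x128 case mentioned earlier, under this assumption, the output needs 512 * 65 = 33280 complex values (66560 floats). That is 512 complex values more than a buffer sized as half of the input.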
I think you have found the problem. At the moment I don't have the time to fix it. One option is to use the 2D complex transform with the imaginary parts set to zero, so that all input values are real. It is slower, but at least it works.
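A minimal sketch of the input side of this workaround. Real samples are copied into an interleaved complex buffer (re, im, re, im, ...) with every imaginary part set to zero, and that buffer is then fed to the complex 2D transform. The function name is hypothetical, and the interleaved layout is an assumption about what the complex transform expects. The plan creation and execution calls are deliberately left out.

```c
#include <stddef.h>

/* Hypothetical helper: widen n real samples into interleaved complex form.
 * dst must hold 2 * n floats; for a 2D input, n = rows * cols. */
static void real_to_complex_interleaved(float *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[2 * i]     = src[i];  /* real part */
        dst[2 * i + 1] = 0.0f;    /* imaginary part forced to zero */
    }
}
```

The cost is twice the input memory and a full complex transform. In addition, the output contains the redundant (conjugate-symmetric) half of the spectrum that a real transform would omit, which is why this path is slower.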
Nice! 😄 Speed is a very important aspect of my application, but for now I will use the complex transform. Once it is fixed, I might switch back to the
Hi, did anyone work on this? I'd like to use the 2D real transform too; it would be great if it worked.
Should also support double-precision