Add SIMD escape #160
Waiting on the alignment solution; please comment on the unsafe solution. |
So I was thinking about this last night and I think we should separate out the long and short benchmarks so we can compare the effect of using AVX2 instructions on them separately (based on my reading, I think AVX2 stuff might actually slow down short string escaping). Can you do that? I'll also want to chew a bit on this code since I myself don't have any experience with AVX2 and I want to really understand it before we merge this. Also I think next we might want to explore searching based on the PCMPESTRI instruction from SSE 4.2, which might be faster than AVX2 according to what's described here. |
OK, I'll do the separation in the benchmarks. |
So, high level conclusions just from looking at your benchmark data:
Very good stuff! In light of these findings, I'd like to change the benchmarks a little:
My question here is if the AVX2 stuff is even worth having, or if it's really only effective at levels of escaped characters that are unlikely to come up in the wild. (I'm really sorry if we end up not using some of this painstakingly developed code, but I also want to be realistic about the complexity it adds and whether it's worth it in terms of real-world performance.) |
Also, can you squash the code prettification from the last commit into the related earlier commits? |
I always squash at the end; I leave the commits separate for now so you can see the changes easily. As I mentioned, it is less effective for strings shorter than the instruction width, 16 or 32 bytes. If the e2e would stop escaping numbers or relocate string, duplicated My opinion is that we should improve The benchmarks are already uploaded. |
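To illustrate the point about strings shorter than the vector width, here is a minimal sketch in portable Rust, not the code from this pull request. The names `LANES`, `is_escapable`, and `find_escapable` are hypothetical, the set of escaped characters is assumed, and the per-chunk scan is only a stand-in for a real 16-byte vector comparison.

```rust
// Assumed vector width in bytes (16 for SSE; AVX2 would use 32).
const LANES: usize = 16;

// Assumed set of characters that need escaping.
fn is_escapable(b: u8) -> bool {
    matches!(b, b'&' | b'<' | b'>' | b'"' | b'\'')
}

/// Returns the index of the first byte that needs escaping, if any.
fn find_escapable(bytes: &[u8]) -> Option<usize> {
    // Inputs shorter than one vector cannot fill a register,
    // so they take the plain scalar path.
    if bytes.len() < LANES {
        return bytes.iter().position(|&b| is_escapable(b));
    }

    let mut chunks = bytes.chunks_exact(LANES);
    let mut offset = 0;
    for chunk in &mut chunks {
        // Stand-in for one vector compare over LANES bytes.
        if let Some(i) = chunk.iter().position(|&b| is_escapable(b)) {
            return Some(offset + i);
        }
        offset += LANES;
    }

    // The tail that does not fill a whole vector is handled byte by byte.
    chunks
        .remainder()
        .iter()
        .position(|&b| is_escapable(b))
        .map(|i| offset + i)
}
```

With this shape, a six-byte input such as `b"short<"` never reaches the chunked loop, which is why a benchmark dominated by short strings mostly measures the scalar path plus the dispatch overhead.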
We are comparing the efficiency of a truck and a motorcycle for carrying a piece of candy. It does not make any sense, and I also warned you about this result on Gitter when you asked me. Now you cannot say that all this work is useless because what I said would happen has happened. |
Yes, you were right. Just to make sure I understand, are you angry and/or frustrated with me for this? |
I'm neither angry nor frustrated, but it is very inefficient that, after we talked about this possible issue and you knew about it, you comment on it as if it were a problem or something unexpected. |
Wtf, Bill, what do we do with that memory? I have icons but not in my memory, perfect. |
Huh? Not sure what you mean. |
On Windows (AppVeyor), it reports an equality error between two strings with the same bytes. But I still have 4K icons and these useful things. I was messing with Windows. I removed the SIMD support for Windows. |
Well, what do you want to do with this pull request? |
It's not a problem per se, but I still didn't know what to expect. That might be because I don't have a lot of experience doing actual performance optimization, even though I know quite a bit about what kinds of things can cause performance issues. So I had expected that AVX2 would be substantially faster on large strings even if there wasn't a lot to escape. Now that it isn't, I'm no longer sure if there's value in it. What do you think we should do? Do you think adding the SSE and AVX versions of the code will lead to better performance on real-world workloads? For that matter, have you tried and compared this branch on an actual project you've used Askama for? My inclination is to see if I can figure out a better benchmark, in the sense that it should be a more representative mix of longer and shorter strings as well as of the relative frequencies of characters from the different ranges. For example, it occurs to me now with the benchmark results we have that the trick of ignoring byte values from outside the range of escapable characters is very effective, but that means that digits and non-escaped punctuation will have a different effect on the benchmark results; yet the current benchmark data does not contain this class of character at all. |
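The range trick mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not Askama's actual implementation: the function name `escape_html` is hypothetical, and the escaped set (`&`, `<`, `>`, `"`, `'`) and its replacements are assumed for the example.

```rust
/// Hypothetical escaper illustrating the range check; assumed escaped set.
fn escape_html(input: &str) -> String {
    // All assumed escapable bytes lie between '"' (0x22) and '>' (0x3E).
    const LOW: u8 = b'"';
    const HIGH: u8 = b'>';

    let mut out = String::with_capacity(input.len());
    let mut start = 0;

    for (i, &b) in input.as_bytes().iter().enumerate() {
        // One comparison rejects letters, spaces, and every non-ASCII byte.
        if b.wrapping_sub(LOW) > HIGH - LOW {
            continue;
        }
        // Digits and punctuation such as ',' or '.' pass the range check
        // and only get rejected here, by the slower match.
        let replacement = match b {
            b'&' => "&amp;",
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'"' => "&quot;",
            b'\'' => "&#x27;",
            _ => continue,
        };
        // Copy the unescaped run before this byte, then the replacement.
        out.push_str(&input[start..i]);
        out.push_str(replacement);
        start = i + 1;
    }
    out.push_str(&input[start..]);
    out
}
```

An input like `"12,345.67"` produces no escapes, yet every byte in it falls inside the range and reaches the `match`. Benchmark data made only of letters and escapable characters never exercises that case.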
I don't like the SSE version, but I do like the AVX one. It is still very immature and does not deliver the expected performance. I will continue investigating vectorized operations with @tizgafa, and later we will try again on Askama. Given the results, Askama should prioritize recognizing non-escapable characters and the output-size heuristic. How about that? |
I'm closing it. The SIMD escape experimentation will move to a new repo. Besides Askama, we will work on implementing all kinds of algorithms with this tool, in case you want to suggest other algorithms. |
#167