Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test utf8 validaiton based on rust-isutf8 #68463

Closed
wants to merge 7 commits into from

Conversation

Licenser
Copy link

DO NOT MERGE

This is a copy of @ArniDagur's rust-isutf8 port of Lemire's utf8 validation to use for validation to test the performance difference as suggested in #68455

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 22, 2020
@Licenser
Copy link
Author

cc @yoshuawuyts

@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Jan 22, 2020

⌛ Trying commit 707a653 with merge c93d8c28b4e86aee203bca6ba8f4d7ecb2045c0d...

@rust-highfive
Copy link
Collaborator

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2020-01-22T17:17:35.7002748Z ========================== Starting Command Output ===========================
2020-01-22T17:17:35.7005277Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/91335b33-c956-455e-879a-da3bbdf1713c.sh
2020-01-22T17:17:35.7005317Z 
2020-01-22T17:17:35.7008105Z ##[section]Finishing: Disable git automatic line ending conversion
2020-01-22T17:17:35.7014835Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T17:17:35.7016581Z Task         : Get sources
2020-01-22T17:17:35.7016614Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-01-22T17:17:35.7016646Z Version      : 1.0.0
2020-01-22T17:17:35.7016721Z Author       : Microsoft
---
2020-01-22T17:17:36.6957383Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-01-22T17:17:36.6968623Z ##[command]git config gc.auto 0
2020-01-22T17:17:36.6971851Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-01-22T17:17:36.6973888Z ##[command]git config --get-all http.proxy
2020-01-22T17:17:36.6981419Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/68463/merge:refs/remotes/pull/68463/merge
---
2020-01-22T17:29:13.5854899Z configure: build.locked-deps    := True
2020-01-22T17:29:13.5854964Z configure: llvm.ccache          := sccache
2020-01-22T17:29:13.5855227Z configure: build.cargo-native-static := True
2020-01-22T17:29:13.5858566Z configure: dist.missing-tools   := True
2020-01-22T17:29:13.5859261Z configure: build.configure-args := ['--enable-sccache', '--disable-manage-submodu ...
2020-01-22T17:29:13.5859372Z configure: writing `config.toml` in current directory
2020-01-22T17:29:13.5859437Z configure: 
2020-01-22T17:29:13.5859666Z configure: run `python /checkout/x.py --help`
2020-01-22T17:29:13.5859714Z configure: 
---
2020-01-22T17:30:50.2486906Z Checking rustdoc artifacts (x86_64-unknown-linux-gnu -> i686-pc-windows-gnu)
2020-01-22T17:30:50.2506593Z Checking std artifacts (x86_64-unknown-linux-gnu -> i686-pc-windows-gnu)
2020-01-22T17:30:50.5451613Z    Compiling cc v1.0.49
2020-01-22T17:30:50.5451966Z     Checking core v0.0.0 (/checkout/src/libcore)
2020-01-22T17:30:53.6380321Z error: unused import: `_mm256_set_epi8`
2020-01-22T17:30:53.6380950Z      |
2020-01-22T17:30:53.6380950Z      |
2020-01-22T17:30:53.6381299Z 1539 |         _mm256_set1_epi8, _mm256_set_epi8, _mm256_setr_epi8, _mm256_setzero_si256,
2020-01-22T17:30:53.6381902Z      |
2020-01-22T17:30:53.6382195Z      = note: `-D unused-imports` implied by `-D warnings`
2020-01-22T17:30:53.6382234Z 
2020-01-22T17:30:56.7126371Z    Compiling libc v0.2.66
---
2020-01-22T17:31:01.7241229Z   local time: Wed Jan 22 17:31:01 UTC 2020
2020-01-22T17:31:01.8791910Z   network time: Wed, 22 Jan 2020 17:31:01 GMT
2020-01-22T17:31:01.8794574Z == end clock drift check ==
2020-01-22T17:31:07.9154899Z 
2020-01-22T17:31:07.9247374Z ##[error]Bash exited with code '1'.
2020-01-22T17:31:07.9259525Z ##[section]Finishing: Run build
2020-01-22T17:31:07.9280880Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T17:31:07.9282492Z Task         : Get sources
2020-01-22T17:31:07.9282552Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-01-22T17:31:07.9282593Z Version      : 1.0.0
2020-01-22T17:31:07.9282630Z Author       : Microsoft
2020-01-22T17:31:07.9282630Z Author       : Microsoft
2020-01-22T17:31:07.9282685Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
2020-01-22T17:31:07.9282742Z ==============================================================================
2020-01-22T17:31:08.3709492Z Cleaning any cached credential from repository: rust-lang/rust (GitHub)
2020-01-22T17:31:08.3749291Z ##[section]Finishing: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T17:31:08.3876033Z Cleaning up task key
2020-01-22T17:31:08.3877133Z Start cleaning up orphan processes.
2020-01-22T17:31:08.3989879Z Terminate orphan process: pid (3448) (python)
2020-01-22T17:31:08.4739504Z ##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-7 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2020-01-22T20:26:37.6389416Z ========================== Starting Command Output ===========================
2020-01-22T20:26:37.6392802Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/22859c96-9fa8-4726-adef-27486109f5e1.sh
2020-01-22T20:26:37.6393060Z 
2020-01-22T20:26:37.6399542Z ##[section]Finishing: Disable git automatic line ending conversion
2020-01-22T20:26:37.6406485Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T20:26:37.6408444Z Task         : Get sources
2020-01-22T20:26:37.6408532Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-01-22T20:26:37.6408570Z Version      : 1.0.0
2020-01-22T20:26:37.6408604Z Author       : Microsoft
---
2020-01-22T20:26:38.6474082Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-01-22T20:26:38.6487275Z ##[command]git config gc.auto 0
2020-01-22T20:26:38.6489866Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-01-22T20:26:38.6492203Z ##[command]git config --get-all http.proxy
2020-01-22T20:26:38.6499484Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/68463/merge:refs/remotes/pull/68463/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-7 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2020-01-22T21:50:28.2695317Z ========================== Starting Command Output ===========================
2020-01-22T21:50:28.2696706Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/a549a241-32e3-48d5-a6ba-037c4568f51c.sh
2020-01-22T21:50:28.2696740Z 
2020-01-22T21:50:28.2698936Z ##[section]Finishing: Disable git automatic line ending conversion
2020-01-22T21:50:28.2704062Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T21:50:28.2705605Z Task         : Get sources
2020-01-22T21:50:28.2705640Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-01-22T21:50:28.2705666Z Version      : 1.0.0
2020-01-22T21:50:28.2705694Z Author       : Microsoft
---
2020-01-22T21:50:29.2499733Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-01-22T21:50:29.2512514Z ##[command]git config gc.auto 0
2020-01-22T21:50:29.2516498Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-01-22T21:50:29.2520769Z ##[command]git config --get-all http.proxy
2020-01-22T21:50:29.2529352Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/68463/merge:refs/remotes/pull/68463/merge
---
2020-01-22T22:41:58.9946285Z .................................................................................................... 1700/9546
2020-01-22T22:42:05.4844085Z .................................................................................................... 1800/9546
2020-01-22T22:42:16.9748724Z .....................i.............................................................................. 1900/9546
2020-01-22T22:42:24.3110026Z .................................................................................................... 2000/9546
2020-01-22T22:42:38.7840545Z ...........iiiii.................................................................................... 2100/9546
2020-01-22T22:42:49.8528938Z .................................................................................................... 2300/9546
2020-01-22T22:42:52.2665447Z .................................................................................................... 2400/9546
2020-01-22T22:42:58.0116904Z .................................................................................................... 2500/9546
2020-01-22T22:43:17.0053836Z .................................................................................................... 2600/9546
---
2020-01-22T22:46:06.6962120Z .......................................................i...............i............................ 4900/9546
2020-01-22T22:46:16.3453993Z .................................................................................................... 5000/9546
2020-01-22T22:46:23.9319722Z ..................................................................................................i. 5100/9546
2020-01-22T22:46:29.8321948Z .................................................................................................... 5200/9546
2020-01-22T22:46:40.9098421Z ......................................................................ii.ii...........i............. 5300/9546
2020-01-22T22:46:51.5452831Z .......i............................................................................................ 5500/9546
2020-01-22T22:47:01.9726576Z .................................................................................................... 5600/9546
2020-01-22T22:47:09.1366364Z ........................................................i........................................... 5700/9546
2020-01-22T22:47:16.8336638Z .................................................................................................... 5800/9546
2020-01-22T22:47:16.8336638Z .................................................................................................... 5800/9546
2020-01-22T22:47:27.3494365Z .................................................................................................... 5900/9546
2020-01-22T22:47:34.9877922Z ...............................................ii...i..ii...........i............................... 6000/9546
2020-01-22T22:48:00.4533579Z .................................................................................................... 6200/9546
2020-01-22T22:48:10.4985421Z .................................................................................................... 6300/9546
2020-01-22T22:48:10.4985421Z .................................................................................................... 6300/9546
2020-01-22T22:48:18.6628471Z ...........................................................................i..ii.................... 6400/9546
2020-01-22T22:48:46.1376596Z .................................................................................................... 6600/9546
2020-01-22T22:48:50.7844027Z ...................................................i................................................ 6700/9546
2020-01-22T22:48:53.5213249Z .................................................................................................... 6800/9546
2020-01-22T22:48:56.2789350Z ..................................................i................................................. 6900/9546
---
2020-01-22T22:50:37.0896684Z .................................................................................................... 7600/9546
2020-01-22T22:50:43.4700260Z .................................................................................................... 7700/9546
2020-01-22T22:50:50.5032225Z .................................................................................................... 7800/9546
2020-01-22T22:51:01.5823084Z .................................................................................................... 7900/9546
2020-01-22T22:51:07.9377585Z ......iiiiiii....................................................................................... 8000/9546
2020-01-22T22:51:23.4582325Z .................................................................................................... 8200/9546
2020-01-22T22:51:35.1941140Z .................................................................................................... 8300/9546
2020-01-22T22:51:47.9750424Z .................................................................................................... 8400/9546
2020-01-22T22:51:55.0731088Z .................................................................................................... 8500/9546
---
2020-01-22T22:54:23.7185164Z  finished in 8.000
2020-01-22T22:54:23.7342722Z Check compiletest suite=codegen mode=codegen (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-01-22T22:54:23.9299978Z 
2020-01-22T22:54:23.9300864Z running 166 tests
2020-01-22T22:54:27.6657155Z iiii......i........ii..iiii...i....i...........i............i..i..................i....i............ 100/166
2020-01-22T22:54:30.3170293Z i.i.i...iii..iiiiiii.......................iii............ii......
2020-01-22T22:54:30.3176739Z 
2020-01-22T22:54:30.3184452Z  finished in 6.584
2020-01-22T22:54:30.3406847Z Check compiletest suite=codegen-units mode=codegen-units (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-01-22T22:54:30.5021832Z 
---
2020-01-22T22:54:32.8256578Z  finished in 2.485
2020-01-22T22:54:32.8426408Z Check compiletest suite=assembly mode=assembly (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-01-22T22:54:32.9843942Z 
2020-01-22T22:54:32.9845321Z running 9 tests
2020-01-22T22:54:32.9846533Z iiiiiiiii
2020-01-22T22:54:32.9847329Z 
2020-01-22T22:54:32.9847597Z  finished in 0.141
2020-01-22T22:54:33.0044008Z Check compiletest suite=incremental mode=incremental (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-01-22T22:54:33.1981430Z 
---
2020-01-22T22:54:55.0715270Z  finished in 22.067
2020-01-22T22:54:55.0924315Z Check compiletest suite=debuginfo mode=debuginfo-gdb+lldb (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-01-22T22:54:55.2784818Z 
2020-01-22T22:54:55.2785916Z running 116 tests
2020-01-22T22:55:18.9436945Z .iiiii..i.....i..i...i..i.i.i..i..i..ii....i.i....ii..........iiii..........i.....i..i.......ii.i.ii 100/116
2020-01-22T22:55:22.1842874Z .....iiii.....ii
2020-01-22T22:55:22.1844632Z 
2020-01-22T22:55:22.1845219Z  finished in 27.092
2020-01-22T22:55:22.1854187Z Uplifting stage1 rustc (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
2020-01-22T22:55:22.1856108Z Copying stage2 rustc from stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
---
2020-01-22T23:05:40.1366386Z   local time: Wed Jan 22 23:05:40 UTC 2020
2020-01-22T23:05:40.4289680Z   network time: Wed, 22 Jan 2020 23:05:40 GMT
2020-01-22T23:05:40.4293770Z == end clock drift check ==
2020-01-22T23:05:40.8533076Z 
2020-01-22T23:05:40.8626669Z ##[error]Bash exited with code '1'.
2020-01-22T23:05:40.8637703Z ##[section]Finishing: Run build
2020-01-22T23:05:40.8657736Z ##[section]Starting: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T23:05:40.8659354Z Task         : Get sources
2020-01-22T23:05:40.8659391Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
2020-01-22T23:05:40.8659427Z Version      : 1.0.0
2020-01-22T23:05:40.8659477Z Author       : Microsoft
2020-01-22T23:05:40.8659477Z Author       : Microsoft
2020-01-22T23:05:40.8659514Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
2020-01-22T23:05:40.8659552Z ==============================================================================
2020-01-22T23:05:41.2697179Z Cleaning any cached credential from repository: rust-lang/rust (GitHub)
2020-01-22T23:05:41.2740442Z ##[section]Finishing: Checkout rust-lang/rust@refs/pull/68463/merge to s
2020-01-22T23:05:41.2847046Z Cleaning up task key
2020-01-22T23:05:41.2847936Z Start cleaning up orphan processes.
2020-01-22T23:05:41.2934606Z Terminate orphan process: pid (3510) (python)
2020-01-22T23:05:41.3205718Z ##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@@ -1484,19 +1485,452 @@ impl<'a> DoubleEndedIterator for LinesAny<'a> {
#[allow(deprecated)]
impl FusedIterator for LinesAny<'_> {}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
mod avx2 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this specify a bunch of target features? If you don't do so it generates MUCH worse code (and maybe it doesn't even work, not sure).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CryZe good catch, I updated it the guard to use target_feature = "avx2" that should solve this

@Mark-Simulacrum
Copy link
Member

@rust-timer build c93d8c28b4e86aee203bca6ba8f4d7ecb2045c0d

Seems bors lost the try build finishing...

@rust-timer
Copy link
Collaborator

Queued c93d8c28b4e86aee203bca6ba8f4d7ecb2045c0d with parent ae66171, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit c93d8c28b4e86aee203bca6ba8f4d7ecb2045c0d, comparison URL.

@ArniDagur
Copy link

Is that 44% slower? Can that be right?

@CryZe
Copy link
Contributor

CryZe commented Jan 25, 2020

The code was without the AVX2 target feature guard, which has a heavy performance penalty. The timer needs to be rerun.

@mati865
Copy link
Contributor

mati865 commented Jan 25, 2020

x86_64 rightfully doesn't have target_feature = "avx2" by default.
Checks like #[cfg(any(target_feature = "avx2"))] will just return false.

Simply enabling AVX or changing CPU to znver2 (that's what perf.rlo is running on) won't give meaningful results either because base toolchain would also need to be built with that.

@CryZe
Copy link
Contributor

CryZe commented Jan 25, 2020

Yes, iiuc you need to runtime select between the simd and non-simd versions.

@Mark-Simulacrum
Copy link
Member

I don't quite follow the discussion. If we need another timer run, please say so explicitly :)

@yoshuawuyts
Copy link
Member

yoshuawuyts commented Jan 27, 2020

What's interesting to me is how much of a negative impact worsened utf-8 validation seems to have. Like @Mark-Simulacrum postulated it indeed affects shorter programs more than longer ones (as evidenced by "hello world"). This makes it interesting to consider to what degree a positive change to the algorithm might speed up things.


@Mark-Simulacrum 1a24818 includes fixes that should speed things up; another run would be great.

@CryZe
Copy link
Contributor

CryZe commented Jan 27, 2020

@yoshuawuyts I don't think that commit is sufficient. rustc is not built with avx2 active (as would be the case with -C target-cpu=skylake or so), so it doesn't build the simd code. It needs some code to select which version of the algorithm to use at runtime.

@ArniDagur
Copy link

Is there no way to test it with -C target-cpu=skylake or equivelent, just for this PR?

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 6, 2020
@Mark-Simulacrum
Copy link
Member

I do not have the time to suggest a way to enable those compile time flags here (especially because our CI is not guaranteed to run on skylake). If you do not want to dig in, then I would propose that we close this.

@Licenser
Copy link
Author

Licenser commented Feb 7, 2020

I don’t have any control about the benchmark system it’s not about want i can’t change that.

@Licenser Licenser closed this Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants