Parallelised check_crate() in rvalue_promotion.rs. #58494

Closed
wants to merge 1 commit into from

TheDarkula
Contributor

r? @Zoxc

Built the regex crate, commit 60d087a23025e045ae754a345b04003c31d83d93.

Serial:

  time: 0.067; rss: 62MB	parsing
  time: 0.000; rss: 62MB	attributes injection
  time: 0.000; rss: 62MB	recursion limit
  time: 0.000; rss: 62MB	crate injection
  time: 0.000; rss: 62MB	plugin loading
  time: 0.000; rss: 62MB	plugin registration
  time: 0.003; rss: 62MB	pre ast expansion lint checks
    time: 0.168; rss: 96MB	expand crate
    time: 0.000; rss: 96MB	check unused macros
  time: 0.168; rss: 96MB	expansion
  time: 0.000; rss: 96MB	maybe building test harness
  time: 0.002; rss: 96MB	AST validation
  time: 0.000; rss: 96MB	maybe creating a macro crate
  time: 0.047; rss: 102MB	name resolution
  time: 0.015; rss: 102MB	complete gated feature checking
  time: 0.035; rss: 110MB	lowering ast -> hir
  time: 0.006; rss: 111MB	early lint checks
    time: 0.006; rss: 114MB	validate hir map
  time: 0.030; rss: 114MB	indexing hir
  time: 0.000; rss: 114MB	load query result cache
  time: 0.000; rss: 114MB	dep graph tcx init
  time: 0.000; rss: 114MB	looking for entry point
  time: 0.000; rss: 114MB	looking for plugin registrar
  time: 0.000; rss: 114MB	looking for derive registrar
  time: 0.002; rss: 114MB	loop checking
  time: 0.002; rss: 116MB	attribute checking
    time: 0.000; rss: 119MB	builtin::check_trait checking
  time: 0.036; rss: 137MB	stability checking
  time: 0.033; rss: 147MB	type collecting
  time: 0.002; rss: 147MB	impl wf inference
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.001; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 147MB	builtin::check_trait checking
    time: 0.000; rss: 149MB	builtin::check_trait checking
    time: 0.000; rss: 149MB	builtin::check_trait checking
    time: 0.000; rss: 150MB	builtin::check_trait checking
    time: 0.000; rss: 150MB	builtin::check_trait checking
    time: 0.000; rss: 150MB	builtin::check_trait checking
    time: 0.000; rss: 150MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 154MB	builtin::check_trait checking
    time: 0.000; rss: 154MB	builtin::check_trait checking
    time: 0.000; rss: 154MB	unsafety checking
    time: 0.000; rss: 154MB	orphan checking
  time: 0.046; rss: 154MB	coherence checking
  time: 0.125; rss: 157MB	wf checking
  time: 0.051; rss: 164MB	item-types checking
  time: 0.816; rss: 198MB	item-bodies checking
    time: 0.095; rss: 202MB	rvalue promotion
    time: 0.003; rss: 202MB	intrinsic checking
    time: 0.017; rss: 204MB	match checking
    time: 0.013; rss: 204MB	liveness checking
  time: 0.128; rss: 204MB	misc checking
  time: 0.604; rss: 238MB	borrow checking
  time: 0.002; rss: 241MB	MIR borrow checking
  time: 0.000; rss: 241MB	dumping chalk-like clauses
  time: 0.000; rss: 241MB	MIR effect checking
  time: 0.000; rss: 241MB	layout testing
    time: 0.045; rss: 245MB	privacy checking
    time: 0.009; rss: 246MB	death checking
    time: 0.002; rss: 246MB	unused lib feature checking
    time: 0.029; rss: 246MB	lint checking
  time: 0.084; rss: 246MB	misc checking
  time: 0.000; rss: 246MB	resolving dependency formats
        time: 0.002; rss: 251MB	collecting roots
        time: 0.386; rss: 281MB	collecting mono items
      time: 0.388; rss: 281MB	monomorphization collection
      time: 0.022; rss: 288MB	codegen unit partitioning
    time: 0.524; rss: 293MB	write metadata
    time: 0.000; rss: 301MB	llvm function passes [regex.daejiqas-cgu.11]
    time: 0.000; rss: 315MB	llvm function passes [regex.daejiqas-cgu.7]
    time: 0.261; rss: 345MB	llvm module passes [regex.daejiqas-cgu.11]
    time: 0.001; rss: 345MB	llvm function passes [regex.daejiqas-cgu.0]
    time: 0.015; rss: 346MB	llvm module passes [regex.daejiqas-cgu.0]
    time: 0.241; rss: 347MB	llvm module passes [regex.daejiqas-cgu.7]
    time: 0.001; rss: 385MB	llvm function passes [regex.daejiqas-cgu.10]
    time: 0.397; rss: 396MB	codegen passes [regex.daejiqas-cgu.0]
    time: 0.588; rss: 418MB	codegen passes [regex.daejiqas-cgu.11]
    time: 0.000; rss: 407MB	llvm function passes [regex.daejiqas-cgu.15]
    time: 0.017; rss: 408MB	llvm module passes [regex.daejiqas-cgu.15]
    time: 0.001; rss: 419MB	llvm function passes [regex.daejiqas-cgu.3]
    time: 0.010; rss: 420MB	llvm module passes [regex.daejiqas-cgu.3]
    time: 0.405; rss: 423MB	llvm module passes [regex.daejiqas-cgu.10]
    time: 0.357; rss: 432MB	codegen passes [regex.daejiqas-cgu.15]
    time: 0.943; rss: 432MB	codegen passes [regex.daejiqas-cgu.7]
    time: 0.262; rss: 432MB	codegen passes [regex.daejiqas-cgu.3]
    time: 0.000; rss: 433MB	llvm function passes [regex.daejiqas-cgu.2]
    time: 0.012; rss: 414MB	llvm module passes [regex.daejiqas-cgu.2]
    time: 0.000; rss: 414MB	llvm function passes [regex.daejiqas-cgu.1]
    time: 0.005; rss: 414MB	llvm module passes [regex.daejiqas-cgu.1]
    time: 0.001; rss: 421MB	llvm function passes [regex.daejiqas-cgu.6]
    time: 0.003; rss: 421MB	llvm module passes [regex.daejiqas-cgu.6]
    time: 0.263; rss: 422MB	codegen passes [regex.daejiqas-cgu.2]
    time: 0.269; rss: 423MB	codegen passes [regex.daejiqas-cgu.1]
    time: 0.000; rss: 427MB	llvm function passes [regex.daejiqas-cgu.4]
    time: 0.011; rss: 427MB	llvm module passes [regex.daejiqas-cgu.4]
    time: 0.000; rss: 432MB	llvm function passes [regex.daejiqas-cgu.8]
    time: 0.003; rss: 432MB	llvm module passes [regex.daejiqas-cgu.8]
    time: 0.259; rss: 434MB	codegen passes [regex.daejiqas-cgu.6]
    time: 0.000; rss: 434MB	llvm function passes [regex.daejiqas-cgu.5]
    time: 0.002; rss: 434MB	llvm module passes [regex.daejiqas-cgu.5]
    time: 0.297; rss: 434MB	codegen passes [regex.daejiqas-cgu.4]
    time: 0.000; rss: 440MB	llvm function passes [regex.daejiqas-cgu.9]
    time: 0.003; rss: 440MB	llvm module passes [regex.daejiqas-cgu.9]
    time: 0.243; rss: 440MB	codegen passes [regex.daejiqas-cgu.8]
    time: 0.000; rss: 442MB	llvm function passes [regex.daejiqas-cgu.14]
    time: 0.002; rss: 442MB	llvm module passes [regex.daejiqas-cgu.14]
    time: 0.238; rss: 442MB	codegen passes [regex.daejiqas-cgu.5]
    time: 1.910; rss: 443MB	codegen to LLVM IR
    time: 0.000; rss: 443MB	assert dep graph
    time: 0.000; rss: 443MB	serialize dep graph
  time: 3.023; rss: 443MB	codegen
    time: 0.000; rss: 443MB	llvm function passes [regex.daejiqas-cgu.12]
    time: 0.005; rss: 443MB	llvm module passes [regex.daejiqas-cgu.12]
    time: 0.222; rss: 437MB	codegen passes [regex.daejiqas-cgu.9]
    time: 0.000; rss: 437MB	llvm function passes [regex.daejiqas-cgu.13]
    time: 0.002; rss: 437MB	llvm module passes [regex.daejiqas-cgu.13]
    time: 0.172; rss: 437MB	codegen passes [regex.daejiqas-cgu.12]
    time: 0.125; rss: 437MB	codegen passes [regex.daejiqas-cgu.13]
    time: 0.270; rss: 437MB	codegen passes [regex.daejiqas-cgu.14]
    time: 1.412; rss: 449MB	codegen passes [regex.daejiqas-cgu.10]
  time: 2.799; rss: 445MB	LLVM passes
  time: 0.000; rss: 446MB	serialize work products
  time: 0.028; rss: 446MB	linking
Self profiling results for regex:

| Phase                                     | Time (ms)      | Time (%) | Queries        | Hits (%)
| ----------------------------------------- | -------------- | -------- | -------------- | --------
| Codegen                                   |           3036 |    52.49 |         124616 |    89.76
| TypeChecking                              |           1383 |    23.91 |        1008274 |    91.97
| Other                                     |            980 |    16.94 |        1644015 |    94.66
| Expansion                                 |            168 |     2.90 |              0 |     0.00
| BorrowChecking                            |            109 |     1.88 |          13895 |    65.21
| Parsing                                   |             67 |     1.16 |              0 |     0.00
| Linking                                   |             41 |     0.71 |          17357 |    88.70

Optimization level: No
Incremental: off

Parallel:

  time: 0.055; rss: 64MB	parsing
  time: 0.000; rss: 64MB	attributes injection
  time: 0.000; rss: 64MB	recursion limit
  time: 0.000; rss: 64MB	crate injection
  time: 0.000; rss: 64MB	plugin loading
  time: 0.000; rss: 64MB	plugin registration
  time: 0.003; rss: 64MB	pre ast expansion lint checks
    time: 0.123; rss: 97MB	expand crate
    time: 0.000; rss: 97MB	check unused macros
  time: 0.123; rss: 97MB	expansion
  time: 0.000; rss: 97MB	maybe building test harness
  time: 0.002; rss: 97MB	AST validation
  time: 0.000; rss: 97MB	maybe creating a macro crate
  time: 0.053; rss: 103MB	name resolution
  time: 0.016; rss: 103MB	complete gated feature checking
  time: 0.040; rss: 112MB	lowering ast -> hir
  time: 0.008; rss: 112MB	early lint checks
    time: 0.003; rss: 115MB	validate hir map
  time: 0.027; rss: 115MB	indexing hir
  time: 0.000; rss: 115MB	load query result cache
  time: 0.000; rss: 116MB	dep graph tcx init
  time: 0.000; rss: 116MB	looking for entry point
  time: 0.000; rss: 116MB	looking for plugin registrar
  time: 0.000; rss: 116MB	looking for derive registrar
  time: 0.003; rss: 116MB	loop checking
  time: 0.010; rss: 118MB	attribute checking
    time: 0.000; rss: 121MB	builtin::check_trait checking
  time: 0.052; rss: 139MB	stability checking
  time: 0.039; rss: 151MB	type collecting
  time: 0.002; rss: 151MB	impl wf inference
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 151MB	builtin::check_trait checking
    time: 0.000; rss: 152MB	builtin::check_trait checking
    time: 0.001; rss: 152MB	builtin::check_trait checking
    time: 0.000; rss: 152MB	builtin::check_trait checking
    time: 0.000; rss: 152MB	builtin::check_trait checking
    time: 0.000; rss: 153MB	builtin::check_trait checking
    time: 0.000; rss: 154MB	builtin::check_trait checking
    time: 0.000; rss: 154MB	builtin::check_trait checking
    time: 0.000; rss: 154MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	builtin::check_trait checking
    time: 0.000; rss: 157MB	unsafety checking
    time: 0.000; rss: 157MB	orphan checking
  time: 0.047; rss: 157MB	coherence checking
  time: 0.129; rss: 159MB	wf checking
  time: 0.053; rss: 164MB	item-types checking
  time: 0.352; rss: 200MB	item-bodies checking
  time: 0.009; rss: 203MB	intrinsic checking
  time: 0.030; rss: 204MB	liveness checking
  time: 0.067; rss: 208MB	match checking
    time: 0.078; rss: 211MB	rvalue promotion
  time: 0.079; rss: 211MB	misc checking
  time: 0.305; rss: 246MB	borrow checking
  time: 0.002; rss: 246MB	MIR borrow checking
  time: 0.000; rss: 246MB	dumping chalk-like clauses
  time: 0.000; rss: 246MB	MIR effect checking
  time: 0.000; rss: 246MB	layout testing
    time: 0.046; rss: 251MB	privacy checking
  time: 0.049; rss: 251MB	unused lib feature checking
  time: 0.059; rss: 251MB	death checking
  time: 0.079; rss: 251MB	lint checking
  time: 0.079; rss: 251MB	misc checking
  time: 0.000; rss: 251MB	resolving dependency formats
        time: 0.003; rss: 253MB	collecting roots
        time: 0.226; rss: 288MB	collecting mono items
      time: 0.229; rss: 288MB	monomorphization collection
      time: 0.022; rss: 296MB	codegen unit partitioning
    time: 0.372; rss: 300MB	write metadata
    time: 0.000; rss: 313MB	llvm function passes [regex.daejiqas-cgu.11]
    time: 0.000; rss: 327MB	llvm function passes [regex.daejiqas-cgu.7]
    time: 0.266; rss: 354MB	llvm module passes [regex.daejiqas-cgu.11]
    time: 0.001; rss: 355MB	llvm function passes [regex.daejiqas-cgu.0]
    time: 0.009; rss: 355MB	llvm module passes [regex.daejiqas-cgu.0]
    time: 0.235; rss: 361MB	llvm module passes [regex.daejiqas-cgu.7]
    time: 0.001; rss: 409MB	llvm function passes [regex.daejiqas-cgu.10]
    time: 0.495; rss: 409MB	codegen passes [regex.daejiqas-cgu.0]
    time: 0.564; rss: 413MB	codegen passes [regex.daejiqas-cgu.11]
    time: 0.000; rss: 402MB	llvm function passes [regex.daejiqas-cgu.15]
    time: 0.017; rss: 405MB	llvm module passes [regex.daejiqas-cgu.15]
    time: 0.001; rss: 427MB	llvm function passes [regex.daejiqas-cgu.3]
    time: 0.011; rss: 429MB	llvm module passes [regex.daejiqas-cgu.3]
    time: 0.434; rss: 437MB	llvm module passes [regex.daejiqas-cgu.10]
    time: 0.348; rss: 438MB	codegen passes [regex.daejiqas-cgu.15]
    time: 0.974; rss: 446MB	codegen passes [regex.daejiqas-cgu.7]
    time: 0.247; rss: 448MB	codegen passes [regex.daejiqas-cgu.3]
    time: 0.000; rss: 441MB	llvm function passes [regex.daejiqas-cgu.2]
    time: 0.001; rss: 441MB	llvm function passes [regex.daejiqas-cgu.1]
    time: 0.006; rss: 436MB	llvm module passes [regex.daejiqas-cgu.1]
    time: 0.014; rss: 437MB	llvm module passes [regex.daejiqas-cgu.2]
    time: 0.001; rss: 440MB	llvm function passes [regex.daejiqas-cgu.6]
    time: 0.003; rss: 440MB	llvm module passes [regex.daejiqas-cgu.6]
    time: 0.244; rss: 446MB	codegen passes [regex.daejiqas-cgu.1]
    time: 0.304; rss: 447MB	codegen passes [regex.daejiqas-cgu.2]
    time: 0.000; rss: 448MB	llvm function passes [regex.daejiqas-cgu.4]
    time: 0.011; rss: 448MB	llvm module passes [regex.daejiqas-cgu.4]
    time: 0.266; rss: 454MB	codegen passes [regex.daejiqas-cgu.6]
    time: 0.000; rss: 454MB	llvm function passes [regex.daejiqas-cgu.8]
    time: 0.007; rss: 454MB	llvm module passes [regex.daejiqas-cgu.8]
    time: 0.000; rss: 457MB	llvm function passes [regex.daejiqas-cgu.9]
    time: 0.002; rss: 457MB	llvm module passes [regex.daejiqas-cgu.9]
    time: 0.336; rss: 461MB	codegen passes [regex.daejiqas-cgu.4]
    time: 0.208; rss: 461MB	codegen passes [regex.daejiqas-cgu.9]
    time: 0.254; rss: 461MB	codegen passes [regex.daejiqas-cgu.8]
    time: 0.000; rss: 461MB	llvm function passes [regex.daejiqas-cgu.5]
    time: 0.005; rss: 461MB	llvm module passes [regex.daejiqas-cgu.5]
    time: 0.000; rss: 461MB	llvm function passes [regex.daejiqas-cgu.12]
    time: 0.002; rss: 461MB	llvm module passes [regex.daejiqas-cgu.12]
    time: 2.132; rss: 464MB	codegen to LLVM IR
    time: 0.000; rss: 464MB	assert dep graph
    time: 0.000; rss: 464MB	serialize dep graph
  time: 2.990; rss: 464MB	codegen
    time: 0.000; rss: 464MB	llvm function passes [regex.daejiqas-cgu.13]
    time: 0.003; rss: 464MB	llvm module passes [regex.daejiqas-cgu.13]
    time: 0.211; rss: 460MB	codegen passes [regex.daejiqas-cgu.12]
    time: 0.000; rss: 460MB	llvm function passes [regex.daejiqas-cgu.14]
    time: 0.002; rss: 460MB	llvm module passes [regex.daejiqas-cgu.14]
    time: 0.260; rss: 462MB	codegen passes [regex.daejiqas-cgu.5]
    time: 0.153; rss: 463MB	codegen passes [regex.daejiqas-cgu.13]
    time: 0.272; rss: 470MB	codegen passes [regex.daejiqas-cgu.14]
    time: 1.617; rss: 482MB	codegen passes [regex.daejiqas-cgu.10]
  time: 3.135; rss: 479MB	LLVM passes
  time: 0.000; rss: 477MB	serialize work products
  time: 0.147; rss: 477MB	linking
Self profiling results for regex:

| Phase                                     | Time (ms)      | Time (%) | Queries        | Hits (%)
| ----------------------------------------- | -------------- | -------- | -------------- | --------
| Codegen                                   |           3690 |    46.60 |         124616 |    89.76
| TypeChecking                              |           2241 |    28.30 |        1008274 |    91.97
| Other                                     |           1448 |    18.29 |        1646307 |    94.67
| BorrowChecking                            |            200 |     2.53 |          13895 |    65.21
| Linking                                   |            163 |     2.06 |          17357 |    88.70
| Expansion                                 |            123 |     1.55 |              0 |     0.00
| Parsing                                   |             54 |     0.68 |              0 |     0.00

Optimization level: No
Incremental: off

rust-highfive added the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Feb 15, 2019
@Zoxc
Contributor

Zoxc commented Feb 21, 2019

You should benchmark before and after the changes, both with a parallel compiler build, doing multiple runs, and look at the timing for the parallel section you are modifying, i.e.:

  time: 0.009; rss: 203MB	intrinsic checking
  time: 0.030; rss: 204MB	liveness checking
  time: 0.067; rss: 208MB	match checking
    time: 0.078; rss: 211MB	rvalue promotion
  time: 0.079; rss: 211MB	misc checking

It's a good idea to use fewer threads for benchmarking than you have cores, just to leave some for background processes.
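One way to act on the "multiple runs" advice is to collect the timing output of each run and average the figure for a single pass. The following is only a minimal sketch: the helper names `pass_time` and `mean_pass_time` are hypothetical, and it assumes each log line has the `time: <secs>; rss: <n>MB<TAB><pass name>` shape seen in this thread.

```rust
/// Returns the seconds reported for `pass` in one timing log.
/// Assumes lines shaped like `time: 0.078; rss: 211MB\trvalue promotion`.
/// If a pass name appears more than once (as `misc checking` does in the
/// logs above), the first occurrence is used.
fn pass_time(log: &str, pass: &str) -> Option<f64> {
    log.lines().find_map(|line| {
        // Drop the indentation and the leading `time:` label.
        let rest = line.trim().strip_prefix("time:")?;
        // Split the seconds field from the `rss: ...MB<TAB>name` remainder.
        let (secs, tail) = rest.split_once(';')?;
        // Everything after the first `MB` is the pass name.
        let (_rss, name) = tail.split_once("MB")?;
        if name.trim() == pass {
            secs.trim().parse::<f64>().ok()
        } else {
            None
        }
    })
}

/// Averages the time of `pass` over several logs (one log per run).
/// Runs in which the pass does not appear are skipped; returns `None`
/// if it appears in no run at all.
fn mean_pass_time(logs: &[&str], pass: &str) -> Option<f64> {
    let times: Vec<f64> = logs
        .iter()
        .filter_map(|log| pass_time(log, pass))
        .collect();
    if times.is_empty() {
        return None;
    }
    Some(times.iter().sum::<f64>() / times.len() as f64)
}
```

Calling `mean_pass_time(&logs, "rvalue promotion")` once for the logs taken before the change and once for those taken after it gives two means that can be compared directly. Reporting the spread between runs as well would make the comparison more trustworthy.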

Centril added the S-waiting-on-author label (Status: This is awaiting some action, such as code changes or more information, from the author) and removed the S-waiting-on-review label (Status: Awaiting review from the assignee but also interested parties) on Feb 23, 2019
@Centril
Contributor

Centril commented Feb 23, 2019

Ping from triage, @TheDarkula, it seems like we're awaiting more profiling info?

@TheDarkula
Contributor Author

@Centril I'll recompile and get back soon

@TheDarkula
Contributor Author

Closing in favour of the upcoming #58679

TheDarkula closed this Mar 3, 2019
Centril added a commit to Centril/rust that referenced this pull request Mar 9, 2019
…ister

Refactor passes and pass execution to be more parallel

For `syntex_syntax` (with 16 threads and 8 cores):
- Cuts `misc checking 1` from `0.096s` to `0.08325s`.
- Cuts `misc checking 2` from `0.3575s` to `0.2545s`.
- Cuts `misc checking 3` from `0.34625s` to `0.21375s`.
- Cuts `wf checking` from `0.3085s` to `0.05025s`.

Reduces overall execution time for `syntex_syntax` (with 8 threads and cores) from `4.92s` to `4.34s`.

Subsumes rust-lang#58494
Blocked on rust-lang#58250

r? @michaelwoerister
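
To put the figures quoted in that commit message on a common scale, the relative reduction can be computed from each before/after pair. This is just a sketch of the arithmetic; `reduction_percent` is a hypothetical helper, not something from the referenced commit.

```rust
/// Percentage by which `after` is lower than `before`.
/// A negative result means the time went up rather than down.
fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// The (label, before, after) timings in seconds quoted in the commit message.
fn quoted_timings() -> [(&'static str, f64, f64); 5] {
    [
        ("misc checking 1", 0.096, 0.08325),
        ("misc checking 2", 0.3575, 0.2545),
        ("misc checking 3", 0.34625, 0.21375),
        ("wf checking", 0.3085, 0.05025),
        ("overall", 4.92, 4.34),
    ]
}
```

Applied to the quoted numbers, this gives reductions of roughly 13%, 29%, 38% and 84% for the four passes, and about 12% for the overall execution time.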