Boxed tokens #34

Closed
wants to merge 4 commits into from


Collaborator

@CAD97 CAD97 commented Nov 2, 2019

This PR boxes and deduplicates GreenToken (behind an Arc) to shrink GreenElement and, with it, the allocation for a GreenNode's children. It also applies a minimal flexible array member technique to GreenNode, allocating its children inline.
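To make "children allocated inline" concrete, here is a minimal sketch of a header-plus-unsized-tail node. It is not the code in this PR: it uses safe unsizing coercion from a fixed-size array as a stand-in for manual allocation, and `Node`, `make_node`, and the `u32` child type are hypothetical names chosen for illustration.

```rust
use std::sync::Arc;

/// Hypothetical node: a sized header followed by an unsized tail.
/// When `C` is a slice, header and children share one allocation.
struct Node<C: ?Sized> {
    kind: u16,
    children: C,
}

/// Builds a node with three children stored inline after the header.
fn make_node(kind: u16) -> Arc<Node<[u32]>> {
    // Start from a fixed-size array so the value is Sized...
    let sized: Arc<Node<[u32; 3]>> = Arc::new(Node { kind, children: [10, 20, 30] });
    // ...then let the compiler unsize it. The Arc becomes a fat pointer
    // (data pointer + child count), but the allocation is unchanged.
    let dynamic: Arc<Node<[u32]>> = sized;
    dynamic
}
```

The limitation of this stand-in is that the child count must be known at compile time; a real flexible array member needs a manual allocation with a computed layout, which is where the unsafe code comes from.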

This is as far as the technique can go using the standard Arc and NodeOrToken. GreenElement = NodeOrToken<Arc<GreenNode>, Arc<GreenToken>> is 3xusize in size, but can theoretically be niched to 2xusize. (NB: the same niche optimization can be applied to SyntaxElement.)

I've split this PR up into commits each representing a logical step.

Type            Size (bytes)
GreenToken      32
Arc<GreenNode>  16
Arc<GreenToken> 8
GreenElement    24

SyntaxNode      8
SyntaxToken     16
SyntaxElement   24
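For anyone who wants to check pointer sizes like these locally, a rough sketch follows. The type definitions are simplified stand-ins, not rowan's real ones (the field choices are assumptions), so only the pointer and enum sizes are meaningful. Note that the enum size depends on the compiler: newer rustc versions may already niche it down to two words.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Stand-in for the NodeOrToken enum used by GreenElement.
#[allow(dead_code)]
enum NodeOrToken<N, T> {
    Node(N),
    Token(T),
}

// Hypothetical token payload; the real GreenToken differs.
#[allow(dead_code)]
struct GreenToken {
    kind: u16,
    text: String,
}

// Unsized node: the children slice is the trailing field, so
// Arc<GreenNode> is a fat pointer (address + child count).
#[allow(dead_code)]
struct GreenNode {
    kind: u16,
    children: [GreenElement],
}

type GreenElement = NodeOrToken<Arc<GreenNode>, Arc<GreenToken>>;

/// Returns the sizes of (Arc<GreenNode>, Arc<GreenToken>, GreenElement) in bytes.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<Arc<GreenNode>>(),
        size_of::<Arc<GreenToken>>(),
        size_of::<GreenElement>(),
    )
}
```

On a 64-bit target the first two values should come out as 16 and 8; the third is 24 without niching and 16 with it.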

It is possible to optimize size further, but I'd like to propose those in a second PR on top of this one. Remaining potential optimization steps along these lines:

  • Implement a specific GreenElement to manually niche Arc<GreenNode> and Arc<GreenToken>.
  • Implement a specific SyntaxElement to manually niche SyntaxNode and SyntaxToken.
  • Implement a specific Arc<GreenNode> to make ptr-to-GreenNode a thin ptr instead of a fat ptr.
    • Standard GreenElement = NodeOrToken would be 2xusize. Manually non-zero-cost niched GreenElement (tag in alignment bits) would be 1xusize.
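The "tag in alignment bits" idea from the last bullet can be sketched as follows. This is an illustration only, not the proposed implementation: it uses Box instead of Arc, plain integer casts instead of provenance-preserving pointer operations, and the names `Packed`, `Unpacked`, `Node`, and `Token` are hypothetical.

```rust
// Both payloads have alignment 4, so the low bit of their address is always 0.
struct Node { kind: u32 }
struct Token { kind: u32 }

const TAG_MASK: usize = 0b1;

/// One-word element: low bit 0 means node, low bit 1 means token.
struct Packed(usize);

enum Unpacked<'a> {
    Node(&'a Node),
    Token(&'a Token),
}

impl Packed {
    fn node(n: Box<Node>) -> Packed {
        let addr = Box::into_raw(n) as usize;
        debug_assert_eq!(addr & TAG_MASK, 0);
        Packed(addr)
    }

    fn token(t: Box<Token>) -> Packed {
        let addr = Box::into_raw(t) as usize;
        debug_assert_eq!(addr & TAG_MASK, 0);
        Packed(addr | 1)
    }

    /// Every access pays for a mask and a branch: this is the "non-zero-cost" part.
    fn get(&self) -> Unpacked<'_> {
        let addr = self.0 & !TAG_MASK;
        if self.0 & TAG_MASK == 0 {
            // SAFETY: untagged values were created from a live Box<Node>.
            Unpacked::Node(unsafe { &*(addr as *const Node) })
        } else {
            // SAFETY: tagged values were created from a live Box<Token>.
            Unpacked::Token(unsafe { &*(addr as *const Token) })
        }
    }
}

impl Drop for Packed {
    fn drop(&mut self) {
        let addr = self.0 & !TAG_MASK;
        // SAFETY: the tag records which Box type the address came from.
        unsafe {
            if self.0 & TAG_MASK == 0 {
                drop(Box::from_raw(addr as *mut Node));
            } else {
                drop(Box::from_raw(addr as *mut Token));
            }
        }
    }
}
```

The trade-off is the one named in the bullet: the element shrinks to a single usize, but the discriminant is no longer free to read.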

Results in rust-analyzer:

With this branch
PS D:\usr\Documents\Code\Rust\rust-analyzer> cargo run --bin ra_cli --release -- analysis-stats .
    Finished release [optimized + debuginfo] target(s) in 0.46s
     Running `target\release\ra_cli.exe analysis-stats .`
Database loaded, 221 roots, 1.0601376s
Crates in this dir: 27
Total modules found: 331
Total declarations: 11135
Total functions: 3839
Item Collection: 11.8499283s, 0b allocated 0b resident
Total expressions: 89244
Expressions of unknown type: 6960 (7%)
Expressions of partially unknown type: 3522 (3%)
Type mismatches: 3568
Inference: 36.3460289s, 0b allocated 0b resident
Total: 48.1964408s, 0b allocated 0b resident
PS D:\usr\Documents\Code\Rust\rust-analyzer> Measure-Command { type "D:\rust-lang\src\libcore\unicode\tables.rs" | cargo run --bin ra_cli --release -- symbols }
    Finished release [optimized + debuginfo] target(s) in 0.45s


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 738
Ticks             : 7380228
TotalDays         : 8.54193055555555E-06
TotalHours        : 0.000205006333333333
TotalMinutes      : 0.01230038
TotalSeconds      : 0.7380228
TotalMilliseconds : 738.0228



PS D:\usr\Documents\Code\Rust\rust-analyzer> Measure-Command { type "D:\rust-lang\src\libcore\unicode\tables.rs" | cargo run --bin ra_cli --release -- parse --no-dump } 
    Finished release [optimized + debuginfo] target(s) in 0.44s
     Running `target\release\ra_cli.exe parse --no-dump`


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 603
Ticks             : 6035426
TotalDays         : 6.98544675925926E-06
TotalHours        : 0.000167650722222222
TotalMinutes      : 0.0100590433333333
TotalSeconds      : 0.6035426
TotalMilliseconds : 603.5426


Without this branch (5451bfb9)
PS D:\usr\Documents\Code\Rust\rust-analyzer> cargo run --bin ra_cli --release -- analysis-stats .
    Finished release [optimized + debuginfo] target(s) in 0.45s
     Running `target\release\ra_cli.exe analysis-stats .`
Database loaded, 220 roots, 1.0174838s
Crates in this dir: 27
Total modules found: 331
Total declarations: 11135
Total functions: 3839
Item Collection: 10.509602s, 0b allocated 0b resident
Total expressions: 89241                                                                                                                                                 
Expressions of unknown type: 6959 (7%)
Expressions of partially unknown type: 3522 (3%)
Type mismatches: 3569
Inference: 34.963529s, 0b allocated 0b resident
Total: 45.4737377s, 0b allocated 0b resident
PS D:\usr\Documents\Code\Rust\rust-analyzer> Measure-Command { type "D:\rust-lang\src\libcore\unicode\tables.rs" | cargo run --bin ra_cli --release -- symbols }         
    Finished release [optimized + debuginfo] target(s) in 0.44s


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 587
Ticks             : 5875475
TotalDays         : 6.80031828703704E-06
TotalHours        : 0.000163207638888889
TotalMinutes      : 0.00979245833333333
TotalSeconds      : 0.5875475
TotalMilliseconds : 587.5475



PS D:\usr\Documents\Code\Rust\rust-analyzer> Measure-Command { type "D:\rust-lang\src\libcore\unicode\tables.rs" | cargo run --bin ra_cli --release -- parse --no-dump } 
    Finished release [optimized + debuginfo] target(s) in 0.44s
     Running `target\release\ra_cli.exe parse --no-dump`


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 573
Ticks             : 5737453
TotalDays         : 6.64057060185185E-06
TotalHours        : 0.000159373694444444
TotalMinutes      : 0.00956242166666667
TotalSeconds      : 0.5737453
TotalMilliseconds : 573.7453


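I can't test allocation pressure on Windows. As it stands, this branch is a consistent loss on wall-clock time: it is slower on all three runs above (roughly 6% on analysis-stats, 26% on symbols, and 5% on parse).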

@CAD97
Collaborator Author

CAD97 commented Nov 5, 2019

Oh and by the way: I ran cargo miri test and everything passes here. Unfortunately, it does appear that miri does not like flexible array members (for further optimization).

@CAD97 CAD97 mentioned this pull request Nov 5, 2019
@@ -97,6 +96,7 @@ impl SyntaxText {

pub fn for_each_chunk<F: FnMut(&str)>(&self, mut f: F) {
enum Void {}
#[allow(clippy::unit_arg)]
Member

@matklad matklad Nov 9, 2019

I personally don't like clippy-related attributes in the source. Could it be removed?

Moreover, in this case, this is clearly a false-positive, as () is spelled explicitly :)

@@ -191,7 +191,6 @@ fn zip_texts<I: Iterator<Item = (SyntaxToken, TextRange)>>(xs: &mut I, ys: &mut
x.1 = TextRange::from_to(x.1.start(), x.1.len() - advance);
y.1 = TextRange::from_to(y.1.start(), y.1.len() - advance);
}
None
Member

Good catch!

@@ -59,7 +58,7 @@ impl SyntaxText {

pub fn slice<R: private::SyntaxTextRange>(&self, range: R) -> SyntaxText {
let start = range.start().unwrap_or_default();
- let end = range.end().unwrap_or(self.len());
+ let end = range.end().unwrap_or_else(|| self.len());
Member

This also seems like a false-positive to me: everything here should be inlinable and free of side effects and allocations, so I'd be surprised if there were any significant perf difference. OTOH, readability suffers because of the lambda.

On a more general note, I do appreciate fixing clippy lints (even if I don't agree with some particular ones), but it might be prudent to separate "hey, I've run clippy" and "hey, I've completely rewritten the core of this library, it's super fast now. Oh, and I've added a dozen unsafe blocks" pull requests :D
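For context on what the lint is about: `unwrap_or` evaluates its argument eagerly, while `unwrap_or_else` only runs the closure on `None`. A small sketch with a call counter shows the difference; `eager_vs_lazy` and the counting closure are made up for illustration and have nothing to do with `SyntaxText::len`.

```rust
use std::cell::Cell;

/// Returns how many times the fallback ran for (unwrap_or, unwrap_or_else)
/// when the Option is already `Some`.
fn eager_vs_lazy() -> (u32, u32) {
    let calls = Cell::new(0u32);
    // Stand-in for a fallback computation; counts its own invocations.
    let len = || {
        calls.set(calls.get() + 1);
        10u32
    };
    let value = Some(3u32);

    // The argument is computed before unwrap_or is even called.
    let _ = value.unwrap_or(len());
    let eager = calls.get();

    calls.set(0);
    // The closure is only invoked for None, so it never runs here.
    let _ = value.unwrap_or_else(|| len());
    let lazy = calls.get();

    (eager, lazy)
}
```

When the fallback is a trivial, inlinable getter, the extra call is unlikely to be measurable, which is the point made above.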

@@ -146,10 +146,10 @@ impl<L: Language> fmt::Display for SyntaxElement<L> {
}

impl<L: Language> SyntaxNode<L> {
- pub fn new_root(green: GreenNode) -> SyntaxNode<L> {
+ pub fn new_root(green: Arc<GreenNode>) -> SyntaxNode<L> {
Member

API wise, it might be a good idea to hide the fact that we use Arc internally, and just document that GreenNode is cheap to clone.
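One way to do that is a newtype over the Arc, so callers only ever see `GreenNode` and its `Clone` impl. A minimal sketch, assuming a simplified payload; `GreenNodeData` and its fields are hypothetical here, and `ptr_eq` exists only to demonstrate that clones share one allocation.

```rust
use std::sync::Arc;

// Private payload; callers never name this type.
struct GreenNodeData {
    kind: u16,
    text_len: u32,
}

/// Cheap to clone: cloning only bumps a reference count.
#[derive(Clone)]
pub struct GreenNode {
    data: Arc<GreenNodeData>,
}

impl GreenNode {
    pub fn new(kind: u16, text_len: u32) -> GreenNode {
        GreenNode { data: Arc::new(GreenNodeData { kind, text_len }) }
    }

    pub fn kind(&self) -> u16 {
        self.data.kind
    }

    pub fn text_len(&self) -> u32 {
        self.data.text_len
    }

    /// Demonstration helper: true if both handles point at the same allocation.
    pub fn ptr_eq(a: &GreenNode, b: &GreenNode) -> bool {
        Arc::ptr_eq(&a.data, &b.data)
    }
}
```

With this shape, `new_root` could keep taking `GreenNode` by value, and the choice of Arc stays an implementation detail.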

@@ -125,7 +124,7 @@ impl FreeList {
for _ in 0..FREE_LIST_LEN {
res.try_push(&mut Rc::new(NodeData {
kind: Kind::Free { next_free: None },
green: ptr::NonNull::dangling(),
Member

Hm, ptr::NonNull::dangling seems much more straightforward to me than the dummy node shenanigans. Why do we need this change?

Collaborator Author

@CAD97 CAD97 Nov 9, 2019

ptr::NonNull::dangling (as well as ptr::dangling) requires T: Sized.

The next PR changes this to provide a dangling fn.
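To illustrate the `T: Sized` restriction: for a plain slice pointee, a dangling fat pointer can be assembled from a dangling element pointer plus a length of zero. This is only a sketch of the slice case, using `NonNull::slice_from_raw_parts` (stable since Rust 1.70); `dangling_slice` and `is_aligned_and_empty` are hypothetical helpers, not the fn from the follow-up PR, and a node type with a header before its slice would need more care.

```rust
use std::mem::align_of;
use std::ptr::NonNull;

/// Dangling, well-aligned, zero-length slice pointer. Must never be dereferenced.
fn dangling_slice<T>() -> NonNull<[T]> {
    // NonNull::<T>::dangling() is fine because T itself is Sized;
    // only the resulting [T] pointee is unsized.
    NonNull::slice_from_raw_parts(NonNull::<T>::dangling(), 0)
}

/// Checks the two properties a placeholder pointer should have.
fn is_aligned_and_empty<T>(p: NonNull<[T]>) -> bool {
    let addr = p.as_ptr() as *mut T as usize;
    p.len() == 0 && addr % align_of::<T>() == 0
}
```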

@CAD97
Collaborator Author

CAD97 commented Nov 14, 2019

See #35 now.

@CAD97 CAD97 closed this Nov 14, 2019
@CAD97 CAD97 deleted the boxed-tokens branch November 14, 2019 22:37