Improve memory usage for large files #30180

alexdima · 2017-07-06T07:57:17Z

For each file below: open the file and wait for ~30 s before measuring.

1. file A

Download Loan Data for 2016 Q3 from https://www.lendingclub.com/info/download-data.action

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 353 MB | 163 MB | 90 MB | 952.0 B |
| OSX Heap Snapshot | 50 MB | 138 MB | 88 MB | 15 MB | 158.6 B |

2. file B

Download checker.ts from https://github.com/Microsoft/TypeScript/blob/296660a2a077c32a2ed41cb762ef530031e56417/src/compiler/checker.ts

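Generate a file containing 128 copies of checker.ts:

```js
const fs = require('fs');

// Doubling the contents 7 times yields 2^7 = 128 copies of the original file.
let str = fs.readFileSync('checker.ts').toString();
for (let i = 0; i < 7; i++) {
    str = str + '\n' + str;
}
fs.writeFileSync('checker-out.ts', str);
```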

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 1474 MB | 1284 MB | 1107 MB | 366.0 B |
| OSX Heap Snapshot | 50 MB | 716 MB | 666 MB | 489 MB | 161.7 B |
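
The derived columns in these tables appear to follow from the two raw readings: actual usage is the reading with the file open minus the empty-file reading, overhead is actual usage minus the file size, and overhead per line divides that by the line count. A minimal sketch of that arithmetic, assuming "MB" means 1024 * 1024 bytes (an assumption that reproduces the reported figures); the function name `deriveColumns` is hypothetical:

```js
// Assumption: 1 MB = 1024 * 1024 bytes, which matches the per-line figures reported.
const MB = 1024 * 1024;

// Hypothetical helper: recompute the derived table columns from the raw readings.
function deriveColumns(emptyMB, loadedMB, fileSizeMB, lineCount) {
    const actualUsageMB = loadedMB - emptyMB;              // memory attributable to the file
    const overheadMB = actualUsageMB - fileSizeMB;         // memory beyond the raw file contents
    const overheadPerLine = (overheadMB * MB) / lineCount; // bytes of overhead per line
    return { actualUsageMB, overheadMB, overheadPerLine };
}

// Heap Snapshot rows of the initial measurements for file A and file B
const fileA = deriveColumns(50, 138, 73, 99127);
const fileB = deriveColumns(50, 716, 177, 3171200);

console.log(fileA, fileB);
```

For the Heap Snapshot rows this gives roughly 158.67 B per line for file A and 161.69 B per line for file B, in line with the table values.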

OSX shallow size taken by a ModelLine (via Heap Snapshot) - 72 bytes


alexdima · 2017-07-06T09:34:08Z

Removing ModelLine._lineNumber

OSX shallow size taken by a ModelLine (via Heap Snapshot) - 64 bytes

1. file A

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 348 MB | 158 MB | 85 MB | 899.1 B |
| OSX Heap Snapshot | 50 MB | 136 MB | 86 MB | 13 MB | 137.51 B |

2. file B

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 1194 MB | 1004 MB | 827 MB | 273.45 B |
| OSX Heap Snapshot | 50 MB | 692 MB | 642 MB | 465 MB | 153.75 B |

alexdima · 2017-07-06T10:54:21Z

Using `MinimalModelLine`

OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes

1. file A

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 347 MB | 157 MB | 84 MB | 888.6 B |
| OSX Heap Snapshot | 50 MB | 133 MB | 83 MB | 10 MB | 105.8 B |

2. file B

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 1078 MB | 888 MB | 711 MB | 235.1 B |
| OSX Heap Snapshot | 50 MB | 577 MB | 527 MB | 350 MB | 115.7 B |

alexdima · 2017-07-06T13:17:23Z

Using `MinimalModelLine` with lazy markers

OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 348 MB | 158 MB | 85 MB | 899.1 B |
| OSX Heap Snapshot | 50 MB | 132 MB | 82 MB | 9 MB | 95.2 B |

2. file B

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 991 MB | 801 MB | 624 MB | 206.3 B |
| OSX Heap Snapshot | 50 MB | 553 MB | 503 MB | 326 MB | 107.8 B |
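
The thread does not show how the lazy markers work; the related commit title ("Add lazy markers assignment to MinimalModelLine") suggests that marker storage is only assigned once a line actually receives a marker. A minimal sketch of that general lazy-allocation pattern, with a hypothetical class and hypothetical field names (this is not the VS Code implementation):

```js
// Hypothetical illustration of lazy allocation; names are not from the VS Code source.
class LazyMarkersLine {
    constructor(text) {
        this.text = text;
        this._markers = null; // no array is allocated until a marker is added
    }

    addMarker(marker) {
        if (this._markers === null) {
            this._markers = []; // allocate on first use only
        }
        this._markers.push(marker);
    }

    hasMarkers() {
        return this._markers !== null && this._markers.length > 0;
    }

    getMarkers() {
        // Return a copy so callers cannot mutate the internal array.
        return this._markers === null ? [] : this._markers.slice();
    }
}

const plainLine = new LazyMarkersLine('no markers here');
const markedLine = new LazyMarkersLine('has a marker');
markedLine.addMarker({ id: 'm1', column: 3 });
console.log(plainLine.hasMarkers(), markedLine.hasMarkers());
```

In a file with millions of lines where only a handful carry markers, a pattern like this avoids one empty array per line.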


alexdima · 2017-07-06T14:27:32Z

Fixing the array leak of #30189

OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 344 MB | 154 MB | 81 MB | 856.8 B |
| OSX Heap Snapshot | 50 MB | 132 MB | 82 MB | 9 MB | 95.2 B |

2. file B

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 971 MB | 781 MB | 604 MB | 199.71 B |
| OSX Heap Snapshot | 50 MB | 519 MB | 469 MB | 292 MB | 96.55 B |

alexdima · 2017-07-06T18:05:34Z

Using `IdentityLinesCollection`

OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 344 MB | 154 MB | 81 MB | 856.8 B |
| OSX Heap Snapshot | 50 MB | 129 MB | 79 MB | 6 MB | 63.5 B |

2. file B

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 901 MB | 711 MB | 534 MB | 176.5 B |
| OSX Heap Snapshot | 50 MB | 461 MB | 411 MB | 234 MB | 77.4 B |

alexdima · 2017-07-06T19:39:59Z

Using a hack to avoid `(sliced string)`

OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| Measurement | empty file | 73 MB, 99127 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 341 MB | 151 MB | 78 MB | 825.1 B |
| OSX Heap Snapshot | 50 MB | 127 MB | 77 MB | 4 MB | 42.31 B |

2. file B

| Measurement | empty file | 177 MB, 3171200 lines | actual usage | overhead | overhead per line |
| --- | --- | --- | --- | --- | --- |
| OSX RESIDENT | 190 MB | 928 MB | 738 MB | 561 MB | 185.5 B |
| OSX Heap Snapshot | 50 MB | 425 MB | 375 MB | 198 MB | 65.5 B |

alexdima · 2017-07-06T20:25:07Z

Conclusions

The resident memory usage is ... random, so looking at the Heap Snapshot (which is something that makes sense):

Edit: turns out the resident memory usage measurements were all skewed by the web worker receiving these large files. The web worker no longer receives such large files.

`model memory = c0 + file size + (x + 32) * lineCount`, where 32 comes from our allocating a `MinimalModelLine` per line and x depends on the line length (i.e. the overhead of a string object around a character array). x is smaller for long lines and larger for short lines; it is around 12 to 36 bytes.
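
To get a feel for this model, here is a minimal sketch that evaluates it for file B at both ends of the stated range for x. The function name `estimateModelMemory`, the default value of x, and treating c0 as 0 are assumptions made for illustration; the thread does not give exact values for c0 or x.

```js
const MB = 1024 * 1024; // assumption: MB means 1024 * 1024 bytes

// Hypothetical helper implementing: c0 + file size + (x + 32) * lineCount
function estimateModelMemory(fileSizeBytes, lineCount, { x = 24, c0 = 0 } = {}) {
    const PER_LINE_OBJECT = 32; // bytes for one MinimalModelLine, as stated in the thread
    return c0 + fileSizeBytes + (x + PER_LINE_OBJECT) * lineCount;
}

// File B: 177 MB, 3171200 lines, evaluated at x = 12 and x = 36
const low = estimateModelMemory(177 * MB, 3171200, { x: 12 }) / MB;
const high = estimateModelMemory(177 * MB, 3171200, { x: 36 }) / MB;

console.log(low.toFixed(1), high.toFixed(1));
```

With c0 taken as 0, the estimate for file B lands between roughly 310 MB and 383 MB, and the final Heap Snapshot measurement of 375 MB falls inside that range.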

| Step | file A | gain | file B | gain |
| --- | --- | --- | --- | --- |
| initial | 88 MB | - | 666 MB | - |
| _lineNumber | 86 MB | 2.27% | 642 MB | 3.60% |
| MinimalModelLine | 83 MB | 3.49% | 527 MB | 17.91% |
| lazy markers | 82 MB | 1.2% | 503 MB | 4.55% |
| leak #30189 | 82 MB | 0% | 469 MB | 6.76% |
| IdentityLinesCollection | 79 MB | 3.66% | 411 MB | 12.37% |
| (sliced string) hack | 77 MB | 2.53% | 375 MB | 8.76% |
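
Each gain in this table appears to be the reduction relative to the previous row, not relative to the initial measurement. A small sketch that recomputes the file B column under that reading (the function name `stepGains` is hypothetical):

```js
// Hypothetical helper: percentage reduction of each value relative to the one before it.
function stepGains(valuesMB) {
    return valuesMB.slice(1).map((value, i) => ((valuesMB[i] - value) / valuesMB[i]) * 100);
}

// Heap Snapshot "actual usage" for file B after each step, in MB
const fileBUsage = [666, 642, 527, 503, 469, 411, 375];

const gains = stepGains(fileBUsage).map((g) => g.toFixed(2));
const totalGain = ((fileBUsage[0] - fileBUsage[fileBUsage.length - 1]) / fileBUsage[0]) * 100;

console.log(gains.join(', '));    // 3.60, 17.91, 4.55, 6.76, 12.37, 8.76
console.log(totalGain.toFixed(1)); // 43.7
```

The per-step figures do not add up to the overall reduction because each one uses a different baseline.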

Totals - file A - 73 MB - 99,127 lines

OSX resident memory usage for the file: 163 MB -> 87 MB : 46.6% less
Heap total memory usage for the file: 88 MB -> 77 MB : 12.5% less
Heap overhead over file size: 15 MB -> 4 MB : 73.3% less

Totals - file B - 177 MB - 3,171,200 lines

OSX resident memory usage for the file: 1284 MB -> 340 MB : 73.5% less
Heap total memory usage for the file: 666 MB -> 375 MB : 43.7% less
Heap overhead over file size: 489 MB -> 198 MB : 59.5% less

Final test on win 32 bits

| File | Private Bytes | Working Set | Heap (via Timeline) |
| --- | --- | --- | --- |
| empty.txt | 107 MB | 147 MB | 33 MB |
| file A - 73 MB - 99,127 lines | 165 MB | 219 MB | 105 MB |
| file B - 177 MB - 3,171,200 lines | 361 MB | 415 MB | 336 MB |
| file 2*B - 351 MB - 6,342,399 lines | 627 MB | 679 MB | 561 MB |
| file 4*B - 700 MB - 12MM lines | failed | failed | failed |

Final test on win 64 bits

| File | Private Bytes | Working Set | Heap (via Timeline) |
| --- | --- | --- | --- |
| empty.txt | 128 MB | 201 MB | 56.5 MB |
| file A - 73 MB - 99,127 lines | 225 MB | 268 MB | 125 MB |
| file 2*A - 139 MB - 198,253 lines | 266 MB | 287 MB | 201 MB |
| file 4*A - 278 MB - 396,505 lines | 418 MB | 435 MB | 352 MB |
| file 8*A - 557 MB - 793,009 lines | 737 MB | 749 MB | 658 MB |
| file 16*A - 1,115 MB - 1,586,017 lines | 1,365 MB | 1,367 MB | 1,264 MB |
| file 32*A - 2,230 MB - x lines | failed | failed | failed |
| file B - 177 MB - 3,171,200 lines | 612 MB | 688 MB | 414 MB |
| file 2*B - 351 MB - 6,342,399 lines | 856 MB | 919 MB | 776 MB |
| file 4*B - 700 MB - 12MM lines | failed | failed | failed |

It's hard to say what our limits should be based on the above results, since failure is not determined by file size alone or by line count alone, but by a combination of the two. I suggest:

- we refuse to open files above 300 MB on win 32 bit
- we open any file size everywhere else
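
The suggested policy can be sketched as a simple check. This is an illustration only: the function name `shouldOpenFile` and the constant are hypothetical, and later commits in this thread ("Also enter the large file case based on line count" and "Tweak file limits based on experiments") indicate the limits that shipped may differ. The sketch relies on Node's `process.platform` reporting `'win32'` on Windows and `process.arch` reporting `'ia32'` for 32-bit x86 builds.

```js
const MB = 1024 * 1024;
const WIN32_BIT_LIMIT_BYTES = 300 * MB; // the 300 MB threshold suggested above

// Hypothetical helper: decide whether to open a file under the suggested policy.
function shouldOpenFile(fileSizeBytes, platform, arch) {
    const isWindows32Bit = platform === 'win32' && arch === 'ia32';
    if (isWindows32Bit) {
        return fileSizeBytes <= WIN32_BIT_LIMIT_BYTES;
    }
    return true; // any file size everywhere else
}

// Example: check a 351 MB file against the current process
const canOpen = shouldOpenFile(351 * MB, process.platform, process.arch);
console.log(canOpen);
```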


Chillee · 2017-07-10T08:25:13Z

This seems like a pretty big improvement! I'm surprised to not see it on the iteration plan.

egamma · 2017-07-10T13:36:34Z

@Chillee it is on the July iteration plan: #30209

alexdima added the feature-request label Jul 6, 2017

alexdima added this to the July 2017 milestone Jul 6, 2017

alexdima self-assigned this Jul 6, 2017

alexdima added a commit that referenced this issue Jul 6, 2017

Eliminate ModelLine._lineNumber usage in ModelLine.append (#30180)

1b982d3

alexdima added a commit that referenced this issue Jul 6, 2017

Remove ModelLine._lineNumber (#30180)

93d755c

alexdima added a commit that referenced this issue Jul 6, 2017

Extract IModelLine (#30180)

c851d5d

alexdima added a commit that referenced this issue Jul 6, 2017

Explicit getters and setters for IModelLine.isInvalid (#30180)

08f5e5d

alexdima added a commit that referenced this issue Jul 6, 2017

Extract AbstractModelLine (#30180)

2a0b6e6

alexdima added a commit that referenced this issue Jul 6, 2017

Use IModelLine in split and append (#30180)

80cfe84

alexdima added a commit that referenced this issue Jul 6, 2017

Clarify usage of TextModel.isTooLargeForHavingAMode() (#30180)

4f3327d

alexdima added a commit that referenced this issue Jul 6, 2017

Restrict usages of new ModelLine (#30180)

3f34ed5

alexdima added a commit that referenced this issue Jul 6, 2017

Add and use MinimalModelLine (#30180)

5aa37a7

alexdima added a commit that referenced this issue Jul 6, 2017

Add lazy markers assignment to MinimalModelLine (#30180)

63621e5

alexdima added a commit that referenced this issue Jul 6, 2017

Fixes #30189: Do not leak lines array in change listener closure (#30180)

57583bf

alexdima added a commit that referenced this issue Jul 6, 2017

Expose TextModel.isTooLargeForTokenization() (#30180)

c95d90d

alexdima added a commit that referenced this issue Jul 6, 2017

Extract IViewModelLinesCollection (#30180)

810164a

alexdima added a commit that referenced this issue Jul 6, 2017

Add and use IdentityCoordinatesConverter (#30180)

eb1d2a5

alexdima added a commit that referenced this issue Jul 6, 2017

Avoid (sliced string) (#30180)

bf3ea73

kieferrm mentioned this issue Jul 6, 2017

Iteration Plan for July 2017 #30209

Closed

39 tasks

This was referenced Jul 7, 2017

Best way to avoid (sliced string) ? nodejs/help#711

Closed

How to deal with strings longer than String::kMaxValue nodejs/help#712

Closed

Crash when opening a 35MB, 13.7MM lines txt file #13187

Closed

vscode does not open large files #9832

Closed

This was referenced Jul 7, 2017

VS Code fails to open big files (60MB) #6474

Closed

Test plan item for large files #30243

Closed

alexdima added a commit that referenced this issue Jul 7, 2017

Avoid (sliced string) slightly faster (#30180)

398a6c5

alexdima added a commit that referenced this issue Jul 7, 2017

Also enter the large file case based on line count, not just on file size (#30180)

16151e8

alexdima added a commit that referenced this issue Jul 7, 2017

Do not compute a hash if it is not used (#30180)

41d2abe

alexdima added a commit that referenced this issue Jul 7, 2017

The indentation guesser only looks at the first 10k lines (#30180)

eb2601b

alexdima added a commit that referenced this issue Jul 7, 2017

Hint lines array size for faster model creation (#30180)

32aebe6

alexdima added a commit that referenced this issue Jul 7, 2017

Do not sync large models to web workers (#30180)

e633f3b

alexdima added a commit that referenced this issue Jul 7, 2017

Tweak file limits based on experiments (#30180)

c5bf26a

alexdima closed this as completed Jul 7, 2017

stef-levesque mentioned this issue Jul 13, 2017

Open a ~15MB file stef-levesque/vscode-hexdump#22

Closed

weinand added the on-testplan label Jul 27, 2017

vscodebot bot locked and limited conversation to collaborators Nov 17, 2017
