Improve memory usage for large files #30180

Closed
alexdima opened this issue Jul 6, 2017 · 9 comments

alexdima commented Jul 6, 2017

e.g.

  • open the file and wait for ~ 30s

1. file A

Download Loan Data for 2016 Q3 from https://www.lendingclub.com/info/download-data.action

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 353 MB | 163 MB | 90 MB | 952.0 B |
| OSX Heap Snapshot | 50 MB | 138 MB | 88 MB | 15 MB | 158.6 B |

2. file B

Download checker.ts from https://github.com/Microsoft/TypeScript/blob/296660a2a077c32a2ed41cb762ef530031e56417/src/compiler/checker.ts

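Generate a file containing 128 copies of checker.ts:

// Node.js script: each pass doubles the content, so 7 passes give 2^7 = 128 copies.
const fs = require('fs');
let str = fs.readFileSync('checker.ts', 'utf8');
for (let i = 0; i < 7; i++) {
    str = str + '\n' + str;
}
fs.writeFileSync('checker-out.ts', str);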
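
As a quick sanity check on the script above, here is a small sketch showing that seven doublings yield 2^7 = 128 copies and that joining with '\n' doubles the line count on every pass. The helper name `repeatByDoubling` and the three-line sample are illustrative only and do not come from the issue. The result is consistent with the reported 3171200 lines, which equals 128 × 24775.

```javascript
// Hypothetical helper: same doubling loop as the script, but on an in-memory sample.
function repeatByDoubling(text, doublings) {
  let out = text;
  for (let i = 0; i < doublings; i++) {
    out = out + '\n' + out; // each pass doubles both the copies and the line count
  }
  return out;
}

const sample = 'a\nb\nc'; // 3 lines, stands in for checker.ts
const big = repeatByDoubling(sample, 7);
const lineCount = big.split('\n').length;

console.log(lineCount, lineCount / 3); // line count and number of copies
```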

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 1474 MB | 1284 MB | 1107 MB | 366.0 B |
| OSX Heap Snapshot | 50 MB | 716 MB | 666 MB | 489 MB | 161.7 B |
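
The derived columns in the tables above appear to follow from simple subtraction: actual usage is the measurement with the file open minus the empty baseline, overhead is actual usage minus the file size, and overhead per line divides that by the line count. The issue does not spell this out, so the sketch below is an assumption; the function name `deriveColumns` is illustrative, and treating 1 MB as 1024 × 1024 bytes is a guess that happens to reproduce the reported figures to within rounding.

```javascript
const MB = 1024 * 1024; // assumption: binary megabytes

// Hypothetical helper that recomputes the derived table columns.
function deriveColumns({ emptyMB, withFileMB, fileSizeMB, lineCount }) {
  const actualUsageMB = withFileMB - emptyMB;       // memory attributable to the file
  const overheadMB = actualUsageMB - fileSizeMB;    // memory beyond the raw file size
  const overheadPerLineB = (overheadMB * MB) / lineCount;
  return { actualUsageMB, overheadMB, overheadPerLineB };
}

// Inputs taken from the file A (Heap Snapshot) and file B (RESIDENT) rows.
const fileAHeap = deriveColumns({ emptyMB: 50, withFileMB: 138, fileSizeMB: 73, lineCount: 99127 });
const fileBResident = deriveColumns({ emptyMB: 190, withFileMB: 1474, fileSizeMB: 177, lineCount: 3171200 });

console.log(fileAHeap, fileBResident);
```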

  • OSX shallow size taken by a ModelLine (via Heap Snapshot) - 72 bytes

alexdima added the feature-request label Jul 6, 2017
alexdima added this to the July 2017 milestone Jul 6, 2017
alexdima self-assigned this Jul 6, 2017

alexdima commented Jul 6, 2017

Removing ModelLine._lineNumber

  • OSX shallow size taken by a ModelLine (via Heap Snapshot) - 64 bytes

1. file A

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 348 MB | 158 MB | 85 MB | 899.1 B |
| OSX Heap Snapshot | 50 MB | 136 MB | 86 MB | 13 MB | 137.51 B |

2. file B

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 1194 MB | 1004 MB | 827 MB | 273.45 B |
| OSX Heap Snapshot | 50 MB | 692 MB | 642 MB | 465 MB | 153.75 B |

alexdima commented Jul 6, 2017

Using MinimalModelLine

  • OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes

1. file A

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 347 MB | 157 MB | 84 MB | 888.6 B |
| OSX Heap Snapshot | 50 MB | 133 MB | 83 MB | 10 MB | 105.8 B |

2. file B

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 1078 MB | 888 MB | 711 MB | 235.1 B |
| OSX Heap Snapshot | 50 MB | 577 MB | 527 MB | 350 MB | 115.7 B |

alexdima commented Jul 6, 2017

Using MinimalModelLine with lazy markers

  • OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 348 MB | 158 MB | 85 MB | 899.1 B |
| OSX Heap Snapshot | 50 MB | 132 MB | 82 MB | 9 MB | 95.2 B |

2. file B

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 991 MB | 801 MB | 624 MB | 206.3 B |
| OSX Heap Snapshot | 50 MB | 553 MB | 503 MB | 326 MB | 107.8 B |
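
The issue does not show how the lazy markers are implemented, so the following is only a generic sketch of the idea: a line object starts without a marker array and allocates one the first time a marker is attached, which avoids one empty array per line in files where most lines never carry markers. The class name `LazyMarkerLine`, the shared `NO_MARKERS` constant, and the marker shape are all hypothetical and are not VS Code's actual MinimalModelLine.

```javascript
// Hypothetical sketch, not VS Code's actual implementation.
const NO_MARKERS = Object.freeze([]); // single shared empty list for lines without markers

class LazyMarkerLine {
  constructor(text) {
    this.text = text;
    this.markers = null; // no array is allocated until a marker is added
  }

  addMarker(marker) {
    if (this.markers === null) {
      this.markers = []; // allocate on first use only
    }
    this.markers.push(marker);
  }

  getMarkers() {
    return this.markers === null ? NO_MARKERS : this.markers;
  }
}

const lines = ['first', 'second', 'third'].map((text) => new LazyMarkerLine(text));
lines[1].addMarker({ id: 'm1', column: 3 });

// Only the line that actually received a marker owns an array.
const linesWithArrays = lines.filter((line) => line.markers !== null).length;
console.log(linesWithArrays);
```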

alexdima commented Jul 6, 2017

Fixing the array leak of #30189

  • OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 344 MB | 154 MB | 81 MB | 856.8 B |
| OSX Heap Snapshot | 50 MB | 132 MB | 82 MB | 9 MB | 95.2 B |

2. file B

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 971 MB | 781 MB | 604 MB | 199.71 B |
| OSX Heap Snapshot | 50 MB | 519 MB | 469 MB | 292 MB | 96.55 B |

alexdima commented Jul 6, 2017

Using IdentityLinesCollection

  • OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 344 MB | 154 MB | 81 MB | 856.8 B |
| OSX Heap Snapshot | 50 MB | 129 MB | 79 MB | 6 MB | 63.5 B |

2. file B

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 901 MB | 711 MB | 534 MB | 176.5 B |
| OSX Heap Snapshot | 50 MB | 461 MB | 411 MB | 234 MB | 77.4 B |

alexdima commented Jul 6, 2017

Using a hack to avoid (sliced string)

  • OSX shallow size taken by a MinimalModelLine (via Heap Snapshot) - 40 bytes or 32 bytes

1. file A

| | empty | file (73 MB, 99127 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 341 MB | 151 MB | 78 MB | 825.1 B |
| OSX Heap Snapshot | 50 MB | 127 MB | 77 MB | 4 MB | 42.31 B |

2. file B

| | empty | file (177 MB, 3171200 lines) | actual usage | overhead | overhead per line |
|---|---|---|---|---|---|
| OSX RESIDENT | 190 MB | 928 MB | 738 MB | 561 MB | 185.5 B |
| OSX Heap Snapshot | 50 MB | 425 MB | 375 MB | 198 MB | 65.5 B |
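
For context on "(sliced string)": V8 can represent a substring as a small object that points into its parent string's characters rather than copying them, and the Heap Snapshot groups such objects under that name. The issue does not say which hack was used here, so the sketch below is not the author's method. It shows one way, assuming Node.js with its global `Buffer`, to obtain a string with its own character storage by round-tripping through bytes. The function name `detachFromParent` is hypothetical, and whether this saves memory in practice depends on whether the parent string can then be garbage collected.

```javascript
// Assumption: running on Node.js, where Buffer is available as a global.
// Hypothetical helper, not the hack referenced in the issue.
function detachFromParent(slice) {
  // Encoding to bytes and decoding again builds a new string that does not
  // reference the original. Caveat: lone surrogates are replaced with U+FFFD
  // during UTF-8 encoding, so this is not a byte-exact copy for such input.
  return Buffer.from(slice, 'utf8').toString('utf8');
}

const fileContents = 'line one of the file\nline two of the file\nline three of the file';
const firstLineEnd = fileContents.indexOf('\n');

// In V8 this substring may be stored as a sliced string that keeps fileContents alive.
const rawLine = fileContents.substring(0, firstLineEnd);
const detachedLine = detachFromParent(rawLine);

console.log(detachedLine === rawLine); // same content, separate storage
```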

alexdima added a commit that referenced this issue Jul 6, 2017

alexdima commented Jul 6, 2017

Conclusions

The resident memory usage is ... random, so the analysis below relies on the Heap Snapshot numbers, which behave consistently:

Edit: turns out the resident memory usage measurements were all skewed by the web worker receiving these large files. The web worker no longer receives such large files.

model memory = c0 + file size + (x + 32) * lineCount, where 32 is the size in bytes of the MinimalModelLine we allocate per line and x depends on the line length (i.e. the overhead of a string object around a character array). x is smaller for long lines and larger for short lines; it is around 12 - 36 bytes.
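
The formula above can be turned into a small calculator, and it can also be inverted to see what x the final Heap Snapshot overheads reported in this issue imply (4 MB for file A, 198 MB for file B). This is a sketch under assumptions: the function names are illustrative, 1 MB is taken as 1024 × 1024 bytes, and c0 is set to 0 because the overhead figures already subtract the empty-editor baseline. Since the measurements are rounded to whole megabytes, the value for file A in particular is only approximate.

```javascript
const MB = 1024 * 1024;        // assumption: binary megabytes
const LINE_OBJECT_BYTES = 32;  // from the formula: one MinimalModelLine per line

// model memory = c0 + file size + (x + 32) * lineCount
function estimateModelMemory({ c0, fileSizeBytes, lineCount, x }) {
  return c0 + fileSizeBytes + (x + LINE_OBJECT_BYTES) * lineCount;
}

// Invert the formula: per-line overhead minus the line object gives x.
function impliedX(overheadBytes, lineCount) {
  return overheadBytes / lineCount - LINE_OBJECT_BYTES;
}

const xFileA = impliedX(4 * MB, 99127);     // long lines, so a small x
const xFileB = impliedX(198 * MB, 3171200); // short lines, so a larger x

const estimateB = estimateModelMemory({
  c0: 0,
  fileSizeBytes: 177 * MB,
  lineCount: 3171200,
  x: xFileB,
});

console.log(xFileA.toFixed(1), xFileB.toFixed(1), (estimateB / MB).toFixed(0));
```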

| | file A | gain | file B | gain |
|---|---|---|---|---|
| initial | 88 MB | - | 666 MB | - |
| _lineNumber | 86 MB | 2.27% | 642 MB | 3.60% |
| MinimalModelLine | 83 MB | 3.49% | 527 MB | 17.91% |
| lazy markers | 82 MB | 1.2% | 503 MB | 4.55% |
| leak #30189 | 82 MB | 0% | 469 MB | 6.76% |
| IdentityLinesCollection | 79 MB | 3.66% | 411 MB | 12.37% |
| (sliced string) hack | 77 MB | 2.53% | 375 MB | 8.76% |
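
The gain column appears to be measured against the previous step rather than against the initial value. A minimal sketch that recomputes it from the heap usage figures in the table (the function name `stepGains` is illustrative):

```javascript
// Gain of each step relative to the step before it, in percent.
function stepGains(sizesMB) {
  const gains = [];
  for (let i = 1; i < sizesMB.length; i++) {
    const previous = sizesMB[i - 1];
    gains.push(((previous - sizesMB[i]) / previous) * 100);
  }
  return gains;
}

// Heap usage per step, copied from the table: initial value first.
const fileA = [88, 86, 83, 82, 82, 79, 77];
const fileB = [666, 642, 527, 503, 469, 411, 375];

const gainsA = stepGains(fileA).map((g) => g.toFixed(2));
const gainsB = stepGains(fileB).map((g) => g.toFixed(2));

// Overall reduction from the first to the last measurement.
const totalB = ((fileB[0] - fileB[fileB.length - 1]) / fileB[0]) * 100;

console.log(gainsA.join(' '), '|', gainsB.join(' '), '|', totalB.toFixed(1));
```

The overall figure for file B computed this way matches the 43.7% quoted in the totals.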

Totals - file A - 73 MB - 99,127 lines

OSX resident memory usage for the file: 163 MB -> 87 MB : 46.6% less
Heap total memory usage for the file: 88 MB -> 77 MB : 12.5% less
Heap overhead over file size: 15 MB -> 4 MB : 73.3% less

Totals - file B - 177 MB - 3,171,200 lines

OSX resident memory usage for the file: 1284 MB -> 340 MB : 73.5% less
Heap total memory usage for the file: 666 MB -> 375 MB : 43.7% less
Heap overhead over file size: 489 MB -> 198 MB : 59.5% less


Final test on win 32 bits

| File | Private Bytes | Working Set | Heap (via Timeline) |
|---|---|---|---|
| empty.txt | 107 MB | 147 MB | 33 MB |
| file A - 73 MB - 99,127 lines | 165 MB | 219 MB | 105 MB |
| file B - 177 MB - 3,171,200 lines | 361 MB | 415 MB | 336 MB |
| file 2*B - 351 MB - 6,342,399 lines | 627 MB | 679 MB | 561 MB |
| file 4*B - 700 MB - 12MM lines | failed | failed | failed |

Final test on win 64 bits

| File | Private Bytes | Working Set | Heap (via Timeline) |
|---|---|---|---|
| empty.txt | 128 MB | 201 MB | 56.5 MB |
| file A - 73 MB - 99,127 lines | 225 MB | 268 MB | 125 MB |
| file 2*A - 139 MB - 198,253 lines | 266 MB | 287 MB | 201 MB |
| file 4*A - 278 MB - 396,505 lines | 418 MB | 435 MB | 352 MB |
| file 8*A - 557 MB - 793,009 lines | 737 MB | 749 MB | 658 MB |
| file 16*A - 1,115 MB - 1,586,017 lines | 1,365 MB | 1,367 MB | 1,264 MB |
| file 32*A - 2,230 MB - x lines | failed | failed | failed |
| file B - 177 MB - 3,171,200 lines | 612 MB | 688 MB | 414 MB |
| file 2*B - 351 MB - 6,342,399 lines | 856 MB | 919 MB | 776 MB |
| file 4*B - 700 MB - 12MM lines | failed | failed | failed |

It's hard to say what our limits should be based on the above results, since failure does not correlate with file size alone or with line count alone, but depends on a combination of the two. I suggest:

  • we refuse to open files above 300 MB on win 32 bit
  • we open any file size everywhere else
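
The suggested policy could be expressed as a simple size guard. This is a sketch only: the function name `shouldRefuseToOpen` and the constant are hypothetical, 1 MB is assumed to be 1024 × 1024 bytes, and 32-bit Windows is detected via the values Node.js reports in `process.platform` ('win32') and `process.arch` ('ia32'), which does not cover other 32-bit architectures.

```javascript
const MB = 1024 * 1024;                 // assumption: binary megabytes
const WIN32_LIMIT_BYTES = 300 * MB;     // limit suggested above for 32-bit Windows

// Hypothetical guard: refuse only large files on 32-bit Windows.
function shouldRefuseToOpen(fileSizeBytes, platform, arch) {
  const isWindows32Bit = platform === 'win32' && arch === 'ia32';
  return isWindows32Bit && fileSizeBytes > WIN32_LIMIT_BYTES;
}

// In a real Node.js process the last two arguments would be
// process.platform and process.arch.
console.log(shouldRefuseToOpen(351 * MB, 'win32', 'ia32'));
console.log(shouldRefuseToOpen(351 * MB, 'win32', 'x64'));
```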

Chillee commented Jul 10, 2017

This seems like a pretty big improvement! I'm surprised to not see it on the iteration plan.

egamma commented Jul 10, 2017

@Chillee it is on the July iteration plan: #30209

vscodebot locked and limited conversation to collaborators Nov 17, 2017