
Memory usage of ctags increases during reindex #2364

Open
gtoph opened this issue Oct 2, 2018 · 15 comments

Comments

@gtoph (Contributor) commented Oct 2, 2018

It looks like the ctags memory usage keeps growing. I thought I had seen this mentioned before, where some end tag wasn't handled correctly and the buffer kept growing as it was scanned. I wonder if something like that is back?

After about 10 min of indexing, this is what it's up to:
(screenshot: ctags process memory usage)

Kept it going for another 5 min and we're now up to 75k on a few, and over 110k on a couple of other instances. It'll just keep growing, though.

@tarzanek (Contributor) commented Oct 3, 2018

What version of ctags do you use? This should most probably be reported to ctags as an issue, along with the command line used (could our regexps trigger a leak? Or does some internal parser have a leak?)

@tarzanek (Contributor) commented Oct 3, 2018

(I see you use builds from https://github.com/universal-ctags/ctags-win32/releases/ )

@gtoph (Contributor, Author) commented Oct 3, 2018 via email

vladak added the question label Oct 3, 2018
@vladak (Member) commented Oct 3, 2018

I guess one could run ctags under a tool that tracks memory leaks (like libumem.so + mdb on Solaris).
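
A rough sketch of how that could look with libumem (Solaris only; "bigfile.c" is a placeholder for an input that makes the memory grow, and ctags has to still be running when mdb attaches):

UMEM_DEBUG=default LD_PRELOAD=libumem.so.1 ctags -o - bigfile.c > /dev/null &
echo ::findleaks | mdb -p $!    # attach to the live ctags and report allocations that were never freed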

@tarzanek (Contributor) commented Oct 4, 2018

I am only aware of unoptimised memory usage when generating history for big git/hg repos with tags, and perhaps in the XML analyzer (which we will hopefully fix soon).

But your picture shows the ctags process itself growing, which is completely out of OpenGrok's control (apart from the command line and the regexps for newer languages, which might have bugs or trigger the leak).

@vladak (Member) commented Oct 4, 2018

It would be nice if you could run the ctags binaries under the tools described at https://stackoverflow.com/questions/4593191/memory-profiler-for-c ; in particular, I'd be interested to see the results from Google's tcmalloc, as it can answer the why-is-my-process-so-big question.
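
A minimal sketch of that with gperftools' tcmalloc heap profiler (the library path and input file are placeholders, adjust for your system):

LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPPROFILE=/tmp/ctags.hprof \
    ctags -o - bigfile.c > /dev/null                              # dump heap profiles while ctags runs
pprof --text "$(command -v ctags)" /tmp/ctags.hprof.0001.heap     # show the biggest allocation sites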

@vladak (Member) commented Oct 4, 2018

Also, this very likely depends on the input files sent to the ctags binaries. Is it possible to tell which files lead to the increased memory usage?
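
One crude way to narrow that down, sketched here for a system with GNU time(1) (SRC_ROOT is a placeholder for the source root):

export TIME="%M KB max RSS: %C"
find "$SRC_ROOT" -type f | while read -r f; do
    /usr/bin/time ctags -o - "$f" > /dev/null    # GNU time prints per-file peak memory on stderr
done 2>&1 | sort -nr | head                      # worst offenders first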

vladak changed the title from "Memory usage of ctags" to "Memory usage of ctags increases during reindex" Oct 4, 2018
@vladak (Member) commented Oct 4, 2018

Recommended allocators with heap profiling:

@gtoph (Contributor, Author) commented Oct 4, 2018

Well, I'm running on Windows, so a lot of the profiling tools aren't ported to it. I tried some Microsoft tools, but they didn't work so well on the first attempt.

To answer your question, I have a mix of files really: standard text, Python, XML, but 99% are just standard C source files.

@tarzanek (Contributor) commented Oct 5, 2018

xmls with loooooooong lines? :) (e.g. base64 encoded images?)

@gtoph (Contributor, Author) commented Oct 5, 2018

The projects I have are ginormous, with hundreds of people working on various areas, but I would highly doubt anything like that. Some of the files might be several tens of KB, but still just standard text keys, nothing overly special.

Searching around I saw a few utilities like Dr. Memory. I'll see if I can get those working and whether they have any useful info.

@vladak (Member) commented Oct 5, 2018

I believe at least one of the allocators I mentioned also works on Windows.

@edigaryev (Contributor) commented

Not sure if this is related, but I've faced a similar problem when simply indexing Google Chrome's codebase on a 512 MB VPS.

Here's an example of a single 540 KB file that will cause ctags' resident set size to spike to ~145 MB: https://github.com/hunspell/hunspell/blob/master/src/hunspell/utf_info.hxx

Here's a way I've used to measure RSS with the GNU version of the time(1) command:

export TIME="max RSS: %M"
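# %M is the maximum resident set size of the process, in kilobytes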
/usr/bin/time ctags -o - utf_info.hxx >/dev/null

@edigaryev (Contributor) commented Nov 5, 2018

With the fix applied, ctags only uses ~9 MB to parse utf_info.hxx.

But there still exist huge 14 MB header files, like this one, that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

@masatake commented Nov 5, 2018

> But there still exist huge 14 MB header files, like this one, that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

Maybe it is related to cork. Tag entries created by the CPreProcessor parser are stored in memory until the parser reaches the end of the current input file. When the parser reaches EOF, ctags writes the in-memory entries to the tags file. See
http://docs.ctags.io/en/latest/internal.html?highlight=cork#output-tag-stream for details about cork.
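
If cork is the cause, peak memory should scale with the number of tag entries per input file. A rough way to check that (a sketch assuming GNU coreutils; the header name is taken from the link above):

export TIME="max RSS: %M KB (%C)"
/usr/bin/time ctags -o - nbio_6_1_sh_mask.h > /dev/null    # baseline: the whole 14 MB header
split -n l/10 nbio_6_1_sh_mask.h piece_                    # cut it into ten line-aligned pieces
for f in piece_*; do
    mv "$f" "$f.h"                                         # keep a .h suffix so the C parser is still chosen
    /usr/bin/time ctags -o - "$f.h" > /dev/null            # per-piece peak RSS should drop if cork dominates
done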
