
Memory usage of ctags increases during reindex #2364

Open
gtoph opened this issue Oct 2, 2018 · 15 comments

Comments

@gtoph (Contributor) commented Oct 2, 2018

It looks like the ctags memory usage keeps growing. I thought I had seen this mentioned before, where some end tag wasn't handled correctly and the buffer kept growing as it was scanned. I wonder if something like that is back?

After about 10 min of indexing, this is what it's up to:
(screenshot: ctags process memory usage)

Kept it going for another 5 min and we're now up to 75k on a few, and over 110k on a couple of other instances. It'll just keep growing, though.

@tarzanek (Contributor) commented Oct 3, 2018

What version of ctags do you use? This should most probably be reported to ctags as an issue, along with the command line used (could our regexps trigger a leak? Or does some internal parser have a leak?)

@tarzanek (Contributor) commented Oct 3, 2018

(I see you use builds from https://github.com/universal-ctags/ctags-win32/releases/ )

@gtoph (Contributor, Author) commented Oct 3, 2018 via email

vladak added the question label Oct 3, 2018
@vladak (Member) commented Oct 3, 2018

I guess one could run ctags under a tool that tracks memory leaks (like libumem.so + mdb on Solaris).
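
A rough sketch of how that could look with libumem (Solaris only; "bigfile.c" is a placeholder for an input that makes the memory grow, and ctags has to still be running when mdb attaches):

UMEM_DEBUG=default LD_PRELOAD=libumem.so.1 ctags -o - bigfile.c > /dev/null &
echo ::findleaks | mdb -p $!    # attach to the live ctags and report allocations that were never freed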

@tarzanek (Contributor) commented Oct 4, 2018

I am only aware of unoptimised memory usage when generating history for big git/hg repos with tags, and perhaps in the XML analyzer (which we will hopefully fix soon).

But your picture shows the ctags process itself growing, which is completely out of OpenGrok's control (apart from the command line and the regexps for newer languages, which might have bugs or trigger the leak).

@vladak (Member) commented Oct 4, 2018

It would be nice if you could run the ctags binaries under the tools described at https://stackoverflow.com/questions/4593191/memory-profiler-for-c ; in particular, I'd be interested to see the results from Google's tcmalloc, as it can answer the why-is-my-process-so-big question.
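
A minimal sketch of that with gperftools' tcmalloc heap profiler (the library path and input file are placeholders, adjust for your system):

LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPPROFILE=/tmp/ctags.hprof \
    ctags -o - bigfile.c > /dev/null                              # dump heap profiles while ctags runs
pprof --text "$(command -v ctags)" /tmp/ctags.hprof.0001.heap     # show the biggest allocation sites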

@vladak (Member) commented Oct 4, 2018

Also, this very likely depends on the input files sent to the ctags binaries. Is it possible to tell which files lead to the increased memory usage?
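
One crude way to narrow that down, sketched here for a system with GNU time(1) (SRC_ROOT is a placeholder for the source root):

export TIME="%M KB max RSS: %C"
find "$SRC_ROOT" -type f | while read -r f; do
    /usr/bin/time ctags -o - "$f" > /dev/null    # GNU time prints per-file peak memory on stderr
done 2>&1 | sort -nr | head                      # worst offenders first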

vladak changed the title from "Memory usage of ctags" to "Memory usage of ctags increases during reindex" Oct 4, 2018
@vladak (Member) commented Oct 4, 2018

Recommended allocators with heap profiling:

@gtoph (Contributor, Author) commented Oct 4, 2018

Well, I'm running on Windows, so a lot of the profiling tools aren't ported to it. I tried some Microsoft tools, but they didn't work so well on the first attempt.

To answer your question, I have a mix of files really: standard text, Python, XML, but 99% are just standard C source files.

@tarzanek (Contributor) commented Oct 5, 2018

xmls with loooooooong lines? :) (e.g. base64 encoded images?)

@gtoph (Contributor, Author) commented Oct 5, 2018

The projects I have are ginormous, with hundreds of people working on various areas, but I would highly doubt anything like that. Some of the files might be several tens of KB, but still just standard text keys, nothing overly special.

Searching around I saw a few utilities like Dr. Memory. I'll see if I can get those working and whether they have any useful info.

@vladak (Member) commented Oct 5, 2018

I believe at least one of the allocators I mentioned also works on Windows.

@edigaryev (Contributor) commented

Not sure if this is related, but I've faced a similar problem when simply indexing Google Chrome's codebase on a 512 MB VPS.

Here's an example of a single 540 KB file that will cause ctags' resident set size to spike to ~145 MB: https://github.com/hunspell/hunspell/blob/master/src/hunspell/utf_info.hxx

Here's a way I've used to measure RSS with the GNU version of the time(1) command:

export TIME="max RSS: %M"
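# %M is the maximum resident set size of the process, in kilobytes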
/usr/bin/time ctags -o - utf_info.hxx >/dev/null

@edigaryev (Contributor) commented Nov 5, 2018

With the fix applied, ctags only uses ~9 MB to parse utf_info.hxx.

But there still exist huge 14 MB header files, like this one, that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

@masatake commented Nov 5, 2018

> But there still exist huge 14 MB header files, like this one, that cause ctags to use ~76 MB of RAM: https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/include/asic_reg/nbio/nbio_6_1_sh_mask.h

Maybe it is related to cork. Tag entries created by the CPreProcessor parser are stored in memory until the parser reaches the end of the current input file. When the parser reaches EOF, ctags writes the in-memory entries to the tags file. See
http://docs.ctags.io/en/latest/internal.html?highlight=cork#output-tag-stream for details about cork.
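
If cork is the cause, peak memory should scale with the number of tag entries per input file. A rough way to check that (a sketch assuming GNU coreutils; the header name is taken from the link above):

export TIME="max RSS: %M KB (%C)"
/usr/bin/time ctags -o - nbio_6_1_sh_mask.h > /dev/null    # baseline: the whole 14 MB header
split -n l/10 nbio_6_1_sh_mask.h piece_                    # cut it into ten line-aligned pieces
for f in piece_*; do
    mv "$f" "$f.h"                                         # keep a .h suffix so the C parser is still chosen
    /usr/bin/time ctags -o - "$f.h" > /dev/null            # per-piece peak RSS should drop if cork dominates
done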
