Use group name as "Communication:To" in chat messages, instead of including each member #2011

Closed
wladimirleite opened this issue Dec 5, 2023 · 5 comments · Fixed by #2268

@wladimirleite
Member

As discussed in #1999 (comment), Telegram groups with many members can take too long to process and can produce a very large case if each member is included in the multivalued "Communication:To" metadata.

@patrickdalla changed the title from 'Use group name as "Communication:To" in chat messages, instead including each member' to 'Use group name as "Communication:To" in chat messages, instead of including each member' on May 7, 2024
@aberenguel
Contributor

I processed a 56 GB UFDR.
It resulted in a case with 160 GB in the index folder and 456 GB in the neo4j folder.
As a workaround, I disabled extractMessages in ParserConfig.xml and also disabled enableGraphGeneration. Now the case has 6.8 GB in the index folder.

I think it makes sense for messages to be linked to the group rather than to each member of the group, since huge groups are common in Telegram (with thousands of messages and thousands of members). Maybe the group itself should have a metadata field containing all of its members.
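
To give a sense of scale, here is a rough Python sketch of the two approaches. It is not IPED code: the `Group` class, both helper functions, and the group size and message count are hypothetical, chosen only for illustration. It also ignores details such as excluding the sender from the recipients.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for a chat group; not an IPED class."""
    name: str
    members: list[str] = field(default_factory=list)


def to_values_per_member(group: Group) -> list[str]:
    # Behaviour described in the issue: one "Communication:To" value per member.
    return list(group.members)


def to_values_per_group(group: Group) -> list[str]:
    # Proposed behaviour: a single "Communication:To" value naming the group.
    return [group.name]


# Assumed numbers for illustration only.
group = Group("Big Telegram group", [f"user{i}" for i in range(5000)])
message_count = 10_000

# Every message repeats the full member list.
old_total = message_count * len(to_values_per_member(group))

# Every message carries one value; the member list is stored once on the group.
new_total = message_count * len(to_values_per_group(group)) + len(group.members)

print(f"per-member values: {old_total:,}")
print(f"per-group values:  {new_total:,}")
```

With these assumed numbers, the per-member approach stores messages × members values, while the per-group approach stores roughly messages + members, which is consistent with the large index and graph sizes reported above.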

@lfcnassif
Member

Thanks @aberenguel for your feedback. This change will definitely be implemented and included in version 4.2.0.

@aberenguel
Contributor

aberenguel commented Jul 27, 2024

The fixes above solved the problem. I've reprocessed the case with the default profile and phoneParsersToUse = all.

| | before fixes | after fixes |
|---|---|---|
| index folder | 160 GB | 8.7 GB |
| neo4j folder | 612 GB | 0.7 GB |
| processing time | 16h ¹ | 1h 3min |

¹ Canceled during "Generating graph database" because it was taking too long.

@lfcnassif
Member

Thank you very much @aberenguel for testing!

@lfcnassif
Member

lfcnassif commented Jul 28, 2024

A small test with 12 WhatsApp databases also benefits from this:

| | before changes | after changes |
|---|---|---|
| index folder | 4 GB | 2.34 GB |
| neo4j folder | 15 GB | 0.3 GB |
| processing time | 43min | 17min |

Graph rendering is also much faster.
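
For a quick comparison, the reduction factors implied by the two sets of measurements in this thread can be computed with a few lines of Python. This is only arithmetic on the reported numbers; the `reduction_factor` helper and the dictionary layout are illustrative. Note that the 16h figure for the UFDR case comes from a run that was canceled, so the time factor for that case is only a lower bound.

```python
def reduction_factor(before: float, after: float) -> float:
    """How many times smaller (or faster) the 'after' value is."""
    return before / after


# (before, after) pairs copied from the measurements reported in this thread.
results = {
    "56 GB UFDR": {
        "index folder (GB)": (160, 8.7),
        "neo4j folder (GB)": (612, 0.7),
        "processing time (min)": (16 * 60, 63),  # the 16h run was canceled
    },
    "12 WhatsApp databases": {
        "index folder (GB)": (4, 2.34),
        "neo4j folder (GB)": (15, 0.3),
        "processing time (min)": (43, 17),
    },
}

for case, metrics in results.items():
    for metric, (before, after) in metrics.items():
        factor = reduction_factor(before, after)
        print(f"{case} | {metric}: {factor:.1f}x reduction")
```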
