Telegram iOS DB parser enhancements and bug fixes (#1998, #2000, #2003 and #2005) #1999

wladimirleite · 2023-11-24T21:15:36Z

Closes #1998, #2000, #2003 and #2005.

lfcnassif · 2023-11-25T03:40:16Z

Thank you @wladimirleite! I'll collect samples to help testing this a bit more.

wladimirleite · 2023-11-25T14:22:47Z

Processing 12 Telegram iOS databases I collected here, I got:

	Master	This PR
Chats	841	1,354
Messages	23,170	81,987

lfcnassif · 2023-11-27T23:01:25Z

Hi @wladimirleite! Here are the testing results on 19 Telegram iOS databases I collected here so far:

	Master	This PR
Chats	1,633	2,762
Messages	6,938	61,999

After commit I pushed, we are recognizing more Telegram databases. In the other hand, 13 new exceptions are being thrown because of no such table t0 (ios user account extract query) and no such table t2 (ios contacts extract query). We should handle new tables from new versions properly...

wladimirleite · 2023-11-27T23:19:37Z

After commit I pushed, we are recognizing more Telegram databases. In the other hand, 13 new exceptions are being thrown because of no such table t0 (ios user account extract query) and no such table t2 (ios contacts extract query). We should handle new tables from new versions properly...

Great!
I can take a look at this, unless you already started. If not, please send me some sample databases that caused these exceptions.

lfcnassif · 2023-11-27T23:32:12Z

I can take a look at this, unless you already started. If not, please send me some sample databases that caused these exceptions.

I didn't, this week I'm in a meeting all days... I'll send you by Teams, thank you very much @wladimirleite!

…gDecode

lfcnassif · 2023-11-28T01:59:40Z

please send me some sample databases that caused these exceptions.

Just sent.

wladimirleite · 2023-11-28T20:50:45Z

@lfcnassif, taking a closer look in the samples you sent me and others I have here, I think that the databases with different set tables, like (t0, t2 and t3 only) and (t15, t16, t17, t19, t20 and t21), should not be identified as "application/x-telegram-db-ios".

Although they are named "db_sqlite" and used by Telegram, they contain different types of information, not related to contacts or messages chats. And they are not a newer version of the Telegram database we support.

Below is an example of a folder structure present in a very recent iOS extraction.
Although all files are named "db_sqlite", only the first one is actually the kind of database that contains contacts and chats.

So I am not sure if the changes you made for #2000 are the best approach.
The main issue with the previous detector code (before your changes) was expecting a "ft41" table, which is not present in all "valid" databases.

Tables actually used are t0, t2, t6, t7 and t9.
My suggestion is check only for these tables, as all the samples I have, more recent or older ones, including the ones you sent me, have these tables.
I would also remove detection by name, as it is detecting false positives.
Do you see any drawback of making these changes?
Anyway, I will commit the changes I made, after running more tests, so you can test with your set of databases when you have some time.
We can revert them later, if necessary.

wladimirleite · 2023-11-28T21:04:43Z

@lfcnassif, I found another (not directly related) issue that some databases I have here are not being parsed because of an exception in contacts parsing.
Looking at the code and the raw bytes, I found that the decoding process uses the function findOfset(key, tString), which may find incorrect offsets in rare situations.

For example, when extracting the "title" property of a contact, it calls that function to find a "t".
Then it will search for 3 bytes: the length (1), 't' (116) and the type string (which is 4). The problem is that [1, 116, 4] may be another thing, not this "title" tag. So when it tries to read a string after these 3 bytes, the length of the string (next 4 bytes) may be something else (like a huge or negative number).

This is rare, but it happens a couple of times in the DB samples I have.
There is no simple bug fix, as this is more a conceptual problem.
I will try to evaluate how complex it would be to rewrite it.

lfcnassif · 2023-11-28T22:59:11Z

@lfcnassif, taking a closer look in the samples you sent me and others I have here, I think that the databases with different set tables, like (t0, t2 and t3 only) and (t15, t16, t17, t19, t20 and t21), should not be identified as "application/x-telegram-db-ios".

Sorry @wladimirleite, I didn't test my change for false positive detections, my fault.

Tables actually used are t0, t2, t6, t7 and t9.
My suggestion is check only for these tables, as all the samples I have, more recent or older ones, including the ones you sent me, have these tables.
I would also remove detection by name, as it is detecting false positives.
Do you see any drawback of making these changes?

I totally agree with both recommendations, testing just for those tables and removing detection by name, I didn't know other different databases with db_sqlite name exist.

For example, when extracting the "title" property of a contact, it calls that function to find a "t".
Then it will search for 3 bytes: the length (1), 't' (116) and the type string (which is 4). The problem is that [1, 116, 4] may be another thing, not this "title" tag. So when it tries to read a string after these 3 bytes, the length of the string (next 4 bytes) may be something else (like a huge or negative number).

I see... If it is is a heuristic to read serialized data (I didn't check the code) maybe using a proper deserialize function may help...

wladimirleite · 2023-11-29T12:25:43Z

For example, when extracting the "title" property of a contact, it calls that function to find a "t".
Then it will search for 3 bytes: the length (1), 't' (116) and the type string (which is 4). The problem is that [1, 116, 4] may be another thing, not this "title" tag. So when it tries to read a string after these 3 bytes, the length of the string (next 4 bytes) may be something else (like a huge or negative number).

I see... If it is is a heuristic to read serialized data (I didn't check the code) maybe using a proper deserialize function may help...

As this is not related to the original issues, and this is clearly a bug (although rare), I am creating another issue.

…gDecode

wladimirleite · 2023-12-22T23:01:24Z

There are other things that could be improved in the Telegram internal parser, but this PR already include too many changes.
I am stopping here, so this can be reviewed.

lfcnassif · 2023-12-22T23:27:43Z

Thank you very much @wladimirleite!

@hauck-jvsh, since you are the original author of this parser, could you review this PR when you have available time?

hauck-jvsh · 2024-01-03T00:25:23Z

I can review, if time isn't a problem, this when I return from my vacation.

hauck-jvsh · 2024-01-17T23:08:54Z

I will start reviewing this tomorrow, is there any chances of getting some IOS databases? @lfcnassif, @wladimirleite

wladimirleite · 2024-01-17T23:30:50Z

I will start reviewing this tomorrow, is there any chances of getting some IOS databases? @lfcnassif, @wladimirleite

I am sending you (through Teams) the samples I have (the whole folder, which include also a few Android databases). It also includes iOS samples @lfcnassif sent me recently.

lfcnassif · 2024-01-17T23:53:29Z

I will start reviewing this tomorrow, is there any chances of getting some IOS databases? @lfcnassif, @wladimirleite

Thank you @hauck-jvsh! I think all my samples are included in @wladimirleite's data set.

…egramMsgDecode

hauck-jvsh · 2024-01-18T21:35:52Z

Excellent work, you have rewrite the PostBoxCoding from scratch, now you are decoding the entire object instead of the previous approach of find specific key inside the bytes. I think that now it is much more general, preventing decoding data with the wrong type or even data from other places.

hauck-jvsh · 2024-01-19T20:43:11Z

@wladimirleite, I'm running some tests here and it is taking a lot of time to process. I think that this is same problem you mentioned before, that messages have thousands of number in the messageTo metadata. Did you forget to push the changes?

wladimirleite · 2024-01-19T20:53:22Z

@wladimirleite, I'm running some tests here and it is taking a lot of time to process. I think that this is same problem you mentioned before, that messages have thousands of number in the messageTo metadata. Did you forget to push the changes?

Yes, it takes way too long for some DBs. Discussing with @lfcnassif, I created a new issue (#2011), as it is not directly related to the issues addressed by this PR, and ideally any solution (probably changing the current behavior) would have to be applied to other chat parsers.
To test this PR, I commented out the loop that sets "Communication:To" property for each message (for groups chats).

hauck-jvsh · 2024-01-22T18:55:44Z

After digging a little I manage to extract some thumbs from the database.

hauck-jvsh · 2024-01-22T19:16:32Z

Just one thing, previously the default behavior was to use the UFED resultfor IOS and the internal parser for Android, but now the default behavior will change, but it looks good to me, I think that we can merge.
@wladimirleite can you check the extraction of thumbs please?

wladimirleite · 2024-01-22T20:05:54Z

@wladimirleite can you check the extraction of thumbs please?

Sure! I will take a look as soon as possible.

wladimirleite · 2024-01-25T21:10:59Z

@wladimirleite can you check the extraction of thumbs please?

@hauck-jvsh, the thumbnail extraction worked great!
That is a really useful enhancement! Thank you!!!

I ran a few more tests, checking with an iPhone that I am working on, and everything looks fine.

I just pushed two minor commits (code formatting and preserving line breaks in messages). The line breaks make some messages more readable (I made a similar change in WhatsApp parser recently).

hauck-jvsh · 2024-01-25T21:28:24Z

I think that this is ready for merging. I had to reinstall my eclipse and I didn't apply the project formatting style.

lfcnassif · 2024-01-25T21:58:09Z

Sorry guys, I had to force push to fix a wrong conflict resolution pushed before, seems fine now.

Thank you very much @wladimirleite and @hauck-jvsh for this great work!

wladimirleite added 3 commits November 24, 2023 17:47

'#1998: Review Telegram message decoding, to handle more recent DBs.

baa0e44

'#1998: Minor code adjustments.

9486322

'#1998: Minor code adjustments.

7cda2a0

lfcnassif mentioned this pull request Nov 27, 2023

Better detection of Telegram iOS databases #2000

Closed

close #2000: better detection for Telegram iOS databases

d35d069

Merge remote-tracking branch 'origin/master' into #1998_FixTelegramMs…

5050d2e

…gDecode

wladimirleite added 4 commits November 28, 2023 10:30

'#1998: Remove unused query.

02b0e67

'#2000: Some Telegram iOS databases have ft41* tables.

e9d9aa9

'#1998: Check if iOS Telegram tables exists before querying.

2f158b1

'#1998: Ignore Telegram iOS contact with id=0.

c8de758

wladimirleite added 2 commits November 28, 2023 20:53

'#2000: Check tables t0, t2, t6, t7 and t9 to detect Telegram iOS DBs.

2319d60

'#2000: Do not use the file name to detect Telegram iOS DBs.

84d504b

wladimirleite mentioned this pull request Nov 29, 2023

In rare cases, Telegram iOS DBs decoding may fail and cause exceptions #2003

Closed

wladimirleite changed the title ~~Review Telegram chat messages decoding, to handle more recent DBs (#1998)~~ Telegram iOS DB parser enhancements and bug fixes (#1998, #2000 and #2003) Nov 29, 2023

wladimirleite marked this pull request as draft November 29, 2023 14:18

This was linked to issues Nov 29, 2023

Better detection of Telegram iOS databases #2000

Closed

In rare cases, Telegram iOS DBs decoding may fail and cause exceptions #2003

Closed

wladimirleite added 3 commits November 29, 2023 15:22

'#2003: Refactor Telegram iOS PostBox decoding.

40ed95c

'#1998: Revert commit 2f158b1, as tables are checked in DB detection.

422cd07

'#2003: Remove unused utility methods.

f2fb2b1

Merge remote-tracking branch 'origin/master' into #1998_FixTelegramMs…

57611d4

…gDecode

wladimirleite marked this pull request as ready for review December 22, 2023 23:01

patrickdalla mentioned this pull request Jan 5, 2024

Tika Metadata Add performance optimization #2043

Open

Merge branch 'master' of github.com:sepinf-inc/IPED into #1998_FixTel…

ccb2ac9

…egramMsgDecode

Extract small thumbs from the database.

ecc757e

Define the jpeg header as a static constant

f4d6e85

hauck-jvsh self-requested a review January 25, 2024 14:24

hauck-jvsh approved these changes Jan 25, 2024

View reviewed changes

wladimirleite added 2 commits January 25, 2024 17:55

'#2000: Apply code formatting.

733fa03

'#2000: Keep line breaks present in Telegram messages.

fffa129

Merge branch 'master' into #1998_FixTelegramMsgDecode

183016b

lfcnassif force-pushed the #1998_FixTelegramMsgDecode branch from 7a0cc1c to 183016b Compare January 25, 2024 21:54

lfcnassif merged commit 183016b into master Jan 25, 2024
2 checks passed

lfcnassif deleted the #1998_FixTelegramMsgDecode branch January 25, 2024 22:02

lfcnassif linked an issue Jan 26, 2024 that may be closed by this pull request

Enable Telegram iOS parser for UFDR and expose "extractMessages" parameter for Telegram Parser #2005

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Telegram iOS DB parser enhancements and bug fixes (#1998, #2000, #2003 and #2005) #1999

Telegram iOS DB parser enhancements and bug fixes (#1998, #2000, #2003 and #2005) #1999

wladimirleite commented Nov 24, 2023 •

edited

Loading

lfcnassif commented Nov 25, 2023

wladimirleite commented Nov 25, 2023

lfcnassif commented Nov 27, 2023

wladimirleite commented Nov 27, 2023

lfcnassif commented Nov 27, 2023

lfcnassif commented Nov 28, 2023

wladimirleite commented Nov 28, 2023

wladimirleite commented Nov 28, 2023

lfcnassif commented Nov 28, 2023

wladimirleite commented Nov 29, 2023

wladimirleite commented Dec 22, 2023

lfcnassif commented Dec 22, 2023 •

edited

Loading

hauck-jvsh commented Jan 3, 2024 •

edited

Loading

hauck-jvsh commented Jan 17, 2024

wladimirleite commented Jan 17, 2024

lfcnassif commented Jan 17, 2024

hauck-jvsh commented Jan 18, 2024 •

edited

Loading

hauck-jvsh commented Jan 19, 2024

wladimirleite commented Jan 19, 2024

hauck-jvsh commented Jan 22, 2024

hauck-jvsh commented Jan 22, 2024

wladimirleite commented Jan 22, 2024

wladimirleite commented Jan 25, 2024

hauck-jvsh commented Jan 25, 2024

lfcnassif commented Jan 25, 2024

Telegram iOS DB parser enhancements and bug fixes (#1998, #2000, #2003 and #2005) #1999

Telegram iOS DB parser enhancements and bug fixes (#1998, #2000, #2003 and #2005) #1999

Conversation

wladimirleite commented Nov 24, 2023 • edited Loading

lfcnassif commented Nov 25, 2023

wladimirleite commented Nov 25, 2023

lfcnassif commented Nov 27, 2023

wladimirleite commented Nov 27, 2023

lfcnassif commented Nov 27, 2023

lfcnassif commented Nov 28, 2023

wladimirleite commented Nov 28, 2023

wladimirleite commented Nov 28, 2023

lfcnassif commented Nov 28, 2023

wladimirleite commented Nov 29, 2023

wladimirleite commented Dec 22, 2023

lfcnassif commented Dec 22, 2023 • edited Loading

hauck-jvsh commented Jan 3, 2024 • edited Loading

hauck-jvsh commented Jan 17, 2024

wladimirleite commented Jan 17, 2024

lfcnassif commented Jan 17, 2024

hauck-jvsh commented Jan 18, 2024 • edited Loading

hauck-jvsh commented Jan 19, 2024

wladimirleite commented Jan 19, 2024

hauck-jvsh commented Jan 22, 2024

hauck-jvsh commented Jan 22, 2024

wladimirleite commented Jan 22, 2024

wladimirleite commented Jan 25, 2024

hauck-jvsh commented Jan 25, 2024

lfcnassif commented Jan 25, 2024

wladimirleite commented Nov 24, 2023 •

edited

Loading

lfcnassif commented Dec 22, 2023 •

edited

Loading

hauck-jvsh commented Jan 3, 2024 •

edited

Loading

hauck-jvsh commented Jan 18, 2024 •

edited

Loading