Extracted files seem to be corrupted #230

Closed
Silv3rHorn opened this issue Sep 27, 2017 · 15 comments
@Silv3rHorn

Hi,

I have written (mostly copied) a Python script (br.txt) using dfvfs to automate the extraction of the SOFTWARE registry hive from a forensic image (including volume shadow copies). However, the MD5 hash of the extracted hive differs from that of the hives extracted using X-Ways and AD FTK Imager.

e.g. cfreds_2015_data_leakage_pc.E01
br.py (using dfvfs)    54449C66393659C1EB5BD98D74AD5E5D
X-Ways                 DE50BC9B74A65373B540796D9FFD9D1F
AD FTK                 DE50BC9B74A65373B540796D9FFD9D1F

binary comparison

I have tried different forensic images, the latest 2 versions of dfvfs, and Python 2 & 3, but I still encounter this issue.
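
When comparing hashes across tools, it helps to compute them the same way on every extracted copy. The following is a minimal sketch using only the Python standard library; the function name `md5_of_file` is hypothetical and not part of br.py or dfvfs. It returns upper-case hex so the result can be compared directly with the values listed above.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the upper-case hex MD5 of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as file_object:
        # Read fixed-size chunks so large hives are never loaded in one go.
        for chunk in iter(lambda: file_object.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()
```

Running it on each extracted copy of the hive and comparing the returned strings shows whether two tools produced byte-identical output.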

Steps to reproduce the issue:

  1. Rename br.txt to br.py
  2. Download cfreds pc image (4x e01 files) from https://www.cfreds.nist.gov/data_leakage_case/data-leakage-case.html
  3. Create output directory
  4. python br.py <downloaded cfreds image> <output directory>

Would appreciate your assistance in resolving the issue, pls.

Regards.

@joachimmetz
Member

Which version of libewf are you using? Stable or experimental?

@Silv3rHorn
Author

Silv3rHorn commented Sep 28, 2017

Hi Joachim,

Thanks for the reply. I was using libewf-experimental-20170703, and I just tried with libewf-20140608, but the hash is still different. Could you point me to where I can download other stable builds (your google drive link seems to be down)?

@joachimmetz
Member

Could you point me to where I can download other stable builds

At the moment I don't distribute binary builds; source package can be found here https://github.com/libyal/legacy

I'll have a look when time permits if I can reproduce the behavior. Do you have the br.py somewhere you can share?

@Silv3rHorn
Author

Yes, you can download it at https://www.dropbox.com/s/q2fm0mqd5vwx6j3/br.py?dl=0

e.g. cfreds_2015_data_leakage_pc.E01
br.py (using libewf-experimental)    54449C66393659C1EB5BD98D74AD5E5D
br.py (using libewf-20140608)        DB68CC9B7F8979821D9273A7A266506B	
X-Ways                               DE50BC9B74A65373B540796D9FFD9D1F
AD FTK                               DE50BC9B74A65373B540796D9FFD9D1F

@joachimmetz
Member

joachimmetz commented Oct 15, 2017

Downloaded: pc.E01, pc.E02, pc.E03, pc.E04 from https://www.cfreds.nist.gov/data_leakage_case/data-leakage-case.html

ewfverify -q cfreds_2015_data_leakage_pc.E01
ewfverify 20140801

MD5 hash stored in file:		a49d1254c873808c58e6f1bcd60b5bde
MD5 hash calculated over data:		a49d1254c873808c58e6f1bcd60b5bde

Additional hash values:
SHA1:	afe5c9ab487bd47a8a9856b1371c2384d44fd785

ewfverify: SUCCESS

Side note: 20140801 is the latest version of the stable 20140608 branch, found here https://github.com/libyal/libewf-legacy

ran: ewfmount and sleuthkit icat

icat -o 206848 fuse/ewf1 58910-128-3 | md5sum 
db68cc9b7f8979821d9273a7a266506b  -

Result of the dfVFS recursive hasher with TSK

db68cc9b7f8979821d9273a7a266506b	/Windows/System32/config/SOFTWARE

libfsntfs

db68cc9b7f8979821d9273a7a266506b	\Windows\System32\config\SOFTWARE

mount.ntfs (ntfs-3g fuse)

md5sum p2/Windows/System32/config/SOFTWARE
db68cc9b7f8979821d9273a7a266506b  p2/Windows/System32/config/SOFTWARE

Downloaded: pc.7z.001, pc.7z.002, pc.7z.003 from https://www.cfreds.nist.gov/data_leakage_case/data-leakage-case.html

7za x cfreds_2015_data_leakage_pc.7z.001
sha1sum cfreds_2015_data_leakage_pc.dd
afe5c9ab487bd47a8a9856b1371c2384d44fd785  cfreds_2015_data_leakage_pc.dd

SHA1 of the image is the same between raw and E01 (both afe5c9ab487bd47a8a9856b1371c2384d44fd785); hashes of the individual compressed segments match as well

icat -o 206848 cfreds_2015_data_leakage_pc.dd 58910-128-3 | md5sum
db68cc9b7f8979821d9273a7a266506b  -

So the MD5 of SOFTWARE extracted from raw and E01 is the same

mount.ntfs (ntfs-3g fuse)

sudo mount -oro,offset=$(( 206848 * 512 )) cfreds_2015_data_leakage_pc.dd p2
md5sum p2/Windows/System32/config/SOFTWARE
db68cc9b7f8979821d9273a7a266506b  p2/Windows/System32/config/SOFTWARE
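
The `offset=$(( 206848 * 512 ))` argument converts the partition's start sector into a byte offset. As a rough sanity check, the sketch below looks for the `NTFS    ` OEM identifier that an NTFS boot sector carries at bytes 3 to 10. The helper name `is_ntfs_volume` is hypothetical, and the 512-byte sector size is an assumption taken from the commands above.

```python
SECTOR_SIZE = 512  # bytes per sector, assumed as in the mount command


def is_ntfs_volume(image_path, start_sector, sector_size=SECTOR_SIZE):
    """Return True if the sector at start_sector looks like an NTFS boot sector."""
    with open(image_path, "rb") as image:
        # Same arithmetic as $(( start_sector * 512 )) in the shell.
        image.seek(start_sector * sector_size)
        boot_sector = image.read(sector_size)
    # The OEM identifier occupies bytes 3..10 of the boot sector.
    return boot_sector[3:11] == b"NTFS    "
```

For the raw image in this thread the call would be `is_ntfs_volume("cfreds_2015_data_leakage_pc.dd", 206848)`.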

@joachimmetz joachimmetz self-assigned this Oct 15, 2017
@joachimmetz joachimmetz added the question This issue is a question label Oct 15, 2017
@joachimmetz
Member

joachimmetz commented Oct 15, 2017

@Silv3rHorn

Current summary:

  • tested 3 different NTFS implementations
  • tested 2 different storage media types from source

MD5 of p2\Windows\System32\config\SOFTWARE remains the same across implementations.

@joachimmetz
Member

Used FTK imager on both the raw and E01

md5sum SOFTWARE.*
de50bc9b74a65373b540796d9ffd9d1f  SOFTWARE.ftk
db68cc9b7f8979821d9273a7a266506b  SOFTWARE.tsk
ls -al SOFTWARE.* | awk '{print $5 "\t" $9}'
48496640	SOFTWARE.ftk
48496640	SOFTWARE.tsk
diff SOFTWARE.ftk.log SOFTWARE.tsk.log
3028231,3029120c3028231,3029120
< 02e35060  00 00 02 00 0b 00 00 00  20 00 00 80 10 00 00 00  |........ .......|
...
< 02e387f0  0a 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
---
> 02e35060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...
> 02e387f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
3029126,3029647c3029126,3029647
< 02e38850  00 00 00 00 00 10 00 00  2c 00 00 00 68 32 74 32  |........,...h2t2|
...
< 02e3a8e0  b4 31 94 34 98 34 9c 34  a4 34 a8 34 ac 34 b0 34  |.1.4.4.4.4.4.4.4|
---
> 02e38850  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
...
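
The comparison above diffs hexdump logs. A byte-level comparison can also be sketched directly in Python; the function name `differing_ranges` is hypothetical, and this is only an illustration of the idea, not the method used to produce the output above.

```python
def differing_ranges(path_a, path_b, chunk_size=4096):
    """Return a list of (start, end) byte ranges, end exclusive, where two files differ."""
    ranges = []
    start = None  # start of the current run of differing bytes, if any
    offset = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break
            length = max(len(chunk_a), len(chunk_b))
            for index in range(length):
                # Slicing yields b"" past the end, so a shorter file counts as different.
                if chunk_a[index:index + 1] != chunk_b[index:index + 1]:
                    if start is None:
                        start = offset + index
                elif start is not None:
                    ranges.append((start, offset + index))
                    start = None
            offset += length
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Calling `differing_ranges("SOFTWARE.ftk", "SOFTWARE.tsk")` would list the byte ranges in which the two extracted hives disagree.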

@joachimmetz
Member

joachimmetz commented Oct 16, 2017

fsntfsinfo -o $(( 206848 * 512 )) cfreds_2015_data_leakage_pc.dd
fsntfsinfo 20171016
Windows NT File System information:

Volume information:
	Name				: 
	Version				: 3.1
	Serial number			: c8ca0c8dca0c7a48
	Cluster block size		: 4096
	MFT entry size			: 1024
	Index entry size		: 4096
fsntfsinfo -E 58910 -o $(( 206848 * 512 )) cfreds_2015_data_leakage_pc.dd
fsntfsinfo 20171016

MFT entry: 58910 information:
	Is allocated			: true
	File reference			: MFT entry: 58910, sequence: 1
	Base record file reference	: MFT entry: 0, sequence: 0
	Journal sequence number		: 386538538
	Number of attributes		: 3

Attribute: 1
	Type				: $STANDARD_INFORMATION (0x00000010)
	Creation time			: Jul 14, 2009 02:34:08.088364600 UTC
	Modification time		: Mar 25, 2015 15:31:05.341208900 UTC
	Access time			: Mar 25, 2015 15:31:05.341208900 UTC
	Entry modification time		: Mar 25, 2015 15:31:05.294408800 UTC
	Owner identifier		: 0
	Security descriptor identifier	: 584
	Update sequence number		: 69763712
	File attribute flags		: 0x00000020
		Should be archived (FILE_ATTRIBUTE_ARCHIVE)

Attribute: 2
	Type				: $FILE_NAME (0x00000030)
	Parent file reference		: MFT entry: 2360, sequence: 1
	Creation time			: Mar 25, 2015 11:13:39.503881500 UTC
	Modification time		: Mar 25, 2015 11:13:39.503881500 UTC
	Access time			: Mar 25, 2015 11:13:39.503881500 UTC
	Entry modification time		: Mar 25, 2015 11:13:39.503881500 UTC
	File attribute flags		: 0x00000020
		Should be archived (FILE_ATTRIBUTE_ARCHIVE)
	Name				: SOFTWARE

Attribute: 3
	Type				: $DATA (0x00000080)
	Data VCN range			: 0 - 11839
	Data size			: 48496640 bytes 
	Data flags			: 0x0000
$DATA
non-resident attribute data:
00000000: 00 00 00 00 00 00 00 00  3f 2e 00 00 00 00 00 00   ........ ?.......
00000010: 40 00 00 00 00 00 00 00  00 00 e4 02 00 00 00 00   @....... ........
00000020: 00 00 e4 02 00 00 00 00  00 50 e3 02 00 00 00 00   ........ .P......

segment: 000    file index: 000 offset: 0x3045dc000 - 0x3073dc000 (size: 48234496)
(fo: 0x2e00000)
segment: 001    file index: 000 offset: 0x3b06a8000 - 0x3b06dd000 (size: 217088)
(fo: 0x2e35000)
segment: 002    file index: 000 offset: 0x02e35000 - 0x02e40000 (size: 45056)

It looks like the last segment (data run) is the one that differs
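
The segment list can be checked with a little arithmetic: the three data run sizes add up to the 48496640-byte data size, and the first differing offset from the FTK/TSK diff (0x02e35060) falls inside the last run. A small sketch, with the sizes copied from the fsntfsinfo output; the helper name `segment_for_offset` is hypothetical.

```python
# Data run sizes in bytes, copied from the fsntfsinfo segment list.
SEGMENT_SIZES = [48234496, 217088, 45056]


def segment_for_offset(file_offset, segment_sizes=SEGMENT_SIZES):
    """Return the index of the data run that contains file_offset."""
    segment_start = 0
    for index, size in enumerate(segment_sizes):
        if segment_start <= file_offset < segment_start + size:
            return index
        segment_start += size
    raise ValueError("offset is beyond the end of the data")


print(sum(SEGMENT_SIZES))              # 48496640, the data size of $DATA
print(segment_for_offset(0x02e35060))  # 2, the last data run
```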

@joachimmetz
Member

joachimmetz commented Oct 16, 2017

Booted a Windows 10 VM and attached the image:

md5sum.exe /cygdrive/f/Windows/System32/config/SOFTWARE
db68cc9b7f8979821d9273a7a266506b */cygdrive/f/Windows/System32/config/SOFTWARE

@joachimmetz
Member

@Silv3rHorn it looks like FTK and X-Ways are the ones interpreting NTFS differently from the Windows OS.

@joachimmetz
Member

joachimmetz commented Oct 16, 2017

I'm closing this issue, since libewf (stable), libtsk and dfvfs seem to do the right thing.

Informed AccessData and X-Ways by mail.

@joachimmetz
Member

joachimmetz commented Oct 16, 2017

According to X-Ways they include the data stored after the valid data size, depending on your configuration:

data first VCN                       : 0
data last VCN                        : 11839
data runs offset                     : 0x0040
compression unit size                : 0 (0)
padding                              : 0x00000000
allocated data size                  : 48496640
data size                            : 48496640
valid data size                      : 48451584 (0x02e35000)

From a UX perspective, IMHO this should be made explicit in the UI (I can only judge FTK on this)

@joachimmetz
Member

joachimmetz commented Oct 16, 2017

For context: https://www.osr.com/nt-insider/2015-issue1/logical-physical-file-sizes-windows/

Section: Valid Data Length

Space beyond this point may be allocated, but the data contents are not returned to the caller – instead,
they receive known safe data – usually zero filled memory, though nothing requires that specific pattern.

The TL;DR is that data beyond the valid data size is not considered part of the file and should be handled differently.
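
As an illustration of the valid data size behavior, the sketch below emulates what a regular read returns for a file whose valid data size is smaller than its data size: bytes past the valid data size are replaced with zeros, and the bytes actually stored in that region are returned separately. The function name `split_at_valid_data_size` is hypothetical, and zero filling is the usual pattern rather than a guaranteed one, as the quoted article notes.

```python
def split_at_valid_data_size(on_disk_data, data_size, valid_data_size):
    """Return (file_content, beyond_valid_data) for the allocated bytes.

    file_content mimics a regular read: bytes past valid_data_size are
    zero filled. beyond_valid_data holds what is actually stored there.
    """
    if not 0 <= valid_data_size <= data_size <= len(on_disk_data):
        raise ValueError("expected valid_data_size <= data_size <= allocated")
    valid_part = on_disk_data[:valid_data_size]
    beyond_valid_data = on_disk_data[valid_data_size:data_size]
    # Pad with zeros up to the logical data size.
    file_content = valid_part + b"\x00" * (data_size - valid_data_size)
    return file_content, beyond_valid_data
```

For the SOFTWARE hive in this thread, the data size is 48496640 and the valid data size is 48451584, so the last 45056 bytes would be zero filled in `file_content` while the stored bytes end up in `beyond_valid_data`.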

@Silv3rHorn
Author

Thanks. Really appreciate your time and effort in clarifying this issue.

@joachimmetz
Member

For completeness: AccessData has indicated that they are working on changes to FTK Imager so that it behaves like Windows and exposes the "invalid data" as slack.
