ZFS invalid mode: 0x38 #2316

maci0 · 2014-05-08T22:46:33Z

Whoops.

maci0 · 2014-05-08T22:51:32Z

It happened after I set xattr=sa on my pool and all datasets.
It also happens when I boot with selinux=0.

maci0 · 2014-05-08T22:53:28Z

This one has version info.

maci0 · 2014-05-09T00:33:51Z

I was able to import the pool from another system without a problem.

Upgrading to latest HEAD now.

maci0 · 2014-05-09T02:18:41Z

I was able to reproduce it every time. It happens when the FS has SELinux labels, even when I boot with selinux=0.

tuxoko · 2014-05-09T03:40:21Z

I'm not really sure, but this seems related to #2228.
This happens on current HEAD, is that correct?

maci0 · 2014-05-09T07:14:13Z

Yes, current HEAD.

maci0 · 2014-05-09T12:02:11Z

I was able to boot with init=/bin/bash and run zfs set xattr=off on my datasets,
but the system still fails to come up.
@behlendorf any idea?
setfattr -x security.selinux didn't work either.
Is there any other way to remove xattrs from my files? Maybe using some zdb magic?

dweeezil · 2014-05-09T12:11:59Z

This is very likely another instance of #2228 as @tuxoko pointed out or possibly #2214.

@maci0 Have you got a reproducing scenario starting with a fresh pool creation? It might be useful to follow the debugging steps of #2228 (apply the small patch to zfs_znode.c) and find the inode of a corrupted file/directory and then try dumping it with the hacked zdb of dweeezil/zfs@9888f3c.

maci0 · 2014-05-09T12:41:43Z

I can always reproduce this on a fresh pool. The steps are rather lengthy, though:
Boot a CentOS 6 system, install spl and zfs HEAD, and create the pool and datasets on a second disk.
Bootstrap rhel7, chroot into rhel7, install spl and zfs HEAD, and get GRUB working. Boot into rhel7, install the missing SELinux packages, trigger an autorelabel, and reboot.
If I find time I will try to create a VM image to test this problem.

dweeezil · 2014-05-09T12:58:06Z

Just to make sure I'm clear on these steps: you're doing a fresh centos6 install for the purpose of creating the pool and then copying a complete, existing rhel7 system onto that pool, which you then set up for direct ZFS boot? Then the corruption of the pool happens during the selinux labeling by the rhel7 system?

If this is the case, I wonder if I can duplicate it simply by running restorecon on an already-populated filesystem (under rhel6 or any other distro for that matter). I may give this a try.

maci0 · 2014-05-09T13:10:04Z

It's a server in a managed datacenter, so you have several installation options. CentOS 6 is one of them, so I install CentOS 6 without RAID (the server has 2 HDDs).
I create a zpool on the second HDD
and bootstrap a fresh rhel7 there.
After the rhel7 system booted, I installed the selinux-policy packages and relabeled the system, and everything failed.

Here are the steps from my notes:

##fresh centos 6

yum -y install screen
screen

yum -y install https://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

yum -y install @development kernel-devel zlib-devel libuuid-devel \
libblkid-devel libselinux-devel parted lsscsi wget gdisk

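#prepare 2nd disk
sgdisk /dev/sdb -Z            # zap existing GPT and MBR data
sgdisk /dev/sdb -n 1:0:+2M    # 2 MiB partition for GRUB's core image
sgdisk /dev/sdb -t 1:ef02     # ef02 = BIOS boot partition
sgdisk /dev/sdb -n 2::        # rest of the disk, for the pool
sgdisk /dev/sdb -t 2:bf00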



cd /usr/src
git clone https://github.com/zfsonlinux/spl.git
cd spl
sed -i 's/make/make -j8/' scripts/dkms.mkconf
./autogen.sh
./configure --with-config=user
make rpm-utils rpm-dkms
yum -y install *.rpm

cd /usr/src
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
sed -i 's/make/make -j8/' scripts/dkms.mkconf
./autogen.sh
./configure --with-config=user
make rpm-utils rpm-dkms
yum -y install *.rpm

######################### end rhel6 specifics
zpool create -f -o ashift=12 -O compression=lz4 -O xattr=sa -m none rpool /dev/sdb2
zpool export rpool
zpool import -o altroot=/sysroot rpool

zfs create -o mountpoint=none rpool/ROOT
zfs create -o mountpoint=/ rpool/ROOT/rhel7
zfs create -o mountpoint=/home rpool/HOME
zfs create -o mountpoint=/root rpool/HOME/root
zfs create -o sync=always -o primarycache=metadata -o secondarycache=none -o volblocksize=4K -V 16G rpool/swap

mkswap -f /dev/zvol/rpool/swap


echo "
[rhel7]
baseurl=http://mirror1.hs-esslingen.de/Mirrors/ftp.redhat.com/redhat/rhel/rc/7/Server/x86_64/os/
gpgcheck=0
enabled=0

[epel7]
baseurl=http://mirror1.hs-esslingen.de/Mirrors/epel/beta/7/x86_64/
gpgcheck=0
enabled=0
" > /etc/yum.repos.d/epel7.repo

echo "
title chainloader
    root (hd1)
    chainloader +1
" >> /etc/grub.conf


#### bootstrap rhel7
yum -y --nogpg --installroot=/sysroot --disablerepo='*' --enablerepo=rhel7 \
--enablerepo=epel7 --exclude=firewalld install @base vim-minimal openssh-server e2fsprogs iptables-services \
@development kernel-devel zlib-devel libuuid-devel \
libblkid-devel libselinux-devel parted lsscsi wget gdisk

##### copy needed files
cp /etc/resolv.conf /sysroot/etc
cp /etc/sysconfig/network /sysroot/etc/sysconfig
cp /etc/sysconfig/network-scripts/ifcfg-eth* /sysroot/etc/sysconfig/network-scripts
cp /etc/yum.repos.d/epel7.repo /sysroot/etc/yum.repos.d 

echo "hv.satgnu.net" > /sysroot/etc/hostname
echo "/dev/zvol/rpool/swap none swap discard 0 0" >> /sysroot/etc/fstab




mount -o bind /dev /sysroot/dev
mount -o bind /sys /sysroot/sys
mount -o bind /proc /sysroot/proc
chroot /sysroot /bin/bash

### in chroot

yum-config-manager --enable epel7
yum-config-manager --enable rhel7

echo "changeme" | passwd root --stdin

cd /usr/src
rm -Rf spl
git clone https://github.com/zfsonlinux/spl.git
cd spl
sed -i 's/make/make -j8/' scripts/dkms.mkconf
./autogen.sh
./configure --with-config=user
make rpm-utils rpm-dkms
yum -y localinstall *.rpm

cd /usr/src
rm -Rf zfs
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
sed -i 's/make/make -j8/' scripts/dkms.mkconf
./autogen.sh
./configure --with-config=user --with-dracutdir=/usr/lib/dracut
make rpm-utils rpm-dkms
ln -s /usr/lib/dracut/ /usr/share/dracut #workaround
yum -y localinstall *.rpm

dracut -f -v --kver=3.10.0-121.el7.x86_64


yum -y install grub2

grub2-install /dev/sdb
grub2-mkconfig -o /boot/grub2/grub.cfg

reboot



###### system boots fine
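The notes stop once the system boots; the crash came with the next step described in the thread (installing the selinux-policy packages and relabeling). A minimal sketch of that step, assuming a RHEL-family system where an empty /.autorelabel file makes the next boot relabel every file (which is what writes a security.selinux xattr onto each one). The function names are hypothetical, and selinux-policy-targeted is an assumption about which policy was installed.

```shell
# Sketch only: run as root inside the booted rhel7 system.

# Install the SELinux policy packages (package choice is an assumption).
install_selinux_policy() {
    yum -y install selinux-policy selinux-policy-targeted
}

# Ask the next boot to relabel every file below ROOT (default /).
# The relabel sets the security.selinux xattr on each file.
schedule_relabel() {
    root=${1:-/}
    touch "$root/.autorelabel"
}

# Usage: install_selinux_policy && schedule_relabel / && reboot
```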

dweeezil · 2014-05-10T12:58:58Z

I may have something I can work with. I'm able to reliably trigger an assertion when copying a bunch of files with typical security.selinux xattrs to a xattr=sa filesystem. Details to follow after I've done some investigation.

dweeezil · 2014-05-11T01:28:39Z

@maci0 Could you please try the patch in #2321. My test system now survives a complete xattr-copying rsync of my root filesystem onto ZFS with all the selinux xattrs. The typical trouble spot turned out to be the /etc/ssl/certs (or equivalent depending on your distro; likely /etc/pki/ under redhat) due to all the big symlinks.
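Since the trouble spot was directories full of big symlinks, it can help to see which files in a tree have long symlink targets before copying it (for example with an xattr-preserving `rsync -aX`, where `-X` keeps extended attributes) onto an xattr=sa dataset. A small sketch; the function name is hypothetical and any cutoff you pass is a guess (the reproducer in the fix's commit message later in the thread uses a 120-character target).

```shell
# List symlinks below DIR whose target is at least MIN characters long,
# printing "<length><TAB><path>". Paths containing newlines are not handled.
long_symlinks() {
    dir=$1 min=$2
    find "$dir" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        if [ "${#target}" -ge "$min" ]; then
            printf '%s\t%s\n' "${#target}" "$link"
        fi
    done
}

# Example: long_symlinks /etc/pki 100 | sort -n
```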

EDIT: This may fix #2228.
EDIT2: This may fix #2214.

maci0 · 2014-05-12T10:36:01Z

I was able to reproduce the issue in a VM.
It complains about invalid mode 0x28.
Stacktrace looks similar though.
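Why these modes are "invalid": in a Linux inode mode, the bits under S_IFMT (octal 0170000) encode the file type and the low twelve bits the permissions. 0x38 is octal 070 and 0x28 is octal 050 (0x50 from the related #2343 is octal 0120), so none of them carries any file-type bits and cannot be a regular file, directory, symlink, etc. Our reading, not stated in the thread, is that the mode was read from a corrupted SA layout. A sketch of a decoder (function names are hypothetical):

```shell
# Classify a mode value (hex like 0x38 or decimal) by its S_IFMT bits.
file_type() {
    t=$(printf '%o' $(( $1 & 0170000 )))   # file-type bits, in octal
    case $t in
        140000) echo socket ;;
        120000) echo symlink ;;
        100000) echo regular ;;
        60000)  echo block ;;
        40000)  echo directory ;;
        20000)  echo char ;;
        10000)  echo fifo ;;
        *)      echo invalid ;;            # no recognised file type
    esac
}

# Print the whole mode in octal, e.g. 0x38 -> 70.
mode_octal() {
    printf '%o\n' $(( $1 ))
}
```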

You are also right about /etc/pki. It doesn't properly relabel those files.
Also some files in /usr/lib/systemd/system fail.

After applying the patch on a faulting system and updating the initrd to include the latest module versions, I still get a crash.

Attached are screenshots with and without selinux enabled respectively.

dweeezil · 2014-05-12T12:14:31Z

@maci0 By "After applying the patch on a faulting system", do you mean you applied the patch once the filesystem had already been corrupted? If so, it won't fix anything. Once the filesystem is corrupted your only recourse would be to destroy the filesystem and re-create it.

The patch must already be applied when the files are being written to the filesystem. If you're running stock 0.6.2 when bootstrapping the system under centos6, it likely has a different but related bug which affects symlinks and was fixed by 472e7c6.

You should be running current master code with the patch during both phases of your test: while you're using your initial centos6 for bootstrapping and also while you're running rhel7.

maci0 · 2014-05-12T17:11:30Z

I see. Sorry, I was not aware the bug causes actual corruption.
I wasn't using stock 0.6.2 but HEAD.
I will try again tomorrow with your patch applied.
Should I apply only this patch to HEAD, or use your issue-2316 branch? GitHub says it's 200-something commits ahead of master.

DeHackEd · 2014-05-12T17:25:32Z

It's 200 commits ahead of HIS master branch, which looks like it's currently 33 commits ahead of the 0.6.2 release. Don't worry about that. You're not using it.

His issue-2316 branch is fine to use straight up.

maci0 · 2014-05-15T15:00:22Z

@dweeezil

After applying your patches it breaks as well, but without a stack trace.

dweeezil · 2014-05-15T17:34:20Z

@maci0 Unfortunately, I don't have a problem I can replicate any more. The next step would be to get a stack trace from your system. Have you tried using sysrq to get stack traces?
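A minimal sketch of collecting stack traces with magic SysRq from a root shell (for example one started with init=/bin/bash). The trigger letters are the standard kernel ones; the function name is hypothetical. If no shell is reachable, the same letters can be pressed on the console as Alt+SysRq+letter, provided SysRq is enabled (e.g. with sysrq_always_enabled on the kernel command line), and the output read on a serial console or netconsole.

```shell
# Dump kernel stack traces into the kernel log (run as root).
dump_stacks() {
    echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
    echo w > /proc/sysrq-trigger      # tasks in uninterruptible (blocked) state
    echo l > /proc/sysrq-trigger      # backtrace of all active CPUs
    echo t > /proc/sysrq-trigger      # every task (very long; may wrap the log buffer)
}

# Usage: dump_stacks; dmesg > /root/stacks.txt
```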

In the case where a variable-sized SA overlaps the spill block pointer and a new variable-sized SA is being added, the header size was improperly calculated to include the to-be-moved SA. This problem could be reproduced with xattr=sa enabled as follows:

    ln -s $(perl -e 'print "x" x 120') blah
    setfattr -n security.selinux -v blahblah -h blah

The symlink is large enough to interfere with the spill block pointer and has a typical SA registration as follows (shown in modified "zdb -dddd" <SA attr layout obj> format):

    [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ]

Adding the SA xattr will attempt to extend the registration to:

    [ ... ZPL_DACL_COUNT ZPL_DACL_ACES ZPL_SYMLINK ZPL_DXATTR ]

but since the ZPL_SYMLINK SA interferes with the spill block pointer, it must also be moved to the spill block, which will have a registration of:

    [ ZPL_SYMLINK ZPL_DXATTR ]

This commit updates extra_hdrsize when this condition occurs, allowing hdrsize to be subsequently decreased appropriately.

Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ned Bass <bass6@llnl.gov>
Issue #2214
Issue #2228
Issue #2316
Issue #2343
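The reproducer in the commit message above can be wrapped so it only touches a scratch directory. A sketch, assuming the directory lives on a throwaway ZFS dataset with xattr=sa, that setfattr/getfattr from the attr package are installed, and that it runs as root (setting security.* xattrs normally needs privileges). Function names are hypothetical, and printf/tr stand in for the commit's perl one-liner. Given the thread's finding that the damage is permanent, only run it on a pool you can destroy.

```shell
# Print a string of N 'x' characters (replaces perl -e 'print "x" x N').
xs() {
    printf "%0${1}d" 0 | tr 0 x
}

# Create a 120-character symlink in $1, add an SA xattr, and read it back.
repro_spill() {
    dir=$1
    ln -s "$(xs 120)" "$dir/blah" &&
    setfattr -n security.selinux -v blahblah -h "$dir/blah" &&
    getfattr -h -n security.selinux "$dir/blah"
}
```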

behlendorf · 2014-05-19T19:16:25Z

A potential fix for this has been merged. Please let us know if you're able to recreate this issue using a pool created from the latest master source which includes commit 83021b4.
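Because the corruption is written when files are created, the module that writes the pool must already contain the fix (83021b4 here, or 472e7c6 for the earlier symlink bug mentioned above). A sketch for checking a source checkout before building, using `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second. The function name is hypothetical; the path comes from the build notes above.

```shell
# Print "yes" if commit $2 is contained in HEAD of the git checkout at $1.
has_fix() {
    if git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null; then
        echo yes
    else
        echo no    # not an ancestor, or unknown commit
    fi
}

# Usage: has_fix /usr/src/zfs 83021b4
```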


maci0 · 2014-07-29T06:42:47Z

I was not able to reproduce it anymore with 0.6.3; it seems fixed.

behlendorf · 2014-07-30T20:18:07Z

@maci0 Thanks for the update. Then let's close this out as fixed for now. If it ever resurfaces we can (and should) open a new issue.

behlendorf added this to the 0.7.0 milestone May 10, 2014

behlendorf added the Bug label May 10, 2014

dweeezil mentioned this issue May 11, 2014

Caculate header size correctly in sa_find_sizes() #2321

Closed

behlendorf modified the milestones: 0.6.3, 0.7.0 May 12, 2014

dweeezil mentioned this issue May 19, 2014

ZFS Invalid mode: 0x50 - SPL PANIC #2343

Closed

behlendorf modified the milestones: 0.6.4, 0.6.3 May 19, 2014

behlendorf closed this as completed Jul 30, 2014
