Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs-crypto is far behind ZoL #32

Closed
FransUrbo opened this issue Jun 5, 2013 · 33 comments
Closed

zfs-crypto is far behind ZoL #32

FransUrbo opened this issue Jun 5, 2013 · 33 comments

Comments

@FransUrbo
Copy link
Contributor

I wanted to try the new libzfs_core code, but it breaks zfs-crypto. Tried to merge it myself, but it's a very, very big change and I'm unsure how to do it in some files.

openzfs/zfs#1496

Sooner or later (hopefully sooner :), someone with knowledge of the crypto stuff needs to take care of this, so I'm putting this issue as a 'heads up'.

UPDATE: This pull have now (September 4, 2013) been accepted into ZoL HEAD. But there have been huge amounts of work merging illumos fixes into ZoL (and more is in the pipeline). zfs-crypto is currently way behind in commits. Even the 0.6.3 branch which is the work branch for this.

@dweeezil
Copy link
Contributor

dweeezil commented Jun 5, 2013

@FransUrbo I'll take a look at this. I've still got the code pretty fresh in my mind.

@dweeezil
Copy link
Contributor

dweeezil commented Jun 5, 2013

Seems like a good first step would be to get zfs-crypto rebased to a current master. The #1496 pull is based on zfsonlinux/zfs master as of 0377189. I just tried it myself and, so far, the conflicts look pretty minor.

@FransUrbo
Copy link
Contributor Author

Sounds good, thanx for looking into this. ZFSRogue is quite busy, but he usually 'comes in' every now and then and accepts pulls etc, so give him a few days for the rebase... :)

@lundman
Copy link
Contributor

lundman commented Jun 7, 2013

Looks like rogue has merged upstream

@zfsrogue
Copy link
Owner

zfsrogue commented Nov 1, 2013

ok sync with 0.6.3, large change with libzfs_core. changing outside, not core of zfs. test please.

@FransUrbo
Copy link
Contributor Author

Get compilation errors with latest pull from ZoL:

../../module/zfs/dmu_objset.c: In function 'dmu_objset_create_check':
../../module/zfs/dmu_objset.c:796: error: 'oa' undeclared (first use in this function)
../../module/zfs/dmu_objset.c:796: error: (Each undeclared identifier is reported only once
../../module/zfs/dmu_objset.c:796: error: for each function it appears in.)
../../module/zfs/dmu_objset.c: In function 'dmu_objset_create_sync':
../../module/zfs/dmu_objset.c:819: warning: passing argument 4 of 'dsl_dataset_create_sync' makes pointer from integer without a cast
../../module/zfs/dmu_objset.c:819: warning: passing argument 5 of 'dsl_dataset_create_sync' makes integer from pointer without a cast
../../module/zfs/dmu_objset.c:819: warning: passing argument 6 of 'dsl_dataset_create_sync' from incompatible pointer type
../../module/zfs/dmu_objset.c:819: error: too few arguments to function 'dsl_dataset_create_sync'
../../module/zfs/dmu_objset.c:824: warning: passing argument 5 of 'dmu_objset_create_impl' from incompatible pointer type
../../module/zfs/dmu_objset.c:824: error: too few arguments to function 'dmu_objset_create_impl'
../../module/zfs/dmu_objset.c: In function 'dmu_objset_clone_sync':
../../module/zfs/dmu_objset.c:923: warning: passing argument 5 of 'dsl_dataset_create_sync' makes integer from pointer without a cast
../../module/zfs/dmu_objset.c:923: warning: passing argument 6 of 'dsl_dataset_create_sync' from incompatible pointer type
../../module/zfs/dmu_objset.c:923: error: too few arguments to function 'dsl_dataset_create_sync'

When I pulled in zfs-crypto, I got these conflicts:

static int
dmu_objset_create_check(void *arg, dmu_tx_t *tx)
{
<<<<<<< HEAD:module/zfs/dmu_objset.c
    dmu_objset_create_arg_t *doca = arg;                                                                                                                  
    dsl_pool_t *dp = dmu_tx_pool(tx);
    dsl_dir_t *pdd;                                                                                                                                       
    const char *tail;                                                                                                                                     
    int error;                                                                                                                                            
=======
    dsl_dir_t *dd = arg1;                                                                                                                                 
    struct oscarg *oa = arg2;                                                                                                                             
    objset_t *mos = dd->dd_pool->dp_meta_objset;                                                                                                          
    int err;                                                                                                                                              
    uint64_t ddobj;         
    static boolean_t dp_config_rwlock_held = B_TRUE;                                                                                                      

    err = zap_lookup(mos, dd->dd_phys->dd_child_dir_zapobj,
        oa->lastname, sizeof (uint64_t), 1, &ddobj);                                                                                                      
    if (err != ENOENT)
            return (err ? err : EEXIST);                                                                                                                  
>>>>>>> c2383cdcc4f302fa13761701e7d382a865a5ee09:module/zfs/dmu_objset.c

    if (strchr(doca->doca_name, '@') != NULL)
            return (SET_ERROR(EINVAL));                                                                                                                   

<<<<<<< HEAD:module/zfs/dmu_objset.c
    error = dsl_dir_hold(dp, doca->doca_name, FTAG, &pdd, &tail);                                                                                         
    if (error != 0)
            return (error);                                                                                                                               
    if (tail == NULL) {
            dsl_dir_rele(pdd, FTAG);                                                                                                                      
            return (SET_ERROR(EEXIST));                                                                                                                   
=======
            /* You can only clone snapshots, not the head datasets. */
            if (!dsl_dataset_is_snapshot(oa->clone_origin))
                    return (EINVAL);                                                                                                                      

        if (dsl_dataset_keystatus(oa->clone_origin,
            dp_config_rwlock_held) == ZFS_CRYPT_KEY_UNAVAILABLE)
            return (ENOKEY);                                                                                                                              
>>>>>>> c2383cdcc4f302fa13761701e7d382a865a5ee09:module/zfs/dmu_objset.c
    }
    dsl_dir_rele(pdd, FTAG);                                                                                                                              

    /*
     * Check we have the required crypto algorithms available
     * via kcf since this is our last chance to fail the dataset creation.
     */
    if (oa->crypto_ctx != NULL &&
        !zcrypt_mech_available(oa->crypto_ctx->dcc_crypt)) {
        return (ENOTSUP);                                                                                                                                 
    }

    return (0);                                                                                                                                           
}

I kept the HEAD stuff. Maybe that was the misstake.

On the other hand, dmu_objset_create_check() was rewritten in commit 13fe019 for a reason:

commit 13fe019
Author: Matthew Ahrens mahrens@delphix.com
Date: Wed Sep 4 07:00:57 2013 -0500

Illumos #3464 - zfs synctask code needs restructuring
References: https://www.illumos.org/issues/3464

@lundman
Copy link
Contributor

lundman commented Nov 1, 2013

You should get spl-0.6.3 and zfs-0.6.3 and build. I just tried here and had no issues for the whole build.

@FransUrbo
Copy link
Contributor Author

I have. But that's not the problem. The problem is that 'oa' isn't defined in module/zfs/dmu_objset.c:dmu_objset_create_check(). That is, if one takes the ZoL version. And looking at the 13fe019 commit, there's quite a number of changes that will/would break havoc if I accepted the zfs-crypto part. Which won't work anyway, since that function only accepts two options now...

I think the zfs-crypto part needs to be rewritten with this new code in mind.

Cloning ZoL, using latest master/HEAD and then pull zfs-crypto/master will give a number of CONFLICTS that I had to merge manually. I chosed to use the ZoL/master.

CONFLICT (content): Merge conflict in cmd/ztest/ztest.c
CONFLICT (content): Merge conflict in include/sys/dmu.h
CONFLICT (content): Merge conflict in include/sys/dmu_objset.h
CONFLICT (content): Merge conflict in include/sys/dsl_dataset.h
CONFLICT (content): Merge conflict in include/sys/dsl_prop.h
CONFLICT (content): Merge conflict in include/sys/fs/zfs.h
CONFLICT (content): Merge conflict in include/sys/spa.h
CONFLICT (content): Merge conflict in lib/libzfs/libzfs_dataset.c
CONFLICT (content): Merge conflict in man/man1/Makefile.am
CONFLICT (content): Merge conflict in module/zfs/Makefile.in
CONFLICT (content): Merge conflict in module/zfs/dmu_objset.c
CONFLICT (content): Merge conflict in module/zfs/dmu_send.c
CONFLICT (content): Merge conflict in module/zfs/dsl_dataset.c
CONFLICT (content): Merge conflict in module/zfs/dsl_pool.c
CONFLICT (content): Merge conflict in module/zfs/spa.c
CONFLICT (content): Merge conflict in module/zfs/zfs_ioctl.c
CONFLICT (content): Merge conflict in module/zfs/zil.c
CONFLICT (content): Merge conflict in module/zfs/zio_checksum.c

@lundman
Copy link
Contributor

lundman commented Nov 2, 2013

Yeah, check it out and use it. Dont try to merge it yourself. it was merged with oct 30th zol so it is up to that date. Each merge is a fair bit of work. If you really need to get Nov 1st zol, you'd be better of cherry picking those patches.

@FransUrbo
Copy link
Contributor Author

You don't really read what I write, do you? :) That commit was made 'Wed Sep 4'....

My point is that the crypto part still needs a lot of work to get working. There's a number of Illumos merges that's been done, and more is in the pipeline. They all seem to be modifying the core code, which is where, unfortunatly, zfs-crypto 'operates'.

@kerberizer
Copy link

@FransUrbo, have you actually checked the 0.6.3 branch of zfs-crypto?
https://github.com/zfsrogue/zfs-crypto/commits/0.6.3
I think that commit e252f25 does indeed merge a lot of things, including the commit you mention.

@FransUrbo
Copy link
Contributor Author

Because of openzfs/zfs#1848, I had to build new kernel, spl and zfs modules.

Both ZoL HEAD and 0.6.3 zfs-crypto have problems. Different problems, but still problems. I'm unsure what the problem is at this moment...

@FransUrbo
Copy link
Contributor Author

I tried to cherry-pick every commit after 0c28fb4 (Fri Aug 16 15:20:07 2013), and most of them applied, but 8 failed (partly - only got some conflicts):

6f1ffb06655008c9b519108ed29fbf03acd6e5de        CONFLICTS
13fe019870c8779bf2f5b3ff731b512cf89133ef        CONFLICTS
98ab38d1096079d82247350f526f0d7268956fb5        CONFLICTS
1421c89142376bfd41e4de22ed7c7846b9e41f95        CONFLICTS
0b1401ee911c5a0c0bdb7a8e6ad36840cea3af24        CONFLICTS
2e528b49f8a0f8f2f51536a00fdf3ea9343bf302        CONFLICTS
03c6040bee6c87a9413b7da41d9f580f79a8ab62        CONFLICTS
ea97f8ce35a8a515610e52b7e4744549f9c510f4        CONFLICTS

Fifteen conflicted, but I could manually apply them:

31fc19399e597e3391f19f1392ab120f1de0d5f2        CONFLICTS
76463d4026e0fa4b3d7b96acd58cb5fb79c49af7        CONFLICTS
e0b0ca983d6897bcddf05af2c0e5d01ff66f90db        CONFLICTS
8eaf9f3543aa6843aa276010768cce8c0626e2d8        CONFLICTS
37fd6e00a699aff3fea24199497e9484cd218a84        CONFLICTS
a35beedfb3f25596b4ec9122742c1337083118f5        CONFLICTS
63fd3c6cfd264cab94dc186fe8cceecac8bc0d50        CONFLICTS
b1118acbb16ec347f6a3eb091d9b7097d12b8d54        CONFLICTS
d3cc8b152edc608fa4b73d4cb5354356da6b451c        CONFLICTS
3a84951d7dfb5357509a1ed1699f80b71f87982a        CONFLICTS
95fd54a1c5b93bb2aa3e7dffc28c784b1e21a8bb        CONFLICTS
19580676295b4e271da63dce145bb17c3731d069        CONFLICTS
831baf06efb3023ddee7ed41800d3b44521bf2ee        CONFLICTS
2883cad5b747b5e4e2164fbe3236451d5b43f333        CONFLICTS
46ba1e59d3ae7e374c7a98f15f4bef21ee3fcded        CONFLICTS

And finaly 74 (!!) commits succeeded. UNFORTUNATLY, the very first one (6f1ffb0) is a dependency on most of the others, so the succeeded ones don't help me much - get massive compile problems :(

@lundman
Copy link
Contributor

lundman commented Nov 10, 2013

Still not entirely sure what you mean by problems. The merge was a huge amount of work, so I wouldn't try to merge it myself. Pure 0.6.3 branches compile, then install, then compile again, as you will most likely find the linker is picking the OS library (/usr/local/lib maybe) before the in source-tree one.

There was also a fix with checksums, the previous crypto branch completely ignored checksums, but this code was finally fixed. So old dataset will not import without checksum errors. The best way around that is most likely to import with -N, then set checksum=none on the crypto dataset, and copy the data to a brand new dataset.

@FransUrbo
Copy link
Contributor Author

@lundman my problem is that my previous spl-0.6.1-1.20130514, zfs-0.6.1-1.20130604 on a 3.9.0-rc6 kernel suddenly wouldn't load after a reboot (see the ZoL issue link). The machine and FS worked flawlessly before the reboot, but now I can no longer access (or load) the zfs module.

The spl/zfs crypto branch 0.6.3 loads (as does ZoL HEAD), but I still can't import it properly (because it uses the crypto feature)

ZoL HEAD seems to work better than crypto 0.6.3, so my hope was to merge the 97 (!!) missing commits to crypto, which I wasn't able to do (well, most of the commits applied more or less cleanly). But there was the eight that I just didn't have the knowledge to fix/apply manually.

So my problem isn't with building and installing 0.6.3, it's with a bug (?) in the code somewhere. My previous version seemed to have corrupted my pool (without telling me - there where absolutly no indication before the reboot that something was wrong), but now I can't use either (crypto or ZoL) because my pool is using the encryption feature, and crypto 0.6.3 is to far behind ZoL for me to be able to fix it...

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

Are you sure that when you import the pool with crypto-0.6.3, you aren't being hit by the checksum bug-fix I mentioned?

If you wish to "roll back the fix" to broken state (so you can copy the data off), you need to comment this line back out

0a58a69

@FransUrbo
Copy link
Contributor Author

It doesn't say anything about checksums etc. It just say that the metadata is corrupted - see openzfs/zfs#1848 (comment). And when I try to fix the pool, I simply get an I/O error - openzfs/zfs#1848 (comment).

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

Metadata also have checksums. definitely worth testing.

@FransUrbo
Copy link
Contributor Author

Well, the code looks completely different now:

        if (ci->ci_trunc) {
          if (!(0 == (
                      (actual_cksum.zc_word[0] - expected_cksum.zc_word[0]) |
                      (actual_cksum.zc_word[1] - expected_cksum.zc_word[1]) |
                      (BF64_GET(actual_cksum.zc_word[2], 0, 32) -
                       BF64_GET(expected_cksum.zc_word[2], 0, 32))))) {
            return (ECKSUM);
          }
        } else if (!ZIO_CHECKSUM_EQUAL(actual_cksum, expected_cksum)) {
          return (ECKSUM);
        }

So I'm unsure if I should disable the whole section, or just the else if and it's return... ?

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

That actually is the code that fixes the checksums. Previous versions were generated incorrectly, so rogue
commented out the line that would return ECKSUM, as I showed.

You should keep the checksum code above so that NEW checksums are calculated correct, but comment out the return ECKSUM to ignore checksum failures (due to old code making wrong checksums).

@FransUrbo
Copy link
Contributor Author

There's two returns. Which one did you want me to comment out?

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

Do both, only way to be sure. :)

@FransUrbo
Copy link
Contributor Author

No change. Still OOPSes...

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

Not sure you've mentioned any OOPSes so far in this thread... ?

@FransUrbo
Copy link
Contributor Author

@lundman If you've read the links I gave you, you might have seen them :). A lot of them...

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

I've seen oopses listed on ZOL issues, which I assume are not related to crypto branch, since they will not touch crypto at all. If you have an oops related to crypto you need to place it in an issue in crypto, ie, here, or as a new issue.

@FransUrbo
Copy link
Contributor Author

@lundman I never said it was in the crypto code! I have, numerous times, tried to say that zfs-crypto is to far behind for me to be able to import my pool.

Again:

  • ZoL HEAD seems to work, but it can not properly import the pool because of the crypto feature. And since it can't be imported read-write, I can't run a scrub on it to fix any problems.
  • zfs-crypto 0.6.3 lacks 97 commits to make it level with ZoL HEAD, one of which might be the one I need to import the pool. These can not be merged (without a lot of knowledge on both libzfs_core and crypto knowledge).

Can't make it any clearer/simpler than that I'm afraid...

@lundman
Copy link
Contributor

lundman commented Nov 11, 2013

Ah ok, so unrelated problem to crypto, other than there are fixes you need. I'm sure we can convince rogue to do another merge this week

@FransUrbo
Copy link
Contributor Author

That would be awesome indeed.

@zfsrogue If you like, I could send $150 your way. Not much, but maybe that could give some inspiration? :)

@lundman
Copy link
Contributor

lundman commented Nov 12, 2013

I haven't heard anything specific, but looks like there was another large merge yesterday.

@FransUrbo
Copy link
Contributor Author

Nice!

I'm not sure I got the merge correctly. The commit message talks about '0.6.3b', which I can't find (might be lag at GitHub). There's a big commit in '0.6.3' which I haven't looked over in detail but I just tried - compiles just fine.

It doesn't work as expected though.. I can import the pool correctly, in r/w mode IF (I've only mounted a few, selected filesystems and they work - I'm currently a little afraid of mounting them all and putting the machine in correct runlevel) I supply 'zvol_inhibit_dev=1' to the module.

If I don't [supply zvol_inhibit_dev], zpool and then vol_id (from udev) crashes with an OOPS, locking /dev/zfs so I can't run any more ZoL command...

I'm not sure where the problem lies, if it's (still) in ZoL or if it's because zfs-crypto is still behind (at least that what it looks like if I diff ZoL/master - haven't seen anything major yet, still looking at the diff).

@FransUrbo
Copy link
Contributor Author

Ok, so it doesn't seem like zfs-crypto is behind any more... Thanx @zfsrogue. Let me know how you want me to send you the bounty.

Still have problem with ZVOL's though. I'm not sure if I should take this here or on the ZoL tracker...

@FransUrbo
Copy link
Contributor Author

Since zfs-crypto is now up-to-date, I'll close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants