Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Longname: files/directories name upto 1023 bytes #15921

Closed
wants to merge 4 commits into from

Conversation

tuxoko
Copy link
Contributor

@tuxoko tuxoko commented Feb 20, 2024

Motivation and Context

Description

This patch adds the ability for zfs to support file/dir name up to 1023
bytes. This number is chosen so we can support up to 255 4-byte
characters. This new feature is represented by the new feature flag
feature@longname.

A new dataset property "longname" is also introduced to toggle longname
support for each dataset individually. This property can be disabled,
even if it contains longname files. In such case, new file cannot be
created with longname but existing longname files can still be looked
up.

Note that, to my knowledge native Linux filesystems don't support name
longer than 255 bytes. So there might be programs not able to work with
longname.

Note that NFS server may needs to use exportfs_get_name to reconnect
dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So
NFS may not work when longname is enabled.

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@tuxoko tuxoko force-pushed the longname branch 6 times, most recently from 88197c7 to ba7a10c Compare February 22, 2024 21:06
@behlendorf behlendorf added Type: Feature Feature request or new feature Status: Code Review Needed Ready for review and testing labels Feb 26, 2024
@@ -367,9 +367,20 @@ typedef struct {
boolean_t za_normalization_conflict;
uint64_t za_num_integers;
uint64_t za_first_integer; /* no sign extension for <8byte ints */
char za_name[ZAP_MAXNAMELEN];
uint32_t za_name_len;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand, this represents total length of the field. Just couple days ago looking recently on ZAP code I've found that its iterator code can not properly report binary names, since this structure includes neither length nor number of integers for the name, unlike value. I am thinking if it would be good to fix while you are here and changing the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amotin Right, this represent the buffer len for za_name.
Regarding your issue, can you elaborate a bit. Is there any command or api affect by this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tuxoko There may be other examples, but I particularly hit that dump_zap() in zdb is unable to handle DDT and BRT ZAPs.

@tonyhutter
Copy link
Contributor

@tuxoko sorry this PR has sat on the sidelines for so long. Can you rebase on master and I can take a look?

@tuxoko
Copy link
Contributor Author

tuxoko commented Aug 1, 2024

@tonyhutter rebased

@jumbi77
Copy link
Contributor

jumbi77 commented Sep 20, 2024

@tuxoko Can I may ask for another rebase and @tonyhutter for review? Get this done would be great! In any case much thanks.

@tonyhutter
Copy link
Contributor

Note that, to my knowledge native Linux filesystems don't support name longer than 255 bytes. So there might be programs not able to work with longname.

That seems to be the case. I'm unable to write a >256B filename on Fedora 40 using this PR :

/tank$ touch $(printf 'a%.0s' {1..255})
/tank$ touch $(printf 'a%.0s' {1..256})
touch: cannot touch 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa': File name too long

So what would be the use case for 1024B filenames if it's not supported by Linux? Does it work in FreeBSD?

@user-566
Copy link

So what would be the use case for 1024B filenames if it's not supported by Linux? Does it work in FreeBSD?

I might be misunderstanding your question, but Windows supports >255B filenames, which I assume represents a large portion of clients connected to TrueNas systems. Lots of discussion on this in #13043

@Haravikk
Copy link

Haravikk commented Sep 22, 2024

So what would be the use case for 1024B filenames if it's not supported by Linux? Does it work in FreeBSD?

IIRC the limit might be 255 characters, including multi-byte characters, but the current/historic ZFS limit is 255 bytes, so multibyte characters will eat into that. For example if you're using two-byte characters your limit is only 127 characters (254 bytes). This is also the case on macOS, which is what caused me to notice it as I was handling a lot of files with long, multibyte names at the time that worked fine on HFS+/APFS but not on ZFS.

There are a bunch of filesystems now that support either 255 characters, or have even less strict limits (so they'll support whatever the OS' limit(s) are).

It's also one of those classic chicken and egg problems – if ZFS keeps a limitation because it matched one on Linux, then Linux is less likely to increase that limit because a major filesystem doesn't support it etc.

IMO it's better for a filesystem to support any size that it doesn't have a good technical reason not to (due to overhead or whatever), that way it's not the part that's holding something back. Though it makes sense to have it tunable for those that need to guarantee portability between more and less limited OSes.

@tuxoko
Copy link
Contributor Author

tuxoko commented Sep 22, 2024

Note that, to my knowledge native Linux filesystems don't support name longer than 255 bytes. So there might be programs not able to work with longname.

That seems to be the case. I'm unable to write a >256B filename on Fedora 40 using this PR :

/tank$ touch $(printf 'a%.0s' {1..255})
/tank$ touch $(printf 'a%.0s' {1..256})
touch: cannot touch 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa': File name too long

So what would be the use case for 1024B filenames if it's not supported by Linux? Does it work in FreeBSD?

@tonyhutter
Did you enable both the pool feature and turn on the property on the dataset?

The kernel VFS layer itself doesn't impose a 255 limit on filename as long as the "path" length fits in 4k page.
As you can see the unit tests are passed in the built bot, there shouldn't be any problem creating longname file with common shell command. Any program that rely only on syscall to check file name length should work fine.
We have no problem running this with samba.
The only notable issue if nfs server may need to call exportfs_get_name, and it has a hard coded buffer size of 256.

$ touch $(printf 'a%.0s' {1..255})
$ touch $(printf 'a%.0s' {1..256})
$ ls
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

@tonyhutter
Copy link
Contributor

@tuxoko thanks, you were right, I wasn't enabling the longname dataset feature. Its working fine for me natively in Linux. 👍

@tonyhutter
Copy link
Contributor

I'm pretty happy with this - if you want to rebase it I can approve it.

Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tuxoko sorry about the slow review on this. Just a couple of minor nits then this looks good to me. If you can get this rebased I can pull it in.

tests/runfiles/linux.run Show resolved Hide resolved
This patch is preparatory work for long name feature. It changes all
users of zap_attribute_t to allocate it from kmem instead of stack. It
also make zap_attribute_t and zap_name_t structure variable length.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
@robn
Copy link
Member

robn commented Sep 24, 2024

I had a few of drive-by questions/thoughts. Note I don't consider these blockers; just thinking about the problem space.

Is it worth storing the max filename length somewhere on the dir ZAP (extended attribute? somewhere on the bonus?) so that we can increase it further in the future without needing another feature flag etc?

I assume we're happy leaving leaving interpretation of the bytes in the filename as out of scope (ie, normal utf8only, normalization, casesensitivity, etc)?

Is there any value in a kind of "compatibility" option, so that older implementations can still use these dirs? I personally think not; it would be a lot of complexity for perhaps very little advantage. And its tricky to do well; at least, I remember FAT32 long filenames being surprisingly difficult to get right :)

@behlendorf
Copy link
Contributor

@tuxoko the updated ABI files look like they contain a bunch on unrelated changes (and they they pass). If you remove the ABI file changes, then force update the PR you should be able to download new ABI files from the "checkstyle" builder summary section with the minimal changes required. e.g. generated with the exact same version of libabigail.

@@ -81,6 +81,7 @@ typedef enum dmu_objset_type {
* All of these include the terminating NUL byte.
*/
#define ZAP_MAXNAMELEN 256
#define ZAP_MAXNAMELEN_NEW 1024
Copy link
Member

@amotin amotin Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be since we still keep the old and to match the feature name better would be to call it ZAP_MAXNAMELEN_LONG? Though it echoes the question below how much do we expect somebody to disable long names, or it is a temporary migration instrument?

module/os/freebsd/zfs/zfs_dir.c Show resolved Hide resolved
module/os/linux/zfs/zfs_dir.c Show resolved Hide resolved
module/os/linux/zfs/zfs_vnops_os.c Show resolved Hide resolved
*/
if (!dsl_dataset_feature_is_active(ds, SPA_FEATURE_LONGNAME) ||
(flags & (LOOKUP_CREATE | LOOKUP_RENAME_TARGET)))
return (ERR_PTR(-ENAMETOOLONG));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be I don't understand something, but why do you check the dataset feature is active to fail non-create/rename operations? I suppose it is not intended to happen (other than I am nut sure feature gets enabled before TXG commit), but why do we care?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if it's not active then we can just return early.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is frequent enough (outside some special synthetic) case to optimize for it, and it just complicated the logic.

Comment on lines +184 to +186
if (is_nametoolong(dentry)) {
return (-ENAMETOOLONG);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cosmetics, but I prefer to avoid extra braces for a single line.

module/os/linux/zfs/zpl_xattr.c Show resolved Hide resolved
{
static const spa_feature_t longname_deps[] = {
SPA_FEATURE_EXTENSIBLE_DATASET,
SPA_FEATURE_ENABLED_TXG,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really care at what TXG long names were actually enabled? What do we do with this knowledge?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there's any reason when this was added. Maybe for audit purpose. Anyway I'll remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am not sure what exactly does the extensible dataset, so I just hope it is there for reason and not just copy-paste.

Comment on lines +775 to +777
zprop_register_index(ZFS_PROP_LONGNAME, "longname", 0, PROP_INHERIT,
ZFS_TYPE_FILESYSTEM, "on | off", "LONGNAME", boolean_table,
sfeatures);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat split about this property and its disabled default. I may speculate that some users may not want this functionality, while same time once it become routine I don't think I want to manually enable it on each new pool created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it's not enabled is because it may not be suitable for all workload. For example NFS, or when you copy files between native linux fs.

Copy link
Contributor

@lundman lundman Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat split about this property and its disabled default.

I am hoping to have a couple of days after it lands to see if I can make it work on my platforms, before it is made default ...

module/zfs/zfs_ioctl.c Show resolved Hide resolved
@behlendorf
Copy link
Contributor

@tuxoko
Copy link
Contributor Author

tuxoko commented Sep 28, 2024

Updated with the abi files and freebsd compile fix. Hopefull we can get a freebsd run now.

A couple of notable new changes,
I added check in fzap_checkname to make sure we don't accidentally use longname for non directory zap.
Also remove readonly compat for SPA_FEATURE_LONGNAME, as it might cause problem.

module/zfs/zap.c Fixed Show resolved Hide resolved
@tuxoko tuxoko force-pushed the longname branch 2 times, most recently from 3eb9465 to 665afcf Compare September 30, 2024 19:30
@tuxoko
Copy link
Contributor Author

tuxoko commented Sep 30, 2024

Update, fixed compile error and a typo in test.
Also, it turns out freebsd has 255 limit in vfs layer.
https://github.com/freebsd/freebsd-src/blob/01eb635d12953e24ee5fae69692c28e4aab4f0f6/sys/kern/vfs_lookup.c#L1132
So this won't actually work on freebsd, so I moved the unit tests back to linux only.

module/zfs/zap.c Outdated
uint64_t len = zn->zn_key_orig_numints * zn->zn_key_intlen;
if (len > maxnamelen ||
(zn->zn_zap->zap_dnode->dn_type != DMU_OT_DIRECTORY_CONTENTS &&
len > ZAP_MAXNAMELEN))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more efficient if you reorder the last two lines. You already have len value in register, while zn->zn_zap->zap_dnode->dn_type may require several memory accesses, while in most cases it is not needed.

Meanwhile I am not sure in general it is a business of ZAP code to care about specific dn_type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not elegant, but it's readily available.

@@ -577,6 +581,7 @@ zfs_link_create(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx,
{
zfsvfs_t *zfsvfs = zp->z_zfsvfs;
vnode_t *vp = ZTOV(zp);
dsl_dataset_t *ds = dmu_objset_ds(zfsvfs->z_os);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ds assignment is not used in most cases. It could be moved inside the if below. Same for Linux.

@@ -1814,7 +1814,7 @@ zfs_znode_parent_and_name(znode_t *zp, znode_t **dzpp, char *buf)
return (SET_ERROR(EINVAL));

err = zap_value_search(zfsvfs->z_os, parent, zp->z_id,
ZFS_DIRENT_OBJ(-1ULL), buf);
ZFS_DIRENT_OBJ(-1ULL), buf, MAXNAMELEN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks dirty to hardcode MAXNAMELEN here, while we receive buf as an argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it was hardcoded as well in zap_value_search before. Though in this case we can make the caller to pass buf length.


norm = kmem_alloc(namelen, KM_SLEEP);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems zap_match() is called in a loop over potentially many other entries. Allocation for might be expensive. Sure using stack previously was a cheating, but this might need some more thinking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go back to the original one.

This patch adds the ability for zfs to support file/dir name up to 1023
bytes. This number is chosen so we can support up to 255 4-byte
characters. This new feature is represented by the new feature flag
feature@longname.

A new dataset property "longname" is also introduced to toggle longname
support for each dataset individually. This property can be disabled,
even if it contains longname files. In such case, new file cannot be
created with longname but existing longname files can still be looked
up.

Note that, to my knowledge native Linux filesystems don't support name
longer than 255 bytes. So there might be programs not able to work with
longname.

Note that NFS server may needs to use exportfs_get_name to reconnect
dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So
NFS may not work when longname is enabled.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even
though we add code to support it here, it won't actually work.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 1, 2024
@behlendorf
Copy link
Contributor

@tuxoko looks good. The CI failures here all look to be unrelated. I can handle resolving the trivial conflict in the merge.

@behlendorf behlendorf closed this in 3cf2bfa Oct 1, 2024
behlendorf pushed a commit that referenced this pull request Oct 1, 2024
This patch adds the ability for zfs to support file/dir name up to 1023
bytes. This number is chosen so we can support up to 255 4-byte
characters. This new feature is represented by the new feature flag
feature@longname.

A new dataset property "longname" is also introduced to toggle longname
support for each dataset individually. This property can be disabled,
even if it contains longname files. In such case, new file cannot be
created with longname but existing longname files can still be looked
up.

Note that, to my knowledge native Linux filesystems don't support name
longer than 255 bytes. So there might be programs not able to work with
longname.

Note that NFS server may needs to use exportfs_get_name to reconnect
dentries, and the buffer being passed is limit to NAME_MAX+1 (256). So
NFS may not work when longname is enabled.

Note, FreeBSD vfs layer imposes a limit of 255 name lengh, so even
though we add code to support it here, it won't actually work.

Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #15921
amotin added a commit to amotin/zfs that referenced this pull request Oct 4, 2024
Our code reading/writing there may not handle misaligned accesses
there on platforms that may care about it.  I don't see a point to
complicate it to satisfy UBSan in CI. This alignment costs nothing.

Fixes:		openzfs#15921
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
amotin added a commit to amotin/zfs that referenced this pull request Oct 4, 2024
Our code reading/writing there may not handle misaligned accesses
on a platforms that may care about it.  I don't see a point to
complicate it to satisfy UBSan in CI. This alignment costs nothing.

Fixes:		openzfs#15921
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
behlendorf pushed a commit that referenced this pull request Oct 4, 2024
Our code reading/writing there may not handle misaligned accesses
on a platforms that may care about it.  I don't see a point to
complicate it to satisfy UBSan in CI. This alignment costs nothing.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #15921
Closes #16606
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested) Type: Feature Feature request or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.