Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zlua is a maintenance problem with low bang/buck as implemented...? #13948

Open
adamdmoss opened this issue Sep 24, 2022 · 6 comments
Open

zlua is a maintenance problem with low bang/buck as implemented...? #13948

adamdmoss opened this issue Sep 24, 2022 · 6 comments
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@adamdmoss
Copy link
Contributor

adamdmoss commented Sep 24, 2022

I admire and enjoy Lua.
I don't know what it's really doing in ZFS though (channel programs? wha?), and least of all why we need to put it in the darn kernel if it's not twiddling hardware bits or deeply coupled to kernel structures. It's a frickin' interpreter for a scripting language to support one feature, and as small and elegant as Lua is it hurts my head to think of it in ring-zero.
So while I admit my caveman brain is having a knee-jerk reaction to something I don't fully understand, nonetheless zlua is 15KLloc of code which:

  • is heavily modified from upstream in ways that create unique issues and make it harder to accept upstream fixes
  • is a constant source of bugs and holes (see: git log)
  • does not appear owned or maintained short of when users or code analyzers tell us it's full of bugs and holes
  • surely should live in userspace if it should live at all
@adamdmoss adamdmoss added the Type: Defect Incorrect behavior (e.g. crash, hang) label Sep 24, 2022
@ryao
Copy link
Contributor

ryao commented Sep 24, 2022

@adamdmoss It made certain things faster by eliminating syscall overhead.

That said, I was worried about lua being a source of bugs since the beginning, but it is behind the SYS_CONFIG privilege. There are plenty of things that someone with that privilege could do to try to compromise a system, so the risk is mostly mitigated by that.

A quick check finds that the imported lua code is vulnerable to GHSA-v3hh-4h88-w4mr (which is unfortunate considering that the patch was available when the lua code was merged). All of the existing CVEs probably need to be checked to see if they apply:

https://www.opencve.io/cve?vendor=lua

A LUA update might be a good idea too, but I am not very familiar with the LUA code, so I am taking things one step at a time.

I do not feel strongly about keeping/ejecting LUA (since putting it behind SYS_CONFIG was enough to dissuade me from voicing objections when it was first proposed), but I suspect that the guys at Delphix would feel strongly against ejecting it (ping @ahrens).

I am slowly working my way through all of the coverity reports. My current plan is to keep patching things until static analyzers stop telling us that there are problems (and there are no more bug reports). That is actually my current plan for the entire codebase, so there is nothing about it that is unique to the lua code.

@ryao ryao added Type: Question Issue for discussion and removed Type: Question Issue for discussion labels Sep 24, 2022
@ryao
Copy link
Contributor

ryao commented Sep 24, 2022

It occurs to me that if people start sharing lua scripts for ZCP, then it would be possible for someone to share a malicious script that would exploit a LUA bug. This makes getting the LUA interpreter correct very important.

@ryao ryao mentioned this issue Sep 24, 2022
13 tasks
@ryao
Copy link
Contributor

ryao commented Sep 24, 2022

In hindsight, I am curious why Lua was used rather than the D interpreter from DTrace. DTrace’s D interpreter was designed for use inside the kernel and is presumably less buggy.

@adamdmoss adamdmoss changed the title zlua is a problem zlua is a maintenance problem with low bang/buck as implemented...? Sep 24, 2022
@rincebrain
Copy link
Contributor

Given that when Oracle landed DTrace in the Linux kernel, it had (and perhaps still has? I haven't checked, and I thought they ported that to using an eBPF backend anyway) a problem with regularly panicking, perhaps that was limited to how well it was integrated in Solaris.

@ghost
Copy link

ghost commented Sep 28, 2022

Something I've been wondering about lately is why libzfs does zcp_check() to check that channel programs see the same property values as whatever ioctl the value came from, just so it can print a warning when they differ.

It's also been on my mind recently that our Lua supports only int64_t numbers, so any property that has a uint64_t numeric value has to be explicitly cast to the right type after pulling it out of the results nvlist. This is easily noticeable when using the json ouput of zfs program -j, for example.

I think channel programs are a pretty neat idea, being able to construct complex operations in a simple language and even use them with a single ioctl from userspace, but there still is not a whole lot you can actually do with them in practice. Try to implement zfs list (just that, no other options/args) as a channel program - you can't, channel programs can't iterate over pools. You have to give one pool it's allowed to know about.

It was interesting to see #13802 come through recently. Now you can rename a snapshot in a channel program. You still can't rename a dataset though.

I hope to eventually see channel programs be capable of more. The less I have to rely on libzfs the better. Channel programs give you one ioctl that has the potential to be incredibly flexible and powerful. But it is true that right now it's a pretty big chunk of code that doesn't do much.

@stale
Copy link

stale bot commented Oct 15, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Oct 15, 2023
@rincebrain rincebrain added Bot: Not Stale Override for the stale bot and removed Status: Stale No recent activity for issue labels Feb 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

3 participants