-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow files to share state. #145
Conversation
Can you please give some context / explanation to the changes? |
Sure. Consider the example that I added: how would you implement it without my changes? The only way to do it would be with global variables, and even then it wouldn't be great because if we registered more than one device they'd all share the global state. My change allows us to have per-device state: when we register a device, we provide some state; then each file gets a reference to the shared state on its This is a very common pattern in the kernel, and each type of device saves this shared state in a different place. For example, In C, this is all untyped and one could easily mess this up. In Rust, the unsafe code is wrapped and it exposes a safe API. In binder, we need this because each device is a "universe" -- processes that can access a device can all do IPC amongst themselves, totally independently of what's happening in another device (e.g., normally we have three instances: |
} | ||
|
||
impl SharedState { | ||
fn try_new() -> KernelResult<Arc<Self>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alex -- this is an example of the ugliness brought by Pin
I mentioned in one of the meetings. A bunch of unsafe
that wouldn't need to be unsafe if we had better support for pinning. Two aspects are relevant here:
- Initialisation:
CondVar::new
andMutex::new
are unsafe because they require a call toinit
after the condvar/mutex are in their final pinned location. - Projection: the compiler can't (or won't) verify for us that we don't move out of the fields of a pinned structure, so we need unsafe. (Though admittedly I don't explicitly say that
state
is pinned, mostly because it doesn't give me anything.)
It would be nice if we could avoid this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. When you say "we had better support for pinning", do you think this is something that Rust-for-Linux can improve itself, or do you think this requires better language level support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we could choose to ignore Pin
and assume that everything that is heap-allocated is pinned, but my opinion is that this isn't desirable because Pin
is actually useful. (We sort of discussed this in the meeting.)
So I think we need better language support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joshtriplett as our resident friend upstream, what's the best way for us to share these observations/desires with the team?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RalfJung -- since you commented on something else related to this project, I thought I'd ask you if you have any thoughts about improving pinning support. As it is today, it forces us (unreasonably in my opinion) to use too much unsafe code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Projection: the compiler can't (or won't) verify for us that we don't move out of the fields of a pinned structure, so we need unsafe. (Though admittedly I don't explicitly say that state is pinned, mostly because it doesn't give me anything.)
The pin-project crate has a proc macro to help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that writing code with pin projections currently requires more unsafe code than it should. I am not sure what the best way forward for that is; the pin-project crate mentioned by @bjorn3 is a great start showing how far one can go even without any language support. (I did not carefully audit it though.)
The "new and then init" pattern seems mostly orthogonal to pinning; that pattern inherently requires exposing the user to an unsafe pre-init state. If you want to expose that state safely, you need to give it a separate type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you commented on something else related to this project, I thought I'd ask you
Sure, feel free to. :) I don't have a ton of time these days, but I am super excited about Rust having a realistic chance of being used in the upstream Linux kernel, so if I can help in small ways I'm more than happy to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had one pass through this, and I'm still working to make sure I fully understand it, to be honest.
One small thing though -- do the changes to make read
/write
return counts need to be a part of this? Could we split them out, just to reduce the review surface area.
Thanks for reviewing. I'm happy to try to clarify if there's anything that doesn't make sense.
It is needed by the example. I could split out the example if you think it'll help, although it would be counter to what reviewers have been asking -- they always [rightfully] ask for example usage. |
Can we split the baby :-) The changes for read/write to return usize into
their own PR, but keep the core changes to the example that demo the shared
state here.
…On Tue, Mar 30, 2021 at 7:22 PM Wedson Almeida Filho < ***@***.***> wrote:
I've had one pass through this, and I'm still working to make sure I fully
understand it, to be honest.
Thanks for reviewing. I'm happy to try to clarify if there's anything that
doesn't make sense.
One small thing though -- do the changes to make read/write return counts
need to be a part of this? Could we split them out, just to reduce the
review surface area.
It is needed by the example. I could split out the example if you think
it'll help, although it would be counter to what reviewers have been asking
-- they always [rightfully] ask for example usage.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#145 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAAGBHDZDVZ4ZW6GF3OILDTGJMLHANCNFSM4Z2JEZAQ>
.
--
All that is necessary for evil to succeed is for good people to do nothing.
|
Sure, give me a few minutes. |
I'm splitting the last commit (or most of it) into its own PR to simplify things here. |
Actually, I'm going to split it into more PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple bits that I noticed...
Co-authored-by: Miguel Ojeda <ojeda@users.noreply.github.com>
I lost track of the fact that I owed you a re-review. Added to my TODO list for this evening. |
Not at all! Just marked things as resolved (I usually let the PR sender to resolve them). |
@ojeda did you want to take another pass at this before it's merged? |
Let's move forward! |
This is a partial revert of commit 0faf20a ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"). Prior to that commit, we always set the decrementer in timer_interrupt(), to clear the timer interrupt. Otherwise we could end up continuously taking timer interrupts. When high res timers are enabled there is no problem seen with leaving the decrementer untouched in timer_interrupt(), because it will be programmed via hrtimer_interrupt() -> tick_program_event() -> clockevents_program_event() -> decrementer_set_next_event(). However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we see a stall/lockup, because tick_nohz_handler() does not cause a reprogram of the decrementer, leading to endless timer interrupts. Example trace: [ 1.898617][ T7] Freeing initrd memory: 2624K^M [ 22.680919][ C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M [ 22.682281][ C1] rcu: 0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M [ 22.682851][ C1] (detected by 1, t=2102 jiffies, g=-1179, q=476)^M [ 22.683649][ C1] Sending NMI from CPU 1 to CPUs 0:^M [ 22.685252][ C0] NMI backtrace for cpu 0^M [ 22.685649][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M [ 22.686393][ C0] NIP: c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M [ 22.686774][ C0] REGS: c000000002833590 TRAP: 0500 Not tainted (5.16.0-rc2-00185-g0faf20a1ad16)^M [ 22.687222][ C0] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24000222 XER: 00000000^M [ 22.688297][ C0] CFAR: c00000000000c854 IRQMASK: 0 ^M ... [ 22.692637][ C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M [ 22.694443][ C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M [ 22.695762][ C0] Call Trace:^M [ 22.696050][ C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M [ 22.697377][ C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M [ 22.698739][ C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M [ 22.699938][ C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M [ 22.701119][ C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M Possibly this should be fixed in the lowres timing code, but that would be a generic change and could take some time and may not backport easily, so for now make the programming of the decrementer unconditional again in timer_interrupt() to avoid the stall/lockup. Fixes: 0faf20a ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use") Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20220420141657.771442-1-mpe@ellerman.id.au
chrdev
will need to be updated once we decide how we want it to share state. For now, it doesn't allow sharing.