Netlink (neli) code for the CAN interfaces #32

Closed
fpagliughi opened this issue Dec 6, 2022 · 18 comments
Labels
Fix Added A fix was added to an unreleased branch
Comments

@fpagliughi
Collaborator

It would be great to have full coverage of the Netlink API to interact with the CAN interfaces. Or even something better than what is already there. I don't have any experience with the Netlink CAN interfaces, so if anyone has some existing code to get it started, that would be helpful.

@marcelbuesing ? @jreppnow ?

@fpagliughi fpagliughi added this to the v2.0 milestone Dec 6, 2022
@jreppnow
Contributor

jreppnow commented Dec 6, 2022

Personally, I have never used it directly. Of course, I have used iptools to create and manipulate CAN interfaces, but never used the netlink API directly. Would be willing to investigate though, given some free time. Just can't give any authoritative answers.

The (only) must-have use cases based on my work:

  • create/delete
  • bring up/down
  • set baud rate
  • enable FD/set secondary baud rate

Reading through the socketcan docs once again, it does support quite a lot of stuff which would certainly be useful, though.

@fpagliughi
Collaborator Author

Thanks, @jreppnow . Setting the baud rate and secondary (FD) baud rate would be a great next step.

The code to bring the interface up and down is in there, and to my knowledge works. It's probably a good example to get started on the rest.

If you were to get started on it, let us know, so that the work is not duplicated.

@jreppnow
Contributor

jreppnow commented Dec 7, 2022

I'll have some time to look at this in detail over the next weekend, although I'll presumably have to do quite a bit of research first. If that's fine in terms of delay, I'd gladly take a swing.

@fpagliughi
Collaborator Author

Sounds like a plan. Over the weekend I will get back to looking at timestamps.

fpagliughi added a commit that referenced this issue Dec 12, 2022
@jreppnow
Contributor

jreppnow commented Dec 17, 2022

So I started working on the bitrate stuff in order to use it as an example for a guide, but it's pretty hairy tbh. The main issue is that a lot of constants and structs are missing and it's overall very fiddly. You can see my current progress here: https://github.com/jreppnow/socketcan-rs/tree/netlink-bitrate

I am still going to go ahead and describe my basic approach so that others can follow along. Recommended reading: https://man7.org/linux/man-pages/man7/netlink.7.html and specifically https://man7.org/linux/man-pages/man7/rtnetlink.7.html.

For cross-referencing, I recommend looking at the iproute2 source code at https://github.com/shemminger/iproute2.

Important things to note

Short collection of things that can trip you up badly.

  • Interface name length is limited by IFNAMSIZ, which is 16 bytes on Linux including the terminating NUL, so a name can have at most 15 characters.
  • All numbers used in netlink/netlink route are NATIVE endianness, not NETWORK endianness.
  • If, like me, you are wondering why the Rtattr types in neli get serialized as c_ushort even though they are c_uint in the kernel headers: the actual rtnetlink attribute struct rtattr (https://man7.org/linux/man-pages/man7/rtnetlink.7.html) only has an unsigned short type field, and that is what determines serialization in the protocol. As to why the kernel headers have explicit // u32 comments behind some of their enum variants, I honestly don't know.
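To make the last two points concrete, here is a minimal sketch of how a single attribute ends up on the wire, using only the standard library. The helper names (`align4`, `encode_attr`) are made up for illustration and are not part of neli. The layout is the one described in rtnetlink(7): a 4-byte header holding `rta_len` and `rta_type` as native-endian 16-bit values, followed by the payload, padded to a 4-byte boundary.

```rust
/// Size of `struct rtattr`: `rta_len: u16` followed by `rta_type: u16`.
const RTA_HDR_LEN: usize = 4;

/// Round `len` up to the 4-byte boundary that attributes are aligned to.
fn align4(len: usize) -> usize {
    (len + 3) & !3
}

/// Encode one attribute. Hypothetical helper for illustration, not neli API.
/// The sketch assumes the payload is short enough for `rta_len` to fit in a u16.
fn encode_attr(rta_type: u16, payload: &[u8]) -> Vec<u8> {
    // rta_len counts header + payload, but not the trailing padding.
    let rta_len = (RTA_HDR_LEN + payload.len()) as u16;
    let mut out = Vec::with_capacity(align4(rta_len as usize));
    out.extend_from_slice(&rta_len.to_ne_bytes()); // native endianness
    out.extend_from_slice(&rta_type.to_ne_bytes()); // only 16 bits on the wire
    out.extend_from_slice(payload);
    out.resize(align4(out.len()), 0); // zero padding up to the boundary
    out
}

fn main() {
    const IFLA_INFO_KIND: u16 = 1; // value from linux/if_link.h
    let attr = encode_attr(IFLA_INFO_KIND, b"can");
    println!("{:?}", attr);
}
```

Note that the length field says 7 for the "can" string even though the attribute occupies 8 bytes; that matches the nla_len=7 that strace reports for IFLA_INFO_KIND further down.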

Basic concepts

  • we are using Netlink sockets to talk to the routing subservice of the Linux kernel, which also handles configuration of network devices
  • the messages we exchange always have the same format:
    • Netlink header (in neli, this is Nlmsghdr)
    • Netlink route header (Ifinfomsg)
    • Netlink route attributes
      • In general, these are added by addattr_l(...) and similar calls in the C code. They are put into the attributes field (rtattrs) of Ifinfomsg in neli.
      • They can be nested (possibly multiple times)! In iproute2, you will see something like addattr_nest(&req.n, sizeof(req), iflatype) in the code, which opens a nesting scope, and addattr_nest_end(...), which ends a nesting scope. This is modeled in neli via the add_nested_attribute(...) method on the Rtattr struct. The data content of an Rtattr that accepts such nested attributes should be a Vec<u8> or the neli::Buffer wrapper, as in this example:
        let info = Ifinfomsg::new(
            RtAddrFamily::Unspecified,
            Arphrd::Netrom,
            index.unwrap_or(0) as c_int,
            IffFlags::empty(),
            IffFlags::empty(), // The documentation says this should always be 0xFF..FF, but that does not work!
            {
                let mut buffer = RtBuffer::new();
                // Adding an attribute.
                buffer.push(Rtattr::new(None, Ifla::Ifname, name)?);
                // Adding an attribute with nested attributes inside.
                let mut linkinfo = Rtattr::new(None, Ifla::Linkinfo, Vec::<u8>::new())?;
                // Adding the nested attribute itself.
                linkinfo.add_nested_attribute(&Rtattr::new(None, IflaInfo::Kind, kind)?)?;
                buffer.push(linkinfo);
                buffer
            },
        );
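As a rough cross-check of the "header, route header, attributes" layout, the two fixed headers can be mirrored as `#[repr(C)]` structs and their sizes added up. This is only a sketch: the struct and function names here (`NlMsgHdr`, `IfInfoMsg`, `message_len`) are my own, the field lists follow `struct nlmsghdr` and `struct ifinfomsg` from the kernel headers, and real code would use the neli types instead.

```rust
use std::mem::size_of;

/// Mirrors `struct nlmsghdr` (linux/netlink.h).
#[repr(C)]
#[allow(dead_code)]
struct NlMsgHdr {
    nlmsg_len: u32,
    nlmsg_type: u16,
    nlmsg_flags: u16,
    nlmsg_seq: u32,
    nlmsg_pid: u32,
}

/// Mirrors `struct ifinfomsg` (linux/rtnetlink.h).
#[repr(C)]
#[allow(dead_code)]
struct IfInfoMsg {
    ifi_family: u8,
    ifi_pad: u8,
    ifi_type: u16,
    ifi_index: i32,
    ifi_flags: u32,
    ifi_change: u32,
}

/// Round up to the 4-byte attribute alignment.
fn align4(len: usize) -> usize {
    (len + 3) & !3
}

/// Total message length for a set of top-level attributes, given the
/// payload size of each one (hypothetical helper for illustration).
fn message_len(attr_payload_lens: &[usize]) -> usize {
    let attrs: usize = attr_payload_lens
        .iter()
        .map(|len| align4(4 + len)) // 4-byte rtattr header + padded payload
        .sum();
    size_of::<NlMsgHdr>() + size_of::<IfInfoMsg>() + attrs
}

fn main() {
    // One IFLA_LINKINFO attribute carrying 48 bytes of nested attributes.
    println!("total = {}", message_len(&[48]));
}
```

With a single IFLA_LINKINFO attribute carrying 48 bytes of nested data, this adds up to 16 + 16 + 52 = 84, which is the nlmsg_len that shows up in the strace output in the concrete example.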

Concrete example

As mentioned before, I am trying to get bitrate setting to work at the moment. Here is how I approach the problem:

  • Figure out what the correct iproute2 command is, in this case it's
    ip link set <dev name> type can bitrate <bitrate> [sample-point <sample point>]
    A good source for these is https://www.kernel.org/doc/Documentation/networking/can.txt.
  • Use strace to figure out what the command is actually doing:
    sudo strace ip link set <dev name> type can bitrate <bitrate> [sample-point <sample point>]
    • This gives us (among other things):
    sendmsg(3, { 
      msg_name = { 
        sa_family=AF_NETLINK, 
        nl_pid=0, nl_groups=00000000
      },  
      msg_namelen=12, 
      msg_iov = [ 
        {  
          iov_base= [ 
            {  // This is the netlink header (Nlmsghdr)
              nlmsg_len=84, 
              nlmsg_type=RTM_NEWLINK, 
              nlmsg_flags=NLM_F_REQUEST|NLM_F_ACK, 
              nlmsg_seq=1671295701, 
              nlmsg_pid=0
            }, 
            { // This is the netlink route header (Ifinfomsg)
              ifi_family=AF_UNSPEC, 
              ifi_type=ARPHRD_NETROM, 
              ifi_index=if_nametoindex("vcan0"), 
              ifi_flags=0, 
              ifi_change=0 
            }, 
            [ // Attributes start here
              { // Parent for the following nested attributes 
                nla_len=52, 
                nla_type=IFLA_LINKINFO
              }, 
              [ 
                [ 
                  { // First nested attribute, this is just a (c) string
                    nla_len=7, 
                    nla_type=IFLA_INFO_KIND 
                  }, 
                  "can"... 
                ], 
                [ 
                  { // Second nested attribute, this is actually another parent attribute, i.e. it has further children 
                    nla_len=40, 
                    nla_type=IFLA_INFO_DATA
                  }, 
                  "\x24\x00\x01\x00\xb8\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"... // See below
                ]
              ]
            ]
          ], 
        iov_len=84
      }
    ], 
    msg_iovlen=1, 
    msg_controllen=0, 
    msg_flags=0}, 0) = 84
    
    Notably, CAN-specific data is not decoded, which brings us to the next step:
  • Look around in iproute2/blob/main/ip/iplink_can.c to see what's actually being done. In this case:
    static int can_parse_opt(struct link_util *lu, int argc, char **argv,
      		 struct nlmsghdr *n)
    {
      struct can_bittiming bt = {}, dbt = {};
          
          // snip ..
    
      while (argc > 0) {
      	if (matches(*argv, "bitrate") == 0) {
      		NEXT_ARG();
      		if (get_u32(&bt.bitrate, *argv, 0))
      			invarg("invalid \"bitrate\" value\n", *argv);
      	} else if (matches(*argv, "sample-point") == 0) {
      		// snip ..
      	}
      	argc--, argv++;
      }
    
      if (bt.bitrate || bt.tq)
      	addattr_l(n, 1024, IFLA_CAN_BITTIMING, &bt, sizeof(bt));
          
         // snip ..
    }
    What we see here is that another attribute is added (the nesting attribute IFLA_INFO_DATA has already been added). Its payload is the raw bytes of the C struct can_bittiming (linux/can/netlink.h) with the bit timing set accordingly.
    Unfortunately, neither this struct nor the constant used here, IFLA_CAN_BITTIMING, is available in libc or in neli, so we have to add them manually for now. What we get is something like this:
    let info = Ifinfomsg::new(
        RtAddrFamily::Unspecified,
        Arphrd::Netrom,
        self.if_index as c_int,
        IffFlags::empty(),
        IffFlags::empty(),
        {
            let mut buffer = RtBuffer::new();
            let mut link_info = Rtattr::new(None, Ifla::Linkinfo, Buffer::new())?;
            link_info.add_nested_attribute(&Rtattr::new(None, IflaInfo::Kind, "can")?)?;
            let mut data = Rtattr::new(None, IflaInfo::Data, Buffer::new())?;
            // Only the bitrate (and optionally the sample point) is set;
            // all other fields stay zero.
            let timing = can_bittiming {
                bitrate,
                sample_point: sample_point.unwrap_or(0) as u32,
                tq: 0,
                prop_seg: 0,
                phase_seg1: 0,
                phase_seg2: 0,
                sjw: 0,
                brp: 0,
            };
            // The payload is the raw bytes of the struct, so can_bittiming
            // has to be #[repr(C)] to match the kernel layout.
            data.add_nested_attribute(&Rtattr::new(None, rt::IflaCan::BitTiming, unsafe {
                std::slice::from_raw_parts::<'_, u8>(
                    &timing as *const can_bittiming as *const u8,
                    size_of::<can_bittiming>(),
                )
            })?)?;
            // IFLA_INFO_DATA is itself nested inside IFLA_LINKINFO,
            // as seen in the strace output above.
            link_info.add_nested_attribute(&data)?;
            buffer.push(link_info);
            buffer
        },
    );
    What I am stuck on right now is that IflaCan does not implement the required traits for neli and adding them is a bit of a pain. Feel free to see the branch yourself.
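Coming back to the bytes that strace did not decode: they can be picked apart by hand, which is a handy sanity check before writing any neli code. The sketch below is standard library only and the names in it (`info_data_prefix`, `BitTimingAttr`, `decode`) are invented for this example. It assumes the trace was taken on a little-endian machine (on the wire the values are in native endianness), and it only covers the bytes that are visible in the trace, since strace truncated the string.

```rust
/// The visible start of the IFLA_INFO_DATA payload from the strace output.
/// strace cut the string off, so the tail of the struct is not included.
fn info_data_prefix() -> [u8; 32] {
    let mut bytes = [0u8; 32];
    bytes[..8].copy_from_slice(&[0x24, 0x00, 0x01, 0x00, 0xb8, 0x0b, 0x00, 0x00]);
    bytes
}

/// Read a little-endian u16 at `at`, or None if the buffer is too short.
fn le_u16(b: &[u8], at: usize) -> Option<u16> {
    let s = b.get(at..at + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

/// Read a little-endian u32 at `at`, or None if the buffer is too short.
fn le_u32(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// The attribute header plus the first two fields of `struct can_bittiming`.
#[derive(Debug, PartialEq)]
struct BitTimingAttr {
    rta_len: u16,
    rta_type: u16,
    bitrate: u32,
    sample_point: u32,
}

fn decode(bytes: &[u8]) -> Option<BitTimingAttr> {
    Some(BitTimingAttr {
        rta_len: le_u16(bytes, 0)?,
        rta_type: le_u16(bytes, 2)?,
        bitrate: le_u32(bytes, 4)?,
        sample_point: le_u32(bytes, 8)?,
    })
}

fn main() {
    println!("{:?}", decode(&info_data_prefix()));
}
```

The header decodes to a length of 36 (4 bytes of header plus the 32 bytes of can_bittiming, which is eight u32 fields) and type 1, which is IFLA_CAN_BITTIMING in linux/can/netlink.h. The first field, 0x00000bb8, is a bitrate of 3000.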

I will update these instructions as I obtain more information or based on feedback.

@fpagliughi
Collaborator Author

Thanks for keeping at this and all the updates. My guess is that there's too much upstream work to try to get in before this is workable? So, it sounds like it won't make it in to the upcoming v2.0 release. But it is something we can target for a followup v2.1 release?

@jreppnow
Contributor

Yeah, I think 2.0 should not wait for the Netlink stuff - making stuff like FD support and the layout fixes available to people seems more important.

I'll personally keep pecking at this topic, implementing a function/request or two when I have the time. Not sure if we need a specific milestone like a version 2.1 for this tbh.

@fpagliughi
Collaborator Author

Unfortunately, I didn't get much done on this over the holidays, but I'm starting back on it now to get a release out. I began adding some support for the new Netlink features to the optional utility app, which I renamed to rcan. This is currently in the develop branch. So far, it looks good, but could use some better error messages.

Diving into the Netlink stuff, it definitely looks like we would want to get some stuff added upstream into the neli and libc crates to support this. Probably the contents of the can/netlink.h header:
https://elixir.bootlin.com/linux/latest/source/include/uapi/linux/can/netlink.h

I opened a new issue in the neli crate for some guidance on how best to proceed.

@fpagliughi
Collaborator Author

We'll go with what's currently in there for v2.0, and keep this issue open to add some more neli features in v2.1

@fpagliughi fpagliughi modified the milestones: v2.0, v2.1 Apr 4, 2023
@fpagliughi fpagliughi modified the milestones: v2.2, v3.0, v3.1 Sep 15, 2023
fpagliughi added a commit that referenced this issue Oct 10, 2023
#32 [partial] - Add set_bitrate method
@fpagliughi
Collaborator Author

fpagliughi commented Oct 11, 2023

Thanks to @jackyzjk for adding the required structs and enums to implement set_bitrate() much the way @jreppnow showed it, above.

I then kept inertia going to implement most of the other netlink CAN structs etc. Just as I was finishing, it occurred to me that I could have just used bindgen, like:

$ bindgen /usr/include/linux/can/netlink.h -o bindings_can_netlink.rs

But, I did use the output to adjust some of the declarations.

After that I just did some copy-pasta to implement a few more setter functions and commands, like set_restart_ms() and restart().

There are still more to go, but this is a good start. And it would be good to consolidate the common part of the code.

@fpagliughi
Collaborator Author

fpagliughi commented Oct 11, 2023

For completeness, I'll add this...

The API definitions for communicating with the kernel about CAN are in the Linux kernel sources here:
include/uapi/linux/can/netlink.h

And although looking at an existing client like the CAN code in iproute2 is quite helpful, it may also be useful to see the kernel code that receives and processes the netlink requests for CAN. The code is fairly small and readable:
drivers/net/can/dev/netlink.c

I found the top of that file particularly interesting where it maps the requests to the data type expected for each:
https://github.com/torvalds/linux/blob/1c8b86a3799f7e5be903c3f49fcdaee29fd385b5/drivers/net/can/dev/netlink.c#L11-L25

The dev.c file in there has additional implementation of the commands which helps to show the expected state of the interface to process particular commands, and the errors returned if not in that state:
drivers/net/can/dev/dev.c

There is also a libsocketcan C library, with a few forks. It primarily deals with the Netlink interface.

@fpagliughi fpagliughi modified the milestones: v3.1, v3.2 Oct 12, 2023
@fpagliughi
Collaborator Author

fpagliughi commented Oct 12, 2023

A number of important netlink commands are shipping with v3.1, including setting bitrate and FD data bitrate, setting control modes, manually restarting the interface, and setting an automatic restart delay time.

But the implementation is still far from complete, particularly in regard to reading status and parameters back from the interface. The setter functions are also for individual parameters only, and currently there is no way to set multiple parameters in a single netlink call. Setting multiple parameters requires making a separate call for each. It would be nice to add a builder pattern, or something like that to create a single request packet for multiple parameters and send them in one call.
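For discussion, here is one way such a builder could look. This is a standard-library-only sketch, not the crate's API: `CanParamsBuilder`, its methods and `push_attr` are hypothetical names, and the result is just the payload that would go inside a single IFLA_INFO_DATA attribute. The attribute numbers are taken from linux/can/netlink.h, and the bit timing is filled in the same way as in the set_bitrate example earlier in the thread (bitrate set, everything else zero).

```rust
// Attribute types from linux/can/netlink.h.
const IFLA_CAN_BITTIMING: u16 = 1;
const IFLA_CAN_RESTART_MS: u16 = 6;

/// Append one attribute (4-byte header, payload, padding to 4 bytes).
fn push_attr(out: &mut Vec<u8>, rta_type: u16, payload: &[u8]) {
    let rta_len = (4 + payload.len()) as u16;
    out.extend_from_slice(&rta_len.to_ne_bytes());
    out.extend_from_slice(&rta_type.to_ne_bytes());
    out.extend_from_slice(payload);
    out.resize((out.len() + 3) & !3, 0);
}

/// Hypothetical builder that collects several CAN parameters and emits
/// them as one IFLA_INFO_DATA payload, i.e. one netlink request.
#[derive(Default)]
struct CanParamsBuilder {
    bitrate: Option<u32>,
    restart_ms: Option<u32>,
}

impl CanParamsBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn bitrate(mut self, bitrate: u32) -> Self {
        self.bitrate = Some(bitrate);
        self
    }

    fn restart_ms(mut self, ms: u32) -> Self {
        self.restart_ms = Some(ms);
        self
    }

    /// Serialize every parameter that was set, in attribute order.
    fn build(&self) -> Vec<u8> {
        let mut out = Vec::new();
        if let Some(bitrate) = self.bitrate {
            // struct can_bittiming is eight u32 fields with bitrate first;
            // the remaining fields are left at zero.
            let mut timing = [0u8; 32];
            timing[..4].copy_from_slice(&bitrate.to_ne_bytes());
            push_attr(&mut out, IFLA_CAN_BITTIMING, &timing);
        }
        if let Some(ms) = self.restart_ms {
            push_attr(&mut out, IFLA_CAN_RESTART_MS, &ms.to_ne_bytes());
        }
        out
    }
}

fn main() {
    let payload = CanParamsBuilder::new().bitrate(500_000).restart_ms(100).build();
    println!("{} bytes of nested attributes", payload.len());
}
```

In the real crate the build step would presumably produce neli Rtattr values rather than raw bytes, but the idea is the same: collect the optional parameters first, then nest all of them under one IFLA_INFO_DATA attribute and send a single RTM_NEWLINK request.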

As usual, a PR for any of this would be appreciated! Hopefully we can get some more of this implemented in v3.2.

@fpagliughi
Collaborator Author

Does anyone know how to extract the CAN-specific parameters out of the Ifla::Linkinfo attribute that is returned from the kernel? I added a match in the nl::CanInterface::details() function to get the link info, but can't figure out how to parse out the nested attributes.

@jreppnow , did you ever figure this out?

@jreppnow
Contributor

@fpagliughi Can you share the bytes that you are trying to decode? I don't own a physical CAN device (privately) that I can use for testing and getting traces, but presumably the messages should have the same format as the one you use for configuration of bitrate etc. I did something similar (proprietarily) for the rust-netlink project.

@fpagliughi
Collaborator Author

Hey @jreppnow . Thanks for the quick reply.

I got completely stuck on this for a day, but I think I'm starting to get it now... When requesting the CAN interface details, the CAN-specific parameters like can_bittiming, etc, are in the response message in the LinkInfo attribute, like you described above. I've been trying to figure out how to extract it out, like here:

socketcan-rs/src/nl/mod.rs

Lines 531 to 543 in d065f83

    if let Some(msg) = nl.recv::<'_, Rtm, Ifinfomsg>()? {
        if let Ok(payload) = msg.get_payload() {
            for attr in payload.rtattrs.iter() {
                if attr.rta_type == Ifla::Linkinfo {
                    // Trying to figure this out!
                }
            }
        }
        Ok(0)
    } else {
        Err(NlError::NoAck)
    }
}

First I was just trying to figure out how to parse down into nested attributes.
Then I realized that the attributes are nested a little deeper than I thought.
Then it became obvious that for the final collection, we really do want the IflaCan enum that we created pushed up and integrated into neli data types, with the proper traits defined, for seamless extraction. And implementing FromBytes for the structs is pretty helpful, too.

I think I may be able to get it this evening. I'll post in the morning if I was able to get it working.

Thanks.

@jreppnow
Contributor

jreppnow commented Oct 15, 2023

I personally really liked the way the rust-netlink project does it, including the tests which serve as examples and documentation as well: https://github.com/rust-netlink/netlink-packet-route/blob/main/src/rtnl/link/nlas/link_infos.rs#L2672

In general, all of these libraries will have a generic way to parse an attribute (NLA), which should give you the bytes (length + data) and the id, which you need to compare to the relevant constants - and then you need to know how to parse the contents (recursive NLAs, structs, values, enums, ...). And then they have a way to define specific types within the library that do this parsing step for you as well and give you the above contents as Rust types - the link above does it with enums and a couple of traits (Emitable and NlaParse I think?). That should be the goal if you are developing this for a library. I've had open-sourcing our CAN extension for rust-netlink on my list for a while, but I have been otherwise occupied unfortunately.
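To illustrate the generic part, here is a small standard-library sketch of walking nested attributes by hand. It is not how neli or rust-netlink implement it; `Attrs`, `find`, `can_bitrate` and `attr` are names invented for this example. The attribute numbers come from linux/if_link.h and linux/can/netlink.h, and the flag mask corresponds to NLA_TYPE_MASK in linux/netlink.h.

```rust
const IFLA_LINKINFO: u16 = 18; // linux/if_link.h
const IFLA_INFO_DATA: u16 = 2; // linux/if_link.h
const IFLA_CAN_BITTIMING: u16 = 1; // linux/can/netlink.h
const NLA_TYPE_MASK: u16 = 0x3fff; // strips NLA_F_NESTED / NLA_F_NET_BYTEORDER

/// Iterator over the attributes in a buffer, yielding (type, payload).
struct Attrs<'a>(&'a [u8]);

impl<'a> Iterator for Attrs<'a> {
    type Item = (u16, &'a [u8]);

    fn next(&mut self) -> Option<Self::Item> {
        let buf = self.0;
        let hdr = buf.get(..4)?;
        let len = u16::from_ne_bytes([hdr[0], hdr[1]]) as usize;
        let ty = u16::from_ne_bytes([hdr[2], hdr[3]]) & NLA_TYPE_MASK;
        if len < 4 || len > buf.len() {
            return None; // malformed or truncated attribute: stop iterating
        }
        // The next attribute starts at the following 4-byte boundary.
        let next = ((len + 3) & !3).min(buf.len());
        self.0 = &buf[next..];
        Some((ty, &buf[4..len]))
    }
}

/// Payload of the first attribute of type `wanted`, if present.
fn find(buf: &[u8], wanted: u16) -> Option<&[u8]> {
    Attrs(buf).find(|(ty, _)| *ty == wanted).map(|(_, payload)| payload)
}

/// Walk IFLA_LINKINFO -> IFLA_INFO_DATA -> IFLA_CAN_BITTIMING and read the
/// first field (bitrate) of `struct can_bittiming`.
fn can_bitrate(link_attrs: &[u8]) -> Option<u32> {
    let link_info = find(link_attrs, IFLA_LINKINFO)?;
    let info_data = find(link_info, IFLA_INFO_DATA)?;
    let timing = find(info_data, IFLA_CAN_BITTIMING)?;
    let b = timing.get(..4)?;
    Some(u32::from_ne_bytes([b[0], b[1], b[2], b[3]]))
}

/// Encode one attribute; only used to build sample input for the demo.
fn attr(ty: u16, payload: &[u8]) -> Vec<u8> {
    let mut out = ((4 + payload.len()) as u16).to_ne_bytes().to_vec();
    out.extend_from_slice(&ty.to_ne_bytes());
    out.extend_from_slice(payload);
    out.resize((out.len() + 3) & !3, 0);
    out
}

fn main() {
    let mut timing = [0u8; 32];
    timing[..4].copy_from_slice(&500_000u32.to_ne_bytes());
    let data = attr(IFLA_INFO_DATA, &attr(IFLA_CAN_BITTIMING, &timing));
    let attrs = attr(IFLA_LINKINFO, &data);
    println!("{:?}", can_bitrate(&attrs));
}
```

A library would wrap the last step in typed enums and structs so that callers never see raw bytes, but the lookup chain underneath is the same three levels of nesting.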

@fpagliughi
Collaborator Author

OK. I got it. There's a really messy initial implementation up in the develop branch, and it depends on my fork of neli with an upstreamed IflaCan enumeration. But at least, for now, I figured out the parsing.

@fpagliughi fpagliughi added Fix Added A fix was added to an unreleased branch and removed help wanted labels Oct 17, 2023
@fpagliughi
Collaborator Author

After the better part of a year, this has progressed enough to close the issue! Thanks to everyone who helped on this, especially @jreppnow and @jackyzjk.

With v3.2, most of the CAN interface parameters can be set or queried. There are a few minor ones that are missing; those will be added eventually as needed, or just for completeness.

The low-level constants and struct bindings were kept in this crate to speed up the release, but at some point these will be pushed upstream to the neli and libc crates.
