x11rb::atom_manager's output size is very large #912

notgull · 2024-01-29T00:48:48Z

I noticed this while running cargo-bloat on a winit example:

 File  .text     Size          Crate Name
 1.3%   4.8% 111.6KiB          winit winit::platform_impl::platform::x11::atoms::Atoms::new
 0.9%   3.3%  77.7KiB         x11_dl x11_dl::xlib::Xlib::open::{{closure}}
 0.3%   0.9%  21.9KiB          winit winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::pump_events
 0.3%   0.9%  21.6KiB            std std::backtrace_rs::symbolize::gimli::resolve::{{closure}}
 0.2%   0.8%  19.5KiB          winit winit::platform_impl::platform::x11::window::UnownedWindow::new
 0.2%   0.8%  18.0KiB            std std::backtrace_rs::symbolize::gimli::Context::new
 0.2%   0.6%  13.8KiB          winit winit::platform_impl::platform::x11::event_processor::EventProcessor::process_event
 0.2%   0.6%  13.7KiB          winit winit::platform_impl::platform::x11::event_processor::EventProcessor::process_event
 0.2%   0.6%  13.4KiB          winit winit::platform_impl::platform::x11::atoms::AtomsCookie<C>::reply
 0.2%   0.6%  13.1KiB    wayland_sys wayland_sys::client::WaylandClient::open
 0.2%   0.5%  12.6KiB     ttf_parser ttf_parser::tables::cff::cff1::_parse_char_string
 0.1%   0.5%  12.2KiB            std addr2line::ResUnit<R>::find_function_or_location::{{closure}}
 0.1%   0.5%  11.1KiB     ttf_parser ttf_parser::tables::cff::cff2::_parse_char_string
 0.1%   0.5%  10.8KiB   xkbcommon_dl xkbcommon_dl::XkbCommon::open
 0.1%   0.5%  10.8KiB          winit winit::platform_impl::platform::wayland::state::WinitState::new
 0.1%   0.4%   9.4KiB     ttf_parser ttf_parser::tables::gvar::outline_var_impl
 0.1%   0.4%   9.3KiB wayland_client <wayland_client::protocol::wl_pointer::WlPointer as wayland_client::Proxy>::parse_event
 0.1%   0.4%   9.0KiB          winit winit::platform_impl::platform::wayland::window::Window::new
 0.1%   0.4%   8.9KiB            std gimli::read::dwarf::Unit<R>::new
 0.1%   0.4%   8.6KiB    miniz_oxide miniz_oxide::inflate::core::decompress
22.4%  79.8%   1.8MiB                And 5557 smaller methods. Use -n N to show more.
28.1% 100.0%   2.3MiB                .text section size, the file size is 8.1MiB

Note the one at the top. It looks like creating a new atom set ends up being over 1% of a release mode executable.

The text was updated successfully, but these errors were encountered:

psychon · 2024-01-29T16:51:39Z

Using RUSTC_BOOTSTRAP=1 cargo rustc --profile=check -- -Zunpretty=expanded (I know I shouldn't do that), I fetched the generated code for winit's atoms module and then looked at what is reachable from new:

                impl Atoms {
                    pub(crate) fn new<C: ::x11rb::protocol::xproto::ConnectionExt>(_conn:
                            &C)
                        ->
                            ::std::result::Result<AtomsCookie<'_, C>,
                            ::x11rb::errors::ConnectionError> {
                        Ok(AtomsCookie {
                                phantom: std::marker::PhantomData,
                                CARD32: _conn.intern_atom(false, "CARD32".as_bytes())?,
[SNIP - 108 lines skipped]
                                _XEMBED: _conn.intern_atom(false, "_XEMBED".as_bytes())?,
                            })
                    }
                }

That doesn't look worse than what I would expected...? Perhaps LLVM decided to inline everything?

I might be on a different Rust version than you, but I looked at cargo build --release --example window with objdump -S. There are almost 23k lines of assembly for that function, so I guess this means I can reproduce. I'm just not good at figuring out 23k lines of assembly to see what the compiler did there. I see calls to functions that c++filt decodes as x11rb::protocol::xproto::ConnectionExt::intern_atom::hf83f1c65b93c5c2c, so apparently intern_atom was not inlined?!

However, there is a linear increase of... I don't know what after every call to intern_atom. I would guess (but don't have the symbols) that this is another case of ? being annoying. Every intern_atom call could fail and in that case the code has to call drop for all the previous Cookies. And the compiler is doing that in a stupid way since grep drop_in_place | wc -l for this function says that there are 1709 such calls. That's (loosely) similar to the conclusion in #488.

psychon · 2024-01-29T17:14:20Z

I copied the definition of the atom_manager macro to winit and hacked on it. The following patch turns the 23k lines of assembly from above into 3225 lines. That's a 85% reduction, but still not really small.

The idea is to first create an array of all the Result<Cookie, _>s and then collecting that array into a Result<Vec<_>, _>. That way the stdlib deals with "what happens if an error happens halfway through".

However, to make that work, I had to jump through some hoops. But at least it's a start.

A macro cannot easily count. I used an enum to do the counting for me
To be able to create the struct that is returned, the "stuff" somehow has to be moved from the Vec. In this diff, I used .into_iter().map(Some).collect to wrap everything in a useless Option just so that later I can call take(). That's ugly.

Ideas?

(The patch obviously cannot be applied by anyone, but it should get the idea across)

diff --git a/src/platform_impl/linux/x11/atoms.rs b/src/platform_impl/linux/x11/atoms.rs
index 6b9ab49e..23aaeee3 100644
--- a/src/platform_impl/linux/x11/atoms.rs
+++ b/src/platform_impl/linux/x11/atoms.rs
@@ -36,13 +36,24 @@ macro_rules! x11rb_atom_manager {
             $vis fn new<C: x11rb::protocol::xproto::ConnectionExt>(
                 _conn: &C,
             ) -> ::std::result::Result<$cookie_name<'_, C>, x11rb::errors::ConnectionError> {
+                // Locally create an enum that allows to get the index for an atom
+                #[allow(non_camel_case_types)]
+                enum Indicies { $($field_name,)* }
+
+                let cookies = [
+                    $(_conn.intern_atom(
+                            false,
+                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
+                        ),
+                    )*
+                ];
+                let atoms: Result<Vec<_>, _> = cookies.into_iter().collect();
+                let mut atoms: Vec<_> = atoms?.into_iter().map(Some).collect();
+
                 Ok($cookie_name {
                     phantom: std::marker::PhantomData,
                     $(
-                        $field_name: _conn.intern_atom(
-                            false,
-                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
-                        )?,
+                        $field_name: atoms[Indicies::$field_name as usize].take().expect("TODO good text"),
                     )*
                 })
             }

psychon · 2024-01-29T17:32:41Z

Idea to avoid the Option-wrapping: Using remove(0) on the Vec (and hoping that the compiler can optimise away the resulting shuffle-things-around):

diff --git a/src/platform_impl/linux/x11/atoms.rs b/src/platform_impl/linux/x11/atoms.rs
index 6b9ab49e..849c4cdb 100644
--- a/src/platform_impl/linux/x11/atoms.rs
+++ b/src/platform_impl/linux/x11/atoms.rs
@@ -36,13 +36,20 @@ macro_rules! x11rb_atom_manager {
             $vis fn new<C: x11rb::protocol::xproto::ConnectionExt>(
                 _conn: &C,
             ) -> ::std::result::Result<$cookie_name<'_, C>, x11rb::errors::ConnectionError> {
+                let cookies = [
+                    $(_conn.intern_atom(
+                            false,
+                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
+                        ),
+                    )*
+                ];
+                let atoms: Result<Vec<_>, _> = cookies.into_iter().collect();
+                let mut atoms = atoms?;
+
                 Ok($cookie_name {
                     phantom: std::marker::PhantomData,
                     $(
-                        $field_name: _conn.intern_atom(
-                            false,
-                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
-                        )?,
+                        $field_name: atoms.remove(0),
                     )*
                 })
             }

The resulting assembly isn't too bad, but contains a bunch of "randomly moving stuff around" and a call to memcpy, before calling core::iter::adapters::try_process. Then, moving the resulting cookies into the struct is also not the prettied code, involving a call to memmove on each case. There is still lots of duplications, but at least no quadratic code generation. 117 calls to drop_in_place.

psychon · 2024-01-29T17:54:01Z

Since the fields of the Cookie type are not meant to be public (= I hope people don't access them directly), part of this could be moved to the non-cookie struct and we just keep a Vec in the cookie type:

diff --git a/src/platform_impl/linux/x11/atoms.rs b/src/platform_impl/linux/x11/atoms.rs
index 6b9ab49e..93fdf003 100644
--- a/src/platform_impl/linux/x11/atoms.rs
+++ b/src/platform_impl/linux/x11/atoms.rs
@@ -17,9 +17,7 @@ macro_rules! x11rb_atom_manager {
         $(#[$cookie_meta])*
         $vis struct $cookie_name<'a, C: x11rb::protocol::xproto::ConnectionExt> {
             phantom: std::marker::PhantomData<&'a C>,
-            $(
-                $field_name: x11rb::cookie::Cookie<'a, C, x11rb::protocol::xproto::InternAtomReply>,
-            )*
+            cookies: Vec<x11rb::cookie::Cookie<'a, C, x11rb::protocol::xproto::InternAtomReply>>,
         }
 
         // Replies
@@ -36,23 +34,27 @@ macro_rules! x11rb_atom_manager {
             $vis fn new<C: x11rb::protocol::xproto::ConnectionExt>(
                 _conn: &C,
             ) -> ::std::result::Result<$cookie_name<'_, C>, x11rb::errors::ConnectionError> {
-                Ok($cookie_name {
-                    phantom: std::marker::PhantomData,
-                    $(
-                        $field_name: _conn.intern_atom(
+                let cookies = [
+                    $(_conn.intern_atom(
                             false,
                             __atom_manager_atom_value!($field_name$(: $atom_value)?),
-                        )?,
+                        ),
                     )*
+                ];
+                let cookies: Result<Vec<_>, _> = cookies.into_iter().collect();
+
+                Ok($cookie_name {
+                    phantom: std::marker::PhantomData,
+                    cookies: cookies?,
                 })
             }
         }
 
         impl<'a, C: x11rb::protocol::xproto::ConnectionExt> $cookie_name<'a, C> {
-            $vis fn reply(self) -> ::std::result::Result<$struct_name, x11rb::errors::ReplyError> {
+            $vis fn reply(mut self) -> ::std::result::Result<$struct_name, x11rb::errors::ReplyError> {
                 Ok($struct_name {
                     $(
-                        $field_name: self.$field_name.reply()?.atom,
+                        $field_name: self.cookies.remove(0).reply()?.atom,
                     )*
                 })
             }

notgull · 2024-01-30T16:22:52Z

The Vec isn't too bad, as this macro is only called once at the start of the program. Personally I think that the quadratic drop is a compiler problem, but I imagine that this solution would work as a patch for now.

Although I think that, instead of vec.remove(0), we could use let mut iter = vec.into_iter(); and then call iter.next().unwrap().

winit uses atom_manager! with 59 atom names. Due to some sub-optimal code generation, this causes the generated new() function to end up with a size of 111.6 KiB, or almost 5% of the .text size of a winit example. The issue is that each "intern_atom()?" can cause an early exit and needs to call drop() of the already generated Cookies. This leads to some quadratic growth. To work around this, in this commit I change the implementation of new(). It now iterates over the atom names, using .map() to send the InternAtom requests. Finally, collect() is then used to change this Iterator<Item=Result<Cookie, Error>> into Result<Vec<Cookie>, Error>. This even preserves the early-exit behaviour in case of errors. The reply() function is then changed to turn the Vec into an iterator, using next() to pull out the items. This relies on the evaluation of this code to be well-defined (which e.g. wouldn't be guaranteed in C). Luckily, Rust provides the needed guarantee: rust-lang/reference#888 Fixes: #912 Signed-off-by: Uli Schlachter <psychon@znc.in>

psychon mentioned this issue Feb 3, 2024

Quadratic code size due to combination of ? and Drop for linear code rust-lang/rust#120604

Open

psychon mentioned this issue Feb 3, 2024

Optimise atom_manager! implementation #914

Merged

mergify bot closed this as completed in #914 Feb 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x11rb::atom_manager's output size is very large #912

x11rb::atom_manager's output size is very large #912

notgull commented Jan 29, 2024

psychon commented Jan 29, 2024

psychon commented Jan 29, 2024

psychon commented Jan 29, 2024

psychon commented Jan 29, 2024

notgull commented Jan 30, 2024

x11rb::atom_manager's output size is very large #912

x11rb::atom_manager's output size is very large #912

Comments

notgull commented Jan 29, 2024

psychon commented Jan 29, 2024

psychon commented Jan 29, 2024

psychon commented Jan 29, 2024

psychon commented Jan 29, 2024

notgull commented Jan 30, 2024