Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x11rb::atom_manager's output size is very large #912

Closed
notgull opened this issue Jan 29, 2024 · 5 comments · Fixed by #914
Closed

x11rb::atom_manager's output size is very large #912

notgull opened this issue Jan 29, 2024 · 5 comments · Fixed by #914

Comments

@notgull
Copy link
Collaborator

notgull commented Jan 29, 2024

I noticed this while running cargo-bloat on a winit example:

 File  .text     Size          Crate Name
 1.3%   4.8% 111.6KiB          winit winit::platform_impl::platform::x11::atoms::Atoms::new
 0.9%   3.3%  77.7KiB         x11_dl x11_dl::xlib::Xlib::open::{{closure}}
 0.3%   0.9%  21.9KiB          winit winit::platform_impl::platform::wayland::event_loop::EventLoop<T>::pump_events
 0.3%   0.9%  21.6KiB            std std::backtrace_rs::symbolize::gimli::resolve::{{closure}}
 0.2%   0.8%  19.5KiB          winit winit::platform_impl::platform::x11::window::UnownedWindow::new
 0.2%   0.8%  18.0KiB            std std::backtrace_rs::symbolize::gimli::Context::new
 0.2%   0.6%  13.8KiB          winit winit::platform_impl::platform::x11::event_processor::EventProcessor::process_event
 0.2%   0.6%  13.7KiB          winit winit::platform_impl::platform::x11::event_processor::EventProcessor::process_event
 0.2%   0.6%  13.4KiB          winit winit::platform_impl::platform::x11::atoms::AtomsCookie<C>::reply
 0.2%   0.6%  13.1KiB    wayland_sys wayland_sys::client::WaylandClient::open
 0.2%   0.5%  12.6KiB     ttf_parser ttf_parser::tables::cff::cff1::_parse_char_string
 0.1%   0.5%  12.2KiB            std addr2line::ResUnit<R>::find_function_or_location::{{closure}}
 0.1%   0.5%  11.1KiB     ttf_parser ttf_parser::tables::cff::cff2::_parse_char_string
 0.1%   0.5%  10.8KiB   xkbcommon_dl xkbcommon_dl::XkbCommon::open
 0.1%   0.5%  10.8KiB          winit winit::platform_impl::platform::wayland::state::WinitState::new
 0.1%   0.4%   9.4KiB     ttf_parser ttf_parser::tables::gvar::outline_var_impl
 0.1%   0.4%   9.3KiB wayland_client <wayland_client::protocol::wl_pointer::WlPointer as wayland_client::Proxy>::parse_event
 0.1%   0.4%   9.0KiB          winit winit::platform_impl::platform::wayland::window::Window::new
 0.1%   0.4%   8.9KiB            std gimli::read::dwarf::Unit<R>::new
 0.1%   0.4%   8.6KiB    miniz_oxide miniz_oxide::inflate::core::decompress
22.4%  79.8%   1.8MiB                And 5557 smaller methods. Use -n N to show more.
28.1% 100.0%   2.3MiB                .text section size, the file size is 8.1MiB

Note the one at the top. It looks like creating a new atom set ends up being over 1% of a release mode executable.

@psychon
Copy link
Owner

psychon commented Jan 29, 2024

Using RUSTC_BOOTSTRAP=1 cargo rustc --profile=check -- -Zunpretty=expanded (I know I shouldn't do that), I fetched the generated code for winit's atoms module and then looked at what is reachable from new:

                impl Atoms {
                    pub(crate) fn new<C: ::x11rb::protocol::xproto::ConnectionExt>(_conn:
                            &C)
                        ->
                            ::std::result::Result<AtomsCookie<'_, C>,
                            ::x11rb::errors::ConnectionError> {
                        Ok(AtomsCookie {
                                phantom: std::marker::PhantomData,
                                CARD32: _conn.intern_atom(false, "CARD32".as_bytes())?,
[SNIP - 108 lines skipped]
                                _XEMBED: _conn.intern_atom(false, "_XEMBED".as_bytes())?,
                            })
                    }
                }

That doesn't look worse than what I would expected...? Perhaps LLVM decided to inline everything?

I might be on a different Rust version than you, but I looked at cargo build --release --example window with objdump -S. There are almost 23k lines of assembly for that function, so I guess this means I can reproduce. I'm just not good at figuring out 23k lines of assembly to see what the compiler did there. I see calls to functions that c++filt decodes as x11rb::protocol::xproto::ConnectionExt::intern_atom::hf83f1c65b93c5c2c, so apparently intern_atom was not inlined?!

However, there is a linear increase of... I don't know what after every call to intern_atom. I would guess (but don't have the symbols) that this is another case of ? being annoying. Every intern_atom call could fail and in that case the code has to call drop for all the previous Cookies. And the compiler is doing that in a stupid way since grep drop_in_place | wc -l for this function says that there are 1709 such calls. That's (loosely) similar to the conclusion in #488.

@psychon
Copy link
Owner

psychon commented Jan 29, 2024

I copied the definition of the atom_manager macro to winit and hacked on it. The following patch turns the 23k lines of assembly from above into 3225 lines. That's a 85% reduction, but still not really small.

The idea is to first create an array of all the Result<Cookie, _>s and then collecting that array into a Result<Vec<_>, _>. That way the stdlib deals with "what happens if an error happens halfway through".

However, to make that work, I had to jump through some hoops. But at least it's a start.

  • A macro cannot easily count. I used an enum to do the counting for me
  • To be able to create the struct that is returned, the "stuff" somehow has to be moved from the Vec. In this diff, I used .into_iter().map(Some).collect to wrap everything in a useless Option just so that later I can call take(). That's ugly.

Ideas?

(The patch obviously cannot be applied by anyone, but it should get the idea across)

diff --git a/src/platform_impl/linux/x11/atoms.rs b/src/platform_impl/linux/x11/atoms.rs
index 6b9ab49e..23aaeee3 100644
--- a/src/platform_impl/linux/x11/atoms.rs
+++ b/src/platform_impl/linux/x11/atoms.rs
@@ -36,13 +36,24 @@ macro_rules! x11rb_atom_manager {
             $vis fn new<C: x11rb::protocol::xproto::ConnectionExt>(
                 _conn: &C,
             ) -> ::std::result::Result<$cookie_name<'_, C>, x11rb::errors::ConnectionError> {
+                // Locally create an enum that allows to get the index for an atom
+                #[allow(non_camel_case_types)]
+                enum Indicies { $($field_name,)* }
+
+                let cookies = [
+                    $(_conn.intern_atom(
+                            false,
+                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
+                        ),
+                    )*
+                ];
+                let atoms: Result<Vec<_>, _> = cookies.into_iter().collect();
+                let mut atoms: Vec<_> = atoms?.into_iter().map(Some).collect();
+
                 Ok($cookie_name {
                     phantom: std::marker::PhantomData,
                     $(
-                        $field_name: _conn.intern_atom(
-                            false,
-                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
-                        )?,
+                        $field_name: atoms[Indicies::$field_name as usize].take().expect("TODO good text"),
                     )*
                 })
             }

@psychon
Copy link
Owner

psychon commented Jan 29, 2024

Idea to avoid the Option-wrapping: Using remove(0) on the Vec (and hoping that the compiler can optimise away the resulting shuffle-things-around):

diff --git a/src/platform_impl/linux/x11/atoms.rs b/src/platform_impl/linux/x11/atoms.rs
index 6b9ab49e..849c4cdb 100644
--- a/src/platform_impl/linux/x11/atoms.rs
+++ b/src/platform_impl/linux/x11/atoms.rs
@@ -36,13 +36,20 @@ macro_rules! x11rb_atom_manager {
             $vis fn new<C: x11rb::protocol::xproto::ConnectionExt>(
                 _conn: &C,
             ) -> ::std::result::Result<$cookie_name<'_, C>, x11rb::errors::ConnectionError> {
+                let cookies = [
+                    $(_conn.intern_atom(
+                            false,
+                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
+                        ),
+                    )*
+                ];
+                let atoms: Result<Vec<_>, _> = cookies.into_iter().collect();
+                let mut atoms = atoms?;
+
                 Ok($cookie_name {
                     phantom: std::marker::PhantomData,
                     $(
-                        $field_name: _conn.intern_atom(
-                            false,
-                            __atom_manager_atom_value!($field_name$(: $atom_value)?),
-                        )?,
+                        $field_name: atoms.remove(0),
                     )*
                 })
             }

The resulting assembly isn't too bad, but contains a bunch of "randomly moving stuff around" and a call to memcpy, before calling core::iter::adapters::try_process. Then, moving the resulting cookies into the struct is also not the prettied code, involving a call to memmove on each case. There is still lots of duplications, but at least no quadratic code generation. 117 calls to drop_in_place.

@psychon
Copy link
Owner

psychon commented Jan 29, 2024

Since the fields of the Cookie type are not meant to be public (= I hope people don't access them directly), part of this could be moved to the non-cookie struct and we just keep a Vec in the cookie type:

diff --git a/src/platform_impl/linux/x11/atoms.rs b/src/platform_impl/linux/x11/atoms.rs
index 6b9ab49e..93fdf003 100644
--- a/src/platform_impl/linux/x11/atoms.rs
+++ b/src/platform_impl/linux/x11/atoms.rs
@@ -17,9 +17,7 @@ macro_rules! x11rb_atom_manager {
         $(#[$cookie_meta])*
         $vis struct $cookie_name<'a, C: x11rb::protocol::xproto::ConnectionExt> {
             phantom: std::marker::PhantomData<&'a C>,
-            $(
-                $field_name: x11rb::cookie::Cookie<'a, C, x11rb::protocol::xproto::InternAtomReply>,
-            )*
+            cookies: Vec<x11rb::cookie::Cookie<'a, C, x11rb::protocol::xproto::InternAtomReply>>,
         }
 
         // Replies
@@ -36,23 +34,27 @@ macro_rules! x11rb_atom_manager {
             $vis fn new<C: x11rb::protocol::xproto::ConnectionExt>(
                 _conn: &C,
             ) -> ::std::result::Result<$cookie_name<'_, C>, x11rb::errors::ConnectionError> {
-                Ok($cookie_name {
-                    phantom: std::marker::PhantomData,
-                    $(
-                        $field_name: _conn.intern_atom(
+                let cookies = [
+                    $(_conn.intern_atom(
                             false,
                             __atom_manager_atom_value!($field_name$(: $atom_value)?),
-                        )?,
+                        ),
                     )*
+                ];
+                let cookies: Result<Vec<_>, _> = cookies.into_iter().collect();
+
+                Ok($cookie_name {
+                    phantom: std::marker::PhantomData,
+                    cookies: cookies?,
                 })
             }
         }
 
         impl<'a, C: x11rb::protocol::xproto::ConnectionExt> $cookie_name<'a, C> {
-            $vis fn reply(self) -> ::std::result::Result<$struct_name, x11rb::errors::ReplyError> {
+            $vis fn reply(mut self) -> ::std::result::Result<$struct_name, x11rb::errors::ReplyError> {
                 Ok($struct_name {
                     $(
-                        $field_name: self.$field_name.reply()?.atom,
+                        $field_name: self.cookies.remove(0).reply()?.atom,
                     )*
                 })
             }

@notgull
Copy link
Collaborator Author

notgull commented Jan 30, 2024

The Vec isn't too bad, as this macro is only called once at the start of the program. Personally I think that the quadratic drop is a compiler problem, but I imagine that this solution would work as a patch for now.

Although I think that, instead of vec.remove(0), we could use let mut iter = vec.into_iter(); and then call iter.next().unwrap().

psychon added a commit that referenced this issue Feb 3, 2024
winit uses atom_manager! with 59 atom names. Due to some sub-optimal
code generation, this causes the generated new() function to end up with
a size of 111.6 KiB, or almost 5% of the .text size of a winit example.

The issue is that each "intern_atom()?" can cause an early exit and
needs to call drop() of the already generated Cookies. This leads to
some quadratic growth.

To work around this, in this commit I change the implementation of
new(). It now iterates over the atom names, using .map() to send the
InternAtom requests. Finally, collect() is then used to change this
Iterator<Item=Result<Cookie, Error>> into Result<Vec<Cookie>, Error>.
This even preserves the early-exit behaviour in case of errors.

The reply() function is then changed to turn the Vec into an iterator,
using next() to pull out the items. This relies on the evaluation of
this code to be well-defined (which e.g. wouldn't be guaranteed in C).
Luckily, Rust provides the needed guarantee:
rust-lang/reference#888

Fixes: #912
Signed-off-by: Uli Schlachter <psychon@znc.in>
@mergify mergify bot closed this as completed in #914 Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants