Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gala crash on startup with mutter 3.3x #634

Closed
2 of 3 tasks
Tireg opened this issue Nov 8, 2019 · 8 comments · Fixed by #635
Closed
2 of 3 tasks

Gala crash on startup with mutter 3.3x #634

Tireg opened this issue Nov 8, 2019 · 8 comments · Fixed by #635

Comments

@Tireg
Copy link
Contributor

Tireg commented Nov 8, 2019

Prerequisites

  • I have searched open and closed issues for duplicates.

Describe the bug

When gala is built with mutter 3.3x, it crashes on startup.
This bug report follows the discussion in #566.

To Reproduce

Steps to reproduce the behavior:

  1. Start gala compiled with mutter 3.3x

Expected behavior

Gala should start properly

Logs

See logs reported after #566 was merged.

Platform Information

OS: Gentoo
Mutter version: 3.32.2

  • I'm using the latest version from git that I've manually compiled
  • I'm using the latest released stable version

Additional context

I've already triggered the bug(s) origin. I opened this bug report to track issues related to building against mutter 3.3x.
Within the comment, I'll explained each new bug, what patch is used in order to fix it, and why the original code was buggy.

@Tireg
Copy link
Contributor Author

Tireg commented Nov 8, 2019

So the PR related to building with mutter 3.3x has been merged. We can finally build gala against it, yay !
It seems however some critical bugs are still present and prevent using gala when build with mutter 3.3x.
After some investigation, I finally got it to work almost entirely properly.

In this post, I'll detail all the steps I used in order to make everything work.

Steps to fix gala with Mutter 3.3x

Building against Mutter 3.32

It seems the upstream gala builds fine agains mutter 3.34 and mutter 3.30 but fails to build agains mutter 3.32.
It looks like the PR has some changed code for MUTTER_334 that should also be made for MUTTER_332, concerning background management. Rather simple, just replace the if HAS_MUTTER334 with HAS_MUTTER332 for things related to background management (the related errors are reported by your compiler when building fails).
In order to fix this, I use the following patch: https://gist.github.com/Tireg/40a88ef867e978644ef3024d3d28ec55

Getting dynamic workspaces to work

Ok, so I have my gala working, but there is no animations, no dynamic workspace, ... This is not my gala ! It's almost unusable this way :(
It seems like applying gsettings has changes with mutter 3.30. We need to use a schema override. Here is a simple fix: https://gist.github.com/Tireg/7af22e747425b5a4f52b055a7b878a73

Fix gala crash on startup

Oh crap ! I can't even start a session with my newly built gala. I get a stracktrace looking like https://gist.github.com/worldofpeace/8e2d3ff4a442a09454a6b91e69af290b.
It seems like the WorkspaceManager's cleanup () function is causing gala to crash.

This function should remove every existing empty workspace except the last one. It does this by looping through each one and checking if the requirements are met.
On each deletion, it calls some mutter code to delete the workspace. The deletion triggers a callback to Gala.WorkspaceManager.workspace_removed (), which crashes when trying to get a workspace index.
Well well well, it seems the code has changed when using mutter 3.3x and differs slightly from the original version. Let's check what's wrong:

void workspace_removed (Meta.WorkspaceManager manager, int index)
{
	var it = workspaces_marked_removed.iterator ();
	while (it.next ()) {
	    var workspace = it.@get ();
		if (workspace.index () < 0)
			it.remove ();
	}
}

The callback Gala.WorkspaceManager.workspace_removed ():

Let's see the old behaviour:

void workspace_removed (Screen screen, int index)
{
	unowned List<Workspace> existing_workspaces = screen.get_workspaces ();

	var it = workspaces_marked_removed.iterator ();
	while (it.next ()) {
		if (existing_workspaces.index (it.@get ()) < 0)
			it.remove ();
	}
}

Gala duplicated the list of existing workspace, and search for each workspace marked removed if it can find it within the existing workspaces, therefore bypassing the internal mutter mechanism to raise an error if it doesn't exists.

We would like to use this behaviour, but manager.get_workspaces () doesn't seem available looking at libmutter.vapi.
However, gala uses a similar mechanism elsewhere in it's own code for mutter 3.3x, that seems to work: https://github.com/elementary/gala/blob/master/src/Widgets/MultitaskingView.vala#L131
We can simply mimic this behaviour within this function, which leads to this patch:
https://gist.github.com/Tireg/6730c64acce6e7c48597201ef61114e6

Ok so fine, workspaces are removed properly but this still crashes. Let's go back to the Gala.WorkspaceManager.cleanup () function:

unowned Meta.Display display = wm.get_display ();
unowned Meta.WorkspaceManager manager = display.get_workspace_manager ();
var last_index = manager.get_n_workspaces () - 1;
for (int i = 0; i < manager.get_n_workspaces (); i++) {
	unowned Meta.Workspace workspace = manager.get_workspace_by_index (i);
	if (Utils.get_n_windows (workspace) < 1
		&& workspace.index () != last_index) {
			remove_workspace (workspace);
	}
}

This seems to differ from the old behaviour:

var screen = wm.get_screen ();
var last_index = screen.get_n_workspaces () - 1;
unowned GLib.List<Meta.Workspace> workspaces = screen.get_workspaces ();

foreach (var workspace in workspaces) {
	if (Utils.get_n_windows (workspace) < 1
		&& workspace.index () != last_index) {
			remove_workspace (workspace);
	}
}

We are now looping with indexes directly in order to delete workspaces. This is problematic: the loop delete workspaces, and so the workspace indexes may changes at each loop count. We need to replicate the old behaviour:

unowned Meta.Display display = wm.get_display ();
unowned Meta.WorkspaceManager manager = display.get_workspace_manager ();
var last_index = manager.get_n_workspaces () - 1;
unowned List<Meta.Workspace> existing_workspaces = null;
for (int i = 0; i < manager.get_n_workspaces (); i++) {
	existing_workspaces.append (manager.get_workspace_by_index (i));
}

foreach (var workspace in workspaces) {
	if (Utils.get_n_windows (workspace) < 1
		&& workspace.index () != last_index) {
			remove_workspace (workspace);
	}
}

We can use this patch in order to fix this: https://gist.github.com/Tireg/a29bf78009f2e6ab65b131cbcfe8486a

Fix gala crash when opening a new window

Hey, I can now start my session ! But why the hell does opening a window on the spare workspace makes it crash ?
It now makes a stacktrace like #566 (comment)...

Ok so this bug was a tough one and I fixed it only with intuition. Looking similar issues on google, I supposed the current workspace was beeing deleted. When disabling cleanup () function from Gala.WorkspaceManager.construct, it seemed this bug disappeared. Could the cleanup () function be responsible from this bug ?
Let's checkout what it really does:

unowned Meta.Display display = wm.get_display ();
unowned Meta.WorkspaceManager manager = display.get_workspace_manager ();
var last_index = manager.get_n_workspaces () - 1;
unowned List<Meta.Workspace> existing_workspaces = null;
for (int i = 0; i < manager.get_n_workspaces (); i++) {
	existing_workspaces.append (manager.get_workspace_by_index (i));
}

foreach (var workspace in workspaces) {
	if (Utils.get_n_windows (workspace) < 1
		&& workspace.index () != last_index) {
			remove_workspace (workspace);
	}
}

Within the main foreach, it checks wheter the workspace index is the last index. Ok this seems logically correct. But what defines the last index ?
var last_index = manager.get_n_workspaces () - 1;
The last index is the index of the last workspace at the beginning of the function. This index changes when workspaces are removed ! We need to update le last index, otherwise gala will destroy all of the workspaces.
We can do it with this patch: https://gist.github.com/Tireg/7641308eea067e6ea28753d4bb86a430

Fix crash on leaving an emptied workspace

Ok so now I can finally start my session, and open a window without crashing gala. But hey ! When I empty a workspace and I leave it, it still crashes !
It's giving me a stacktrace like:

Bug in window manager: Workspace does not exist to index!
==2815771== 
==2815771== Process terminating with default action of signal 6 (SIGABRT): dumping core
==2815771==    at 0x5CE7F5B: raise (in /lib64/libc-2.29.so)
==2815771==    by 0x5CD1534: abort (in /lib64/libc-2.29.so)
==2815771==    by 0x57A39F4: meta_bug (in /usr/lib64/libmutter-4.so.0.0.0)
==2815771==    by 0x57B000D: meta_workspace_index (in /usr/lib64/libmutter-4.so.0.0.0)
==2815771==    by 0x261C4E: gala_multitasking_view_update_positions (in /usr/bin/gala)
==2815771==    by 0x26232F: _gala_multitasking_view_remove_workspace_meta_workspace_manager_workspace_removed (in /usr/bin/gala)
==2815771==    by 0x4A8EF1C: g_closure_invoke (in /usr/lib64/libgobject-2.0.so.0.6000.7)
==2815771==    by 0x4AA2887: signal_emit_unlocked_R (in /usr/lib64/libgobject-2.0.so.0.6000.7)
==2815771==    by 0x4AABAAD: g_signal_emit_valist (in /usr/lib64/libgobject-2.0.so.0.6000.7)
==2815771==    by 0x4AAC156: g_signal_emit (in /usr/lib64/libgobject-2.0.so.0.6000.7)
==2815771==    by 0x579B5CE: meta_workspace_manager_remove_workspace (in /usr/lib64/libmutter-4.so.0.0.0)
==2815771==    by 0x254671: gala_workspace_manager_remove_workspace (in /usr/bin/gala)

So now, we have another bug related to workspace removal. It seems like this one, the Multitasking-View widget is giving us a hard time, even when we don't use it.
Ok so the update_positions () function of this widget seems to be unable to find one of the workspace it internally keeps: https://github.com/elementary/gala/blob/master/src/Widgets/MultitaskingView.vala#L315.

We should check that the workspaces removal works with mutter 3.3x. Lets see the differences:

void remove_workspace (int num)
{
	WorkspaceClone? workspace = null;
	unowned Meta.WorkspaceManager manager = display.get_workspace_manager ();

	foreach (var child in workspaces.get_children ()) {
		unowned WorkspaceClone clone = (WorkspaceClone) child;
		for (int i = 0; i < manager.get_n_workspaces (); i++) {
			if (manager.get_workspace_by_index (i) == clone.workspace) {
				workspace = clone;
				break;
			}
		}

		if (workspace != null) {
			break;
		}
	}

	[...]
}

This also differs from the old behaviour:

void remove_workspace (int num)
{
	WorkspaceClone? workspace = null;
	unowned List<Meta.Workspace> existing_workspaces = screen.get_workspaces ();

	foreach (var child in workspaces.get_children ()) {
		unowned WorkspaceClone clone = (WorkspaceClone) child;
		if (existing_workspaces.index (clone.workspace) < 0) {
			workspace = clone;
			break;
		}
	}

	[...]
}

So in the original version, we loop through each of the Multitasking-View workspaces and check whether they exists (==> we can find them) within the mutter workspaces. If we find one that does not exists, then we delete it.
In the new version, we try something very strange. We try to find the first workspace of the multitasking-view which has an equivalent within mutter, and then delete it.

Let's just reuse the old behaviour the same way we did before, using this patch: https://gist.github.com/Tireg/cd97aa57df65706adb12b2f6210f580e

Fix workspace reorder glitches with mutter 3.34

Phew ! Gala is finally working as expected. But I noticed a strange behaviour when using workspace reordering with mutter 3.34: the multitasking view does not update properly when I move a workspace. I don't immediately see the order changes within the multitasking view.

Well with the PR beeing merged, some changes were not applied accordingly for Mutter 3.34. We can finally fix this last issue using this patch: https://gist.github.com/Tireg/dfbf0d7dc12fd8a3e533c808681b109a

State of gala with those patches

Gala seems to work fine with these patches. I'll probably make a pull request based that summaries everything. I still got some bugs though, and I'd like testing it more in order to eradicate all of the bugs left as a crash is critical.

@tintou
Copy link
Member

tintou commented Nov 8, 2019

Hi there! Thanks for taking so much time on fixing those issue, to make your patch easier to include and review, please open a Pull Request on GitHub (this is a right tutorial https://www.digitalocean.com/community/tutorials/how-to-create-a-pull-request-on-github if you don't know how to do)

@Tireg
Copy link
Contributor Author

Tireg commented Nov 8, 2019

Thanks @tintou, I was indeed discovering the PR process of Github. I hope I didn't made any mistake !

@worldofpeace
Copy link
Contributor

worldofpeace commented Nov 11, 2019

I've noticed something interesting with the multitasking view, both with @Tireg patch and on master:

Screenshot from 2019-11-10 21-26-33

It appears windows in multitasking view aren't drawn/visible.

Produced in NixOS 20.03 development, which has a Gnome 3.34 stack.

Edit: Also in show all windows, so at least that helps to pinpoint the abstraction where the fault is.

@Tireg
Copy link
Contributor Author

Tireg commented Nov 11, 2019

@worldofpeace Is the window drawn if you:

  • Drag & drop it on another workspace ?
  • Reorder workspaces (at least move the workspace which contains the window) ?
  • Move across workspaces without quitting multi-tasking view ?

Just to help with debugging.

@worldofpeace
Copy link
Contributor

@Tireg In all of these scenarios the window isn't drawn. Only when quitting mult-tasking view does it appear.

@Tireg
Copy link
Contributor Author

Tireg commented Nov 13, 2019

@worldofpeace I'm afraid I don't have enough Vala/Gobject/Clutter experience to understand what's actually responsible for drawing the objects.
The src/Widgets/WindowClone.vala seems to be the Clutter Actor responsible of the windows when you open multitasking view and windows-overview. It contains the window icon and the window close button, as well as the box responsible to catch mouse events and the window shadow.
It is contained within a WindowCloneContainer (src/Widgets/WindowCloneContainer.vala) which is responsible for placing the windows (inside a workspace) in those views.

I don't quite get why the icon and close button are drawn as there a sibling of the WindowClone, when the actual Window is not drawn.

It seems there is no Gala's code specific for MUTTER334 that can interfere with actual drawing operations, so I would check for Mutter changes that could lead to this issue.

I'll probably lack some time to investigate this but I'll let you know if I find something.

@worldofpeace
Copy link
Contributor

@Tireg Thanks for looking into it though. I'll pull it into it's separate bug report so others can have an opportunity to observe/investigate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants