If you’ve been using Linux over the past 15 years, you’ve almost certainly heard of Wayland. The new generation of display server protocols, meant to replace the old and antiquated X11! Or so it’s touted by its more hardcore proponents.
However, the full story is… less rosy.
Before I can make my point, however, we need to go over why Wayland exists to begin with.
Composited Desktops
When X was first created in 1984, computers were not equipped with 3D hardware acceleration like they are today. The most you could really hope for in terms of dedicated graphics hardware was a blitter chip like the Amiga’s Agnus or the Atari ST’s aptly named BLiTTER (although both of those systems would only be released a year after X’s conception). And so, all graphics were done in CPU memory, with perhaps some hardware assistance to copy large chunks of memory around when composing the final image from all the different windows on the desktop (although some machines did even that on the CPU).
By today’s standards, this would be considered incredibly slow and wasteful, especially at the high resolutions many now expect, but even as late as the mid-2000s, this was how graphical environments were rendered.
By the late 2000s, however, it was clear that the future would be built on 3D-accelerated graphical environments: Windows Vista, released in 2006, shipped a fully 3D-accelerated shell (even if not every computer could actually render it yet), and Linux desktop environments such as KDE Plasma joined the party in 2008, when Plasma 4 was released with a 3D-accelerated desktop as well.
By 2010, you’d be hard-pressed to find a modern system that didn’t use 3D hardware acceleration to render its desktop, and for good reason: it allowed higher resolutions without increasing CPU usage, as well as fancy graphical effects such as shadows, blur, and transparency, all relatively cheap thanks to GPUs being perfectly suited to these tasks.
In the world of Linux, there was, however, a problem: X.Org, the standard display server used by pretty much every desktop environment, was not built for this, and so the rendering pipeline got rather convoluted.
Rendering Pipelines!
In the classic model that X was built for, each client would render its window and send the resulting image to the X server, which then put each window’s image in the correct place and displayed it on screen. Simple!
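To make that concrete, here’s a minimal sketch of the classic model from the client’s side, assuming plain Xlib (the window contents are just an illustrative rectangle):

```c
/* A minimal sketch of the classic pipeline (compile with -lX11):
 * the client renders its window contents, and the X server takes
 * care of placing them on screen. */
#include <X11/Xlib.h>

int main(void) {
    Display *dpy = XOpenDisplay(NULL); /* connect to the X server */
    if (!dpy) return 1;

    int scr = DefaultScreen(dpy);
    Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                     0, 0, 320, 240, 1,
                                     BlackPixel(dpy, scr),
                                     WhitePixel(dpy, scr));
    XSelectInput(dpy, win, ExposureMask);
    XMapWindow(dpy, win);

    for (;;) {
        XEvent ev;
        XNextEvent(dpy, &ev);
        if (ev.type == Expose) {
            /* the client draws; the server puts the result on screen */
            XFillRectangle(dpy, win, DefaultGC(dpy, scr), 20, 20, 100, 100);
        }
    }
}
```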
With composited desktops, however, the pipeline got rather complicated. The clients still sent their rendered windows to the X server, but these images were then passed to the compositor, which assembled the final image on the GPU, applying any desired effects (shadows, transparency, blur, wobbly windows, etc.); the final image was then sent back to the X server, which finally handed it to the GPU to display.
Needless to say, this was not great for performance, which is why many compositors would step back and let X11 render things using the old pipeline when low latency was desired, such as for video games.
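If you’re curious where those extra hops come from, here’s a rough sketch (nowhere near a real compositor) of how compositing is bolted onto X11 through the Composite extension:

```c
/* A minimal sketch of an X11 compositor's entry point, assuming the
 * Composite extension is available (compile with -lX11 -lXcomposite). */
#include <X11/Xlib.h>
#include <X11/extensions/Xcomposite.h>

int main(void) {
    Display *dpy = XOpenDisplay(NULL);
    int ev_base, err_base;
    if (!dpy || !XCompositeQueryExtension(dpy, &ev_base, &err_base))
        return 1;

    Window root = DefaultRootWindow(dpy);

    /* ask the server to render all top-level windows off-screen; the
     * compositor is now responsible for producing the final image */
    XCompositeRedirectSubwindows(dpy, root, CompositeRedirectManual);

    Window rr, pr, *children;
    unsigned int n;
    XQueryTree(dpy, root, &rr, &pr, &children, &n);
    for (unsigned int i = 0; i < n; i++) {
        /* grab each (mapped) window's off-screen buffer... */
        Pixmap buf = XCompositeNameWindowPixmap(dpy, children[i]);
        /* ...which a real compositor would upload to the GPU, decorate
         * with shadows/blur/transparency, compose into the final frame,
         * and hand back to the server to display */
        XFreePixmap(dpy, buf);
    }
    if (children) XFree(children);
    XCloseDisplay(dpy);
    return 0;
}
```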
But there must be a better way, right?
A solution from far aWaylands
Work on Wayland was started by Red Hat developer Kristian Høgsberg in 2008, and it proposed a reworked graphics pipeline: by rolling the display server and compositor into one, the pipeline would be greatly simplified, and performance and latency would improve. (This is a bit of an oversimplification, but I don’t want to focus on all of the nitty-gritty in this post.)
So now we have a display server better suited to modern times, all with a simpler architecture! What’s the catch?
Well, the catch is that the core Wayland spec is anemic. It’s nowhere near suitable for anything on its own. Instead, you need to extend it with extra protocols which the display server and each client both have to understand. That can’t be that bad, though, right? WRONG! Wayland also rolls the window manager into the display server, so if all you want is a different window manager, you need a whole new Wayland compositor, which may or may not support all of the same extensions.
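To see what “extending it with extra protocols” means in practice: the compositor advertises every interface it supports as a “global” in the registry, and clients have to check for each one by name. Here’s a minimal sketch with libwayland-client that simply lists what a server offers:

```c
/* A minimal sketch of Wayland extension discovery (compile with
 * -lwayland-client): the server advertises the interfaces it supports
 * as "globals", and each client checks for the ones it needs. */
#include <stdio.h>
#include <wayland-client.h>

static void on_global(void *data, struct wl_registry *registry,
                      uint32_t name, const char *interface, uint32_t version) {
    /* every protocol the server supports shows up here; anything a
     * client needs but doesn't see in this list, it simply cannot use */
    printf("server supports: %s (version %u)\n", interface, version);
}

static void on_global_remove(void *data, struct wl_registry *registry,
                             uint32_t name) {
    /* globals can also disappear at runtime */
}

static const struct wl_registry_listener listener = {
    .global = on_global,
    .global_remove = on_global_remove,
};

int main(void) {
    struct wl_display *dpy = wl_display_connect(NULL);
    if (!dpy) return 1;

    struct wl_registry *reg = wl_display_get_registry(dpy);
    wl_registry_add_listener(reg, &listener, NULL);
    wl_display_roundtrip(dpy); /* wait for the initial burst of globals */

    wl_display_disconnect(dpy);
    return 0;
}
```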
This is somewhat alleviated by wlroots, a library created as part of Sway that lets developers focus on the window manager and leave the dirty work of supporting Wayland extension protocols to the library, but it’s still an issue.
For example, if you’re developing an application and want a system-drawn titlebar for its window, you can only get one on some Wayland servers. The core Wayland specification has no way for clients to request a titlebar, and the XDG Decoration extension is only supported by some servers, such as KWin (Plasma’s window manager and Wayland server) and wlroots-based servers like Sway or labwc. Meanwhile, Weston and GNOME don’t support this extension, meaning your application will end up without a titlebar in those environments.
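For illustration, this is roughly what requesting one looks like when the extension is there. Just a sketch, not a complete client: the header is generated from the protocol XML by wayland-scanner, and I’m assuming `manager` and `toplevel` were already set up during registry binding and window creation:

```c
/* A sketch of asking for a server-side titlebar via the XDG Decoration
 * extension; the header comes from running wayland-scanner over
 * xdg-decoration-unstable-v1.xml. */
#include "xdg-decoration-unstable-v1-client-protocol.h"

void request_titlebar(struct zxdg_decoration_manager_v1 *manager,
                      struct xdg_toplevel *toplevel) {
    struct zxdg_toplevel_decoration_v1 *deco =
        zxdg_decoration_manager_v1_get_toplevel_decoration(manager, toplevel);

    /* politely ask the compositor to draw the titlebar; it answers with
     * a configure event stating the mode you actually got. On Mutter or
     * Weston you never even reach this point, since the manager global
     * is never advertised in the registry at all. */
    zxdg_toplevel_decoration_v1_set_mode(
        deco, ZXDG_TOPLEVEL_DECORATION_V1_MODE_SERVER_SIDE);
}
```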
And this is a relatively lucky case. For other things, you have completely different and incompatible means of achieving something depending on the environment. For example, if you want to take a screenshot, on wlroots compositors you have the wlr-screencopy extension, on GNOME you go through a D-Bus interface, and on KWin you ask for the screenshot through xdg-desktop-portal, which delivers it via PipeWire (if the latter even works, which it usually doesn’t).
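To give you a taste of just one of those three routes, here’s a sketch of the initial portal call using GLib’s GDBus. Note this is only step one: the actual screenshot URI arrives later via a Response signal on the returned request handle, which I’m leaving out:

```c
/* A sketch of the xdg-desktop-portal route (build with
 * `pkg-config --cflags --libs gio-2.0`): performs the initial
 * org.freedesktop.portal.Screenshot call over the session bus. */
#include <gio/gio.h>

int main(void) {
    GError *err = NULL;
    GDBusConnection *bus = g_bus_get_sync(G_BUS_TYPE_SESSION, NULL, &err);
    if (!bus) { g_printerr("%s\n", err->message); return 1; }

    GVariant *ret = g_dbus_connection_call_sync(
        bus,
        "org.freedesktop.portal.Desktop",     /* the portal service */
        "/org/freedesktop/portal/desktop",
        "org.freedesktop.portal.Screenshot",
        "Screenshot",
        g_variant_new("(sa{sv})", "", NULL),  /* no parent window, no options */
        G_VARIANT_TYPE("(o)"),                /* returns a request handle */
        G_DBUS_CALL_FLAGS_NONE, -1, NULL, &err);
    if (!ret) { g_printerr("%s\n", err->message); return 1; }

    g_variant_unref(ret);
    g_object_unref(bus);
    return 0;
}
```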
And the list just keeps going. All sorts of functionality that used to be universally accessible on X through the same interface everywhere become a convoluted mess of protocols which the Wayland display server may or may not even support. And that’s assuming a protocol even exists.
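For contrast, here’s roughly everything a full-screen screenshot took on X11, the same on every single desktop:

```c
/* A complete X11 screenshot, identical on every window manager and
 * desktop environment (compile with -lX11). */
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <stdio.h>

int main(void) {
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;

    Window root = DefaultRootWindow(dpy);
    XWindowAttributes attr;
    XGetWindowAttributes(dpy, root, &attr);

    /* one call, on any X server, and the whole screen is yours */
    XImage *img = XGetImage(dpy, root, 0, 0, attr.width, attr.height,
                            AllPlanes, ZPixmap);
    if (!img) return 1;
    printf("captured %dx%d pixels\n", img->width, img->height);

    XDestroyImage(img);
    XCloseDisplay(dpy);
    return 0;
}
```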
But why is it such a mess?!
Well, a very simple answer: Red Hat made it with no environment in mind but GNOME. Even on X11, GNOME 3 consolidated many components into its combined window manager and compositor; in fact, the entire GNOME desktop shell runs inside the same process as the window manager and compositor. So of course, missing protocols for a desktop shell wouldn’t matter to Red Hat: GNOME never intended to use them, since it can just roll the shell into the server, making it all a giant monolith. Same story for titlebars: GNOME started a little crusade against system-managed titlebars (also known as server-side decorations), and so they had absolutely no issue omitting them from the core Wayland specification, even though other environments and plenty of applications rely on server-side decorations.
In short, if you aren’t GNOME, the Wayland designers simply didn’t take you into account.
Highly modular environments put together from different independent pieces were hit the hardest by this.
On X11, it’s not hard to grab a window manager like i3 or Openbox, a compositor like picom or compton, a panel like i3bar or polybar, a notification daemon like dunst, some application to display a wallpaper like feh or mpv, a launcher like dmenu or bemenu, and whatever else you feel like, and build up your own unique environment. On Wayland, however, this is simply not possible without a LOT of extensions. Sway has put much of the work into creating said extensions and making them available through wlroots, allowing similar kinds of makeshift environments on Wayland, but it’s all built on countless extensions which may not even work on every compositor and which still don’t cover everything that could be done on X11. And why? Because Red Hat quite simply did not care about people who want that. They only cared about their own little monolithic environment: GNOME.