systemd has been a complete, utter, unmitigated success

« systemd has been a complete, utter, unmitigated success »

8 July, 2025
1,782 words
7 minute read time

Figure 1: The systemd wars of the tenties were harsh and casualties were many

The year is 2013 and I am hopping mad.

systemd is replacing my plaintext logs with a binary format and pumping steroids into init and it is laughing at me. The unix philosophy cries out: is this the end of Linux (or, as many are calling it, GNU plus Linux)?

The year is 2025 and I’m here to repent. Not only is systemd a worthy successor to traditional init, but I think that it deserves a defense for what it’s done for the landscape – especially given the hostile reception it initially received (and somehow continues to receive? for some reason?). No software is perfect – except for TempleOS – but I think that systemd has largely been a success story and proven many dire forecasts wrong (including my own). I was wrong!

The `init` Paleolithic

I hope that I don't need to whine about why the old status quo wasn't great – init scripts of varying quality with janky dependencies and wildly varying semantics were frustrating. It's sort of wild to me that I was working as a full-time software engineer during an era in which we were still writing bespoke shell scripts to orchestrate process management. "Lost" or unmanaged processes, the weirdness of S99-type directories for dependency ordering, and different interfaces into /etc/init.d scripts were all real problems.

Figure 2: /etc/init.d, uh, finds a way

During the LINUX INIT WARS, you could probably write an upstart, s6, or OpenRC init script that didn't have too many problems. But even then you're supporting a variety of service management configuration formats with slightly differing behaviors. I wrote services for all of these different init systems! And the experience wasn't super!

Many of the deficiencies of traditional service management are more obvious in hindsight. Whereas bare-bones init was mostly about handling and/or reaping orphaned processes, entrusting a systemd-based PID 1 is also big for sandboxing and dependency management. We haven't even talked about timers, sockets, or mounts, either.

I Deprecated Your Mom

We don't need to re-tread in great detail the history of how we arrived here. But the how is part of the reason I think systemd worked out in the end.

Consider that the two primary ways that older init systems managed processes – either foregrounded or forked – were (and are!) fully supported modes. Modern systemd provides for more nuanced "I'm ready" signaling apart from "is the process alive" (via Type=notify), but this kind of backward compatability really helped bridge the legacy gap. The systemd authors even wrote generator code to help migrate old services.

I don't think the ini-style configuration format is a panacea (I like Dhall), but that's another olive branch from systemd authors to system administrators: it doesn't require a turing-complete configuration format or domain-specific language. You can generally understand what this means when you read it and how to change it:

Systemd

Font used to highlight builtins.
Font used to highlight keywords.
Font used to highlight type and class names.

[Service]
Type=forking

Defaults matter and configuration languages matter, too. I appreciate that systemd chose one that is obvious.

I can cite other examples but the point I want to make is that systemd deliberately chose

backward compatability,
simple configuration paradigms,
and to proactively support and help folks migrate.

Not every open source project chooses to take explicit steps to support old paths on the road to deprecation. Lennart, you sweetheart.

Trust the Process

I don't just think that systemd is our newer, cooler Dad now that does previously-annoying things better, but that systemd also brought us good, brand new things.

Won't Somebody Think of the Plaintext?

Figure 3: Logged logs logging loggily

journald is here. Past Me hated it, too. The primary complaint with journald is that its journal files aren't in plaintext. Do I miss that? A little, yeah. I'm sort of a Linux boomer at heart and like to use awk for everything.

However, I really like having one place to send stdout and stderr! Have you ever leveraged custom fields when writing logs to the journal natively? I attach NOTIFY_SLACK=1 to some of my services and listen to my lab's log stream for these events and forward them along to a Slack channel to see logs I want more easily, it's great!

Moreover, delegating the responsibility to journald is also convenient from a rotation and disk space perspective. With an awareness of filesystem space, I essentially never have to make rough guesses about rotation frequency any more, either¹. Are you aware that part of the reason your journal files are in a binary format rather than plaintext because journald is compressing them transparently? That default choice is probably saving exabytes of space in aggregate across the entire computing space.

We can still live-tail logs, we can still forward log streams to different servers, and services can now reliably trust that their output will be captured during runtime. These are all just net Good Things.

Time-r Out

I can still remember debugging cron scripts at my university job: was $PATH wrong? Should I echo $USER somewhere? Why am I emitting output to the mail spool by default???

If there's a candidate for "most legible over its predecessor", it might be the systemd timer system. Every Linux person feels some smug pride knowing what 0 0 * * * means just by seeing a sequence of asterisks, but we all know OnCalendar=daily is easier to understand. Is OnCalendar=minutely a word? Not according to the grammar police, but you can probably infer what minutely means!

I could fill a blog post with things I love about systemd timers, so here's a list instead:

Persistent=true is a great tool to ensure you don't miss timer executions.
systemctl list-timers is an excellent way to see everything scheduled on a machine.
The scheduling flexibility of OnCalendar= and OnActiveSec= are both powerful and easy to understand.

Socket Activation

This alone is a hugely different and powerful way to optimize a system. nix-daemon leverages this to great effect by "lazily" running only when you need it: the daemon will stop when you aren't building anything, but as soon as you ask for it, nix-daemon.socket will start nix-daemon.service. That's a great feature!

True to form, systemd even provides the systemd-socket-proxyd executable to bridge the gap for services that may not speak the native protocol yet. I leverate this trick with heavy-handed daemons like Minecraft servers to great effect: I don't need to alter the original daemon at all, but systemd-socket-proxyd lets me leverage socket activation to run it on-demand anyway.

A Fistful of Units

When you glue together the various unit types - service, path, timer, mount, socket, and so on - you can almost create a state machine out of your system. I've done this on NixOS and it's a powerful way to model interdependent service management.

Expressing system configuration like mounts as mount units lets you correctly order a daemon that needs a network mount to function. Triggering a service to restart when a file changes is easy with a path unit. The variety of options available to a service unit are mind-boggling and address almost every need you can think of. Seriously – did you know that ConditionVirtualization= can be used to run a unit depending on whether you're in AWS or Docker, for example? That's crazy.

Security

If you've written a nontrivial number of .service units, then you know the options available for hardening services are vast in number. There are already many great blog posts about what they are; I won't go into that there.

Personally, my problem is remembering what those options are. Did you know that systemd built tools to help with that, too? Each one of these explains some operational security benefit you can wrap a daemon with and in most cases they're each easy to add and don't break functionality. These are a great way to take advantage of features like capabilities easily.

shell

systemd-analyze security polkit.service

  NAME                                                        DESCRIPTION                                                                                         EXPOSURE
✓ SystemCallFilter=~@swap                                     System call allow list defined for service, and @swap is not included                               
✗ SystemCallFilter=~@resources                                System call allow list defined for service, and @resources is included (e.g. ioprio_set is allowed)      0.2
✓ SystemCallFilter=~@reboot                                   System call allow list defined for service, and @reboot is not included                             
✓ SystemCallFilter=~@raw-io                                   System call allow list defined for service, and @raw-io is not included                             
✗ SystemCallFilter=~@privileged                               System call allow list defined for service, and @privileged is included (e.g. chown is allowed)          0.2
✓ SystemCallFilter=~@obsolete                                 System call allow list defined for service, and @obsolete is not included                           
✓ SystemCallFilter=~@mount                                    System call allow list defined for service, and @mount is not included                              
✓ SystemCallFilter=~@module                                   System call allow list defined for service, and @module is not included                             
✓ SystemCallFilter=~@debug                                    System call allow list defined for service, and @debug is not included                              
✓ SystemCallFilter=~@cpu-emulation                            System call allow list defined for service, and @cpu-emulation is not included                      
✓ SystemCallFilter=~@clock                                    System call allow list defined for service, and @clock is not included                              
✓ RemoveIPC=                                                  Service user cannot leave SysV IPC objects around                                                   
✗ RootDirectory=/RootImage=                                   Service runs within the host's root directory                                                            0.1
✓ User=/DynamicUser=                                          Service runs under a static non-root user identity                                                  
✓ RestrictRealtime=                                           Service realtime scheduling access is restricted                                                    
✓ CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes cannot change the system clock                                                    
✓ NoNewPrivileges=                                            Service processes cannot acquire new privileges                                                     
✓ AmbientCapabilities=                                        Service process does not receive ambient capabilities                                               
✓ CapabilityBoundingSet=~CAP_BPF                              Service may not load BPF programs                                                                   
✓ SystemCallArchitectures=                                    Service may execute system calls only with native ABI                                               
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service may change UID/GID identities/capabilities                                                       0.3
✗ RestrictAddressFamilies=~AF_UNIX                            Service may allocate local sockets                                                                       0.1
✓ ProtectSystem=                                              Service has strict read-only access to the OS file hierarchy                                        
✓ SupplementaryGroups=                                        Service has no supplementary groups                                                                 
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO                        Service has no raw I/O access                                                                       
✓ CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has no ptrace() debugging abilities                                                         
✓ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has no privileges to change resource use parameters                                         
✓ CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has no network configuration privileges                                                     
✓ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has no elevated networking privileges                                                       
✓ CapabilityBoundingSet=~CAP_AUDIT_*                          Service has no audit subsystem access                                                               
✓ CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has no administrator privileges                                                             
✓ PrivateNetwork=                                             Service has no access to the host's network                                                         
✓ PrivateTmp=                                                 Service has no access to other software's temporary files                                           
✓ CapabilityBoundingSet=~CAP_SYSLOG                           Service has no access to kernel logging                                                             
✓ ProtectHome=                                                Service has no access to home directories                                                           
✓ PrivateDevices=                                             Service has no access to hardware devices                                                           
✗ ProtectProc=                                                Service has full access to process tree (/proc hidepid=)                                                 0.2
✗ ProcSubset=                                                 Service has full access to non-process /proc files (/proc subset=)                                       0.1
✗ PrivateUsers=                                               Service has access to other users                                                                        0.2
✗ DeviceAllow=                                                Service has a device ACL with some special devices: char-rtc:r /dev/null:rw                              0.1
✓ KeyringMode=                                                Service doesn't share key material with other services                                              
✓ Delegate=                                                   Service does not maintain its own delegated control group subtree                                   
✗ IPAddressDeny=                                              Service does not define an IP address allow list                                                         0.2
✓ NotifyAccess=                                               Service child processes cannot alter service state                                                  
✓ ProtectClock=                                               Service cannot write to the hardware clock or system clock                                          
✓ CapabilityBoundingSet=~CAP_SYS_PACCT                        Service cannot use acct()                                                                           
✓ CapabilityBoundingSet=~CAP_KILL                             Service cannot send UNIX signals to arbitrary processes                                             
✓ ProtectKernelLogs=                                          Service cannot read from or write to the kernel log ring buffer                                     
✓ CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service cannot program timers that wake up the system                                               
✓ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service cannot override UNIX file/IPC permission checks                                             
✓ ProtectControlGroups=                                       Service cannot modify the control group file system                                                 
✓ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service cannot mark files immutable                                                                 
✓ CapabilityBoundingSet=~CAP_IPC_LOCK                         Service cannot lock memory into RAM                                                                 
✓ ProtectKernelModules=                                       Service cannot load or read kernel modules                                                          
✓ CapabilityBoundingSet=~CAP_SYS_MODULE                       Service cannot load kernel modules                                                                  
✓ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service cannot issue vhangup()                                                                      
✓ CapabilityBoundingSet=~CAP_SYS_BOOT                         Service cannot issue reboot()                                                                       
✓ CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service cannot issue chroot()                                                                       
✓ PrivateMounts=                                              Service cannot install system mounts                                                                
✓ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service cannot establish wake locks                                                                 
✓ MemoryDenyWriteExecute=                                     Service cannot create writable executable memory mappings                                           
✓ RestrictNamespaces=~user                                    Service cannot create user namespaces                                                               
✓ RestrictNamespaces=~pid                                     Service cannot create process namespaces                                                            
✓ RestrictNamespaces=~net                                     Service cannot create network namespaces                                                            
✓ RestrictNamespaces=~uts                                     Service cannot create hostname namespaces                                                           
✓ RestrictNamespaces=~mnt                                     Service cannot create file system namespaces                                                        
✓ CapabilityBoundingSet=~CAP_LEASE                            Service cannot create file leases                                                                   
✓ CapabilityBoundingSet=~CAP_MKNOD                            Service cannot create device nodes                                                                  
✓ RestrictNamespaces=~cgroup                                  Service cannot create cgroup namespaces                                                             
✓ RestrictNamespaces=~ipc                                     Service cannot create IPC namespaces                                                                
✓ ProtectHostname=                                            Service cannot change system host/domainname                                                        
✓ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service cannot change file ownership/access mode/capabilities                                       
✓ LockPersonality=                                            Service cannot change ABI personality                                                               
✓ ProtectKernelTunables=                                      Service cannot alter kernel tunables (/proc/sys, …)                                                 
✓ RestrictAddressFamilies=~AF_PACKET                          Service cannot allocate packet sockets                                                              
✓ RestrictAddressFamilies=~AF_NETLINK                         Service cannot allocate netlink sockets                                                             
✓ RestrictAddressFamilies=~…                                  Service cannot allocate exotic sockets                                                              
✓ RestrictAddressFamilies=~AF_(INET|INET6)                    Service cannot allocate Internet sockets                                                            
✓ CapabilityBoundingSet=~CAP_MAC_*                            Service cannot adjust SMACK MAC                                                                     
✓ RestrictSUIDSGID=                                           SUID/SGID file creation by service is restricted                                                    
✓ UMask=                                                      Files created by service are accessible only by service's own user by default                       

→ Overall exposure level for polkit.service: 1.2 OK :-)

Hater Sauce and The Terror From The Year 2000

Part of the reason I wrote this piece is that I keep stumbling onto threads like this:

i used to think that systemd was made the default and adopted by most distros because of its ease of use and the fact it supplied a whole bunch of things in one suite and i see where the appeal is in that but after switching to artix openrc, im just lost on why they decided to use systemd when openrc is objectively better when it comes to being an init system and for managing services, and all the other components of systemd suite can just be replaced, like why would they do this?

Oh my god. Look, I respect that stvpidcvnt111111 has a right to their opinion, but we can't let rhetoric with the intellectual weight of a mediocre fart waft into spaces as critical as computing infrastructure. Get your stench outta here.

I'm not going to argue with straw men here, but wait, I am actually:

systemd does too much.

Have you considered that just "reaping old process IDs" wasn't enough responsibility for an init daemon on a secure, robust system? That maybe it should be protecting other parts of the system and tracking the liveness of a desired service?

systemd does a bad job

If I see an argument like this then I can only assume the interlocutor doesn't do software engineering. Any sort of consistent experience using systemctl or journalctl will tell you otherwise. I've never even heard of systemd failing at its core responsibilities (starting, stopping, and managing daemons).

systemd is too bloated and tries to do too much

For everything that modern systemd does, I'm shocked that there aren't more vulnerabilities (and yes, I'm aware of the CVEs that systemd does have). I have no hard numbers supporting this claim, but I do wonder what the delta is between "exploits due to systemd itself" against "exploits blocked by the service sandboxing that systemd provides" is. The ease of dropping an executable in an unprivileged environment is a great feature. The industry as a whole is almost assuredly safer with the accessibility to process sandboxing that systemd brought down to an easier level.

Yeah, systemd-boot and systemd-networkd do different things. Frankly, my life as an operator has been significantly better thanks to the quality of software that comes out of systemd-* based projects and they're all configured in similar ways, too. I've integrated at a low level with systemd APIs as well, so it's not as if this scary-sounding sprawl is closed, either. The APIs are there! You can use them!

I've consistently found myself preferring to use the systemd based alternatives like systemd-resolved and systemd-networkd when given the option because they end up being easier to configure and use.

red hat is trying to control the linux ecosystem with systemd

This is absolutely true. I can't believe we, the SYSTEMD GLOBALIST ILLUMINATI, have been exposed.

Footnotes:

I know logrotate can do very intelligent things. But the configuration steps for journald is "print to stdout, done".

« The Human Resources Alignment Problem

To monitor my backups, I had to first invent the universe »

Tyblog

All the posts unfit for blogging
blog.tjll.net