« systemd, User Instances, Device Units, and Media Archiving »

11 July 2015

I recently used systemd, HandBrake, and some simple scripts to digitize a large collection of physical media (for personal, archival use). In this post I’ll go through the systemd features that made this easier and cover all the components that make the automated pipeline work.

If systemd, automation, or digital archiving sounds interesting to you, then read on!

The Problem

Atoms take up space. I’m moving in the near future, so my wife and I have been simplifying and trying to convert whatever we can into digital media rather than physical. This includes a moderately-sized DVD collection - which would be really nice to convert into media files so they can be stored on my NAS and used in tandem with my already-existing Kodi setup.

However, there’s a bit of a vacuum when it comes to automating this sort of process. HandBrake makes this easier, but the only interaction I want a human to have to perform is to 1) put a DVD in and 2) take the DVD out - no selecting a title track, choosing an encoding format, and so on.

We can make computers do the tedious parts!

We’ll start from the ground up, beginning with device activation and ending with media file processing.

Hardware

I have a NAS running Arch Linux which serves as a multi-purpose always-on server. I also had a spare DVD drive lying around after tearing it out of a 2009 MacBook Pro to add a second hard drive, so the first step was to convert that formerly internal drive into a USB-connected external drive.

This is pretty easy: the drive goes into an external enclosure, which I can hook into the NAS as a USB device.

systemd

Devices

Because Arch Linux uses systemd through-and-through, the init system also manages devices. On any systemd-managed distribution, you can see the managed devices with the following command:

$ systemctl --all --full -t device

This lists every device unit that systemd has loaded. Those marked ‘active’ are, in systemd nomenclature, up, running, and available. In our case, once a CD/DVD is loaded, the dev-cdrom.device device is activated and started to make it available.
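The unit name dev-cdrom.device is derived from the device path /dev/cdrom. As a rough illustration, here is a simplified shell sketch of that mapping. The function name device_unit_name is my own invention, and the sketch assumes the path contains no characters that systemd would need to escape (a hyphen in the path, for example, is escaped by the real logic and is not handled here).

```shell
# Simplified sketch of the path-to-unit-name mapping (an assumption:
# the path contains only letters, digits, underscores, and slashes).
# The real tool for this is:
#   systemd-escape --path --suffix=device /dev/cdrom
device_unit_name() {
    path=${1#/}    # drop the leading slash
    # Remaining slashes become dashes, then the unit type is appended.
    printf '%s.device\n' "$(printf '%s' "$path" | tr '/' '-')"
}

# Usage: device_unit_name /dev/cdrom   -> dev-cdrom.device
```

For anything beyond a quick check, systemd-escape is the safer choice, since it applies the full escaping rules.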

We need a way to trigger the activation of a script, and this seems like a good system to hook into. Note that this is also possible with udev rules - the hotplugging mechanism that permits you to write rules that can dictate how devices are mounted and made available to a Linux operating system. However, if we look at the man page for udev, we find this:

RUN{type}
    Add a program to the list of programs to be executed after processing all the rules for a specific event ...

    ...snip...

    This can only be used for very short-running foreground tasks. Running an event process for a long period of time may block all further events for this or a dependent device.

Definitely not a good candidate for a script that can potentially run for a few hours to properly encode optical media.

However, systemd can fit this need pretty well. The feature set of systemd exceeds the scope of this post, but there’s good documentation under the following man pages: systemd.exec, systemd.service, and systemd.unit.

The Service

We’ll define our script as a systemd service, and work under the paradigm that starting our service begins capturing our DVD.

Here’s the content of the service unit:

[Unit]
Description=Automatically rip inserted DVDs
After=dev-cdrom.device
BindsTo=dev-cdrom.device
Requisite=dev-cdrom.device

[Service]
WorkingDirectory=/path/to/where/you/want/media/files
Type=oneshot
ExecStart=/usr/local/bin/rip-dvd.sh
ExecStart=/usr/bin/eject /dev/cdrom
StandardOutput=journal

[Install]
WantedBy=dev-cdrom.device

Let’s go over the parts that aren’t self-explanatory – we’ll start with WantedBy, as that’s the most important part for our task. Note that each of these options is well-explained in the man pages as well:

  • WantedBy - This indicates that the dev-cdrom.device unit will ask for our service to be started when the device itself gets activated. We say WantedBy and not RequiredBy because the latter would indicate that our service is a required dependency of making our cdrom available - which isn’t the case (there’s no logical relationship that indicates our autorip service is doing something that must be available for the cdrom to work)
  • After, BindsTo, Requisite - These seem somewhat redundant, but let’s look at each:
    • Requisite - This indicates that the cdrom needs to be running in order for our unit to function. It’s similar to using “Requires”, except that if our cdrom isn’t running, systemd won’t try to “start” it as part of beginning our ripping process. This makes sense: if the disk isn’t in the optical drive, systemd can’t physically insert the disk to make the cdrom device unit active.
    • After - Ensure that our service begins only after the cdrom unit has started completely. This is important, because otherwise systemd will attempt to aggressively parallelize unit activation once this unit gets triggered by our WantedBy line.
    • BindsTo - This isn’t a strictly required option, but it makes the implementation cleaner. Essentially our unit will be killed if the indicated unit disappears for some reason - for example, if the disk gets ejected from the optical drive (which makes sense - without a disk to read from, our ripping process should be halted.)
  • Type=oneshot - This unit is intended to run then exit without any persistent daemon. This line tells systemd to expect that behavior (our script will encode the DVD, then exit.)
  • StandardOutput=journal - In my testing I noticed that I didn’t always get script output captured in the journal. Explicitly adding this ensures that any script output is automatically logged to the journal, and can then be easily retrieved with journalctl --user-unit <unit> (if you’ve tried to debug cron job behavior, you’ll recognize this as pretty useful.)
  • ExecStart - For a oneshot service, we can define multiple stanzas. In this case, we wrap up the ripping process into a script, then if that command succeeds, spit the disk out. By default multiple ExecStart lines are dependent upon the success of the preceding execution, so the unit will halt and not eject the disk if ripping fails.
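The stop-on-first-failure behavior of multiple ExecStart lines can be pictured with a small shell analogy. This is only a sketch of the observable behavior, not how systemd implements it, and run_steps is a hypothetical name of my own.

```shell
# Run each command string in order and stop at the first failure,
# roughly mirroring how a Type=oneshot unit treats several
# ExecStart= lines.
run_steps() {
    for step in "$@"; do
        sh -c "$step" || return 1
    done
}

# Usage: the second step fails, so the third (the "eject") never runs.
#   run_steps 'echo rip' 'false' 'echo eject'
```

In the unit above, this is why a failed rip leaves the disk in the drive rather than ejecting it.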

The most important line of configuration here is WantedBy. Because systemd has abstracted both devices and services into unit files, we can create arbitrary relationships such as this one. Asking a unit to begin as the result of a device starting then becomes a trivial task, as opposed to using udev’s RUN, which could cause problems by inlining scripts into the pipeline that actually initializes the device.

Enabling the Service

We now have a systemd service file defining the script we want to run as a result of inserting a disk. How do we enable it?

One option is to install it system-wide into /etc/systemd/system, but we’re going to use a user instance for this script. Partially to get familiar with user instances, and partially because it fits our use case: I, as a user, want to run this script based on a trigger. It also follows the principle of least privilege and I have all my dotfiles tracked anyway, so this process becomes nicely version controlled.

Luckily this is pretty straightforward. By placing the unit in /home/$user/.config/systemd/user/autorip.service (I’ve named my unit “autorip” here), the user systemd instance becomes aware of the service unit.
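Placing the file by hand works fine, but a small helper makes it repeatable. This is a minimal sketch; install_user_unit is a hypothetical name, and it assumes the standard per-user unit directory (under XDG_CONFIG_HOME if that is set, otherwise ~/.config).

```shell
# Copy a unit file into the per-user unit directory and print where
# it ended up. Honors XDG_CONFIG_HOME, falling back to ~/.config.
install_user_unit() {
    unit_dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
    mkdir -p "$unit_dir" &&
        cp "$1" "$unit_dir/" &&
        printf '%s\n' "$unit_dir/$(basename "$1")"
}

# Usage on a machine with a running user instance:
#   install_user_unit ./autorip.service
#   systemctl --user daemon-reload
```

The daemon-reload step asks the user instance to re-read its unit files so the new service shows up.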

Note: This was done on an Arch Linux machine which is thoroughly steeped in systemd, so I’m not sure what steps are needed to leverage systemd user instances on another distro. I will mention that after some recent upgrades, I had to follow the Arch Wiki instructions to make user instances persist after log-out (the relevant mechanism is lingering, enabled with loginctl enable-linger); otherwise enabled services disappear once the user session is destroyed.

Now that our user instance is aware of the unit’s presence, just enable it:

$ systemctl --user enable autorip.service

This means that, among other things, our WantedBy setting takes effect, asking to be activated when the cdrom unit is activated.
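Under the hood, enabling a unit with a WantedBy line creates a symlink in a ".wants" directory named after the wanting unit. A quick way to confirm that the hook is in place is to look for that symlink. The sketch below assumes the standard per-user unit directory, and is_wanted_by is a hypothetical helper name.

```shell
# Succeed if enabling created the expected ".wants" symlink.
# Arguments: the service name, then the unit named in WantedBy=.
is_wanted_by() {
    unit_dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user"
    [ -L "$unit_dir/$2.wants/$1" ]
}

# Usage:
#   is_wanted_by autorip.service dev-cdrom.device && echo "hooked up"
```

If the check fails after enabling, the [Install] section of the unit is the first place to look.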

That’s it! Now when the cdrom is inserted, our unit will get triggered, and we get useful script output logging for free.

The Script

Let’s talk about the script we need. I initially turned to HandBrake, which is an excellent tool for media ripping that also has a command-line utility. I tried a few scripts with it, which worked, but it was difficult to get the ratio of size to quality that I wanted.

It turns out Don Melton has some absolutely fantastic scripts that leverage HandBrake for very well-tuned video encoding. I homeshick clone-ed this and symlinked the scripts into my homeshick-tracked ~/.bin $PATH directory, so my user can call on transcode-video.sh properly.

All that remains is to glue some pieces together to actually encode the video. Here’s a simplified version of the script I wrote. Some parts to note:

  • I’m using zsh - some features like that ${(C)..} syntax can be shell-specific. (that capitalizes a string, by the way)
  • filmdate.py is a small utility I wrote to guess a film’s release year from its title. The source is on GitHub if you’re interested. I used this because there’s little (if any) metadata available on the disk to rely on, and most post-processors need, at minimum, the title and year to parse what movie the filename contains.
  • The WorkingDirectory setting in the unit is a temporary directory that I can write the file in, and the last mv in the script drops the finished file into a directory that my setup watches to pick up media.

If you’ve ever tried automation like this, you know that the amount of information you can glean from a plain DVD is pretty bare; essentially only the (sometimes mangled) title is available. In my script I attempt to pull the title from HandBrake, which can sometimes fail. To handle that case, the script uses := variable assignment, which only assigns a value when the variable is unset or empty. This allows me to use the set-environment action in systemd before inserting a disk if I want to statically set the film title before attempting to rip a disk. Thus if I know that a particularly ugly title is going into the DVD drive, I can use:

$ systemctl --user set-environment TITLE="Justin.Bieber.Never.Say.Never"

This means that my script will not overwrite the TITLE variable if it finds one in the environment. This is a little clunky because we set this globally across all user units instead of for our specific unit, but it works well enough. Issuing

$ systemctl --user unset-environment TITLE

means the script will fall back to using HandBrake to attempt auto-detection. SELECT_TITLE behaves similarly if the automation can’t determine the correct title.
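To make the := fallback concrete, here is a minimal sketch in plain POSIX shell rather than zsh. It is not my actual script. Both function names are hypothetical, the raw disc label is passed in as an argument instead of being read from HandBrake, and the awk step stands in for zsh’s ${(C)..} capitalization.

```shell
# Hypothetical helper: turn a raw disc label such as "THE_BIG_LEBOWSKI"
# into a dotted, capitalized title such as "The.Big.Lebowski".
normalize_title() {
    printf '%s\n' "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr '_ ' '..' |
        awk 'BEGIN { FS = OFS = "." }
             {
                 # Capitalize the first letter of every dot-separated word.
                 for (i = 1; i <= NF; i++) {
                     $i = toupper(substr($i, 1, 1)) substr($i, 2)
                 }
                 print
             }'
}

# Keep TITLE if the environment already provides a non-empty value
# (for example via "systemctl --user set-environment"); otherwise
# derive it from the raw label given as the first argument.
resolve_title() {
    : "${TITLE:=$(normalize_title "$1")}"
    printf '%s\n' "$TITLE"
}

# Usage:
#   resolve_title 'THE_BIG_LEBOWSKI'              -> The.Big.Lebowski
#   TITLE=My.Film; resolve_title 'DVD_VIDEO'      -> My.Film
```

The := form is what lets a manually set title win: the detection result is only used when nothing has been provided up front.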

Conclusion

With the script in the right location for our unit’s ExecStart and the .service enabled, DVDs inserted into the system will trigger activation of our user service. In my case, the first ExecStart should succeed, moving the file into a pickup location for Kodi, after which the second ExecStart ejects the disk. This means that digitization is just a matter of inserting the disk and taking it out when I get a Pushover notification from CouchPotato that my media has been captured correctly.

There’s lots of room for improvement here - for example, parsing DVD titles can be finicky, which still requires manual intervention to fix by setting environment variables for the unit. However, the process is much more streamlined than using a point-and-click method, and helped me archive a pretty big stack of media with very little human interaction.

ty@tjllgmail.net