Tyblog

All the posts unfit for blogging
blog.tjll.net

A Beginner's Guide to Extending Emacs

  • 4 February, 2025
  • 3,983 words
  • 17 minute read time

This post isn’t about the virtues of some editors versus others: that's already been written by somebody else (and it’s really good) – if you want to know why I use emacs, I suggest reading that instead.

This post will help you understand why "extensibility" and "introspectability" are such prominent emacs features even without an emacs lisp background. Bridging the gap from spacemacs or doom emacs to a bespoke configuration wasn't easy for me because I didn’t know how to learn emacs, so I'm going to stumble through one of my own use cases to demonstrate how this process goes if you're peeking in from outside the emacs ecosystem, horrified (er, curious) about how this all works.

Let's talk about reStructuredText.

reStructuredText

At my day job I write our user documentation using Sphinx. It expects my stilted prose in .rst format, which is kind of like Markdown if you squint.

I do an awful lot of cross-referencing between references (or refs) to link concepts across the documentation. You define a reference like this:

ReST
.. _code_example:

.. code::
   echo "HELP I'M TRAPPED IN A CODE EXAMPLE"

…and then link to it later like this:

ReST
This :ref:`doesn't look like anything to me <code_example>`.

…or like this (if the ref is associated with a title of some sort):

ReST
Don't say :ref:`code_example`.

My problem is that I have an assload of references across all of the documentation and my brain cannot recall them on the spot. What I really need is the ability to call up the list of references to easily discover and select from that list – this is basically auto-completion but for documentation headers (or titles).

I am ready to write some shitty elisp with the help of aliens.

A Parentheses Prelude

Before we dig into emacs' guts, here are some principles that I learned after my first elisp experiments that might help somebody digging into this ecosystem for the first time:

1. Emacs Wants You to Extend It

I haven't written plugins for other editors extensively, but I can tell you this: emacs doesn't just make deep customization available, it actively encourages you to make absolute customization messes (er, masterpieces). Core editor functions aren't just documented, but often include tidbits about "you probably want to see this other variable" or "here's how you should use this".

Not only that, but emacs happily hands you functions shaped like nuclear warheads like advice-add (that let you override any function) that can absolutely obliterate your editor if you hold it the wrong way. Of course, this also grants you unlimited power.
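
To make the warhead a little less abstract, here is a minimal sketch of advice-add wrapping a function. The names my/demo-greet and my/demo-shout-advice are made up for illustration; only advice-add and advice-remove are real emacs functions here.

```elisp
;; A hypothetical function we want to change without editing it.
(defun my/demo-greet (name)
  "Return a greeting for NAME."
  (format "Hello, %s" name))

;; :around advice receives the original function plus its arguments.
(defun my/demo-shout-advice (orig-fun &rest args)
  "Call ORIG-FUN with ARGS and upcase whatever it returns."
  (upcase (apply orig-fun args)))

(advice-add 'my/demo-greet :around #'my/demo-shout-advice)
(my/demo-greet "Helly R.")   ; => "HELLO, HELLY R."

;; Advice is easy to undo, which is worth remembering when it goes wrong.
(advice-remove 'my/demo-greet #'my/demo-shout-advice)
(my/demo-greet "Helly R.")   ; => "Hello, Helly R."
```

The same mechanism works on built-in functions, which is exactly why it can obliterate your editor.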

Remember that emacs is designed to be torn apart and rearranged.

2. Geriatric Software

The first public release of GNU emacs happened in 1985. Literally 40 years of development sits inside of emacs and its developers are still adding non-trivial features (native language server support landed in version 29 in 2023).

The ecosystem is vast and the language has evolved for a long time. There's nearly always something useful if you need a particular piece of functionality, so even moreso than with other ecosystems: remember to do your homework first.

3. Lisp for the un-Lisped

The syntax is polarizing, I know. Gurus will wince when I get this wrong, but:

  • Writing lisp is like writing any other code, just with the parentheses wrapping everything instead of just arguments. print("Macrodata Refinement") becomes (print "Macrodata Refinement")
  • Sometimes you don't get functions, you get macros or special forms that behave in special ways. For example, let binds variables for an inner block of code. Like this: (let ((name "Mark S.")) (print name))
  • Lispers say "this is actually data and not calling code" by doing this with single quotes: '("list" "of" "strings")

I'm out of my depth in lisp, but if you're a novice, those notes might help.
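
If you want to poke at those three ideas yourself, here is a small sketch you can evaluate one form at a time with M-: (or in the *scratch* buffer). The values are placeholders; the thing to notice is that let takes a list of bindings, so each binding gets its own pair of parentheses.

```elisp
;; A function call: the function name goes inside the parentheses.
(format "Hello, %s" "Macrodata Refinement") ; => "Hello, Macrodata Refinement"

;; `let' binds variables for the forms in its body. Each binding is
;; its own (NAME VALUE) pair inside an outer list of bindings.
(let ((name "Mark S.")
      (department "MDR"))
  (format "%s works in %s" name department)) ; => "Mark S. works in MDR"

;; The single quote says "this is data, don't try to call it".
(length '("list" "of" "strings"))            ; => 3
```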

Extensible MACroS

With that prelude out of the way, let's begin.

Inside of emacs you can call up a list of potential completions by using the keyboard shortcut M-. (that’s "hit the meta key along with period", where "meta" is the Alt key for me). This applies in a wide variety of scenarios, like when completing class names or variables. If we want to ask emacs to hand us a list of potential references, then the system we want to hook into is this completions system.

(This is the only time I'll assume we know where to go without crawling through documentation. You could discover it yourself by looking for "completion" or a similar string in the emacs docs.)

To start our hero’s journey, we figure out what the hell M-. actually does. We can ask emacs this by calling the function describe-key, which is bound to C-h k. Hitting Ctrl-h, then k, then M-. drops us into a help buffer that looks like this:

M-. runs the command completion-at-point (found in
evil-insert-state-map), which is an interactive native-compiled Lisp
function in ‘minibuffer.el’.

It is bound to M-..

(completion-at-point)

Perform completion on the text around point.
The completion method is determined by ‘completion-at-point-functions’.

  Probably introduced at or before Emacs version 23.2.

We have the next breadcrumb to follow, which is the variable completion-at-point-functions. Running completion-at-point by hitting M-. consults that variable to hand us completion candidates, so we describe-variable it with C-h v and then choose completion-at-point-functions from the list of variables:

completion-at-point-functions is a variable defined in ‘minibuffer.el’.

Its value is (cape-dict cape-file tags-completion-at-point-function)

Special hook to find the completion table for the entity at point.
Each function on this hook is called in turn without any argument and
should return either nil, meaning it is not applicable at point,
or a function of no arguments to perform completion (discouraged),
or a list of the form (START END COLLECTION . PROPS)

…and it goes on from there. You can see some existing completion functions in there: I use a package called cape to offer helpful suggestions like file paths if I start typing in something like ./filename.

The description for this variable instructs us about how to add our own functions (scary!). You’ll note that emacs calls this a "hook", which is most often just a term used to describe a variable that is a list of functions that get called at a specific time (hooks show up everywhere).
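
Here is a toy sketch of the "a hook is just a list of functions" idea. Everything prefixed with my/demo- is invented for this example; add-hook and run-hooks are the real functions. As far as I can tell, completion-at-point-functions is a "special" hook in that emacs stops at the first function that returns something non-nil, whereas a plain hook like this one runs everything on the list.

```elisp
(defvar my/demo-hook nil
  "A hypothetical hook: just a variable holding a list of functions.")

(defvar my/demo-log nil
  "Records which hook functions have run.")

(defun my/demo-first ()
  "Note that the first function ran."
  (push 'first my/demo-log))

(defun my/demo-second ()
  "Note that the second function ran."
  (push 'second my/demo-log))

;; `add-hook' puts each function on the list (and skips duplicates).
(add-hook 'my/demo-hook #'my/demo-first)
(add-hook 'my/demo-hook #'my/demo-second)

;; `run-hooks' calls every function on the hook with no arguments.
(run-hooks 'my/demo-hook)
```

After evaluating this, C-h v my/demo-hook shows both functions on the list and my/demo-log holds both symbols.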

I elided the full description for completion-at-point-functions – which is lengthy! – but if you parse it all out, you learn the following:

  • Your completion at point function should return either nil (the elisp "null") – which means your completion function doesn’t apply right now – or another function (which emacs discourages), or a list, which is what we’ll do because it sounds like the most-correct thing to do.
  • The list we return is (START END COLLECTION . PROPS):
    • START and END should be positions in the buffer between which emacs will replace the completed symbol with our candidate. That is, if your cursor is calling a method on a Python object like file.ope| (where the bar is your cursor), emacs will replace just ope when you select open from a list of completions and not the entire file.ope string.
    • COLLECTION is the juicy bit. The documentation calls it a completion "table", and there’s probably hidden meaning there, but you can just return a list of candidates and move on with your day, which is what I'll do.

Okay, so we need to write something to find the bounds of a string to replace and a function that returns that list.
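
Before tackling the real thing, here is about the smallest completion function I can think of that follows that contract. It is a sketch with a hard-coded candidate list, and my/demo-capf is a made-up name; it uses the built-in symbol "thing" to find START and END.

```elisp
(require 'thingatpt)

(defun my/demo-capf ()
  "Hypothetical capf offering a fixed list of names for the symbol at point."
  (let ((bounds (bounds-of-thing-at-point 'symbol)))
    ;; Returning nil means "not applicable here, ask the next function".
    (when bounds
      (list (car bounds)                        ; START
            (cdr bounds)                        ; END
            '("open" "opendir" "openpty")))))   ; COLLECTION

;; With the point right after "ope" in an otherwise empty buffer:
(with-temp-buffer
  (insert "ope")
  (my/demo-capf)) ; => (1 4 ("open" "opendir" "openpty"))
```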

Completions Abound

I fooled around with some regular expressions for a while until I did the right thing and examined how other completion backends do it. If you have the package installed, the aforementioned cape-file function gives us a hint: hit M-x, then choose find-function, select cape-file, and poke around. You’ll find the use of a function called bounds-of-thing-at-point. Describing it with C-h f bounds-of-thing-at-point gives us:

Determine the start and end buffer locations for the THING at point.
THING should be a symbol specifying a type of syntactic entity.
Possibilities include ‘symbol’, ‘list’, ‘sexp’, ‘defun’, ‘number’,
‘filename’, ‘url’, ‘email’, ‘uuid’, ‘word’, ‘sentence’, ‘whitespace’,
‘line’, and ‘page’.

And that is useful for our START and END needs. You can take it for a test drive at any time with M-: (bounds-of-thing-at-point 'word) to see where emacs thinks the word at your cursor starts and ends. This is a common theme when developing elisp: try out functions all the time within the editor since they’re near at hand.

The argument to bounds-of-thing-at-point is a symbol naming a "thing", and we can define new things with the macro define-thing-chars. We pass define-thing-chars a name for our "thing" and a set of characters, and we can call bounds-of-thing-at-point with it from that point on. The documentation in thingatpt.el that emacs refers you to explains more if you’re interested.

define-thing-chars expects a string of characters to put into a regex character class (like [...]), so anything that is valid inside the brackets will work. This is a pretty standard character class and we can start with something super simple. I can’t be bothered to look up whatever the reStructuredText spec is for references, but let’s start with "letters, dashes, and underscores" ([:alpha:] matches letters only, so swap in [:alnum:] if your labels contain digits). Expressed as a "thing", that looks like this:

ELisp
(define-thing-chars rst-ref "[:alpha:]_-")

Now we have a thing called rst-ref we can use with bounds-of-thing-at-point. In typical emacs fashion, we can run elisp ad-hoc in our editor just to tinker, so let’s do that now.

Remember: we’re trying to write a function to give us the start and end of whatever piece of text we intend for a completion to replace. Let’s try it out: in any sort of buffer, put a piece of fake .rst text with a reference, like this:

ReST
This is a :ref:`other-reference`.

Place your point somewhere within "other-reference" and try out your thing:

M-: (bounds-of-thing-at-point 'rst-ref)

You’ll see something like (number . number) in the echo area (the little minibuffer at the bottom of the emacs window frame). Congratulations! We’ve got the first part of the problem solved.
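
If you would rather script that experiment than click around, here is a sketch that does the same thing inside a throwaway buffer. I define a separate thing named my/demo-rst-ref (a made-up name) so it doesn't clobber the real one.

```elisp
(require 'thingatpt)

;; Same character set as rst-ref above, under a demo name.
(define-thing-chars my/demo-rst-ref "[:alpha:]_-")

(with-temp-buffer
  (insert "This is a :ref:`other-reference`.")
  ;; Put the point in the middle of the label, before "reference".
  (search-backward "reference")
  (let ((bounds (bounds-of-thing-at-point 'my/demo-rst-ref)))
    ;; Turn the (START . END) pair back into text to eyeball it.
    (buffer-substring-no-properties (car bounds) (cdr bounds))))
;; => "other-reference"
```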

Gathering Completions

Recall the structure of what our "completion backend" needs to return to emacs:

ELisp
(START END COLLECTION . PROPS)

We can construct START and END with bounds-of-thing-at-point; now we just need COLLECTION, which is a list of potential candidates.

Conceptually the task isn’t hard: we should find all instances of strings of the form:

ReST
.. _my-reference:

in our document and capture my-reference. Where do we start?

Once again you can rely on discovery mechanisms like searching for functions that sound related (by browsing describe-function) or look at existing code. Personally, I found this:

(re-search-forward REGEXP &optional BOUND NOERROR COUNT)

Search forward from point for regular expression REGEXP.

The documentation refers you to some other related functions, like this one:

(match-beginning SUBEXP)

Return position of start of text matched by last search.
SUBEXP, a number, specifies which parenthesized expression in the last
regexp.

So we can (re-search-forward) for something then invoke (match-beginning 1), for example, if we used a regex capture group to grab the reference’s label. Cool: we can start there.
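
Here is a tiny sketch of that search-then-interrogate pattern in a temporary buffer. The regex is a throwaway string-based one (lowercase letters and dashes only), not the one we'll end up with.

```elisp
(with-temp-buffer
  (insert ".. _my-reference:\n")
  (goto-char (point-min))
  ;; The \\( \\) pair is capture group 1. The trailing nil and t mean
  ;; "no search bound" and "return nil instead of erroring on no match".
  (when (re-search-forward "_\\([a-z-]+\\):" nil t)
    (list (match-beginning 1)               ; where group 1 starts
          (match-end 1)                     ; where group 1 ends
          (match-string-no-properties 1)))) ; the text of group 1
;; => (5 17 "my-reference")
```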

As you get deeper into elisp you’ll find that regular expressions are everywhere, and this case is no different. We need a solid regex to search through a reStructuredText buffer (and honor any quirks in emacs’ regular expression engine), so we’ll use this opportunity to kick the tires on interactively developing regular expressions in emacs.

Regexes

Geriatric millennial software engineers like myself grew up on https://regexr.com/ when it was still a Flash application. Unless you’re a masochist that lives and breathes regular expressions, it’s kind of hard to develop a good regex without live feedback, which sites like https://regexr.com/ provide.

Little did I know that emacs comes with its own live regular expression builder and it's goooood.

Within any emacs buffer, run M-x re-builder to open the regex builder window split alongside the current buffer. If I then enter the string "re-\\(builder\\)" into that buffer, a) that string gets highlighted in my original buffer and b) the capture group gets highlighted in its own unique group color.

You can do this all day long to fine-tune a regular expression, but there’s yet another trick when writing regular expressions, which is to use the rx macro.

My previous example regular expression "re-\\(builder\\)" works, but the quirks when writing emacs regular expressions pile up quickly: escaping characters is one example but there are more, too.

Instead, the rx macro will let you define a regular expression in lisp-y form and evaluate it into a typical string-based regular expression you can use normally, so it works any place emacs expects a string-based regular expression. For example, if you evaluate this with M-::

ELisp
(rx "re-" (group "builder"))

This is what emacs returns:

ELisp
"re-\\(builder\\)"

Identical! The rx documentation explains all the constructs available to you.

Jumping back to re-builder, with the re-builder window active, invoke M-x reb-change-syntax and choose rx. Now you can interactively build regular expressions with the rx macro! In the re-builder window, you’ve got to enter a weird syntax to get it to take rx constructs (I’m… not sure why this is), but you end up with the same outcome:

ELisp
'(: "re-" (group "builder"))

Watch the regex get highlighted live just as it was in the string-based regex mode.

To bring this full circle, hop into a buffer with an example .rst document like this one:

ReST
A Heading
=========

.. _my-reference:

Link to me!

Using our newfound re-builder knowledge, let’s build a regex interactively to make short work of it:

  • Invoke M-x re-builder
  • Change the engine to something easier with M-x reb-change-syntax and choose rx
  • Start trying out solutions

I’ll refer here to the rx constructs documentation which lists out all the possibilities that you can plug into the rx macro. Here’s a recorded example of what developing it looks like from start to finish, ending up with a functional rx construct:

Live-highlighting regex development. Nice. If you add more groups, more colors show up. In this example the rx constructs I’m using are:

  • Any strings end up as literal matches
  • Special symbols bol and eol for "beginning of line" and "end of line", respectively
  • Symbols like + behave like their regex counterparts ("at least one")
  • Some symbols like not are nice little shortcuts (in this case, to negate the next form)

Because rx is a macro, we don’t ever actually need to compile its regular expressions to use elsewhere - we can always just use rx when we need a regex.
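
As a sketch of what "just use rx" looks like in practice, here it is handed straight to string-match. The rx form below is one plausible way to match a reference target line; I spell the negated character as (not (any ":")), which I believe is accepted by older emacs versions too.

```elisp
(let ((line ".. _code-sample:"))
  ;; `rx' expands to an ordinary regex string, so any function that
  ;; expects a regex will take it.
  (when (string-match
         (rx bol ".." (+ blank) "_" (group (+ (not (any ":")))) ":" eol)
         line)
    ;; For string matches, pass the string back in to read the group.
    (match-string 1 line)))
;; => "code-sample"
```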

Gathering Completions: Continued

Okay, we've cut our teeth on emacs regular expressions. Let's use 'em. (Not our teeth. Regexes.)

To start, let's save our reStructuredText regular expression to find a ref so we can easily grab it later. I'll save the one I came up with to the name tmp/re (this name is arbitrary; I drop temporary variables into tmp/<name> out of habit).

ELisp
(setq tmp/re (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol))

Now we can reference it easily. I mentioned before that re-search-forward accepts a regex, so let's hop into a reStructuredText buffer and rev up the regex.

Here's my sample text that I'll work with:

ReST
A Title
=======

Beware the Jabberwock, my son.

.. _my-reference:

You are like a little baby. Watch this.

.. _code-sample:

.. code:: python

   print("emacs needs telemetry")

The end?

The re-search-forward documentation indicates that it starts at the point's current position, so head to the start of the buffer, hit M-: to enter the elisp Eval prompt, and try:

ELisp
(re-search-forward tmp/re)

This is anticlimactic because you'll just see the point move to the end of one of the references. BUT. This means that the search succeeded. So… what now?

More reading in the re-search-forward documentation will educate you about emacs global match data. In non-functional-programming style, functions like match-beginning and match-end serve to interrogate a global state that functions like re-search-forward will modify. In concise terms, our regular expression defines one match group and we can grab it with (match-string-no-properties 1) to get the first group match (match-string will return a string with "properties", which is a bunch of data like font styling that we don't want).

Within our example buffer, executing this after the regex search should return our match:

ELisp
(match-string-no-properties 1)

I see "my-reference" from this command. Now we're cooking like it's 1985, baby. You can enter the minibuffer again with M-:, step back through its history to find the re-search-forward command, and repeat the process to watch the point move to the next match, after which you can see the matched string with match-string-no-properties.

Note that running this a few times will eventually error out after no matches exist past your point. We'll address this.
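
To see both behaviors side by side, here is a small sketch that runs a search that cannot match, once with the NOERROR argument set and once without. The condition-case wrapper is only there so the example returns a value instead of dropping you into the debugger.

```elisp
(with-temp-buffer
  (insert "no reference targets in here")
  (goto-char (point-min))
  (list
   ;; With NOERROR set to t, a failed search quietly returns nil.
   (re-search-forward "^\\.\\. _" nil t)
   ;; Without it, a failed search signals a `search-failed' error.
   (condition-case nil
       (re-search-forward "^\\.\\. _")
     (search-failed 'signaled))))
;; => (nil signaled)
```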

If you're a human (or Claude) at this point, you can see the path ahead – we need to write some elisp that will:

  • Move the point to the beginning of the buffer (important, remember that re-search-forward relies upon the current position of your point)
  • Iteratively execute an re-search-forward command to aggregate reference targets
  • Conclude when there aren't any more matches

I'll start with the code and then explain which demons the parentheses are summoning afterward:

ELisp
;; This function will save the current position of the cursor and then
;; return it to this position once the code that it wraps has finished
;; executing, which lets us hop around the buffer without driving the
;; programmer insane. Important for any functions that move the point
;; around.
(save-excursion
  ;; progn is a special form that just evaluates each lisp form
  ;; step-by-step. (save-excursion already does this for its body, so
  ;; the progn is optional here.)
  (progn
    ;; Step one: go to the beginning of the buffer.
    (goto-char (point-min))
    ;; Step two: loop
    ;;
    ;; cl-loop is a macro with a long and venerable heritage stemming
    ;; from the common lisp family of macros, which it mimics the
    ;; behavior of. You could spend hours honing your ability to wield
    ;; the common lisp `loop` macro, but we'll just explain the parts
    ;; we're using:
    ;;
    ;; `while` runs the loop until its argument evaluates to a falsy
    ;; value. We can overload our use of `re-search-forward` here: we
    ;; can use it to step our loop forward each time and also rely
    ;; upon it returning `nil` once it stops matching substrings in
    ;; the buffer and we should finish up.
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                    ;; The aforementioned `while` termination case
                    ;; relies upon this `t` parameter, which says
                    ;; "don't error out with no matches, just return
                    ;; nil". Once no more matches are found, the loop
                    ;; exits.
                    nil t)
             ;; The `collect` keyword instructs `cl-loop` how to form
             ;; its return value. We can helpfully summarize the regex
             ;; match item by pulling out the global match data.
             collect (match-string-no-properties 1))))

The code is less intimidating without comments:

ELisp
(save-excursion
  (progn
    (goto-char (point-min))
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                    nil t)
             collect (match-string-no-properties 1))))

Without belaboring the point, you can – like I did – discover most of these functions by skimming existing elisp code and using it as a launch pad. Many of these functions are bog standard and show up all over the place in emacs packages (save-excursion, progn, goto-char…)

Here's the result when I run this code against our example .rst file:

ELisp
("my-reference" "code-sample")

Looks good!

Completing the Completion Backend

We're now armed with the ability to:

  • Identify the bounds of the string we want to replace, and
  • Collect a list of targets for completion candidates

We are so close. Recall the description of the variable we need to modify:

completion-at-point-functions is a variable defined in ‘minibuffer.el’.

Its value is (cape-dict cape-file tags-completion-at-point-function)

Special hook to find the completion table for the entity at point.
Each function on this hook is called in turn without any argument and
should return either nil, meaning it is not applicable at point,
or a function of no arguments to perform completion (discouraged),
or a list of the form (START END COLLECTION . PROPS)

To return the list that completion-at-point-functions expects, we already have the ability to identify the bounds of a thing and sweep up a list of candidates in our buffer. Note the comment about returning nil: we probably don't always want to run our backend, so we should short-circuit our function to eagerly return nil to avoid tying up emacs with a regex loop we don't need.

With all that said, consider the following:

ELisp
;; Our reStructuredText reference "thing"
(define-thing-chars rst-ref "[:alpha:]_-")

(defun my/rst-internal-reference-capf ()
  "Completion backend for buffer reStructuredText references"
  ;; Only applies when we're within a reference - outside of a
  ;; reference, we bail out with nil.
  (when (looking-back (rx ":ref:`" (* (not "`"))) (line-beginning-position))
    ;; Get potential bounds for the string to replace
    (let* ((bounds (or (bounds-of-thing-at-point 'rst-ref)
                       ;; Fallback to the current position
                       (cons (point) (point))))
           (start (car bounds))
           (end (cdr bounds))
           ;; Collect all reference candidates
           (candidates
            ;; Our previously-noted reference collector
            (save-excursion
              (progn
                (goto-char (point-min))
                (cl-loop while (re-search-forward
                                (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                                nil t)
                         collect (match-string-no-properties 1))))))
      ;; Return value suitable for `completion-at-point-functions`
      (list start end candidates))))

  • We're following some naming conventions by calling this a "capf" (a "completion-at-point function") and prefixing with my/ (a habit to namespace your own functions)
  • Our short-circuit takes the form of using looking-back to ask, "are we inside of a reStructuredText reference?" Note the use of rx here again to clean up our lisp.
  • We use our rst-ref thing to easily snag the start and end of the string to replace – note our fallback to just the immediate point if we can't find the bounds of our thing.

We wrap it all up with list. Personally, even as somebody relatively new to writing Lisps, I find the code pleasant to read and self-evident. We did a lot in 17 lines of code!

Inside of our test .rst buffer, we can test drive this function. First, invoke M-x eval-defun with your cursor somewhere in the function to evaluate it, which makes my/rst-internal-reference-capf available. Then run:

ELisp
(add-hook 'completion-at-point-functions 'my/rst-internal-reference-capf)

Huzzah! Our function is now live in emacs' completion framework. You can trigger the completion by calling completion-at-point at a relevant spot in a buffer. Many batteries-included emacs distributions like spacemacs or doom emacs slap nice-looking porcelain on top of the completion framework; here's an example that uses the corfu package:

Congratulations, you've extended emacs for the first time!

Dressing Up the Bones

Okay, this is a pretty basic setup. You could improve it in many ways, but here are a few ideas about potential directions:

Mode Hooks

Manually adding your custom completion function to the completion-at-point-functions hook is tedious, but there's a way to automate it. Recall that in emacs parlance, a "hook" is usually a variable that holds a list of functions that get called at a specific time.

If you use rst-mode, then opening an .rst file will drop you into rst-mode and implicitly call the rst-mode-hook functions. That means that this line is sufficient to integrate our completion function:

ELisp
(add-hook 'rst-mode-hook
          (lambda ()
            (add-hook 'completion-at-point-functions
                      #'my/rst-internal-reference-capf 0 t)))

This says: "when I open an .rst file, run this lambda that modifies completion-at-point-functions only for this buffer by adding my internal reference completion function". It's a little nested which makes it less obvious with the two add-hook calls.
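
If the nesting bothers you, one alternative is to give the inner step a name. This is a sketch of the same idea, not a different mechanism: my/demo-rst-setup-completion is a name I made up, and it assumes my/rst-internal-reference-capf from earlier is defined by the time you actually complete something.

```elisp
(defun my/demo-rst-setup-completion ()
  "Add the reference capf to the current buffer's completion functions."
  ;; The final `t' makes the change buffer-local, so other modes keep
  ;; their own `completion-at-point-functions' untouched.
  (add-hook 'completion-at-point-functions
            #'my/rst-internal-reference-capf 0 t))

;; Run the setup function every time `rst-mode' starts in a buffer.
(add-hook 'rst-mode-hook #'my/demo-rst-setup-completion)
```

A named function also shows up by name in C-h v rst-mode-hook and can be taken off again with remove-hook, which is harder to do with an anonymous lambda.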

Other Files

Okay, our example works for references in the same buffer but this is sort of pointless for uses across files.

You can solve this too, although my post is already too long so we won't solve this step-by-step. However, here's how I solved it:

  • Turn my capf into a minor mode that manages the completion variables
  • Don't search the buffer on every completion; instead, collect references once and rebuild the list with a hook in after-change-functions, saving it to a hash cache
  • Walk all .rst files in the current project and run the reference collection function for each, storing the results into a hash cache for all files that don't have live buffers
  • When it comes time to call the completion function, combine the hash for completions for files without buffers along with each .rst buffer's cached list of references

It sounds complicated, but it works! Functions like with-temp-buffer make this pretty easy by aggregating reference targets for files using the exact same function we do for live buffers.

ELisp
(with-temp-buffer
  (insert-file-contents file)
  (my/rst-internal-references))
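
For a rough idea of how the file-walking part could fit together, here is a self-contained sketch without any caching. The names my/demo-rst-references and my/demo-rst-references-in-directory are hypothetical stand-ins rather than my actual functions, and the collector is the same loop from earlier wrapped in a defun.

```elisp
(require 'cl-lib)

(defun my/demo-rst-references ()
  "Return every reference label defined in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_"
                        (group (+ (not (any ":")))) ":" eol)
                    nil t)
             collect (match-string-no-properties 1))))

(defun my/demo-rst-references-in-directory (dir)
  "Return an alist of (FILE . LABELS) for each .rst file under DIR."
  (mapcar (lambda (file)
            (cons file
                  ;; Read the file into a scratch buffer so the same
                  ;; collector works on files nobody has opened.
                  (with-temp-buffer
                    (insert-file-contents file)
                    (my/demo-rst-references))))
          (directory-files-recursively dir "\\.rst\\'")))
```

Calling (my/demo-rst-references-in-directory "docs/") on a hypothetical docs/ directory would hand back one entry per file, which is the raw material for the hash cache described above.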

Fancy Completion

Emacs' long history includes company-mode, which is a third-party completion framework that integrates with the completion-at-point set of functions. Some company-mode features include additional metadata about completion candidates, and I found two that were useful: company-kind and company-doc-buffer.

  • company-kind is a simple key that just tells the completion caller what the completion candidate is. In our case we can add some eye candy by indicating it's 'text.
  • company-doc-buffer lets us add additional context to a completion candidate. I leveraged this to include a couple of lines following the reference line to help me figure out what exactly the link refers to. It's easier to show what this looks like rather than tell:

Notes:

  • I'm using GUI emacs here for the nicer completion popup with corfu which displays a transparent, floating frame
  • My completion candidate "context" is a real excerpt from the text around the reference, complete with styling, etc.
  • The small icon to the left of each candidate comes from the company-kind attribute.
  • The ~ syntax is part of orderless

Completion candidate context is an extra frill but very helpful.
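
Here is a stripped-down sketch of how those extras ride along in the PROPS tail of the return value. This is not my real backend: the candidate list is hard-coded, the doc buffer only holds a placeholder sentence, and my/demo-annotated-capf and the buffer name are invented for the example. The :company-kind and :company-doc-buffer keys are the property names I understand company-mode and corfu to look for.

```elisp
(require 'thingatpt)

(defun my/demo-annotated-capf ()
  "Hypothetical capf that attaches extra metadata to its candidates."
  (let ((bounds (or (bounds-of-thing-at-point 'symbol)
                    (cons (point) (point)))))
    (list (car bounds) (cdr bounds)
          '("my-reference" "code-sample")
          ;; Everything after COLLECTION is the PROPS plist.
          ;; Called with a candidate; returns a symbol naming its kind.
          :company-kind (lambda (_candidate) 'text)
          ;; Called with a candidate; returns a buffer with its docs.
          :company-doc-buffer
          (lambda (candidate)
            (with-current-buffer (get-buffer-create " *my/demo-capf-doc*")
              (erase-buffer)
              (insert (format "Reference target: %s" candidate))
              (current-buffer))))))
```

In a real version the doc buffer would hold the lines that follow the reference target instead of a canned sentence.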

Summary

My experience extending a core emacs function was an instructive and interesting exercise. I don't know what the future of emacs looks like in an increasingly LLM-crazed world, but I hope that future includes an open and powerful way to extend and customize the tools we use to write software.