This post isn’t about the virtues of some editors versus others: that's already been written by somebody else (and it’s really good) – if you want to know why I use emacs, I suggest reading that instead.
This post will help you understand why "extensibility" and "introspectability" are such prominent emacs features even without an emacs lisp background.
Bridging the gap from spacemacs or doom emacs to a bespoke configuration wasn't easy for me because I didn’t know how to learn emacs, so I'm going to stumble through one of my own use cases to demonstrate how this process goes if you're peeking in from outside the emacs ecosystem, horrified (I mean, curious) about how this all works.
Let's talk about reStructuredText.
reStructuredText
At my day job I write our user documentation using Sphinx.
It expects my stilted prose in .rst format, which is kind of like Markdown if you squint.
I do an awful lot of cross-referencing between references (or refs) to link concepts across the documentation.
You define a reference like this:
.. _code_example:

.. code::

   echo "HELP I'M TRAPPED IN A CODE EXAMPLE"
…and then link to it later like this:
This :ref:`doesn't look like anything to me <code_example>`.
…or like this (if the ref is associated with a title of some sort):
Don't say :ref:`code_example`.
My problem is that I have an assload of references across all of the documentation and my brain cannot recall them on the spot. What I really need is the ability to call up the list of references to easily discover and select from that list – this is basically auto-completion but for documentation headers (or titles).
I am ready to write some shitty elisp with the help of aliens.
A Parentheses Prelude
Before we dig into emacs' guts, here are some principles that I learned after my first elisp experiments that might help somebody digging into this ecosystem for the first time:
1. Emacs Wants You to Extend It
I haven't written plugins for other editors extensively, but I can tell you this: emacs doesn't just make deep customization available; it actively encourages you to make absolute customization messes (sorry, masterpieces).
Core editor functions aren't just documented, but often include tidbits about "you probably want to see this other variable" or "here's how you should use this".
Not only that, but emacs happily hands you functions shaped like nuclear warheads, such as advice-add (which lets you override any function), that can absolutely obliterate your editor if you hold them the wrong way.
Of course, this also grants you unlimited power.
Remember that emacs is designed to be torn apart and rearranged.
2. Geriatric Software
The first public release of GNU emacs happened in 1985. Literally 40 years of development sits inside of emacs and its developers are still adding non-trivial features (native language server support landed in version 29 in 2023).
The ecosystem is vast and the language has evolved for a long time. There's nearly always something useful if you need a particular piece of functionality, so even more so than with other ecosystems, remember to do your homework first.
3. Lisp for the un-Lisped
The syntax is polarizing, I know. Gurus will wince when I get this wrong, but:
- Writing lisp is like writing any other code, just with the parentheses wrapping everything instead of just arguments.
  print("Macrodata Refinement") becomes (print "Macrodata Refinement")
- Sometimes you don't get functions; you get macros and special forms that behave in special ways. For example, let (technically a special form) sets variables for an inner block of code, like this: (let ((name "Mark S.")) (print name))
- Lispers say "this is actually data and not code to call" by prefixing it with a single quote: '("list" "of" "strings")
I'm out of my depth in lisp, but if you're a novice, those notes might help.
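If you want to see those rules in action, here are a few forms you can paste into M-: one at a time. Nothing here is specific to this post; the comments show what each form evaluates to:

```elisp
;; A bare list is a function call: `+' is applied to 1 and 2.
(+ 1 2)                      ; => 3

;; The same shape behind a quote is just data and is never called.
'(+ 1 2)                     ; => the three-element list (+ 1 2)

;; `let' takes a list of (VARIABLE VALUE) pairs, hence the double
;; parentheses, and the bindings only exist inside its body.
(let ((name "Mark S."))
  (concat "Hello, " name))   ; => "Hello, Mark S."
```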
Extensible MACroS
With that prelude out of the way, let's begin.
Inside of emacs you can call up a list of potential completions by using the keyboard shortcut M-. (that’s "hit the meta key along with period", where "meta" is the Alt key for me). That binding comes from my evil-mode setup; stock emacs uses M-. for xref-find-definitions and usually offers completion on C-M-i instead. This applies in a wide variety of scenarios, like when completing class names or variables. If we want to ask emacs to hand us a list of potential references, then the system we want to hook into is this completions system.
(This is the only time I'll assume we know where to go without crawling through documentation. You could discover it yourself by searching for "completion" or a similar string in the emacs docs.)
To start our hero’s journey, we figure out what the hell M-. actually does.
We can ask emacs this by calling the function describe-key, which is bound to C-h k.
Hitting Ctrl-h, then k, then M-. drops us into a help buffer that looks like this:
M-. runs the command completion-at-point (found in
evil-insert-state-map), which is an interactive native-compiled Lisp
function in ‘minibuffer.el’.
It is bound to M-..
(completion-at-point)
Perform completion on the text around point.
The completion method is determined by ‘completion-at-point-functions’.
Probably introduced at or before Emacs version 23.2.
We have the next breadcrumb to follow, which is the variable completion-at-point-functions.
Running completion-at-point by hitting M-. consults that variable to hand us completion candidates, so we describe-variable it with C-h v and then choose completion-at-point-functions from the list of variables:
completion-at-point-functions is a variable defined in ‘minibuffer.el’.
Its value is (cape-dict cape-file tags-completion-at-point-function)
Special hook to find the completion table for the entity at point.
Each function on this hook is called in turn without any argument and
should return either nil, meaning it is not applicable at point,
or a function of no arguments to perform completion (discouraged),
or a list of the form (START END COLLECTION . PROPS)
…and it goes on from there.
You can see some existing completion functions in there: I use a package called cape to offer helpful suggestions like file paths if I start typing in something like ./filename.
The description for this variable tells us how to add our own functions (scary!). You’ll note that emacs calls this a "hook", which is most often just a term used to describe a variable that is a list of functions that get called at a specific time (hooks show up everywhere).
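To make "a hook is just a list of functions" concrete, here's a tiny throwaway hook. The names my/demo-hook, my/demo-log, and my/demo-greet are made up purely for illustration:

```elisp
;; A hypothetical hook: nothing more than a variable holding a list
;; of functions.
(defvar my/demo-hook nil
  "Functions to call when the demo hook runs.")

(defvar my/demo-log nil
  "Where our hook functions record that they ran.")

(defun my/demo-greet ()
  "Record that this hook function was called."
  (push 'greeted my/demo-log))

;; `add-hook' puts our function onto the list...
(add-hook 'my/demo-hook #'my/demo-greet)

;; ...and `run-hooks' calls every function on it, in order.
(run-hooks 'my/demo-hook)

my/demo-log  ; => (greeted)
```

completion-at-point walks its hook a little differently (roughly, it stops at the first function that returns something non-nil), but the underlying data structure is the same list.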
I elided the full description for completion-at-point-functions – which is lengthy! – but if you parse it all out, you learn the following:
- Your completion at point function should return either nil (the elisp "null"), which means your completion function doesn’t apply right now; another function (which emacs discourages); or a list, which is what we’ll do because it sounds like the most-correct thing to do.
- The list we return is (START END COLLECTION . PROPS):
  - START and END should be positions in the buffer between which emacs will replace the completed symbol with our candidate. That is, if your cursor is calling a method on a Python object like file.ope| (where the bar is your cursor), emacs will replace just ope when you select open from a list of completions, and not the entire file.ope string.
  - COLLECTION is the juicy bit. The documentation calls it a completion "table" (it can also be an alist, a hash table, or a function, among other things), but you can just return a list of candidates and move on with your day, which is what I'll do.
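Before building the real thing, it helps to see the smallest function that satisfies that contract. This is a toy sketch (my/demo-capf is a hypothetical name) that borrows the built-in symbol "thing" for START and END and hard-codes COLLECTION:

```elisp
(require 'thingatpt)

;; Toy completion-at-point function: always offers the same two
;; candidates for whatever symbol sits at point.
(defun my/demo-capf ()
  "Offer a fixed list of candidates for the symbol at point."
  (let ((bounds (bounds-of-thing-at-point 'symbol)))
    ;; No symbol at point means "not applicable here", so return nil.
    (when bounds
      ;; (START END COLLECTION): a plain list is a valid completion table.
      (list (car bounds) (cdr bounds) '("open" "close")))))
```

In a fundamental-mode buffer containing just file.ope with point at the end, (my/demo-capf) should return (6 9 ("open" "close")): the period is punctuation there, so only ope would be replaced.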
Okay, so we need to write something to find the bounds of a string to replace and a function that returns that list.
Completions Abound
I fooled around with some regular expressions for a while until I did the right thing and examined how other completion backends do it.
If you have the package installed, the aforementioned cape-file function gives us a hint: hit M-x, then choose find-function, select cape-file, and poke around. You’ll find the use of a function called bounds-of-thing-at-point.
Describing it with C-h f bounds-of-thing-at-point gives us:
Determine the start and end buffer locations for the THING at point.
THING should be a symbol specifying a type of syntactic entity.
Possibilities include ‘symbol’, ‘list’, ‘sexp’, ‘defun’, ‘number’,
‘filename’, ‘url’, ‘email’, ‘uuid’, ‘word’, ‘sentence’, ‘whitespace’,
‘line’, and ‘page’.
And that is useful for our START and END needs.
You can take it for a test drive at any time with M-: (bounds-of-thing-at-point 'word) to see where emacs thinks the word at your cursor starts and ends.
This is a common theme when developing elisp: try out functions all the time within the editor since they’re near at hand.
The argument to bounds-of-thing-at-point is a symbol naming a "thing". Emacs predefines the ones listed above, and you can define your own with the define-thing-chars macro.
We pass define-thing-chars a name for our "thing" and the characters it's made of, and we can call bounds-of-thing-at-point with it from that point on.
The function documentation in thingatpt.el that emacs refers you to explains more if you’re interested.
define-thing-chars expects a string of characters to put inside a regex character class (the part between the brackets in [...]), so any characters that are valid there will do.
This is a pretty standard character class and we can start with something super simple.
I can’t be bothered to look up whatever the reStructuredText spec says about references, so let’s start with letters, dashes, and underscores ([:alpha:] doesn't include digits; [:alnum:] would).
That expressed as a "thing" looks like this:
(define-thing-chars rst-ref "[:alpha:]_-")
Now we have a thing called rst-ref we can use with bounds-of-thing-at-point.
In typical emacs fashion, we can run elisp ad-hoc in our editor just to tinker, so let’s do that now.
Remember: we’re trying to write a function to give us the start and end of whatever piece of text we intend for a completion to replace.
Let’s try it out: in any sort of buffer, put a piece of fake .rst text with a reference, like this:
This is a :ref:`other-reference`.
Place your point somewhere within "other-reference" and try out your thing:
M-: (bounds-of-thing-at-point 'rst-ref)
You’ll see something like (number . number) in the echo area (the strip at the bottom of the emacs frame that it shares with the minibuffer).
Congratulations!
We’ve got the first part of the problem solved.
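If you'd rather poke at rst-ref from code than by moving your cursor around, here's a small harness. my/demo-ref-text is a hypothetical helper that drops some text into a scratch buffer, moves OFFSET characters in, and returns whatever rst-ref text sits under point:

```elisp
(require 'thingatpt)

;; The rst-ref "thing" from above: runs of letters, underscores, and dashes.
(define-thing-chars rst-ref "[:alpha:]_-")

(defun my/demo-ref-text (text offset)
  "Insert TEXT in a scratch buffer, move OFFSET chars in, return the rst-ref at point."
  (with-temp-buffer
    (insert text)
    (goto-char (+ (point-min) offset))
    (let ((bounds (bounds-of-thing-at-point 'rst-ref)))
      ;; bounds is (START . END) or nil when there's no rst-ref at point.
      (and bounds
           (buffer-substring-no-properties (car bounds) (cdr bounds))))))
```

(my/demo-ref-text "This is a :ref:`other-reference`." 20) should return "other-reference". With point sitting between two empty backticks there's nothing to find and you get nil, which is why the completion function later falls back to the bare point.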
Gathering Completions
Recall the structure of what our "completion backend" needs to return to emacs:
(START END COLLECTION . PROPS)
We can construct START and END with bounds-of-thing-at-point; now we just need COLLECTION, which is a list of potential candidates.
Conceptually the task isn’t hard: we should find all instances of strings of the form:
.. _my-reference:
in our document and capture my-reference.
Where do we start?
Once again you can rely on discovery mechanisms like searching for functions that sound related (by browsing describe-function) or looking at existing code.
Personally, I found this:
(re-search-forward REGEXP &optional BOUND NOERROR COUNT)
Search forward from point for regular expression REGEXP.
The documentation refers you to some other related functions, like this one:
(match-beginning SUBEXP)
Return position of start of text matched by last search.
SUBEXP, a number, specifies which parenthesized expression in the last
regexp.
So we can (re-search-forward) for something and then invoke (match-beginning 1), for example, if we used a regex capture group to grab the reference’s label.
Cool: we can start there.
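Here's that re-search-forward and match-beginning combination in a scratch buffer, wrapped in a hypothetical helper named my/demo-first-label. The regex is a hand-escaped string for now, and group 1 captures the label:

```elisp
(defun my/demo-first-label (text)
  "Return (BEG END LABEL) for the first rst label in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; ^ and $ anchor to line boundaries; \\( \\) is the capture group.
    (when (re-search-forward "^\\.\\. _\\([^:]+\\):$" nil t)
      ;; The search updated the global match data, which we now read.
      (list (match-beginning 1)
            (match-end 1)
            (match-string-no-properties 1)))))
```

(my/demo-first-label "A Heading\n\n.. _my-reference:\n") should return (16 28 "my-reference"): the buffer positions where group 1 starts and ends, plus its text.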
As you get deeper into elisp you’ll find that regular expressions are everywhere, and this case is no different. We need a solid regex to search through a reStructuredText buffer (and honor any quirks in emacs’ regular expression engine), so we’ll use this opportunity to kick the tires on interactively developing regular expressions in emacs.
Regexes
Geriatric millennial software engineers like myself grew up on https://regexr.com/ when it was still a Flash application. Unless you’re a masochist that lives and breathes regular expressions, it’s kind of hard to develop a good regex without live feedback, which sites like https://regexr.com/ provide.
Little did I know that emacs comes with its own live regular expression builder and it's goooood.
Within any emacs buffer, run M-x re-builder to open the regex builder window split alongside the current buffer.
If I then enter the string "re-\\(builder\\)" into that buffer, a) every match gets highlighted in my original buffer and b) the capture group gets highlighted in its own unique group color.
You can do this all day long to fine-tune a regular expression, but there’s yet another trick when writing regular expressions, which is to use the rx macro.
My previous example regular expression "re-\\(builder\\)" works, but the quirks when writing emacs regular expressions pile up quickly: escaping characters is one example, but there are more, too.
Instead, the rx macro will let you define a regular expression in lisp-y form and evaluate it into a typical string-based regular expression, so it works any place emacs expects a string-based regular expression.
For example, if you evaluate this with M-: (eval-expression):
(rx "re-" (group "builder"))
This is what emacs returns:
"re-\\(builder\\)"
Identical!
The rx documentation explains all the constructs available to you.
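You can also check the equivalence, and use the result right away, without leaving elisp. This only relies on the built-in rx macro and string-match:

```elisp
(require 'rx)

;; rx expands into an ordinary regexp string...
(rx "re-" (group "builder"))                          ; => "re-\\(builder\\)"

;; ...so it works anywhere a string regexp is accepted.
(string-match (rx "re-" (group "builder")) "M-x re-builder")  ; => 4
(match-string 1 "M-x re-builder")                     ; => "builder"
```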
Jumping back to re-builder, with the re-builder window active, invoke M-x reb-change-syntax and choose rx.
Now you can interactively build regular expressions with the rx macro!
In the re-builder window, you’ve got to enter a weird syntax to get it to take rx constructs (I’m… not sure why this is), but you end up with the same outcome:
'(: "re-" (group "builder"))
Watch the regex get highlighted live just as it was in the string-based regex mode.
To bring this full circle, hop into a buffer with an example .rst document like this one:
A Heading
=========
.. _my-reference:
Link to me!
Using our newfound re-builder knowledge, let’s build a regex interactively to make short work of it:
- Invoke M-x re-builder
- Change the engine to something easier with M-x reb-change-syntax and choose rx
- Start trying out solutions
I’ll refer here to the rx constructs documentation, which lists all the possibilities that you can plug into the rx macro.
Here’s a recorded example of what developing it looks like from start to finish, ending up with a functional rx construct:
Live-highlighting regex development. Nice. If you add more groups, more colors show up.
In this example the rx constructs I’m using are:
- Any strings end up as literal matches
- Special symbols bol and eol stand for "beginning of line" and "end of line", respectively
- Symbols like + behave like their regex counterparts ("at least one")
- Some symbols like not are nice little shortcuts (in this case, to negate the next form)
Because rx is a macro that expands into a plain regex string when your code is evaluated or compiled, we never need to pre-compute its output and paste that string elsewhere: we can always just use rx when we need a regex.
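The finished rx form from the recording is worth testing against a few lines before trusting it in a buffer. my/demo-rst-label is a hypothetical helper that returns the label a single line defines, or nil:

```elisp
(require 'rx)

(defun my/demo-rst-label (line)
  "Return the label defined on LINE, or nil if LINE isn't a label."
  (when (string-match
         ;; ".." + one or more blanks + "_" + (label) + ":" on its own line
         (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
         line)
    (match-string 1 line)))
```

".. _my-reference:" gives "my-reference", extra blanks as in "..   _code-sample:" still work thanks to (+ blank), and both ".. code:: python" and an inline :ref: role give nil. One caveat: (not ":") also matches newlines, so against a whole buffer a malformed label could let a match spill onto the next line; (not (any ":\n")) would keep it on one line.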
Gathering Completions: Continued
Okay, we've cut our teeth on emacs regular expressions. Let's use 'em. (Not our teeth. Regexes.)
To start, let's save our reStructuredText regular expression to find a ref so we can easily grab it later.
I'll save the one I came up with to the name tmp/re (this name is arbitrary; I drop temporary variables into tmp/<name> out of habit):
(setq tmp/re (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol))
Now we can reference it easily.
I mentioned before that re-search-forward accepts a regex, so let's hop into a reStructuredText buffer and rev up the regex.
Here's my sample text that I'll work with:
A Title
=======
Beware the Jabberwock, my son.
.. _my-reference:
You are like a little baby. Watch this.
.. _code-sample:
.. code:: python

   print("emacs needs telemetry")
The end?
The re-search-forward documentation indicates that it starts at the point's current position, so head to the start of the buffer, hit M-: to enter the elisp Eval prompt, and try:
(re-search-forward tmp/re)
This is anticlimactic because you'll just see the point move to the end of one of the references. BUT. This means that the search succeeded. So… what now?
More reading in the re-search-forward documentation will educate you about emacs' global match data.
In non-functional-programming style, functions like match-beginning and match-end interrogate global state that functions like re-search-forward modify.
In concise terms, our regular expression defines one match group, and we can grab it with (match-string-no-properties 1) to get the first group match (match-string will return a string with "properties", which is a bunch of data like font styling that we don't want).
Within our example buffer, executing this after the regex search should return our match:
(match-string-no-properties 1)
I see "my-reference" from this command.
Now we're cooking like it's 1985, baby.
You can enter the minibuffer again with M-:, press ↑ to recall the re-search-forward command, and run it again to watch the point move to the next match, after which you can see the matched string with match-string-no-properties.
Note that running this a few times will eventually error out after no matches exist past your point. We'll address this.
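The difference between erroring out and quietly giving up comes down to re-search-forward's third argument, NOERROR. This hypothetical helper (my/demo-search-outcome) runs a single search with NOERROR set however you like and reports what happened:

```elisp
(require 'rx)

(defun my/demo-search-outcome (text noerror)
  "Search TEXT once for an rst label, passing NOERROR to `re-search-forward'.
Return the match end position, nil, or the symbol `search-failed'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (condition-case nil
        (re-search-forward
         (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
         nil noerror)
      ;; With NOERROR nil, a missing match signals this error instead.
      (search-failed 'search-failed))))
```

With no label in the text, passing nil gives search-failed (the error emacs signals), passing t gives a plain nil, and a successful search returns the position where the match ends.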
If you're a human (or Claude) at this point, you can see the path ahead – we need to write some elisp that will:
- Move the point to the beginning of the buffer (important: remember that re-search-forward relies upon the current position of your point)
- Iteratively execute a re-search-forward command to aggregate reference targets
- Conclude when there aren't any more matches
I'll start with the code and then explain which demons the parentheses are summoning afterward:
;; save-excursion saves the current position of the cursor and then
;; returns it to this position once the code that it wraps has finished
;; executing, which lets us hop around the buffer without driving the
;; programmer insane. Important for any functions that move the point
;; around.
(save-excursion
  ;; progn is a simple special form that just executes each lisp form
  ;; step-by-step.
  (progn
    ;; Step one: go to the beginning of the buffer.
    (goto-char (point-min))
    ;; Step two: loop
    ;;
    ;; cl-loop is a macro with a long and venerable heritage stemming
    ;; from Common Lisp, whose `loop` macro it mimics. You could spend
    ;; hours honing your ability to wield the Common Lisp `loop` macro,
    ;; but we'll just explain the parts we're using:
    ;;
    ;; `while` runs the loop until its argument evaluates to a falsy
    ;; value. We can overload our use of `re-search-forward` here: we
    ;; can use it to step our loop forward each time and also rely
    ;; upon it returning `nil` once it stops matching substrings in
    ;; the buffer and we should finish up.
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                    ;; The aforementioned `while` termination case
                    ;; relies upon this `t` parameter, which says
                    ;; "don't error out with no matches, just return
                    ;; nil". Once no more matches are found, the loop
                    ;; exits.
                    nil t)
             ;; The `collect` keyword instructs `cl-loop` how to form
             ;; its return value. We can helpfully summarize the regex
             ;; match item by pulling out the global match data.
             collect (match-string-no-properties 1))))
The code is less intimidating without comments:
(save-excursion
  (progn
    (goto-char (point-min))
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                    nil t)
             collect (match-string-no-properties 1))))
Without belaboring the point, you can – like I did – discover most of these functions by skimming existing elisp code and using it as a launch pad.
Many of these functions are bog standard and show up all over the place in emacs packages (save-excursion, progn, goto-char…).
Here's the result when I run this code against our example .rst file:
("my-reference" "code-sample")
Looks good!
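Further down, a snippet calls this loop as my/rst-internal-references. Here's a sketch of wrapping the loop under that name; treat the packaging as an assumption rather than the exact function behind the screenshots. The progn is dropped because save-excursion already accepts several forms:

```elisp
(require 'cl-lib)
(require 'rx)

(defun my/rst-internal-references ()
  "Return every rst label defined in the current buffer, in order."
  ;; Restore point afterwards, since the search moves it around.
  (save-excursion
    (goto-char (point-min))
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                    nil t)
             collect (match-string-no-properties 1))))
```

In a buffer holding the sample document above, (my/rst-internal-references) should return ("my-reference" "code-sample") and leave point where it was.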
Completing the Completion Backend
We're now armed with the ability to:
- Identify the bounds of the string we want to replace, and
- Collect a list of targets for completion candidates
We are so close. Recall the description of the variable we need to modify:
completion-at-point-functions is a variable defined in ‘minibuffer.el’.
Its value is (cape-dict cape-file tags-completion-at-point-function)
Special hook to find the completion table for the entity at point.
Each function on this hook is called in turn without any argument and
should return either nil, meaning it is not applicable at point,
or a function of no arguments to perform completion (discouraged),
or a list of the form (START END COLLECTION . PROPS)
To return the list that completion-at-point-functions expects, we already have the ability to identify the bounds of a thing and sweep up a list of candidates in our buffer.
Note the comment about returning nil: we probably don't always want to run our backend, so we should short-circuit our function to eagerly return nil to avoid tying up emacs with a regex loop we don't need.
With all that said, consider the following:
;; Our reStructuredText reference "thing"
(define-thing-chars rst-ref "[:alpha:]_-")

(defun my/rst-internal-reference-capf ()
  "Completion backend for buffer reStructuredText references."
  ;; Only applies when we're within a reference; outside of a
  ;; reference, we bail out with nil.
  (when (looking-back (rx ":ref:`" (* (not "`"))) (line-beginning-position))
    ;; Get potential bounds for the string to replace
    (let* ((bounds (or (bounds-of-thing-at-point 'rst-ref)
                       ;; Fall back to the current position
                       (cons (point) (point))))
           (start (car bounds))
           (end (cdr bounds))
           ;; Collect all reference candidates
           (candidates
            ;; Our previously-noted reference collector
            (save-excursion
              (progn
                (goto-char (point-min))
                (cl-loop while (re-search-forward
                                (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                                nil t)
                         collect (match-string-no-properties 1))))))
      ;; Return value suitable for `completion-at-point-functions`
      (list start end candidates))))
- We're following some naming conventions by calling this a "capf" (a "completion-at-point function") and prefixing with my/ (a habit to namespace your own functions)
- Our short-circuit takes the form of using looking-back to ask, "are we inside of a reStructuredText reference?" Note the use of rx here again to clean up our lisp.
- We use our rst-ref thing to easily snag the start and end of the string to replace; note our fallback to just the immediate point if we can't find the bounds of our thing.
We wrap it all up with list.
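The looking-back guard is easy to test on its own. my/demo-in-rst-ref-p is a hypothetical name for just that check, with line-beginning-position as the search limit so the backward search never leaves the current line:

```elisp
(require 'rx)

(defun my/demo-in-rst-ref-p ()
  "Non-nil if point sits inside an unfinished :ref:`...` role on this line."
  ;; Match ":ref:`" followed by anything except a backtick, ending at point.
  (looking-back (rx ":ref:`" (* (not "`"))) (line-beginning-position)))
```

With point after "See :ref:`my-re" it returns t; after "See :ref:`done` and more" it returns nil, because a closing backtick sits between the role and point.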
Personally, even as somebody relatively new to writing Lisps, I find the code pleasant to read and self-evident.
We did a lot in 17 lines of code!
Inside of our test .rst buffer, we can test drive this function.
First, invoke M-x eval-defun with your cursor somewhere in the function to evaluate it (do the same on the define-thing-chars form above it), which makes my/rst-internal-reference-capf available.
Then run:
(add-hook 'completion-at-point-functions 'my/rst-internal-reference-capf)
Huzzah!
Our function is now live in emacs' completion framework.
You can trigger the completion by calling completion-at-point at a relevant spot in a buffer.
Many batteries-included emacs distributions like spacemacs or doom emacs slap nice-looking porcelain on top of the completion framework; here's an example that uses the corfu package:
Congratulations, you've extended emacs for the first time!
Dressing Up the Bones
Okay, this is a pretty basic setup. You could improve it in many ways, but here are a few ideas about potential directions:
Mode Hooks
Manually adding your custom completion function to the completion-at-point-functions hook is tedious, but there's a way to automate it.
Recall that in emacs parlance, a "hook" is usually a variable that holds a list of functions that get called at a specific time.
If you use rst-mode, then opening an .rst file will drop you into rst-mode and implicitly run the rst-mode-hook functions.
That means that this line is sufficient to integrate our completion function:
(add-hook 'rst-mode-hook
          (lambda ()
            (add-hook 'completion-at-point-functions #'my/rst-internal-reference-capf 0 t)))
This says: "when I open an .rst file, run this lambda that modifies completion-at-point-functions only for this buffer by adding my internal reference completion function".
The nesting of one add-hook call inside another makes this a little less obvious than it could be.
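If the nested lambda bothers you, a named function does the same job. my/rst-setup-completion is a hypothetical name; the inner add-hook passes nil for DEPTH and t for LOCAL, so only the current buffer's completion-at-point-functions changes:

```elisp
(defun my/rst-setup-completion ()
  "Add the rst reference capf to this buffer's completion functions."
  ;; LOCAL = t: modify the buffer-local value, not the global one.
  (add-hook 'completion-at-point-functions
            #'my/rst-internal-reference-capf nil t))

;; Run the setup function whenever a buffer enters rst-mode.
(add-hook 'rst-mode-hook #'my/rst-setup-completion)
```

A named function also shows up by name when you C-h v rst-mode-hook, and you can take it back out with remove-hook, neither of which is pleasant with an anonymous lambda.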
Other Files
Okay, our example works for references in the same buffer but this is sort of pointless for uses across files.
You can solve this too, although my post is already too long so we won't solve this step-by-step. However, here's how I solved it:
- Turn my capf into a minor mode that manages the completion variables
- Instead of searching the buffer every time, search once and then rebuild the list with a hook in after-change-functions, saving it to a hash cache
- Walk all .rst files in the current project and run the reference collection function for each, storing the results into a hash cache for all files that don't have live buffers
- When it comes time to call the completion function, combine the hash of completions for files without buffers with each .rst buffer's cached list of references
It sounds complicated, but it works!
Functions like with-temp-buffer make this pretty easy by aggregating reference targets for files using the exact same function we use for live buffers:
(with-temp-buffer
  (insert-file-contents file)
  (my/rst-internal-references))
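Here's a sketch of just the "walk all .rst files" step, with no minor mode, no after-change-functions handling, and no merging with live buffers. my/rst-project-references is a hypothetical name, and my/rst-internal-references is assumed to be the same buffer-scanning loop from earlier (redefined here so the block stands alone). The result maps each file name to its labels:

```elisp
(require 'cl-lib)
(require 'rx)

(defun my/rst-internal-references ()
  "Return every rst label defined in the current buffer, in order."
  (save-excursion
    (goto-char (point-min))
    (cl-loop while (re-search-forward
                    (rx bol ".." (+ blank) "_" (group (+ (not ":"))) ":" eol)
                    nil t)
             collect (match-string-no-properties 1))))

(defun my/rst-project-references (root)
  "Return a hash table mapping each .rst file under ROOT to its labels."
  (let ((cache (make-hash-table :test #'equal)))
    ;; Recursively find files whose names end in ".rst".
    (dolist (file (directory-files-recursively root (rx ".rst" eos)))
      (puthash file
               ;; Scan a throwaway copy of the file, not a live buffer.
               (with-temp-buffer
                 (insert-file-contents file)
                 (my/rst-internal-references))
               cache))
    cache))
```

The hash table keeps lookups cheap when the completion function later merges these cached lists with whatever the open buffers contain.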
Fancy Completion
Emacs' long history includes company-mode, a third-party completion framework that integrates with the completion-at-point set of functions.
Some company-mode features include additional metadata about completion candidates, and I found two that were useful: company-kind and company-doc-buffer.
company-kind is a simple key that just tells the completion caller what the completion candidate is. In our case we can add some eye candy by indicating it's 'text.
company-doc-buffer lets us add additional context to a completion candidate. I leveraged this to include a couple of lines following the reference line to help me figure out what exactly the link refers to. It's easier to show what this looks like rather than tell:
Notes:
- I'm using GUI emacs here for the nicer completion popup with corfu which displays a transparent, floating frame
- My completion candidate "context" is a real excerpt from the text around the reference, complete with styling, etc.
- The small icon to the left of each candidate comes from the company-kind attribute.
- The ~ syntax is part of orderless
Completion candidate context is an extra frill but very helpful.
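In completion-at-point terms, that extra metadata rides along in the PROPS part of (START END COLLECTION . PROPS). This is a toy capf with hard-coded candidates (my/demo-rst-capf-with-props is a hypothetical name) showing where :company-kind goes, next to the standard :exclusive and :annotation-function properties. :company-doc-buffer follows the same pattern, except its function takes a candidate and returns a buffer:

```elisp
(require 'thingatpt)

(define-thing-chars rst-ref "[:alpha:]_-")

(defun my/demo-rst-capf-with-props ()
  "Toy capf: static rst candidates plus extra PROPS."
  (let ((bounds (or (bounds-of-thing-at-point 'rst-ref)
                    (cons (point) (point)))))
    (list (car bounds) (cdr bounds)
          '("my-reference" "code-sample")
          ;; Let later capfs on the hook try if nothing here matches.
          :exclusive 'no
          ;; Read by company (and icon add-ons for corfu) to pick an icon.
          :company-kind (lambda (_candidate) 'text)
          ;; Shown next to each candidate by UIs that support annotations.
          :annotation-function (lambda (_candidate) " rst ref"))))
```

UIs that don't understand a property simply ignore it, so the extra keys are cheap to add.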
Summary
My experience extending a core emacs function was an instructive and interesting exercise. I don't know what the future of emacs looks like in an increasingly LLM-crazed world, but I hope that future includes an open and powerful way to extend and customize the tools we use to write software.