Tree-sitter in Emacs 29 and Beyond
Emacs’ release branch is now on complete feature freeze, meaning absolutely only bug fixes can happen on it. Now is a good time to talk about the state of tree-sitter in Emacs: what do you get in Emacs 29, what you don’t, and what would happen going forward.
What’s in Emacs 29
From a pure user’s perspective, Emacs 29 just adds some new built-in major modes which look more-or-less identical to the old ones. There aren’t any flashy cool features either. That sounds disappointing, but there are a lot of new stuff under the hood, a solid base upon which exciting things can emerge.
If Emacs 29 is built with the tree-sitter library, you have access to
most of the functions in its C API, including creating parsers, parsing
text, retrieving nodes from the parse tree, finding the
parent/child/sibling node, pattern matching nodes with a DSL, etc. You
also get a bunch of convenient functions built upon the primitive
functions, like searching for a particular node in the parse tree, cherry
picking nodes and building a sparse tree out of the parse tree, getting
the node at point, etc. You can type
M-x shortdoc RET treesit
RET to view a list of tree-sitter functions. And because it’s
Emacs, there is comprehensive manual coverage for everything you need to
know. It’s in “Section 37, Parsing Program Source” of Emacs Lisp
Emacs 29 has built-in tree-sitter major modes for C, C++, C#, Java,
Dockerfile, CMake file. We tried to extend existing modes with
tree-sitter at first but it didn’t work out too well, so now tree-sitter
lives in separate major modes. The tree-sitter modes are usually called
python-ts-mode. The simplest way to enable them is to use
major-mode-remap-alist. For example,
(add-to-list 'major-mode-remap-alist '(c-mode . c-ts-mode))
The built-in tree-sitter major modes have support for font-lock (syntax highlight), indentation, Imenu, which-func, and defun navigation.
For major mode developers, Emacs 29 includes integration for these features for tree-sitter, so major modes only need to supply language-specific information, and Emacs takes care of plugging tree-sitter into font-lock, indent, Imenu, etc.
In tree-sitter major modes, fontification is categorized into “features”, like “builtin”, “function”, “variable”, “operator”, etc. You can choose what “features” to enable for a mode. If you are feeling adventurous, it is also possible to add your own fontification rules.
To add/remove features for a major mode, use
treesit-font-lock-recompute-features in its mode hook. For
(defun c-ts-mode-setup () (treesit-font-lock-recompute-features '(function variable) '(definition))) (add-hook 'c-ts-mode-hook #'c-ts-mode-setup)
Features are grouped into decoration levels, right now there are 4
levels and the default level is 3. If you want to program in skittles,
treesit-font-lock-level to 4 ;-)
Tree-sitter major modes need corresponding langauge grammar to work.
These grammars come in the form of dynamic libraries. Ideally the package
manager will build them when building Emacs, like with any other dynamic
libraries. But they can’t cover every language grammar out there, so you
probably need to build them yourself from time to time. Emacs has a
command for it:
treesit-install-language-grammar. It asks
you for the Git repository and other stuff and builds the dynamic
library. Third-party major modes can instruct their users to add the
recipe for building a language grammar like this:
(add-to-list 'treesit-language-source-alist '(python "https://github.com/tree-sitter/tree-sitter-python.git"))
M-x treesit-install-language-grammar RET
python builds the language grammar without user-input.
Things like indentation, Imenu, navigation, etc, should just work.
There is no code-folding, selection expansion, and structural navigation (except for defun) in Emacs 29. Folding and expansion should be trivial to implement in existing third-party packages. Structural navigation needs careful design and nontrivial changes to existing commands (ie, more work). So not in 29, unfortunately.
The tree-sitter integration is far from complete. As mentioned
earlier, structural navigation is still in the works. Right now Emacs
allows you to define a “thing” by a regexp that matches node types, plus
optionally a filter function that filters out nodes that matches the
regexp but isn’t really the “thing”. Given the definition of a “thing”,
Emacs has functions for finding the “things” around point
treesit--things-around), finding the “thing” at point
treesit--thing-at-point), and navigating around “things”
treesit--navigate-thing). Besides moving around, these
functions should be also useful for other things like folding blocks.
Beware that, as the double dash suggests, these functions are
experimental and could change.
I also have an idea for “abstract list elements”. Basically an abstract list element is anything repeatable in a grammar: defun, statement, arguments in argument list, etc. These things appear at every level of the grammar and seems like a very good unit for navigation.
There is also potential for language-agnostic “context extraction” (for the lack of a better term) with tree-sitter. Right now we can get the name and span of the defun at point, but it doesn’t have to stop there, we can also get the parameter list, the type of the return value, the class/trait of the function, etc. Because it’s language agnostic, any tool using this feature will work on many languages all at once.
In fact, you can already extract useful things, to some degree, with
the fontification queries written by major modes: using the query
intended for the
variable query, I can get all the variable
nodes in a given range.
There are some unanswered questions though: (1) What would be the best
function interface and data structure for such a feature? Should it use a
(:name ... :params ...), or a cl-struct? (2) If a
language is different enough from the “common pattern”, how useful does
this feature remains? For example, there isn’t a clear parameter list in
Haskell, and there could be several defun bodies that defines the same
function. (3) Is this feature genuinely useful, or is it just something
that looks cool? Only time and experiments can tell, I’m looking forward
to see what people will do with tree-sitter in the wild :-)
Major mode fallback
Right now there is no automatic falling back from tree-sitter major modes to “native” major modes when the tree-sitter library or language grammar is missing. Doing it right requires some change to the auto-mode facility. Hopefully we’ll see a good solution for it in Emacs 30. Right now, if you need automatic fallback, try something like this:
(define-derived-mode python-auto-mode prog-mode "Python Auto" "Automatically decide which Python mode to use." (if (treesit-ready-p 'python t) (python-ts-mode) (python-mode)))
Existing tree-sitter major modes are pretty basic and doesn’t have many bells and whistles, and I’m sure there are rough corners here and there. Of course, these things will improve over time.
Tree-sitter is very different and very new, and touches many parts of Emacs, so no one has experience with it and no one knows exactly how should it look like. Emacs 29 will give us valuable experience and feedback, and we can make it better and better in the future.
If you are interested, get involved! Read Contributing to Emacs for some tips in getting involved with the Emacs development. Read Tree-sitter Starter Guide if you want to write a major mode using tree-sitter. And of course, docstrings and the manual is always your friend. If you have questions, you can ask on Reddit, or comment in this post’s public inbox (see the footer).