

24.7.2 Parser-based Indentation

When built with the tree-sitter library (see Parsing Program Source), Emacs is capable of parsing the program source and producing a syntax tree. This syntax tree can be used for guiding the program source indentation commands. For maximum flexibility, it is possible to write a custom indentation function for each language that queries the syntax tree and indents accordingly, but that is a lot of work. It is more convenient to use the simple indentation engine described below: the major mode then only needs to supply some indentation rules, and the engine takes care of the rest.

To enable the parser-based indentation engine, either set treesit-simple-indent-rules and call treesit-major-mode-setup, or equivalently, set the value of indent-line-function to treesit-indent.
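As a minimal sketch of the first approach, the following defines a major mode that installs its rules and then calls treesit-major-mode-setup. The mode name my-lang-ts-mode, the language symbol my-lang, and the node types "source_file", "block" and "}" are hypothetical and chosen only for illustration; a real mode would use the names its grammar defines.

```elisp
(require 'treesit)

;; Hypothetical rules: the language symbol `my-lang' and the node types
;; "source_file", "}" and "block" are assumptions for illustration only.
(defvar my-lang-ts-mode--indent-rules
  '((my-lang
     ((parent-is "source_file") point-min 0)
     ((node-is "}") parent-bol 0)
     ((parent-is "block") parent-bol 2)))
  "Indentation rules for the hypothetical `my-lang-ts-mode'.")

(define-derived-mode my-lang-ts-mode prog-mode "MyLang"
  "Hypothetical tree-sitter major mode, shown only to illustrate setup."
  (when (treesit-ready-p 'my-lang)
    ;; Create the parser whose syntax tree the rules will consult.
    (treesit-parser-create 'my-lang)
    (setq-local treesit-simple-indent-rules my-lang-ts-mode--indent-rules)
    ;; With the rules in place, this call enables parser-based
    ;; indentation for the buffer.
    (treesit-major-mode-setup)))
```

The rules are kept in a separate variable so that they can be edited and re-evaluated while the mode is being developed.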

Variable: treesit-indent-function

This variable stores the actual function called by treesit-indent. By default, its value is treesit-simple-indent. In the future we might add other, more complex indentation engines.

Writing indentation rules

Variable: treesit-simple-indent-rules

This local variable stores indentation rules for every language. It is a list whose elements have the form (language . rules), where language is a language symbol, and rules is a list whose elements have the form (matcher anchor offset).

First, Emacs passes the tree-sitter node that starts at the beginning of the current line to matcher; if it returns non-nil, this rule is applicable. Then Emacs passes the node to anchor, which returns a buffer position. Emacs takes the column number of that position, adds offset to it, and the result is the indentation column for the current line.

The matcher and anchor are functions, and Emacs provides convenient defaults for them.

Each matcher or anchor is a function that takes three arguments: node, parent, and bol. The argument bol is the buffer position whose indentation is required: the position of the first non-whitespace character after the beginning of the line. The argument node is the largest node that starts at that position (and is not a root node); and parent is the parent of node. However, when that position is in whitespace or inside a multi-line string, no node can start at that position, so node is nil. In that case, parent is the smallest node that spans that position.

matcher should return non-nil if the rule is applicable, and anchor should return a buffer position.

offset can be an integer, a variable whose value is an integer, or a function that returns an integer. If it is a function, it is passed node, parent, and bol, like matchers and anchors.
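The three kinds of offset, together with a hand-written matcher and anchor, might look as follows. This is only a sketch: every my-lang name and the node types "block" and "}" are hypothetical, and the helper functions accept extra arguments via &rest merely as a precaution.

```elisp
(require 'treesit)

(defvar my-lang-indent-offset 4
  "Hypothetical option: number of columns per indentation level.")

(defun my-lang--on-blank-line-p (node _parent _bol &rest _)
  "Matcher: return non-nil when no node starts at the line beginning."
  (null node))

(defun my-lang--parent-start (_node parent _bol &rest _)
  "Anchor: return the buffer position where PARENT starts."
  (treesit-node-start parent))

(defun my-lang--half-step (_node _parent _bol &rest _)
  "Offset: return half of `my-lang-indent-offset', as an integer."
  (/ my-lang-indent-offset 2))

(defvar my-lang--indent-rules
  '((my-lang
     ;; Offset given as a literal integer.
     ((node-is "}") parent-bol 0)
     ;; Offset given as a variable whose value is an integer.
     ((parent-is "block") parent-bol my-lang-indent-offset)
     ;; Custom matcher, custom anchor, and an offset computed by a function.
     (my-lang--on-blank-line-p my-lang--parent-start my-lang--half-step)))
  "Hypothetical rules for the language symbol `my-lang'.")
```

Naming the offset variable in the rule, rather than writing its value, lets users change the indentation step without touching the rules.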

Variable: treesit-simple-indent-presets

This is a list of defaults for matchers and anchors in treesit-simple-indent-rules. Each of them represents a function that takes 3 arguments: node, parent and bol. The available default functions are:

no-node

This matcher is a function that is called with 3 arguments: node, parent, and bol, and returns non-nil, indicating a match, if node is nil, i.e., there is no node that starts at bol. This is the case when bol is on an empty line or inside a multi-line string, etc.

parent-is

This matcher is a function of one argument, type; it returns a function that is called with 3 arguments: node, parent, and bol, and returns non-nil (i.e., a match) if parent’s type matches regexp type.

node-is

This matcher is a function of one argument, type; it returns a function that is called with 3 arguments: node, parent, and bol, and returns non-nil if node’s type matches regexp type.

query

This matcher is a function of one argument, query; it returns a function that is called with 3 arguments: node, parent, and bol, and returns non-nil if querying parent with query captures node (see Pattern Matching Tree-sitter Nodes).

match

This matcher is a function of 5 arguments: node-type, parent-type, node-field, node-index-min, and node-index-max. It returns a function that is called with 3 arguments: node, parent, and bol, and returns non-nil if node’s type matches regexp node-type, parent’s type matches regexp parent-type, node’s field name in parent matches regexp node-field, and node’s index among its siblings is between node-index-min and node-index-max. If the value of an argument is nil, this matcher doesn’t check that argument. For example, to match the first child where parent is argument_list, use

(match nil "argument_list" nil 0 0)

n-p-gp

Short for “node-parent-grandparent”, this matcher is a function of 3 arguments: node-type, parent-type, and grandparent-type. It returns a function that is called with 3 arguments: node, parent, and bol, and returns non-nil if: (1) node-type matches node’s type, and (2) parent-type matches parent’s type, and (3) grandparent-type matches parent’s parent’s type. If any of node-type, parent-type, and grandparent-type is nil, this function doesn’t check for it.

comment-end

This matcher is a function that is called with 3 arguments: node, parent, and bol, and returns non-nil if point is before a comment-ending token. Comment-ending tokens are defined by the regular expression comment-end-skip.

first-sibling

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of the first child of parent.

parent

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of parent.

parent-bol

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the position of the first non-space character on the line where parent starts.

prev-sibling

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of the previous sibling of node.

no-indent

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the start of node.

prev-line

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the position of the first non-whitespace character on the previous line.

point-min

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the beginning of the buffer. This is useful as the beginning of the buffer is always at column 0.

comment-start

This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the position after the comment-start token. Comment-start tokens are defined by regular expression comment-start-skip. This function assumes parent is the comment node.

prev-adaptive-prefix

This anchor is a function that is called with 3 arguments: node, parent, and bol. It tries to go to the beginning of the previous non-empty line and match adaptive-fill-regexp there. If there is a match, this function returns the end of the match; otherwise it returns nil. This anchor is useful for an indent-relative-like indentation behavior for block comments.
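The presets above can be combined into a rule set along the following lines. This is a sketch, not the rules of any existing mode: the node types ("translation_unit", "compound_statement", "function_definition", "argument_list", "comment") are assumed to come from a C-like grammar, the offsets are arbitrary, and the ordering assumes that the engine uses the first rule whose matcher succeeds.

```elisp
(require 'treesit)

;; The node types below are assumptions about a C-like grammar;
;; substitute the names your grammar actually produces.
(defvar my-c-like--indent-rules
  '((c
     ;; Top-level constructs go to column 0.
     ((parent-is "translation_unit") point-min 0)
     ;; A closing brace aligns with the line on which its parent starts.
     ((node-is "}") parent-bol 0)
     ;; Continuation lines of a block comment follow the prefix of
     ;; the previous non-empty line.
     ((parent-is "comment") prev-adaptive-prefix 0)
     ;; The child at index 1 of an argument list (index 0 is assumed
     ;; to be the opening parenthesis) is indented from the parent's line...
     ((match nil "argument_list" nil 1 1) parent-bol 4)
     ;; ...and the remaining children align with the sibling before them.
     ((parent-is "argument_list") prev-sibling 0)
     ;; Statements directly inside a function body.
     ((n-p-gp nil "compound_statement" "function_definition") parent-bol 2)
     ;; Lines where no node starts are indented like the parent's line.
     (no-node parent-bol 0)))
  "Hypothetical rules built only from the presets described above.")
```

More specific rules are placed before more general ones, so that, for example, the match rule for the first argument is tried before the broader parent-is rule for the same parent.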

Indentation utilities

Here are some utility functions that can help writing parser-based indentation rules.

Function: treesit-check-indent mode

This function checks the current buffer’s indentation against major mode mode. It indents the current buffer according to mode and compares the results with the current indentation. Then it pops up a buffer showing the differences. The correct (target) indentation is shown in green, and the current indentation is shown in red.
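For instance, a small helper command could visit a file and run the check against a chosen mode. The command name my-check-indentation is hypothetical, and the sketch assumes that mode is a major mode command that can be called with no arguments.

```elisp
(require 'treesit)

(defun my-check-indentation (file mode)
  "Visit FILE and compare its indentation with what MODE would produce.
MODE is a major mode command, for example the mode under development."
  (interactive "fSource file: \nCTarget major mode: ")
  (with-current-buffer (find-file-noselect file)
    ;; The differences are shown in a separate buffer.
    (treesit-check-indent mode)))
```

Running this on a file whose indentation is known to be good gives a quick regression check after each change to the rules.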

It is also helpful to use treesit-inspect-mode (see Tree-sitter Language Grammar) when writing indentation rules.

