Using Parser (GNU Emacs Lisp Reference Manual)

37.2 Using Tree-sitter Parser

This section describes how to create and configure a tree-sitter parser. In Emacs, each tree-sitter parser is associated with a buffer. As the buffer is edited, the associated parser is automatically kept up to date.

Before creating a parser, it is a good idea to check whether tree-sitter should be used at all. Sometimes a user doesn’t want to use tree-sitter features for a major mode. To turn off tree-sitter for a mode, they add that mode to tree-sitter-disabled-modes. If they want to turn off tree-sitter for buffers larger than a particular size (because tree-sitter consumes roughly 10 times the buffer size in memory to store the syntax tree), they set tree-sitter-maximum-size.
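As a minimal sketch of the user configuration described above, an init file might contain something like the following. It assumes that tree-sitter-disabled-modes holds a list of major-mode symbols and that tree-sitter-maximum-size is measured in bytes; the mode and the size limit are only illustrations.

```elisp
;; Turn off tree-sitter features in `c-mode' (illustrative choice of mode).
;; Assumes `tree-sitter-disabled-modes' is a list of major-mode symbols.
(add-to-list 'tree-sitter-disabled-modes 'c-mode)

;; Don't use tree-sitter in buffers larger than about 5 MB.
;; Assumes `tree-sitter-maximum-size' is measured in bytes.
(setq tree-sitter-maximum-size (* 5 1024 1024))
```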

Function: tree-sitter-should-enable-p &optional mode

This function returns non-nil if mode (defaulting to the current major mode) should activate tree-sitter features. The result depends on the values of tree-sitter-disabled-modes and tree-sitter-maximum-size described above, and, of course, on the result of tree-sitter-available-p.

Writers of major modes and other packages are responsible for calling this function to determine whether to activate tree-sitter features.
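For example, a major mode might consult tree-sitter-should-enable-p in its body and set up a parser only when it returns non-nil. The following is a minimal sketch: my-c-mode is a hypothetical mode, and the language symbol tree-sitter-c is taken from this section’s own C example.

```elisp
;; Hypothetical major mode that uses tree-sitter only when allowed.
(define-derived-mode my-c-mode prog-mode "MyC"
  "Hypothetical major mode for C that optionally uses tree-sitter."
  ;; `major-mode' is already `my-c-mode' here, so the default MODE
  ;; argument refers to this mode.  The check honors
  ;; `tree-sitter-disabled-modes', `tree-sitter-maximum-size', and
  ;; whether tree-sitter is available at all.
  (when (tree-sitter-should-enable-p)
    ;; Get (or create) a C parser for the current buffer.
    (tree-sitter-get-parser-create 'tree-sitter-c)))
```

Keeping the check inside the mode body means the user’s settings are respected every time the mode is turned on.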

To create a parser, we provide a buffer to parse and the language to use (see Tree-sitter Language Definitions). Emacs provides several creation functions for different use cases.

Function: tree-sitter-get-parser-create language

This function is the most convenient one. It gives you a parser that recognizes language for the current buffer. The function checks whether a suitable parser already exists, and only creates a new one when it can’t find one.

;; Create a parser for C programming language.
(tree-sitter-get-parser-create 'tree-sitter-c)

Function: tree-sitter-get-parser language

This function is like tree-sitter-get-parser-create, but it always creates a new parser.

Function: tree-sitter-parser-create buffer language

This function is the most primitive, requiring both the buffer to associate with and the language to use. If buffer is nil, the current buffer is used.
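The sketch below illustrates the reuse behavior of the convenient function. It assumes, from the description of tree-sitter-get-parser-create, that a second call with the same language in the same buffer returns the parser found by the first call; the helper name my-ts-parser-reused-p is hypothetical.

```elisp
(defun my-ts-parser-reused-p (buffer)
  "Hypothetical helper: return non-nil if a parser is reused in BUFFER.
Calls `tree-sitter-get-parser-create' twice with the same language
and checks whether both calls returned the same parser object."
  (with-current-buffer buffer
    ;; `let' evaluates P1's form first, so P2's call can find P1.
    (let ((p1 (tree-sitter-get-parser-create 'tree-sitter-c))
          (p2 (tree-sitter-get-parser-create 'tree-sitter-c)))
      (eq p1 p2))))

;; The primitive form takes the buffer explicitly (nil means the
;; current buffer) along with the language:
;;   (tree-sitter-parser-create (current-buffer) 'tree-sitter-c)
```

Under the behavior described above, (my-ts-parser-reused-p (current-buffer)) is expected to return non-nil.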

Given a parser, we can query information about it:

Function: tree-sitter-parser-buffer parser

This function returns the buffer associated with parser.

Function: tree-sitter-parser-language parser

This function returns the language that parser uses.

Function: tree-sitter-parser-p object

This function returns non-nil if object is a tree-sitter parser, and nil otherwise.
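These three functions combine naturally, for instance to describe a parser in a message. A minimal sketch, with my-ts-describe-parser as a hypothetical helper:

```elisp
(defun my-ts-describe-parser (object)
  "Hypothetical helper: return a short description string for OBJECT.
If OBJECT is a tree-sitter parser, mention its language and buffer."
  (if (tree-sitter-parser-p object)
      (format "%s parser for buffer %s"
              (tree-sitter-parser-language object)
              (buffer-name (tree-sitter-parser-buffer object)))
    "not a tree-sitter parser"))
```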

There is no need to parse a buffer explicitly, because parsing is done automatically and lazily. A parser only parses when we query for a node in its syntax tree. Therefore, when a parser is first created, it doesn’t parse the buffer; instead, it waits until we query for a node for the first time. Similarly, when the buffer changes, a parser doesn’t re-parse immediately; it only records the information it needs to re-parse later, when necessary.

When a parser does parse, it checks the size of the buffer. Tree-sitter can only handle buffers no larger than about 4 GB. If the size exceeds that, Emacs signals tree-sitter-buffer-too-large with the buffer size as the signal data.
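Code that may trigger parsing in a very large buffer can catch this error with condition-case. In the sketch below, my-ts-call-safely is a hypothetical wrapper around any function FN that queries a parser; the code only prints the error data as a whole, since the exact shape of that data beyond “the buffer size” is not specified here.

```elisp
(defun my-ts-call-safely (fn)
  "Hypothetical wrapper: call FN, which may query a tree-sitter parser.
If the buffer is too large for tree-sitter, report it and return nil;
otherwise return FN's value."
  (condition-case err
      (funcall fn)
    (tree-sitter-buffer-too-large
     ;; ERR is (tree-sitter-buffer-too-large . DATA), where DATA
     ;; carries the buffer size.
     (message "Buffer too large for tree-sitter: %S" (cdr err))
     nil)))
```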

Once a parser is created, Emacs automatically adds it to the buffer-local variable tree-sitter-parser-list. Every time the buffer changes, Emacs updates the parsers in this list so they can update their syntax trees incrementally. Therefore, one must not remove a parser from this list and later put it back: if any change is made while that parser is absent, the parser will be permanently out of sync with the buffer content and shouldn’t be used anymore.
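Reading tree-sitter-parser-list is harmless; only removing entries is dangerous. A minimal sketch, with my-ts-parser-languages as a hypothetical helper that lists the languages of a buffer’s parsers without modifying the list:

```elisp
(defun my-ts-parser-languages (&optional buffer)
  "Hypothetical helper: list the languages of BUFFER's parsers.
BUFFER defaults to the current buffer.  This only reads
`tree-sitter-parser-list'; it never removes parsers from it."
  (with-current-buffer (or buffer (current-buffer))
    (mapcar #'tree-sitter-parser-language tree-sitter-parser-list)))
```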

Normally, a parser “sees” the whole buffer, but when the buffer is narrowed (see Narrowing), the parser only sees the visible region. As far as the parser can tell, the hidden region is deleted. When the buffer is later widened, the parser thinks text is inserted at the beginning and at the end. Although parsers respect narrowing, narrowing shouldn’t be used as the means of handling a multi-language buffer; instead, set the ranges in which a parser should operate. See Parsing Text in Multiple Languages.

Because a parser parses lazily, narrowing the buffer doesn’t affect it immediately; as long as we don’t query for a node while the buffer is narrowed, the narrowing has no effect on the parser.
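If code running in a narrowed buffer needs the parser to see the whole buffer, one option is to widen temporarily inside save-restriction around the query, so the user’s narrowing is restored afterward. A minimal sketch, with my-ts-with-full-buffer as a hypothetical helper and FN standing for any function that queries nodes:

```elisp
(defun my-ts-with-full-buffer (fn)
  "Hypothetical helper: call FN with the current buffer widened.
Any narrowing in effect is restored when FN returns or exits
non-locally; FN's value is returned."
  (save-restriction
    (widen)
    (funcall fn)))
```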

Besides creating a parser for a buffer, we can also just parse a string. Unlike a buffer, parsing a string is a one-time deal, and there is no way to update the result.

Function: tree-sitter-parse-string string language

This function parses string with language, and returns the root node of the generated syntax tree.
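For example, a snippet of C code can be parsed without any buffer. A minimal sketch, with my-ts-parse-c-snippet as a hypothetical helper and tree-sitter-c taken from this section’s C example:

```elisp
(defun my-ts-parse-c-snippet (code)
  "Hypothetical helper: parse the C source string CODE.
Return the root node of the resulting syntax tree.  The tree is a
one-time snapshot; there is no way to update it afterward."
  (tree-sitter-parse-string code 'tree-sitter-c))

;; Usage:
;;   (my-ts-parse-c-snippet "int main (void) { return 0; }")
```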