Using Parser (GNU Emacs Lisp Reference Manual)

37.2 Using Tree-sitter Parser

This section describes how to create and configure a tree-sitter parser. In Emacs, each tree-sitter parser is associated with a buffer. As the user edits the buffer, the associated parser and syntax tree are automatically kept up-to-date.

Variable: treesit-max-buffer-size: This variable contains the maximum size of buffers in which tree-sitter can be activated. Major modes should check this value when deciding whether to enable tree-sitter features.
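As a rough illustration of the check described above, a major mode could compare the buffer's size against this variable before enabling tree-sitter features. This is only a sketch: `my-treesit-size-ok-p` is a hypothetical helper, not part of Emacs, and it assumes the limit is measured in bytes.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
;; Assumption: `treesit-max-buffer-size' is expressed in bytes.
(defun my-treesit-size-ok-p (&optional buffer)
  "Return non-nil if BUFFER is within `treesit-max-buffer-size'.
BUFFER defaults to the current buffer."
  (with-current-buffer (or buffer (current-buffer))
    (save-restriction
      (widen)
      ;; The byte position of the end of the widened buffer, minus
      ;; one, is the size of the buffer text in bytes.
      (<= (1- (position-bytes (point-max)))
          treesit-max-buffer-size))))
```

Emacs also provides treesit-ready-p, which is documented to perform a comparable size check together with other availability checks, so a real major mode would probably call that instead.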

Function: treesit-parser-create language &optional buffer no-reuse

Create a parser for the specified buffer and language (see Tree-sitter Language Grammar). If buffer is omitted or nil, it stands for the current buffer.

By default, this function reuses a parser if one already exists for language in buffer, but if no-reuse is non-nil, this function always creates a new parser.

If that buffer is an indirect buffer, its base buffer is used instead. That is, indirect buffers use their base buffer’s parsers. If the base buffer is narrowed, an indirect buffer might not be able to retrieve information about the portion of the buffer text that is inaccessible in the base buffer. Lisp programs should widen as necessary if they want to use a parser in an indirect buffer.
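The following sketch illustrates the reuse behavior of treesit-parser-create. It assumes that Emacs was built with tree-sitter support and that a grammar for the c language is installed; the language choice and the sample text are purely illustrative.

```elisp
(require 'treesit)

;; Assumption: a grammar for the `c' language is installed.
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (with-temp-buffer
    (insert "int main(void) { return 0; }\n")
    (let ((p1 (treesit-parser-create 'c))         ; creates a parser
          (p2 (treesit-parser-create 'c))         ; should reuse P1
          (p3 (treesit-parser-create 'c nil t)))  ; NO-REUSE: new parser
      ;; Per the description above, the first element should be
      ;; non-nil and the second should be nil.
      (list (eq p1 p2) (eq p1 p3)))))
```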

Given a parser, we can query information about it.

Function: treesit-parser-buffer parser: This function returns the buffer associated with parser.

Function: treesit-parser-language parser: This function returns the language used by parser.

Function: treesit-parser-p object: This function returns non-nil if object is a tree-sitter parser, and nil otherwise.
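A short sketch of these three accessors, again assuming that a grammar for the c language is installed:

```elisp
(require 'treesit)

;; Assumption: a grammar for the `c' language is installed.
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (with-temp-buffer
    (let ((parser (treesit-parser-create 'c)))
      (list
       ;; Non-nil, because PARSER is a parser object.
       (treesit-parser-p parser)
       ;; The parser belongs to the buffer it was created in.
       (eq (treesit-parser-buffer parser) (current-buffer))
       ;; The language symbol passed at creation time.
       (treesit-parser-language parser)))))
```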

There is no need to explicitly parse a buffer, because parsing is done automatically and lazily. A parser only parses when a Lisp program queries for a node in its syntax tree. Therefore, when a parser is first created, it doesn’t parse the buffer; it waits until the Lisp program queries for a node for the first time. Similarly, when some change is made in the buffer, a parser doesn’t re-parse immediately.

When a parser does parse, it checks the size of the buffer. Tree-sitter can only handle buffers no larger than about 4GB. If the size exceeds that, Emacs signals the treesit-buffer-too-large error, with the buffer size as the signal data.
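Because parsing is lazy, this error surfaces when a node is first requested rather than when the parser is created. A Lisp program that prefers to degrade gracefully could wrap the query along these lines; `my-treesit-root-or-nil` is a hypothetical helper, not part of Emacs.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-treesit-root-or-nil (parser)
  "Return PARSER's root node, or nil if its buffer is too large."
  (condition-case err
      ;; Requesting the root node is what triggers the actual parse.
      (treesit-parser-root-node parser)
    (treesit-buffer-too-large
     ;; The signal data carries the offending buffer size.
     (message "Buffer too large for tree-sitter: %S" (cdr err))
     nil)))
```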

Once a parser is created, Emacs automatically adds it to the internal parser list. Every time a change is made to the buffer, Emacs updates parsers in this list so they can update their syntax tree incrementally.

Function: treesit-parser-list &optional buffer: This function returns the parser list of buffer. If buffer is nil or omitted, it defaults to the current buffer. If that buffer is an indirect buffer, its base buffer is used instead. That is, indirect buffers use their base buffer’s parsers.

Function: treesit-parser-delete parser: This function deletes parser.
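Combining the two functions above, a mode could discard every parser in a buffer, for example when tree-sitter features are turned off. This is a minimal sketch; `my-treesit-delete-all-parsers` is a hypothetical name.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-treesit-delete-all-parsers (&optional buffer)
  "Delete every tree-sitter parser in BUFFER.
BUFFER defaults to the current buffer."
  (dolist (parser (treesit-parser-list buffer))
    (treesit-parser-delete parser)))
```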

Normally, a parser “sees” the whole buffer, but when the buffer is narrowed (see Narrowing), the parser will only see the accessible portion of the buffer. As far as the parser can tell, the hidden region was deleted. When the buffer is later widened, the parser thinks text is inserted at the beginning and at the end. Although parsers respect narrowing, modes should not use narrowing as a means to handle a multi-language buffer; instead, set the ranges in which the parser should operate. See Parsing Text in Multiple Languages.

Because a parser parses lazily, when the user or a Lisp program narrows the buffer, the parser is not affected immediately; as long as the mode doesn’t query for a node while the buffer is narrowed, the parser is oblivious to the narrowing.
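When a Lisp program needs the parser to see the whole buffer regardless of the current restriction, it can widen temporarily around the query. The helper below is a sketch under that assumption; `my-treesit-root-node-widened` is a hypothetical name, not part of Emacs.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.
(defun my-treesit-root-node-widened (parser)
  "Return PARSER's root node with its buffer temporarily widened."
  ;; For an indirect buffer, the parser's buffer is the base buffer,
  ;; so this widens the buffer the parser actually reads from.
  (with-current-buffer (treesit-parser-buffer parser)
    (save-restriction
      (widen)
      (treesit-parser-root-node parser))))
```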

Besides creating a parser for a buffer, a Lisp program can also parse a string. Unlike parsing a buffer, parsing a string is a one-off operation, and there is no way to update the result.

Function: treesit-parse-string string language: This function parses string using language, and returns the root node of the generated syntax tree.
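For instance, the following sketch parses a small snippet and inspects the resulting root node. It assumes that a grammar for the c language is installed; the snippet itself is arbitrary.

```elisp
(require 'treesit)

;; Assumption: a grammar for the `c' language is installed.
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (let ((root (treesit-parse-string "int x = 1;" 'c)))
    ;; Return the root node's type name and its number of children.
    (list (treesit-node-type root)
          (treesit-node-child-count root))))
```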

Being notified of changes to the parse tree

A Lisp program might want to be notified of text affected by incremental parsing. For example, inserting a comment-closing token converts text before that token into a comment. Even though the text is not directly edited, it is deemed to be “changed” nevertheless.

Emacs lets a Lisp program register callback functions (a.k.a. notifiers) for this kind of change. A notifier function takes two arguments: ranges and parser. ranges is a list of cons cells of the form (start . end), where start and end mark the start and the end positions of a range. parser is the parser issuing the notification.

Every time a parser reparses a buffer, it compares the old and new parse trees, computes the ranges in which nodes have changed, and passes the ranges to notifier functions. Note that the initial parse is also considered a “change”, so notifier functions are called on the initial parse, with the range being the whole buffer.

Function: treesit-parser-add-notifier parser function: This function adds function to parser’s list of after-change notifier functions. function must be a function symbol, not a lambda function (see Anonymous Functions).

Function: treesit-parser-remove-notifier parser function: This function removes function from the list of parser’s after-change notifier functions. function must be a function symbol, rather than a lambda function.

Function: treesit-parser-notifiers parser: This function returns the list of parser’s notifier functions.
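The notifier functions above could be used together roughly as follows. This is a sketch, not a recommended implementation: `my-treesit-log-changes` is a hypothetical notifier, and the example assumes that a grammar for the c language is installed. The notifier is a named function because notifiers must be function symbols.

```elisp
(require 'treesit)

;; Hypothetical notifier, not part of Emacs.
(defun my-treesit-log-changes (ranges parser)
  "Log each of RANGES reported by PARSER after a reparse."
  (dolist (range ranges)
    (message "%s parser: nodes changed in %d..%d"
             (treesit-parser-language parser)
             (car range) (cdr range))))

;; Assumption: a grammar for the `c' language is installed.
(when (and (treesit-available-p)
           (treesit-language-available-p 'c))
  (with-temp-buffer
    (insert "int x = 1; /* unterminated\nint y = 2;\n")
    (let ((parser (treesit-parser-create 'c)))
      (treesit-parser-add-notifier parser #'my-treesit-log-changes)
      ;; Parsing is lazy, so request the root node to trigger the
      ;; initial parse, which also counts as a change.
      (treesit-parser-root-node parser)
      ;; Close the comment, then query again to trigger a reparse.
      (goto-char (point-max))
      (insert "*/\n")
      (treesit-parser-root-node parser)
      ;; Unregister the notifier and return the remaining list.
      (treesit-parser-remove-notifier parser #'my-treesit-log-changes)
      (treesit-parser-notifiers parser))))
```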