Next: Tree-sitter C API Correspondence, Previous: Pattern Matching Tree-sitter Nodes, Up: Parsing Program Source [Contents][Index]
Sometimes the source of a program contains code written in other languages; HTML with embedded CSS and JavaScript is one example. In that case, we need to assign individual parsers to the text segments written in each language. Traditionally this is achieved by narrowing. While tree-sitter works with narrowing (see narrowing), the recommended way is to set the ranges in which each parser operates.
This function, tree-sitter-parser-set-included-ranges, sets the ranges of parser to ranges. The parser then reads only the text covered by those ranges. Each range in ranges is a cons cell of the form (beg . end).
The ranges must come in ascending order and must not overlap. That is, in pseudo code:
(cl-loop for idx from 1 to (1- (length ranges))
         for prev = (nth (1- idx) ranges)
         for next = (nth idx ranges)
         always (<= (car prev) (cdr prev)
                    (car next) (cdr next)))
If ranges violates this constraint, or something else goes wrong, this function signals a tree-sitter-range-invalid error. The signal data contains a specific error message and the ranges that were being set.
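The ordering constraint above can be checked ahead of time with plain Lisp, without touching any parser. The following is a minimal sketch: the name my-ranges-valid-p is hypothetical and not part of tree-sitter, and unlike the pseudo code it also checks beg <= end when there is only a single range, which is an assumption about what the parser expects.

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of the tree-sitter API.
(defun my-ranges-valid-p (ranges)
  "Return non-nil if RANGES is ordered and non-overlapping.
RANGES is a list of cons cells (BEG . END)."
  (and
   ;; Every range must satisfy BEG <= END (assumed requirement).
   (cl-every (lambda (range) (<= (car range) (cdr range))) ranges)
   ;; Each range must end no later than the next one begins.
   (cl-loop for (prev next) on ranges
            while next
            always (<= (cdr prev) (car next)))))

(my-ranges-valid-p '((1 . 9) (16 . 24) (24 . 25))) ;; ⇒ t
(my-ranges-valid-p '((1 . 9) (5 . 24)))            ;; ⇒ nil
```

Running such a check before setting ranges lets the caller report a problem in its own terms instead of handling the tree-sitter-range-invalid signal.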
This function can also be used for disabling ranges. If ranges is nil, the parser is set to parse the whole buffer.
Example:
(tree-sitter-parser-set-included-ranges parser '((1 . 9) (16 . 24) (24 . 25)))
This function, tree-sitter-parser-included-ranges, returns the ranges set for parser. The return value has the same form as the ranges argument of tree-sitter-parser-set-included-ranges: a list of cons cells (beg . end). If parser doesn’t have any ranges, the return value is nil.
(tree-sitter-parser-included-ranges parser) ⇒ ((1 . 9) (16 . 24) (24 . 25))
Like tree-sitter-parser-set-included-ranges, this function sets the ranges of parser-or-lang to ranges. Conveniently, parser-or-lang can be either a parser or a language. If it is a language, this function looks for the first parser for that language in tree-sitter-parser-list in the current buffer and sets the ranges for it.
This function returns the ranges of parser-or-lang, like tree-sitter-parser-included-ranges. As with tree-sitter-set-ranges, parser-or-lang can be a parser or a language symbol.
This function, tree-sitter-query-range, matches source with pattern and returns the ranges of the captured nodes. The return value has the same form as that of the other range functions: a list of cons cells (beg . end).
For convenience, source can be a language symbol, a parser, or a node. If a language symbol, this function matches in the root node of the first parser using that language; if a parser, this function matches in the root node of that parser; if a node, this function matches in that node.
Parameter pattern is the query pattern used to capture nodes (see Pattern Matching Tree-sitter Nodes). The capture names don’t matter. Parameters beg and end, if both non-nil, limit the range in which this function queries.
Like other query functions, this function signals a tree-sitter-query-error if pattern is malformed.
This function tries to figure out which language is responsible for the text at point. It goes over each parser in tree-sitter-parser-list and checks whether that parser’s ranges cover point.
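The coverage test at the heart of that lookup can be sketched in plain Lisp. The helper name my-ranges-cover-p is hypothetical, and two details are assumptions rather than documented behavior: the end of a range is treated as exclusive, and a nil range list is treated as covering the whole buffer (following the rule that a parser with no ranges parses the whole buffer).

```elisp
(require 'cl-lib)

;; Hypothetical helper, not part of the tree-sitter API.
(defun my-ranges-cover-p (ranges pos)
  "Return non-nil if POS falls inside one of RANGES.
RANGES is a list of cons cells (BEG . END); END is treated as
exclusive here, which is an assumption.  A nil RANGES is taken
to mean the whole buffer."
  (or (null ranges)
      (cl-some (lambda (range)
                 (and (<= (car range) pos)
                      (< pos (cdr range))))
               ranges)))

(my-ranges-cover-p '((1 . 9) (16 . 24)) 5)  ;; ⇒ t
(my-ranges-cover-p '((1 . 9) (16 . 24)) 12) ;; ⇒ nil
(my-ranges-cover-p nil 12)                  ;; ⇒ t
```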
tree-sitter-range-functions is an alist of (language . function). Font-locking and indenting code use the functions in this alist to set correct ranges for a language parser before using it.
language is a language symbol, and function is a function that sets ranges for language. Its signature should be
(start end &rest _)
where start and end mark the region that is about to be used. function only needs to update ranges in that region, but it may update more.
This function is used by font-lock and indent to update ranges before using any parser. Each range function in tree-sitter-range-functions is called in order. Arguments start and end are passed to each range function.
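As an illustration, a range function for CSS embedded in HTML might look like the sketch below. It relies only on the functions and the variable described in this section, so it runs only where that API is available. The name my-css-range-function is hypothetical, and recomputing the ranges for the whole buffer rather than only for the region between start and end is a simplification chosen here, not a requirement.

```elisp
;; Hypothetical range function; assumes the tree-sitter API
;; described in this section is loaded.
(defun my-css-range-function (_start _end &rest _)
  "Set the CSS parser's ranges from the HTML parse tree.
The region arguments are ignored: ranges are recomputed for the
whole buffer, which is simpler but does more work than needed."
  (tree-sitter-set-ranges
   'tree-sitter-css
   (tree-sitter-query-range
    'tree-sitter-html
    "(style_element (raw_text) @capture)")))

;; Register the function for the CSS language.  Whether the
;; variable should be made buffer-local first is left open here.
(add-to-list 'tree-sitter-range-functions
             '(tree-sitter-css . my-css-range-function))
```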
Normally, in a set of languages that can be mixed together, there is a major language and several embedded languages. The major language parses the whole document, and skips the embedded languages. Then the parser for the major language knows the ranges of the embedded languages. So we first parse the whole document with the major language’s parser, set ranges for the embedded languages, then parse the embedded languages.
Suppose we want to parse a very simple document that mixes HTML, CSS and JavaScript:
<html>
  <script>1 + 2</script>
  <style>body { color: "blue"; }</style>
</html>
We first parse with HTML, then set ranges for CSS and JavaScript:
;; Create parsers.
(setq html (tree-sitter-get-parser-create 'tree-sitter-html))
(setq css (tree-sitter-get-parser-create 'tree-sitter-css))
(setq js (tree-sitter-get-parser-create 'tree-sitter-javascript))

;; Set CSS ranges.
(setq css-range
      (tree-sitter-query-range
       'tree-sitter-html
       "(style_element (raw_text) @capture)"))
(tree-sitter-parser-set-included-ranges css css-range)

;; Set JavaScript ranges.
(setq js-range
      (tree-sitter-query-range
       'tree-sitter-html
       "(script_element (raw_text) @capture)"))
(tree-sitter-parser-set-included-ranges js js-range)
We use the query pattern (style_element (raw_text) @capture) to find CSS nodes in the HTML parse tree. For how to write query patterns, see Pattern Matching Tree-sitter Nodes.