
swifttreesitter's Introduction


SwiftTreeSitter

Swift API for the tree-sitter incremental parsing system.

  • Close to full coverage of the C API
  • Swift/Foundation types where possible
  • Standard query result mapping for highlights and injections
  • Query predicate/directive support via ResolvingQueryMatchSequence
  • Nested language support
  • Swift concurrency support where possible

Structure

This project is actually split into two parts: SwiftTreeSitter and SwiftTreeSitterLayer.

The SwiftTreeSitter target closely matches the C runtime API. It adds only a few types to help support querying. It is fairly low-level, and using it in a real project will take significant work.

SwiftTreeSitterLayer is an abstraction built on top of SwiftTreeSitter. It supports documents with nested languages and transparent querying across those nestings. It also supports asynchronous language resolution. SwiftTreeSitterLayer is still low-level, but it is easier to work with and supports more features.

And yet there's more! If you are looking for a higher-level system for syntax highlighting and other syntactic operations, you might want to have a look at Neon. It is much easier to integrate with a text system, and has lots of additional performance-related features.

Integration

dependencies: [
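    // SwiftPM needs a version requirement: track a branch, or pin a release with from:
    .package(url: "https://github.com/ChimeHQ/SwiftTreeSitter", branch: "main")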
],
targets: [
    .target(
        name: "MySwiftTreeSitterTarget",
        dependencies: ["SwiftTreeSitter"]
    ),
    .target(
        name: "MySwiftTreeSitterLayerTarget",
        dependencies: [
            .product(name: "SwiftTreeSitterLayer", package: "SwiftTreeSitter"),
        ]
    ),
]

Query Conflicts

SwiftTreeSitter does its best to resolve poor/incorrect query constructs, which are surprisingly common.

When using injections, child query ranges are automatically expanded using parent matches. This handles cases where a parent has queries that overlap with children in conflicting ways. Without expansion, it is possible to construct queries that fall within children ranges but produce no parent matches.
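To make the idea of expansion concrete, here is a minimal sketch. It is not SwiftTreeSitterLayer's implementation. It assumes ranges are plain integer offsets, and that "expanding" means growing the child's query range until it fully covers every parent match it overlaps. The function name `expandedRange(_:using:)` is hypothetical.

```swift
/// Hypothetical sketch: grow `range` until it fully covers every parent
/// match range that overlaps it. Not the library's actual code.
func expandedRange(_ range: Range<Int>, using parentMatches: [Range<Int>]) -> Range<Int> {
    var result = range
    var changed = true
    // Repeat until stable, because absorbing one parent match can make the
    // range overlap another parent match it did not touch before.
    while changed {
        changed = false
        for match in parentMatches where match.overlaps(result) {
            let merged = min(result.lowerBound, match.lowerBound)..<max(result.upperBound, match.upperBound)
            if merged != result {
                result = merged
                changed = true
            }
        }
    }
    return result
}

// A child query over 10..<20 overlaps a parent match at 5..<12,
// which in turn chains into a parent match at 0..<6.
let expanded = expandedRange(10..<20, using: [0..<6, 5..<12, 30..<40])
print(expanded) // 0..<20
```

The loop runs until nothing changes, so the result does not depend on the order of the parent matches.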

All matches are sorted by:

  • depth
  • location in content
  • specificity of match label (more components => more specific)
  • occurrence in the query source

Even with these rules, it is still possible to write queries that produce "incorrect" behavior, because their results are ambiguous or undefined by the query definition itself.
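The ordering above can be pictured as a single multi-key sort. This is a sketch under assumptions. `SketchMatch` is a hypothetical type. Only the order of the keys comes from the list. Ascending order on every key (shallower layers first, earlier locations first, less-specific labels first, earlier patterns first) is an assumption of this sketch.

```swift
/// Hypothetical stand-in for a query match; SwiftTreeSitter's real types differ.
struct SketchMatch {
    let depth: Int          // nesting depth of the language layer
    let location: Int       // start offset in the content
    let name: String        // capture label, e.g. "function.method"
    let patternIndex: Int   // position of the pattern in the query source

    /// More dot-separated components means a more specific label.
    var specificity: Int { name.split(separator: ".").count }
}

/// Sorts by depth, then location, then specificity, then query-source order.
/// Ascending order on every key is an assumption made for this sketch.
func sortedMatches(_ matches: [SketchMatch]) -> [SketchMatch] {
    matches.sorted { a, b in
        (a.depth, a.location, a.specificity, a.patternIndex) <
            (b.depth, b.location, b.specificity, b.patternIndex)
    }
}

let matches = [
    SketchMatch(depth: 0, location: 4, name: "function.method", patternIndex: 2),
    SketchMatch(depth: 0, location: 4, name: "function", patternIndex: 5),
    SketchMatch(depth: 1, location: 0, name: "keyword", patternIndex: 0),
]
print(sortedMatches(matches).map(\.name)) // ["function", "function.method", "keyword"]
```

Swift compares tuples lexicographically, so each later key only breaks ties left by the earlier ones.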

Highlighting

A very common use of tree-sitter is syntax highlighting. It is possible to use this library directly, especially if your source text does not change. Here's a little example that sets everything up with an SPM-bundled language.

First, check out how it works with SwiftTreeSitterLayer. It's complex, but does a lot for you.

// LanguageConfiguration takes care of finding and loading queries in SPM-created bundles.
let markdownConfig = try LanguageConfiguration(tree_sitter_markdown(), name: "Markdown")
let markdownInlineConfig = try LanguageConfiguration(
    tree_sitter_markdown_inline(),
    name: "MarkdownInline",
    bundleName: "TreeSitterMarkdown_TreeSitterMarkdownInline"
)
let swiftConfig = try LanguageConfiguration(tree_sitter_swift(), name: "Swift")

// Unfortunately, injections do not use standardized language names, and can even be content-dependent. Your system must do this mapping.
let config = LanguageLayer.Configuration(
    languageProvider: {
        name in
        switch name {
        case "markdown":
            return markdownConfig
        case "markdown_inline":
            return markdownInlineConfig
        case "swift":
            return swiftConfig
        default:
            return nil
        }
    }
)

let rootLayer = try LanguageLayer(languageConfig: markdownConfig, configuration: config)

let source = """
# this is markdown

```swift
func main(a: Int) {
}
```

also markdown

```swift
let value = "abc"
```

"""

rootLayer.replaceContent(with: source)

let fullRange = NSRange(source.startIndex..<source.endIndex, in: source)

let textProvider = source.predicateTextProvider
let highlights = try rootLayer.highlights(in: fullRange, provider: textProvider)

for namedRange in highlights {
    print("\(namedRange.name): \(namedRange.range)")
}


You can also use SwiftTreeSitter directly:

let swiftConfig = try LanguageConfiguration(tree_sitter_swift(), name: "Swift")

let parser = Parser()
try parser.setLanguage(swiftConfig.language)

let source = """
func main() {}
"""
let tree = parser.parse(source)!

let query = swiftConfig.queries[.highlights]!

let cursor = query.execute(in: tree)
let highlights = cursor
    .resolve(with: .init(string: source))
    .highlights()

for namedRange in highlights {
    print("range: ", namedRange)
}
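The highlight results carry `NSRange` values, which count UTF-16 units. Getting the highlighted text back out of the source therefore goes through Foundation's `Range(_:in:)` initializer. The `SketchHighlight` type below is a hypothetical stand-in for the library's named-range results. Only a `name` and an `NSRange` are assumed.

```swift
import Foundation

/// Hypothetical stand-in for a highlight result: a capture name plus an NSRange.
struct SketchHighlight {
    let name: String
    let range: NSRange
}

/// Returns the highlighted text for each result, skipping ranges that
/// Foundation cannot map onto `source` (out of bounds, or not on
/// Character boundaries).
func highlightedSubstrings(_ highlights: [SketchHighlight], in source: String) -> [(name: String, text: Substring)] {
    highlights.compactMap { highlight -> (name: String, text: Substring)? in
        guard let range = Range(highlight.range, in: source) else { return nil }
        return (name: highlight.name, text: source[range])
    }
}

let source = "func main() {}"
let results = highlightedSubstrings(
    [SketchHighlight(name: "keyword", range: NSRange(location: 0, length: 4)),
     SketchHighlight(name: "function", range: NSRange(location: 5, length: 4))],
    in: source
)
for result in results {
    print("\(result.name): \(result.text)")
}
```

Returning `nil` instead of force-unwrapping keeps one bad range from crashing the whole highlighting pass.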

Language Parsers

Tree-sitter language parsers are separate projects, and you'll probably need at least one. More details are available in the documentation. How they are installed and incorporated varies from parser to parser.

Here's a list of parsers that support SPM. Since you're here, you might find that convenient. And the LanguageConfiguration type supports loading bundled queries directly.

  • Bash
  • C
  • C++
  • C#
  • Clojure
  • CSS
  • Dockerfile
  • Diff
  • Elixir
  • Elm
  • Go
  • GoMod
  • GoWork
  • Haskell
  • HCL
  • HTML
  • Java
  • Javascript
  • JSON
  • JSDoc
  • Julia
  • Kotlin
  • Latex
  • Lua
  • Markdown
  • OCaml
  • Perl
  • PHP
  • Pkl
  • Python
  • Ruby
  • Rust
  • Scala
  • SQL
  • SSH
  • Swift
  • TOML
  • Tree-sitter query language
  • Typescript
  • Verilog
  • YAML
  • Zig

Contributing and Collaboration

I would love to hear from you! Issues or pull requests work great. A Discord server is also available for live help, but I have a strong bias towards answering in the form of documentation.

I prefer collaboration, and would love to find ways to work together if you have a similar project.

I prefer indentation with tabs for improved accessibility. But, I'd rather you use the system you want and make a PR than hesitate because of whitespace.

By participating in this project you agree to abide by the Contributor Code of Conduct.

swifttreesitter's People

Contributors

danielpunkass, divinedominion, elviswong213, fjtrujy, intitni, jsorge, kaunteya, krzyzanowskim, lukepistrol, mattmassicotte, nhubbard, rex4539, yeatse


swifttreesitter's Issues

Investigate issues with emoji

@kaunteya reported problems with handling emoji. I think these are typically characters that take more than one code unit in UTF-16, which is on the rare side. But it would mean there are encoding/decoding issues, and those need to be investigated.
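A single emoji `Character` spans several code units. Any code that mixes Character counts, UTF-16 offsets (the unit `NSRange` uses), and UTF-8 byte offsets will drift as soon as one appears. This standard-library-only sketch shows the mismatch. It also shows a checked way to turn a UTF-16 offset back into a `String.Index`. The helper `character(atUTF16Offset:in:)` is hypothetical.

```swift
let text = "a👍b"

// One Character, but the emoji needs two UTF-16 code units and four UTF-8 bytes.
print(text.count)        // 3
print(text.utf16.count)  // 4
print(text.utf8.count)   // 6

/// Hypothetical helper: maps a UTF-16 offset to the Character starting there,
/// or nil if the offset is out of range or points into the middle of a Character.
func character(atUTF16Offset offset: Int, in string: String) -> Character? {
    let utf16 = string.utf16
    guard let index = utf16.index(utf16.startIndex, offsetBy: offset, limitedBy: utf16.endIndex),
          index != utf16.endIndex,
          let characterIndex = String.Index(index, within: string) else {
        return nil
    }
    return string[characterIndex]
}

print(character(atUTF16Offset: 3, in: text) ?? "?") // b
```

Offset 2 points at the second half of the emoji's surrogate pair, so the helper returns `nil` there instead of silently splitting the character.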

Weird crash happening while accessing properties on Node

I am facing a strange issue: accessing a Node's properties inside the function that obtains it works, but accessing them in the calling function crashes.

class ViewModel: NSObject {
    let textStorage = NSTextStorage()
    let treeSitterClient: TreeSitterClient
    let query: Query

    func node(at location: Int) async throws -> Node? {
        let currentTree = try await treeSitterClient.currentTree()
        var root: Node = currentTree.rootNode!.firstChild!
        print(root.parent) // <- Does not crash here
        return root
    }

    func getPath(range: NSRange) {
        Task { @MainActor in
            let node = try! await node(at: range.location)!
            print(node.parent) // <- But crashes here
        }
    }
}

The crash also reproduces when I access other properties, such as nextSibling and prevSibling, in getPath().
However, the crash does NOT happen when I access namedChildCount or childCount in getPath().
Could this be related to Node being a struct?

I have removed other irrelevant code. If needed, I can create a standalone minimal reproducible example.

`QueryCursor.highlights()` highlights same token multiple times

I'm using SwiftTreeSitter for syntax highlighting in my app. However, I find that highlights obtained using

let cursor = query.execute(node: tree.rootNode!)
var attrString = AttributedString(stringLiteral: code)

for highlight in cursor.highlights() {
    print(highlight.name) // for debugging (see below)
    print(highlight.range)
    let range = Range(highlight.range, in: attrString)!
    let attr = HighlightTheme.attributesFor(name: highlight.name)
    attrString[range].mergeAttributes(attr)
}

give different results from using the same parser and highlighter in the tree-sitter CLI. I'm not sure whether this is because I'm using SwiftTreeSitter the wrong way or because there's a bug. Here's an example:

Given a string containing the following very simple C code

// my main function
int main() {
    printf("Hello World\n");
    return 0;
}

the code above outputs:

comment
{0, 19}
type
{20, 3}
function <---
{24, 4}
constant <---
{24, 4}
variable <---
{24, 4}
...

That is, the token "main" is marked as a function, a constant, and a variable. This breaks my highlighting, since "main" ends up styled as a variable rather than a function. The docs for the highlights() function say that "[r]esults are sorted such that less-specific matches come before more-specific." Given that, I could keep track of whether I've already seen a specific range and ignore any later matches for it. However, the order differs for other languages. Here is another example in Rust:

// my main function
fn main() {
    println!("Hello World");
}

which yields:

comment
{0, 19}
keyword
{20, 2}
constant <---
{23, 4}
constructor <---
{23, 4}
function <---
{23, 4}
...

I was curious to see whether this was due to a problem with the different highlight.scm files I'm using, so I ran tree-sitter highlight test.c and tree-sitter highlight test.rs (i.e. the same snippets I've shown above) with the same parser and highlight.scm files and found that the code is highlighted properly.

Is this expected behavior? If so, how do I deal with it?
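The workaround floated above is to track ranges already seen. It can be made independent of result order by picking a deterministic winner per range. This is a hedged sketch, not a SwiftTreeSitter API. `RangedHighlight` is a hypothetical type, and its `patternIndex` field assumes you can recover which query pattern produced each capture. "Earliest pattern wins" is my understanding of the tree-sitter CLI highlighter's precedence rule. Verify it against your own query files.

```swift
/// Hypothetical highlight record; `patternIndex` stands in for
/// "which pattern in the highlights query produced this capture".
struct RangedHighlight {
    let name: String
    let location: Int
    let length: Int
    let patternIndex: Int
}

struct RangeKey: Hashable {
    let location: Int
    let length: Int
}

/// Keeps exactly one highlight per exact (location, length) range, preferring
/// the pattern that appears earliest in the query source. Sorted by location.
func deduplicated(_ highlights: [RangedHighlight]) -> [RangedHighlight] {
    var winners: [RangeKey: RangedHighlight] = [:]
    for highlight in highlights {
        let key = RangeKey(location: highlight.location, length: highlight.length)
        // Skip this candidate if an earlier pattern already claimed the range.
        if let current = winners[key], current.patternIndex <= highlight.patternIndex {
            continue
        }
        winners[key] = highlight
    }
    return winners.values.sorted { ($0.location, $0.length) < ($1.location, $1.length) }
}

// Pattern indices here are made up for illustration.
let input = [
    RangedHighlight(name: "type", location: 20, length: 3, patternIndex: 1),
    RangedHighlight(name: "function", location: 24, length: 4, patternIndex: 3),
    RangedHighlight(name: "constant", location: 24, length: 4, patternIndex: 7),
    RangedHighlight(name: "variable", location: 24, length: 4, patternIndex: 9),
]
print(deduplicated(input).map(\.name)) // ["type", "function"]
```

This only collapses identical ranges. Nested or partially overlapping highlights still need a separate policy.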

Possible memory leak in Input

I'm not familiar enough with the code to be sure, but I think the setter for Input.buffer leaks memory.

Existing version:

set {
    // if newValue != nil && internalBuffer != nil, we never deallocate internalBuffer
    if newValue == nil {
        internalBuffer?.deallocate()
    }

    internalBuffer = newValue
}

Possible fix:

set {
    internalBuffer?.deallocate()
    internalBuffer = newValue
}
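The proposed fix frees the old buffer on every assignment. One edge case remains: assigning the buffer that is already stored would free it and then keep a dangling pointer. Below is a self-contained sketch of the ownership pattern with a guard for that case. `BufferOwner` and its deallocation counter are hypothetical, added only so the behavior can be observed. They are not SwiftTreeSitter's `Input` type.

```swift
/// Hypothetical owner of a manually allocated buffer, modeling the
/// Input.buffer setter discussed above. Not SwiftTreeSitter's actual type.
final class BufferOwner {
    /// Counts deallocations performed by the setter, for observation only.
    private(set) var deallocationCount = 0

    private var internalBuffer: UnsafeMutableRawPointer?

    var buffer: UnsafeMutableRawPointer? {
        get { internalBuffer }
        set {
            // Reassigning the same pointer must not free it.
            guard newValue != internalBuffer else { return }
            if let old = internalBuffer {
                old.deallocate()
                deallocationCount += 1
            }
            internalBuffer = newValue
        }
    }

    deinit {
        // Free whatever buffer is still owned when the owner goes away.
        internalBuffer?.deallocate()
    }
}

let owner = BufferOwner()
owner.buffer = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 1)
owner.buffer = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 1) // frees the first
owner.buffer = owner.buffer                                                  // no-op
owner.buffer = nil                                                           // frees the second
print(owner.deallocationCount) // 2
```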
