
Rust implementation of stack graphs

Home Page: https://docs.rs/stack-graphs/*/stack_graphs/

License: Apache License 2.0

Rust 81.33% C 3.23% Shell 0.32% CSS 0.87% JavaScript 6.72% Python 0.97% TypeScript 4.41% Java 0.72% Scheme 1.43%

stack-graphs's Introduction

Stack graphs

The crates in this repository provide a Rust implementation of stack graphs, which allow you to define the name resolution rules for an arbitrary programming language in a way that is efficient, incremental, and does not need to tap into existing build or program analysis tools.

How to contribute

We welcome your contributions! Please see our contribution guidelines and our code of conduct for details on how to participate in our community.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as described below, without any additional terms or conditions.

Credits

Stack graphs are heavily based on the scope graphs framework from Eelco Visser's group at TU Delft.

License

Licensed under either of

at your option.


stack-graphs's Issues

More flexible `source_node` handling

Sometimes it is useful to introduce proper definitions that do not originate from the syntax tree. A common example is module definitions that are derived from the file path. To make a definition, one has to provide a source_node. In the module example, the only available node is often the whole tree. As a result, the definition covers the whole file, which can lead to annoying behavior when consuming the data, e.g. because the whole file becomes clickable.

Instead of using the whole span from the provided node, it should be possible to use an empty span at the start of the node's span. So a node span 1:0--5:10 would be reduced to 1:0--1:0.

To implement this, I propose an extra attribute empty_span that modifies the span calculation based on the already existing source_node.
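
The proposed span reduction could look roughly like the sketch below. `Position`, `Span`, and `definition_span` are hypothetical stand-ins written for this illustration, not types from the stack-graphs crates; only the idea of an `empty_span` flag comes from the proposal above.

```rust
/// A (line, column) position, written `line:column` in the issue text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Position {
    pub line: usize,
    pub column: usize,
}

/// A source span running from `start` to `end`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: Position,
    pub end: Position,
}

impl Span {
    /// Collapses the span to an empty span located at its own start.
    pub fn collapse_to_start(self) -> Span {
        Span { start: self.start, end: self.start }
    }
}

/// Computes the span recorded for a definition.
/// `empty_span` mirrors the proposed attribute (an assumption of this sketch).
pub fn definition_span(source_node_span: Span, empty_span: bool) -> Span {
    if empty_span {
        source_node_span.collapse_to_start()
    } else {
        source_node_span
    }
}
```

With `empty_span` set, the span 1:0--5:10 becomes 1:0--1:0; without it, the span is passed through unchanged.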

Example: Split definitions

Add tree-sitter-stack-graphs example showing how to encode split definitions, similar to C# partial classes or Rust impl blocks.

Use partially ordered labels instead of numbers for edge precedence

Using a partially ordered set of labels for precedence is more general, and conveys more semantic information, than the current approach of using numbers. It is also more flexible when different graphs or rule sets are combined.
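
As a rough illustration of what partially ordered precedence labels might look like, the sketch below defines four labels where two of them are deliberately incomparable. The label names and their ordering are invented for this example and are not part of the stack-graphs API.

```rust
use std::cmp::Ordering;

/// Hypothetical precedence labels; names and ordering are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Label {
    Local,
    Import,
    Inherited,
    Global,
}

impl PartialOrd for Label {
    /// Local > Import > Global and Local > Inherited > Global.
    /// Import and Inherited are incomparable, which plain numbers cannot express.
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        if self == other {
            return Some(Ordering::Equal);
        }
        match (*self, *other) {
            (Label::Local, _) => Some(Ordering::Greater),
            (_, Label::Local) => Some(Ordering::Less),
            (_, Label::Global) => Some(Ordering::Greater),
            (Label::Global, _) => Some(Ordering::Less),
            // The remaining pairs are Import vs. Inherited, in either order.
            _ => None,
        }
    }
}

/// An edge labeled `a` shadows an edge labeled `b` only if `a` is strictly greater.
pub fn shadows(a: Label, b: Label) -> bool {
    a > b
}
```

Under this ordering, neither of two incomparable labels shadows the other, so both candidate paths would be kept.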

Java TSG: Comments break modifiers related rule

Repro case

class A {
    @Getter
    @ToString
    // This comment will break the TSG rule
    static class B {
    }
}

Affected TSG rule

(modifiers (_) @annotation) @this {
    edge @annotation.ref -> @this.lexical_scope
}

Note that @annotation here becomes a node of type line_comment which does not have a ref node.

Make stack-graphs generic over source info

We just made a PR that optionally disables tree-sitter (#332). However, the only place where that library is used is in source_info:

stack-graphs/stack-graphs/src/graph.rs

Line 1443 in 068342a

pub(crate) source_info: SupplementalArena<Node, SourceInfo>,

Ideally, stack-graphs is not tied to LSP-positions here and is generic over this datatype. StackGraph could be generic over some T, where T is the type stored in the supplemental arena. If that is true, SourceInfo which is defined here:

stack-graphs/stack-graphs/src/graph.rs

Line 1336 in 068342a

pub struct SourceInfo {

could even be removed from the crate.
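
A minimal sketch of the idea, assuming a toy graph type rather than the real `StackGraph`: the per-node source information becomes a type parameter, so the caller decides what is stored. `NodeHandle`, `Graph`, and `ByteRange` are hypothetical names for this example, and a `HashMap` stands in for the supplemental arena.

```rust
use std::collections::HashMap;

/// Stand-in for a node handle (hypothetical, not the crate's `Handle<Node>`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct NodeHandle(u32);

/// Toy graph that is generic over the per-node source info type `S`.
pub struct Graph<S> {
    next_id: u32,
    source_info: HashMap<NodeHandle, S>,
}

impl<S> Graph<S> {
    pub fn new() -> Self {
        Graph { next_id: 0, source_info: HashMap::new() }
    }

    /// Adds a node without any source info attached.
    pub fn add_node(&mut self) -> NodeHandle {
        let handle = NodeHandle(self.next_id);
        self.next_id += 1;
        handle
    }

    /// Attaches caller-defined source info to a node.
    pub fn set_source_info(&mut self, node: NodeHandle, info: S) {
        self.source_info.insert(node, info);
    }

    /// Returns the source info for a node, if any was attached.
    pub fn source_info(&self, node: NodeHandle) -> Option<&S> {
        self.source_info.get(&node)
    }
}

/// One possible caller-defined payload: a byte range instead of LSP positions.
#[derive(Debug, PartialEq)]
pub struct ByteRange {
    pub start: usize,
    pub end: usize,
}
```

A caller that does want LSP-style positions would simply instantiate the graph with its own position type.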

Is that something you agree with?

Strange bug with capture scoping

See branch new_js_debug_labeled_stmt

If you run

tree-sitter-stack-graphs test --tsg src/stack-graphs.tsg test/base_syntax.js

you get the following error about duplicate nodes:

Error: Error running test test/base_syntax.js

Caused by:
    Executing edge (scoped [syntax node number (1, 1)] 'before_scope) -> (scoped [syntax node number (1, 1)] 'before_scope) at (3, 5). Caused by: Duplicate variable [syntax node labeled_statement (2, 1)].before_scope set at (9, 24) and (9, 24)

This error goes away when the @foo capture is removed from the query file, and also when the number is removed from the JS source file.

The source code for the respective files is:

(number)@number {

    edge @number.before_scope -> @number.before_scope
  
}

(labeled_statement (_)@foo)@labeled_stmt {

    node @labeled_stmt.before_scope

}

5;
foo: "foo";

The parse tree for the JS file is

(program [0, 0] - [1, 11]
  (expression_statement [0, 0] - [0, 2]
    (number [0, 0] - [0, 1]))
  (labeled_statement [1, 0] - [1, 11]
    label: (statement_identifier [1, 0] - [1, 3])
    body: (expression_statement [1, 5] - [1, 11]
      (string [1, 5] - [1, 10]
        (string_fragment [1, 6] - [1, 9])))))

Messagepack breaks db decode logic

I tried to get a db using tree-sitter graphs working, but I quickly ran into an issue: the database saves successfully, but it cannot be loaded back with load_graph_for_file_or_directory.

Here are a couple of issues I've noticed:

There's a place where the code refers to a json column, which doesn't exist anymore. Shouldn't this be value instead?

stack-graphs/stack-graphs/src/storage.rs

Line 504 in d15b259

let mut stmt = conn.prepare_cached("SELECT json FROM graphs WHERE file = ?")?;

In addition, even after this issue is fixed, the MessagePack serialization currently uses rmp_serde::to_vec, and the deserializer runs into trouble because the StackGraph structure uses #[serde(skip_serializing_if = "Option::is_none")] in many places.
Since the fields are all unlabeled array elements, the MessagePack decoder cannot match the remaining values to the right fields, giving a cryptic error:

invalid type: integer `1`, expected a string

It might be a good idea to use rmp_serde::to_vec_named instead or to move back to JSON again, or alternatively remove the skip serializing option on the fields (but if going with this option it would be good to have a thorough check to ensure it doesn't have any edge cases).
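
The failure mode can be reproduced with a toy model that uses only the standard library. This is not rmp_serde's actual code: `Value`, `Record`, and the four functions are invented for illustration. The positional encoder drops a `None` field, as `skip_serializing_if` would, while the positional decoder still expects one slot per field; the named variant carries field names and is unaffected.

```rust
/// Minimal value model standing in for a serialized element.
#[derive(Clone, Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical record with one optional and one required field.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub symbol: Option<String>,
    pub line: i64,
}

/// Positional encoding that skips `None` fields, shifting later fields left.
pub fn encode_positional(record: &Record) -> Vec<Value> {
    let mut out = Vec::new();
    if let Some(symbol) = &record.symbol {
        out.push(Value::Str(symbol.clone()));
    }
    out.push(Value::Int(record.line));
    out
}

/// Positional decoding that expects slot 0 = symbol and slot 1 = line.
pub fn decode_positional(values: &[Value]) -> Result<Record, String> {
    let symbol = match values.first() {
        Some(Value::Str(s)) => Some(s.clone()),
        Some(Value::Int(i)) => {
            return Err(format!("invalid type: integer `{}`, expected a string", i))
        }
        None => return Err("missing field `symbol`".to_string()),
    };
    let line = match values.get(1) {
        Some(Value::Int(i)) => *i,
        Some(Value::Str(_)) => return Err("invalid type: string, expected an integer".to_string()),
        None => return Err("missing field `line`".to_string()),
    };
    Ok(Record { symbol, line })
}

/// Named encoding: skipped fields are simply absent, the rest keep their names.
pub fn encode_named(record: &Record) -> Vec<(&'static str, Value)> {
    let mut out = Vec::new();
    if let Some(symbol) = &record.symbol {
        out.push(("symbol", Value::Str(symbol.clone())));
    }
    out.push(("line", Value::Int(record.line)));
    out
}

/// Named decoding: fields are looked up by name, so a missing optional field is fine.
pub fn decode_named(fields: &[(&str, Value)]) -> Result<Record, String> {
    let mut symbol = None;
    let mut line = None;
    for (name, value) in fields {
        match (*name, value) {
            ("symbol", Value::Str(s)) => symbol = Some(s.clone()),
            ("line", Value::Int(i)) => line = Some(*i),
            (other, _) => return Err(format!("unexpected field `{}`", other)),
        }
    }
    let line = line.ok_or_else(|| "missing field `line`".to_string())?;
    Ok(Record { symbol, line })
}
```

A record with `symbol: None` fails the positional round trip, because the integer lands in the string slot, but survives the named one.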

Java TSG: Package names are not correctly processed

Repro case

// File: src/main/java/example/one/A.java
package com.example.one;

import com.example.two.B;

public class A {
    public void method() {
        B instance = new B();
        instance.method();
    }
}

// File: src/main/java/example/two/B.java
package com.example.two;

public class B {
   public void method() {}
}

Find the definition of instance.method from A.java
This is expected to work correctly, and it does.
Now replace the following in B.java

-- package com.example.two;
++ package com.cheese.two;

Find the definition of instance.method from A.java

Expected result

No definition should be found, as packages do not match.

Actual result

Wrong definition is found.

Java TSG: Comma operator in for loop breaks TSG rule

Repro case

for (int i = 0, j = fish.length(); i < j; i++) {
}

This causes a duplicate edge error.

Affected TSG rule

(local_variable_declaration
  type: (_) @type
  declarator: (variable_declarator
    name: (_) @name
  )
) @local_var
{
  node def
  attr (def) node_definition = @name
  edge @local_var.after_scope -> def
  edge @local_var.after_scope -> @local_var.before_scope
  attr (@local_var.after_scope -> @local_var.before_scope) precedence = 1

  edge @type.lexical_scope -> @local_var.before_scope

  node def__typeof
  attr (def__typeof) pop_symbol = ":"

  edge def -> def__typeof

  edge def__typeof -> @type.type
}

Add tests for json features in CI

I pushed a PR that passed all tests, then realized the json tests aren't run by CI.

Support highlighting all nodes from file(s)

Being able to highlight all nodes that belong to a particular file can be very helpful when debugging larger multi-file tests.

Allow users to pass along stitcher configuration

Path stitching can be configured to enable or disable similar path detection. It can be interesting to disable that if the rules allow it, for increased performance. However, for rules that do not allow it, it can lead to exponential blowup.

The path stitcher is often not created by callers directly, but indirectly when e.g. running tests, or running a query. Callers should be able to pass along stitcher parameters to configure the stitcher correctly.

Additionally, language configurations should have an additional value specifying the required stitching parameters for that language. (Ideally these could also be read from package.json for dynamically loaded languages.)
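
One way this could fit together is sketched below, under the assumption that a language configuration carries a default and that callers may override it. The struct and function names here are hypothetical and written for this illustration; they are not taken from the existing crates.

```rust
/// Hypothetical stitcher parameters; only similar path detection is modeled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct StitcherConfig {
    pub detect_similar_paths: bool,
}

impl Default for StitcherConfig {
    /// Detection is on by default, the safe choice for rules that
    /// would otherwise blow up exponentially.
    fn default() -> Self {
        StitcherConfig { detect_similar_paths: true }
    }
}

/// Hypothetical language configuration carrying its required stitcher parameters.
pub struct LanguageConfig {
    pub name: String,
    pub stitcher_config: StitcherConfig,
}

/// Picks the parameters for a run: an explicit caller override wins,
/// otherwise the language's own requirement is used.
pub fn effective_stitcher_config(
    language: &LanguageConfig,
    caller_override: Option<StitcherConfig>,
) -> StitcherConfig {
    caller_override.unwrap_or(language.stitcher_config)
}
```

Indirect entry points such as test runs or queries would then accept the optional override and forward the result to wherever the stitcher is created.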

`parse` command parses unused TSG rules

This could be considered a bug or a feature I suppose, but I was certainly surprised that an error in my .tsg prevented me from successfully running cargo run -p tree-sitter-stack-graphs-java -- parse file.java.

Definition assertions

We can assert references with defined, but it would also be nice to assert definitions with e.g. defines to ease testing in isolation.

int x;
//  ^ defines: x
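
A small sketch of how such an assertion comment might be parsed is shown below. The `defines:` keyword is the one suggested above; `DefinesAssertion` and `parse_defines` are hypothetical names, and the comma-separated symbol list is an assumption of this sketch rather than the test runner's actual syntax.

```rust
/// A parsed `defines` assertion: the caret column and the expected symbols.
#[derive(Debug, PartialEq)]
pub struct DefinesAssertion {
    pub column: usize,
    pub symbols: Vec<String>,
}

/// Parses a comment line such as `//  ^ defines: x`.
/// Returns `None` if the line has no caret or uses a different keyword.
pub fn parse_defines(comment_line: &str) -> Option<DefinesAssertion> {
    // The caret marks the column on the preceding source line.
    let column = comment_line.find('^')?;
    let rest = comment_line[column + 1..].trim_start();
    let names = rest.strip_prefix("defines:")?;
    let symbols = names
        .split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(String::from)
        .collect();
    Some(DefinesAssertion { column, symbols })
}
```

For the example above, the caret sits in column 4, which is where `x` appears in `int x;`. Existing `defined:` assertions are not matched by this parser.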

Example: import hiding, as in Scala?

Hi! I just wandered over here from the GitHub documentation. The approach taken in this project looks really interesting (and seems like it could have wider applications, like incremental recompilation), but I was wondering how something like import hiding in Scala could be represented:

object A {
  val foo: Int = 1
}
object B {
  val foo: Int = 1
  val bar: Int = 2
}
object C {
  import A._
  import B.{foo => _, _}
  foo // resolves to A.foo, not B.foo
}

I just found out that previous work on scope graphs did an impressive job modeling Scala import rules, but it's not clear to me how applicable this is to stack graphs.

Investigate possibility of replacing SQLite with RocksDB

The current file-based storage implementation is based on rusqlite. It seems to be rather slow in practice, and we don't really depend on the SQL features. @bluskript suggested replacing it with something else like RocksDB (#302). I think this is worth investigating, and worth making the change if it improves performance.

Similar to the binary encoding we use, the database itself should be mature, have a stable storage format, and perform well. These are explicit goals of the RocksDB project, making it a good candidate.

Changes

I imagine the following changes to be part of this transition:

One of the main challenges will be maintaining the index of file paths from the root node. The paths coming from root are indexed by partial symbol stacks, and originate from multiple files. Therefore, it should be possible to add paths belonging to a particular file to this shared index, but also to remove the paths from a particular file from the index, while keeping the rest.
- Perhaps the problem is slightly simplified if we add an intermediate step: a per-file index of partial symbol stacks for root paths, and one global index mapping partial symbol stacks to the files that contribute paths with that stack.
See if we can exploit RocksDB's range-based indexing to efficiently load/remove data from the database (e.g., all paths belonging to a single file). This will require careful key design.
Drop the support for having an in-memory database. RocksDB doesn't support this, and once #291 is merged, switching between a database and in-memory data structures should have little impact on code.
Find more neutral naming for the storage classes. Currently, they explicitly mention SQL, but we should use something to suggest disk, file-based, or persistent storage, without exposing the implementation.
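
The key design mentioned in the list above can be sketched with an ordered in-memory map standing in for the key-value store. Everything here is an assumption for illustration: `PathStore` is a hypothetical name, `BTreeMap` only mimics the ordered-key behavior that a range scan relies on, and symbol stacks are plain strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy store: keys are ordered by (file, symbol stack), so all entries of
/// one file are contiguous and can be found with a single range scan.
#[derive(Default)]
pub struct PathStore {
    /// (file, symbol stack) -> serialized root path.
    paths: BTreeMap<(String, String), Vec<u8>>,
    /// Global index: symbol stack -> files contributing paths with that stack.
    files_by_stack: BTreeMap<String, BTreeSet<String>>,
}

impl PathStore {
    /// Adds a root path for a file and records the file in the global index.
    pub fn add_path(&mut self, file: &str, stack: &str, data: Vec<u8>) {
        self.paths.insert((file.to_string(), stack.to_string()), data);
        self.files_by_stack
            .entry(stack.to_string())
            .or_default()
            .insert(file.to_string());
    }

    /// Removes every path of one file, leaving other files untouched.
    pub fn remove_file(&mut self, file: &str) {
        // Range scan starting at the smallest possible key for this file.
        let start = (file.to_string(), String::new());
        let keys: Vec<(String, String)> = self
            .paths
            .range(start..)
            .take_while(|(key, _)| key.0.as_str() == file)
            .map(|(key, _)| key.clone())
            .collect();
        for key in keys {
            self.paths.remove(&key);
            let now_empty = match self.files_by_stack.get_mut(&key.1) {
                Some(files) => {
                    files.remove(file);
                    files.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.files_by_stack.remove(&key.1);
            }
        }
    }

    /// Lists the files that contribute root paths for a symbol stack.
    pub fn files_for_stack(&self, stack: &str) -> Vec<&str> {
        self.files_by_stack
            .get(stack)
            .map(|files| files.iter().map(String::as_str).collect())
            .unwrap_or_default()
    }
}
```

Removing a file touches only its own contiguous key range plus the global index entries for the stacks it contributed.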

Examples?

As this library currently is, it seems capable enough, but the documentation isn't too clear about how the API is used. Examples of how to use this library would help.

References resolve differently depending on context

References in a stack graph may resolve differently depending on context.

Problem

Consider the following TypeScript test:

let x = {
    f: 11
};

{
    let x = {
    };

    x.f;
  //^ defined: 6
  //  ^ defined:
}

export {};

Two variables x are defined, one shadowing the other inside the block. The x reference in x.f resolves, correctly, only to the inner definition. This shows that shadowing works correctly.

The f reference in x.f should not resolve to anything. After all, x resolved to the inner definition, which does not have fields. Unfortunately, this is not what happens. The reference resolves to the field of the outer definition.

Why is this happening?

Let's have a look at the stack graph:

When resolving x, there are two possible paths to x definitions. At the branch point, one outgoing edge has a higher priority than the other, so only one path is selected.

When resolving f, there is only one possible path, namely via the outer x to its field. Disambiguation is only applied on complete paths, which end in a definition with an empty symbol stack. Since there is no such path via the inner x, the one result path is not shadowed at all.

Exercise for the reader

Is this behavior enough to implement SAT solving with stack graphs?

What should we do about this?

The original scope graph work had similar problems, where a reference would resolve differently on its own vs. as part of a qualified reference. I have always been of the opinion that this is incorrect. I do not know of programming languages where a reference is interpreted differently depending on whether you look at it vs. when you look through it. As far as I can see now, this is not something one can fix by adding more precedences either.

Doing resolution such that a reference is always interpreted the same requires that shadowing is applied to subpaths that go from a reference to a definition. Every subpath that goes from a reference to a definition and has the same pre- and postcondition should be in the set of resolved paths for that reference (which have an empty pre- and postcondition) if the pre- and postconditions are erased.

My hunch is that this approach would retain all resolution behavior that you would expect.

Implementation

One question is whether we could resolve references we encounter on their own (disregarding the context we are in), and continue based on the shadowed result set for that reference. Some paths only work within a context, e.g. paths ending in a jump node or the root node; they have non-empty symbol stacks but should definitely be considered.

Mutable Variables::nested(globals) in release 0.3.1

I get this error when trying to build a new project. Looking at lib.rs on the current main, there is a

let mut globals ...

but maybe didn't make it to the actual release of 0.3.1?

error[E0596]: cannot borrow `globals` as mutable, as it is not declared as mutable
   --> .../.cargo/registry/src/github.com-1ecc6299db9ec823/tree-sitter-stack-graphs-0.3.1/src/lib.rs:514:9
    |
512 |           let globals = Variables::nested(globals);
    |               ------- help: consider changing this to be mutable: `mut globals`
513 |           let root_node = self.inject_node(NodeID::root());
514 | /         globals
515 | |             .add(ROOT_NODE_VAR.into(), root_node.into())
    | |________________________________________________________^ cannot borrow as mutable

I might also be using this wrong, but my Cargo.toml looks like this:

stack-graphs = {version = "0.10.1", features = ["json"]}
tree-sitter-stack-graphs = "0.3.1"
tree-sitter-graph = "0.6.1"
tree-sitter = "0.20.8"
tree-sitter-python = "0.20.2"

Support non-TSG analysis for certain files

Certain "special" files, such as project configuration files, should be analyzed and produce stack graphs to model certain language features, even though they may not be written in the language being targeted. For example, for TypeScript the tsconfig.json can define path mappings that influence how module imports are resolved. Without a way to process that file, it may be impossible to resolve these imports correctly.

Some slightly random observations about the problem:

Specific ways of turning files such as tsconfig.json or setup.py into stack graphs.
These files are not matched on file extension, but on the whole file name.
Using TSG is not always the best/easiest way to do this, and it should be possible to implement them directly in Rust.
There may not be a tree sitter grammar necessary or available for those files, so we shouldn't require that.
These analyses may depend on more information than just the source of the file. For example, tsconfig.json requires the paths of all sources in the project.

How can we support this?

My idea is to make this part of the LanguageConfiguration for the language they apply to. (Another option might be to consider them languages of their own, with separate LanguageConfiguration, but that might be difficult if they do not have corresponding Tree-sitter grammars.) I see two flavors of this:

Add some special_files field, that maps file names to custom analysis functions.
Introduce a list of (PathMatcher, Analyzer) pairs. The current file_types and content_regex would become a file extension PathMatcher, and tsg_source would become a TSG Analyzer. The special files would use a file name PathMatcher and an Analyzer written in Rust.
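
The second flavor might be shaped roughly as follows. This is only a sketch: `PathMatcher` and `Analyzer` are the names used in the list above, but their definitions, the `kind` method, and the two example analyzers are invented here for illustration.

```rust
use std::path::Path;

/// Decides whether an analyzer applies to a path.
pub enum PathMatcher {
    /// Matches on the file extension, e.g. "ts".
    Extension(String),
    /// Matches on the whole file name, e.g. "tsconfig.json".
    FileName(String),
}

impl PathMatcher {
    pub fn matches(&self, path: &Path) -> bool {
        match self {
            PathMatcher::Extension(ext) => {
                path.extension().and_then(|e| e.to_str()) == Some(ext.as_str())
            }
            PathMatcher::FileName(name) => {
                path.file_name().and_then(|n| n.to_str()) == Some(name.as_str())
            }
        }
    }
}

/// Something that can turn a file into stack graph content.
/// Only a description is modeled; a real analyzer would build graph nodes.
pub trait Analyzer {
    fn kind(&self) -> &'static str;
}

/// Analyzer driven by TSG rules (placeholder).
pub struct TsgAnalyzer;
impl Analyzer for TsgAnalyzer {
    fn kind(&self) -> &'static str {
        "tsg"
    }
}

/// Analyzer written directly in Rust for a special file (placeholder).
pub struct TsConfigAnalyzer;
impl Analyzer for TsConfigAnalyzer {
    fn kind(&self) -> &'static str {
        "rust"
    }
}

/// Returns the first analyzer whose matcher accepts the path.
pub fn select_analyzer<'a>(
    rules: &'a [(PathMatcher, Box<dyn Analyzer>)],
    path: &Path,
) -> Option<&'a dyn Analyzer> {
    rules
        .iter()
        .find(|(matcher, _)| matcher.matches(path))
        .map(|(_, analyzer)| &**analyzer)
}
```

Because the first match wins, listing file name matchers before extension matchers lets a special file take priority over the generic rule for its extension.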

The second approach seems more general. It would for example allow different TSGs for different file extensions. I am not sure that is too useful though. If a separate TSG is necessary, what are the chances that the grammar is still the same? Different language configurations might work as well. The second approach seems a bit over-engineered for now.

One way in which special files may differ from regular sources is that a file may apply to multiple languages. An example is package.json, which is used for JavaScript and TypeScript. If both languages define special analyzers, we may need to apply both?

One interesting question is how to get extra data into the analyzer. Do we expect a fixed set of data items, or should this be completely open and up to the user? For example, tsconfig.json requires paths, but another might require the contents of the files, or Git metadata? Can we rule that out?

Exit criteria

There's an established way to define stack graph analyzers for special files, identified by their file name.
Special files are correctly analyzed as part of test runs.
The special files analyzer can be used explicitly by crate users, which might require their own orchestration of the different steps.

Creating JSON/HTML output - Question

Hi, I am looking through the cli trying to understand how to use stack graphs. To help with that I am trying to print out the stack graphs that I generate to better understand the structure. I see that in cli/test.rs the StackGraph instance makes a call to to_html_string. However, this method cannot be found when I try calling it directly.

I can see that it is implemented in visualization.rs. Is there some sort of way to import this that I am missing? Sorry I'm new to rust and wasn't able to find a suitable answer.

Implement simple LSP server

Tracking issue to implement a simple LSP server that supports jump to definition.

Tasks

#238
Workspace folders are indexed on start up and when added, removed workspace folders are cleaned.
Files are reindexed when saved.
Jump to definition requests are supported.

Exit criteria

A VSCode extension can be run either as an extension host, or by symlinking to the repository.
Indexing and jump to definition queries are handled based on the stack graph rules.

Cleanup Loader logic

The stack graph language loading code can use some cleanup. Now that we have the LanguageConfiguration I think it is possible to express loading from a set of paths or loading from a TSConfig as generating appropriate LanguageConfigurations. The actual loader (which would parse and cache, but not actually load anything from disk anymore) would simply take these configurations without having to know where they came from.

Example: Modules and imports

Add tree-sitter-stack-graphs example showing how to encode modules and imports, qualified names.

Example: Function application

Add tree-sitter-stack-graphs example showing how to encode function application w/ value tracking through arguments.

Graph Visualizations extremely flat

I love the idea of being able to visualize stack graphs, however, these graph visualizations get extremely "wide"/"flat" when there are more than ~100 nodes. Looking at graphviz documentation it seems like using a layout engine such as neato or twopi might be better suited to this visualization use case.

Simplify `cli::util::Logger` and make public

The logger trait used by the CLI test and index code to report status is part of a private module, making it impossible for external clients to reimplement it, or even use the code that expects it. It should be public.

Before we do that, it has to be simplified though. Currently, the Logger has to deal with various calling scenarios. E.g., failure/skipped/success/warning may be called when processing has been called first, or when processing hasn't been called at all. The default_failure method might be called when failure has been called or when it hasn't. This means the logger has to maintain some tricky state to ensure it outputs only a single status line.

To make it easier to implement that trait, it should be simplified and the calling code has to ensure it is called in a sensible way. A proposal:

use std::path::Path;

trait Logger {
  fn started(&self, path: &Path);
  fn finished(&self, path: &Path, result: Result, details: Option<&str>);
}

enum Result {
  Success,
  Failed,
  Skipped,
  Ignored,
}

For each path, require that started and finished are called exactly once.

Additional ideas

Do we try to encode the protocol requirements in the trait? Something like:

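trait Logger {
    // Returns a token that must be used to report the outcome for this path.
    fn started(&self, path: &Path) -> impl Finish;
}

trait Finish {
    // Consuming `self` means finished can be called at most once per started.
    fn finished(self, result: Result, details: Option<&str>);
}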

How do we deal with parallel execution? Perhaps we need a method to indicate whether we'll run in parallel, so console logging can log only on finished.
Can we use https://crates.io/crates/indicatif for the console logger? This might support incremental parallel console logging.

Java TSG: Cannot jump to definitions within same package

In Java, you can omit imports within the same package.
This seems to work with the TSG, except if you use a simple package naming convention.

Repro case

Use a simple package name
Specify package name in each package file

// File example/A.java
package example;

public class A {
   public void method() {
      B instance = new B();
      instance.method(); 
   }
}

// File example/B.java
package example;

public class B {
   public void method() {
   }
}

Expected behavior

When resolving the definition of instance.method in example/A.java the definition should be resolved to B.method in example/B.java.

Actual behavior

No definition is found.

Example: Type members

Add tree-sitter-stack-graphs example showing how to encode type members such as fields or methods.

stack-graphs: Provide alternative to StackGraph::get_file_unchecked that doesn't panic

Background

There are currently 2 ways to get a Handle<File> in StackGraph:

StackGraph::get_file_unchecked, which panics if the file is not found.
StackGraph::get_or_create_file, which borrows a mutable reference.

The perceived problem

It is currently not possible to check a Handle<File> in a convenient way.

Proposed enhancement

I propose a third option is added:

StackGraph::<name to be defined>(&self, name: &S) which returns Option<Handle<File>>

Essentially, option 1 with an Option instead of a fatal error.
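
The three variants can be compared side by side in a toy model. `Graph` and `FileHandle` below are simplified stand-ins for `StackGraph` and `Handle<File>`, and `get_file` is just one possible name for the proposed method.

```rust
use std::collections::HashMap;

/// Stand-in for `Handle<File>`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FileHandle(usize);

/// Toy graph that only tracks files by name.
#[derive(Default)]
pub struct Graph {
    names: Vec<String>,
    by_name: HashMap<String, FileHandle>,
}

impl Graph {
    /// Existing option: needs `&mut self` because it may insert.
    pub fn get_or_create_file(&mut self, name: &str) -> FileHandle {
        if let Some(handle) = self.by_name.get(name) {
            return *handle;
        }
        let handle = FileHandle(self.names.len());
        self.names.push(name.to_string());
        self.by_name.insert(name.to_string(), handle);
        handle
    }

    /// Proposed option: shared borrow, `None` instead of a panic.
    pub fn get_file(&self, name: &str) -> Option<FileHandle> {
        self.by_name.get(name).copied()
    }

    /// Existing option: panics if the file is unknown.
    pub fn get_file_unchecked(&self, name: &str) -> FileHandle {
        self.get_file(name).expect("no file with the requested name")
    }
}
```

The panicking variant then becomes a thin wrapper around the fallible one, so callers can pick whichever fits their error handling.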

Thanks in advance.

Example: Namespaces

Add tree-sitter-stack-graphs example showing how to encode namespaces.

Generate Rust/Cargo based projects from `init` command

The init command generates NPM based projects. Since we are using Cargo based projects now, the init command should generate those.

Atomic pop and push node type?

Hello,

I was playing around with the stack (tree sitter used with tree sitter stack graphs) on a toy language, and while trying to write a tree sitter stack graph file I've run into problems where I get paths that don't end where I want them to. I assume it would be possible to reject these, e.g. by checking whether the path ends at a real definition node, but it would be nicer to prevent them from being generated in the first place.

To address this, it seems like it would be possible to add a node type for atomically popping a symbol from the stack and replacing it with a new one, which would prevent pathing from considering it as a valid path when the symbol stack is empty.

I realize this is unlikely to be a priority, but is there any reason why this wouldn't be possible?

As a motivating example I have a symbol I've created called "resolve-type". When I see an assignment I can push "resolve-type" followed by the rhs identifier and then the declaration pops the identifier, pops the "resolve-type" node, and pushes the real type. This works in that the actual type is resolved, but there's also an unwanted path ending at the "resolve-type" node. Making the pop of the "resolve-type" node and the subsequent push of the declared type atomic would prevent this.

Language bindings

Hey hey,

First of all, this project looks super cool!

I was wondering if you're considering publishing language bindings for this lib, similar to how there are bindings for tree-sitter in python, js, ...

I was considering using this project on top of some of the tree-sitter parsers I'm maintaining, and having bindings for python and js for this lib would be amazing.

Docs suggest that method can panic, but it does not

This comment here:

stack-graphs/stack-graphs/src/graph.rs

Line 360 in ba57851

 /// Returns the file with a particular name. Panics if there is no file with the requested 

seems to suggest that get_file can panic, but it does not.

Likewise with StackGraph::add_from_graph.

Is the comment outdated (the code seems to be newer), or should the get_file function in fact panic?

how to understand from where a function call is made?

I am using TreeSitter to parse Python code. I need to determine which file a called function comes from.

For example, I need to understand that check_files_in_directory is invoked from GPT4Readability.utils. I have already captured all the function calls.

But now I have to find out which file check_files_in_directory comes from. I am struggling to work out what the logic for this would be. Can anyone please suggest an approach?

import os
from getpass import getpass
from GPT4Readability.utils import *
import importlib.resources as pkg_resources  


def generate_readme(root_dir, output_name, model):
    """Generates a README.md file based on the python files in the provided directory

    Args:
        root_dir (str): The root directory of the python package to parse and generate a readme for
    """

    # prompt_folder_name = os.path.join(os.path.dirname(__file__), "prompts")
    # prompt_path = os.path.join(prompt_folder_name, "readme_prompt.txt")

    with pkg_resources.open_text('GPT4Readability.prompts','readme_prompt.txt') as f:         
	    inb_msg = f.read()

    # with open(prompt_path) as f:
    #     lines = f.readlines()
    # inb_msg = "".join(lines)

    file_check_result = check_files_in_directory(root_dir)

Modeling nested scoping

Problem

How to model nested scoping in stack graphs? The problematic pattern is illustrated by the following lambda term:

(\x\y.x) t t'

The question is what the stack graph model is that ensures the result of this whole term is t.

This problem occurs in dynamic languages with nested functions where we model data flow (e.g., JavaScript & Python), as well as in static languages with nested type abstractions (i.e., TypeScript & Java).

MWE

I created a minimal(-ish) working example using JavaScript that illustrates this problem, which can be found in tree-sitter-code-nav/tree/mwe-nested-scoping.
The JavaScript rules are stripped down to a set that implements a lambda calculus.
The examples directory contains several examples of different cases. The file id.js defines an id function, which works as expected. The file const.js defines a cnst function with a nested function which exhibits the problem. For comparison, there is also nested-calls.js, which does a regular function call from a function body.

The JavaScript program exhibiting the problem:

let o1 = {a:{}};
let o2 = {b:{}};

let cnst = function (x) {
    return function (y) {
        return x;
    }
}

cnst(o1)(o2).a;

The corresponding stack graph:

Discussion

The core of the problem is that this kind of scoping results in two consecutive scoped pop nodes. The second pop replaces the scope stack coming from the first pop, which is lost. However, because the inner function can see the enclosing function, a reference to the parameter of the outer function (x in our example) steps out of its own scope into the surrounding scope. In this surrounding scope, the scope stack from the first push should be used for any jumps, instead of the scope stack resulting from the pop of the inner function.

In the MWE, the point where we step out of the inner scope is marked by a DROP-ONE node (here in the rules). However, this does not have the intended behavior. It pops the top of the scope stack, but it should instead restore the outer scope stack somehow.

Possible Solution

I have sketched my idea of using a stack of scope stacks, instead of a single stack. The top of that scope stack stack behaves as the current scope stack, but the remainder of the scope stack stack is captured on push and restored on jump, so that we always restore the complete context as it was at the push. A drop simply drops the top, going to the scope context as it was before the last push (which corresponds to stepping out of the scope of the nested function into the enclosing function).

Edit: Instead of scope stack stack, I am now calling it a scope context, which contains scope stacks.

Stack graph rules: stack-graphs.pdf

Rules applied to the example to illustrate the effect during resolution: paths-for-example.pdf

Support bookmarking nodes

It can be very helpful to be able to (visually) bookmark specific nodes when debugging larger graphs, where it can be hard to quickly find back nodes of interest.

Can't create node for file and also use tsg stanzas

Here's some pseudocode for what doesn't work

let file_handle = graph.get_or_create_file(filename);
let new_id = graph.new_node_id(file_handle);
let symbol = graph.add_symbol(filename);
let file_node = graph.add_*_node.unwrap()
if is python_file {
     language.build_stack_graph_into(...);
}

with

panicked at 'called `Option::unwrap()` on a `None` value', .../cargo/registry/src/github.com-1ecc6299db9ec823/tree-sitter-stack-graphs-0.3.0/src/lib.rs:694:14

However, moving

let new_id ...
let file_node ...

after the if python_file block works.

From tracing the error and debugging in vscode it seems like maybe tsg is trying to create an id for a file that is already in the graph (???) and then returns none up. I don't really know if that's what is actually happening though.

1. Could a more verbose/explanatory error be passed up?
2. How am I supposed to create a node for the file, pass it in through globals, and then link it through the grammar?

The second question brings up another issue. When putting nodes into the globals, what is the right way to pass them in? Handle<Node> doesn't seem to convert into a tree_sitter_graph::graph::Value, but globals seems to accept tsg::graph::GraphNode or GraphNodeRef. Can I convert Handle<Node> to either of those?

Hi Douglas and Patrick!

Congratulations on the precise code navigation release! All of this work is super exciting.

For larger teams working in monorepos, there can be a lot of interop/references across languages and services defined within a single repository. Do you think it might make sense for stack-graphs/tree-sitter-graph to eventually accommodate custom rules/configuration so users can augment their graphs in these contexts? Perhaps these custom rules or stanzas could overlay the default language-specific logic already being applied.

With something like sourcegraph, end users can upload their own LSIF indices to this end. It would be really great if this were eventually possible with GitHub's code navigation.

Thanks for your consideration, and thank you for this project!
Brian

Originally posted by @bts in #46

Java TSG: Annotations as parameter values to annotations break TSG rule

Repro case

@Cat(name = "Garfield", description = "A feline", taxonomy = @Feline)

Affected TSG rule

(marker_annotation name: (_) @name) @this {
  node @this.ref
  attr (@this.ref) node_reference = @name
}

Please help me

Can you tell me how to use this stack graph repository to implement code navigation for local python projects? I would be very grateful if you could give me a specific example.

Add support for selectively visualizing incomplete or shadowed paths

Visualizing incomplete or shadowed paths can be very useful for debugging incorrect or missing resolution paths. At the moment, these are not included in the visualization, because doing so for all incomplete paths would result in too much data.

The idea is to selectively include this data when a flag is specified on the command line. This could be something like --include-all-paths PATH:LINE:COLUMN, which would cause the inclusion of incomplete paths for the reference(s) at the given location.
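As a rough illustration of how the PATH:LINE:COLUMN argument could be parsed, here is a small sketch. The --include-all-paths flag is only a proposal, and SourceLocation and parse_location are hypothetical names, not existing CLI code. Splitting from the right is an assumption made so that paths containing a colon (such as Windows drive letters) stay intact.

```rust
/// A reference position given on the command line as PATH:LINE:COLUMN.
/// Hypothetical type, used only for this sketch.
#[derive(Debug, PartialEq)]
struct SourceLocation {
    path: String,
    line: usize,
    column: usize,
}

/// Parse "PATH:LINE:COLUMN" into its three parts.
fn parse_location(arg: &str) -> Result<SourceLocation, String> {
    let usage = || format!("expected PATH:LINE:COLUMN, got {:?}", arg);

    // Split from the right: the last two fields are numbers, and everything
    // before them is the path, even if the path itself contains ':'.
    let mut parts = arg.rsplitn(3, ':');
    let column = parts.next().ok_or_else(&usage)?;
    let line = parts.next().ok_or_else(&usage)?;
    let path = parts.next().ok_or_else(&usage)?;
    if path.is_empty() {
        return Err(usage());
    }

    let line = line
        .parse::<usize>()
        .map_err(|e| format!("invalid line {:?}: {}", line, e))?;
    let column = column
        .parse::<usize>()
        .map_err(|e| format!("invalid column {:?}: {}", column, e))?;

    Ok(SourceLocation { path: path.to_string(), line, column })
}
```

Whether lines and columns are 0-based or 1-based is left open here; the sketch only checks that they are non-negative integers.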

I imagine that these incomplete paths are included in the set of paths that one can cycle through when clicking a reference, and perhaps also are highlighted when hovering the node. Being able to render these in a different color than the complete paths might be useful.

More details will be filled in here later.

Benchmark arena-allocated lists and ensure they display desired performance properties.

As discussed in #5

Creating Stack Graph of Nested Dirs/Files - Question

I was wondering what way is suggested to build a stack graph of nested directories with files in those directories. Would it involve a recursive call on directories and then calling StackGraph::add_from_graph(someMasterGraph, fileStackGraph);? Is there any example tsg_source anyone would be willing to provide to create nodes for each directory and nodes for each file, then connect the files to the directory/would that even work? I'm essentially looking to create something like this:

with each f*.py having their own stack graph (that of course connect files/methods/functions with the right definitions, possibly from directories up or across).

Any suggestions on which direction I should head?

I really think what y'all have built is so neat, just trying to learn how to use it! Super cool stuff.
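For the recursive part of the question, a sketch of the directory walk using only the standard library might look like the following. It does not answer how directory and file nodes should be modeled in TSG; it only collects the files, and the comment marks where a per-file stack graph could be built and then merged (for example with the add_from_graph call mentioned above). collect_source_files is a hypothetical helper, and symlink cycles are not handled.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect all files under `dir` whose extension equals
/// `extension` (given without the dot, e.g. "py").
fn collect_source_files(
    dir: &Path,
    extension: &str,
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into nested directories.
            collect_source_files(&path, extension, out)?;
        } else if path.extension().and_then(|e| e.to_str()) == Some(extension) {
            // Assumption: this is the point where a stack graph for the file
            // would be built and merged into the overall graph.
            out.push(path);
        }
    }
    Ok(())
}
```

Collecting the paths first keeps the traversal separate from graph construction, so the same list can be reused for incremental rebuilds of individual files.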

Example: Parametric polymorphism

Add tree-sitter-stack-graphs example showing how to encode generics or some other form of parametric polymorphism.

Edges from root usually should have a name pop node

This:

stack-graphs/languages/tree-sitter-stack-graphs-java/src/stack-graphs.tsg

Line 42 in 44575d6

edge ROOT_NODE -> @prog.defs

should probably be split to have a pop node for the file name in the middle, unless Java files don't define namespaces.

Java TSG: Duplicate formal parameter error

Documenting an error case in Java TSG:

Example:

public class Foo {
    public static void main(String args[]) {
    }
}

This results in:
1: Duplicate variable [syntax node formal_parameter (2, 27)].lexical_scope

Support setting global variables in tests

TSG files can declare global variables that must be provided by the caller. Tests cannot currently specify the values of arbitrary globals, which may result in untestable TSGs.

To solve this, tests should allow specifying the value of global variables in comments. For example, something like this:

// --- global: GLOBAL_VAR=value ---

Initially this would support string values only.

In a multi file test (with --- path: ... --- markers) the globals are set only for the file in which they appear. Optionally, globals that appear before the first --- path marker could be set for all files in the test.
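A sketch of how such a directive could be recognized is shown below. The syntax is the one proposed above and is not implemented; parse_global_directive and collect_globals are hypothetical names, and the comment prefix is passed in because it depends on the language under test. Per-file scoping with --- path: ... --- markers is not covered.

```rust
use std::collections::HashMap;

/// Parse one line of the form `<comment> --- global: NAME=value ---`.
/// Returns the name and the (string) value, or None if the line is not
/// a global directive.
fn parse_global_directive<'a>(
    line: &'a str,
    comment_prefix: &str,
) -> Option<(&'a str, &'a str)> {
    let body = line.trim().strip_prefix(comment_prefix)?.trim();
    let body = body.strip_prefix("---")?.strip_suffix("---")?.trim();
    let assignment = body.strip_prefix("global:")?.trim();
    let (name, value) = assignment.split_once('=')?;
    let name = name.trim();
    if name.is_empty() {
        return None;
    }
    Some((name, value.trim()))
}

/// Collect all global directives in a test source.
/// Later directives override earlier ones with the same name.
fn collect_globals(source: &str, comment_prefix: &str) -> HashMap<String, String> {
    source
        .lines()
        .filter_map(|line| parse_global_directive(line, comment_prefix))
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}
```

Only string values are handled, matching the initial scope described above.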
