Comments (9)

renkun-ken commented on June 6, 2024

I experimented with the parse data; the following is a very rough demo of detecting local variables. I wrote the functions below to find all expressions enclosing a given document location and to detect the local assignments in those scopes, which are the symbols available at that location.

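# `data` is the raw integer matrix attr(e, "srcfile")$parseData:
# rows 1-4 hold line1, col1, line2, col2 and row 7 holds the id.
detect_enclosing_exprs <- function(data, line, col) {
  # keep entries that start at or before (line, col) and end at or after it
  sel <- (data[1, ] < line | (data[1, ] == line & data[2, ] <= col)) &
    (data[3, ] > line | (data[3, ] == line & data[4, ] >= col))
  data[7, sel]
}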

detect_assign_symbol <- function(data, id) {
  s <- which(data$parent == id)
  if (length(s) == 3L) {
    ids <- data$id[s]
    tokens <- data$token[s]
    if (tokens[1] == "expr" && tokens[3] == "expr") {
      if (tokens[[2]] %in% c("LEFT_ASSIGN", "EQ_ASSIGN")) {
        symbol_id <- ids[1]
        value_id <- ids[3]
      } else if (tokens[[2L]] %in% c("RIGHT_ASSIGN")) {
        symbol_id <- ids[3]
        value_id <- ids[1]
      } else {
        return(NULL)
      }
      s <- which(data$parent == symbol_id)
      if (length(s) == 1L) {
        if (data$token[s] == "SYMBOL") {
          return(list(symbol = data$text[s], id = id, value_id = value_id))
        }
      }
    }
  }
}

detect_assign_symbols <- function(data, id, env, level = 0L) {
  row <- which(data$id == id)
  if (data$done[[row]]) return(NULL)
  parent <- data$parent[[row]]
  siblings <- data$id[data$parent == parent]
  data$done[[row]] <- TRUE
  for (sibling in siblings) {
    if (sibling == id) next
    srow <- which(data$id == sibling)
    if (data$terminal[[srow]]) next
    if (data$token[[srow]] %in% c("expr", "equal_assign")) {
      info <- detect_assign_symbol(data, sibling)
      if (length(info)) {
        env[[info$symbol]] <- info$id
        Recall(data, info$value_id, env, level + 1L)
      } else {
        Recall(data, sibling, env, level + 1L)
      }
    }
  }
}

Suppose the following code is the source code we are editing:

x0 <- 1
x1 <- 1 + x0
x2 <- 2 + x0 + abs(x0)

f <- function(x, y) {
  x + y
}

if (x1 > 0) {
  x2 <- 1
}

p <- local({
  m <- 1
  n <- 2

  m + n
})

g <- function(x) {
  y <- z <- 1
  bar <- function(z) {
    p <- 1 + y
    a = 2 + z
    3 + z -> c
    names(p) <- "a"
    # comment here
    x +     y + z
  }
}

Suppose we want to know the symbols visible at a certain line and column, e.g. (28, 10), which falls in the gap between "x +" and "y" on the last line of bar. First we detect all enclosing expressions:

e <- parse("test1.R", keep.source = TRUE)
f <- attr(e, "srcfile")
p <- getParseData(e)
p[p$id %in% detect_enclosing_exprs(f$parseData, 28, 10), ]

We get

    line1 col1 line2 col2  id parent token terminal text
283    20    1    30    1 283      0  expr    FALSE     
282    20    6    30    1 282    283  expr    FALSE     
279    20   18    30    1 279    282  expr    FALSE     
274    22    3    29    3 274    279  expr    FALSE     
273    22   10    29    3 273    274  expr    FALSE     
270    22   22    29    3 270    273  expr    FALSE     
265    28    5    28   17 265    270  expr    FALSE     
261    28    5    28   13 261    265  expr    FALSE     

This lists every enclosing expression, from the top level down to the innermost one. In theory, if we can detect all assignments whose parents are these expressions, we get the symbols visible at the given location.
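
The same containment test can also be written against the data frame returned by getParseData() instead of the raw matrix. The following is only a minimal sketch: the helper name enclosing_exprs and the toy snippet are made up for illustration, and the code is parsed from a string rather than from test1.R.

```r
# Hypothetical helper: rows of the parse data whose range covers (line, col).
enclosing_exprs <- function(pd, line, col) {
  starts_before <- pd$line1 < line | (pd$line1 == line & pd$col1 <= col)
  ends_after <- pd$line2 > line | (pd$line2 == line & pd$col2 >= col)
  # drop terminal tokens so that only enclosing expressions remain
  pd[starts_before & ends_after & !pd$terminal, ]
}

# Toy input (an assumption, not the test1.R file from the thread)
code <- "g <- function(x) {\n  y <- 1\n  x + y\n}"
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Expressions enclosing line 3, column 3 (the `x` in `x + y`)
enc <- enclosing_exprs(pd, 3, 3)
print(enc[, c("line1", "col1", "line2", "col2", "id", "parent", "token")])
```

Using column names instead of row indices makes the comparison easier to read, at the cost of building the data frame first.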

p <- cbind(getParseData(e), done = FALSE)
env <- new.env()
enclose_ids <- p[p$id %in% detect_enclosing_exprs(f$parseData, 28, 10), "id"]
for (id in enclose_ids) {
  detect_assign_symbols(p, id, env)
}
ls.str(env)

Except for the function arguments (which I haven't implemented yet), all symbols visible at the given location (28, 10) are correctly detected. Each value is the id of the assignment expression:

a :  int 220
c :  int 233
f :  int 77
g :  int 283
p :  int 206
x0 :  int 7
x1 :  int 20
x2 :  int 44
y :  int 183
z :  int 182

This is a very early demonstration that correct lexical analysis of local scopes is entirely feasible with parse data, which should hopefully allow us to implement most LSP providers.

from languageserver.

renkun-ken commented on June 6, 2024

The demo functions are updated so that both assignments and function formals are included:

detect_assign_symbol <- function(data, id) {
  s <- which(data$parent == id)
  if (length(s) == 3L) {
    ids <- data$id[s]
    tokens <- data$token[s]
    if (tokens[1] == "expr" && tokens[3] == "expr") {
      if (tokens[[2]] %in% c("LEFT_ASSIGN", "EQ_ASSIGN")) {
        symbol_id <- ids[1]
        value_id <- ids[3]
      } else if (tokens[[2L]] %in% c("RIGHT_ASSIGN")) {
        symbol_id <- ids[3]
        value_id <- ids[1]
      } else {
        return(NULL)
      }
      s <- which(data$parent == symbol_id)
      if (length(s) == 1L) {
        if (data$token[s] == "SYMBOL") {
          return(list(symbol = data$text[s], 
            id = id, 
            symbol_id = data$id[s],
            value_id = value_id))
        }
      }
    }
  }
}

detect_assign_symbols <- function(data, id, env, level = 0L) {
  row <- which(data$id == id)
  if (data$done[[row]]) return(NULL)
  parent <- data$parent[[row]]
  siblings <- data$id[data$parent == parent]
  data$done[[row]] <- TRUE
  for (sibling in siblings) {
    if (sibling == id) next
    srow <- which(data$id == sibling)
    if (data$terminal[[srow]]) next
    if (data$token[[srow]] %in% c("expr", "equal_assign")) {
      info <- detect_assign_symbol(data, sibling)
      if (length(info)) {
        env[[info$symbol]] <- data[data$id == info$symbol_id,]
        Recall(data, info$value_id, env, level + 1L)
      } else {
        Recall(data, sibling, env, level + 1L)
      }
    }
  }
}

detect_function_symbols <- function(data, id, env) {
  siblings <- data$id[data$parent == id]
  if (length(siblings) && 
    data$token[data$id == siblings[1]] == "FUNCTION") {
    for (sibling in siblings) {
      srow <- which(data$id == sibling)
      if (data$token[[srow]] == "SYMBOL_FORMALS") {
        env[[data$text[[srow]]]] <- data[srow, ]
      }
    }
  }
}

p <- cbind(getParseData(e), done = FALSE)
env <- new.env()
enclose_ids <- p$id[p$id %in% detect_enclosing_exprs(f$parseData, 28, 10)]
for (id in enclose_ids) {
  detect_function_symbols(p, id, env)
  detect_assign_symbols(p, id, env)
}

do.call(rbind, as.list.environment(env))

   line1 col1 line2 col2  id parent          token terminal text  done
x     20   15    20   15 168    291 SYMBOL_FORMALS     TRUE    x FALSE
y     21    3    21    3 173    175         SYMBOL     TRUE    y FALSE
z     22   19    22   19 191    282 SYMBOL_FORMALS     TRUE    z FALSE
a     24    5    24    5 218    220         SYMBOL     TRUE    a FALSE
c     25   14    25   14 239    241         SYMBOL     TRUE    c FALSE
f      5    1     5    1  49     51         SYMBOL     TRUE    f FALSE
g     20    1    20    1 163    165         SYMBOL     TRUE    g FALSE
m     22   22    22   22 194    282 SYMBOL_FORMALS     TRUE    m FALSE
n     22   25    22   25 197    282 SYMBOL_FORMALS     TRUE    n FALSE
p     23    5    23    5 205    207         SYMBOL     TRUE    p FALSE
x0     1    1     1    2   1      3         SYMBOL     TRUE   x0 FALSE
x1     2    1     2    2  10     12         SYMBOL     TRUE   x1 FALSE
x2     3    1     3    2  23     25         SYMBOL     TRUE   x2 FALSE

All symbols in scope are detected with their locations.

randy3k commented on June 6, 2024

@renkun-ken In the long run, I think we will need to parse the expressions more systematically. lintr uses xmlparsedata to make searching and querying the parse tree easier. We may want to move in that direction too.

randy3k commented on June 6, 2024

It would be lovely to have. Admittedly, it is not that simple; for example, scopes can be nested:

foo <- function() {
  x <- 1
  bar <- function() {
    y <- 2
    bla <- function() {
      z <- 3
    }
  }
}
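
To make the nesting concrete, here is a rough sketch that walks up the parent chain of a token in the parse data and counts how many function definitions enclose it. The helper name function_depth is hypothetical, and treating "an expr with a FUNCTION child" as a function scope is a simplifying assumption for this demo.

```r
code <- "
foo <- function() {
  x <- 1
  bar <- function() {
    y <- 2
    bla <- function() {
      z <- 3
    }
  }
}"
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Hypothetical helper: number of function definitions enclosing node `id`.
function_depth <- function(pd, id) {
  depth <- 0L
  while (id != 0L) {
    # an expr with a FUNCTION child is a `function(...) ...` definition
    if ("FUNCTION" %in% pd$token[pd$parent == id]) depth <- depth + 1L
    id <- pd$parent[pd$id == id]  # move one level up; 0 means top level
  }
  depth
}

# Look up the assigned symbols by name and compute their nesting depth
symbol_id <- function(name) pd$id[pd$token == "SYMBOL" & pd$text == name]
depths <- vapply(c("x", "y", "z"),
                 function(s) function_depth(pd, symbol_id(s)), integer(1))
print(depths)
```

A symbol lookup then has to consider each of these enclosing function scopes, innermost first.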

At some point, I think we could move the parser to a separate package. Maybe even use Rcpp to move some code to C++.

randy3k commented on June 6, 2024

codetools may be useful here https://cran.r-project.org/web/packages/codetools/index.html
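
As a rough illustration of what codetools offers, the sketch below uses its exported findLocals() and findGlobals() on a small function. The example function g and the free variable w are made up for this demo; note that codetools works on language objects rather than on source locations, so it does not by itself map symbols back to lines and columns.

```r
library(codetools)

# Toy function: `w` is deliberately left undefined so it shows up as a global
g <- function(x) {
  y <- 1
  bar <- function(z) {
    x + y + z + w
  }
  bar(y)
}

# Names assigned in the body of g (nested function bodies are not searched)
locals <- findLocals(body(g))

# Global functions and variables used by g, reported separately
globals <- findGlobals(g, merge = FALSE)

print(locals)
print(globals$variables)
```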

renkun-ken commented on June 6, 2024

I noticed that getParseData() can be very useful for providing parser information for the full text. For example, take the following code:

x0 <- 1
x1 <- 1 + x0
x2 <- 2 + x0 + abs(x0)

f <- function(x, y) {
  x + y
}

g <- function(x) {
  y <- 1
  bar <- function(z) {
    x + y + z
  }
}

We can then use getParseData() to generate a data frame that maps each token and expression to its location; it is built by reshaping attr(e, "srcfile")$parseData.

e <- parse("test1.R", keep.source = TRUE)
getParseData(e)

    line1 col1 line2 col2  id parent                token terminal     text
7       1    1     1    7   7      0                 expr    FALSE         
1       1    1     1    2   1      3               SYMBOL     TRUE       x0
3       1    1     1    2   3      7                 expr    FALSE         
2       1    4     1    5   2      7          LEFT_ASSIGN     TRUE       <-
4       1    7     1    7   4      5            NUM_CONST     TRUE        1
5       1    7     1    7   5      7                 expr    FALSE         
20      2    1     2   12  20      0                 expr    FALSE         
10      2    1     2    2  10     12               SYMBOL     TRUE       x1
12      2    1     2    2  12     20                 expr    FALSE         
11      2    4     2    5  11     20          LEFT_ASSIGN     TRUE       <-
19      2    7     2   12  19     20                 expr    FALSE         
13      2    7     2    7  13     14            NUM_CONST     TRUE        1
14      2    7     2    7  14     19                 expr    FALSE         
15      2    9     2    9  15     19                  '+'     TRUE        +
16      2   11     2   12  16     18               SYMBOL     TRUE       x0
18      2   11     2   12  18     19                 expr    FALSE         
44      3    1     3   22  44      0                 expr    FALSE         
23      3    1     3    2  23     25               SYMBOL     TRUE       x2
25      3    1     3    2  25     44                 expr    FALSE         
24      3    4     3    5  24     44          LEFT_ASSIGN     TRUE       <-
43      3    7     3   22  43     44                 expr    FALSE         
32      3    7     3   12  32     43                 expr    FALSE         
26      3    7     3    7  26     27            NUM_CONST     TRUE        2
27      3    7     3    7  27     32                 expr    FALSE         
28      3    9     3    9  28     32                  '+'     TRUE        +
29      3   11     3   12  29     31               SYMBOL     TRUE       x0
31      3   11     3   12  31     32                 expr    FALSE         
30      3   14     3   14  30     43                  '+'     TRUE        +
41      3   16     3   22  41     43                 expr    FALSE         
33      3   16     3   18  33     35 SYMBOL_FUNCTION_CALL     TRUE      abs
35      3   16     3   18  35     41                 expr    FALSE         
34      3   19     3   19  34     41                  '('     TRUE        (
36      3   20     3   21  36     38               SYMBOL     TRUE       x0
38      3   20     3   21  38     41                 expr    FALSE         
37      3   22     3   22  37     41                  ')'     TRUE        )
77      5    1     7    1  77      0                 expr    FALSE         
49      5    1     5    1  49     51               SYMBOL     TRUE        f
51      5    1     5    1  51     77                 expr    FALSE         
50      5    3     5    4  50     77          LEFT_ASSIGN     TRUE       <-
76      5    6     7    1  76     77                 expr    FALSE         
52      5    6     5   13  52     76             FUNCTION     TRUE function
53      5   14     5   14  53     76                  '('     TRUE        (
54      5   15     5   15  54     76       SYMBOL_FORMALS     TRUE        x
55      5   16     5   16  55     76                  ','     TRUE        ,
57      5   18     5   18  57     76       SYMBOL_FORMALS     TRUE        y
58      5   19     5   19  58     76                  ')'     TRUE        )
73      5   21     7    1  73     76                 expr    FALSE         
60      5   21     5   21  60     73                  '{'     TRUE        {
68      6    3     6    7  68     73                 expr    FALSE         
62      6    3     6    3  62     64               SYMBOL     TRUE        x
64      6    3     6    3  64     68                 expr    FALSE         
63      6    5     6    5  63     68                  '+'     TRUE        +
65      6    7     6    7  65     67               SYMBOL     TRUE        y
67      6    7     6    7  67     68                 expr    FALSE         
71      7    1     7    1  71     73                  '}'     TRUE        }
139     9    1    14    1 139      0                 expr    FALSE         
82      9    1     9    1  82     84               SYMBOL     TRUE        g
84      9    1     9    1  84    139                 expr    FALSE         
83      9    3     9    4  83    139          LEFT_ASSIGN     TRUE       <-
138     9    6    14    1 138    139                 expr    FALSE         
85      9    6     9   13  85    138             FUNCTION     TRUE function
86      9   14     9   14  86    138                  '('     TRUE        (
87      9   15     9   15  87    138       SYMBOL_FORMALS     TRUE        x
88      9   16     9   16  88    138                  ')'     TRUE        )
135     9   18    14    1 135    138                 expr    FALSE         
90      9   18     9   18  90    135                  '{'     TRUE        {
98     10    3    10    8  98    135                 expr    FALSE         
92     10    3    10    3  92     94               SYMBOL     TRUE        y
94     10    3    10    3  94     98                 expr    FALSE         
93     10    5    10    6  93     98          LEFT_ASSIGN     TRUE       <-
95     10    8    10    8  95     96            NUM_CONST     TRUE        1
96     10    8    10    8  96     98                 expr    FALSE         
130    11    3    13    3 130    135                 expr    FALSE         
101    11    3    11    5 101    103               SYMBOL     TRUE      bar
103    11    3    11    5 103    130                 expr    FALSE         
102    11    7    11    8 102    130          LEFT_ASSIGN     TRUE       <-
129    11   10    13    3 129    130                 expr    FALSE         
104    11   10    11   17 104    129             FUNCTION     TRUE function
105    11   18    11   18 105    129                  '('     TRUE        (
106    11   19    11   19 106    129       SYMBOL_FORMALS     TRUE        z
107    11   20    11   20 107    129                  ')'     TRUE        )
126    11   22    13    3 126    129                 expr    FALSE         
109    11   22    11   22 109    126                  '{'     TRUE        {
121    12    5    12   13 121    126                 expr    FALSE         
117    12    5    12    9 117    121                 expr    FALSE         
111    12    5    12    5 111    113               SYMBOL     TRUE        x
113    12    5    12    5 113    117                 expr    FALSE         
112    12    7    12    7 112    117                  '+'     TRUE        +
114    12    9    12    9 114    116               SYMBOL     TRUE        y
116    12    9    12    9 116    117                 expr    FALSE         
115    12   11    12   11 115    121                  '+'     TRUE        +
118    12   13    12   13 118    120               SYMBOL     TRUE        z
120    12   13    12   13 120    121                 expr    FALSE         
124    13    3    13    3 124    126                  '}'     TRUE        }
133    14    1    14    1 133    135                  '}'     TRUE        }

I think this provides everything we need to implement local symbols, references, definitions, etc.
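
For instance, a naive "find references" lookup can be sketched directly on this data frame. The helper name find_symbol is hypothetical, and the sketch ignores scoping entirely: it simply matches SYMBOL tokens by name.

```r
code <- c("x0 <- 1", "x1 <- 1 + x0", "x2 <- 2 + x0 + abs(x0)")
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Hypothetical helper: locations of every SYMBOL token with the given name.
find_symbol <- function(pd, name) {
  hit <- pd$token == "SYMBOL" & pd$text == name
  pd[hit, c("line1", "col1", "line2", "col2")]
}

# One row per occurrence of x0; the first row is the assignment target
refs <- find_symbol(pd, "x0")
print(refs)
```

A real provider would additionally restrict the matches to the scope in which the symbol is defined.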

randy3k commented on June 6, 2024

Yeah, I have been peeking at how styler, lintr and codetools do things.

andycraig commented on June 6, 2024

> At some point, I think we could move the parser to a separate package. Maybe even use Rcpp to move some code to C++.

Yesterday I came across another package that might be interesting here: sourcetools. On CRAN only two functions are documented, but there's a lot of undocumented work on parsing. In particular, see https://github.com/kevinushey/sourcetools/tree/devel/inst/include/sourcetools

There's even work on completion and diagnostics, although they're incomplete stubs and it doesn't look like there's been much work in the last two years. I'm guessing it was a project to extract a lot of the RStudio code analysis features out into an R package.

renkun-ken commented on June 6, 2024

@randy3k Thanks for pointing to xmlparsedata! This certainly makes life easier using XPath. It looks like we can do a lot with it.
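
As a small sketch of what this could look like, the following converts the parse data to XML and queries it with XPath. It assumes the xmlparsedata and xml2 packages are installed; the toy function and the two XPath expressions are illustrative choices, not something taken from lintr or languageserver.

```r
library(xmlparsedata)
library(xml2)

# Toy input (an assumption for this demo)
code <- "f <- function(x, y) {\n  z <- x + y\n  z\n}"
e <- parse(text = code, keep.source = TRUE)

# Convert the parse data to an XML document
xml <- read_xml(xml_parse_data(e))

# Function formals: every SYMBOL_FORMALS node
formal_names <- xml_text(xml_find_all(xml, "//SYMBOL_FORMALS"))

# Targets of `<-`: the symbol in the first child expr of an assignment
targets <- xml_find_all(xml, "//expr[LEFT_ASSIGN]/expr[1]/SYMBOL")
assigned_names <- xml_text(targets)

print(formal_names)
print(assigned_names)
print(xml_attr(targets, "line1"))  # locations are stored as attributes
```

Right assignments and `=` assignments would need their own XPath expressions in the same style.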
