Comments (9)
I did some experiments with the parse data; the following is a very rough demo of detecting local variables. I wrote the functions below to find all expressions enclosing a given document location and to collect the local assignments in those scopes, which are the symbols available at that location.
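# `data` is the raw srcfile$parseData matrix: rows 1-4 hold line1, col1,
# line2 and col2, and row 7 holds the expression id.
detect_enclosing_exprs <- function(data, line, col) {
  sel <- (data[1, ] < line | (data[1, ] == line & data[2, ] <= col)) &
    (data[3, ] > line | (data[3, ] == line & data[4, ] >= col))
  data[7, sel]
}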
# `data` is the getParseData() data frame. Returns the assigned symbol if the
# expression `id` has the shape `expr <assign-op> expr`, otherwise NULL.
detect_assign_symbol <- function(data, id) {
  s <- which(data$parent == id)
  if (length(s) == 3L) {
    ids <- data$id[s]
    tokens <- data$token[s]
    if (tokens[1] == "expr" && tokens[3] == "expr") {
      if (tokens[[2]] %in% c("LEFT_ASSIGN", "EQ_ASSIGN")) {
        symbol_id <- ids[1]
        value_id <- ids[3]
      } else if (tokens[[2L]] %in% c("RIGHT_ASSIGN")) {
        # `value -> symbol`: the target is on the right-hand side
        symbol_id <- ids[3]
        value_id <- ids[1]
      } else {
        return(NULL)
      }
      # Only a bare symbol counts as a target (not e.g. names(p) <- ...)
      s <- which(data$parent == symbol_id)
      if (length(s) == 1L) {
        if (data$token[s] == "SYMBOL") {
          return(list(symbol = data$text[s], id = id, value_id = value_id))
        }
      }
    }
  }
}
# Scans the siblings of expression `id` for assignments and records them in
# `env`, recursing into assigned values (to catch chains like y <- z <- 1).
detect_assign_symbols <- function(data, id, env, level = 0L) {
  row <- which(data$id == id)
  if (data$done[[row]]) return(NULL)
  parent <- data$parent[[row]]
  siblings <- data$id[data$parent == parent]
  # Note: `data` is copied on modification, so this flag is only visible to
  # the deeper recursive calls below, not to the caller.
  data$done[[row]] <- TRUE
  for (sibling in siblings) {
    if (sibling == id) next
    srow <- which(data$id == sibling)
    if (data$terminal[[srow]]) next
    if (data$token[[srow]] %in% c("expr", "equal_assign")) {
      info <- detect_assign_symbol(data, sibling)
      if (length(info)) {
        env[[info$symbol]] <- info$id
        Recall(data, info$value_id, env, level + 1L)
      } else {
        Recall(data, sibling, env, level + 1L)
      }
    }
  }
}
Suppose the following code is the source code we are editing:
x0 <- 1
x1 <- 1 + x0
x2 <- 2 + x0 + abs(x0)
f <- function(x, y) {
  x + y
}
if (x1 > 0) {
  x2 <- 1
}
p <- local({
  m <- 1
  n <- 2
  m + n
})
g <- function(x) {
  y <- z <- 1
  bar <- function(z) {
    p <- 1 + y
    a = 2 + z
    3 + z -> c
    names(p) <- "a"
    # comment here
    x + y + z
  }
}
Suppose we want to know the symbols local to a certain line and column, e.g. (28, 10), which falls inside the x + y + z expression in the body of bar. First we detect all enclosing expressions:
e <- parse("test1.R", keep.source = TRUE)
f <- attr(e, "srcfile")
p <- getParseData(e)
p[p$id %in% detect_enclosing_exprs(f$parseData, 28, 10), ]
We get
    line1 col1 line2 col2  id parent token terminal text
283    20    1    30    1 283      0  expr    FALSE
282    20    6    30    1 282    283  expr    FALSE
279    20   18    30    1 279    282  expr    FALSE
274    22    3    29    3 274    279  expr    FALSE
273    22   10    29    3 273    274  expr    FALSE
270    22   22    29    3 270    273  expr    FALSE
265    28    5    28   17 265    270  expr    FALSE
261    28    5    28   13 261    265  expr    FALSE
This includes every enclosing expression, from the local scope all the way up to the top level. In theory, if we can detect all assignments whose parents are these expressions, we get the symbols local to the given location.
p <- cbind(getParseData(e), done = FALSE)
env <- new.env()
enclose_ids <- p[p$id %in% detect_enclosing_exprs(f$parseData, 28, 10), "id"]
for (id in enclose_ids) {
  detect_assign_symbols(p, id, env)
}
ls.str(env)
Except for function arguments (which I haven't implemented yet), all symbols local to the given location (28, 10) are correctly detected.
a : int 220
c : int 233
f : int 77
g : int 283
p : int 206
x0 : int 7
x1 : int 20
x2 : int 44
y : int 183
z : int 182
This is a very early demonstration that correct lexical analysis of local scopes is feasible using parse data, which would hopefully allow us to implement most LSP providers.
from languageserver.
The demo functions are updated so that both assignments and function formals are detected, together with their locations:
detect_assign_symbol <- function(data, id) {
  s <- which(data$parent == id)
  if (length(s) == 3L) {
    ids <- data$id[s]
    tokens <- data$token[s]
    if (tokens[1] == "expr" && tokens[3] == "expr") {
      if (tokens[[2]] %in% c("LEFT_ASSIGN", "EQ_ASSIGN")) {
        symbol_id <- ids[1]
        value_id <- ids[3]
      } else if (tokens[[2L]] %in% c("RIGHT_ASSIGN")) {
        symbol_id <- ids[3]
        value_id <- ids[1]
      } else {
        return(NULL)
      }
      s <- which(data$parent == symbol_id)
      if (length(s) == 1L) {
        if (data$token[s] == "SYMBOL") {
          return(list(symbol = data$text[s],
                      id = id,
                      symbol_id = data$id[s],
                      value_id = value_id))
        }
      }
    }
  }
}
detect_assign_symbols <- function(data, id, env, level = 0L) {
  row <- which(data$id == id)
  if (data$done[[row]]) return(NULL)
  parent <- data$parent[[row]]
  siblings <- data$id[data$parent == parent]
  data$done[[row]] <- TRUE
  for (sibling in siblings) {
    if (sibling == id) next
    srow <- which(data$id == sibling)
    if (data$terminal[[srow]]) next
    if (data$token[[srow]] %in% c("expr", "equal_assign")) {
      info <- detect_assign_symbol(data, sibling)
      if (length(info)) {
        # Store the parse-data row of the symbol itself, including its location
        env[[info$symbol]] <- data[data$id == info$symbol_id, ]
        Recall(data, info$value_id, env, level + 1L)
      } else {
        Recall(data, sibling, env, level + 1L)
      }
    }
  }
}
# If expression `id` is a function definition, record its formals in `env`.
detect_function_symbols <- function(data, id, env) {
  siblings <- data$id[data$parent == id]
  if (length(siblings) &&
      data$token[data$id == siblings[1]] == "FUNCTION") {
    for (sibling in siblings) {
      srow <- which(data$id == sibling)
      if (data$token[[srow]] == "SYMBOL_FORMALS") {
        env[[data$text[[srow]]]] <- data[srow, ]
      }
    }
  }
}
p <- cbind(getParseData(e), done = FALSE)
env <- new.env()
enclose_ids <- p$id[p$id %in% detect_enclosing_exprs(f$parseData, 28, 10)]
for (id in enclose_ids) {
  detect_function_symbols(p, id, env)
  detect_assign_symbols(p, id, env)
}
do.call(rbind, as.list.environment(env))
   line1 col1 line2 col2  id parent          token terminal text  done
x     20   15    20   15 168    291 SYMBOL_FORMALS     TRUE    x FALSE
y     21    3    21    3 173    175         SYMBOL     TRUE    y FALSE
z     22   19    22   19 191    282 SYMBOL_FORMALS     TRUE    z FALSE
a     24    5    24    5 218    220         SYMBOL     TRUE    a FALSE
c     25   14    25   14 239    241         SYMBOL     TRUE    c FALSE
f      5    1     5    1  49     51         SYMBOL     TRUE    f FALSE
g     20    1    20    1 163    165         SYMBOL     TRUE    g FALSE
m     22   22    22   22 194    282 SYMBOL_FORMALS     TRUE    m FALSE
n     22   25    22   25 197    282 SYMBOL_FORMALS     TRUE    n FALSE
p     23    5    23    5 205    207         SYMBOL     TRUE    p FALSE
x0     1    1     1    2   1      3         SYMBOL     TRUE   x0 FALSE
x1     2    1     2    2  10     12         SYMBOL     TRUE   x1 FALSE
x2     3    1     3    2  23     25         SYMBOL     TRUE   x2 FALSE
All symbols in scope are detected with their locations.
from languageserver.
@renkun-ken In the long run, I think we will need to parse the expressions more systematically. lintr uses xmlparsedata to make searching and querying easier. We may also want to move in that direction.
from languageserver.
It would be lovely to have. Of course, it is not that simple once scopes are nested, for example:
foo <- function() {
  x <- 1
  bar <- function() {
    y <- 2
    bla <- function() {
      z <- 3
    }
  }
}
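For a nested example like this one, a rough way to see which function bodies enclose a position is to filter the parse data on the expressions that own a FUNCTION token. This is only a sketch using base R; `enclosing_functions` is a hypothetical helper name, not something from languageserver.

```r
# Hypothetical helper: list the function expressions enclosing (line, col).
# `pd` is the data frame returned by utils::getParseData().
enclosing_functions <- function(pd, line, col) {
  # The parent of each FUNCTION token is the whole `function(...) body` expr
  fun_ids <- pd$parent[pd$token == "FUNCTION"]
  funs <- pd[pd$id %in% fun_ids, ]
  starts_before <- funs$line1 < line | (funs$line1 == line & funs$col1 <= col)
  ends_after <- funs$line2 > line | (funs$line2 == line & funs$col2 >= col)
  funs[starts_before & ends_after, c("line1", "col1", "line2", "col2", "id")]
}

code <- c(
  "foo <- function() {",
  "  x <- 1",
  "  bar <- function() {",
  "    y <- 2",
  "    bla <- function() {",
  "      z <- 3",
  "    }",
  "  }",
  "}"
)
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Position (4, 5) is the `y` in `y <- 2`: inside foo and bar, but not bla.
scopes <- enclosing_functions(pd, 4, 5)
print(scopes)
```

Symbols assigned in bla (here z) should then be excluded at that position, while those of foo and bar stay visible.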
At some point, I think we could move the parser to a separate package. Maybe even use Rcpp to move some code to C++.
from languageserver.
codetools may be useful here https://cran.r-project.org/web/packages/codetools/index.html
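As a quick illustration of what codetools offers here (a minimal sketch, not a proposal for how languageserver should use it): `findLocals()` lists names assigned in an expression, and `findGlobals()` lists names a function uses without defining them. The function `g` below is just an example.

```r
# Sketch only: codetools ships with R as a recommended package.
library(codetools)

g <- function(x) {
  y <- 1
  bar <- function(z) {
    x + y + z
  }
  bar(y)
}

# Names assigned in the body of g
locals <- findLocals(body(g))
# Names that g uses but does not define itself, split by kind
globals <- findGlobals(g, merge = FALSE)

print(locals)
print(globals$functions)
```

Note that these work on language objects rather than source text, so they give names only, without the line and column information that the parse data carries.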
from languageserver.
I notice getParseData() can be very useful to provide full-text parser info. For example, the following code:
x0 <- 1
x1 <- 1 + x0
x2 <- 2 + x0 + abs(x0)
f <- function(x, y) {
  x + y
}
g <- function(x) {
  y <- 1
  bar <- function(z) {
    x + y + z
  }
}
And then we can use getParseData to generate a data frame that maps each code unit to its location by reframing attr(e, "srcfile")$parseData.
e <- parse("test1.R", keep.source = TRUE)
getParseData(e)
line1 col1 line2 col2 id parent token terminal text
7 1 1 1 7 7 0 expr FALSE
1 1 1 1 2 1 3 SYMBOL TRUE x0
3 1 1 1 2 3 7 expr FALSE
2 1 4 1 5 2 7 LEFT_ASSIGN TRUE <-
4 1 7 1 7 4 5 NUM_CONST TRUE 1
5 1 7 1 7 5 7 expr FALSE
20 2 1 2 12 20 0 expr FALSE
10 2 1 2 2 10 12 SYMBOL TRUE x1
12 2 1 2 2 12 20 expr FALSE
11 2 4 2 5 11 20 LEFT_ASSIGN TRUE <-
19 2 7 2 12 19 20 expr FALSE
13 2 7 2 7 13 14 NUM_CONST TRUE 1
14 2 7 2 7 14 19 expr FALSE
15 2 9 2 9 15 19 '+' TRUE +
16 2 11 2 12 16 18 SYMBOL TRUE x0
18 2 11 2 12 18 19 expr FALSE
44 3 1 3 22 44 0 expr FALSE
23 3 1 3 2 23 25 SYMBOL TRUE x2
25 3 1 3 2 25 44 expr FALSE
24 3 4 3 5 24 44 LEFT_ASSIGN TRUE <-
43 3 7 3 22 43 44 expr FALSE
32 3 7 3 12 32 43 expr FALSE
26 3 7 3 7 26 27 NUM_CONST TRUE 2
27 3 7 3 7 27 32 expr FALSE
28 3 9 3 9 28 32 '+' TRUE +
29 3 11 3 12 29 31 SYMBOL TRUE x0
31 3 11 3 12 31 32 expr FALSE
30 3 14 3 14 30 43 '+' TRUE +
41 3 16 3 22 41 43 expr FALSE
33 3 16 3 18 33 35 SYMBOL_FUNCTION_CALL TRUE abs
35 3 16 3 18 35 41 expr FALSE
34 3 19 3 19 34 41 '(' TRUE (
36 3 20 3 21 36 38 SYMBOL TRUE x0
38 3 20 3 21 38 41 expr FALSE
37 3 22 3 22 37 41 ')' TRUE )
77 5 1 7 1 77 0 expr FALSE
49 5 1 5 1 49 51 SYMBOL TRUE f
51 5 1 5 1 51 77 expr FALSE
50 5 3 5 4 50 77 LEFT_ASSIGN TRUE <-
76 5 6 7 1 76 77 expr FALSE
52 5 6 5 13 52 76 FUNCTION TRUE function
53 5 14 5 14 53 76 '(' TRUE (
54 5 15 5 15 54 76 SYMBOL_FORMALS TRUE x
55 5 16 5 16 55 76 ',' TRUE ,
57 5 18 5 18 57 76 SYMBOL_FORMALS TRUE y
58 5 19 5 19 58 76 ')' TRUE )
73 5 21 7 1 73 76 expr FALSE
60 5 21 5 21 60 73 '{' TRUE {
68 6 3 6 7 68 73 expr FALSE
62 6 3 6 3 62 64 SYMBOL TRUE x
64 6 3 6 3 64 68 expr FALSE
63 6 5 6 5 63 68 '+' TRUE +
65 6 7 6 7 65 67 SYMBOL TRUE y
67 6 7 6 7 67 68 expr FALSE
71 7 1 7 1 71 73 '}' TRUE }
139 9 1 14 1 139 0 expr FALSE
82 9 1 9 1 82 84 SYMBOL TRUE g
84 9 1 9 1 84 139 expr FALSE
83 9 3 9 4 83 139 LEFT_ASSIGN TRUE <-
138 9 6 14 1 138 139 expr FALSE
85 9 6 9 13 85 138 FUNCTION TRUE function
86 9 14 9 14 86 138 '(' TRUE (
87 9 15 9 15 87 138 SYMBOL_FORMALS TRUE x
88 9 16 9 16 88 138 ')' TRUE )
135 9 18 14 1 135 138 expr FALSE
90 9 18 9 18 90 135 '{' TRUE {
98 10 3 10 8 98 135 expr FALSE
92 10 3 10 3 92 94 SYMBOL TRUE y
94 10 3 10 3 94 98 expr FALSE
93 10 5 10 6 93 98 LEFT_ASSIGN TRUE <-
95 10 8 10 8 95 96 NUM_CONST TRUE 1
96 10 8 10 8 96 98 expr FALSE
130 11 3 13 3 130 135 expr FALSE
101 11 3 11 5 101 103 SYMBOL TRUE bar
103 11 3 11 5 103 130 expr FALSE
102 11 7 11 8 102 130 LEFT_ASSIGN TRUE <-
129 11 10 13 3 129 130 expr FALSE
104 11 10 11 17 104 129 FUNCTION TRUE function
105 11 18 11 18 105 129 '(' TRUE (
106 11 19 11 19 106 129 SYMBOL_FORMALS TRUE z
107 11 20 11 20 107 129 ')' TRUE )
126 11 22 13 3 126 129 expr FALSE
109 11 22 11 22 109 126 '{' TRUE {
121 12 5 12 13 121 126 expr FALSE
117 12 5 12 9 117 121 expr FALSE
111 12 5 12 5 111 113 SYMBOL TRUE x
113 12 5 12 5 113 117 expr FALSE
112 12 7 12 7 112 117 '+' TRUE +
114 12 9 12 9 114 116 SYMBOL TRUE y
116 12 9 12 9 116 117 expr FALSE
115 12 11 12 11 115 121 '+' TRUE +
118 12 13 12 13 118 120 SYMBOL TRUE z
120 12 13 12 13 120 121 expr FALSE
124 13 3 13 3 124 126 '}' TRUE }
133 14 1 14 1 133 135 '}' TRUE }
I think this provides everything we need. We can provide local symbols, references, definition, etc.
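For instance, a naive references lookup could be sketched as below. It matches SYMBOL tokens by name only and ignores scoping entirely; `find_symbol_refs` is a hypothetical helper name used for illustration.

```r
# Sketch: find every occurrence of a symbol by name in the parse data.
# `find_symbol_refs` is a hypothetical helper, not a languageserver function.
find_symbol_refs <- function(code, name) {
  e <- parse(text = code, keep.source = TRUE)
  pd <- utils::getParseData(e)
  hits <- pd[pd$token == "SYMBOL" & pd$text == name,
             c("line1", "col1", "line2", "col2")]
  rownames(hits) <- NULL
  hits
}

code <- c(
  "x0 <- 1",
  "x1 <- 1 + x0",
  "x2 <- 2 + x0 + abs(x0)"
)
refs <- find_symbol_refs(code, "x0")
print(refs)
```

Function calls and formals use different tokens (SYMBOL_FUNCTION_CALL, SYMBOL_FORMALS), so a fuller version would need to include those as well.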
from languageserver.
Yeah, I have been peeking at how styler, lintr and codetools do things.
from languageserver.
At some point, I think we could move the parser to a separate package. Maybe even use Rcpp to move some code to C++.
Yesterday I came across another package that might be interesting here: sourcetools. On CRAN only two functions are documented, but there's a lot of undocumented work on parsing. In particular, see https://github.com/kevinushey/sourcetools/tree/devel/inst/include/sourcetools
There's even work on completion and diagnostics, although they're incomplete stubs and it doesn't look like there's been much work in the last two years. I'm guessing it was a project to extract a lot of the RStudio code analysis features out into an R package.
from languageserver.
@randy3k Thanks for pointing to xmlparsedata! This certainly makes life easier using XPath. It looks like we can do a lot with it.
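A small sketch of what that could look like, assuming the xmlparsedata and xml2 packages are installed. The XPath query is my own guess at a useful pattern for left assignments, not something taken from lintr.

```r
# Sketch: assumes the xmlparsedata and xml2 packages are installed.
library(xmlparsedata)
library(xml2)

code <- c(
  "g <- function(x) {",
  "  y <- 1",
  "  x + y",
  "}"
)
e <- parse(text = code, keep.source = TRUE)
doc <- read_xml(xml_parse_data(e))

# SYMBOL nodes sitting on the left-hand side of a `<-` assignment
targets <- xml_find_all(doc, "//expr[LEFT_ASSIGN]/expr[1]/SYMBOL")
assigned <- data.frame(
  symbol = xml_text(targets),
  line = as.integer(xml_attr(targets, "line1")),
  col = as.integer(xml_attr(targets, "col1")),
  stringsAsFactors = FALSE
)
print(assigned)
```

Right assignments and `=` would need their own patterns, since the target sits in a different position or under a different token.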
from languageserver.