gogrep's Introduction

gogrep

GO111MODULE=on go get mvdan.cc/gogrep

Search for Go code using syntax trees.

gogrep -x 'if $x != nil { return $x, $*_ }'

Note that this project is no longer being developed. See #64 for more details.

Instructions

usage: gogrep commands [packages]

A command is of the form "-A pattern", where -A is one of:

   -x  find all nodes matching a pattern
   -g  discard nodes not matching a pattern
   -v  discard nodes matching a pattern
   -a  filter nodes by certain attributes
   -s  substitute with a given syntax tree
   -w  write source back to disk or stdout

A pattern is a piece of Go code which may include wildcards. It can be:

   a statement (many if split by semicolons)
   an expression (many if split by commas)
   a type expression
   a top-level declaration (var, func, const)
   an entire file

Wildcards consist of $ and a name. All wildcards with the same name within an expression must match the same node, excluding "_". Example:

   $x.$_ = $x // assignment of self to a field in self

If * is before the name, it will match any number of nodes. Example:

   fmt.Fprintf(os.Stdout, $*_) // all Fprintfs on stdout

* can also be used to match optional nodes, like:

   for $*_ { $*_ }    // will match all for loops
   if $*_; $b { $*_ } // will match all ifs with condition $b
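
The parse errors quoted in the issues further down mention identifiers such as gogrep_0 and _gogrep_any__, which suggests that wildcards are turned into ordinary Go identifiers before the pattern reaches the Go parser. Below is a minimal sketch of that idea. It is not gogrep's actual implementation: the prefixes gogrep_ and gogrep_any_, the wildcard name grammar, and the helper names rewriteWildcards and patternKind are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/parser"
	"regexp"
)

// wildcard matches "$name" and "$*name". The name grammar is an assumption.
var wildcard = regexp.MustCompile(`\$(\*?)([A-Za-z_][A-Za-z0-9_]*)`)

// rewriteWildcards replaces every wildcard with a valid Go identifier so
// that go/parser can handle the pattern. The prefixes are hypothetical.
func rewriteWildcards(pattern string) string {
	return wildcard.ReplaceAllStringFunc(pattern, func(m string) string {
		sub := wildcard.FindStringSubmatch(m)
		if sub[1] == "*" {
			return "gogrep_any_" + sub[2]
		}
		return "gogrep_" + sub[2]
	})
}

// patternKind parses an expression pattern and reports the Go type of the
// resulting syntax tree root.
func patternKind(pattern string) (string, error) {
	expr, err := parser.ParseExpr(rewriteWildcards(pattern))
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", expr), nil
}

func main() {
	fmt.Println(rewriteWildcards("fmt.Fprintf(os.Stdout, $*_)"))
	kind, err := patternKind("$x + $y")
	fmt.Println(kind, err)
}
```

A real matcher would also have to remember which identifiers came from wildcards, and try statement, type and declaration parsing in addition to expressions.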

The nodes resulting from applying the commands will be printed line by line to standard output.

Here are two simple examples of the -a operand:

   gogrep -x '$x + $y'                   // will match both numerical and string "+" operations
   gogrep -x '$x + $y' -a 'type(string)' // matches only string concatenations

gogrep's People

Contributors

arl, divan, ferhatelmas, icholy, magodo, mvdan, quasilyte, rogpeppe

gogrep's Issues

Don't always depend on type information

Once we add typechecking for #3 and others, remember not to require it always. In some cases it won't be possible, such as when we're supplied single files from a larger package or when we're fed a portion of Go source code via standard input.

more examples

Hi, there are not enough examples, and little guidance on how to configure gogrep properly.
For instance, how do I search recursively within my own project only?

  • This is the only way that works:
find -name '*.go' -exec gogrep -x 'if $x != nil { $*_ }' {} \;  # Slow
  • These don't work as I want:
gogrep -r -x 'if $x != nil { $*_ }' #  Gives everything that is in GOROOT, but not in my project.
gogrep -r -x 'if $x != nil { $*_ }' main.go #  Gives everything that is in GOROOT, but not in my project.
gogrep -x 'if $x != nil { $*_ }' main.go # Returns matches from only one file
  • Is it possible to make it work like this?
grep -r "hello" # Only current dir recursively

Panic when gogrep is used with "-a 'package/lib/type'" and "-x '$x, _ := $a.Method()'"

Continuation from slack thread, here is the panic:

panic: resolveType TODO: *ast.BinaryExpr

goroutine 1 [running]:
main.(*matcher).resolveType(0xc0000cab60, 0xc0059c0d70, 0x139f940, 0xc000075bc0, 0xc004f99ef0, 0xc0000cab01)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:634 +0x555
main.(*matcher).attrApplies(0xc0000cab60, 0x139e680, 0xc006230580, 0x1302c60, 0xc00000acc0, 0x100813b)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:194 +0x6e5
main.(*matcher).cmdAttr(0xc0000cab60, 0x1344535, 0x1, 0x7ffeefbff978, 0x3d, 0x1302c60, 0xc00000acc0, 0xc001c8ff40, 0x1, 0x1, ...)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:155 +0xff
main.(*matcher).cmdAttr-fm(0x1344535, 0x1, 0x7ffeefbff978, 0x3d, 0x1302c60, 0xc00000acc0, 0xc001c8ff40, 0x1, 0x1, 0xc006223080, ...)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:78 +0x79
main.(*matcher).submatches(0xc0000cab60, 0xc0000a2120, 0x1, 0x2, 0xc001c8ff40, 0x1, 0x1, 0x1, 0x1, 0xc001c8ff40)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:89 +0x136
main.(*matcher).submatches(0xc0000cab60, 0xc0000a20f0, 0x2, 0x3, 0xc005badee0, 0x1, 0x1, 0x1f, 0x1f, 0xc005badee0)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:89 +0x1a3
main.(*matcher).submatches(0xc0000cab60, 0xc0000a20c0, 0x3, 0x4, 0xc003bbc000, 0x1f, 0x1f, 0x1636000, 0x0, 0x0)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:89 +0x1a3
main.(*matcher).matches(0xc0000cab60, 0xc0000a20c0, 0x3, 0x4, 0xc0019e1600, 0x1f, 0x1f, 0x0, 0x0, 0x0)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/match.go:24 +0x1d9
main.(*matcher).fromArgs(0xc0000cab60, 0x1344515, 0x1, 0xc00000e0f0, 0x1, 0x1, 0x12e8a60, 0xc000037dd8)
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/main.go:155 +0x20f
main.main()
        /Users/dmatrenichev/go/src/mvdan.cc/gogrep/main.go:63 +0xf9

Add type constraints

Since $x can match any node, it can be limiting when matching expressions. For example, we might want to only match strings.

A few syntaxes come to mind, like $x!string or $x: string.

Here's a thought - what about untyped constants? For example, should 0 match both int and uint, or neither?

option to skip type checks

Prior to commit 58f4747 (port to go/packages), it was possible to run gogrep on a file that doesn't pass type checking. But that no longer works. It would be nice to have that functionality back.

$?<name> to match a single optional node

Right now we only have $*<name> to match optional parts.

Sometimes we want to match exactly 1 optional node inside a pattern. It's either nil or it matches exactly 1 node.

It could be used to match optional parts inside a pattern that are expected to represent a single expression (or none).

I don't think that we need more general {min,max} form from regexps, but ? seems useful.

My use case: "find f() function call that is not followed by a return statement".
It could be expressed via f(); $x and then checking that $x is a return statement. The only problem is that f() could be the last statement inside a block.

We could solve that by using f(); $?x query that should in theory match these cases without return as well.

Advanced queries via Go code

Simple queries can be done as a pipeline, such as finding all unnecessary uses of [:] on strings:

$ gogrep -x '$x[:]' -a 'type(string)'

However, this quickly breaks down if any non-linear logic is required. For example, what if we wanted to catch all unnecessary uses of [:], including both strings and slices? You might imagine a more complex syntax and language, like:

$ gogrep -x '$x[:]' -a 'type(string) OR is(slice)'

However, this would still be a limited subset of all the logic that you might want in your query. Another common example is variations on -g, such as contains at least 2 of X pattern, as opposed to contains at least 1 of X pattern.

Instead of complicating the query language further, we should leverage the Go language, which all gogrep users should already be familiar with. Our unnecessary [:] example above could be something like:

stmts := gogrep.FindAll(input, `$x[:]`)
stmts = gogrep.Filter(stmts, gogrep.AnyOf(
        gogrep.Type("string"),
        gogrep.Kind("slice"),
))
gogrep.Output(stmts, "should drop [:] on strings and slices")

All the features that are available via the command line would be available via this API, as the command line would be implemented using the API after all. For example, to make the code perform a rewrite instead of giving a warning, we could have:

stmts := gogrep.FindAll(input, `$x[:]`)
stmts = gogrep.Filter(stmts, gogrep.AnyOf(
        gogrep.Type("string"),
        gogrep.Kind("slice"),
))
stmts = gogrep.Substitute(stmts, "$x")
gogrep.WriteBack(stmts)

One of the important features of using Go code is the ability to compose commands in more interesting ways, such as our use of gogrep.AnyOf above. But it would also be possible to perform changes and filters directly, as one can range over the stmts list of type []ast.Stmt.

We could even add ways to simplify the writing of simpler pipe-like queries in Go code, such as:

gogrep.Pipe(input,
    gogrep.FindAll("$x[:]"),
    gogrep.Filter(gogrep.AnyOf(gogrep.Type("string"), gogrep.Kind("slice"))),
    gogrep.Substitute("$x"),
    gogrep.WriteBack,
)

The first step is to move the logic out of the main package, so that it can be imported as a Go package. I would also like to come up with a package name that is shorter than gogrep, as one will have to write it everywhere (assuming dot-imports are to be avoided). Perhaps gg for its initials, or gr for "grep".

/cc @rogpeppe @kytrinyx @quasilyte

Matching of the empty block

Given this file:

package example

func f1() {}

func f2() {
	{
	}

	if true {
	}
}

If we do -x '{}', there will be no results:

gogrep -x '{}' example.go

If this is an expected behavior, this issue can be closed.
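
For comparison, the empty blocks in that file can be located directly with go/ast. The sketch below is independent of gogrep and says nothing about why -x '{}' returns no results; it only shows which nodes a match on "{}" could be expected to cover. The helper name emptyBlocks is made up for this example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is the example file from the issue above.
const src = `package example

func f1() {}

func f2() {
	{
	}

	if true {
	}
}
`

// emptyBlocks returns the line of the opening brace of every block
// statement that contains no statements.
func emptyBlocks(source string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", source, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		if b, ok := n.(*ast.BlockStmt); ok && len(b.List) == 0 {
			lines = append(lines, fset.Position(b.Lbrace).Line)
		}
		return true
	})
	return lines, nil
}

func main() {
	lines, err := emptyBlocks(src)
	fmt.Println(lines, err) // the body of f1, the bare block, and the if body
}
```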

Support for searching struct types

Goal: find the names of all struct types and each one of their fields that are of type X.

My initial attempt at this was:

$ gogrep -x 'type $x struct {
$*_
$y StmtList
$*_
}
' mvdan.cc/sh/syntax

But that's honestly more through ignorance than any sort of expectation around the $*_ pattern.

cc @mvdan @rogpeppe
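
Until patterns can express this, the stated goal can be approximated with go/ast directly. The following is a rough sketch, not a gogrep feature: the helper name fieldsOfType is hypothetical, the sample declarations are invented (they are not the real mvdan.cc/sh/syntax types), and only fields whose type is a plain identifier are considered.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// fieldsOfType returns "Struct.Field" for every named struct field whose
// type is the plain identifier typeName. Embedded fields, pointers and
// qualified types are ignored to keep the sketch short.
func fieldsOfType(source, typeName string) ([]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", source, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		st, ok := spec.Type.(*ast.StructType)
		if !ok {
			return true
		}
		for _, field := range st.Fields.List {
			id, ok := field.Type.(*ast.Ident)
			if !ok || id.Name != typeName {
				continue
			}
			for _, name := range field.Names {
				found = append(found, spec.Name.Name+"."+name.Name)
			}
		}
		return true
	})
	return found, nil
}

func main() {
	// Invented declarations, for illustration only.
	const sample = `package syntax

type StmtList struct{ Stmts []int }

type Block struct {
	Pos   int
	Stmts StmtList
}
`
	fmt.Println(fieldsOfType(sample, "StmtList"))
}
```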

Can't match ` '(a << k) | (a >> (32 - k))' ` with wildcards

I was trying to find places where I had rolled my own rotate function instead of using math/bits.

I tried this query:

gogrep -x  '(a << k) | (a >> (32 - k))' .

But it failed to match this code:

func rotl32(k uint32, rot uint32) uint32 {
        return (k << rot) | (k >> (32 - rot))
}

Note that the equivalent gofmt rewrite query does work:

gofmt -d -r '(a << k) | (a >> (32 - k)) -> bits.RotateLeft32(a, k)' .

The gogrep query does work if I use the exact same variable names though:

gogrep -x '(k << rot) | (k >> (32 - rot))' .

$*_ for import specs

It seems that $*_ can't be used inside import spec:

import ($*_; "something/of/interest"; $*_)

The parsing error is:

cannot parse expr: 1:1: expected statement, found 'import'

If this is expected behavior, this issue can be closed.

Add type predicates support

It would be useful to search for a $x whose type satisfies some predicate.

For example, I would like to grep for slices with elements of type $T.
More precisely, I wanted to find all slicing expressions of form s[:] where s is a string or slice.

$ gogrep '$(s type(string))[:]' std
strings/replace.go:450:22: s[:]

This works, but I need to manually substitute string with all other types I want to check.

With type predicates, it would be possible to describe a function that accepts the node being matched, together with some kind of context object (including type information), and returns a bool.

I have no idea how to enable user-defined predicates.
(Go plugins don't seem to be a first-class citizen.)

add full filesystem tests

We need test coverage for:

  • loading Go files (no types)
  • loading directories (no types)
  • loading packages (no types)
  • loading packages with types
  • printing of matches

Since we have unit tests for the matcher, these shouldn't cover all the match scenarios. But they should cover all the argument loading and stdout printing scenarios.

match multiple arguments

It might be useful to be able to match a variable number of arguments
passed to a function.

For example:

 gogrep 'if err != nil { return fmt.Errorf($_*, err) }'

This applies more when replacing the pattern with something else:

funcWithVariableArgs($a, $rest*) -> otherFunc($rest)

Still fails at substituting empty lists at times

$ gogrep -x 'fmt.Println($*a)' -s 'fmt.Fprintln(stdout, $a)' -w
panic: unsupported substitution: <nil>

goroutine 1 [running]:
main.(*matcher).substNode(0xc0000e8680, 0x73f180, 0xc0001d5080, 0x73f180, 0xc00031ce80)
        /home/mvdan/go/src/mvdan.cc/gogrep/subst.go:109 +0x135b
main.(*matcher).cmdSubst(0xc0000e8680, 0x6f1b5a, 0x1, 0x7ffd423aca39, 0x18, 0x6a8b20, 0xc0000ea780, 0xc000178f00, 0x9, 0x10, ...)
        /home/mvdan/go/src/mvdan.cc/gogrep/subst.go:22 +0x17d
main.(*matcher).cmdSubst-fm(0x6f1b5a, 0x1, 0x7ffd423aca39, 0x18, 0x6a8b20, 0xc0000ea780, 0xc000178f00, 0x9, 0x10, 0xc00019f800, ...)
        /home/mvdan/go/src/mvdan.cc/gogrep/match.go:76 +0x79
main.(*matcher).submatches(0xc0000e8680, 0xc00012a030, 0x2, 0x3, 0xc000178f00, 0x9, 0x10, 0xc, 0xc, 0xc000178f00)
        /home/mvdan/go/src/mvdan.cc/gogrep/match.go:89 +0x12b
main.(*matcher).submatches(0xc0000e8680, 0xc00012a000, 0x3, 0x4, 0xc0002afd40, 0xc, 0xc, 0xc000143dc8, 0x702cd8, 0x0)
        /home/mvdan/go/src/mvdan.cc/gogrep/match.go:89 +0x198
main.(*matcher).matches(0xc0000e8680, 0xc00012a000, 0x3, 0x4, 0xc0000ed600, 0xc, 0x10, 0xc00000e100, 0xc0000f0d20, 0x1)
        /home/mvdan/go/src/mvdan.cc/gogrep/match.go:24 +0x1d9
main.(*matcher).fromArgs(0xc0000e8680, 0xc00000e0d0, 0x5, 0x5, 0xc00006e058, 0x0)
        /home/mvdan/go/src/mvdan.cc/gogrep/main.go:164 +0x3c2
main.main()
        /home/mvdan/go/src/mvdan.cc/gogrep/main.go:63 +0xe1

Allow standard input too

Sometimes it's useful to be able to search an arbitrary
portion of a Go program (for example to see where any
identifiers named "a" are within a given function).
It would be nice if gogrep could do this.
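
One way this could work, sketched with the standard library only: try to parse the input as a complete file, and if that fails, wrap it in a dummy package and function so that a bare statement list still parses. The names parseFragment and countIdent, and the wrapper text, are assumptions for this sketch. A real tool would obtain the bytes from standard input, for example with io.ReadAll(os.Stdin).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// parseFragment parses src as a whole Go file if possible. Otherwise it
// wraps src in a dummy package and function and parses that instead.
func parseFragment(src []byte) (*ast.File, error) {
	fset := token.NewFileSet()
	if f, err := parser.ParseFile(fset, "stdin", src, 0); err == nil {
		return f, nil
	}
	wrapped := "package p\nfunc _() {\n" + string(src) + "\n}\n"
	return parser.ParseFile(fset, "stdin", wrapped, 0)
}

// countIdent counts the identifiers called name in the parsed input.
func countIdent(src []byte, name string) (int, error) {
	file, err := parseFragment(src)
	if err != nil {
		return 0, err
	}
	count := 0
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && id.Name == name {
			count++
		}
		return true
	})
	return count, nil
}

func main() {
	// A fixed sample stands in for data read from standard input.
	fragment := []byte("a := 1\nb := a + a\n")
	fmt.Println(countIdent(fragment, "a"))
}
```

Wrapping changes line numbers by two, so reported positions would need adjusting before being shown to the user.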

no Go files in ...

I got a strange error in the VictoriaMetrics source root:

$ gogrep -x 'if len($x) < cap($x) { $*_ }'
-: no Go files in /home/me/dev/VictoriaMetrics

Plain grep finds the lines correctly:

 $ egrep -r 'if len(.+?) < cap(.+?)' .
./app/vmselect/promql/transform.go:	if len(tags) < cap(tags) {
./app/victoria-metrics/self_scraper.go:			if len(mrs) < cap(mrs) {
./app/victoria-metrics/self_scraper.go:	if len(dst) < cap(dst) {
./lib/storage/metric_name.go:	if len(mn.Tags) < cap(mn.Tags) {
./lib/storage/metaindex_row.go:		if len(dst) < cap(dst) {
./lib/mergeset/metaindex_row.go:		if len(dst) < cap(dst) {
./vendor/github.com/klauspost/compress/zstd/history.go:	if len(b) < cap(h.b)-len(h.b) {

What am I doing wrong?

more variations of type restrictions

Going beyond relationships to other types, such as equality or assignability. Some come to mind:

  • Being addressable
  • Being comparable
  • Being an interface type

We also want all of these to be invertible, for example the ability to say "all non-addressable expressions".

add regex matching (limited to idents)

I was using gogrep to easily spot ways to simplify the cmd/compile/internal/gc code, which was once C and was transpiled. As such, it has tons of verbosity that is unnecessary.

I was focusing on the following two:

i := 0
for _, v := range slice {
    use(i, v)
    i++
}

for _, v := range slice {
    elem := v
    use(elem)
}

So, for example, for the first I used for _, $_ := range $_ { $*_; $_++ }. It had no false negatives that I can see, but it had one obvious false positive:

for _, v := range slice {
    use(v)
    someMap[someKey]++
}

You can imagine plenty of others. The root of the problem here is that $_ in our $_++ is not limited to just idents, as it can be any expression.

One could say that I should have grepped for $i := 0; for _, $_ := range $_ { $*_; $i++ }. But I think that is a worse solution. What if $i was actually declared as a parameter, or was of the form $i = 0 as it was being reused? Or what if it had statements between the decl and the for?

I think this could be orthogonal to #3. That issue is about constraining to go/types types, i.e. what the code ends up representing. What I am suggesting here is constraining to go/ast, i.e. what type of node it is. The syntax for the two should be different, as x could be both an *ast.Ident in the AST and be of *ast.Ident type in the resulting program.

This could also have other uses. For example, one could easily say "find all for statements whose body starts with an if statement" by constraining the AST type of the first body statement to IfStmt. Could still be possible with aggressive matching, but this is more direct.
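
To make the go/ast constraint concrete, here is a small sketch of the check such a feature would perform for the $_++ case: keep an increment only when its operand is an *ast.Ident. It is written against go/ast directly rather than gogrep, and the helper name identIncrements is made up.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// identIncrements returns the operand name of every ++ or -- statement
// whose operand is a plain identifier. Operands such as someMap[someKey]
// are index expressions, not identifiers, and are skipped.
func identIncrements(source string) ([]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "src.go", source, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		stmt, ok := n.(*ast.IncDecStmt)
		if !ok {
			return true
		}
		if id, ok := stmt.X.(*ast.Ident); ok {
			names = append(names, id.Name)
		}
		return true
	})
	return names, nil
}

func main() {
	const sample = `package p

func f(slice []int, someMap map[int]int, someKey int) {
	i := 0
	for _, v := range slice {
		use(i, v)
		i++
	}
	for _, v := range slice {
		use(v)
		someMap[someKey]++
	}
}
`
	fmt.Println(identIncrements(sample)) // only i is reported
}
```

No type information is needed here, since only the syntax is parsed and nothing is typechecked.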

/cc @rogpeppe

gogrep panics with "func $x()" pattern

Reproducer:

  1. cd $GOROOT/src
  2. gogrep -x 'func $x()' ./...

Result:

gogrep -x 'func $x()' ./...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x65a6f1]

goroutine 1 [running]:
main.(*matcher).node(0xc00008add0, 0x750fc0, 0x0, 0x750fc0, 0xc00025e4e0, 0x1)
	$GOPATH/src/mvdan.cc/gogrep/match.go:513 +0x2101
main.(*matcher).node(0xc00008add0, 0x751440, 0xc000084e40, 0x751440, 0xc00025e510, 0x0)
	$GOPATH/src/mvdan.cc/gogrep/match.go:440 +0x32d6
main.(*matcher).topNode(0xc00008add0, 0x751440, 0xc000084e40, 0x751440, 0xc00025e510, 0x0, 0x0)
	$GOPATH/src/mvdan.cc/gogrep/match.go:264 +0xad
main.(*matcher).cmdRange.func1(0x751440, 0xc000084e40, 0x751440, 0xc00025e510)
	$GOPATH/src/mvdan.cc/gogrep/match.go:106 +0xce
main.(*matcher).walkWithLists.func1(0x751440, 0xc00025e510, 0x751601)
	$GOPATH/src/mvdan.cc/gogrep/match.go:242 +0x7c
go/ast.inspector.Visit(0xc01991a120, 0x751440, 0xc00025e510, 0x7506e0, 0xc01991a120)
	$GOROOT/src/go/ast/walk.go:373 +0x3a
go/ast.Walk(0x7506e0, 0xc01991a120, 0x751440, 0xc00025e510)
	$GOROOT/src/go/ast/walk.go:52 +0x66
go/ast.walkDeclList(0x7506e0, 0xc01991a120, 0xc00025cc80, 0x4, 0x4)
	$GOROOT/src/go/ast/walk.go:38 +0x9e
go/ast.Walk(0x7506e0, 0xc01991a120, 0x7513c0, 0xc000283a00)
	$GOROOT/src/go/ast/walk.go:353 +0x2656
go/ast.Inspect(0x7513c0, 0xc000283a00, 0xc01991a120)
	$GOROOT/src/go/ast/walk.go:385 +0x4b
main.inspect(0x7513c0, 0xc000283a00, 0xc01991a120)
	$GOPATH/src/mvdan.cc/gogrep/main.go:279 +0x162
main.(*matcher).walkWithLists(0xc00008add0, 0x751440, 0xc000084e40, 0x7513c0, 0xc000283a00, 0xc0193a56b0)
	$GOPATH/src/mvdan.cc/gogrep/match.go:254 +0x93
main.(*matcher).cmdRange(0xc00008add0, 0x6ffc5e, 0x1, 0x7ffe8b993097, 0x9, 0x6b6760, 0xc000084e40, 0xc014b23900, 0xd, 0xd, ...)
	$GOPATH/src/mvdan.cc/gogrep/match.go:121 +0x194
main.(*matcher).cmdRange-fm(0x6ffc5e, 0x1, 0x7ffe8b993097, 0x9, 0x6b6760, 0xc000084e40, 0xc014b23900, 0xd, 0xd, 0x6e8be0, ...)
	$GOPATH/src/mvdan.cc/gogrep/match.go:70 +0x79
main.(*matcher).submatches(0xc00008add0, 0xc000084cf0, 0x1, 0x1, 0xc014b23900, 0xd, 0xd, 0xc01867bdc8, 0xc0193a53b0, 0x1c)
	$GOPATH/src/mvdan.cc/gogrep/match.go:89 +0x136
main.(*matcher).matches(0xc00008add0, 0xc000084cf0, 0x1, 0x1, 0xc0000f6400, 0xd, 0x10, 0xc00000e000, 0xc00bbce000, 0x11e)
	$GOPATH/src/mvdan.cc/gogrep/match.go:24 +0x1df
main.(*matcher).fromArgs(0xc00008add0, 0xc00000e0d0, 0x3, 0x3, 0xc000055f88, 0x405539)
	$GOPATH/src/mvdan.cc/gogrep/main.go:164 +0x3ca
main.main()
	$GOPATH/src/mvdan.cc/gogrep/main.go:63 +0xe1

If gogrep doesn't support such patterns, it should report something like a pattern-compile error. A descriptive error message would help.

think whether B.C should match A.B.C

Right now it doesn't. A.B does match the beginning of A.B.C. This is because of the nature of how these trees of selectors are built - (A.B).C. It should be noted, however, that $x.C does match the entire A.B.C because of this.
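
The nesting described above can be checked with go/parser. This short sketch prints a selector chain with explicit parentheses; the helper names shape and selectorShape are made up for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// shape renders a selector chain with explicit parentheses, so that the
// nesting of the syntax tree becomes visible.
func shape(e ast.Expr) string {
	switch e := e.(type) {
	case *ast.SelectorExpr:
		return "(" + shape(e.X) + "." + e.Sel.Name + ")"
	case *ast.Ident:
		return e.Name
	}
	return "?"
}

// selectorShape parses src as an expression and returns its shape.
func selectorShape(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	return shape(expr), nil
}

func main() {
	fmt.Println(selectorShape("A.B.C")) // ((A.B).C) <nil>
}
```

Since A.B is a subtree of A.B.C while B.C is not, a purely tree-based matcher finds the former and misses the latter.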

Is there a use case for this? Haven't found a real one. Found this quirk while playing.

/cc @rogpeppe

add composability of commands

What gogrep does at the moment is find all occurrences of an expression. Let's call this the x command.

gogrep has an implicit print command at the end, which we will ignore - it will stay implicit in the input.

It would be useful to be able to compose these commands. For example, I could say:

gogrep '<x> if $_ { $*_ } <x> continue'

This would find all continue statements inside an if.

Other commands would also be helpful. For example:

  • Filter out nodes with zero matches of expression (g)
  • Filter out nodes with non-zero matches of expression (v)

Then one could find all naked loops that never break with:

gogrep '<x> for { $*_ } <v> break'

This composability was initially suggested by @rogpeppe, borrowing from the structural regexes in Acme. More info at http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

Reusing these names seems fine since they're short. They are also intuitive since they're similar to what one is used to in programs like vim and sed (and acme itself, of course).

$(_ /regexp/) syntax doesn't seem to be working

From the readme:

fmt.$(_ /Fprint.*/)(os.Stdout, $*_) // all Fprint* on stdout

Using the latest gogrep:

$ gogrep -x 'fmt.$(_ /Fprint.*/)(os.Stdout, $*_)' hello.go
cannot tokenize expr: 1:6: $ must be followed by ident, got (

Maybe it's not supposed to be used with -x? But I don't see a parser code that handles /regex/ syntax inside parse.go so I'm not sure.

If this feature is not implemented, readme can be adjusted to clarify that.

rethink commands to allow any number of args

Right now, commands are always of the form -A expr. I propose that we redesign them so that any number of arguments may be used.

For example, let's say that we want to add a command that lets us control how many occurrences of a match we want to have. For example, -c '>= 2' expr. Then, -g expr would be short for -c '> 0' expr, and -v expr would be short for -c '== 0' expr.
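
As a rough illustration of how such a count constraint might be evaluated, the sketch below parses a spec like '>= 2' and applies it to a number of matches. The -c command is only a proposal; the spec format, the set of operators and the helper name satisfies are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// satisfies reports whether count meets a constraint such as ">= 2".
// The "<operator> <integer>" format is an assumption for this sketch.
func satisfies(spec string, count int) (bool, error) {
	fields := strings.Fields(spec)
	if len(fields) != 2 {
		return false, fmt.Errorf("invalid count constraint %q", spec)
	}
	n, err := strconv.Atoi(fields[1])
	if err != nil {
		return false, err
	}
	switch fields[0] {
	case ">":
		return count > n, nil
	case ">=":
		return count >= n, nil
	case "==":
		return count == n, nil
	case "!=":
		return count != n, nil
	case "<":
		return count < n, nil
	case "<=":
		return count <= n, nil
	}
	return false, fmt.Errorf("unknown operator %q", fields[0])
}

func main() {
	fmt.Println(satisfies(">= 2", 3)) // true <nil>
	fmt.Println(satisfies("> 0", 0))  // false <nil>, the -g case with no matches
	fmt.Println(satisfies("== 0", 0)) // true <nil>, the -v case with no matches
}
```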

If we are to allow any number of arguments, we should drop the notion of flags entirely, as they typically only allow 0 or 1 values. I still think a preceding character other than - would be useful to visualise the start of these chained commands.

I propose ., for example:

gogrep .x expr1 .c '>= 2' expr2

Suggestions of more commands that would use more information than an expression to match are welcome.

@rogpeppe thoughts?

decide whether "b; c" should match "{ a; b; c; d }"

Reasons to:

  • Useful. I just needed it right now.
  • Using $*_; b; c; $*_ is not as useful, as it also prints the stuff on the sides.

Reasons not to:

  • If we really want a statement list with just b; c, there wouldn't be another way to express that. { b; c } does not cover it, as it wouldn't match statement lists outside block statements such as switch case bodies.

REPL mode to avoid re-loading from source

This could especially be useful when loading with types, as it can easily take seconds to load even a small program if we need type information.

We could have something like readline:

$ gogrep -repl
> -x expr
[...]
> -x expr -g expr_with_types
[...]
> ^D
$

$*_ for function return values

Given example.go:

package example

func f0()            { return }
func f1() int        { return 1 }
func f2() (int, int) { return 1, 2 }

If we run gogrep like this:

gogrep -x 'func $_() $*_ { $*_ }' example.go

We only get 2 results:

hello.go:4:1: func f1() int { return 1; }
hello.go:5:1: func f2() (int, int) { return 1, 2; }

This could be an issue, since $*_ is described as something that can be used to match optional pattern pieces. Right now, if we want to describe an arbitrary function result, two patterns are needed:

func $_() $*_ { $*_ } // 1-N results
func $_() { $*_ } // 0 results
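
A possible explanation, not confirmed here: in go/ast, a function without results has a nil result list rather than an empty one, so there may be no list for $*_ to match against. The sketch below only shows that property of the syntax tree; the helper name resultCounts is made up, and -1 is used to mark a nil Results field.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// resultCounts maps each function name to its number of results, using
// -1 when the Results field of the function type is nil.
func resultCounts(source string) (map[string]int, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "example.go", source, 0)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		if fn.Type.Results == nil {
			counts[fn.Name.Name] = -1
			continue
		}
		counts[fn.Name.Name] = fn.Type.Results.NumFields()
	}
	return counts, nil
}

func main() {
	const sample = `package example

func f0()            { return }
func f1() int        { return 1 }
func f2() (int, int) { return 1, 2 }
`
	counts, err := resultCounts(sample)
	fmt.Println(counts["f0"], counts["f1"], counts["f2"], err)
}
```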

API for gogrep matching engine

In writing linters (in my case, go/analysis passes to be run as golangci-lint plugins) I find myself wanting some sort of AST-matching engine. This tool seems like a really great one! But I want to use it inside my own tool, rather than as a simple global search, like this package or go-ruleguard is designed to do. To do that, I need to use the gogrep matcher, but then refer back to the AST or types in an arbitrary way.

A sample API that I think would be sufficient, and which looks to my quick glance similar to what you're already using internally:

type Matcher

// Compile compiles a gogrep pattern into a Matcher object.
func Compile(pattern string) (*Matcher, error)

// FindAll returns all matches to the given pattern within the given AST node.
func (m *Matcher) FindAll(node ast.Node) []Match

// A single match of a pattern.
type Match struct {
  // The top-level node that matched the pattern.
  Node ast.Node
  // The nodes that matched each variable; for example if $x + $_ matched 2 + 3,
  // captures would be {"$x": <node for 2>}
  Captures map[string]ast.Node
}

// optionally other regexp-style APIs like MustCompile, Matcher.Find, Matcher.Match.

Related to #32, but it seems like that won't actually do what we would want, for the same reason go-ruleguard isn't quite enough for us: we would still have no direct access to the underlying AST/types.

Add an "any number of" wildcard

We use $x to match any node. We should introduce a form to match any number of nodes in a list, to allow for stuff like:

if $cond { $<any number of statements> }
call(arg1, $<any number of expressions>)

I initially thought of $x..., but since expr... is valid Go, it's not a good idea. Perhaps $x*, which resembles a regular expression.

Support for $*_ inside function params

It seems impossible right now to match "a function declaration that has a parameter of type T".

This attempt doesn't work:

$ gogrep -x 'func $_($*_, $_ T, $*_) { $*_ }' .
cannot parse expr: 1: expected '(', found gogrep_0

This pattern is parsed successfully, but it doesn't behave as expected:

$ gogrep -x 'func $_($*_ $_, $_ T, $*_ $_) { $*_ }' .

I also tried these variations:

func $_($*_ $*_, $_ int, $*_ $*_) { $*_ }
func $_($_ $*_, $_ int, $_ $*_) { $*_ }

Maybe there should be a special case for $*_ inside parameter lists?

feature request: -single-file param

Suppose you're about to do some refactoring inside some specific file.
You want to do a -x pattern with -s replacement to get the job done.
The problem is that we can't reliably run gogrep over a single file, since
it appears to typecheck even when no type-related filters are involved (I may be wrong here).

We can get a cheap fix for this problem by adding a way of telling gogrep "I'm giving you a package as a target, but I'm only interested in one specific file". It could be a bool param that makes gogrep infer the package from the specified filename, load the package, and then print results only for that file. A simpler approach is to make that param a string naming the file to include in the output; all other matches would be discarded.

If we generalize, it looks like I need an additional file filter, which may not necessarily be limited to 1 file, but I can't come up with a good use case for that at the moment.

cannot search for switch statement with arbitrary body

I was interested to find out how many instances of the idiom:

 switch $x := $x.(type) { .... }

there were in the Go source tree. This seems like a reasonable thing to search for to me,
but you can't use $*_ inside the braces because gogrep_any_ is not a valid thing to
find inside a switch body.

gogrep 'switch $x { $*_ }' ./...
cannot parse expr: 1:20: expected '}', found 'IDENT' _gogrep_any__ (and 1 more errors)

proposal: get rid of the $() syntax in favor of more commands

Right now, if I want to say "find all x of type T", I say gogrep -x '$(x type(T))'.

The more I use the tool, the more I think this should be decoupled. As in gogrep -x '$x' -t '$x T'. Here, -t would act as a filter, discarding all matches where the type doesn't match.

Advantages:

  • Simpler syntax; we could likely get rid of the $() stuff
  • With the simpler syntax, commands like -x are much closer to regular Go code
  • More modular; this opens the door to using these commands on non-dollar expressions, like: gogrep -x 'f(a)' -t 'a []int'

Disadvantages:

  • Simple use cases would require more commands; gogrep -x '$(x type(T))' would become gogrep -x '$x' -t '$x T' (then again, each command is simpler)

We could also apply all the type information filters at the very end, meaning that we could defer the heavy work of type checking until all the AST commands are done. This way, gogrep -t '$x T' -x '$x' would also work. And perhaps it reads better too, as in "with $x of type T, find all occurrences of $x etc etc".

The command -t is up to debate. We could also simply move all the existing "type filters" into one large command, like -t '$x type(T)', -t '$x is(rune)', and so on.

And, if #26 happens, we could also do something like gogrep .x '$x' .t '$x' T, if we wish to.

@rogpeppe would really like your input, as we designed the $() syntax and semantics together.

Suggestion: Add debug argument to print ast.Node type information

How about having a -d (or -v) argument to print the AST type information of the matched node and the $x patterns?
Currently, when a pattern match occurs, gogrep prints the value of the AST node: https://github.com/mvdan/gogrep/blob/master/main.go#L326

For example, given the following code:

func main() {
  v := mypkg.MyStruct{}

The command gogrep -x '$x.$y{$*_}' will print something like this:

main.go:24:10: mypkg.MyStruct{}

This is great, but when exploring various patterns, I find it would be very useful to know the exact ast nodes of the entire pattern and each of the $X sub matches.
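
As a sketch of what such debug output could contain, the snippet below lists the concrete go/ast type of a node and of each of its sub-nodes. It works on a parsed expression rather than on gogrep's match results, and the helper name nodeTypes is hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
)

// nodeTypes parses src as an expression and returns the concrete type of
// every node in its syntax tree, in traversal order.
func nodeTypes(src string) ([]string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(expr, func(n ast.Node) bool {
		if n != nil {
			out = append(out, fmt.Sprintf("%T", n))
		}
		return true
	})
	return out, nil
}

func main() {
	types, err := nodeTypes("mypkg.MyStruct{}")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, t := range types {
		fmt.Println(t)
	}
}
```

For mypkg.MyStruct{} this lists a composite literal, the selector expression used as its type, and the two identifiers inside that selector.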

a variant of $*_ that discards the nodes

Case in point:

$ gogrep 'for $_, $_ := range $_ { $*_; $_++ }'
bexport.go:255:2: for _, n := range exportlist { sym := n.Sym; if sym.Exported() { continue; }; sym.SetExported(true); if strings.Contains(sym.Name, ".") { Fatalf("exporter: unexpected symbol: %v", sym); }; if sym.Def == nil { Fatalf("exporter: unknown export symbol: %v", sym); }; if p.trace { p.tracef("\n"); }; p.obj(sym); objcount++; }
esc.go:859:3: for _, lrn := range Curfn.Func.Dcl { if i >= retList.Len() { break; }; if lrn.Op != ONAME || lrn.Class() != PPARAMOUT { continue; }; e.escassignWhyWhere(lrn, retList.Index(i), "return", n); i++; }
fmt.go:1541:3: for _, n1 := range n.List.Slice() { if i != 0 { fmt.Fprint(s, " + "); }; n1.exprfmt(s, nprec, mode); i++; }
plive.go:981:2: for i, live := range lv.livevars { h := hashbitmap(H0, live) % uint32(tablesize); for { j := table[h]; if j < 0 { break; }; jlive := lv.livevars[j]; if live.Eq(jlive) { remap[i] = j; continue Outer; }; h++; if h == uint32(tablesize) { h = 0; }; }; table[h] = uniq; remap[i] = uniq; lv.livevars[uniq] = live; uniq++; }
reflect.go:1337:3: for _, t1 := range t.Fields().Slice() { dtypesym(t1.Type); n++; }
sinit.go:1244:3: for _, a := range n.List.Slice() { if a.Op == OKEY { k = nonnegintconst(a.Left); a = a.Right; }; addvalue(p, k*n.Type.Elem().Width, a); k++; }
subr.go:264:2: for _, s := range opkg.Syms { if s.Def == nil { continue; }; if !exportname(s.Name) || strings.ContainsRune(s.Name, 0xb7) { continue; }; s1 = lookup(s.Name); if s1.Def != nil { pkgerror = fmt.Sprintf("during import %q", opkg.Path); redeclare(s1, pkgerror); continue; }; s1.Def = s.Def; s1.Block = s.Block; if asNode(s1.Def).Name == nil { Dump("s1def", asNode(s1.Def)); Fatalf("missing Name"); }; asNode(s1.Def).Name.Pack = pack; s1.Origpkg = opkg; n++; }
typecheck.go:2650:2: for _, tl := range tstruct.Fields().Slice() { t = tl.Type; if tl.Isddd() { if isddd { if i >= nl.Len() { goto notenough; }; if nl.Len()-i > 1 { goto toomany; }; n = nl.Index(i); setlineno(n); if n.Type != nil { nl.SetIndex(i, assignconvfn(n, t, desc)); }; goto out; }; for ; i < nl.Len(); i++ { n = nl.Index(i); setlineno(n); if n.Type != nil { nl.SetIndex(i, assignconvfn(n, t.Elem(), desc)); }; }; goto out; }; if i >= nl.Len() { goto notenough; }; n = nl.Index(i); setlineno(n); if n.Type != nil { nl.SetIndex(i, assignconvfn(n, t, desc)); }; i++; }
typecheck.go:3487:3: for _, r := range s { l = append(l, nod(OKEY, nodintconst(int64(i)), nodintconst(int64(r)))); i++; }

I get reeeeally long lines and it's hard to see what I was after. Would be neat if, for example, double underscores meant any node and discard, like:

$ gogrep 'for $_, $_ := range $_ { $*__; $_++ }'
bexport.go:255:2: for _, n := range exportlist { [...]; objcount++; }

The problem I think is important, but the solution I just came up with rather quickly without much thought. Perhaps there is a cleaner or better way in the long run to achieve this.

/cc @rogpeppe

make matcher optionally more aggressive

Right now, if one uses "ab" in an expression to look for, "a"+"b" won't be matched. We could use go/constant for this.

Is there a reason why a user would ever not want this? If so, we could make it optional via the syntax.
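
A minimal sketch of the folding step, using go/types and go/constant from the standard library: evaluate both the pattern and the candidate expression as constants and compare the results. How this would be wired into the matcher is left open, and the helper name foldString is made up.

```go
package main

import (
	"fmt"
	"go/constant"
	"go/token"
	"go/types"
)

// foldString evaluates expr as a constant expression and returns its
// string value. It fails if expr is not a constant string.
func foldString(expr string) (string, error) {
	// With a nil package, types.Eval uses the universe scope and ignores pos.
	tv, err := types.Eval(token.NewFileSet(), nil, token.NoPos, expr)
	if err != nil {
		return "", err
	}
	if tv.Value == nil || tv.Value.Kind() != constant.String {
		return "", fmt.Errorf("%s is not a constant string", expr)
	}
	return constant.StringVal(tv.Value), nil
}

func main() {
	folded, err := foldString(`"a" + "b"`)
	fmt.Println(folded, err) // ab <nil>
}
```

Expressions that refer to named constants would need the enclosing package's type information rather than the universe scope.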
