
php-parser's Issues

Parser adds "\n;" to heredoc value

<?php

<<<CAT
TEST
CAT;
  | [*stmt.StmtList]
  |   "Position": Pos{Line: 3-5 Pos: 11-23};
  |   "Stmts":
  |     [*stmt.Expression]
  |       "Position": Pos{Line: 3-5 Pos: 11-23};
  |       "Expr":
  |         [*scalar.Heredoc]
  |           "Position": Pos{Line: 3-5 Pos: 11-22};
  |           "Label": CAT;
  |           "Parts":
  |             [*scalar.EncapsedStringPart]
  |               "Position": Pos{Line: 4-4 Pos: 15-19};
  |               "Value": TEST
;

crash on (most) control chars in a php block

$ echo -e '<? \004' > test.php && php-parser test.php
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x5473f8]

goroutine 6 [running]:
github.com/z7zmey/php-parser/errors.NewError(...)
	/home/imuli/src/github.com/z7zmey/php-parser/errors/error.go:20
github.com/z7zmey/php-parser/php7.(*Parser).Error(0xc42006c180, 0xc420014220, 0x1d)
	/home/imuli/src/github.com/z7zmey/php-parser/php7/parser.go:52 +0x68
github.com/z7zmey/php-parser/php7.(*yyParserImpl).Parse(0xc420091500, 0x6d3760, 0xc42006c180, 0x0)
	yaccpar:253 +0x4da06
github.com/z7zmey/php-parser/php7.yyParse(0x6d3760, 0xc42006c180, 0xc4200a8000)
	yaccpar:153 +0x58
github.com/z7zmey/php-parser/php7.(*Parser).Parse(0xc42006c180, 0xc42000c028)
	/home/imuli/src/github.com/z7zmey/php-parser/php7/parser.go:69 +0x11a
main.parserWorker(0xc42001a180, 0xc42001a1e0)
	/home/imuli/src/github.com/z7zmey/php-parser/main.go:80 +0x37
created by main.main
	/home/imuli/src/github.com/z7zmey/php-parser/main.go:32 +0x136

This happens with characters 1-8, 11, 12, and 14-31. PHP itself warns about an unexpected character but continues parsing the file.
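The ranges listed above are the ASCII control characters from 1 to 31 other than tab (9), line feed (10), and carriage return (13). As a minimal sketch, a scanner could use a predicate like the one below to report a warning and skip such a byte instead of crashing. The helper name is hypothetical and not part of php-parser.

```go
package main

import "fmt"

// isStrayControl reports whether b is one of the control bytes listed in
// this report: 1-8, 11, 12 and 14-31. Tab (9), line feed (10) and carriage
// return (13) are ordinary whitespace and are excluded.
// Hypothetical helper, not part of php-parser.
func isStrayControl(b byte) bool {
	switch {
	case b >= 1 && b <= 8:
		return true
	case b == 11 || b == 12:
		return true
	case b >= 14 && b <= 31:
		return true
	}
	return false
}

func main() {
	src := []byte("<? \x04 echo 1;")
	for i, b := range src {
		if isStrayControl(b) {
			fmt.Printf("warning: unexpected character 0x%02x at offset %d\n", b, i)
		}
	}
}
```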

stop parsing after __halt_compiler();

Anything after __halt_compiler(); is not parsed (or compiled) by PHP, and attempting to parse beyond it only invites syntax errors from trying to parse non-PHP.

Thus parsing something like

<?php
__halt_compiler();
"nothing to see here";

shouldn't produce

==> halt_compiler.php
  | [*node.Root]
  |   "Position": Pos{Line: 2-3 Pos: 7-47};
  |   "Stmts":
  |     [*stmt.HaltCompiler]
  |       "Position": Pos{Line: 2-2 Pos: 7-24};
  |     [*stmt.Expression]
  |       "Position": Pos{Line: 3-3 Pos: 26-47};
  |       "Expr":
  |         [*scalar.String]
  |           "Position": Pos{Line: 3-3 Pos: 26-46};
  |           "Value": "nothing to see here";

but rather, something more like

==> halt_compiler.php
  | [*node.Root]
  |   "Position": Pos{Line: 2-3 Pos: 7-24};
  |   "Stmts":
  |     [*stmt.HaltCompiler]
  |       "Position": Pos{Line: 2-2 Pos: 7-24};

or maybe including the stuff afterward either in a simple wrapper or within the HaltCompiler statement?

line comment immediately inside block comment

I'm finding some files with things like this:

/*// comment
  commented_out_source()
 */

The scanner checks whether the previous rune is '*' and the current rune is '/', but it starts with the current rune immediately after the /*, so the '*' of the opener and the '/' that follows it close the comment immediately.

I suspect that adding a c = l.Next() before the loop at scanner/scanner.l:297 would fix this, but I'm not sure it is the best solution, and I'm also not certain how to regenerate scanner.go from that file; go generate doesn't seem to work.
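To illustrate the failure mode, here is a minimal, self-contained sketch of a block-comment scanner that compares the previous and the current byte. It is not the code in scanner/scanner.l, and the function name and signature are assumptions. Starting "previous" at a value that cannot be '*' has the same effect as consuming one rune before the loop: the '/' that directly follows "/*" no longer closes the comment.

```go
package main

import "fmt"

// blockCommentEnd returns the index just past the closing "*/" of a block
// comment that starts at src[start:] with "/*". It returns -1 if the
// comment is unterminated or src[start:] does not begin with "/*".
// Hypothetical helper, not part of php-parser.
func blockCommentEnd(src string, start int) int {
	if len(src) < start+2 || src[start:start+2] != "/*" {
		return -1
	}
	// prev must not begin as the '*' of the opening "/*"; otherwise an
	// input such as "/*// ..." would be closed by its first '/'.
	prev := byte(0)
	for i := start + 2; i < len(src); i++ {
		c := src[i]
		if prev == '*' && c == '/' {
			return i + 1
		}
		prev = c
	}
	return -1
}

func main() {
	src := "/*// comment\n  commented_out_source()\n */ rest"
	end := blockCommentEnd(src, 0)
	fmt.Printf("%q\n", src[:end])
}
```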

Feature request: Changing the logo

Can you change the logo for this project? It looks like the gopher ate the elephant. I would recommend changing it to a gopher playing with an elephant.

another position error with php5

Again, the php7 parser produces sane output. This one looks like it's stemming from expr.Variable.

Sorry not to be submitting patches with these; I've never touched yacc before and am finding the parser a bit hard to follow. If I find more of these, should I continue opening new bugs or just reopen this one with more info?

<?php
$here->where();
==> method_call.php
  | [*node.Root]
  |   "Position": Pos{Line: 2-2 Pos: 19-21};    # ought to be 7-21
  |   "Stmts":
  |     [*stmt.Expression]
  |       "Position": Pos{Line: 2-2 Pos: 19-21};    # ought to be 7-21
  |       "Expr":
  |         [*expr.MethodCall]
  |           "Position": Pos{Line: 2-2 Pos: 19-20};    # ought to be 7-20
  |           "Variable":
  |             [*expr.Variable]
  |               "Position": Pos{Line: 2-2 Pos: 7-20};    # ought to be 7-11
  |               "VarName":
  |                 [*node.Identifier]
  |                   "Position": Pos{Line: 2-2 Pos: 7-11};
  |                   "Value": here;
  |           "Method":
  |             [*node.Identifier]
  |               "Position": Pos{Line: 2-2 Pos: 14-18};
  |               "Value": where;
  |           "ArgumentList":
  |             [*node.ArgumentList]
  |               "Position": Pos{Line: 2-2 Pos: 19-20};

printer: Does not keep formatting as-is

What I expected
When using the package simply labelled `printer`, I assumed it would print the file with all the formatting it had previously; however, it seems it is only meant to be a pretty printer.

What's the plan for retaining formatting, if any? My use case is that I'd like to make something that'll resolve all my PHP namespaces in Sublime Text when you hit save.

For saving the data back out, I'm sure I could do a sort of hack where I only modify the affected lines, but I'm hoping that retaining formatting is doable and not too difficult, so I can avoid that effort.

At the very least, can the Printer struct perhaps just be renamed to PrettyPrinter?

Non-ASCII symbols are not parsed correctly in comments.

Non-ASCII symbols are not parsed correctly in comments (file encoding is UTF-8).

<?php
$a = 1; // тестовый коммент
$b = 2;
[*stmt.StmtList]
  "Stmts":
    [*stmt.Expression]
      "Expr":
        [*assign.Assign]
          "Variable":
            [*expr.Variable]
              "VarName":
                [*node.Identifier]
                  "Value": a;
          "Expression":
            [*scalar.Lnumber]
              "Value": 1;
    [*stmt.Expression]
      "Comments":
        "// \u0080\u0080\u0080\u0080\u0080\u0080\u0080\u0080 \u0080\u0080\u0080\u0080\u0080\u0080\u0080\n"
      "Expr":
        [*assign.Assign]
          "Comments":
            "// \u0080\u0080\u0080\u0080\u0080\u0080\u0080\u0080 \u0080\u0080\u0080\u0080\u0080\u0080\u0080\n"
          "Variable":
            [*expr.Variable]
              "Comments":
                "// \u0080\u0080\u0080\u0080\u0080\u0080\u0080\u0080 \u0080\u0080\u0080\u0080\u0080\u0080\u0080\n"
              "VarName":
                [*node.Identifier]
                  "Comments":
                    "// \u0080\u0080\u0080\u0080\u0080\u0080\u0080\u0080 \u0080\u0080\u0080\u0080\u0080\u0080\u0080\n"
                  "Value": b;
          "Expression":
            [*scalar.Lnumber]
              "Value": 2;

pretty printer: group operations by precedence

[*binary.Mul]
    "Left":
    [*binary.Plus]
        "Left":
        [*expr.Variable]
            "VarName":
            [*node.Identifier]
                "Value": a;
        "Right":
        [*expr.Variable]
            "VarName":
            [*node.Identifier]
                "Value": b;
    "Right":
    [*expr.Variable]
        "VarName":
        [*node.Identifier]
            "Value": c;

Currently, the above AST is printed incorrectly as $a + $b * $c.
The printer must parenthesize the Plus expression and print ($a + $b) * $c.
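A minimal sketch of precedence-aware printing follows. It uses a toy tree rather than php-parser's node types, and the type names, the render function, and the two-operator precedence table are all assumptions made for illustration. A child is wrapped in parentheses when it binds less tightly than its parent requires.

```go
package main

import "fmt"

// Node is a toy expression tree, not php-parser's node.Node.
type Node interface{}

// Var is a variable such as $a.
type Var struct{ Name string }

// Binary is a binary operation; Op is "+" or "*" in this sketch.
type Binary struct {
	Op          string
	Left, Right Node
}

// precedence maps operators to binding strength (higher binds tighter).
var precedence = map[string]int{"+": 1, "*": 2}

// render prints n, adding parentheses when n's operator binds less tightly
// than minPrec, the minimum precedence the surrounding context requires.
func render(n Node, minPrec int) string {
	switch v := n.(type) {
	case Var:
		return "$" + v.Name
	case Binary:
		p := precedence[v.Op]
		// Both operators are left-associative, so the right operand also
		// needs parentheses when it has the same precedence.
		s := render(v.Left, p) + " " + v.Op + " " + render(v.Right, p+1)
		if p < minPrec {
			return "(" + s + ")"
		}
		return s
	}
	return ""
}

func main() {
	tree := Binary{
		Op:    "*",
		Left:  Binary{Op: "+", Left: Var{Name: "a"}, Right: Var{Name: "b"}},
		Right: Var{Name: "c"},
	}
	fmt.Println(render(tree, 0))
}
```

Passing the parent's precedence down keeps the output free of redundant parentheses, which matters for a pretty printer; always parenthesizing nested binary nodes would also be correct but noisier.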

NamespaceResolver should remove unresolved names

I'm having issues with the namespace resolver. Its results contain unresolved names like void, true and null. Shouldn't these be removed when they are not resolved?

test.php

<?php

declare(strict_types=1);

namespace App\Domain\Handler\Cart;

use SimpleBus\Message\Recorder\RecordsMessages;
use App\Domain\Command\ChangeCurrencyCommand;
use App\Domain\Repository\CartRepository;
use App\Domain\Event\CurrencyChangedEvent as CurrencyChangedEventWithAlias;

class ChangeCurrencyHandler
{
    /**
     * @var CartRepository
     */
    private $cartRepository;

    /**
     * @var RecordsMessages
     */
    private $eventRecorder;

    public function __construct(
        CartRepository $cartRepository,
        RecordsMessages $eventRecorder
    ) {
        $this->cartRepository   = $cartRepository;
        $this->eventRecorder    = $eventRecorder;
    }

    public function __invoke(ChangeCurrencyCommand $command) : void
    {
        if (true === $command->getBool()) {
            // Do something
        }
        
        if (null !== $command->getNull()) {
            // Do something
        }

        $this->eventRecorder->record(new CurrencyChangedEventWithAlias());
    }
}

main.go

package main

import (
	"fmt"
	"github.com/z7zmey/php-parser/php7"
	"github.com/z7zmey/php-parser/visitor"
	"os"
	"reflect"
)

func main() {
	for _, file := range os.Args[1:] {
		fmt.Printf("Checking %s\n", file)

		checkFile(file)
	}
}

func checkFile(file string) {
	src, err := os.Open(file)
	if err != nil {
		panic(err)
	}

	parser := php7.NewParser(src, file)
	parser.Parse()

	for _, e := range parser.GetErrors() {
		fmt.Println(e)
	}

	nsResolver := visitor.NewNamespaceResolver()
	parser.GetRootNode().Walk(nsResolver)

	for n, fqcn := range nsResolver.ResolvedNames {
		fmt.Printf("Found %s: %s\n", reflect.TypeOf(n), fqcn)
	}
}

output

Checking ./test.php
Found *name.Name: SimpleBus\Message\Recorder\RecordsMessages
Found *name.Name: App\Domain\Command\ChangeCurrencyCommand
Found *name.Name: void
Found *name.Name: true
Found *name.Name: null
Found *name.Name: App\Domain\Event\CurrencyChangedEvent
Found *stmt.Class: App\Domain\Handler\Cart\ChangeCurrencyHandler
Found *name.Name: App\Domain\Repository\CartRepository

Does not work correctly if used from several goroutines

The parser sometimes reports a lot of strange errors when it is used from several goroutines; see below. When I parse files using a single goroutine, it works just fine.

Function that does the parsing does not rely on any global state:

func parse(filename string) {
	fp, err := os.Open(filename)
	if err != nil {
		log.Fatalf("Could not open file %s: %s", filename, err.Error())
	}

	defer fp.Close()

	var b bytes.Buffer

	conv := transform.NewReader(fp, charmap.Windows1251.NewDecoder())
	parser := php7.NewParser(io.TeeReader(conv, &b), filename)
	parser.Parse()

	for _, e := range parser.GetErrors() {
		fmt.Printf("ERROR: parsing %s: %s", filename, e)
	}

	rootNode := parser.GetRootNode()

	if rootNode == nil {
		log.Printf("Could not parse %s at all due to errors", filename)
		return
	}

	rootNode.Walk(&rootWalker{
		w:         os.Stdout,
		filename:  filename,
		comments:  parser.GetComments(),
		positions: parser.GetPositions(),
		lines:     bytes.Split(b.Bytes(), []byte("\n")),
	})
}

Errors example:

syntax error: unexpected T_ENCAPSED_AND_WHITESPACE at line 409
syntax error: unexpected '}' at line 480
syntax error: unexpected T_STRING, expecting T_VARIABLE or T_ENCAPSED_AND_WHITESPACE or T_DOLLAR_OPEN_CURLY_BRACES or T_CURLY_OPEN at line 605
...

Unknown unicode characters in inline HTML cause syntax errors

When parsing

hi 󰀄 bye

I get

==> plane_15.php
syntax error: unexpected $unk at line 1
  | [*node.Root]
  |   "Position": Pos{Line: 1-1 Pos: 1-12};
  |   "Stmts":
  |     [*stmt.InlineHtml]
  |       "Position": Pos{Line: 1-1 Pos: 1-3};
  |       "Value": hi ;
  |     [*stmt.InlineHtml]
  |       "Position": Pos{Line: 1-1 Pos: 9-12};
  |       "Value": bye
;

rather than

==> plane_15.php
  | [*node.Root]
  |   "Position": Pos{Line: 1-1 Pos: 1-12};
  |   "Stmts":
  |     [*stmt.InlineHtml]
  |       "Position": Pos{Line: 1-1 Pos: 1-12};
  |       "Value": hi 󰀄 bye
;

The character in there is U+F0004, in Supplementary Private Use Area-A, commonly used with custom fonts for rendering character-like things in text on the web.

I'll submit a pull request with the fix, which simply separates EOF from other uncategorized characters in the classifier.
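The idea of giving EOF its own class can be sketched as follows. This is a toy classifier, not the one in php-parser's scanner; the class constants and the eof sentinel are assumptions made for illustration. Any rune without a dedicated class, including U+F0004, lands in classOther and can be treated as ordinary text, while only the sentinel maps to classEOF.

```go
package main

import "fmt"

// Character classes for a toy scanner. These constants are hypothetical
// and do not mirror the real scanner's class numbers.
const (
	classEOF = iota
	classLetter
	classDigit
	classSpace
	classOther // any rune without a dedicated class, e.g. U+F0004
)

// eof is the sentinel this sketch uses for end of input.
const eof rune = -1

// classify maps a rune to its class, keeping EOF apart from classOther.
func classify(r rune) int {
	switch {
	case r == eof:
		return classEOF
	case r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z'):
		return classLetter
	case r >= '0' && r <= '9':
		return classDigit
	case r == ' ' || r == '\t' || r == '\n' || r == '\r':
		return classSpace
	}
	return classOther
}

func main() {
	for _, r := range "hi \U000F0004 bye" {
		fmt.Printf("%U -> class %d\n", r, classify(r))
	}
	fmt.Printf("EOF -> class %d\n", classify(eof))
}
```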

Built-in primitives / constants not supported

Consider the code below:

Note how the parser doesn't realize that int and bool are built-in types and that true is a global constant; instead we get "NamespacedName": Test\bool; etc., which is obviously wrong.

<?php

declare(strict_types=1);
namespace Test;

class Test
{
    public static function isValid(int $typeid): bool
    {
        return true;
    }
}
[*node.Root]
  "Stmts":
    [*stmt.Declare]
      "Consts":
        [*stmt.Constant]
          "PhpDocComment": ;
          "ConstantName":
            [*node.Identifier]
              "Value": strict_types;
          "Expr":
            [*scalar.Lnumber]
              "Value": 1;
      "Stmt":
        [*stmt.Nop]
    [*stmt.Namespace]
      "NamespaceName":
        [*name.Name]
          "Parts":
            [*name.NamePart]
              "Value": Test;
    [*stmt.Class]
      "NamespacedName": Test\Test;
      "PhpDocComment": ;
      "ClassName":
        [*node.Identifier]
          "Value": Test;
      "Stmts":
        [*stmt.ClassMethod]
          "ReturnsRef": false;
          "PhpDocComment": ;
          "MethodName":
            [*node.Identifier]
              "Value": isValid;
          "Modifiers":
            [*node.Identifier]
              "Value": public;
            [*node.Identifier]
              "Value": static;
          "Params":
            [*node.Parameter]
              "ByRef": false;
              "Variadic": false;
              "VariableType":
                [*name.Name]
                  "NamespacedName": Test\int;
                  "Parts":
                    [*name.NamePart]
                      "Value": int;
              "Variable":
                [*expr.Variable]
                  "VarName":
                    [*node.Identifier]
                      "Value": typeid;
          "ReturnType":
            [*name.Name]
              "NamespacedName": Test\bool;
              "Parts":
                [*name.NamePart]
                  "Value": bool;
          "Stmt":
            [*stmt.StmtList]
              "Stmts":
                [*stmt.Return]
                  "Expr":
                    [*expr.ConstFetch]
                      "Constant":
                        [*name.Name]
                          "NamespacedName": Test\true;
                          "Parts":
                            [*name.NamePart]
                              "Value": true;

Update Example in README

What actually works

  • There doesn't seem to be a nice way to get error messages, or at least it's not obvious to me. I'd assume perhaps I need to walk the AST and find nodes marked with errors or something?
rootNode, comments, positions := php7.Parse(bytes.NewBufferString(`<? echo "Hello world";`), "example.php")
//How do we get a list of errors easily?
//How do we get the position/column?
//for _, e := range parser.GetErrors() {
//	fmt.Println(e)
//}
visitor := visitor.Dumper{
	Writer:    os.Stdout,
	Indent:    "",
	Comments:  comments,
	Positions: positions,
}
rootNode.Walk(visitor)

The non-working example given

src := bytes.NewBufferString(`<? echo "Hello world";`)

parser := php7.NewParser(src, "example.php")
parser.Parse()

for _, e := range parser.GetErrors() {
	fmt.Println(e)
}

visitor := visitor.Dumper{
	Writer:    os.Stdout,
	Indent:    "",
	Comments:  parser.GetComments(),
	Positions: parser.GetPositions(),
}

rootNode := parser.GetRootNode()
rootNode.Walk(visitor)

class positions in php5

The class position field covers only the word "class" when parsing with PHP 5.

<?php
class Foo {
	private $bar;
}
==> classTest.php
  | [*stmt.StmtList]
  |   "Position": Pos{Line: 2-2 Pos: 7-11};     # should be 2-4, 7-34
  |   "Stmts":
  |     [*stmt.Class]
  |       "Position": Pos{Line: 2-2 Pos: 7-11};    # should be 2-4, 7-34
  |       "NamespacedName": Foo;
  |       "PhpDocComment": ;
  |       "ClassName":
  |         [*node.Identifier]
  |           "Position": Pos{Line: 2-2 Pos: 13-15};
  |           "Value": Foo;
  |       "Stmts":
  |         [*stmt.PropertyList]
  |           "Position": Pos{Line: 3-3 Pos: 20-32};
  |           "Modifiers":
  |             [*node.Identifier]
  |               "Position": Pos{Line: 3-3 Pos: 20-26};
  |               "Value": private;
  |           "Properties":
  |             [*stmt.Property]
  |               "Position": Pos{Line: 3-3 Pos: 28-31};
  |               "PhpDocComment": ;
  |               "Variable":
  |                 [*expr.Variable]
  |                   "Position": Pos{Line: 3-3 Pos: 28-31};
  |                   "VarName":
  |                     [*node.Identifier]
  |                       "Position": Pos{Line: 3-3 Pos: 28-31};
  |                       "Value": bar;

Pretty printer does not print when syntax error is found.

This is php code taken from php-src:

INPUT:

 interface Serializable
{

	function serialize();
	function unserialize($serialized);
}

class ArrayObject implements IteratorAggregate, ArrayAccess, Countable
{
	
	const STD_PROP_LIST     = 0x00000001;
	const ARRAY_AS_PROPS    = 0x00000002;


	function __construct($array, $flags = 0, $iterator_class = "ArrayIterator") {/**/}

	function uasort(mixed cmp_function) {/**/}

	/** Sort the entries by key using user defined function.
	 */
	function uksort(mixed cmp_function) {/**/}

}?>

OUTPUT:

syntax error: unexpected T_STRING, expecting T_VARIABLE at line 20
syntax error: unexpected T_STRING, expecting T_VARIABLE at line 24

File Out:
<?php
interface Serializable
{
	function serialize();
	function unserialize($serialized);
}
{

}
{
};?>

backslash before newline in single quoted string

While backslash-newline doesn't have any special meaning inside a string in PHP,
it is syntactically valid. Currently, parsing something like

<?php
echo "/ --- \
| foo |
\ --- /" . "\n";
echo '/ --- \
| bar |
\ --- /' . "\n";

yields syntax errors on the second string

==> multi_line_strings.php
syntax error: unexpected $unk at line 5
syntax error: unexpected T_DEC, expecting T_STRING at line 7
  | [*node.Root]
  |   "Position": Pos{Line: 2-7 Pos: 7-83};
  |   "Stmts":
  |     [*stmt.Echo]
  |       "Position": Pos{Line: 2-4 Pos: 7-44};
  |       "Exprs":
  |         [*binary.Concat]
  |           "Position": Pos{Line: 2-4 Pos: 12-43};
  |           "Left":
  |             [*scalar.String]
  |               "Position": Pos{Line: 2-4 Pos: 12-36};
  |               "Value": "/ --- \
| foo |
\ --- /";
  |           "Right":
  |             [*scalar.String]
  |               "Position": Pos{Line: 4-4 Pos: 40-43};
  |               "Value": "\n";
  |     [*stmt.Expression]
  |       "Position": Pos{Line: 7-7 Pos: 79-83};
  |       "Expr":
  |         [*scalar.String]
  |           "Position": Pos{Line: 7-7 Pos: 79-82};
  |           "Value": "\n";

rather than two valid strings

==> /home/imuli/src/github.com/imuli/semantic-php/snippets/multi_line_strings.php
  | [*node.Root]
  |   "Position": Pos{Line: 2-7 Pos: 7-83};
  |   "Stmts":
  |     [*stmt.Echo]
  |       "Position": Pos{Line: 2-4 Pos: 7-44};
  |       "Exprs":
  |         [*binary.Concat]
  |           "Position": Pos{Line: 2-4 Pos: 12-43};
  |           "Left":
  |             [*scalar.String]
  |               "Position": Pos{Line: 2-4 Pos: 12-36};
  |               "Value": "/ --- \
| foo |
\ --- /";
  |           "Right":
  |             [*scalar.String]
  |               "Position": Pos{Line: 4-4 Pos: 40-43};
  |               "Value": "\n";
  |     [*stmt.Echo]
  |       "Position": Pos{Line: 5-7 Pos: 46-83};
  |       "Exprs":
  |         [*binary.Concat]
  |           "Position": Pos{Line: 5-7 Pos: 51-82};
  |           "Left":
  |             [*scalar.String]
  |               "Position": Pos{Line: 5-7 Pos: 51-75};
  |               "Value": '/ --- \
| bar |
\ --- /';
  |           "Right":
  |             [*scalar.String]
  |               "Position": Pos{Line: 7-7 Pos: 79-82};
  |               "Value": "\n";
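For reference, in a single-quoted PHP string only \\ and \' are escape sequences; any other backslash, including one before a newline, is kept literally. A minimal sketch of a scanner loop that finds the closing quote under that rule is shown below. The function name is hypothetical and this is not the scanner's actual code.

```go
package main

import "fmt"

// singleQuotedEnd returns the index just past the closing quote of a
// single-quoted PHP string that starts at src[start]. It returns -1 if
// src[start] is not a quote or the string is unterminated.
// A backslash skips exactly one following byte, whatever it is, so \' and
// \\ do not end the string and a backslash before a newline is simply part
// of the value.
func singleQuotedEnd(src string, start int) int {
	if start >= len(src) || src[start] != '\'' {
		return -1
	}
	for i := start + 1; i < len(src); i++ {
		switch src[i] {
		case '\\':
			i++ // skip the byte after the backslash, including a newline
		case '\'':
			return i + 1
		}
	}
	return -1
}

func main() {
	src := "'/ --- \\\n| bar |\n\\ --- /' . \"\\n\";"
	end := singleQuotedEnd(src, 0)
	fmt.Printf("%q\n", src[:end])
}
```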

Parser crashes on obscure PHP `list` syntax with missing arguments

I discovered today that the parser fails with an "index out of range" runtime error when it encounters the following PHP code:

<?php
$things = ["foo", "bar"];
list(, $bar) = $things;

Surprisingly, this is valid PHP code (running it results in bar).

php-parser gets really unhappy about the lack of a first argument though:

panic: runtime error: index out of range

I almost feel bad reporting this because it's such bad PHP code, but it's nonetheless valid and should probably at least not crash the parser.

no parse error when an abstract method contain body

PHP normally throws a fatal error when an abstract method contains a body, even if the method is never used, but php-parser does not.

example :

<?php


namespace Foo;

abstract class Bar extends Baz
{
    private $int = 5;
    protected $val = 'value';
    public $bol = false;
    
    abstract function name(): string
    {
    }
    
	public function greet(): void
	{
		echo "Hello World";
	}
}

$main = function (int $argc,string ...$args): void {
    $class = new class extends Bar {
        public function name(): string {
            return 'azjezz';            
        }
    };
    $class->greet();
};

$main($_SERVER['argc'], ...$_SERVER['args']);

Expected :

fatal error: Abstract function Foo\Bar::name() cannot contain body in %s on line %d

PHP Behavior :
https://3v4l.org/llHmE

Suggestion: add possible types list as comments for properties

It would be very useful to be able to understand which types can be present in node.Node properties.

For example, I initially thought that (*expr.Variable).VarName could only be *node.Identifier, but much later I saw that this is not always the case, because "variable variables" exist.

Type comments can be created automatically using actual type information when analyzing some big codebase. I may volunteer for that if you do not have plans to implement it yourself.

Calling parser concurrently

I wanted to call the php7 parser concurrently, but because rootnode, comments and positions are defined as package-level variables in the php7 package, it does not work. To solve this, I moved those structures into the lexer struct (https://github.com/z7zmey/php-parser/blob/master/scanner/lexer.go#L438) and updated the yacc parser file, and it worked.

Did I miss something that would make it work with goroutines?

If not, do you have a different idea for handling such a case, or would you consider a pull request?
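The general pattern described above, keeping parse results in per-instance fields instead of package-level variables, can be sketched like this. The parser type here is a toy stand-in with hypothetical names, not php-parser's implementation; it only shows why separate instances can run in separate goroutines safely.

```go
package main

import (
	"fmt"
	"sync"
)

// parser is a toy stand-in, not php-parser's type. Results live in fields
// of each instance rather than in package-level variables, so instances do
// not share mutable state.
type parser struct {
	src      string
	rootNode string
	comments []string
}

func newParser(src string) *parser { return &parser{src: src} }

// parse writes only to the receiver's own fields.
func (p *parser) parse() {
	p.rootNode = "root(" + p.src + ")"
	p.comments = append(p.comments, "comment for "+p.src)
}

// parseAll parses every source in its own goroutine.
func parseAll(sources []string) []string {
	roots := make([]string, len(sources))
	var wg sync.WaitGroup
	for i, src := range sources {
		wg.Add(1)
		go func(i int, src string) {
			defer wg.Done()
			p := newParser(src)
			p.parse()
			roots[i] = p.rootNode // each goroutine writes a distinct index
		}(i, src)
	}
	wg.Wait()
	return roots
}

func main() {
	fmt.Println(parseAll([]string{"a.php", "b.php", "c.php"}))
}
```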

Parser fails on (valid) string interpolation code

This is a really interesting project, thanks for working on it. I've run into what looks like an erroneous parse failure with regard to string interpolation code.

<?php
$filename = "something.txt";
@header("Content-Disposition: attachment; filename=\"$filename\"");

This fails due to parse errors of various descriptions, depending on what the surrounding code looks like. When using the php-parser binary, this dumps out

$ php-parser /tmp/brokenparse.php
==> /private/tmp/brokenparse.php
syntax error: unexpected $end, expecting ')'
  | [*stmt.StmtList]
  |   "Stmts":

PHP itself doesn't complain about this code and parses it just fine. Running php -l /tmp/brokenparse.php results in no errors. I believe this is related to a flaw in the grammar, because if $filename is followed by a space things work fine:

<?php
$filename = "something.txt";
@header("Content-Disposition: attachment; filename=\"$filename \"");
➜  analyze php-parser /tmp/brokenparse.php
==> /private/tmp/brokenparse.php
  | [*stmt.StmtList]
  |   "Position": Pos{Line: 3-4 Pos: 8-104};
  |   "Stmts":
  |     [*stmt.Expression]

... 52 lines elided ...

PHPDoc is attributed to a wrong node

PHPDoc is sometimes attributed to a wrong node. Example:

<?php
/** Some phpdoc */
$a = 1;
function wrong_phpdoc() {}

It yields the following structure:

[*node.Root]
  "Stmts":
    [*stmt.Expression]
      "Expr":
        [*assign.Assign]
          "Variable":
            [*expr.Variable]
              "VarName":
                [*node.Identifier]
                  "Value": a;
          "Expression":
            [*scalar.Lnumber]
              "Value": 1;
    [*stmt.Function]
      "NamespacedName": wrong_phpdoc;
      "ReturnsRef": false;
      "PhpDocComment": /** Some phpdoc */;
      "FunctionName":
        [*node.Identifier]
          "Value": wrong_phpdoc;
      "Stmts":

As you can see, PHPDoc here is attributed to a function while the correct position would be the assignment.

Performance suggestion: reduce allocations

The parser is not as performant as it could be (PHP7+ also creates AST in order to execute file):

$ php -n -r '$start = microtime(true); require("some_big_file.php"); echo "time: " . (microtime(true) - $start) . " sec\n";'
time: 0.0185 sec

$ go run test.go some_big_file.php
Errors count:  0
Parser time: 167.636796ms

test.go.zip

I made a few (very hacky) patches that significantly reduce allocations count for some critical parts and it reduced parsing time by a factor of 1.5:

# after patches
$ go run test.go some_big_file.php
Errors count:  0
Parser time: 109.522377ms

The profiler still shows plenty more allocations; my patches are just a proof of concept.
speedup.patch.txt

Complex (curly) syntax for string interpolation not working in pretty printer

Issue Description EDITED

abstract class AbstractClass
  {
    // Force Extending class to define this method
    abstract protected function getValue();
    abstract protected function prefixValue($prefix);
    
    // Common method
    public function printOut()
      {
        print $this->getValue() . "\n";
      }
  }

class ConcreteClass1 extends AbstractClass
  {
    protected function getValue()
      {
        return "ConcreteClass1";
      }
    
    public function prefixValue($prefix)
      {
        return "{$prefix}ConcreteClass1";
      }
  }

$class1 = new ConcreteClass1;
$class1->printOut();
echo $class1->prefixValue('FOO_') . "\n";

// console output

{
	abstract protected function getValue();
	abstract protected function prefixValue($prefix);
	public function printOut()
	{
		print($this->getValue() . "\n");
	}
}
class ConcreteClass1 extends AbstractClass
{
	protected function getValue()
	{
		return "ConcreteClass1";
	}
	public function prefixValue($prefix)
	{
		return "$prefixConcreteClass1";
	}
}
$class1 = new ConcreteClass1;
$class1->printOut();
echo $class1->prefixValue('FOO_') . "\n";

Expected output:
ConcreteClass1
FOO_ConcreteClass1

Actual output:
ConcreteClass1
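The difference between "{$prefix}ConcreteClass1" and "$prefixConcreteClass1" is that, without braces, PHP reads the following identifier characters as part of the variable name. A minimal sketch of a printer rule that adds braces only when needed is given below. The part type and function names are hypothetical, literal text is assumed to be already escaped, and cases such as a following "[" or "->" are not handled; always emitting braces would be a simpler and safer alternative.

```go
package main

import "fmt"

// part is one piece of a double-quoted string: literal text or a simple
// variable. Toy type, not php-parser's node types.
type part struct {
	isVar bool
	text  string // literal text, or the variable name without '$'
}

// isIdentByte reports whether b could continue a PHP variable name.
func isIdentByte(b byte) bool {
	return b == '_' || (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
		(b >= '0' && b <= '9') || b >= 0x80
}

// renderEncapsed prints the parts as a double-quoted string, bracing a
// variable when the next literal would otherwise extend its name.
func renderEncapsed(parts []part) string {
	out := `"`
	for i, p := range parts {
		if !p.isVar {
			out += p.text
			continue
		}
		next := ""
		if i+1 < len(parts) && !parts[i+1].isVar {
			next = parts[i+1].text
		}
		if next != "" && isIdentByte(next[0]) {
			out += "{$" + p.text + "}"
		} else {
			out += "$" + p.text
		}
	}
	return out + `"`
}

func main() {
	parts := []part{{isVar: true, text: "prefix"}, {isVar: false, text: "ConcreteClass1"}}
	fmt.Println(renderEncapsed(parts))
}
```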

crash with incomplete blocks

When dealing with truncated or otherwise incomplete php files, sometimes the file ends inside a block.

function incomplete() {
    return something();

For some block types (class, switch) it comes out fine: just a syntax error, and the entry is dropped. For the majority, though (do, foreach, function, if, namespace, while), the parser returns nil and php-parser crashes.

I haven't yet seen an unclosed parenthetical break the parser, but that seems like a possibility too.

`Expr` and `Expression`

I noticed that there are Expr and Expression names in the node field naming. Is there any difference between them?

panic for any input on 32-bit platforms

It looks like the problem is with scanner/lexer.go:465.

        file := token.NewFileSet().AddFile(fName, -1, 1<<31-1)

Replacing 1<<31-1 with 1<<31-3 fixes the problem (the base of a FileSet starts at 1, and adds size+1 to account for EOF).

The "proper" fix would seem to be

        fInfo, err := os.Stat(fName)
        if err != nil {
            panic(err)
        }
        file := token.NewFileSet().AddFile(fName, -1, int(fInfo.Size()))

Or passing the file size in from outside. Thoughts?

Error below:

panic: token.Pos offset overflow (> 2G of source code in file set)

goroutine 6 [running]:
go/token.(*FileSet).AddFile(0x1a56a150, 0x1a5143c0, 0x40, 0x1, 0x7fffffff, 0x0)
	/nix/store/0b91dwiap82wpar5b225bs8wig8c7xva-go-1.9.2/share/go/src/go/token/position.go:380 +0x291
github.com/z7zmey/php-parser/scanner.NewLexer(0x8d56b10, 0x1a50c178, 0x1a5143c0, 0x40, 0x1a5143c0)
	/home/imuli/src/github.com/z7zmey/php-parser/scanner/lexer.go:465 +0x5a
github.com/z7zmey/php-parser/php7.NewParser(0x8d56b10, 0x1a50c178, 0x1a5143c0, 0x40, 0x0)
	/home/imuli/src/github.com/z7zmey/php-parser/php7/parser.go:32 +0x3d
main.parserWorker(0x1a514200, 0x1a514240)
	/home/imuli/src/github.com/z7zmey/php-parser/main.go:71 +0xf3
created by main.main
	/home/imuli/src/github.com/z7zmey/php-parser/main.go:30 +0xd9
