This repository holds the glsl and glsl-quasiquote projects. Feel free to visit each project for more information.
GLSL parser for Rust
Removed FIXME
The removed comment said:
// FIXME: the three fields are wrong. It’s not possible to have the last two if the second one
// is not Some(_) – see page 197 of the GLSLangSpec.4.50.pdf document.
Currently, the reported errors are weak and useless. We need a way to locate them and have more information.
Hint: have a look at the verbose-errors compilation feature. It seems to be the fastest way to achieve what we need.
The following shader:
layout(set = 0, binding = 0) buffer Foo {
char a;
} foo;
Yields this:
Done(
  [],
  [
    Declaration(
      Block(
        TypeQualifier {
          qualifiers: [
            Layout(
              LayoutQualifier {
                ids: [
                  Identifier(
                    "set",
                    Some(
                      Comma(
                        IntConst(0),
                        Assignment(
                          Variable("binding"),
                          Equal,
                          IntConst(0)
                        )
                      )
                    )
                  )
                ]
              }
            ),
            Storage(Buffer)
          ]
        },
        "Foo",
        [
          StructFieldSpecifier {
            ty: TypeName("char"),
            identifiers: ["a"]
          }
        ],
        Some(("foo", None))
      )
    )
  ]
)
I'm not sure what the actual output is supposed to be, but it seems weird that binding is inside the block of set = 0. There might be a bug here.
Currently, the implementation is very naive. As I was writing tests to test the whole thing, I came across a nasty issue: it just doesn’t work. It might be very inefficient, also. I need a state machine.
The idea of that state machine is to have “cells” that represents the current context. That context states what was parsed and what is to be expected then. For instance:
foo[3]
There are several ways to parse that. The idea is that the parser should go from left to right. Hence, in the context of expecting an expression, we need to check the possible forms in a fixed order.
The current implementation will try to match all the forms of expressions, while we could start off by trying to match a prefix operator. If it fails, we try an infix form. If that fails, we abort. If it succeeds, we try the suffix operator. Etc.
According to @Geal, it’d be a good idea to use an FSM over small nom parsers, and I think it’s kinda the same idea – so good point there.
I currently disabled them from being used in the grammar because they cause the function-identifier and expression rules to become left-recursive.
void main(void){
return;
}
parses to
Ok(
  TranslationUnit(
    NonEmpty([
      FunctionDefinition(
        FunctionDefinition {
          prototype: FunctionPrototype {
            ty: FullySpecifiedType {
              qualifier: None,
              ty: TypeSpecifier {
                ty: Void,
                array_specifier: None,
              },
            },
            name: Identifier("main"),
            parameters: [
              Unnamed(
                None,
                TypeSpecifier {
                  ty: Void,
                  array_specifier: None,
                },
              ),
            ],
          },
          statement: CompoundStatement {
            statement_list: [
              Simple(
                Declaration(
                  InitDeclaratorList(
                    InitDeclaratorList {
                      head: SingleDeclaration {
                        ty: FullySpecifiedType {
                          qualifier: None,
                          ty: TypeSpecifier {
                            ty: TypeName(TypeName("return")),
                            array_specifier: None,
                          },
                        },
                        name: None,
                        array_specifier: None,
                        initializer: None,
                      },
                      tail: [],
                    },
                  ),
                ),
              ),
            ],
          },
        },
      ),
    ]),
  ),
)
Instead of a Declaration, the return should parse as a Return.
I was just interested to know why you are not using the nom::IResult type directly and prefer a custom version of it, named ParseResult?
Putting #version 450 as the first line of the shader makes the parser return Error(Many1).
The current implementation uses primary_expr for the left part of the . (dot) operator. This is wrong. It should be postfix_expr.
Currently, Declaration::Block is a long variant with several arguments. This is boring; we need a struct for that.
This is needed to do semantic analysis and translation to SPIR-V.
Do you have any thoughts on how you'd like to represent it?
glslang builds the symbol table during parsing and refers to it in the resulting parse tree. That has the advantage of not needing to represent both an unresolved syntax tree and a syntax tree with resolved symbols.
e.g.
float f()
{
return 1e-6;
}
gives:
float f()
^
expected ';', found f
1: at line 3, in Alt:
float f()
^
2: at line 0, in Many1:
^
Now that I have at least one GLSL writer, I can check a very interesting property:
parse(show(ast)) == ast
This property is very interesting, because if we can generate random ASTs, we can have automatic testing for free. However, I don’t have random generation of AST yet. Some work must be done there.
It’s rare that I do that, but I think I will incorporate this change in a minor patch, for a single reason: the sole change from GLSL450 is that the compiler now accepts extra semicolons at global scope.
Accepting GLSL450 sources with that change is, to me, not a problem, and if it becomes one for anyone, I will add a feature flag to protect against it. But I really doubt it will ever be needed. From what I understand, the change is there to allow starting a shader with ; (which sounds completely weird).
Also, the changelog from Khronos shows that it was reported from a Private Bug. I have no idea what that means, but whatever.
I do this so that we can get going on with rust-gamedev/wg#23.
That type would ensure the grammar is respected.
I want to join existing partial GLSL code with code that's built programmatically.
let existing_code = CompoundStatement::parse(
"
r = t;
f = h;
",
)
.unwrap();
I'm then adding it to an existing CompoundStatement that has had everything else built programmatically.
compound
.statement_list
.extend(&existing_code.statement_list);
let external_declaration = ExternalDeclaration::new_fn(
TypeSpecifierNonArray::Void,
"main",
Vec::new(),
compound.statement_list,
);
let translation_unit = TranslationUnit::from_iter(vec![external_declaration]).unwrap();
The issue is that CompoundStatement::parse() fails unless the string is surrounded with {}.
Maybe CompoundStatement is the wrong type, in which case could you direct me to the correct type for parsing existing code that could be within a function (e.g. no declarations)?
I tested Statement and Expr, but those seem to be for one-liners, and I'd have to separate each line into its own string before parsing?
The following code will trigger a parse error on the current release / HEAD:
vec3[3] verts = vec3[]( /* whatever */ );
Here, the vec3[](…) is a function call, and the function identifier must be vec3[]. Currently, it’ll fail to parse.
This is not easy to fix because the function call parser is already defined in the postfix expression parser as an alternative. We have to be smart to fix that.
The AST does not currently contain line numbers. It would be handy to have for reporting semantic analysis problems.
This will help with both developing and error reporting.
CI could git clone the piglit project, and have a little parser that runs over glslparser and shader_test files (other than compile-failure ones) and runs them through our parser.
The following does not give an error when parsing when it should:
int fetch_transform(int id)
{
return id;
}
bool ray_plane()
{
if 1 {
}
Instead it gives back a syntax tree containing only the first function.
I'm guessing this is just confusion on my part:
When trying to parse a shader with a #define declaration, the parser exits with an error.
extern crate glsl;
const SOURCE: &str = r#"
#define X 1
void main() {
}
"#;
use glsl::parser::Parse;
fn main() {
let res = glsl::syntax::TranslationUnit::parse_str(SOURCE);
println!("{:?}", res);
}
Results in
Err(ParseError { kind: Many1, info: "" })
Documentation on Preprocessor says #define is supported by substitution. The implementation seems to disagree. Am I missing a step, or am I misinterpreting the docs?
It’s currently pub for convenience when generating the documentation, as I’m experimenting around with pest.
The current implementation uses identifiers to represent those type names, while the spec uses TYPE_NAME without even defining what the heck it is.
Some benchmarks are needed to ensure what the problem is, but I’m pretty sure (given the current nom-3
implementation) that we have a lot of failures and retries.
The current ExternalDeclaration::new_fn
doesn’t do that check yet.
In order to fix this, #47 must be considered first.
I ask this question knowing that it's beyond the scope of this crate. It just seems like the best to ask in case someone else is wondering the same thing.
Is there a project to use this crate to generate type-safe bindings for communication with OpenGL shaders? If not, have you thought about if that is feasible/what that might look like?
I ask this because of your work on luminance-derive and on this crate. Thanks in advance!
The current layout produces poor documentation: the module parser.rs holds both the external interface and the nom rules, which is pretty cumbersome to read.
A solution would be to separate the external interface from the nom rules. The question is: what is the external interface? Would the following be enough?
pub fn parse(source: &[u8]) -> ParseResult<TranslationUnit>
We want to support the following syntax:
#define FOO(x, y) (x + y)
quasiquote's tokenize_block produces a block containing fields: glsl::syntax::NonEmpty(vec![fields]), where the base glsl crate expects just fields: vec![fields], resulting in an error:
| |_________expected struct `std::vec::Vec`, found struct `glsl::syntax::NonEmpty`
| in this macro invocation
|
= note: expected type `std::vec::Vec<glsl::syntax::StructFieldSpecifier>`
found type `glsl::syntax::NonEmpty<glsl::syntax::StructFieldSpecifier>`
Happy to send a patch to correct this, but I'm not clear which is the desired structure - should we be adding a NonEmpty node to quasiquote, or removing it from glsl?
Would be nice to support the following too:
#define FOO( x, y ) ( x + y )
(Mind the spaces inside the parentheses!)
I implemented the first, naive (yet fully working) GLSL writer in less than 12 hours. I think it’s worth it to write a SPIR-V writer as well, and it shouldn’t take too much time.
It’s not my own priority right now (because I don’t use Vulkan nor GL4.6 yet), but if someone provides me with a fully working patch, I’ll accept it for sure.
Any input with #if or #ifdef fails to parse with an error of ErrorKind Custom(0).
e.g.:
use glsl::parser::{Parse, ParseResult};
use glsl::syntax::TranslationUnit;
fn main() {
let fs = "#define USE_GLOBAL_COLOR 1
uniform vec4 color;
out vec4 out_color;
void main() {
#if USE_GLOBAL_COLOR
out_color = color;
#else
out_color = vec4(1., 0., 0., 1.);
#endif
}";
let parsed = match TranslationUnit::parse_str(fs) {
    ParseResult::Ok(parsed) => parsed,
    ParseResult::Incomplete(_needed) => panic!("More data needed to parse shader"),
    ParseResult::Err(err) => panic!("Error parsing shader: {}", err),
};
}
panics with error Custom(0). Removing the ifs makes the parser work correctly.
Expr <- AssExpr | Expr , AssExpr
AssExpr <- CondExpr | UnaExpr AssOp AssExpr
CondExpr <- LOrExpr | LOrExpr ? Expr : AssExpr
LOrExpr <- LXorExpr | LOrExpr \|\| LXorExpr
LXorExpr <- LAndExpr | LXorExpr ^^ LAndExpr
LAndExpr <- IOrExpr | LAndExpr && IOrExpr
IOrExpr <- EOrExpr | IOrExpr \| EOrExpr
EOrExpr <- AndExpr | EOrExpr ^ AndExpr
AndExpr <- EqExpr | AndExpr & EqExpr
EqExpr <- RelExpr | EqExpr == RelExpr | EqExpr != RelExpr
RelExpr <- ShiftExpr | RelExpr < ShiftExpr | RelExpr > ShiftExpr | RelExpr ≤ ShiftExpr | RelExpr ≥ ShiftExpr
ShiftExpr <- AddExpr | ShiftExpr << AddExpr | ShiftExpr >> AddExpr
AddExpr <- MultExpr | AddExpr + MultExpr | AddExpr - MultExpr
MultExpr <- UnaExpr | MultExpr * UnaExpr | MultExpr / UnaExpr | MultExpr % UnaExpr
UnaExpr <- PostExpr | UnaOp UnaExpr
PostExpr <- PrimExpr | PostExpr [ IntExpr ] | FunCall | PostExpr . FieldSel | PostExpr ++ | PostExpr --
PrimExpr <- IDENTIFIER | INTCONST | UINTCONST | FLOATCONST | BOOLCONST | DOUBLECONST | ( Expr )
// and FunCall has a FunIdentifier, which has a PostExpr in it…
I’m about to release glsl-3.0, but there are still some things that make me uncomfortable. The current situation with the preprocessor is a bit uncertain, as we are typing it (e.g. here, here). In my opinion, we should only use String here, as at the stage of preprocessing, GLSL types don’t really exist yet.
The main issue I have with that is linked to the actual usefulness of such a representation. Given the parsed AST, I wonder how easy it is to actually preprocess the AST:
fn preprocess(ast: TranslationUnit) -> Result<TranslationUnit, PreprocessorError>;
Maybe we can add that function to see how things actually go. In the meantime, it’s likely that I change those Expr to String.
We need fuzzer support. Plus, each time the fuzzer finds a bad case, we need to include it as a dedicated file in tests/fuzz/ and include_bytes! it to enhance the unit tests.
in vec4 col;
void main() {
gl_Position = col - col - col;
}
parses as:
Assignment(
  Variable(Identifier("gl_Position")),
  Equal,
  Binary(
    Sub,
    Variable(Identifier("col")),
    Binary(
      Sub,
      Variable(Identifier("col")),
      Variable(Identifier("col"))
    )
  )
),
which transpiles to:
in vec4 col;
void main() {
gl_Position = (col)-((col)-(col));
}
which has the wrong associativity.
The character \ should be supported to continue a line onto the next one (line splicing).
The following shader:
buffer Foo {
char tiles[];
} main_tiles;
void main() {
}
Gives me this (the leading byte array is the remaining, unparsed input; decoded, it reads "Foo {\n    char tiles[];\n} main_tiles;\n\nvoid main() {\n}\n"):
Done(
  [
    70, 111, 111, 32, 123, 10, 32, 32, 32, 32, 99, 104, 97, 114, 32,
    116, 105, 108, 101, 115, 91, 93, 59, 10, 125, 32, 109, 97, 105,
    110, 95, 116, 105, 108, 101, 115, 59, 10, 10, 118, 111, 105, 100,
    32, 109, 97, 105, 110, 40, 41, 32, 123, 10, 125, 10
  ],
  [
    Declaration(
      Global(
        TypeQualifier {
          qualifiers: [Storage(Buffer)]
        },
        []
      )
    )
  ]
)
In other words, the parsing stops when encountering the [] and doesn't process what comes afterwards.
This doesn’t parse:
void main() {
float a = 1. * .5;
}
In nom, a parser can fail in either of two ways:
- In a recoverable way. This is what the alt parser combinator relies on, for instance (it tests one parser; if it fails, it tests the next one in the tuple). glsl uses such parsers a lot.
- In an unrecoverable way, i.e. the error is fatal in the glsl domain. For instance, a sequence of parsers requires all parsers to succeed; any parser failing puts the whole parser in an unrecoverable state. Unrecoverable parsers are neat because they can short-circuit all remaining branches.
As a motivation for this issue, syntax::Expr parsers have been written by following the GLSL450 specification strictly, without optimizing the nom code with recoverable / unrecoverable parsers in mind. The resulting code implies a lot of parsers being recovered over and over, and over, and… over. Especially, this situation occurs in glsl:
fn some_parser(i: &str) -> ParserResult<_> {
// here we have an alternative, which means we can recover subparsers
alt((
foo,
bar
))(i)
}
fn foo(i: &str) -> ParserResult<_> {
terminated(zoo, quux)(i)
}
fn bar(i: &str) -> ParserResult<_> {
terminated(zoo, other_parser)(i)
}
As you can see, we are going to try foo, and if it fails, we’ll try bar. Both foo and bar will fail if we cannot parse with zoo. The problem is that zoo’s parent recovers it, so every time bar succeeds, it means two things:
- foo has failed, which means it failed to parse quux while it succeeded to parse zoo.
- bar will parse with zoo again.
In other words, bar will run zoo twice, every time. The goal of this issue is to optimize those by turning them into a form such as:
fn some_parser(i: &str) -> ParserResult<_> {
// we know all parsers in the alternative will require zoo, so parse it first
terminated(
zoo,
alt((
foo,
bar
))
)(i)
}
fn foo(i: &str) -> ParserResult<_> {
quux(i)
}
fn bar(i: &str) -> ParserResult<_> {
other_parser(i)
}