Git Product home page Git Product logo

nom-sql's Introduction

nom-sql

Build Status

An incomplete Rust SQL parser written using nom.

This parser is a work in progress. It currently supports:

  • most CREATE TABLE queries;
  • most INSERT queries;
  • simple SELECT queries;
  • simple UPDATE queries; and
  • simple DELETE queries.

We try to support both the SQLite and MySQL syntax; where they disagree, we choose MySQL. (It would be nice to support both via feature flags in the future.)

nom-sql's People

Contributors

alanamarzoev avatar alexsnaps avatar benesch avatar bschwind avatar chernomor avatar eflanagan0 avatar efx avatar ekmartin avatar fintelia avatar frihed avatar greizgh avatar gzsombor avatar ivanceras avatar jmbredenberg avatar jonhoo avatar larat7 avatar lovasoa avatar ms705 avatar samedhg avatar spazm avatar tomharmon avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

nom-sql's Issues

Disallow illlegal trailing commas in order by clause

Currently on master:

order_expr: many0!(

In src/order.rs, the order by clause allows for trailing commas, which is technically invalid SQL.
Eg, "select * from users order by name ascage desc,\n" will parse, but should not

In the incoming bump nom to v5 PR, this is on line 76 of the same file:

76        many0(order_expr),

parse sub query fail

#[test]
fn test_nomsql() {
    let sql = r#"SELECT COUNT(*) FROM (SELECT id, user_id FROM test WHERE id = 1) t"#;
    let s=nom_sql::parser::parse_query(sql);
    if s.is_err(){
        panic!("{}",s.err().unwrap())
    }
}
running 1 test
test test_nomsql    ... FAILED

failures:

---- bench_nomsql stdout ----
thread 'main' panicked at 'failed to parse query', benches/bench_parser.rs:24:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/core/src/panicking.rs:64:14
   2: core::panicking::panic_display
   3: core::ops::function::FnOnce::call_once
   4: core::ops::function::FnMut::call_mut
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/core/src/ops/function.rs:337:5
   5: test::bench::Bencher::bench
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/bench.rs:56:22
   6: test::bench::benchmark::{{closure}}
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/bench.rs:208:51
   7: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/core/src/panic/unwind_safe.rs:271:9
   8: std::panicking::try::do_call
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panicking.rs:483:40
   9: std::panicking::try
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panicking.rs:447:19
  10: std::panic::catch_unwind
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/std/src/panic.rs:137:14
  11: test::bench::benchmark
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/bench.rs:208:18
  12: test::run_test
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:628:13
  13: test::run_tests
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:465:31
  14: test::console::run_tests_console
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/console.rs:293:5
  15: test::test_main
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:140:15
  16: test::test_main_static
             at /rustc/ec56537c4325ce5b798fc3628cbdd48ba4949ae5/library/test/src/lib.rs:159:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    bench_nomsql

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.01s

error: bench failed, to rerun pass `-p jdb --bench bench_parser`

support MySQL special character escape sequences

Hi!
Thank you for this great library.
I am trying to use it to parse wikipedia dumps in my project wikipedia-externallinks-fast-extraction.
Unfortunately, they contain mysql escape characters that are currently not supported by this library.

Unsupported characters

The escape characters are:

\0
\'
\"
\b
\n
\r
\t
\Z
\\
\%
\_

Example

INSERT INTO externallinks VALUES (23481,120102,'http://home.arcor.de/jean-polmartin/aufsaetze/apliut.htm\'','http://de.arcor.home./jean-polmartin/aufsaetze/apliut.htm\'','http://de.arcor.home./jean-polmartin/aufsaetze/apliut.htm\'');

parse very slow

#![feature(test)]
extern crate test;
use test::Bencher;
use sqlparser::dialect::MySqlDialect;
use sqlparser::parser::Parser;

#[bench]
fn bench_sqlparser(b: &mut Bencher) {
    let pg_dialect = MySqlDialect {};
    let sql = r#"SELECT COUNT(*) FROM t"#;
    let _ = Parser::parse_sql(&pg_dialect, sql).unwrap();
    b.iter(||{
        Parser::parse_sql(&pg_dialect, sql).unwrap();
    });
}

#[bench]
fn bench_nomsql(b: &mut Bencher) {
    let sql = r#"SELECT COUNT(*) FROM t"#;
    let s=nom_sql::parser::parse_query(sql);
    if s.is_err(){
        panic!("{}",s.err().unwrap())
    }
    b.iter(||{
        nom_sql::parser::parse_query(sql).unwrap();
    });
}
  • result
running 2 tests
test bench_nomsql    ... bench:      10,228 ns/iter (+/- 352)
test bench_sqlparser ... bench:       2,832 ns/iter (+/- 64)

No Delete Query?

Hi, your library seem to have a comprehensive structure of queries. I noticed that the DeleteStatement is lacking though.

Support partition selection with PARTITION option

MySQL supports the ability to choose which partitions of a given table will be selected from/changed. We will want to be able to parse this when they show up anywhere in for DELETE, INSERT, REPLACE, SELECT and UPDATE, so I believe it will work best to parse this within the table_reference and schema_table_reference parsers.

Support REPLACE INTO command

I originally opened this as a ticket relating to some INSERT changes but felt it would be better under it's own ticket/commit since it is its own command.

Support nested selection for FROM clauses

The join clauses appear to properly support nested selection but the from clause only looks for a table list. We will want to be able to support parsing these statements since nested selection shows up in the from clause fairly often from my experience.

Support compound nested select statements where permitted

Currently parsers that accept subqueries call the nested_selection parser which won't catch unions compound_selection will. I believe anywhere that a subquery or a filter section of DML commands should always permit a compound selection.

Problem parsing certain CASE statements

For some reason the parser isn't able to parse this fairly simple case statement: CASE WHEN country = 'USA' THEN state ELSE city END. I couldn't find any obvious reason that the case_when_column would miss this one.

Support nested arithmetic expressions

As mentioned in the comments of arithmetic.rs, operator precedence is not currently supported and neither is nested arithmetic expressions.

I plan on eventually submitting a PR to support this but first wanted to open an issue so others may be aware of what I am working on in case they want to also contribute.

I have minimal experience with parsers, but after some research found that Pratt parsers are probably best for this purpose, and plan to implement it using this approach barring any comments/suggestions from other contributors

fix unused doc comment warnings

When running the tests for the first time with cargo test I noticed multiple warnings about unused doc comments:

warning: unused doc comment
    --> src/common.rs:1021:1
     |
1021 |   /// Parse rule for a comment part.
     |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1022 | / named!(pub parse_comment<CompleteByteSlice, String>,
1023 | |     do_parse!(
1024 | |         opt_multispace >>
1025 | |         tag_no_case!("comment") >>
...    |
1029 | |     )
1030 | | );
     | |__- rustdoc does not generate documentation for macro expansions
     |
     = help: to document an item produced by a macro, the macro must produce the documentation as part of its expansion

Should we follow the help suggestion and include the documentation line(s) in the macro or remove the documentation lines for now?

Add support for comments

It would be nice to be able to parse SQL statements with comments in them.

Example

create--blah blah
table--blah blah
tt--blah blah
(--blah blah
x--blah blah
)--blah blah
;

Support selecting literals only

In standard SQL you may select literals without a table, eg SELECT 1 + 1;. This is not supported by nom-sql but in a perfect world it would be.

comment from @spazm (on the bump nom to v5 PR):
nested_selection could attempt to parse with something more specific than field_definition_expr like field_definition_literals(hypothetical) which would parse without the FROM (or FROM as optional).

Upgrade to latest version of nom

The version of nom currently used by nom-sql is fairly old. In and of itself, this isn't so bad (though the newer nom presumably has performance and correctness fixes), but it causes some pain with dependencies. Specifically, the old version of num depends on regex 0.1, whereas regex is now at 0.2 (which is incompatible according to Rust's semver). regex 0.1 in turn has a bunch of outdated dependencies (compare 0.1 to current). These in turn pull in more old dependencies, etc. For crates that use nom-sql this causes many dependencies to be pulled in twice (one old, one new), which has a detrimental effect on compilation time and space usage.

I know fixing this is a bit annoying, as the nom api has changed a fair bit, but it's probably eventually worth doing.

Feature: support comment on CREATE statements

Right now parsing a CREATE TABLE statement will fail:

  • if there are comments on some field
  • if there is a comment on the table itself

Exemple:

CREATE TABLE withcomments (
...
field VARCHAR(100) COMMENT 'some field',
...
) COMMENT='table description';

It would be nice to support those.
I'm not yet really familiar with nom, but I would gladly help on this feature.

Only match balanced parenthesis

In master (and on the incoming bump nom to v5 PR), compound_selection will match unbalanced paren pairs. We only want "(nested)" or "nested" to match, but using opt for both paren tags means that "nested)" and "(nested" will also match.

There may be other similar unbalanced issues with parens in other areas of the code or trailing commas.

This might work if the incoming bump nom to v5 PR is merged, but is definitely not pretty. Maybe it could be extracted into a "opt_parenthetical", ore more general opt_delimited :

alt(
    delimited(opt(tag("(")), nested_selection, opt(tag(")"))),
    nested_selection,
)

(line 110 of src/compound_select in the bump nom to v5 PR)

Improved support for CREATE TABLE parsing

Per the following comment I stumbled upon for the creation parser, // TODO(malte): support types, TEMPORARY tables, IF NOT EXISTS, AS stmt I wanted to get an issue open to add these features along with a few other missing pieces. I will add comments as I figure out what missing pieces those may be.

Support Calculated ORDER BY/GROUP BY clauses

Both ORDER BY AND GROUP BY can have a positional in place of a column name (the positional and column names can be used in the same list) and any conditional expression. Ideally, the parser would able to handle these situations.

support mysql table option DEFAULT CHARSET=binary

Hi,
I'm still trying to parse mysql dumps from wikipedia, and I'm having problems with mysql table options (table_option in the docs).

Example statement that fails to parse

CREATE TABLE x (b integer)  ENGINE=InnoDB AUTO_INCREMENT=413856661 DEFAULT CHARSET=binary;

Support stand-alone HAVING clauses

Currently the parser has the HAVING clause parsed as part of the GROUP BY clause when it was valid, this is needed in cases where a condition on a calculated value is needed and a GROUP BY is inappropriate such as SELECT 1 + 2 as adder HAVING adder > 2.

Queries that contain arbitrary bytes cannot be parsed

SQL files can contain any byte sequence. However, this library only exposes a way to parse a rust string (that is, a sequence of unicode codepoints). This makes it impossible to parse some SQL files (such as the wikipedia dumps I am currently working with), as thay contain byte sequences that are not valid utf-8.

The api should expose a function that takes an &[u8] instead of an &str.

For handling byte sequences that are not valid utf8 in literal strings, I see two possibilities:

  • Using the already existing Blob(Vec<u8>) (the information that the literal was a string and not a blob would be lost)
  • Using a fault-tolerant utf8 decoder like rust-encoding (invalid characters would be lost).

Upgrade to nom v5

Nom recently released v5 and nom-sql still uses v4. Just wondering if there is a plan to upgrade and what that plan might be.

One thing to note: nom v5's macros were rewritten to use the streaming version of the parsers. My assumption would be that nom-sql would only want to deal with complete parsers when upgrading to nom v5.

Nom has good documentation on upgrading to nom v5 here

Support :named place holders

Most of the Rust libraries for accessing MySQL allow named parameters using the :name syntax, it would be ideal to be able to parse these as an ItemPlaceholder::Named(String).

Improve error handling

Currently, when parse fails, the user only gets a very uninformative error string as a result.
It would be nice to have a custom error type with useful functionality , such as knowing where the error happened.

Optional multispace when next to parenthesis

These 2 queries

INSERT INTO user(name, password) VALUES('aaa','xxx');
INSERT INTO user (name, password) VALUES ('aaa','xxx');

are both valid, but nom-sql only parses the 2nd correctly.
I'm guessing this is a bit of complex to do in nom, where the multispace becomes optional when tag!('(') is in the next few bytes.

nom-sql/src/insert.rs

Lines 23 to 40 in 8fe4e92

table: table_reference ~
multispace ~
fields: opt!(chain!(
tag!("(") ~
multispace? ~
fields: field_list ~
multispace? ~
tag!(")") ~
multispace,
|| { fields }
)
) ~
caseless_tag!("values") ~
multispace ~
tag!("(") ~
values: value_list ~
tag!(")") ~
statement_terminator,

Add placeholder `:1` and `$1`

This placholder ? working but :1 and $1 not work.

Example

let sql = "DELETE FROM users WHERE user = $1 AND password = $2";
    let res = parse_query(sql);
    assert!(res.is_ok());
// or
 let sql = "DELETE FROM users WHERE user = :1 AND password = :2";
    let res = parse_query(sql);
    assert!(res.is_ok());

Result:

Err(
    "failed to parse query"
)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.