Git Product home page Git Product logo

pythonql's Introduction

pythonql

PythonQL is an extension to Python that allows language-integrated queries against relational, XML and JSON data, as well an Python's collections

Python has pretty advanced comprehensions, that cover a big chunk of SQL, to the point where PonyORM was able to build a whole ORM system based on comprehensions. However, group by mechanisms, outerjoins and support for semi-structured data are not handled well at all.

We propose the following extensions to Python( that are implemeneted in this demo preprocessor and query executor):

  • Path expressions. When working with nested data that has varied structure, path expressions are extremely useful. We have modeled our path expression on XPath, however we use a much simplified verison:

  • Child step: for x in data ./ _ or for x in data ./ expr where expr must evaluate to string

  • Descendants step: for x in data .// _ or for x in data ../ expr where expr must evaluate to string

So we can write path expression in the query language (and elsewhere in Python expressions) like this:

  for x in data ./ "hotels" .// "room"
  • Try-except expressions. Python has try-except statement, but in many cases when working with dirty or semi-structured data, we need to be able to use an expression inside an iterator or the query. So we introduced a try-except expressions:
   try int(x) except 0 for x in values 
  • Tuple constructor. Tuples that have named columns are very useful in querying, however Python's native namedtuple is not very convenient. We have extended Python's tuple constructor syntax:
  (id as employee_id, sum(x) as total_salary)
  • Query expressions: Our query syntax is a strict superset of Python's comprehensions, we extend the comprehensions to do much more powerful queries than they are capable of now.
 [ select (prod,len(p)) 
   for p in sales 
   let prod = p.prod 
   group by prod ]

At the same time our queries look similar to SQL, but are more flexible and of course most of the expressions in the queres are in pure Python. A lot of functionality is cleaner than in SQL, like the window queries, subqueries in general, etc. As in Python, our query expressions can return generators, list, sets and maps.

Documentation

A short tutorial on PythonQL is available here: https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial

Examples

We have a whole site dedicated to various scenarios with lots of queries where PythonQL is especially handy: www.pythonql.org

Here is a small example PythonQL program (we're building a demo website with a number of scenarios that are especially good for solving with PythonQL):

#coding: pythonql
#
# This example illustrates the try-catch business in PythonQL.
# Basically, some data might be dirty, but you still want to be able to write a simple query

from collections import namedtuple
ord = namedtuple('Order', ['cust_id','prod_id','price'])
cust = namedtuple('Cust', ['cust_id','cust_name'])

ords = [ ord(1,1,"16.54"),
         ord(1,2,"18.95"),
         ord(1,5,"8.96"),
         ord(2,1,"????"),
         ord(2,2,"20.00") ]

custs = [ cust(1,"John"), cust(2,"Dave"), cust(3,"Boris") ]

# Basic SQL query, but with some data cleaning
res = [select (name, sum(price) as sum)
        for o in ords
        let price = try float(o.price)  except 0
        for c in custs
        where c.cust_id == o.cust_id
        group by c.cust_id as id, c.cust_name as name]

print (res)

Installing pythonql:

Run pip install pythonql to install pythonql for Python2.7, or pip install pythonql3 for Python 3.x.

Running pythonql:

PythonQL is implemented as a special encoding in a normal python script. When this encoding is specified, the pythonql preprocessor is run, which converts the pythonql syntax to pure python.

So you should have a line in the beginning of your script:

#coding: pythonql

result = [ select y for x in [1,2,3] let y = x**2 ]

Uninstalling pythonql:

PythonQL installs a special file in your library to enable the pythonql encoding. If you decide to uninstall pythonql, run pip uninstall pythonql (or pythonql3) and then delete pythonql.pth file from your Python library.

Help/Bugs/Suggestions:

We have a Google group running, where you can ask any questions, report bugs or suggest improvements: https://groups.google.com/forum/#!forum/pythonql

We also have a telegram group: https://t.me/joinchat/DD9Pag7fUDBrcdVj7uQ2ig

pythonql's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pythonql's Issues

how to use variables in nested queries?

Hi PythonQL Team,

This is not an issue but more of a doubt. I was trying to figure out how to use the nested query pattern for querying JSON with a variable.
I have tried :

ip_val = "Person"
res2 = [ select (l)
match {
"iri": {
"value": as l
},
"label": {
"value": ip_val
}
} as x in ip_json ]

Here I am trying to substitute variable "ip_val" in query under label tag.
I also tried using "for" and let". Can you please guide me for this?

PS: I thought this would be a faster way to reach out to you. :)

Setup?

Hello there,

this looks like a great project. How can I set it up? I tried it using Python 3.5 (virtualenv) but the examples didn't work due to syntax errors.

Best regards

PythonQL3 pattern matching not working

Hi there,

I've recently discovered PythonQL - Wow! I spent my spare time over the last 2 days to go through the intro and tutorial and have created a python script with all the examples in there. Everything works beautifully except for the two pattern matching examples.

The sample data is as follows

people = [ {'first':'John', 'last':'Doe', 'age':30,
             'address' : {'city':'San Jose', 'zipcode':92122}},
           {'first':'Daniela', 'last':'Kleene', 'age':40, 
             'address' : {'city':'San Jose', 'street':'Main', 'zipcode':92314}},
           {'first':'Jerry', 'last':'Lee', 'age':50, 'occupation':'Engineer', 
             'address' : {'city':'San Jose', 'zipcode':93213}}
]

The first pattern matching example is as follows:

res = [ select (l, z)
        match {
         "last" : as l,
         "age" : as a where a > 25,
         "address" : {
             "city" : "San Jose",
             "zipcode" : as z
          }
        } as x in people ]

Now this first example works 100%.

The second pattern matching example is as follows

res = [  select x
          match {
            "last" : as _,
            "first" : as _,
            "age" : as a where a > 25,
            "address" : {
               "city" : as _,
               "zipcode" : as _
            }
          } as x in people ]
]

This second pattern matching example has an error in that there appears to be an extra closing square bracket at the end.

Removing this bracket is obviously fixes the syntax error. This example is however supposed to return only the first record because it matches exactly - meaning it has all the same fields as the query but this example returns every record - according to the tutorial but it returns all the records (maybe this is the correct behaviour?).

If I now add an occupation field to the pattern it matches only the record that contains the occupation field. If I add a street field to the pattern nothing matches. If I now remove the occupation field but leave the street field in the pattern it matches the record that has the street field.

Am I missing something here? Or is the tutorial incorrect?

PythonQL

Hi, if I have a csv

sales
id, customer_name, month, amount
0, A, Jan, 100
1, A, Feb, 200
2, B, Jan, 300
3, C, Mar, 400

What PythonQL would be equivalent to the SQL below? I would like to query sales per customer per month.

SELECT s.customer_name, s.month, SUM(s.amount)
FROM sales AS s
GROUP BY s.customer_name, s.month

`count` does not seem to work

Seems to be an issue with count. This results in Unknown clause encountered

res = [ select (letter, lettercount)
       for letter in "helloworld"
       count lettercount
      ]

If I then apply this patch, it now parses.

diff --git a/pythonql/parser/PythonQLParser.py b/pythonql/parser/PythonQLParser.py
index fb8d304..003f4bd 100644
--- a/pythonql/parser/PythonQLParser.py
+++ b/pythonql/parser/PythonQLParser.py
@@ -835,7 +835,7 @@ class Parser:

   def p_count_clause(self, p):
     """count_clause : COUNT NAME"""
-    p[0] = make_node('count', p)
+    p[0] = make_node('count_clause', p)

   def p_testseq_query(self, p):
     """testseq_query : test_as_list comma_opt

With results:

[{"letter":'h',"lettercount":0}, {"letter":'e',"lettercount":1}, {"letter":'l',"lettercount":2}, {"letter":'l',"lettercount":3}, {"letter":'o',"lettercount":4}, {"letter":'w',"lettercount":5}, {"letter":'o',"lettercount":6}, {"letter":'r',"lettercount":7}, {"letter":'l',"lettercount":8}, {"letter":'d',"lettercount":9}]

Parser crashing when the program is not terminated with a newline

If the program is not terminated with a newline, the parser crashes with an error:

#coding: pythonql

print([ select y for x in [1,2,3] let y = x**2 ])

C:\Python\Scripts\PythonQL>python testPQL.py
'NoneType' object has no attribute 'lineno'
File "testPQL.py", line 1
SyntaxError: encoding problem: pythonql

AttributeError: module '_ast' has no attribute 'Str'

I'm running python in a docker container (image python)

python3.8

#coding: pythonql

result = [ select y for x in [1,2,3] let y = x**2 ]
python test.py
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    print ( PyQuery ( [ For( [ "x" ] , "x" , "[ 1 , 2 , 3 ]" ) , Let( [ "y" ] , "y" , "x ** 2" ) , Select( "y" ) ] , locals ( ) , globals ( ) , "list" ) ) 
  File "/usr/local/lib/python3.8/site-packages/pythonql/Executor.py", line 127, in PyQuery
    plan = rewrite(plan, prior_locs)
  File "/usr/local/lib/python3.8/site-packages/pythonql/Rewriter.py", line 225, in rewrite
    source = get_ast(op.expr)
  File "/usr/local/lib/python3.8/site-packages/pythonql/Ast.py", line 367, in get_ast
    return convert_ast(compile(expr, '<string>', 'eval',ast.PyCF_ONLY_AST).body)
  File "/usr/local/lib/python3.8/site-packages/pythonql/Ast.py", line 107, in convert_ast
    return list_e([convert_ast(x) for x in a.elts])
  File "/usr/local/lib/python3.8/site-packages/pythonql/Ast.py", line 107, in <listcomp>
    return list_e([convert_ast(x) for x in a.elts])
  File "/usr/local/lib/python3.8/site-packages/pythonql/Ast.py", line 118, in convert_ast
    elif isinstance(a,_ast.Str):
AttributeError: module '_ast' has no attribute 'Str'

I fixed it using the image python:3.7

First example in tutorial not working

This is Fedora 32, Python 3.8.5 and pythonql3 0.9.67.

When I run the first example from the tutorial, which is

#coding: pythonql
print([ select y for x in [1,2,3] let y = x**2 ])

I get

Generating LALR tables
WARNING: 34 shift/reduce conflicts
WARNING: 48 reduce/reduce conflicts
WARNING: reduce/reduce conflict in state 52 resolved using rule (comma_opt -> <empty>)
WARNING: rejected rule (path_step -> star_expr) in state 52
WARNING: reduce/reduce conflict in state 263 resolved using rule (comma_opt -> <empty>)
WARNING: rejected rule (path_step -> star_expr) in state 263
WARNING: reduce/reduce conflict in state 494 resolved using rule (star_expr -> * expr)
WARNING: rejected rule (star_expr -> expr) in state 494
Traceback (most recent call last):
 File "test.py", line 4, in <module>
  File "/dante/ctan/query_texcatalogue/pythonql_test/lib64/python3.8/site-packages/pythonql/Executor.py", line 127, in PyQuery
    plan = rewrite(plan, prior_locs)
  File "/dante/ctan/query_texcatalogue/pythonql_test/lib64/python3.8/site-packages/pythonql/Rewriter.py", line 225, in rewrite
    source = get_ast(op.expr)
  File "/dante/ctan/query_texcatalogue/pythonql_test/lib64/python3.8/site-packages/pythonql/Ast.py", line 367, in get_ast
    return convert_ast(compile(expr, '<string>', 'eval',ast.PyCF_ONLY_AST).body)
  File "/dante/ctan/query_texcatalogue/pythonql_test/lib64/python3.8/site-packages/pythonql/Ast.py", line 107, in convert_ast
    return list_e([convert_ast(x) for x in a.elts])
  File "/dante/ctan/query_texcatalogue/pythonql_test/lib64/python3.8/site-packages/pythonql/Ast.py", line 107, in <listcomp>
    return list_e([convert_ast(x) for x in a.elts])
  File "/dante/ctan/query_texcatalogue/pythonql_test/lib64/python3.8/site-packages/pythonql/Ast.py", line 118, in convert_ast
    elif isinstance(a,_ast.Str):
AttributeError: module '_ast' has no attribute 'Str'

Grammar bug in the window clause

The grammar doesn't work that great, the following query:

select y
for sliding window y in x
start at s when True
end at e when e-s == 2

applies == 2 to the whole query, and not the e-s expression

cStringIO no longer in python3

see: https://docs.python.org/3/whatsnew/3.0.html

I am using anaconda 3.6.5, and I was getting the following errors:

Error processing line 1 of C:\Users\user\AppData\Local\Continuum\anaconda3\lib\site-packages\pythonql.pth:

  Traceback (most recent call last):
    File "C:\Users\user\AppData\Local\Continuum\anaconda3\lib\site.py", line 168, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "C:\Users\user\AppData\Local\Continuum\anaconda3\lib\site-packages\pythonql\codec\register.py", line 1, in <module>
      import codecs, cStringIO, encodings
  ModuleNotFoundError: No module named 'cStringIO'

I changed register.py line 1 from:

import codecs, cStringIO, encodings

to

import codecs, encodings
from io import StringIO as cStringIO

and things worked as normal.

I am not sure what this will do for python 2 users.

Anyway, I thought I would pass on the bug.

Thanks for your development of this so far.

Switch from ANTLR4 to a different parser tool

ANTLR4 runtime is terribly slow and the parser takes many seconds to parse pretty small programs. Its not the case with the Java runtime of ANTRL4, I have submitted a bug to ANTLR4 guys, but the chances that this will be improved are very small.

Different grammars for different Python versions

Currently PythonQL does support different Python versions for execution.
However, internally Python 3.3 grammar is used, so if you write statements that don't
conform to Python 3.3 grammar, PythonQL will report a syntax error.

We should be able to support multiple versions of Python in PythonQL in the future.

Feature request: use in standard python script with an import statement

Hi!

This project looks really useful!
Would it be possible to wrap it as a python module, to be used in standard python scripts?

The current approach that uses #coding:pythonql is not IDE friendly, and may be obscure when sharing python scripts to colleagues that are not familiar with pythonql

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.