Git Product home page Git Product logo

sql-on-the-fly-1's Introduction

SQL-on-the-fly

Allow users perform SQL over csv dataset without putting it into SQL database.

Run the program:

  1. Put all the .csv files in to the SQL-on-the-fly/ folder.

  2. Modify setup.sh to ensure you choose the correct python2.7 version according to your local environment

  3. Make sure you have the newest version of all the Python packages listed in the requirements.txt

  4. Change the 'flist' and 'nlist' in index.py to the .csv files you use.

  5. Change the 'flist' and 'nlist' in disk.py to the .csv files you use.

  6. Run setup.sh to create all supporting and indexing folders/files automatically.

./setup.sh
  1. Start the program and you don't need to wait for anything.
python myproject.py
  1. Run the queries. The program will take a query statement and output the result and query time to the console. Some sample queries are listed below.

  2. Exit the program.

exit

Query Instruction and Formatting

SELECT basics

Single table

Use attribute of the csv file directly inside SELECT and WHERE clasure. For example:

SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4

Multi table

Use the abbreviation of the Table and the attribute of the csv file along with a '__' inside the query.

SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id

FROM basics

Single table

No abbreviation, just the name of the csv file (without '.csv'). For example:

FROM review-1m

Multi table

Must include abbreviation after the name of the csv file. For example:

FROM review-1m R1, review-1m R2

WHERE basics

Single table

No quotation mark on string. For example:

SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4 AND city = Champaign

Multi table

Join conditions go here. Conditions order matter. Please use your domain knowledge to manipulate the order. Attribtue must along with there name. For example:

WHERE B__city = Urbana

LIKE operation must be warpped into a quotation mark. For example:

SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id

NOTICE: [MODE 0] WHERE conditions in multi table join after the join: Numeric values and string need to be inside the quotation mark. Use [MODE 1] if you don't want to pay attention to the quotation mark. [MODE 0] will be slightly faster than [MODE1] since [MODE 0] will not need to tranfer data type.

Demo queries:

  1. SELECT review_id, funny, useful FROM review-1m WHERE funny >= 20 AND useful > 30

Time: 0.261s

  1. SELECT name, city, state FROM business WHERE city = Champaign AND state = IL

Time: 0.082s

  1. SELECT B__name, B__postal_code, R__stars, R__useful FROM business B, review-1m R WHERE B__name = Sushi Ichiban AND B__postal_code = 61820 AND B__business_id = R__business_id

Time: 0.379s

  1. SELECT R1__user_id, R2__user_id, R1__stars, R2__stars FROM review-1m R1, review-1m R2 WHERE R1__useful > 50 AND R2__useful > 50 AND R1__business_id = R2__business_id AND R1__stars = '5' AND R2__stars = '1'

Time: 0.277s

  1. SELECT B__name, B__city, B__state, R__stars, P__label FROM business B, review-1m R, photos P WHERE B__city = Champaign AND P__label = inside AND B__state = IL AND B__business_id = P__business_id AND B__business_id = R__business_id AND R__stars = '5'

Time: 0.745s

  1. SELECT B__name, R1__user_id, R2__user_id, B__address FROM business B, review-1m R1, review-1m R2 WHERE R1__useful > 50 AND R2__useful > 50 AND B__business_id = R1__business_id AND R1__business_id = R2__business_id AND R1__stars = '5' AND R2__stars = '1'

Time: 0.444s

Sample queries:

  1. SELECT * FROM photos

Time: 0.359s

  1. SELECT DISTINCT stars FROM review-1m

Time: 0.004s

  1. SELECT DISTINCT stars, useful FROM review-1m

Time: 7.694s

  1. SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4

Time: 0.391s

  1. SELECT review_id, stars, useful FROM review-1m WHERE useful > 20 AND stars >= 4 - 0

Time: 0.208s

  1. SELECT review_id, stars, useful FROM review-1m WHERE useful > 10 AND (useful < 20 OR stars >= 4)

Time: 0.306s

  1. SELECT B__city, B__state, R__business_id, R__stars, R__useful FROM business B, review-1m R WHERE B__city LIKE "Champaign" AND B__state LIKE "IL" AND B__business_id = R__business_id

Time: 0.427s

  1. SELECT DISTINCT B__name FROM business B, review-1m R, photos P WHERE B__city = Champaign AND B__state = IL AND P__label = inside AND R__stars = 5 AND B__business_id = P__business_id AND B__business_id = R__business_id

Time: 6.765s

SELECT DISTINCT B__name FROM business B, review-1m R, photos P WHERE B__city = Champaign AND B__state = IL AND P__label = inside AND B__business_id = P__business_id AND B__business_id = R__business_id AND R__stars = '5'

Time: 0.745s

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.