Git Product home page Git Product logo

simpledb's Introduction

Hey there, I'm Don ๐Ÿ‘‹

I am a Software Engineer at ShopBack. My interest area include Software/Web Development, Data Engineering and DevOps. Talk to me about databases, Big Data technologies, web development and DevOps.

GitHub/WakaTime Stats

simpledb's People

Contributors

don-tay avatar tishyakhanna97 avatar xinyee20 avatar

Watchers

 avatar  avatar

simpledb's Issues

Query results wrong when index is used

Under working branch for #4 , when a table is queried with a predicate that makes use of the index, the query results returns no rows all the time. #4 uses the improved planners - HeuristicQueryPlanner and IndexUpdatePlanner

Steps to replicate:

  1. Delete existing studentdb directory.
  2. Run test/CreateStudentDB to re-create studentdb with populated data and an index on Student(sid)
  3. Run database with test/SimpleIJ.
  4. Run the following DDL to confirm table is populated: SELECT sid, sname from student
  5. Run any DDLs that makes use of the index on Student(sid)
    Eg.
    SELECT sid, sname from student where sid = 1
    SELECT sid, sname from student where sid <> 0
    SELECT sid, sname from student where sid < 10
    SELECT sid, sname from student where sid > 5

Query bank

List of queries to test on:

select sid,sname from student order by sname
select sid,sname from student order by sname desc
select SId, Sname, GradYear from student order by GradYear asc, Sname desc

3 table join query:

select sid,sname,dname,grade from student,dept,enroll where sid=studentid and majorid=did
select sid,sname,eid,sectid from student, enroll, section where sid=studentid and sectionid=sectid

3 table join with distinct/group by:

select distinct did, title, prof from dept, course, section where did=deptid and cid=courseid order by prof
select distinct sid,sname,dname,title,grade from student,dept,course,enroll where sid=studentid and deptid=did and majorid=did and sectionid = sectid

4 table join query:

select sid,sname,dname,title,grade from student,dept,course,enroll where sid=studentid and deptid=did and majorid=did

Error queries:

select dname,count(sid) from student, dept where majorid=did group by dname order by dname desc

Bug Bounty 4: Joins (Nested, Index, Sort, Hash)

Lab 4

  • Check costing for all joins
  • Check for null pointers in the case of empty tables
  • For sort-merge join, you will find that the sorting phase may stops when there are 2 runs. Change the code to generate 1 sorted run.

Bug Bounty 3: Sorting and order by

Lab 3

  • Ensure order by works (no parser errors), and default order is asc
  • Ensure order by works for multiple columns
    select SId, Sname, GradYear from student order by GradYear asc, Sname desc
  • Check costing of Sort Plan

Bug: Incorrect I/O count for sort merge join operation

blocksAccessed method for sort merge join is implemented incorrectly: https://github.com/don-tay/simpledb/blob/nested-loop-join/src/simpledb/materialize/MergeJoinPlan.java#L55-L66

This results in sort-merge join always having the least block accessed, and being the join method used by the table planner.

SQL> select sid, sname, eid from student, enroll where sid = studentid
p1Cost: 19 p2Cost: 3 p3Cost: 13
Running sort merge
 sid sname eid
--------------
  1  joe 14
  1  joe 24
  2  amy 34
  4  sue 44
  4  sue 54
  6  kim 64

without idx on sid:

SQL> select sid, sname, eid from student, enroll where sid = studentid
p1Cost: 2147483647 p2Cost: 3 p3Cost: 13
Running sort merge
 sid sname eid
--------------
  1  joe 14
  1  joe 24
  2  amy 34
  4  sue 44
  4  sue 54
  6  kim 64

Context: p1 - IndexJoinPlan | p2 - MergeJoinPlan | p3 - NestedLoopsJoinPlan

Originally posted by @xinyee20 in #11 (comment)

Bug: Empty results in input tables throwing error

When a source table is empty, the following operations will throw an error:

  1. Join (except nested loop join)
  2. Cross product
  3. Sort

This is caused by the implementation in their respective scans (ie. SortScan, ProductScan, IndexJoinScan)

For MergeJoinScan, it uses SortScan as an intermediate step. Hence, it should be fixed when SortScan is resolved.

Sample queries:
1: Sort scan

select sname from student where sname='mary' order by sname

2: Product scan

select sname from student, enroll where sname='mary'

3: Joins (ie. merge join)

select eid, sectid from enroll, section where sectionid=sectid and prof='abc'
  1. Index-join (may need to comment out other joins due to bug in #14 )
select sname from student, enroll where studentid=sid and sname='mary'

Sample error:

SQL> select eid, sectid from enroll, section where sectionid=sectid and prof='abc'
Running sort merge
 eid sectid
-----------
SQL Exception: java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0

Feature: Support for display of query plan

Query to trigger query plan: EXPLAIN ANALYZE <stmt>
Examples:

  1. EXPLAIN ANALYZE SELECT sid, sname FROM student
  2. EXPLAIN ANALYZE SELECT DISTINCT sid, sname from student, enroll WHERE sid = studentid

Sorting cost is wrong

We seem to be outputting negative/very low sorting costs, and this leads to the preference of the sortjoin over other joins

Invalid predicate not caught

Example queries:

select sid from student where bla=123
select sid, eid from student,enroll where sid=studentid and abc=def

Feature: Aggregate functions and GROUP BY

Implement aggregate functions (listed below) and GROUP BY (sort-based implementation) clause. Multiple fields should be supported in GROUP BY.

Aggregate functions to be implemented:
SUM
COUNT
AVG
MIN
MAX

Bug bounty 2: Indexes

Lab 2

Check if non equality predicates work fine:

  • Create both type of indexes and ensure command to create works as expected
  • Check costing for hash and btree index lookup

Feature: DISTINCT clause

Support for DISTINCT clause
Example queries:

  1. SELECT DISTINCT sid, sname FROM student
  2. SELECT DISTINCT sid, sname from student, enroll WHERE sid = studentid
  3. SELECT DISTINCT sid, sname, eid from student, enroll WHERE sid = studentid

Query using hash join implementation fails to run when stacked with many plans

When a query uses a hash join, and has 3 or more plans (eg. group by, distinct, sort, joins/cross product)

Example query:

select distinct prof from student, enroll, section where sid=studentid and sectionid=sectid
Running hash join
Running index join
 prof
-----
java.lang.IndexOutOfBoundsException
        at java.base/java.nio.Buffer.checkIndex(Buffer.java:745)
        at java.base/java.nio.DirectByteBuffer.getInt(DirectByteBuffer.java:818)
        at simpledb.file.Page.getInt(Page.java:21)
        at simpledb.tx.Transaction.getInt(Transaction.java:117)
        at simpledb.record.RecordPage.getInt(RecordPage.java:32)
        at simpledb.record.TableScan.getInt(TableScan.java:48)
        at simpledb.record.TableScan.getVal(TableScan.java:57)
        at simpledb.materialize.HashJoinScan.next(HashJoinScan.java:80)
        at simpledb.query.SelectScan.next(SelectScan.java:34)
        at simpledb.index.query.IndexJoinScan.beforeFirst(IndexJoinScan.java:47)
        at simpledb.query.SelectScan.beforeFirst(SelectScan.java:30)
        at simpledb.query.ProjectScan.beforeFirst(ProjectScan.java:28)
        at simpledb.materialize.SortPlan.splitIntoRuns(SortPlan.java:95)
        at simpledb.materialize.SortPlan.open(SortPlan.java:42)
        at simpledb.plan.DistinctPlan.open(DistinctPlan.java:48)
        at test.SimpleIJ.doQuery(SimpleIJ.java:69)
        at test.SimpleIJ.main(SimpleIJ.java:39)

Similar results can be obtained for the query below, which has the same number of plans used by HeuristicQueryPlanner:

select prof from student,enroll,section where studentid=sid and sectionid=sectid group by prof

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.