Git Product home page Git Product logo

sainath09 / simple-database-engine Goto Github PK

View Code? Open in Web Editor NEW
3.0 2.0 1.0 49.01 MB

A simple heap/Sorted file implementation of in memory database engine. This can provide insert,update and search capabilities as in any other database

Makefile 1.88% Shell 1.82% C++ 57.26% Lex 0.67% Yacc 2.15% Perl 1.33% C 33.00% Objective-C 1.89%
database-management query-builder insert database-engine delete create select sum groupby

simple-database-engine's Introduction

Database Engine

This is a simple in-memory database engine to run SQL queries.

We implemented a database engine that creates, Inserts and perform simple queries like sum and group By for a dataset to understand the basics of any SQL query engine. We implemented 2 diffeerent types of data structures for storing it.

  • Heap based implementation
    • Each record of data sits on top of an other. There is no particular order in which data is sequenced.
  • Sorted file Implementation
    • Data is place in the file after sorting on one or more columns defined in the query.
    • Sorting makes query operations on sorted columns efficient.
  • We planned to implement B+ trees as in any other database engine but were not successful. Any one can use this code and implement B+ trees.

Implementation Details

We developed this engine incrementally.

  • We started it of with implementing the heap data base engine.
    • Record.cc
      • It contains all the functions required to process one record. For example toBinary and fromBinary functions convert data to and from binary format to store them in .bin file.
      • composeRecord makes the record with given schema and the source file name.
    • File.cc
      • It is the higher abstraction to Record in Db. File object contains number of records based on block size.
      • The default block size we used as in most of the database engines ins 4 Kb.
    • Schema.cc, Catalog.cc
      • They contain implemtation fuctions for schema of any table in the database. Schema of the database is stored in catalog file.
    • AbstractDbFile.cc, DBFile.cc, Heap.cc, Sort.cc
      • They implement heap data structure and sort data structure. DBFile objects abstracts the heap or sort data structure.
    • BigQ.cc , Pipe.cc, ComparisionEngine.cc and Comparsion.cc
      • BigQ uses priorty queues to sort the records with order made by ordermaker.
      • It maintains two pipes inputPipe and outputPipe. Data is fed into the input pipe which is sorted based on ** External Merge Sort Technique **
      • BigQs are used not only in the sorted data structure but also to perform Join and Group By operations.

There are few different basic operations that our database can perform Most of the operations in the database are paralized.

  • Join
  • Select from File
  • Select from a pipe - if my db is already in memory.
  • Group By
  • Sum over a column
  • Project - squezes the record to required columns by discarding others.
  • duplicate removal - removes duplicate records after sorting using BigQ.

Example Queries

SELECT ps.ps_partkey ,
ps.ps_suppkey ,
ps.ps_availqty ,
ps.ps_supplycost ,
ps.ps_comment
FROM partsupp AS ps WHERE (ps.ps_supplycost < 1.03)

SELECT SUM (l.l_extendedprice * (1 - l.l_discount)), l.l_orderkey, o.o_orderdate, o.o_shippriority
FROM customer AS c, orders AS o, lineitem AS l
WHERE (c.c_mktsegment = 'BUILDING') AND
(c.c_custkey = o.o_custkey) AND (l.l_orderkey = o.o_orderkey) AND
(l.l_orderkey < 100 OR o.o_orderkey < 100)
GROUP BY l_orderkey, o_orderdate, o_shippriority

Key Words

Keywords Keywords
SELECT GROUP
DISTINCT BY
FROM WHERE
SUM AS
AND OR
CREATE TABLE
HEAP SORTED
INSERT INTO
DROP SET
OUTPUT STDOUT
NONE ON

Data Types supported

  • INTEGER
  • STRING
  • DOUBLE

We tested our code using 1 Gb TPCH-C data set.

Required dependencies

  • The project is run on C++11 clang compiler
  • It requires flex lexer and bison parser. To install these in linux
sudo apt install clang flex bison

To Start Querying

Download the tpch data with the following commands.

cd settings/tpch-dbgen-master
make

to genetrate 10Mb data

./dbgen -s 0.01

to generate 1Gb data

./dbgen -s 1
mv *.tbl  ../../data

This project is tested with 1 Gb data.

To run the queries -

make simpleDb
./bin/simpleDb

You will be prompted to enter queries to execute.

There are some sample queires in doc/create-insert-queries.txt and doc/QueriesToExecute.

Create and Insert queries

Below is the gif to demo the create and insert queries -

Sample execution

We provided the flexibility of storing the returned query records in a file or output them into the screen.

SET OUTPUT <file_name> 

.....dumps the records into <file_name>

  • Make sure the empty file is already present in the current working directory.

Working of Sum operation


Limitations and known issues

  • All key words are case sensitive. You cannot go back and edit the query. This is the limitation from bison parser which we are working on fixing it.
  • There are few seg faults due to Join operation.
  • Although I made sure all operations are done in parallel, Join operation take a lot more time than any other operation. So, We let the join thread complete before starting other processes.
  • Output records to a file need to be created before Seting output to the file or else it is creating a seg fault

If there are segmentation faults -

  • Make sure your data and bin file paths are correct.
  • Make sure catalog file is not deleted.
  • Do not delete bin files. Drop them through the queries.
  • Clean the project and run again.

Any improvements are welcome.

simple-database-engine's People

Contributors

gauravsingh90 avatar sainath09 avatar

Stargazers

 avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

gauravsingh90

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.