

Database Ranking

2022-04-25

Link: https://benchant.com/ranking/database-ranking

Motivation

"Data is the new oil." Data and data processing are among the central IT topics of the 2020s. Applications in areas such as IoT, Industry 4.0, machine learning, AI, e-commerce and social media generate and process huge amounts of data. As a result, there are now over 600 different database management systems with a wide variety of data structures and modes of operation, each with its own specialization and suitability for particular use cases.

The question is not which database is the best or the most popular, but which database delivers the best performance in which scenario.

Decisions should not be based solely on popularity, features and (especially not) data structures; above all, they should take reliable data on performance and scalability into account.

Database performance problems, inefficient solutions and oversized computing power ("Kill it with iron!") must become a thing of the past. In the future, IT applications must be properly scaled and equipped with the best available technology in order to be efficient and competitive.

This database ranking serves as a first orientation! We decided to make this data public to raise awareness of the performance topic, start discussions and enable better decisions in the future!

"One accurate measurement is worth a thousand expert opinions." - Grace M. Hopper

Call for Participation

The future size and content of the database ranking, i.e. the number of databases and workloads available, depends heavily on the dissemination and feedback from the IT community.

  • Please comment on the results.
  • Please share the ranking with your colleagues.
  • Please link the ranking in your posts, tweets, blog articles.
  • Please reach out to database vendors and ask them to participate.

And if you are interested in actively participating and diving into the world of performance engineering, please contact us!

Workloads

A central concept of the ranking is the focus on a wide variety of workloads that IT applications, their users or their components generate and that the database has to deal with.

In general, the following workload types are distinguished:

  • CRUD: Simple CREATE, READ, UPDATE and DELETE operations
  • OLTP: Complex transactional data-processing operations
  • OLAP: Analytical batch processes
  • HTAP: Hybrid transactional/analytical processing
  • Time-Series: Time-series data with very simple, but high-frequency access patterns

In addition to the access pattern, other workload characteristics can have a major impact on performance:

  • Distribution of read/write/... operations
  • Number of parallel accesses
  • Access pattern and caching
  • Size of the data sets
  • Total number of data sets
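To make the first characteristic more concrete, here is a minimal Python sketch. It assumes a benchmark client that picks each next operation by weighted random sampling from the configured operation mix. The function `sample_operations` and the 95/5 read/update mix are hypothetical illustrations, not part of the ranking's tooling.

```python
import random
from collections import Counter


def sample_operations(proportions: dict[str, float], n: int, seed: int = 42) -> Counter:
    """Draw n operation types according to a read/write/... mix.

    Hypothetical helper: it mimics how a load generator could choose the
    next operation from a configured distribution of operations.
    """
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions must sum to 1.0, got {total}")
    rng = random.Random(seed)  # fixed seed => reproducible operation sequence
    ops = list(proportions)
    weights = [proportions[op] for op in ops]
    # random.choices performs weighted sampling with replacement
    return Counter(rng.choices(ops, weights=weights, k=n))


# A read-heavy example mix: 95 % reads, 5 % updates (values chosen for illustration)
mix = sample_operations({"read": 0.95, "update": 0.05}, n=10_000)
print(mix)
```

Changing only the weights (e.g. to 50/50) changes the workload's character without touching the data set, which is why the operation distribution is listed as a separate characteristic.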

The load is generated using publicly available benchmark suites such as the Yahoo! Cloud Serving Benchmark Suite (YCSB). Currently, we have defined and integrated the following workloads:

  • CRUD: General Purpose (at 5 different scaling sizes) based on the YCSB.
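As an illustration of how the workload characteristics listed above can be expressed in YCSB terms, the following sketch renders them as a YCSB CoreWorkload properties file. The property names (`recordcount`, `operationcount`, `readproportion`, `threadcount`, `requestdistribution`, `fieldcount`, `fieldlength`, ...) are standard YCSB core properties; the class `CrudWorkload` and all concrete values are hypothetical and do not reproduce the ranking's actual General Purpose configuration or its five scaling sizes.

```python
from dataclasses import dataclass


@dataclass
class CrudWorkload:
    """Hypothetical mapping of workload characteristics to YCSB core properties."""

    record_count: int           # total number of data sets
    operation_count: int        # operations executed in the run phase
    read_proportion: float      # distribution of operations ...
    update_proportion: float
    insert_proportion: float
    threads: int                # number of parallel accesses
    request_distribution: str   # access pattern, e.g. "uniform", "zipfian", "latest"
    field_count: int = 10       # fields per record (YCSB default)
    field_length: int = 100     # bytes per field (YCSB default)

    def __post_init__(self) -> None:
        # This sketch insists on an explicit, complete operation mix
        total = self.read_proportion + self.update_proportion + self.insert_proportion
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"operation proportions must sum to 1.0, got {total}")

    def record_size_bytes(self) -> int:
        # Size of one data set: fieldcount * fieldlength
        return self.field_count * self.field_length

    def to_properties(self) -> str:
        """Render the workload as a YCSB properties file (one key=value per line)."""
        props = {
            "workload": "site.ycsb.workloads.CoreWorkload",
            "recordcount": self.record_count,
            "operationcount": self.operation_count,
            "readproportion": self.read_proportion,
            "updateproportion": self.update_proportion,
            "insertproportion": self.insert_proportion,
            "threadcount": self.threads,
            "requestdistribution": self.request_distribution,
            "fieldcount": self.field_count,
            "fieldlength": self.field_length,
        }
        return "\n".join(f"{key}={value}" for key, value in props.items())


# Hypothetical example values only
workload = CrudWorkload(
    record_count=100_000, operation_count=1_000_000,
    read_proportion=0.5, update_proportion=0.5, insert_proportion=0.0,
    threads=16, request_distribution="zipfian",
)
print(workload.to_properties())
```

Saved as e.g. `crud.properties`, such a file could be passed to YCSB with `-P`, first for the load phase (`bin/ycsb load mongodb -P crud.properties`) and then for the run phase (`bin/ycsb run mongodb -P crud.properties`).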

Detailed specifications of each workload can be found below the ranking.

Databases

The following DBMS and DBaaS offerings are currently (see status) included (categorized and sorted alphabetically):

SQL

  • AWS RDS for PostgreSQL (DBaaS)
  • Azure Database for PostgreSQL (DBaaS)
  • Oracle MySQL (community)
  • PostgreSQL (open source)

NoSQL

  • Apache Cassandra (open source)
  • Couchbase (community)
  • MongoDB (community)

NewSQL

  • CockroachDB (community)

Cloud

Database performance KPIs may vary between cloud providers (see the scientific literature below). Therefore, measurements are performed on multiple cloud providers.

The following cloud providers are currently included:

  • AWS EC2
  • MS Azure

How it Works

The database benchmark measurements are performed using a science-based methodology and automated benchmarking technology.

For all measurements, benchANT's Benchmarking-as-a-Service platform is used. It was developed by Dr. Daniel Seybold and Dr. Jörg Domaschka and is based on more than 7 years of scientific research as well as 2 years of industrial use.

Procedure of the benchmarking

The benchANT benchmarking platform enables automated performance and scalability benchmarks of databases on cloud resources. For each benchmark, the platform automatically:

  1. allocates the cloud resources for the database instances
  2. installs and configures the databases
  3. allocates a separate cloud resource for workload execution
  4. executes the load phase to create the initial test data
  5. executes the run phase with the defined workload
  6. collects the measurement results
  7. checks the measurement results
  8. prepares the measurement results
  9. releases the cloud resources
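The nine steps above form a strict sequence whose last step must run even if something fails in between, otherwise allocated cloud resources keep running. The following Python sketch shows one way to express that guarantee with `try`/`finally`. It is a hypothetical orchestration sketch, not benchANT's implementation: `run_benchmark`, `recorded_step` and the step names are illustrative, and the placeholder actions only record their names instead of calling a cloud API.

```python
from typing import Callable

# A step is a (name, action) pair; the action performs the actual work.
Step = tuple[str, Callable[[], None]]


def run_benchmark(steps: list[Step], release: Callable[[], None]) -> None:
    """Execute steps 1-8 in order and always perform step 9 (release).

    `release` runs in a `finally` block, so resources are freed even when an
    earlier step (e.g. database installation) raises an exception.
    """
    try:
        for _name, action in steps:
            action()
    finally:
        release()


log: list[str] = []


def recorded_step(name: str) -> Step:
    # Placeholder action: records its name instead of doing real work
    return (name, lambda: log.append(name))


steps = [
    recorded_step(name)
    for name in (
        "allocate database resources",
        "install and configure databases",
        "allocate workload resource",
        "load phase",
        "run phase",
        "collect results",
        "check results",
        "prepare results",
    )
]
run_benchmark(steps, release=lambda: log.append("release cloud resources"))
print(log)
```

If, for example, the installation step raised an exception, the remaining steps would be skipped, the release would still run, and the exception would propagate to the caller for reporting.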

Depending on the size of the workload, this process takes 20 to 300 minutes. It can be repeated multiple times per configuration to obtain statistically sound results.
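Repeating a configuration only pays off if the repetitions are summarized. A common approach, assumed here rather than taken from benchANT's methodology, is to report the mean together with the sample standard deviation and the coefficient of variation (CV) across runs. The function `summarize_runs` and the throughput values are hypothetical.

```python
import statistics


def summarize_runs(throughputs_ops_s: list[float]) -> dict[str, float]:
    """Summarize repeated measurements of one configuration.

    Returns the mean, the sample standard deviation (n - 1 denominator) and the
    coefficient of variation; a low CV indicates consistent runs.
    """
    if len(throughputs_ops_s) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.mean(throughputs_ops_s)
    stdev = statistics.stdev(throughputs_ops_s)
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


# Hypothetical throughput values (ops/s) from three runs of the same configuration
print(summarize_runs([1000.0, 1100.0, 900.0]))
# → {'mean': 1000.0, 'stdev': 100.0, 'cv': 0.1}
```

A high CV would suggest adding more runs, or checking for noisy cloud resources, before comparing that configuration with others.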

Scientific Literature

The scientific basis of the benchmarking method, as well as many of the statements above, rests on findings from over 7 years of research.

Google Scholar profile of Dr. Daniel Seybold

The main research results and theoretical basis can be found in the following scientific papers:

  • The impact of the storage tier: A baseline performance analysis of containerized DBMS (2018)
  • Mowgli: Finding your way in the DBMS jungle (2019)
  • King Louie: DBMS Availability Evaluation Data Sets (2019)
  • Baloo: Measuring and modeling the performance configurations of distributed DBMS (2020)

License

The database ranking is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0). You are thus free to copy and redistribute ("share") or remix, transform and build upon ("adapt") the material, subject to the following conditions:

  • Attribution: credit the source (name + link)
  • ShareAlike: license modified data under the same license
  • NonCommercial: non-commercial use only

For commercial use of the data, please contact us!

Adding more ...

... more databases/workloads/cloud

The goal is to measure and publish as many databases and clouds as possible for different workloads.

If you want a specific DBMS or DBaaS to be integrated, please contact the database provider.

If you are a DBMS vendor or DBaaS provider, please feel free to contact us via GitHub, or via email at [email protected].

... more rankings

Interested in going beyond databases? If so, get in touch via GitHub, or via email at [email protected].

Contributors

  • janorkan
  • jodoma


Issues

m5 instances for ScyllaDB are not the recommended optimized instance family type

In all of your ScyllaDB benchmarks, you've used the m5 instance family.
(e.g. https://github.com/benchANT/database-ranking/blob/main/CRUD-general-purpose/2022/ranking_batch-0_scylladb-451-vanilla-os-large-large-awsec2-2022_0_1NkNG04T/dbms_data_hardware_facts.json)

ScyllaDB recommends using "Storage Optimized" instance types (as you did in your Mongo vs Cassandra blog post).
i3, i3en or the new i4i instances are much more suitable options for benchmarking ScyllaDB.

How can I submit or request a benchmark report?

Hi there!

I'm working at @GreptimeTeam. We built a time-series database, GreptimeDB, and a cloud service, GreptimeCloud.

We benchmarked our product ourselves with TSBS and wonder how we can submit the results to benchANT, or request that you experts benchmark our product.

Looking forward to your feedback :D
