Git Product home page Git Product logo

test_db's Introduction

test_db

A sample database with an integrated test suite, used to test your applications and database servers

This repository was migrated from Launchpad.

See usage in the MySQL docs

Where it comes from

The original data was created by Fusheng Wang and Carlo Zaniolo at Siemens Corporate Research. The data is in XML format. http://timecenter.cs.aau.dk/software.htm

Giuseppe Maxia made the relational schema and Patrick Crews exported the data in relational format.

The database contains about 300,000 employee records with 2.8 million salary entries. The export data is 167 MB, which is not huge, but heavy enough to be non-trivial for testing.

The data was generated, and as such there are inconsistencies and subtle problems. Rather than removing them, we decided to leave the contents untouched, and use these issues as data cleaning exercises.

Prerequisites

You need a MySQL database server (5.0+) and run the commands below through a user that has the following privileges:

SELECT, INSERT, UPDATE, DELETE, 
CREATE, DROP, RELOAD, REFERENCES, 
INDEX, ALTER, SHOW DATABASES, 
CREATE TEMPORARY TABLES, 
LOCK TABLES, EXECUTE, CREATE VIEW

Installation:

  1. Download the repository
  2. Change directory to the repository

Then run

mysql < employees.sql

If you want to install with two large partitioned tables, run

mysql < employees_partitioned.sql

Testing the installation

After installing, you can run one of the following

mysql -t < test_employees_md5.sql
# OR
mysql -t < test_employees_sha.sql

For example:

mysql  -t < test_employees_md5.sql
+----------------------+
| INFO                 |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+----------------------------------+
| table_name   | expected_records | expected_crc                     |
+--------------+------------------+----------------------------------+
| employees    |           300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp     |           331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles       |           443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries     |          2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+------------------+----------------------------------+
| table_name   | found_records    | found_crc                        |
+--------------+------------------+----------------------------------+
| employees    |           300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments  |                9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager |               24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp     |           331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles       |           443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries     |          2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
+--------------+---------------+-----------+
| table_name   | records_match | crc_match |
+--------------+---------------+-----------+
| employees    | OK            | ok        |
| departments  | OK            | ok        |
| dept_manager | OK            | ok        |
| dept_emp     | OK            | ok        |
| titles       | OK            | ok        |
| salaries     | OK            | ok        |
+--------------+---------------+-----------+

DISCLAIMER

To the best of my knowledge, this data is fabricated and it does not correspond to real people. Any similarity to existing people is purely coincidental.

LICENSE

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

test_db's People

Contributors

datacharmer avatar tedwa5 avatar vaibhawvipul avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

test_db's Issues

specifying column orders for .dump files

Thank you for providing us with such nice database sample. I appreciate it.

While I was following the tutorial from the oracle mysql-connector-python, I had trouble executing the load_dept_manager.dump for the order of columns didn't match with the table I've made .

I fixed it with specifying the column order by changing into

INSERT INTO dept_manager(emp_no, dept_no, from_date, to_date) VALUES
(110022,'d001','1985-01-01','1991-10-01'),
...

and it worked well.

I know its really a minor problem that most will be able to handle, but I think it could help a lot for those who might be even newbier than I am.

Thank you.

Replacing the deprected 'source' in employees.sql

My understanding is that source has been deprecated for some years.

However, https://github.com/datacharmer/test_db/blob/master/employees.sql includes:
SELECT 'LOADING departments' as 'INFO';
source load_departments.dump ;

In Workbench source is underlined in red and gives the error: "source is not valid at this position"

I can run the different dump files individually but these scripts should not include an error/bug.

Is there an alternative to using source,

Bizarrely this was reported over 2 years ago: https://forums.mysql.com/read.php?10,685467,685502

Script execution failed

When I execute the following commands at the CentOS terminal:
mysql -uroot -proot -t <test_db-master/employees.sql
Terminal prompt:
ERROR at line 113: Failed to open file 'load_departments.dump', error: 2

Resest

int ndb_mgm_restart
(
NdbMgmHandle handle,
int number,
const int* list
)

What is mean of more than one primary key in table

As I know primary key uniquely identify each row of tables and primary key can't duplicate (not repeated) but I can see in in titles and salaries table more than one primary key and second primary key is duplicate(I high lighted in bold). Is this foreign key or else?

mysql> USE employees;
Database changed
mysql> SHOW TABLES;
+----------------------+
| Tables_in_employees |
+----------------------+
| current_dept_emp |
| departments |
| dept_emp |
| dept_emp_latest_date |
| dept_manager |
| employees |
| salaries |
| titles |
+----------------------+
8 rows in set (0.00 sec)

mysql> DESCRIBE titles;
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| emp_no | int | NO | PRI | NULL | |
| title | varchar(50) | NO | PRI | NULL | |
| from_date | date | NO | PRI | NULL | |
| to_date | date | YES | | NULL | |
+-----------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM titles limit 10;
+--------+-----------------+------------+------------+
| emp_no | title | from_date | to_date |
+--------+-----------------+------------+------------+
| 10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |
| 10002 | Staff | 1996-08-03 | 9999-01-01 |
| 10003 | Senior Engineer | 1995-12-03 | 9999-01-01 |
| 10004 | Engineer | 1986-12-01 | 1995-12-01 |
| 10004 | Senior Engineer | 1995-12-01 | 9999-01-01 |
| 10005 | Senior Staff | 1996-09-12 | 9999-01-01 |
| 10005 | Staff | 1989-09-12 | 1996-09-12 |
| 10006 | Senior Engineer | 1990-08-05 | 9999-01-01 |
| 10007 | Senior Staff | 1996-02-11 | 9999-01-01 |
| 10007 | Staff | 1989-02-10 | 1996-02-11 |
+--------+-----------------+------------+------------+

mysql> DESCRIBE salaries;
+-----------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------+------+-----+---------+-------+
| emp_no | int | NO | PRI | NULL | |
| salary | int | NO | | NULL | |
| from_date | date | NO | PRI | NULL | |
| to_date | date | NO | | NULL | |
+-----------+------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM salaries limit 10;
+--------+--------+------------+------------+
| emp_no | salary | from_date | to_date |
+--------+--------+------------+------------+
| 10001 | 60117 | 1986-06-26 | 1987-06-26 |
| 10001 | 62102 | 1987-06-26 | 1988-06-25 |
| 10001 | 66074 | 1988-06-25 | 1989-06-25 |
| 10001 | 66596 | 1989-06-25 | 1990-06-25 |
| 10001 | 66961 | 1990-06-25 | 1991-06-25 |
| 10001 | 71046 | 1991-06-25 | 1992-06-24 |
| 10001 | 74333 | 1992-06-24 | 1993-06-24 |
| 10001 | 75286 | 1993-06-24 | 1994-06-24 |
| 10001 | 75994 | 1994-06-24 | 1995-06-24 |
| 10001 | 76884 | 1995-06-24 | 1996-06-23 |
+--------+--------+------------+------------+
10 rows in set (0.00 sec)

primary keys for dept_manager and dept_emp are too strict

In theory, Joe could work in Marketing from 1995 to 1998, and then in Sales from 1998 until 2000, and then in Marketing again. This may not actually occur in the sample data set. But it cannot be modeled in dept_emp if the primary key is (emp_no,dept_no).

If Joe is the department manager, this problem also applies to the dept_manager table.

A looser constraint is that, on any given date, a department has only one manager, and an employee has only one job. To model this, you would need a "temporal range" as a data type that can be indexed, or at least a SQL CHECK that date ranges do not overlap.

But using (emp_no, from_date) as primary key might be a better way to express the constraint. Maybe it would also allow the primary ordered index to be used in evaluating the views.

im a newbee

thanks for help: how can i change the directory to the repository?

Hitting an error during the sql script run

08:44:34 set storage_engine = InnoDB Error Code: 1193. Unknown system variable 'storage_engine' 0.000 sec

When I run the 'employees.sql' I am getting this error. This is happening to all of the scripts. I am using MySql workbench the latest.

How do I make it work?

I am getting this warning as well:

action: DROP TABLE IF EXISTS dept_emp, dept_manager, titles, salaries, employees, departments

message:
0 row(s) affected, 6 warning(s): 1051 Unknown table 'employees.dept_emp' 1051 Unknown table 'employees.dept_manager' 1051 Unknown table 'employees.titles' 1051 Unknown table 'employees.salaries' 1051 Unknown table 'employees.employees' 1051 Unknown table 'employees.departments'

Would please tell me why is it happening? I am a novice learner. So, I do not know many details.
Thanks.

Illegal mix of collations

Data load into database went fine, but while testing installation (mysql -t < test_employees_md5.sql) I get this:

| INFO                 |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+------------------------------------------+
| table_name   | expected_records | expected_crc                             |
+--------------+------------------+------------------------------------------+
| employees    |           300024 | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |
| departments  |                9 | 4b315afa0e35ca6649df897b958345bcb3d2b764 |
| dept_manager |               24 | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |
| dept_emp     |           331603 | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |
| titles       |           443308 | d12d5f746b88f07e69b9e36675b6067abb01b60e |
| salaries     |          2844047 | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |
+--------------+------------------+------------------------------------------+
ERROR 1271 (HY000) at line 62: Illegal mix of collations for operation 'concat_ws'

Unknown storage engine 'blackhole' , [windows10 xampp]

E:\git_programs\test_db (master -> origin)
λ mysql -u root -p -t < test_employees_md5.sql
Enter password:
+----------------------+
| INFO |
+----------------------+
| TESTING INSTALLATION |
+----------------------+
+--------------+------------------+----------------------------------+
| table_name | expected_records | expected_crc |
+--------------+------------------+----------------------------------+
| employees | 300024 | 4ec56ab5ba37218d187cf6ab09ce1aa1 |
| departments | 9 | d1af5e170d2d1591d776d5638d71fc5f |
| dept_manager | 24 | 8720e2f0853ac9096b689c14664f847e |
| dept_emp | 331603 | ccf6fe516f990bdaa49713fc478701b7 |
| titles | 443308 | bfa016c472df68e70a03facafa1bc0a8 |
| salaries | 2844047 | fd220654e95aea1b169624ffe3fca934 |
+--------------+------------------+----------------------------------+
ERROR 1286 (42000) at line 58: Unknown storage engine 'blackhole'

E:\git_programs\test_db (master -> origin)

Add sentinel tables to facilitate speed measurement

Adding a "sentinel" table that will allow for a quick calculation of performance speed, and also a quick evaluation of replication efficiency.
There should be an empty "first_table" before any other operation on the schema, and a "last_table" after all the data has been loaded.
The method is clearly explained in the article Quick benchmarking trick

Loading employees.sql requires undocumented RELOAD privilege

employees.sql includes the line flush /*!50503 binary */ logs;. FLUSH LOGS requires the RELOAD privilege, which means that a user with normal privileges, following the instructions in README.md, will be unable to load the database.

Solutions:

  1. Get rid of the flush logs line. Why is it there in the first place? What's the point of flushing logs as part of an unrelated data import operation?
  2. Add -f to invocations of the mysql command-line client so that it will continue executing even if an error is encountered.
  3. Document the need for the RELOAD privilege in README.md.

Access/permissions rights denied

After I run:

$ mysql < employees.sql

I get this error:

ERROR 1044 (42000) at line 25: Access denied for user ''@'localhost' to database 'employees'

Inserting data on employees table

When making employee auto_increment. I do get error such as `cannot change column 'emp_no': used in a foreign key constraint 'dept_emp_ibfk_1' of table employees.dept_emp'

Do I have to manually specify the emp_no to be able to correctly insert data?

Also when trying to insert data I get error such as "Duplicate entry "11001" for key 'Primary'

Query

INSERT INTO employees 
VALUES 
(11001,
'1953-09-02',
'Oliver',
'Maglana',
'M',
'1994-09-15',
'2020-05-29 23:41:26');

Execution stoped

Hi,
When I try to create the database the script stops showing the following message

ERROR at line 113: Failed to open file 'load_departments.dump', error: 2

I'm following the steps from README.md, and when I call mysql < employees.sql I see the error.
I'm using an Ubuntu 14.04, the vagrant box is minimal/trusty64. And the mysql is the default, it's installed following next commands

sudo apt-get update
sudo apt-get install mysql-server
sudo mysql_secure_installation
sudo mysql_install_db

So, the version is:

mysql  Ver 14.14 Distrib 5.5.61, for debian-linux-gnu (x86_64) using readline 6.3

Thanks

Error loading source dump file.

16:22:12 source load_departments.dump Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'source load_departments.dump' at line 1 0.000 sec

Now I have this issue. Any guidance please? But of course, I am using the codes I have downloaded right from this source.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.