
dw-query-digest

Alternative to pt-query-digest.


Purpose

dw-query-digest reads slow query log files and extracts a number of statistics. It can list top queries sorted by a chosen metric.

It is reasonably fast and can process ~450k slow-log lines per second.

A 470 MB slow query log containing 10M lines took about 150s with pt-query-digest and about 23s with dw-query-digest (roughly 6x faster).

dw-query-digest normalizes SQL queries so identical queries can be aggregated in a report.
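For example, these two queries differ only in their literal values:

SELECT name FROM users WHERE id = 42;
SELECT name FROM users WHERE id = 57;

Both should therefore aggregate under a single normalized form (the fingerprint below is illustrative; the exact output may differ):

SELECT name FROM users WHERE id = ?;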

See also https://github.com/devops-works/slowql for an alternative implementation containing a log replayer.

Usage

Binary

Grab binary releases on GitHub.

bin/dw-query-digest [options] <file>

Source

Requires Go 1.12+.

make
bin/dw-query-digest [options] <file>

Docker

To run using Docker:

cd where_your_slow_query_log_is
docker run -v $(pwd):/data devopsworks/dw-query-digest /data/slow-query.log

Options

  • --debug: show debugging information (very verbose)
  • --progress: display a progress bar (disabled when reading from STDIN or using --follow)
  • --output <fmt>: produce report using fmt output plugin (default: terminal; see "Outputs" below)
  • --list-outputs: list possible output plugins
  • --quiet: display only the report (no log)
  • --reverse: reverse sort (i.e. lowest first)
  • --follow: follow log file (tail -F style)
  • --sort <string>: sort key
    • time (default): sort by cumulative execution time
    • count: sort by query count
    • bytes: sort by query bytes size
    • lock[time]: sort by lock time (lock & locktime are synonyms)
    • [rows]sent: sort by rows sent (sent and rowssent are synonyms)
    • [rows]examined: sort by rows examined
    • [rows]affected: sort by rows affected
  • --top <int>: top queries to display (default 20)
  • --nocache: disable cache (reading & writing)
  • --version: show version and exit
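Options can be combined as usual. For instance, to display the 5 queries with the lowest cumulative execution time, ignoring any cache:

dw-query-digest --sort time --reverse --top 5 --nocache slow-query.log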

Outputs

The default output is "terminal".

terminal

Simple terminal output, designed to be read by humans.

greppable

CSV-format output designed to be used in combination with grep. In this format, the first two lines are:

  • meta information (server version, duration, etc.)
  • column headers

Both lines start with #, so if you only want the queries, you can filter them out using dw-query-digest ... | grep -v '^#'.

Column headers are prefixed with a number, which you can use directly with cut if you only need a specific column. For instance, since the maximum query time column header is 16_Max(s), you can extract just that column with:

$ dw-query-digest ... | grep -v '^#' | cut -f16 -d';'
2.378111
0.243685
0.335063
0.715469
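Since the output is plain semicolon-separated text, it also combines well with other standard tools. For instance, still assuming the 16_Max(s) column as above, you can list the slowest maximum times first with sort:

dw-query-digest ... | grep -v '^#' | sort -t';' -k16,16 -rn | head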

json

Pretty-printed JSON output containing general information under the meta key and per-query information under the stats key. Note that the key layout is subject to change.
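For instance, you can extract just the meta information with a tool like jq (keeping the layout caveat above in mind):

dw-query-digest --output json slow.log | jq '.meta'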

null

Does not output anything; everything goes to /dev/null. This can be used for benchmarking, to prime the cache, and so on.
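For instance, to time a parsing run (and prime the cache as a side effect):

time dw-query-digest --output null slow.log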

Cache

When run against a file, dw-query-digest will look for a cache file with the same name and a .cache suffix. For instance, if you invoke:

dw-query-digest -top 1 dbfoo-slow.log

then dw-query-digest will try to find dbfoo-slow.log.cache. If the cache file is found, the original file is not touched and results are read from the cache. This lets you rerun the command if needed without having to re-analyze the whole original file.

Since all results are cached, you can rerun with different parameters: for instance, --top, --sort or --output can change between runs and will (hopefully) work.
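For instance:

dw-query-digest --top 1 dbfoo-slow.log                # parses the log, writes dbfoo-slow.log.cache
dw-query-digest --top 10 --sort count dbfoo-slow.log  # answered from the cache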

If you don't want to read from or write to the cache at all, you can use the --nocache option. You can also remove the cache file at any time.

If the analyzed file is newer than its cache, the cache will not be used.

Cache format is not guaranteed to work between different versions.

Continuous reading

There is alpha support for ever-growing files when --follow is set. It should handle file rotation and truncation too.
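For example (the log path is illustrative):

dw-query-digest --follow --top 10 /var/log/mysql/mysql-slow.log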

Testing continuous reading

Assuming you have Docker and dw-query-digest in your PATH, you can quickly test continuous reading like so:

# Prepare log dir & database
mkdir test
chmod 777 test
docker run --name mysql-test \
    -p 3307:3306 \
    -v $(pwd)/test/:/log \
    -e MYSQL_ALLOW_EMPTY_PASSWORD=true \
    -d mysql:5.7.19 \
    --slow-query-log=ON \
    --slow-query-log-file=/log/slow.log \
    --long-query-time=0

docker exec -ti mysql-test mysqladmin create test || echo 'Unable to create database; MySQL not ready?'
docker exec -ti mysql-test mysql -e 'create user sbtest;' || echo 'Unable to create user; MySQL not ready?'
docker exec -ti mysql-test mysql -e 'grant all privileges on *.* to sbtest@`%`;' || echo 'Unable to grant user; MySQL not ready?'
docker exec -ti mysql-test mysql -e 'grant all on *.* to root@`%`;' || echo 'Unable to grant root; MySQL not ready?'
docker exec -ti mysql-test mysql -e 'flush privileges;'
docker exec -ti mysql-test mysql -e 'set global slow_query_log="ON";'

# Run this in another terminal
dw-query-digest -top 10 -refresh 1000 --follow test/slow.log

# Then back to previous terminal
docker run \
--rm=true \
--name=sb-prepare \
--link mysql-test:mysql \
severalnines/sysbench \
sysbench \
--db-driver=mysql --table-size=1000000 --mysql-db=test \
--mysql-user=sbtest --mysql-port=3306 --mysql-host=mysql \
oltp_read_write prepare

docker run \
--rm=true \
--name=sb-run \
--link mysql-test:mysql \
severalnines/sysbench \
sysbench \
--db-driver=mysql --table-size=1000000 --mysql-db=test \
--mysql-user=sbtest --mysql-port=3306 --mysql-host=mysql \
--max-requests=0 --threads=8 --time=60 \
oltp_read_write run

# Clean all the things when done
docker run \
--rm=true \
--name=sb-cleanup \
--link mysql-test:mysql \
severalnines/sysbench \
sysbench \
--db-driver=mysql --table-size=1000000 --mysql-db=test \
--mysql-user=sbtest --mysql-port=3306 --mysql-host=mysql \
oltp_read_write cleanup

docker stop mysql-test && docker rm mysql-test

Caveats

Some corners have been cut regarding query normalization, so your mileage may vary when it comes to aggregation.

Also, dw-query-digest does not support reading a log header in the middle of a file, and will crash with a panic if it encounters one. This can happen if you FLUSH LOGS during a capture.

Contributing

If you spot something missing, or have a slow query log that is not parsed correctly, please open an issue and attach the log file.

Comments, criticisms, issues & pull requests welcome.

Roadmap

  • cache
  • tail -f reading (disable line count!) with periodic reporting (in a TUI?)
  • internal statistics (log lines/s, queries/s, ...)
  • web live output
  • pt-query-digest output?
  • UDP JSON streamed output (no stats) for filebeat/logstash/graylog?

Licence

MIT


dw-query-digest's Issues

Feature Request: Beautify Fingerprint SQL

I would find the SQL fingerprint easier to read if it would at least UPPERCASE keywords.

I am not sure whether it should add newlines or not (not sure how that would look).

Add golden file tests

Add golden files for MariaDB, Percona & MySQL slow logs in various versions.

See #3 which was caused by MariaDB alternate format.

panic: runtime error: index out of range

./dw-query-digest-amd64 -debug /data/database/db-slow.log
INFO[0000] using "/data/database/db-slow.log" as input file
INFO[0000] using "terminal" output
ERRO[0000] cachefile /data/database/db-slow.log not found: stat /data/database/db-slow.log.cache: no such file or directory
INFO[0000] file has 5933 lines
ERRO[0000] error reading log header: unable to parse server information; beginning of log might be missing
DEBU[0000] line (7) will fold after SELECT
DEBU[0000] line (8) will fold after
DEBU[0000] line (9) will fold after
DEBU[0000] line (10) will fold after
DEBU[0000] line (11) will fold after
DEBU[0000] line (12) will fold after
DEBU[0000] worker exiting
panic: runtime error: index out of range

goroutine 6 [running]:
main.fileReader(0xc000019cf0, 0x5aa9a0, 0xc00000e0a0, 0xc0000569c0, 0x172d)
	/home/leucos/dev/perso/projects/dw-query-digest/main.go:456 +0xaa0
created by main.main
	/home/leucos/dev/perso/projects/dw-query-digest/main.go:303 +0xafa
[root@db ~]# head -n15 /data/database/db-slow.log
/usr/libexec/mysqld, Version: 5.5.60-MariaDB (MariaDB Server). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                 Id Command    Argument
# Time: 190603 23:14:02
# User@Host: user[user] @  [172.22.0.10]
# Thread_id: 3  Schema: thedb  QC_hit: No
# Query_time: 0.167214  Lock_time: 0.166911  Rows_sent: 1  Rows_examined: 69
use thedb;
SET timestamp=1559618042;
SELECT
                id, password
            FROM
                Users
            WHERE
                username = 'olntwrdorwvtor'
                AND user_type != '14';
# User@Host: user[user] @  [172.22.0.10]
