gitbase is a SQL database interface to Git repositories.
It can be used to perform SQL queries about the Git history and about the Universal AST of the code itself. gitbase is being built to work on top of any number of git repositories.
gitbase implements the MySQL wire protocol, so it can be accessed using any MySQL client or library from any language.
The project is currently in alpha stage: performance is still lacking in a number of cases, but we are working hard on a performant system able to process thousands of repositories on a single node. Stay tuned!
SELECT * FROM refs WHERE ref_name = 'HEAD';

SELECT * FROM (
  SELECT COUNT(c.commit_hash) AS num, c.commit_hash
  FROM refs r
  INNER JOIN commits c
    ON history_idx(r.commit_hash, c.commit_hash) >= 0
  GROUP BY c.commit_hash
) t WHERE num > 1;

SELECT COUNT(c.commit_hash), c.commit_hash
FROM refs r
INNER JOIN commits c
  ON r.ref_name = 'HEAD' AND history_idx(r.commit_hash, c.commit_hash) >= 0
INNER JOIN blobs b
  ON commit_has_blob(c.commit_hash, b.blob_hash)
GROUP BY c.commit_hash;

SELECT COUNT(*) as num_commits, month, repo_id, committer_email
FROM (
  SELECT
    MONTH(committer_when) as month,
    r.repository_id as repo_id,
    committer_email
  FROM repositories r
  INNER JOIN refs
    ON refs.repository_id = r.repository_id AND refs.ref_name = 'HEAD'
  INNER JOIN commits c
    ON YEAR(committer_when) = 2015 AND history_idx(refs.commit_hash, c.commit_hash) >= 0
) as t
GROUP BY committer_email, month, repo_id;
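
Several of the queries above filter on `history_idx(...) >= 0` to keep only the commits reachable from a reference. The following Python sketch illustrates that idea on an in-memory commit graph. It is not gitbase's implementation: the `parents` mapping, the breadth-first visiting order, and the `-1` result for an unreachable commit are all assumptions made for illustration.

```python
from collections import deque


def history_idx(parents, start_hash, target_hash):
    """Return the position of target_hash in the history of start_hash.

    `parents` is a hypothetical mapping from a commit hash to the list of
    its parent hashes. Returns -1 (assumed convention) when target_hash
    is not reachable from start_hash.
    """
    seen = {start_hash}
    queue = deque([start_hash])
    idx = 0
    while queue:
        current = queue.popleft()
        if current == target_hash:
            return idx
        idx += 1
        # Visit each parent once, in breadth-first order (an assumption).
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return -1
```

Under this model, a join condition such as `history_idx(r.commit_hash, c.commit_hash) >= 0` keeps exactly the commits that can be reached from `r.commit_hash`.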
Check the Release page to download the gitbase binary.
Because gitbase uses bblfsh's client-go, which uses cgo, you need to install some dependencies by hand instead of just using `go get`.
go get github.com/src-d/gitbase/...
cd $GOPATH/src/github.com/src-d/gitbase
make dependencies
Usage:
gitbase [OPTIONS] <server | version>
Help Options:
-h, --help Show this help message
Available commands:
server Start SQL server.
version Show the version information.
You can start a server by providing a path that contains multiple Git repositories, such as `/path/to/repositories`, with this command:
$ gitbase server -v -g /path/to/repositories
A MySQL client is needed to connect to the server. For example:
$ mysql -q -u root -h 127.0.0.1
MySQL [(none)]> SELECT commit_hash, commit_author_email, commit_author_name FROM commits LIMIT 2;
+------------------------------------------+---------------------+-----------------------+
| commit_hash | commit_author_email | commit_author_name |
+------------------------------------------+---------------------+-----------------------+
| 003dc36e0067b25333cb5d3a5ccc31fd028a1c83 | [email protected] | Santiago M. Mola |
| 01ace9e4d144aaeb50eb630fed993375609bcf55 | [email protected] | Antonio Navarro Perez |
+------------------------------------------+---------------------+-----------------------+
2 rows in set (0.01 sec)
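
The same query can also be issued from a program instead of the `mysql` shell. The sketch below assumes the third-party PyMySQL driver is installed and that the server listens on port 3306, the MySQL client default; neither is stated in this README. The user and host mirror the `mysql` command above, and `fetch_rows` is a hypothetical helper name.

```python
# Connection settings mirroring `mysql -q -u root -h 127.0.0.1`.
# Port 3306 is the MySQL client default and an assumption here.
GITBASE_CONNECTION = {"host": "127.0.0.1", "port": 3306, "user": "root"}

EXAMPLE_QUERY = (
    "SELECT commit_hash, commit_author_email, commit_author_name "
    "FROM commits LIMIT 2"
)


def fetch_rows(query, settings=GITBASE_CONNECTION):
    """Run one query against a running gitbase server and return all rows."""
    # Imported lazily so the module loads even without the driver installed.
    import pymysql  # third-party MySQL driver (assumed to be available)

    conn = pymysql.connect(**settings)
    try:
        with conn.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()
    finally:
        conn.close()
```

With a server running, `fetch_rows(EXAMPLE_QUERY)` should return the same two rows as the interactive session, one tuple per row.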
Name | Description |
---|---|
BBLFSH_ENDPOINT | bblfshd endpoint, default "127.0.0.1:9432" |
GITBASE_BLOBS_MAX_SIZE | maximum blob size to return in MiB, default 5 MiB |
GITBASE_BLOBS_ALLOW_BINARY | enable retrieval of binary blobs, default false |
GITBASE_UNSTABLE_SQUASH_ENABLE | UNSTABLE: enable table squashing (described under the unstable features) |
GITBASE_SKIP_GIT_ERRORS | do not stop queries on git errors, default disabled |
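
As an illustration of how these variables might be set when launching the server from a script, here is a minimal Python sketch. The helper names `gitbase_env` and `start_server` are hypothetical, and the accepted value formats (`"true"`/`"false"` strings, any non-empty value for `GITBASE_SKIP_GIT_ERRORS`) are assumptions rather than documented behaviour.

```python
import os
import subprocess


def gitbase_env(max_blob_mib=5, allow_binary=False, skip_git_errors=False,
                bblfsh_endpoint="127.0.0.1:9432", base=None):
    """Build an environment dict for a gitbase server process."""
    env = dict(os.environ if base is None else base)
    env["BBLFSH_ENDPOINT"] = bblfsh_endpoint
    env["GITBASE_BLOBS_MAX_SIZE"] = str(max_blob_mib)
    # Assumption: boolean settings are passed as "true"/"false" strings.
    env["GITBASE_BLOBS_ALLOW_BINARY"] = "true" if allow_binary else "false"
    if skip_git_errors:
        # Assumption: any non-empty value enables this setting.
        env["GITBASE_SKIP_GIT_ERRORS"] = "true"
    return env


def start_server(repos_path, env):
    """Launch `gitbase server -v -g <repos_path>`; needs gitbase on PATH."""
    return subprocess.Popen(["gitbase", "server", "-v", "-g", repos_path], env=env)
```

For example, `start_server("/path/to/repositories", gitbase_env(max_blob_mib=10, allow_binary=True))` would start the server with a larger blob limit and binary blobs enabled, under the assumptions above.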
You can execute the `SHOW TABLES` statement to get a list of the available tables.
To get all the columns and types of a specific table, you can write `DESCRIBE TABLE [tablename]`.
gitbase exposes the following tables:
Name | Columns |
---|---|
repositories | repository_id |
remotes | repository_id, remote_name, remote_push_url, remote_fetch_url, remote_push_refspec, remote_fetch_refspec |
commits | repository_id, commit_hash, commit_author_name, commit_author_email, commit_author_when, committer_name, committer_email, committer_when, commit_message, tree_hash |
blobs | repository_id, blob_hash, blob_size, blob_content |
refs | repository_id, ref_name, commit_hash |
tree_entries | repository_id, tree_hash, blob_hash, tree_entry_mode, tree_entry_name |
references | repository_id, ref_name, commit_hash |
To make some common tasks easier, gitbase provides functions that work with the tables listed above:
Name | Description |
---|---|
commit_has_blob(commit_hash, blob_hash) bool | returns whether the specified commit contains the specified blob |
commit_has_tree(commit_hash, tree_hash) bool | returns whether the specified commit contains the specified tree |
history_idx(start_hash, target_hash) int | returns the index of a commit in the history of another commit |
is_remote(reference_name) bool | checks whether the given reference name belongs to a remote |
is_tag(reference_name) bool | checks whether the given reference name is a tag |
language(path, [blob]) text | returns the language of a file given its path and, optionally, the content of the file |
uast(blob, [lang, [xpath]]) json_blob | returns an array of UAST nodes as blobs |
uast_xpath(json_blob, xpath) | performs an XPath query over the given UAST nodes |
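
To make the two reference checks concrete, the sketch below models `is_tag` and `is_remote` in Python. It assumes they follow Git's standard reference namespaces (`refs/tags/` and `refs/remotes/`); this README does not spell out the exact rules, so treat it as an approximation.

```python
def is_tag(reference_name):
    """True when the name lives under Git's tag namespace (assumed rule)."""
    return reference_name.startswith("refs/tags/")


def is_remote(reference_name):
    """True when the name lives under Git's remote namespace (assumed rule)."""
    return reference_name.startswith("refs/remotes/")
```

Under these assumptions, `refs/tags/v1.0.0` counts as a tag, `refs/remotes/origin/master` counts as a remote reference, and `refs/heads/master` is neither.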
- Table squashing: an optimization that collects inner joins between tables with a set of supported conditions and converts them into a single node that retrieves the data in chained steps (for example, getting the commits first and then the blobs of every commit, instead of joining all commits and all blobs). It can be enabled with the environment variable `GITBASE_UNSTABLE_SQUASH_ENABLE`.
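
The difference between joining everything and retrieving data in chained steps can be sketched in Python. This is a toy model, not gitbase's internals: the data, `BLOBS_BY_COMMIT`, and both function names are hypothetical.

```python
# Toy data (hypothetical): the blobs contained in each commit.
BLOBS_BY_COMMIT = {"c1": ["b1", "b2"], "c2": ["b2", "b3"]}
ALL_BLOBS = ["b1", "b2", "b3"]


def commit_has_blob(commit, blob):
    """Stand-in for the SQL function of the same name."""
    return blob in BLOBS_BY_COMMIT[commit]


def join_then_filter(commits, blobs):
    """Pair every commit with every blob, then keep the matching pairs."""
    return [(c, b) for c in commits for b in blobs if commit_has_blob(c, b)]


def chained(commits):
    """Walk the commits first, then fetch only the blobs of each commit."""
    return [(c, b) for c in commits for b in BLOBS_BY_COMMIT[c]]
```

Both functions yield the same pairs, but the first evaluates the condition once per commit and blob combination, while the second only touches the blobs that belong to each commit.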
Apache License Version 2.0, see LICENSE