Deprecation note:
Pilosa was rebranded to FeatureBase, which comes with its own SQL support.
Tools like Looker can solve the problem of joining tables from different sources.
This project is no longer actively maintained.
A plugin for Apache Calcite to query the Pilosa distributed index with SQL.
Pilosa is a high-performance distributed bitmap index that has been used in many data-intensive projects. One notable application is storing the fact table of an analytical database: as the core of that database, Pilosa handles the heavy computation efficiently. The Calcite Pilosa Adapter links Pilosa tables with supplementary dimension tables from other databases, such as PostgreSQL, which makes Pilosa a powerful exploration tool for business intelligence.
The Calcite Pilosa adapter works as a proxy. Clients connect with JDBC-compatible drivers and query the data with SQL; the adapter translates each query into the Pilosa Query Language (PQL), runs it against Pilosa, and converts the results into a JDBC ResultSet that is sent back to the client.
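To make the translation step more concrete, here is a minimal, hypothetical sketch of how a filtered count could map onto PQL. It is not the adapter's actual planner: the class PqlSketch and its methods are invented for illustration, it only handles AND-ed equality predicates whose values are already Pilosa row IDs, and it emits the Row/Intersect/Count syntax of Pilosa 1.x PQL. Pilosa itself receives such queries over HTTP at POST /index/<index>/query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not the adapter's real planner: renders AND-ed
// equality filters (field = row ID) as a Pilosa 1.x PQL Count query.
final class PqlSketch {

    private PqlSketch() {
    }

    // One equality predicate becomes a PQL Row call, e.g. Row(country=5).
    static String row(String field, long rowId) {
        return "Row(" + field + "=" + rowId + ")";
    }

    // Several predicates are combined with Intersect (logical AND) and counted.
    static String countWhere(Map<String, Long> equalities) {
        if (equalities.isEmpty()) {
            throw new IllegalArgumentException("at least one predicate is required");
        }
        List<String> rows = equalities.entrySet().stream()
                .map(e -> row(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
        String bitmap = rows.size() == 1
                ? rows.get(0)
                : "Intersect(" + String.join(", ", rows) + ")";
        return "Count(" + bitmap + ")";
    }

    public static void main(String[] args) {
        // Roughly: select count(*) from t where country = 5 and device = 2
        Map<String, Long> filter = new LinkedHashMap<>();
        filter.put("country", 5L);
        filter.put("device", 2L);
        System.out.println(countWhere(filter));
        // prints Count(Intersect(Row(country=5), Row(device=2)))
    }
}
```

A LinkedHashMap keeps the predicates in the order they appear in the WHERE clause, so the generated PQL is deterministic.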
<dependency>
<groupId>com.alexrnv.calcite.adapter.pilosa</groupId>
<artifactId>calcite-pilosa</artifactId>
<version>0.0.1</version>
</dependency>
Note: the artifact is hosted in GitHub Packages, so your Maven build must declare the GitHub Packages repository and authenticate against it.
1.) Start with the model configuration:
{
"version": "1.0",
"defaultSchema": "pilosa",
"schemas": [
{
"name": "pilosa",
"type": "custom",
"factory": "com.alexrnv.calcite.adapter.pilosa.model.PilosaSchemaFactory",
"operand": {
"url": "http://localhost:10101"
}
}
]
}
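Set the url operand to your Pilosa server endpoint. The schema name must match defaultSchema and the schema prefix used in SQL queries (pilosa). Dimension tables from other databases, such as the postgres schema queried in step 4, need their own entries in the schemas list.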
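Before starting the server, it can help to confirm that the configured endpoint is reachable. The sketch below assumes the server exposes Pilosa's GET /status endpoint and answers it with HTTP 200; the class PilosaPing is hypothetical and uses only the JDK's java.net.http client (Java 11 or later).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical pre-flight check for the "url" operand of the model file.
// Assumes the Pilosa server answers GET /status with HTTP 200.
final class PilosaPing {

    private PilosaPing() {
    }

    // Appends /status to the base URL, tolerating a trailing slash.
    static URI statusUri(String baseUrl) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + "/status");
    }

    // Returns true only if the endpoint responds with HTTP 200 within the timeouts.
    static boolean isReachable(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(statusUri(baseUrl))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused, timeout, or other I/O failure.
            return false;
        } catch (InterruptedException e) {
            // Restore the interrupt flag before giving up.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:10101";
        System.out.println(url + " reachable: " + isReachable(url));
    }
}
```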
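2.) The following code snippet starts a JDBC server inside your service. Here modelFileUri points to the model file from step 1, and serverPort is the port the server listens on.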
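// Build the local JDBC service from the model file (step 1)
LocalService service = new PilosaServiceFactory(modelFileUri).createLocalService();
// Expose the service over HTTP on serverPort
HttpServer server = new PilosaHttpServerFactory(service, serverPort).createHttpServer();
server.start();
try {
    // Block the current thread until the server shuts down
    server.join();
} catch (InterruptedException e) {
    // Restore the interrupt flag so callers can observe it
    Thread.currentThread().interrupt();
} finally {
    server.stop();
}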
Your service is now listening for JDBC connections.
3.) Connect from your favourite JDBC client.
Clients should use the Avatica JDBC driver (available from Maven), version 1.14.0 or later, with the following connection URL:
jdbc:avatica:remote:url=http://<host>:<port>/sql/v1
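A minimal client-side sketch, assuming the Avatica JDBC driver is on the classpath and registers itself with DriverManager (if it does not in your setup, load org.apache.calcite.avatica.remote.Driver explicitly). The class AvaticaClient is hypothetical, and the port 8080 in main is a placeholder for the serverPort passed in step 2.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical client helper. Only java.sql is needed to compile it, but the
// Avatica JDBC driver (1.14.0 or later) must be on the classpath at runtime.
final class AvaticaClient {

    private AvaticaClient() {
    }

    // Builds the remote Avatica URL for the adapter's /sql/v1 endpoint.
    static String jdbcUrl(String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return "jdbc:avatica:remote:url=http://" + host + ":" + port + "/sql/v1";
    }

    // DriverManager selects the driver that accepts the jdbc:avatica:remote: prefix.
    static Connection connect(String host, int port) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, port));
    }

    public static void main(String[] args) throws SQLException {
        // 8080 is a placeholder: use the serverPort from step 2.
        try (Connection conn = connect("localhost", 8080)) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}
```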
4.) Run your analysis with SQL:
select
    count(distinct facts._id)
from
    pilosa.facts_table facts
join
    postgres.dimension_table dimension
    on dimension.X = facts.X
where
    dimension.Y = [value]
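From Java, the [value] placeholder is best bound as a JDBC parameter rather than concatenated into the SQL text. The sketch below is hypothetical: FactsQuery is an invented helper, it assumes dimension.Y is a string column, and it expects a Connection such as one opened against the Avatica URL from step 3.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: runs the step 4 query with [value] bound as a
// JDBC parameter instead of being concatenated into the SQL text.
final class FactsQuery {

    static final String SQL =
            "select count(distinct facts._id)"
            + " from pilosa.facts_table facts"
            + " join postgres.dimension_table dimension on dimension.X = facts.X"
            + " where dimension.Y = ?";

    private FactsQuery() {
    }

    // Assumes dimension.Y is a string column; use the matching setXxx otherwise.
    static long countFacts(Connection conn, String dimensionValue) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(SQL)) {
            stmt.setString(1, dimensionValue);
            try (ResultSet rs = stmt.executeQuery()) {
                // An aggregate without GROUP BY returns exactly one row.
                return rs.next() ? rs.getLong(1) : 0L;
            }
        }
    }
}
```

For example, FactsQuery.countFacts(conn, "some-value") returns the number of distinct facts linked to that dimension value.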
Please check the project wiki for more examples and available options.