spotify / heroic
The Heroic Time Series Database
Home Page: https://spotify.github.io/heroic/
License: Apache License 2.0
RPC calls that include these types require them to be deserializable. Without a @JsonCreator method, Jackson can't figure out how to build instances.
poke @juruen
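A minimal sketch of the pattern (the class and field names here are illustrative, not from the Heroic codebase): annotating a constructor with @JsonCreator and its parameters with @JsonProperty gives Jackson a way to build immutable instances.

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;

// Hypothetical value type; the annotated constructor tells Jackson
// how to construct instances during deserialization.
public class NodeMetadata {
    private final String id;
    private final int version;

    @JsonCreator
    public NodeMetadata(
            @JsonProperty("id") final String id,
            @JsonProperty("version") final int version) {
        this.id = id;
        this.version = version;
    }

    public String getId() { return id; }
    public int getVersion() { return version; }
}
```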
We have been testing the query/metrics API and we had some confusing results. The biggest problem we have found is that the end of a range changes based on the length of the range. We believe this inconsistency can affect usability, ease of consumption of the API by frontends, and even unit testing.
Here's the procedure we followed to demonstrate this:
make a call to the query/metrics API with a relative range of 1 second and extract the latest time in milliseconds. We are aware this is an approximation, and the response latency between this call and the next makes it inaccurate. At this point we just want a lower bound on the Heroic server date, since we are going to be using calls with "relative" ranges.
immediately make a call to the /write API to push two data points: one with the time obtained in the previous step, and another data point one second earlier
make several calls to the query/metrics API using an absolute range of ±1500 milliseconds, and relative ranges of 5 seconds, 1 day and 3 days. We have noticed in the responses to these calls:
We believe the reason for this is the logic implemented in the buildShiftedRange() method. That logic might well apply in many situations; a perfect example is when you want to aggregate data into time slots, where the data only makes sense if all slots are the same size. But when you do not need to aggregate based on time and just want the data points back, or your aggregation is based on, say, device type, then the range-shifting logic causes unexpected results to be returned. To test this, we used this bash script (you'll need jq installed on your machine to be able to test this; also, all calls are made to localhost:8080, so you might need to update accordingly).
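For illustration, here is a sketch of the kind of range snapping we suspect is happening (our own guess, not the actual buildShiftedRange() implementation): if ranges are snapped outward to a cadence derived from the range length, the reported end of the range will move depending on how long the range is.

```java
// Hypothetical sketch of cadence-based range snapping; Heroic's real
// buildShiftedRange() logic may differ.
public class ShiftedRange {
    /** Snap [start, end) outward to multiples of the cadence (ms). */
    public static long[] shift(final long start, final long end, final long cadence) {
        final long shiftedStart = (start / cadence) * cadence;
        final long shiftedEnd = ((end + cadence - 1) / cadence) * cadence;
        return new long[] { shiftedStart, shiftedEnd };
    }
}
```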
Start heroic via java -cp heroic.jar -Dlog4j.configurationFile=config/log4j2-file.xml com.spotify.heroic.HeroicService heroic.yaml.
Then, on the console, we get a constant stream of the following messages:
Jun 22, 2016 7:29:05 PM io.grpc.internal.TransportSet$TransportListener transportTerminated
INFO: Transport io.grpc.netty.NettyClientTransport@d6179e1(localhost/127.0.0.1:0) for localhost/127.0.0.1:0 is terminated
Jun 22, 2016 7:30:05 PM io.grpc.internal.TransportSet$1 call
INFO: Created transport io.grpc.netty.NettyClientTransport@5405e612(localhost/127.0.0.1:0) for localhost/127.0.0.1:0
Jun 22, 2016 7:30:05 PM io.grpc.internal.TransportSet$TransportListener transportShutdown
INFO: Transport io.grpc.netty.NettyClientTransport@5405e612(localhost/127.0.0.1:0) for localhost/127.0.0.1:0 is being shutdown
Jun 22, 2016 7:30:05 PM io.grpc.internal.TransportSet$TransportListener transportTerminated
INFO: Transport io.grpc.netty.NettyClientTransport@5405e612(localhost/127.0.0.1:0) for localhost/127.0.0.1:0 is terminated
Even setting io.grpc to ERROR in the log4j2.xml file does not fix this issue.
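The messages above are in java.util.logging format, which suggests (an assumption on our part) that io.grpc logs through JUL rather than Log4j; if so, the level has to be adjusted on the JUL side. A minimal sketch:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GrpcLogSilencer {
    // Hold a strong reference so the JUL logger's settings are not
    // garbage-collected.
    private static final Logger GRPC_LOGGER = Logger.getLogger("io.grpc");

    public static void silence() {
        GRPC_LOGGER.setLevel(Level.SEVERE); // drop INFO transport chatter
    }
}
```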
Here is an example of the stack trace printed by Heroic when trying to access a non-existing ES index in the suggest code path:
2016-03-03 09:12:48,387 ERROR c.s.h.r.n.NativeRpcServerSession [elasticsearch[SomeNode][transport_client_worker][T#8]{New I/O worker #17}] [id: 0x3cb82c63, /0.0.0.0:53840 => /0.0.0.0:1394]: request failed java.lang.Exception: 1 exception(s) caught: error in transform
at eu.toolchain.async.TinyThrowableUtils.buildCollectedException(TinyThrowableUtils.java:23)
at eu.toolchain.async.helper.CollectHelper.done(CollectHelper.java:142)
at eu.toolchain.async.helper.CollectHelper.add(CollectHelper.java:128)
at eu.toolchain.async.helper.CollectHelper.failed(CollectHelper.java:79)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedTransformHelper.resolved(ResolvedTransformHelper.java:26)
at eu.toolchain.async.DirectAsyncCaller.resolve(DirectAsyncCaller.java:10)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:221)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.resolve(ConcurrentResolvableFuture.java:97)
at com.spotify.heroic.elasticsearch.AbstractElasticsearchBackend$1.onResponse(AbstractElasticsearchBackend.java:43)
at org.elasticsearch.action.support.AbstractListenableActionFuture.executeListener(AbstractListenableActionFuture.java:120)
at org.elasticsearch.action.support.AbstractListenableActionFuture.done(AbstractListenableActionFuture.java:97)
at org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:166)
at org.elasticsearch.action.support.AdapterActionFuture.onResponse(AdapterActionFuture.java:96)
at org.elasticsearch.client.transport.TransportClientNodesService$RetryListener.onResponse(TransportClientNodesService.java:234)
at org.elasticsearch.action.TransportActionNodeProxy$1.handleResponse(TransportActionNodeProxy.java:73)
at org.elasticsearch.action.TransportActionNodeProxy$1.handleResponse(TransportActionNodeProxy.java:57)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:163)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:132)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Suppressed: eu.toolchain.async.TransformException: error in transform
... 37 more
Caused by: java.lang.NullPointerException
at com.spotify.heroic.suggest.elasticsearch.SuggestBackendKV.lambda$null$3(SuggestBackendKV.java:294)
at eu.toolchain.async.helper.ResolvedTransformHelper.resolved(ResolvedTransformHelper.java:24)
... 36 more
29d83a7 removed "of" as a parameter to filtering aggregations. This is an API regression and should be replaced with a check.
Hey. I found that some of the APIs in the documentation don't work when using curl.
When I tested POST /write, the response was:
{"message":"Unexpected token (START_ARRAY), expected START_OBJECT","reason":"Bad Request","status":400,"path":"data","type":"json-error"}
Then POST /query/metrics:
{"message":"Instantiation of [simple type, class com.spotify.heroic.QueryDateRange$Relative] value failed: value","reason":"Bad Request","status":400,"path":"range","type":"json-error"}
However, I can write and query in the shell.
Also, I could not find any API to delete metrics in Cassandra. Can we do that through the API?
I built heroic-db (commit 54f4443) and the service is up and running, using Cassandra as the backend (but no Elasticsearch); both GET /status and POST /write return without errors.
However, the following two queries are returning errors:
curl -H "Content-Type: application/json" http://localhost:8080/query/metrics
-d '{"range": {"type": "relative"}, "filter": ["and", ["key", "foo"], ["=", "foo", "bar"], ["+", "role"]], "groupBy": ["site"], "aggregation": []}'
This example is from https://spotify.github.io/heroic/#!/docs/api and the error message is the following:
{"message":"Unexpected token (END_ARRAY), expected VALUE_STRING: need JSON String that contains type id (for subtype of com.spotify.heroic.aggregation.Aggregation)","reason":"Bad Request","status":400,"path":"aggregation","type":"json-error"}
I tried changing the query instead to the following:
curl -X -H "Content-Type: application/json" http://localhost:8080/query/metrics
-d '{"range": {"type": "relative", "string": "MONTHS", }, "filter": ["and", ["key", "foo"], ["=", "foo", "bar"], ["+", "role"]], "groupBy": ["site"], "aggregation": []}'
And I am now getting this error message:
curl: (6) Could not resolve host: Content-Type; Unknown error
{"message":"HTTP 405 Method Not Allowed","reason":"Method Not Allowed","status":405,"type":"error"}
I am looking at the logs and I am not seeing anything there which indicates why the queries are failing.
Hello everyone,
I'm new to heroic and not sure how to approach this error.
I've configured the heroic API, using debian package built from git repo.
After the service starts I'm getting the following message every minute.
10:11:15.922 [heroic-scheduler#2] INFO com.spotify.heroic.cluster.CoreClusterManager - [new] grpc://131.x.x.x:8100
10:11:15.929 [nioEventLoopGroup-2-6] INFO com.spotify.heroic.cluster.CoreClusterManager - [refresh] no nodes discovered, including local node
10:11:15.929 [nioEventLoopGroup-2-6] INFO com.spotify.heroic.cluster.CoreClusterManager - [update] [{}] shards: 1 result(s), 1 failure(s)
10:11:15.930 [nioEventLoopGroup-2-5] WARN io.netty.handler.codec.http2.Http2ConnectionHandler - [id: 0x7ffeead0] Sending GOAWAY failed: lastStreamId '0', errorCode '2', debugData 'Connection refused: /131.x.x.x:8100'. Forcing shutdown of the connection.
java.nio.channels.ClosedChannelException
Any ideas why?
Hi,
I'm seeing this error quite often being returned from the Heroic HTTP API. Example:
{"message":"[heroic-1464825600000][1] [series][b5a903eb9726102bf3fb620392d26e00]: document already exists","reason":"Internal Server Error","status":500,"type":"internal-error"}
If I retry the request, it usually returns a 200 straight away. I am wondering if firing concurrent POST requests at the same time (for new metrics) is sending duplicate metadata creation requests to the Elasticsearch API?
In order to provide a more stable environment for Heroic clients, it would be beneficial if QueryManager supported retrying requests that fail towards other nodes.
This hardens the system against conditions where access to the other peer becomes compromised, either because it suddenly shuts down or because there are connectivity issues.
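A minimal sketch of the kind of retry loop this could use (our own illustration, not Heroic's QueryManager API; the attempt limit and backoff values are assumptions):

```java
import java.util.concurrent.Callable;

public final class Retries {
    /**
     * Retry a failing call with exponential backoff, rethrowing the
     * last failure. maxAttempts must be >= 1.
     */
    public static <T> T withRetries(final Callable<T> call, final int maxAttempts)
            throws Exception {
        long backoffMs = 100;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (final Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2; // back off before the next attempt
                }
            }
        }
        throw last;
    }
}
```

In practice the retry would presumably be directed at another healthy node rather than blindly repeating the same call.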
API documentation differs from the implementation: the actual payload should be more deeply nested (data has a type, plus the actual data containing the points).
{
  "series": {
    "key": "foo",
    "tags": {
      "site": "lon",
      "host": "www.example.com"
    }
  },
  "data": {
    "type": "points",
    "data": [
      [1300000000000, 42],
      [1300001000000, 84]
    ]
  }
}
A slowlog is a log that prints information about queries which are slow.
It is typically used to figure out what is adding stress to the system, and is a feature that is needed to track such queries down.
@juruen has done some experimentation regarding this, so I'm assigning this issue to him.
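A minimal sketch of what such a slowlog could record, assuming a configurable threshold (names here are illustrative, not an existing Heroic API):

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

public class SlowLog {
    private static final Logger LOG = Logger.getLogger("slowlog");

    private final long thresholdMs;

    public SlowLog(final long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Log the query if it took longer than the threshold. */
    public void record(final String query, final long elapsedNanos) {
        final long elapsedMs = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        if (elapsedMs >= thresholdMs) {
            LOG.warning("slow query (" + elapsedMs + "ms): " + query);
        }
    }
}
```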
I work at an education startup and content engagement is a great concern for us. I've been thinking about how I can know whether a given user has engaged with a video or an audio clip. Is it an arbitrary percentage of content consumed? How can I store the moments of content consumed, which moments have been consumed more than once, and so on?
Is this database or this category of database something I should be looking into to solve this kind of problem?
Currently there seems to be no way of specifying authentication credentials for the Cassandra DataStax backend in the settings, and you also can't choose the keyspace to use. Or am I missing something?
For now, I've worked around this issue by hardcoding both things in datastax/ManagedSetupConnection.java.
Seeing weird results when trying to do any kind of aggregation and downsample into a single value.
For example, we're trying to get the min/max/avg and stddev over a time range of ~5 minutes for one metric. Looking at the raw data, the values range from [197, 199]. In order to only get one value back, we're adding a sampling aggregation that looks like this:
{
  "range": {
    "type": "absolute",
    "start": 1479434197000,
    "end": 1479434538000 // range = 341 seconds
  },
  "aggregation": {
    "type": "spread",
    "sampling": {
      "size": "341s",
      "extent": "341s"
    }
  },
  "filter": [
    "and",
    [ "key", "foo.bar" ],
    [ "=", "tag1", "baz" ],
    [ "=", "tag2", "qux" ]
  ]
}
Now, the results I would expect to see would be something like this:
min = 197
avg = (sum/count) = 198.123
max = 199
stddev = 0.123
But instead, it gives us seemingly random values back, like below.
min = 156
avg = (sum/count) = 187.123
max = 199
stddev = 17.123
Findings:
- stddev also yields seemingly random values, and so does the min aggregation.
- max usually is correct.
- You keep getting wrong min values until you reach a pretty small size/extent, e.g.:
...
"aggregation": {
"type": "chain",
"chain": [
{
"type": "spread",
"sampling": {
"size": "341s",
"extent": "341s"
}
},
{
"type": "spread",
"sampling": {
"size": "1s"
"extent": "1s"
}
}
]
}
...
Perhaps I've misunderstood how the downsampling behaves, but I can't really see what other options I have for getting the data.
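For reference, this is the behavior we expect from a single-bucket spread over the raw points, written as our own sketch (population stddev; not Heroic's code):

```java
public class ExpectedSpread {
    /** Print min/avg/max/stddev over one bucket of raw values. */
    public static void compute(final double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0;
        for (final double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        final double avg = sum / values.length;
        double sq = 0;
        for (final double v : values) {
            sq += (v - avg) * (v - avg); // squared deviation from the mean
        }
        final double stddev = Math.sqrt(sq / values.length);
        System.out.printf("min=%f avg=%f max=%f stddev=%f%n", min, avg, max, stddev);
    }
}
```

With raw values in [197, 199], this yields min = 197, max = 199, an average close to 198, and a stddev well under 1, which is what we expected from the query above.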
In our installation we have preconfigured keyspaces, with access restricted to operations inside that keyspace.
Unfortunately it seems that the datastax backend doesn't support such a case. It would be great if it could detect that the keyspace is present and only create the tables.
Unfortunately "CREATE KEYSPACE IF NOT EXISTS" doesn't work, as the CREATE permission is apparently checked even when the keyspace already exists.
Exception:
16:01:29.844 [cluster1-nio-worker-5] INFO com.spotify.heroic.metric.datastax.schema.ng.NextGenSchema - Creating keyspace heroic
16:01:29.863 [main] ERROR com.spotify.heroic.HeroicService - Failed to start Heroic instance
java.util.concurrent.ExecutionException: eu.toolchain.async.TransformException: error in transform
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$Sync.get(ConcurrentResolvableFuture.java:527)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.get(ConcurrentResolvableFuture.java:304)
at com.spotify.heroic.HeroicService.main(HeroicService.java:106)
at com.spotify.heroic.HeroicService.main(HeroicService.java:59)
Caused by: eu.toolchain.async.TransformException: error in transform
at eu.toolchain.async.helper.ResolvedTransformHelper.resolved(ResolvedTransformHelper.java:26)
at eu.toolchain.async.DirectAsyncCaller.resolve(DirectAsyncCaller.java:10)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:221)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.resolve(ConcurrentResolvableFuture.java:97)
at com.spotify.heroic.HeroicCore$Instance.start(HeroicCore.java:866)
... 2 more
Caused by: java.util.concurrent.ExecutionException: java.lang.Exception: 1 exception(s) caught: User heroic_adm has no CREATE permission on <all keyspaces> or any of its parents
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$Sync.get(ConcurrentResolvableFuture.java:542)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.get(ConcurrentResolvableFuture.java:316)
at com.spotify.heroic.HeroicCore.awaitLifeCycles(HeroicCore.java:512)
at com.spotify.heroic.HeroicCore.startLifeCycles(HeroicCore.java:461)
at com.spotify.heroic.HeroicCore$Instance.lambda$new$128(HeroicCore.java:817)
at com.spotify.heroic.HeroicCore$Instance$$Lambda$112/1234586997.transform(Unknown Source)
at eu.toolchain.async.helper.ResolvedTransformHelper.resolved(ResolvedTransformHelper.java:24)
... 7 more
Caused by: java.lang.Exception: 1 exception(s) caught: User heroic_adm has no CREATE permission on <all keyspaces> or any of its parents
at eu.toolchain.async.TinyThrowableUtils.buildCollectedException(TinyThrowableUtils.java:23)
at eu.toolchain.async.helper.CollectHelper.done(CollectHelper.java:142)
at eu.toolchain.async.helper.CollectHelper.add(CollectHelper.java:128)
at eu.toolchain.async.helper.CollectHelper.cancelled(CollectHelper.java:85)
at eu.toolchain.async.DirectAsyncCaller.cancel(DirectAsyncCaller.java:28)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:217)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.cancel(ConcurrentResolvableFuture.java:115)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.cancel(ConcurrentResolvableFuture.java:121)
at eu.toolchain.async.helper.CollectHelper.checkFailed(CollectHelper.java:94)
at eu.toolchain.async.helper.CollectHelper.failed(CollectHelper.java:80)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedTransformHelper.failed(ResolvedTransformHelper.java:16)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedLazyTransformHelper.failed(ResolvedLazyTransformHelper.java:16)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedLazyTransformHelper$1.failed(ResolvedLazyTransformHelper.java:33)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedTransformHelper.failed(ResolvedTransformHelper.java:16)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedLazyTransformHelper$1.failed(ResolvedLazyTransformHelper.java:33)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at eu.toolchain.async.helper.ResolvedLazyTransformHelper.failed(ResolvedLazyTransformHelper.java:16)
at eu.toolchain.async.DirectAsyncCaller.fail(DirectAsyncCaller.java:19)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture$2.run(ConcurrentResolvableFuture.java:212)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.run(ConcurrentResolvableFuture.java:439)
at eu.toolchain.async.concurrent.ConcurrentResolvableFuture.fail(ConcurrentResolvableFuture.java:106)
at com.spotify.heroic.metric.datastax.Async$1.onFailure(Async.java:46)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1764)
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:456)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:817)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:753)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:634)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:149)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:183)
at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:44)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:751)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:573)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1009)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:932)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Suppressed: com.datastax.driver.core.exceptions.UnauthorizedException: User heroic_adm has no CREATE permission on <all keyspaces> or any of its parents
at com.datastax.driver.core.Responses$Error.asException(Responses.java:101)
... 25 more
It was discovered, after implementing a more comprehensive test suite for metadata backends, that "or" filtering appears to not work in V1 (soon to be deprecated). This should be fixed before V1 is actually deprecated.
gRPC has many of the characteristics useful for an RPC protocol in a high-latency environment, meaning it could be a suitable replacement for nativerpc.
However, the following things are still concerns which must be addressed.
During certain queries the intra-cluster transport is seemingly corrupting data.
Example query in my case:
{
"range" : { "type": "relative", "unit": "DAYS", "value" : "365"},
"filter" : ["key", "my-metric-2-PT3M"],
"aggregation" : {"type" : "average", "sampling" : {"unit": "HOURS", "value" : "2"}}
}
If the query runs locally (i.e. within the current instance) it completes successfully:
{
"range": {
"start": 1416830400000,
"end": 1448373600000
},
"result": [
{
"type": "points",
"hash": "7fafe0fd",
"shard": {},
"cadence": 7200000,
"values": [
[
1416844800000,
2.768182805494971
],
[
1416852000000,
2.734896426939305
],
],
....
"statistics": {
"counters": {}
},
"errors": [],
"latencies": [],
"trace": {
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query"
},
"elapsed": 331830371,
"children": [
{
"what": {
"name": "com.spotify.heroic.cluster.LocalClusterNode#query"
},
"elapsed": 331439004,
"children": [
{
"what": {
"name": "com.spotify.heroic.metric.LocalMetricManager#query"
},
"elapsed": 328125094,
"children": []
}
]
}
]
}
}
Response from cluster when querying against remote node:
{
"range": {
"start": 1416657600000,
"end": 1448373600000
},
"result": [
{
"type": "points",
"hash": "ae1a6628",
"shard": {},
"cadence": 7200000,
"values": [],
"tags": {},
"tagCounts": {}
}
],
"statistics": {
"counters": {}
},
"errors": [
{
"type": "node",
"nodeId": "80aa41b3-7c79-4fa3-a7f5-864600ff0b62",
"tags": {
"site": "bos"
},
"error": "Failed to handle response, caused by Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens\n at [Source: [B@4a0d15; line: 1, column: 1000] (through reference chain: com.spotify.heroic.metric.ResultGroups[\"groups\"]->java.util.ArrayList[0]->com.spotify.heroic.metric.ResultGroup[\"group\"]), caused by Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens\n at [Source: [B@4a0d15; line: 1, column: 1000]",
"internal": true,
"node": "nativerpc://198.18.157.86:1394"
}
],
"latencies": [],
"trace": {
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query"
},
"elapsed": 325662602,
"children": [
{
"what": {
"name": "com.spotify.heroic.CoreQueryManager#query_node"
},
"elapsed": 0,
"children": []
}
]
}
}
It was discovered after implementing a more comprehensive test suite for metadata backends that findTags(...)
requests do not appear to work in KV.
Hi guys,
I am using the metadata-fetch
task from HeroicShell. I ran into some inconsistencies in how the filters are parsed that I believe are caused by how the HeroicQuery grammar is defined.
As an example, suppose these two keys exist: 4cf41d4b-a5d2-420e-b008-644f0b3d7832
and c89bf1ee-32ad-44c9-9641-5ee7fe3cac17
. If I run this command I can successfully get the series related to key c89bf1ee-32ad-44c9-9641-5ee7fe3cac17
:
metadata-fetch $key = c89bf1ee-32ad-44c9-9641-5ee7fe3cac17
However, if I run the same for the other key, I get a com.spotify.heroic.grammar.ParseException:
metadata-fetch $key = 4cf41d4b-a5d2-420e-b008-644f0b3d7832
I found that you have to escape the whole expression to be able to make it work:
metadata-fetch '$key = "4cf41d4b-a5d2-420e-b008-644f0b3d7832"'
This problem only happens with expressions that begin with a digit. I believe the problem occurs when this rule is applied: expr is matched first as an ExpressionInteger, and then, instead of finding EOF, it finds the rest of the string. When quotes are applied, it correctly matches a QuotedString. I know this can be worked around, but having a consistent syntax would probably be good.
Regards,
Jorge
Hello,
I am trying to understand how to organize our metrics, and I was wondering if you could provide a little more info in the docs about series. Specifically, what implications exist in terms of querying the data (I am using the Grafana heroic datasource plugin)?
Also, when playing with the API I noticed some odd behavior:
First, add a series:
curl -s -X PUT -H "Content-Type: application/json" http://localhost:8080/metadata/series -d '{"key":"3175c2d7-3e93-4e30-9ca3-3bdf3b17baeb", "tags":{"meter":"cpu_util"}}' | python -m json.tool
{
"errors": [],
"times": [
5920
]
}
Second, verify the series
curl -s -X POST -H "Content-Type: application/json" http://localhost:8080/metadata/series -d '{"key":"3175c2d7-3e93-4e30-9ca3-3bdf3b17baeb"}' | python -m json.tool
{
"errors": [],
"limited": false,
"series": [
{
"key": "3175c2d7-3e93-4e30-9ca3-3bdf3b17baeb",
"tags": {
"meter": "cpu_util"
}
}
]
}
Third, delete the series (note deleted = 0):
curl -s -X DELETE -H "Content-Type: application/json" http://localhost:8080/metadata/series -d '{"filter":["and", ["key","3175c2d7-3e93-4e30-9ca3-3bdf3b17baeb"],["=","meter","cpu_util"]]}' | python -m json.tool
{
"deleted": 0,
"errors": [],
"failed": 0
}
Fourth, even though "deleted": 0, it was actually deleted, which is evident if you query again:
curl -s -X POST -H "Content-Type: application/json" http://localhost:8080/metadata/series -d '{"key":"3175c2d7-3e93-4e30-9ca3-3bdf3b17baeb"}' | python -m json.tool
{
"errors": [],
"limited": false,
"series": []
}
However, you can't add the series back again:
curl -s -X PUT -H "Content-Type: application/json" http://localhost:8080/metadata/series -d '{"key":"3175c2d7-3e93-4e30-9ca3-3bdf3b17baeb", "tags":{"meter":"cpu_util"}}' | python -m json.tool
{
"errors": [],
"times": []
}
Hi,
Are there any plans for a rate of change aggregator (deltas)? KairosDB has a "diff" aggregator that performs no sampling. We usually use it as the first aggregator in a chain to feed into downsampling aggregators (min, max, avg, etc.)
It seems like Heroic uses a concurrent bucketing pattern for SamplingAggregation, so a FilterAggregation would seem more appropriate here, since you need to keep track of the state of the last point. Something like the FilterKAreaStrategy might work, taking a MetricCollection that perhaps contains two points at a time, e.g.:
private double computeDiff(MetricCollection metricCollection) {
    // Delta between two consecutive points in the collection.
    final List<Point> metrics = metricCollection.getDataAs(Point.class);
    return metrics.get(1).getValue() - metrics.get(0).getValue();
}
Am I on the right track?
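For comparison, here is a standalone sketch of a stateful delta ("diff") transform, independent of Heroic's aggregation framework (our own illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class DiffTransform {
    private Double previous; // last value seen, null until the first point

    /** Emit the delta between each consecutive pair of values. */
    public List<Double> apply(final List<Double> values) {
        final List<Double> out = new ArrayList<>();
        for (final double v : values) {
            if (previous != null) {
                out.add(v - previous);
            }
            previous = v;
        }
        return out;
    }
}
```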
/metrics/write is wrong.
Seems like any time series value for cadence gets lost after being passed through any of the K filters. Is there a reason for this, or could we just pass on the cadence that each time series gets from the sampling aggregation prior to the K filter (if there is one)?
Do you have a Docker image running heroic to make setting up development a little easier? Or do you have a dockerfile which I can use to create my own with all the dependencies and such?
The current implementation of the result scanner has a synchronous component:
https://github.com/spotify/heroic/blob/master/metric/bigtable/src/main/java/com/spotify/heroic/metric/bigtable/api/BigtableDataClientImpl.java#L162
This should be delegated to a Cached Thread Pool until upstream has fixed googleapis/java-bigtable-hbase#703.
This was discovered as a potential issue when doing stress tests: a single request effectively occupies all available threads for an extended period of time instead of time-sharing across all requests.
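A minimal sketch of the suggested workaround (names are illustrative): submit the blocking scan to a cached thread pool so one request does not pin the caller's thread for the duration of the scan.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncScanner {
    // Cached pool grows on demand and reclaims idle threads.
    private static final ExecutorService SCAN_POOL =
        Executors.newCachedThreadPool();

    /** Run a blocking scan off the caller's thread. */
    public static <T> Future<T> scanAsync(final Callable<T> blockingScan) {
        return SCAN_POOL.submit(blockingScan);
    }
}
```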
Hello,
We ran into a problem where several metrics are not written to the ElasticSearch metadata backend.
We have an openstack backend that collects metrics from several managed resources. This causes a few hundred messages to be written to kafka almost at the same time (for our testing we are only sending one datapoint for each series, to make it easier to troubleshoot). The messages are consumed and written to cassandra (the message and record counts match on both kafka and cassandra); however, the number of docs created in the metadata index in Elasticsearch does not match (verified using the _cat/indices Elasticsearch API). This causes many metrics to not be available for retrieval using the query APIs. Interestingly, if we call the query/metrics API, the number of series returned by this call matches the number of docs in the metadata index, so we know that there is definitely a problem there.
There are no exceptions thrown by Heroic, and I added a few log statements to the MetadataBackendKV.write() method (https://github.com/spotify/heroic/blob/master/metadata/elasticsearch/src/main/java/com/spotify/heroic/metadata/elasticsearch/MetadataBackendKV.java#L1710) and I can confirm that all the series are processed by this method. I increased the log level of the Elasticsearch client classes and no errors were found.
If we take those same messages and feed them to kafka while throttling the message producer (at, let's say, 10 per second), then all the series are written to the metadata index.
It seems that Elasticsearch is not able to process the requests and just drops them. I saw in the docs that there are flushInterval and bulkActions options to configure the elasticsearch client, but by looking at the code I believe those are currently not implemented (https://github.com/spotify/heroic/blob/master/heroic-elasticsearch-utils/src/main/java/com/spotify/heroic/elasticsearch/ConnectionModule.java#L59) (please correct me if I am wrong).
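For reference, this is roughly how the vanilla Elasticsearch transport client of that era exposes these knobs via BulkProcessor (a sketch with a no-op listener; the values are illustrative):

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class BulkSetup {
    public static BulkProcessor create(final Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long id, BulkRequest request) {}

            @Override
            public void afterBulk(long id, BulkRequest request, BulkResponse response) {}

            @Override
            public void afterBulk(long id, BulkRequest request, Throwable failure) {}
        })
            .setBulkActions(1000)                            // cf. bulkActions
            .setFlushInterval(TimeValue.timeValueSeconds(5)) // cf. flushInterval
            .build();
    }
}
```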
After this long description, my questions are: Have you experienced a similar issue? If so, do you have any suggestions about how to avoid this problem? I am really not familiar with Elasticsearch, so if the answer to these questions relies on Elasticsearch knowledge, I apologize in advance. The other question I have: is there a way to rebuild the Elasticsearch indices from Heroic?
Thanks!
Jorge
When the Heroic Shell ends with an error, an incorrect exit code of zero is returned. This makes it inconvenient to use in shell scripts to automate tasks. In the following example you can see that the exit code was 0
even though the previous command did not succeed.
13:33:56 heroic $ java -cp "target/*" com.spotify.heroic.HeroicShell --connect 172.16.0.13:9190
13:34:32.145 [main] INFO com.spotify.heroic.HeroicShell - Setting up interactive shell...
13:34:32.154 [main] ERROR com.spotify.heroic.HeroicShell - Error when running shell
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at com.spotify.heroic.shell.RemoteCoreInterface.connect(RemoteCoreInterface.java:285)
at com.spotify.heroic.shell.RemoteCoreInterface.commands(RemoteCoreInterface.java:254)
at com.spotify.heroic.HeroicShell.runInteractiveShell(HeroicShell.java:197)
at com.spotify.heroic.HeroicShell.interactive(HeroicShell.java:182)
at com.spotify.heroic.HeroicShell.main(HeroicShell.java:116)
13:34:32.155 [main] INFO com.spotify.heroic.HeroicShell - Closing core bridge...
13:34:32 heroic $ echo $?
0
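A minimal sketch of the expected behavior (not Heroic's actual shell code): catch the top-level failure and exit non-zero so calling scripts can detect it.

```java
public class ShellMain {
    public static void main(final String[] args) {
        try {
            run(args);
        } catch (final Exception e) {
            System.err.println("Error when running shell: " + e);
            System.exit(1); // today the process falls through and exits 0
        }
    }

    private static void run(final String[] args) throws Exception {
        // Placeholder for the real shell bootstrap.
        throw new java.net.ConnectException("Connection refused");
    }
}
```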
Hello,
I receive an empty suggest response:
{
"errors": [],
"suggestions": []
}
Here is my config:
# heroic.yaml
port: 8080

cluster:
  tags:
    site: nrtb
  protocols:
    - type: grpc
  discovery:
    type: static
    nodes:
      - "grpc://localhost"

metrics:
  backends:
    - type: datastax
      seeds:
        - cassandra

metadata:
  backends:
    - type: elasticsearch
      connection:
        clusterName: elasticsearch
        seeds:
          - elasticsearch
        index:
          type: rotating
          pattern: metadata-%s
          interval: 1w

suggest:
  backends:
    - type: elasticsearch
      connection:
        clusterName: elasticsearch
        seeds:
          - elasticsearch
        index:
          type: rotating
          pattern: metadata-%s
          interval: 1w

consumers:
  - type: kafka
    schema: com.spotify.heroic.consumer.schemas.Spotify100
    topics:
      - "metrics"
    config:
      group.id: heroic-consumer
      zookeeper.connect: kafka
      auto.offset.reset: smallest
      auto.commit.enable: true
For Elasticsearch I set
script.inline: on
script.indexed: on
but I have no template inside.
It doesn't seem like any of the K-filtering aggregations work. Regardless of the filter / k values that I use, the same sample gets returned every time. Am I using these wrong?
For example, in a set of values:
[
[1, 10.0],
[2, 20.0],
[3, 30.0]
]
I would expect a query with "aggregation": {"type": "abovek", "k": 25} to only return [3, 30.0]. However, all points from the sample are being returned with or without the aggregation (same goes for bottomk, topk, and belowk).
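For reference, these are the semantics we expect from abovek, as our own sketch over [timestamp, value] pairs (not Heroic's implementation):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AboveK {
    /** Keep only points whose value is strictly above k. */
    public static List<double[]> filter(final List<double[]> points, final double k) {
        return points.stream()
            .filter(p -> p[1] > k) // p = [timestamp, value]
            .collect(Collectors.toList());
    }
}
```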
Hi,
For some reason I'm not able to delete keys from heroic...
I've tried using the following curl example and it's just not doing anything.
curl -XDELETE -H "Content-Type: application/json" http://localhost:8080/metadata/series -d '{"filter": ["and", ["key", "key-17"], ["=", "role", "ingestor"]]}'
I ended up truncating the whole table in cqlsh. Any ideas why it's not working?
Hi,
First off wanted to say thank you for open sourcing this excellent project. Our company migrated to Google Cloud Platform specifically for the benefits afforded by BigTable, and we're really happy to see Heroic being built w/ support for it from the outset. We are currently using OpenTSDB in production, but the BigTable adapter is pretty alpha and some important features, such as deleting data, are unsupported at the moment.
With OpenTSDB, we are seeing latencies ~30-40ms for writing 60 points to a series. With Heroic, however, we are seeing performance ~10x slower, ranging from 300-400ms for writing the exact same data. Both services are running on a single node in Google Compute Engine (4 vCPUs, 15 GB). I have included below the API calls to both Heroic and OpenTSDB, as well as a screengrab from our Grafana dashboard that shows the difference in latency.
It might be worth mentioning that we are seeing good read times. It only appears that writes are slower than OpenTSDB.
Hi,
I was wondering about ways to improve resiliency in case of DC failure, and how to marry cross-DC replication with having a federated cluster. While we could set up cross-datacenter replication for C*, this alone would not work: the data would be there, but the metadata would be missing. Also, since the data would be replicated into multiple data centers, a federated query would read the same data twice; this would probably break some aggregations like count() / sum().
From our point of view having a hot-hot multi-DC deployment is an important requirement, both in case of ingestion as well as querying.
I was trying to devise a way to work around this limitation; some options I was considering were:
By the way, I was wondering what approach you took with regard to rebuilding Elasticsearch indexes. As far as I understand, there's no need to scan over all the data in Cassandra, only the row keys. Is there an efficient way to do so, and what numbers are you seeing when rebuilding indexes? I was wondering if we could live with "normal" federation and only rebuild Elastic indexes when there's a failover; they're not that big, and we could replicate other datacenters into different C* keyspaces. If there was a failover, the process would have to regenerate/update only the indexes for "remote" data; similarly, the metric-collecting agent could in this case switch to a different DC.
I am getting StackOverflowErrors when running the write-performance
task.
Steps to reproduce:
port: 8080

shellServer:
  host: 192.168.42.2
  port: 9190

cluster:
  protocols:
    - type: nativerpc
      host: localhost
      port: 1394
  discovery:
    type: static
    nodes:
      - "nativerpc://127.0.0.1:1394"

...(other settings for suggest, metadata, metrics and consumer follow)
Running with Cassandra 2.1.13, Elasticsearch 2.2.1 (based on PR 62), ZooKeeper 3.4.6 and Kafka 0.9.0.1.
java -cp "../target/*" com.spotify.heroic.HeroicShell --connect 192.168.42.2:9190
heroic> timeout 60
Timeout updated to 60 seconds
Run the write-performance task:
heroic> write-performance --from=heroic --series=10 --target=100 --writes=200
Warmup step 1/4
..................................Command timed out (current timeout = 60s)
The following is the output captured in the heroic logs:
2016-08-04T22:52:05.0969: 22:52:05.092 [heroic-core-2] ERROR com.spotify.heroic.HeroicCore - Unhandled exception caught in core executor
2016-08-04T22:52:05.0971: java.lang.StackOverflowError
2016-08-04T22:52:05.0972: #011at java.util.Spliterators$IteratorSpliterator.<init>(Spliterators.java:1710)
2016-08-04T22:52:05.0973: #011at java.util.Spliterators.spliterator(Spliterators.java:420)
2016-08-04T22:52:05.0974: #011at java.util.Set.spliterator(Set.java:411)
2016-08-04T22:52:05.0976: #011at java.util.Collection.stream(Collection.java:581)
2016-08-04T22:52:05.0977: #011at com.spotify.heroic.common.SelectedGroup.stream(SelectedGroup.java:45)
2016-08-04T22:52:05.0978: #011at com.spotify.heroic.metric.LocalMetricManager$Group.map(LocalMetricManager.java:356)
2016-08-04T22:52:05.0979: #011at com.spotify.heroic.metric.LocalMetricManager$Group.write(LocalMetricManager.java:268)
2016-08-04T22:52:05.0980: #011at com.spotify.heroic.shell.task.WritePerformance.lambda$buildWrites$5(WritePerformance.java:249)
2016-08-04T22:52:05.0981: #011at eu.toolchain.async.DelayedCollectCoordinator.setupNext(DelayedCollectCoordinator.java:124)
2016-08-04T22:52:05.1024: #011at eu.toolchain.async.DelayedCollectCoordinator.checkNext(DelayedCollectCoordinator.java:115)
2016-08-04T22:52:05.1026: #011at eu.toolchain.async.DelayedCollectCoordinator.resolved(DelayedCollectCoordinator.java:60)
2016-08-04T22:52:05.1027: #011at eu.toolchain.async.DirectAsyncCaller.resolve(DirectAsyncCaller.java:10)
2016-08-04T22:52:05.1028: #011at eu.toolchain.async.immediate.ImmediateResolvedAsyncFuture.onDone(ImmediateResolvedAsyncFuture.java:59)
2016-08-04T22:52:05.1029: #011at eu.toolchain.async.DelayedCollectCoordinator.setupNext(DelayedCollectCoordinator.java:130)
2016-08-04T22:52:05.1031: #011at eu.toolchain.async.DelayedCollectCoordinator.checkNext(DelayedCollectCoordinator.java:115)
2016-08-04T22:52:05.1032: #011at eu.toolchain.async.DelayedCollectCoordinator.resolved(DelayedCollectCoordinator.java:60)
2016-08-04T22:52:05.1033: #011at eu.toolchain.async.DirectAsyncCaller.resolve(DirectAsyncCaller.java:10)
2016-08-04T22:52:05.1035: #011at eu.toolchain.async.immediate.ImmediateResolvedAsyncFuture.onDone(ImmediateResolvedAsyncFuture.java:59)
2016-08-04T22:52:05.1036: #011at eu.toolchain.async.DelayedCollectCoordinator.setupNext(DelayedCollectCoordinator.java:130)
2016-08-04T22:52:05.1037: #011at eu.toolchain.async.DelayedCollectCoordinator.checkNext(DelayedCollectCoordinator.java:115)
2016-08-04T22:52:05.1038: #011at eu.toolchain.async.DelayedCollectCoordinator.resolved(DelayedCollectCoordinator.java:60)
2016-08-04T22:52:05.1039: #011at eu.toolchain.async.DirectAsyncCaller.resolve(DirectAsyncCaller.java:10)
Hey guys,
We're trying out Heroic in addition to a few other databases. Both KairosDB and OpenTSDB have the functionality to make multiple aggregations in the same query, and return them as separate result sets.
More specifically, is it possible to fetch the min/avg/max for a given metric without sending 3 separate queries?
Any input appreciated. Thanks!
If you could point me to where I have gone wrong or where in the code a mistake may have been made, it would be appreciated.
I have done the following:
Successfully retrieved results:
Failed to retrieve results (equals):
Failed to retrieve results (starts like):
This RFC suggests introducing the following two pieces of syntactic sugar in the aggregation DSL: the by keyword and the | (pipe) operator.
by expressions take the form <aggregation> by <tags>
, and are equivalent to the current group(<tags>, <aggregation>)
syntax.
| (pipe) expressions take the form <aggregation> | <aggregation> | ..
, and are equivalent to the current chain(<aggregation>, <aggregation>, ..)
.
It is also possible to group expressions using parentheses to override priority, like the following:
(average() by host | sum()) by role
Which would read as something like: calculate the sum of all averages by host, for each role.
The rationale behind the proposal is to reduce the need for overly nested expressions, and to allow aggregations to state their intent more naturally, reading from left to right.
Take the following example:
chain(group([host], average()), group([site], sum()))
This could be written as the following:
average() by host | sum() by site
This would read as: average by host, then sum by site.
Pipe has very strong connotations, especially when considering shell pipes. It might be inadvisable to use such a well-established keyword. I'm open to proposals of alternative operators or syntax.
An alternative proposal is to simply introduce then; however, this can get a bit verbose:
average() by host then sum() by site
Hi,
I'm sure this is a noob question but how do I select metrics from heroic?
I'm interested in doing this using Grafana and also the Heroic shell.
Any input is appreciated.
Bump the Elasticsearch version and test to support 2.x.
This will not be a backwards compatible change because of elastic/elasticsearch#13272 unless we switch to some REST client.
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
Still in the process of getting this to actually run, but preliminary findings:
Homebrew Cassandra (V2.0 & V3.X) don't work with this
I have gotten the config to work (I think) with cassandra from the apache repo
apache-cassandra-2.1.12
java -cp heroic-dist/target/heroic-dist-0.0.1-SNAPSHOT-shaded.jar com.spotify.heroic.HeroicService heo.yaml
Still having problems with elasticsearch
The current approach, CREATE KEYSPACE IF NOT EXISTS, apparently doesn't work without CREATE permissions.
This relates to #34
For some reason, the calculated cardinality is incorrect when used over gRPC.
This could be related to grpc having less predictable ordering when recombining the results than the jvm transport. It could also be a combination of the cardinality implementation being a statistical method (with some associated error) and that condition triggering it.
DSE now supports Solr indexing pretty much out of the box. Does Heroic have plans to use this natively? Or is Elasticsearch the only game in town for this?
I know this is not the appropriate place for questions, but given the lack of a mailing list or gitter channel, I think it's forgivable (feel free to delete if not).
I'm setting up a VM with heroic in order to monitor some of our homegrown metrics (ingesting metrics with the REST API into heroic) and have some questions for which I couldn't find any answers on the wiki or homepage.
Thanks!
I want to turn this:
mapper = new ObjectMapper();
mapper.addMixIn(AggregationInstance.class, TypeNameMixin.class);
mapper.registerSubtypes(new NamedType(AboveKInstance.class, AboveK.NAME));
mapper.registerSubtypes(new NamedType(BelowKInstance.class, BelowK.NAME));
mapper.registerSubtypes(new NamedType(BottomKInstance.class, BottomK.NAME));
mapper.registerSubtypes(new NamedType(TopKInstance.class, TopK.NAME));
Into this:
final FakeModuleLoader loader = FakeModuleLoader.builder().module(Module.class).build();
mapper = loader.json();
Module already maps out the type names. The reason it can't be used in the tests is that it requires the loading phase of Heroic to be configured.
Here, I propose to introduce FakeModuleLoader, which sets up a fake loading environment and loads the specified module in order to provide a correctly configured ObjectMapper.
The only configuration step for cassandra, according to the docs, is running tools/heroic-shell -P cassandra -X cassandra.seeds= -X cassandra.configure, but it fails with "Keyspace heroic does not exist".
It works only after importing keyspace.cql and tables.cql manually.
Hi,
I'm making a JSON structure in Logstash and pushing it into a Kafka topic (works as expected), but it seems that heroic is not consuming the metrics for some reason.
My structure is the following
{
"time" => "1477401246000",
"host" => "host1.cc",
"tags" => {
"loginResult" => "ok",
"loginType" => "sshd",
"loginUser" => "xuser"
},
"key" => "c2uo0wqqn",
"value" => "1.0"
}
If I connect to the shell server and run keys, I only see the generated keys:
{"series":{"key":"key-11","tags":{"host":"11.example.com","role":"web","what":"disk-used"}},"base":1475321265489,"type":"points","token":6743956916690856207}
{"series":{"key":"key-81","tags":{"host":"81.example.com","role":"ingestor","what":"teleported-goats"}},"base":1475321265489,"type":"points","token":7120557367222539016}
{"series":{"key":"key-51","tags":{"host":"51.example.com","role":"ingestor","what":"disk-used"}},"base":1475321265489,"type":"points","token":7178361716493023162}
Any ideas what I am doing wrong?
Thank you!
One of our consumers stopped consuming. While looking into the log we found the exception below. A restart didn't help: as soon as the faulty message was read, the consumer would stop.
My initial research seemed to indicate that you can't recover from this programmatically. However, we should verify that this is the case and that it is not already addressed in a newer version.
I had to resort to removing the partition that contained the faulty message, as we could afford to do so. Although maybe increasing the offset in ZK is enough if this happens again before it is properly fixed.
2016-06-17 09:36:47,193 ERROR c.s.h.c.k.ConsumerThread [com.spotify.heroic.consumer.kafka.ConsumerThread:]:0: Error in thread kafka.message.InvalidMessageException: Message is corrupt (stored crc = 0, computed crc = 2251710752)
at kafka.message.Message.ensureValid(Message.scala:166)
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:102)
at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at com.spotify.heroic.consumer.kafka.ConsumerThread.guardedRun(ConsumerThread.java:140)
at com.spotify.heroic.consumer.kafka.ConsumerThread.run(ConsumerThread.java:89)