uber-archive / cherami-server

Distributed, scalable, durable, and highly available message queue system. This project is deprecated and not maintained.

Home Page: https://eng.uber.com/cherami/

License: MIT License

Makefile 0.14% Go 99.64% Shell 0.10% Dockerfile 0.12%


cherami-server's Issues

BacklogAvailble becomes a negative number

When reporting the available backlog, I often see a negative number.

This shows up in both the admin tool and the statsd output. Because the metric is reported as a statsd gauge, a negative number is applied as a decrement of the previous value rather than being set as the new (negative) value. This means that, within statsd, queue backlogs are hard to judge unless there is currently a backlog.
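A minimal sketch of why this happens and one way around it, not taken from the Cherami code. The StatsD metric types documentation says a gauge value with a leading sign is applied as a delta, so a gauge cannot be set to a negative number without first resetting it to zero. The helper name gaugeLines and the metric name ch.backlog are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// gaugeLines returns the StatsD lines needed to set a gauge to an absolute
// value. A signed gauge value is treated as a delta by StatsD, so a negative
// absolute value is preceded by a reset to zero.
// (Hypothetical helper for illustration; not part of cherami-server.)
func gaugeLines(name string, value int64) []string {
	line := name + ":" + strconv.FormatInt(value, 10) + "|g"
	if value < 0 {
		return []string{name + ":0|g", line}
	}
	return []string{line}
}

func main() {
	// "ch.backlog" is an assumed metric name.
	for _, l := range gaugeLines("ch.backlog", -5) {
		fmt.Println(l)
	}
}
```

Clamping the reported backlog at zero before sending it would be another option, if a negative backlog has no meaning for the consumer of the metric.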

Unable to Create Cluster Within Docker Compose

I'm trying to start cherami-server in a simple cluster, with an external cassandra and each node running every component.

However, when running it I get this error:

time="2017-08-09T20:40:54Z" level=error msg="RingPop GetHosts failed" ctrlID=c2a42453 deploymentName= err="Not enough hosts to serve the request" module=extentMon
time="2017-08-09T20:41:12Z" level=error msg="Can't talk to Controller service, no hosts found" deploymentName= destID=c58c2988-82b9-4b2b-8094-c05eaeac1b33 dstPth="/healthcheck/healthcheck" err="Not enough hosts to serve the request" frntID=1d5c9c1e

A single node works as expected.

My config looks like this:

DefaultServiceConfig:
  ListenAddress: "172.22.0.3"
  RingHosts: "cherami-1,cherami-2,cherami-3"

DefaultDestinationConfig:
  Replicas: 1

MetadataConfig:
  CassandraHosts: "cassandra"

StorageConfig:
  BaseDir: /var/lib/cherami-store
  HostUUID: "afe6c579-e79c-4da7-b56b-dcbdb66ec6b0"

logging:
  level: info
  stdout: true

The ListenAddress points to the correct IP of each node, RingHosts points to the DNS names of the nodes, and each HostUUID is unique.

Can you see what I am doing wrong?

Cheers!

use latest ringpop

Currently we pin ringpop to v0.7.0, as there are some breaking changes (in terms of functionality) in the later releases.

We need to bring in the latest release of ringpop; otherwise it's hard to adopt any hotfix. (Generally I think it's better to stay at head for all dependencies.)

Cannot get StatsD Metrics

I've noticed that Cherami supports sending stats to a statsd server. From looking through the code, I have the following configuration:

DefaultServiceConfig:
  ListenAddress: "..."
  RingHosts: "..."
  EnableLimits: true
  Metrics:
    statsd:
      hostPort: "statsd:8125"
      prefix: "ch"

I can see that a RuntimeMetricsReporter starts, but because the statsd metrics exporter doesn't log anything, I can't see why no stats appear on the other end. Is there a configuration I have missed?

Cheers,
Dan

startup hanging?

Hi there, sorry for the noob question, Go is not my cup of tea.

cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]

./scripts/cherami-setup-schema

That ran fine, and I can see it has created something in Cassandra. However,

CHERAMI_ENVIRONMENT=laptop CHERAMI_CONFIG_DIR=`pwd`/config CHERAMI_STORE=/tmp/store ./cherami-server start all

seems to just sit there: no output, no logs. What am I missing? :(

[QUESTION] How to get Cherami to Spread the Load

I'm running a Cherami cluster using docker containers. I have:

3x Frontend Hosts (0.5GB mem)
3x Controllers (0.5GB mem)
3x Inputhosts (2GB mem)
3x Outputhosts (2GB mem)
5x Storage hosts (4GB mem)

And Cassandra with 5 nodes, and a RF of 3.

However, at times of heavy load across the system, most of the boxes sit idle while one storage host climbs until it has used all its memory and is then killed (OOM).

Is there a way to tell Cherami to spread the load to the other storage hosts? It seems all the extents are placed on one host, and even when that host starts slowing down and falling behind, the load is never shared.

Cassandra is aware of all nodes, and each node seems to be working.

Many thanks!

[question] Usage with Apache Spark

Hello, sorry to post a question on issues, but I am curious to know if cherami can be a drop-in replacement of Kafka, with Apache Spark as a consumer?

Will you consider HDFS as backend storage?

Since most of the message storage is append-only, why not use HDFS as its backend? In that case there is no need to do replication yourself; HDFS does the work.

Moreover, writing messages to HDFS would allow MapReduce (MR) work directly on the storage.

Will performance be a problem?

[Question] SSL/TLS/SSH support

Greetings!

I'm wondering whether there is, or is planned, an integrated security layer for transporting the tasks.
It might be useful to share a cluster between different locations and guarantee an encrypted transfer.

Thanks!

dependency on cassandra

Could it make sense to use etcd or Consul for this?

It would make deployment and the like easier with a 100% Go stack.

Need this documentation

Hello

I am currently setting up a distributed architecture for my project and am considering several tools. As I have already used some of them, I would like to know when documentation for Cherami will be available.

Why is Cherami deprecated?

Sorry for disturbing you folks, but why is this project deprecated? After googling, I found nothing other than this twitter comment:

Uber has actually moved to using Kafka.

Is that the true reason?

Not enough hosts to serve the request

Hi, I have an issue with Cherami when I try to set up a local cluster using multiple Docker cherami-server instances.

I might have the wrong configs or some bad setup, but I would appreciate your help.

I am trying to run 2 nodes (Docker instances) of the official ubercherami/cherami-server, but I think there's something wrong with the discovery, i.e. I don't think Ringpop is working in my case. I get the following logs from the service (error lines are marked with a leading "-"):

time="2017-03-24T10:31:26Z" level=info msg="New Frontend" deploymentName= frntID=039683e9 serviceName=cherami-frontendhost
time="2017-03-24T10:31:26Z" level=info msg="RuntimeMetricsReporter started"
time="2017-03-24T10:31:26Z" level=info msg="tChannel listening 172.17.0.5:4254"
time="2017-03-24T10:31:26Z" level=debug msg=RingHosts RingHosts=[172.17.0.5 172.17.0.6] hostIP=172.17.0.5
time="2017-03-24T10:31:26Z" level=info msg="service started" hostport="172.17.0.5:4254" service=cherami-outputhost
time="2017-03-24T10:31:26Z" level=info msg="Load reporter started." deploymentName= hostIP="172.17.0.5:4254" outhID=039683e9
time="2017-03-24T10:31:26Z" level=info msg="HostIDHeartbeater started" deploymentName= hostAddr="172.17.0.5:4254" hostID=039683e9-e15f-417d-beb6-764cbf206626 hostName=35e0d2b285ff outhID=039683e9
time="2017-03-24T10:31:26Z" level=info msg="Diagnostic http endpoint listening on 127.0.0.1:14254"
time="2017-03-24T10:31:26Z" level=info msg="WebSocket listening 172.17.0.5:6190"
time="2017-03-24T10:31:26Z" level=info msg="RingpopMonitor worker loop started"
time="2017-03-24T10:31:26Z" level=info msg="Config refresher started" deploymentName= outhID=039683e9
time="2017-03-24T10:31:26Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-controllerhost:0xc4202f3360 cherami-frontendhost:0xc4202f33a0 cherami-replicator:0xc4202f33e0 cherami-inputhost:0xc4202f32a0 cherami-outputhost:0xc4202a4c00 cherami-storehost:0xc4202f3320]
time="2017-03-24T10:31:26Z" level=info msg="RuntimeMetricsReporter started"
time="2017-03-24T10:31:26Z" level=info msg="tChannel listening 172.17.0.5:4253"
time="2017-03-24T10:31:26Z" level=debug msg=RingHosts RingHosts=[172.17.0.5 172.17.0.6] hostIP=172.17.0.5
time="2017-03-24T10:31:26Z" level=info msg="RingpopMonitor worker loop started"
time="2017-03-24T10:31:26Z" level=info msg="service started" hostport="172.17.0.5:4253" service=cherami-storehost
time="2017-03-24T10:31:26Z" level=info msg="HostIDHeartbeater started" deploymentName= hostAddr="172.17.0.5:4253" hostID=11111111-1111-1111-1111-111111111111 hostName=35e0d2b285ff storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="StorageMonitor: started" deploymentName= storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="Load reporter started." deploymentName= storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="extStatsReporter: started" deploymentName= storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="StoreHost: started" deploymentName= options="Store=UNKNOWN BaseDir=/tmp/cherami-store" storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="Diagnostic http endpoint listening on 127.0.0.1:14253"
time="2017-03-24T10:31:26Z" level=info msg="ReplicationJobRunner: started" deploymentName= storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="extStatsReporter: reporterPump started" deploymentName= storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="WebSocket listening 172.17.0.5:6191"
time="2017-03-24T10:31:26Z" level=info msg="extStatsReporter: schedulerPump started" deploymentName= report-interval=1m0s storID=11111111
time="2017-03-24T10:31:26Z" level=info msg="Update the uconfig value" inputhost.HostOverallConnLimit=10000
time="2017-03-24T10:31:26Z" level=info msg="RuntimeMetricsReporter started"
time="2017-03-24T10:31:26Z" level=info msg="tChannel listening 172.17.0.5:4240"
time="2017-03-24T10:31:26Z" level=debug msg=RingHosts RingHosts=[172.17.0.5 172.17.0.6] hostIP=172.17.0.5
time="2017-03-24T10:31:26Z" level=info msg="service started" hostport="172.17.0.5:4240" service=cherami-inputhost
time="2017-03-24T10:31:26Z" level=info msg="HostIDHeartbeater started" deploymentName= hostAddr="172.17.0.5:4240" hostID=ed1490d7-444e-4658-b486-406e8fc03982 hostName=35e0d2b285ff inhoID=ed1490d7
time="2017-03-24T10:31:26Z" level=info msg="Load reporter started." deploymentName= inhoID=ed1490d7
time="2017-03-24T10:31:26Z" level=info msg="Diagnostic http endpoint listening on 127.0.0.1:14240"
time="2017-03-24T10:31:26Z" level=info msg="WebSocket listening 172.17.0.5:6189"
time="2017-03-24T10:31:26Z" level=info msg="RingpopMonitor worker loop started"
time="2017-03-24T10:31:26Z" level=info msg="New Frontend" deploymentName= frntID=3002148f serviceName=cherami-frontendhost
time="2017-03-24T10:31:26Z" level=info msg="RuntimeMetricsReporter started"
time="2017-03-24T10:31:26Z" level=info msg="tChannel listening 172.17.0.5:4922"
time="2017-03-24T10:31:26Z" level=debug msg=RingHosts RingHosts=[172.17.0.5 172.17.0.6] hostIP=172.17.0.5
time="2017-03-24T10:31:26Z" level=info msg="service started" hostport="172.17.0.5:4922" service=cherami-frontendhost
time="2017-03-24T10:31:26Z" level=info msg="HostIDHeartbeater started" deploymentName= frntID=3002148f hostAddr="172.17.0.5:4922" hostID=3002148f-941d-43ef-9f7c-1c464f48c137 hostName=35e0d2b285ff
time="2017-03-24T10:31:26Z" level=info msg="Diagnostic http endpoint listening on 127.0.0.1:14922"
time="2017-03-24T10:31:26Z" level=info msg="RingpopMonitor worker loop started"
time="2017-03-24T10:31:26Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-storehost:0xc4203914e0 cherami-controllerhost:0xc420370ca0 cherami-frontendhost:0xc420370ce0 cherami-replicator:0xc420370d20 cherami-inputhost:0xc420370be0 cherami-outputhost:0xc420370c20]
time="2017-03-24T10:31:26Z" level=info msg="replication run started" deploymentName= storID=11111111
time="2017-03-24T10:31:26Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-frontendhost:0xc4203a5900 cherami-replicator:0xc4203a5940 cherami-inputhost:0xc4203d2160 cherami-outputhost:0xc4203a5840 cherami-storehost:0xc4203a5880 cherami-controllerhost:0xc4203a58c0]
time="2017-03-24T10:31:26Z" level=info msg="replication run finished" deploymentName= stats="total extents: 0, remote extents:0, opened for replication: 0, primary: 0, secondary: 0, failed: 0" storID=11111111
time="2017-03-24T10:31:26Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-inputhost:0xc4203efbe0 cherami-outputhost:0xc4203efc20 cherami-storehost:0xc4203efc60 cherami-controllerhost:0xc4203efca0 cherami-frontendhost:0xc420406140 cherami-replicator:0xc4203efd20]
- time="2017-03-24T10:31:27Z" level=error msg="Cannot initialize topology for placement" ctrlID=531fb4cb deploymentName= err="open : no such file or directory"
time="2017-03-24T10:31:27Z" level=info msg="RuntimeMetricsReporter started"
time="2017-03-24T10:31:27Z" level=info msg="tChannel listening 172.17.0.5:5425"
time="2017-03-24T10:31:27Z" level=debug msg=RingHosts RingHosts=[172.17.0.5 172.17.0.6] hostIP=172.17.0.5
time="2017-03-24T10:31:27Z" level=info msg="service started" hostport="172.17.0.5:5425" service=cherami-controllerhost
time="2017-03-24T10:31:27Z" level=info msg="RingpopMonitor worker loop started"
time="2017-03-24T10:31:27Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-frontendhost:0xc420452fe0 cherami-replicator:0xc420453020 cherami-inputhost:0xc420452ee0 cherami-outputhost:0xc420452f20 cherami-storehost:0xc420452f60 cherami-controllerhost:0xc420427520]
time="2017-03-24T10:31:27Z" level=info msg="Event pipeline started" ctrlID=531fb4cb deploymentName= module=EventPipeline
time="2017-03-24T10:31:27Z" level=info msg="Timeslot metrics aggregator started" ctrlID=531fb4cb deploymentName=
time="2017-03-24T10:31:27Z" level=info msg="Failure Detector Daemon started" ctrlID=531fb4cb deploymentName=
time="2017-03-24T10:31:27Z" level=debug msg="retMgrRunner: Starting" ctrlID=531fb4cb deploymentName=
time="2017-03-24T10:31:27Z" level=info msg="HostIDHeartbeater started" ctrlID=531fb4cb deploymentName= hostAddr="172.17.0.5:5425" hostID=4e4af61f-185a-4e7d-87dc-54f67d737234 hostName=35e0d2b285ff
time="2017-03-24T10:31:27Z" level=info msg="ExtentStateMonitor started" ctrlID=531fb4cb deploymentName= module=extentMon
time="2017-03-24T10:31:27Z" level=info msg="Diagnostic http endpoint listening on 127.0.0.1:15425"
time="2017-03-24T10:31:27Z" level=info msg="Config refresher started" ctrlID=531fb4cb deploymentName=
time="2017-03-24T10:31:27Z" level=debug msg="retMgrRunner: not primary; not running here" ctrlID=531fb4cb deploymentName=
- time="2017-03-24T10:31:27Z" level=error msg="RingPop GetHosts failed" ctrlID=531fb4cb deploymentName= err="Not enough hosts to serve the request" module=extentMon
time="2017-03-24T10:31:27Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-controllerhost:0xc4202f3360 cherami-frontendhost:0xc4202f33a0 cherami-replicator:0xc4202f33e0 cherami-inputhost:0xc4202f32a0 cherami-outputhost:0xc420427d60 cherami-storehost:0xc4202f3320]
time="2017-03-24T10:31:27Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-controllerhost:0xc4203a58c0 cherami-frontendhost:0xc4203a5900 cherami-replicator:0xc4203a5940 cherami-inputhost:0xc4203ee720 cherami-outputhost:0xc4203a5840 cherami-storehost:0xc4203a5880]
time="2017-03-24T10:31:28Z" level=debug msg="Ring membership updated, new membershipMap" newembershipMap=map[cherami-controllerhost:0xc4203d2960 cherami-frontendhost:0xc420452fe0 cherami-replicator:0xc420453020 cherami-inputhost:0xc420452ee0 cherami-outputhost:0xc420452f20 cherami-storehost:0xc420452f60]
time="2017-03-24T10:31:28Z" level=debug msg="retMgrRunner: Starting retention manager" ctrlID=531fb4cb deploymentName=
time="2017-03-24T10:31:28Z" level=info msg="RetentionMgr starting" ctrlID=531fb4cb deploymentName= dlqinterval=0s interval=10m0s module=retMgr workers=8
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=0
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=1
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=2
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=3
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=4
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=5
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=6
time="2017-03-24T10:31:28Z" level=debug msg="retentionWorker: started" ctrlID=531fb4cb deploymentName= module=retMgr worker=7
- time="2017-03-24T10:31:28Z" level=error msg="RingPop GetHosts failed" ctrlID=531fb4cb deploymentName= err="Not enough hosts to serve the request" module=extentMon
- time="2017-03-24T10:31:28Z" level=error msg="RingPop GetHosts failed" ctrlID=531fb4cb deploymentName= err="Not enough hosts to serve the request" module=extentMon
- time="2017-03-24T10:31:28Z" level=error msg="getControllerClient - Failed to find controller host" err="Not enough hosts to serve the request"
...

After this, it keeps logging the last errors with the underlying error Not enough hosts to serve the request.

Edit: the cherami-cli answers with the same error when trying to create a destination on a node (stack trace included):

root@35e0d2b285ff:/go/src/github.com/uber/cherami-server# ./cherami-cli --env=prod --hostport=172.17.0.5:4922 create destination /test/cherami
tchannel error ErrCodeUnexpected: InternalServiceError({Message:Not enough hosts to serve the request})

--env=staging is now the default. Did you mean '--env=prod' ?

goroutine 1 [running]:
runtime/debug.Stack(0xc4201bbd20, 0x4638f5, 0xa6345d)
	/usr/local/go/src/runtime/debug/stack.go:24 +0x79
runtime/debug.PrintStack()
	/usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/uber/cherami-server/tools/common.ExitIfError(0xd7ec40, 0xc42013dd70)
	/go/src/github.com/uber/cherami-server/tools/common/lib.go:101 +0x1cb
github.com/uber/cherami-server/tools/common.CreateDestination(0xc420153680, 0xd88b00, 0xc42013b900, 0xd84c80, 0xc42013ff60)
	/go/src/github.com/uber/cherami-server/tools/common/lib.go:257 +0xa22
github.com/uber/cherami-server/tools/cli.CreateDestination(0xc420153680, 0xd84c80, 0xc42013ff60)
	/go/src/github.com/uber/cherami-server/tools/cli/lib.go:42 +0x76
main.main.func1(0xc420153680)
	/go/src/github.com/uber/cherami-server/cmd/tools/cli/main.go:129 +0x3d
github.com/uber/cherami-server/vendor/github.com/codegangsta/cli.HandleAction(0x981ac0, 0xc42013ff80, 0xc420153680, 0xc420148a00, 0x0)
	/go/src/github.com/uber/cherami-server/vendor/github.com/codegangsta/cli/app.go:487 +0x7c
github.com/uber/cherami-server/vendor/github.com/codegangsta/cli.Command.Run(0xa5e189, 0xb, 0x0, 0x0, 0xc42013ffe0, 0x2, 0x2, 0xa705e5, 0x23, 0x0, ...)
	/go/src/github.com/uber/cherami-server/vendor/github.com/codegangsta/cli/command.go:193 +0xb96
github.com/uber/cherami-server/vendor/github.com/codegangsta/cli.(*App).RunAsSubcommand(0xc420164d00, 0xc420153400, 0x0, 0x0)
	/go/src/github.com/uber/cherami-server/vendor/github.com/codegangsta/cli/app.go:374 +0xb1a
github.com/uber/cherami-server/vendor/github.com/codegangsta/cli.Command.startApp(0xa59d23, 0x6, 0x0, 0x0, 0xc42013ffc0, 0x2, 0x2, 0xa7153d, 0x24, 0x0, ...)
	/go/src/github.com/uber/cherami-server/vendor/github.com/codegangsta/cli/command.go:280 +0x82c
github.com/uber/cherami-server/vendor/github.com/codegangsta/cli.Command.Run(0xa59d23, 0x6, 0x0, 0x0, 0xc42013ffc0, 0x2, 0x2, 0xa7153d, 0x24, 0x0, ...)
	/go/src/github.com/uber/cherami-server/vendor/github.com/codegangsta/cli/command.go:79 +0x16a5
github.com/uber/cherami-server/vendor/github.com/codegangsta/cli.(*App).Run(0xc4201649c0, 0xc42000c180, 0x6, 0x6, 0x0, 0x0)
	/go/src/github.com/uber/cherami-server/vendor/github.com/codegangsta/cli/app.go:250 +0x812
main.main()
	/go/src/github.com/uber/cherami-server/cmd/tools/cli/main.go:407 +0x2aca

The config I used for the first node is this:

DefaultServiceConfig:
  ListenAddress: "172.17.0.5"
  RingHosts: "172.17.0.5,172.17.0.6"

DefaultDestinationConfig:
  Replicas: 1

MetadataConfig:
  CassandraHosts: "172.17.0.2"

StorageConfig:
  BaseDir: /tmp/cherami-store
  HostUUID: "11111111-1111-1111-1111-111111111111"

The config for the second node is similar, using:

DefaultServiceConfig:
  ListenAddress: "172.17.0.6"
...
StorageConfig:
  BaseDir: /tmp/cherami-store
  HostUUID: "22222222-2222-2222-2222-222222222222"

I know this is a handful and I might miss something obvious, but because there is not much documentation it's hard to get all the things right.

Thanks in advance,
Adrian

What should the value of RingHosts be?

If I build a simple cluster, I set the RingHosts property to the hostname:<frontendport> of all the other hosts in the cluster. That works fine.

If I want to separate out each Role, what do I use as the RingHosts value? Is it still just the frontend hosts?

cherami-server is not respecting ListenAddress setting in config

Symptom: setting ListenAddress to 127.0.0.1, but services still bind to external address.

Problem is that ServiceConfig::GetListenAddress() fills in external address when the ListenAddress is not specified, thus in CommonConfigure::SetupServerConfig(), the call to sCfg.GetListenAddress() never returns nil even if service-specific ListenAddress is not set.

ConcurrentMap size calculation is wrong?

Please see the following:

Remove decrements the counter without checking whether the key exists:
https://github.com/uber/cherami-server/blob/master/common/concurrentmap.go#L173
Put increments the counter without checking whether the key already exists:
https://github.com/uber/cherami-server/blob/master/common/concurrentmap.go#L148
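A minimal sketch of the accounting the issue is asking for, assuming a single mutex-protected map rather than the sharded structure in concurrentmap.go. The type countedMap and its methods are hypothetical and simplified; they only illustrate adjusting the counter when the key set actually changes.

```go
package main

import (
	"fmt"
	"sync"
)

// countedMap is a hypothetical, simplified map that keeps an explicit size
// counter alongside the underlying map.
type countedMap struct {
	mu    sync.Mutex
	items map[string]int
	size  int
}

func newCountedMap() *countedMap {
	return &countedMap{items: make(map[string]int)}
}

// Put stores the value and increments size only when the key is new.
func (m *countedMap) Put(key string, value int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, exists := m.items[key]; !exists {
		m.size++
	}
	m.items[key] = value
}

// Remove deletes the key and decrements size only when the key was present.
func (m *countedMap) Remove(key string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, exists := m.items[key]; exists {
		delete(m.items, key)
		m.size--
	}
}

// Size returns the number of distinct keys currently stored.
func (m *countedMap) Size() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.size
}

func main() {
	m := newCountedMap()
	m.Put("a", 1)
	m.Put("a", 2) // overwrite: size stays 1
	m.Remove("b") // missing key: size stays 1
	fmt.Println(m.Size())
}
```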

Is MinFreeDiskSpaceBytes still relevant?

Following an update by @kirg a couple of days ago, I'm confused about this value.

This can be configured by setting:

./cherami-admin --env=prod --hostport=fehost2:4922 cfg set cherami-storehost.*.*.*.minFreeDiskSpaceBytes 10000000

It defaults to 40GB.

I've also noticed that the storagemonitor now has hardcoded thresholds of:

thresholdWarn         = 75GB
thresholdReadOnly     = 50GB
thresholdResumeWrites = 100GB

Why is the placement threshold set below the threshold at which storage hosts become read-only? Why are the values so large, and is there any way we can change the monitor's thresholds?
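A minimal sketch of how thresholds like these would interact, assuming the monitor applies simple hysteresis: a host goes read-only when free space drops below thresholdReadOnly and resumes writes only once free space rises above thresholdResumeWrites. This is an assumption drawn from the constant names, not from the storagemonitor source. The function nextReadOnly, the binary definition of a GB, and the exact comparison operators are all hypothetical.

```go
package main

import "fmt"

// gb is assumed to be 2^30 bytes; the real code may use a different unit.
const gb = int64(1) << 30

const (
	thresholdReadOnly     = 50 * gb
	thresholdResumeWrites = 100 * gb
)

// nextReadOnly returns the new read-only state for a host given its current
// state and free disk space. (Hypothetical hysteresis, for illustration.)
func nextReadOnly(readOnly bool, freeBytes int64) bool {
	switch {
	case !readOnly && freeBytes < thresholdReadOnly:
		return true // too little space: stop accepting writes
	case readOnly && freeBytes > thresholdResumeWrites:
		return false // enough space recovered: accept writes again
	default:
		return readOnly // in between: keep the current state
	}
}

func main() {
	state := false
	for _, free := range []int64{120 * gb, 32 * gb, 80 * gb, 101 * gb} {
		state = nextReadOnly(state, free)
		fmt.Printf("free=%dGB readOnly=%v\n", free/gb, state)
	}
}
```

Under these assumptions, a host with about 32GB free becomes read-only and stays that way until more than 100GB is free, regardless of the minFreeDiskSpaceBytes setting.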

I test Cherami locally on my machine with Docker. On my Mac, Docker's default overlay disk size is 63GB, and there's probably only about 32GB of that free. This was fine, as I used to hop on and change the minFree to 1GB so I could carry on testing. Now it doesn't seem like I can do that anymore, as the storehosts all become read-only.

In production we use Cherami as a distributed message queue and don't expect more than 20 minutes of messages to be backed to disk at any one time, which is nowhere near the 50GB limit.

Cheers,
Dan

Cannot build bins cockroachdb/c-jemalloc deprecated

Apparently, since Go 1.9.4 there have been many problems with #cgo.

Output of make bins (same for make cherami-server):
mark@DESKTOP-1NPUJN1:~/go_projects/src/github.com/uber/cherami-server$ make bins
go build -i -tags=embed -o cherami-server cmd/standalone/main.go
go build github.com/uber/cherami-server/vendor/github.com/cockroachdb/c-jemalloc: invalid flag in #cgo CFLAGS: -funroll-loops
make: *** [cherami-server] Error 1

from: https://github.com/cockroachdb/c-jemalloc
The repo is tagged as deprecated.

Using:
go version go1.10.1 linux/amd64

go env:
GOARCH="amd64"
GOBIN="/home/mark/go_projects/bin"
GOCACHE="/home/mark/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/mark/go_projects"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build812705251=/tmp/go-build -gno-record-gcc-switches"

Slow startup time of cherami-server-standalone Docker image

YARPC for Go is now using the Cherami server to run end-to-end tests, which is awesome, except that it adds minutes to our test suite completion time because this image takes so long to become ready.

I'm guessing this is related to provisioning Cassandra with the schema and required data. Let's get that way down, to under 10s at least. Here are some ideas:

Create and maintain a Cassandra image that has the schema already provided
Mount the dir Cassandra persists to, and then use Travis CI's cache feature to ensure subsequent runs are speedy.

Anything would be useful on our end, this makes things really slow.

StatsdReporter merges tags in random order

At Uber, metrics are tagged, but when they are converted to StatsD metrics, the tags are converted into prefixes. Unfortunately, because tags are a map[string]string, the conversion iterates this map in random order (in statsdreporter.go, metricstoPrefix()). This results in different prefixes being produced for the same metric.
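A minimal sketch of one possible fix: collect the tag keys, sort them, and build the prefix in that order, so the same tag set always yields the same prefix. The function tagsToPrefix, the key.value.key.value layout, and the tag names are hypothetical and are not taken from statsdreporter.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagsToPrefix builds a deterministic prefix by visiting tag keys in sorted
// order instead of Go's randomized map iteration order.
// (Hypothetical helper; the prefix layout is an assumption.)
func tagsToPrefix(tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"."+tags[k])
	}
	return strings.Join(parts, ".")
}

func main() {
	// Assumed tag names, for illustration only.
	tags := map[string]string{"operation": "publish", "host": "store1"}
	fmt.Println(tagsToPrefix(tags))
}
```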

Unable to Publish Message into Cluster

Could do with a hand debugging this.

I have a K8s cluster of 5 container machines, each run with --all. They are all talking to each other, and I can create a destination and a consumer and verify that each host can see them.

However when trying to publish I get the following error:

Error resolving input hosts: tchannel error ErrCodeUnexpected: InternalServiceError({Message:Nil result, nil error}) dstPth="/foo/bar"

And in the logs I see:

time="2017-08-10T13:41:26Z" level=error msg="ReadPublisherOptions: No hosts returned from controller" deploymentName= destID=fa02b132-1ba2-4e75-b867-469f4f172f69 dstPth="/foo/bar" err=<nil> frntID=5e95a39c

Any help with what to look at next?

I've restarted the nodes, and I could publish and subscribe for a time, and then the cli disconnected from the inputhost and the same issue happened.
