skynetservices / skydns
This project forked from miekg/skydns2
DNS service discovery for etcd
License: MIT License
To what number of hosts can SkyDNS scale?
Document etcd watches? How do they influence scalability?
After looking for some time at a number of orchestration and service discovery tools, I noticed something.
Containers don't know the host IP-addresses. They know the IP-address of the container on the docker0 bridge and the IP-address of the host on the bridge, but nothing else.
Even so, you need a way to register the IP-address of the host with SkyDNS when you have containers on multiple hosts that you want to talk to each other.
It also makes things more complicated when you want to publish the public IP-address of a loadbalancer at a public authoritative nameserver.
My idea was, maybe SkyDNS or SkyDock can help with that.
What if at startup, or regularly, we collect the IP-addresses of the host and store it as a local value in SkyDNS. I know of at least 2:
If the Docker host is connected directly to the Internet both would be the same.
This would be a local value in a similar fashion to id.server or version.bind of the CHAOS-class:
$ dig +short @open.nlnetlabs.nl. id.server txt chaos
"dicht.nlnetlabs.nl"
With an authoritative or recursive DNS-server they are not part of the normal DNS-zones or -database; they are specific to that DNS-server and pretty much static.
But in this case I think it would be better if you could query it with simple A and AAAA queries, without resorting to TXT records and the CHAOS class, for example like so:
hostip.dockerhost.skydns.local.
publicip.dockerhost.skydns.local.
Now when a container starts it can query for its own address. You don't have to add a, possibly static, argument to the commandline or set an environment variable.
I think that kind of information is clearly host-specific and should not be stored in etcd.
It is just a small part of a solution, but I think it could be useful.
When starting up, or while running, keep retrying connections to etcd with exponential backoff, up to a fixed limit. This makes starting etcd and skydns in parallel easier.
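A minimal sketch of such a retry loop, not the SkyDNS implementation. The names `retryWithBackoff` and `nextWait`, the attempt limit, and the delays are all assumptions chosen for illustration; the `connect` function stands in for whatever call opens the etcd connection.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// nextWait doubles the current wait and caps it at limit.
func nextWait(wait, limit time.Duration) time.Duration {
	wait *= 2
	if wait > limit {
		return limit
	}
	return wait
}

// retryWithBackoff calls connect until it succeeds or maxAttempts is reached.
// The wait starts at base and doubles after every failure, capped at limit.
func retryWithBackoff(connect func() error, maxAttempts int, base, limit time.Duration) error {
	wait := base
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		if attempt == maxAttempts {
			break // no point in sleeping after the final failure
		}
		time.Sleep(wait)
		wait = nextWait(wait, limit)
	}
	return fmt.Errorf("giving up after %d attempts: %v", maxAttempts, err)
}

func main() {
	calls := 0
	// Hypothetical stand-in for the etcd connection: fails twice, then succeeds.
	connect := func() error {
		calls++
		if calls < 3 {
			return errors.New("all the given peers are not reachable")
		}
		return nil
	}
	err := retryWithBackoff(connect, 5, 100*time.Millisecond, 2*time.Second)
	fmt.Println("attempts:", calls, "error:", err)
}
```

The cap on both the number of attempts and the wait keeps a permanently missing etcd from blocking startup forever.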
I kept getting failures to load the configuration from etcd. I modified the code to add an extra print:
config.log.Info(err)
And this led to an error message (below) to help me figure out what is wrong. Perhaps similar verbose logging should be added?
501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
Need to double check, but I think the dns.Client used with Exchange can be made global; this should reduce memory usage somewhat.
Perhaps this isn't an issue with skydns or the Dockerfile, but just a user error, but I can't for the life of me get skydns to connect to etcd in another running container on the same host machine. For instance, under boot2docker:
Run etcd:
$ docker run --rm -it -p 4001:4001 coreos/etcd
[etcd] Jun 13 23:43:10.461 WARNING | Using the directory 7fbc5734a046.etcd as the etcd curation directory because a directory was not specified.
[etcd] Jun 13 23:43:10.461 INFO | 7fbc5734a046 is starting a new cluster
[etcd] Jun 13 23:43:10.466 INFO | etcd server [name 7fbc5734a046, listen on :4001, advertised url http://127.0.0.1:4001]
[etcd] Jun 13 23:43:10.466 INFO | peer server [name 7fbc5734a046, listen on :7001, advertised url http://127.0.0.1:7001]
[etcd] Jun 13 23:43:10.466 INFO | 7fbc5734a046 starting in peer mode
[etcd] Jun 13 23:43:10.466 INFO | 7fbc5734a046: state changed from 'initialized' to 'follower'.
[etcd] Jun 13 23:43:10.466 INFO | 7fbc5734a046: state changed from 'follower' to 'leader'.
[etcd] Jun 13 23:43:10.466 INFO | 7fbc5734a046: leader changed from '' to '7fbc5734a046'.
Run skydns
$ boot2docker ip
The VM's Host only interface IP address is: 192.168.59.103
$ curl http://192.168.59.103:4001/version
etcd 0.4.3+git
$ curl -XPUT http://192.168.59.103:4001/v2/keys/skydns/config -d value='{"nameservers": ["8.8.8.8:53","8.8.4.4:53"]}'
{"action":"set","node":{"key":"/skydns/config","value":"{\"nameservers\": [\"8.8.8.8:53\",\"8.8.4.4:53\"]}","modifiedIndex":4,"createdIndex":4},"prevNode":{"key":"/skydns/config","value":"","modifiedIndex":3,"createdIndex":3}}
$ docker run --rm -it -p 5300:53 -p 5300:53/udp skydns -machines=http://192.168.59.103:4001
[skydns] Jun 13 23:43:53.109 INFO | falling back to default configuration, could not read from etcd:501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
[skydns] Jun 13 23:43:53.109 INFO | ready for queries
And this should confirm that any other container should be able to reach the etcd container:
$ docker run --rm -it busybox wget http://192.168.59.103:4001/version
Connecting to 192.168.59.103:4001 (192.168.59.103:4001)
version 100% |*************************************************************| 14 0:00:00 ETA
Make the DNSSEC cache limit the amount of signatures it will cache.
Skynet1 gives back a different priority based on the region. We don't have the concept of a region anymore, so we need to think about what we are going to do.
Right now, weight is computed automatically, it would be nice to allow for a weight-factor to be set, so we can use that when calculating it.
I noticed that wildcard queries behave differently for underscore-prefixed records. Example:
root@06d9a36fb32b:/# dig +short SRV _zookeeper._tcp.test.cluster.local
10 33 2181 zookeeper-1.test.cluster.local.
10 33 2181 zookeeper-2.test.cluster.local.
10 33 2181 zookeeper-3.test.cluster.local.
root@06d9a36fb32b:/# dig +short SRV _tcp.test.cluster.local
root@06d9a36fb32b:/# dig +short SRV zookeeper.tcp.test.cluster.local
10 33 2181 zookeeper-1.test.cluster.local.
10 33 2181 zookeeper-2.test.cluster.local.
10 33 2181 zookeeper-3.test.cluster.local.
root@06d9a36fb32b:/# dig +short SRV tcp.test.cluster.local
10 33 2181 zookeeper-1.test.cluster.local.
10 33 2181 zookeeper-2.test.cluster.local.
10 33 2181 zookeeper-3.test.cluster.local.
root@06d9a36fb32b:/# dig +short zookeeper-1.test.cluster.local
e11ccacf6d174348a1578f045886f982.hosts.cluster.local.
172.31.46.183
root@06d9a36fb32b:/# dig +short _zookeeper._tcp.test.cluster.local
root@06d9a36fb32b:/# dig +short 1._zookeeper._tcp.test.cluster.local
zookeeper-1.test.cluster.local.
e11ccacf6d174348a1578f045886f982.hosts.cluster.local.
172.31.46.183
Why the difference in behavior of dig +short SRV _tcp.test.cluster.local vs dig +short SRV tcp.test.cluster.local?
Does SkyDNS support setting multiple IPs behind the same record? I can see from the readme that it supports round robin response for A and AAAA records but no mention of how to set them with multiple hosts behind them.
I've tried...
curl -XPUT http://127.0.0.1:4001/v2/keys/skydns/local/skydns/west/production/rails/2 -d value='[{"host":"10.9.3.1"}, {"host":"10.9.3.2"}]'
But I get the below error...
INFO | failed to parse json: json: cannot unmarshal array into Go value of type main.Service
Is this feature available or is my JSON structure incorrect?
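The logged error can be reproduced outside SkyDNS. The sketch below assumes a reduced, hypothetical `Service` struct (the real `main.Service` may have more fields) and shows that `encoding/json` refuses to decode a JSON array into a struct, while a single JSON object decodes cleanly. Whether storing each host under its own key (for example `.../rails/1` and `.../rails/2`) yields the desired round robin is an assumption that this listing does not confirm.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Service is a hypothetical, reduced stand-in for SkyDNS's main.Service.
type Service struct {
	Host string `json:"host"`
	Port int    `json:"port,omitempty"`
}

// parseService decodes the value stored under a single etcd key.
func parseService(value string) (Service, error) {
	var s Service
	err := json.Unmarshal([]byte(value), &s)
	return s, err
}

func main() {
	// An array cannot be decoded into a struct, which matches the logged error.
	_, err := parseService(`[{"host":"10.9.3.1"}, {"host":"10.9.3.2"}]`)
	fmt.Println("array:", err)

	// One JSON object per value decodes without error.
	for _, v := range []string{`{"host":"10.9.3.1"}`, `{"host":"10.9.3.2"}`} {
		s, err := parseService(v)
		fmt.Println("object:", s.Host, err)
	}
}
```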
Thanks for sharing your work.
There is no inherent reason why skydns could not host multiple domains: skydns.local, skydns.test, etc. Only the search path in etcd would change.
Do we actually need a write timeout?
I'm running skydns2 on an instance with multiple ips (10.0.10.10 and 10.0.10.11) and it's configured to listen on 0.0.0.0:53. Regardless of whether I query it at .10 or .11, it responds from the .11 address (which the client either drops or never receives when it's querying the .10 address) -- here's the tcpdump. I'm new to Go and can't figure out if it's possible to get the destination address of a received UDP packet, so please forgive me if this isn't currently possible to implement.
18:50:23.529981 IP 10.11.0.6.49372 > 10.0.10.10.domain: 54270+ A? google.com. (28)
18:50:23.530212 IP 10.0.10.11.59249 > 10.0.0.2.domain: 54270+ A? google.com. (28)
18:50:23.531308 IP 10.0.0.2.domain > 10.0.10.11.59249: 54270 11/0/0 A 173.194.121.6, A 173.194.121.7, A 173.194.121.8, A 173.194.121.9, A 173.194.121.14, A 173.194.121.0, A 173.194.121.1, A 173.194.121.2, A 173.194.121.3, A 173.194.121.4, A 173.194.121.5 (204)
18:50:23.531525 IP 10.0.10.11.domain > 10.11.0.6.49372: 54270 11/0/0 A 173.194.121.6, A 173.194.121.7, A 173.194.121.8, A 173.194.121.9, A 173.194.121.14, A 173.194.121.0, A 173.194.121.1, A 173.194.121.2, A 173.194.121.3, A 173.194.121.4, A 173.194.121.5 (314)
Thanks!
It'd be great if skydns supported systemd socket activation. For example, if no -addr is specified, it would check whether systemd has supplied any sockets, and otherwise go with the default 127.0.0.1:53.
I looked at go-systemd; unfortunately it currently only supports TCP sockets, although there is an open issue for UDP support.
Using wildcards * in the middle of the query (as could be done in SkyDNS version 1) is not supported anymore.
What's the reasoning behind it?
Returns like this might also benefit from looking up the web.google.nl. from within SkyDNS:
a1.east.skydns.local. 3600 IN SRV 10 100 80 web.google.nl.
I am using dnsperf to benchmark SkyDNS 2, and I am running into some weird performance issues I wanted to bring up. There seems to be some kind of leak.
I am running SkyDNS with the following config:
etcdctl set skydns/config '{"dns_addr":"10.8.171.85:5353","domain":"skydns.local","ttl":30,"nameservers": ["208.67.222.222","208.67.220.220"],"rcache":10,"rcache-ttl":10}'
I am running dnsperf from another machine on my network. I have one record in SkyDNS with no TTL set (so it should be using the default of 30 seconds), and my input file looks up its A record.
If I start SkyDNS and then immediately test, the results are really encouraging.
Statistics:
Queries sent: 10000
Queries completed: 10000 (100.00%)
Queries lost: 0 (0.00%)
Response codes: NOERROR 10000 (100.00%)
Average packet size: request 39, response 55
Run time (s): 0.257366
Queries per second: 38855.171235
Average Latency (s): 0.002519 (min 0.000762, max 0.017845)
Latency StdDev (s): 0.001086
I can run this and get similar results over and over again. I ran it 30 or 40 times within a minute, and the results were all very similar. Other times I ran it once every ten seconds, it doesn't matter. After 60-70 seconds though, performance goes in the toilet:
Statistics:
Queries sent: 10000
Queries completed: 10000 (100.00%)
Queries lost: 0 (0.00%)
Response codes: NOERROR 10000 (100.00%)
Average packet size: request 39, response 55
Run time (s): 5.400238
Queries per second: 1851.770237
Average Latency (s): 0.053890 (min 0.005920, max 0.254507)
Latency StdDev (s): 0.018682
Performance never recovers until I stop the process and start it again, after which it is super fast once more. I have replicated this pattern over 15 times, so I'm pretty confident the results are accurate.
It might make sense to delegate subdomains to different skydns servers. This would mean allowing NS records to be stored inside skydns.
It might make even more sense if there is local caching inside skydns, as this would relax the per-server memory requirements.
The machines in the etcd raft cluster are autodiscovered, which required a mutex around this part of the config in SkyDNS. We could do the same for the rest of the config, but I think it is easier to let SkyDNS react to the HUP signal and re-read the config when it arrives.
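A rough sketch of the HUP idea, under stated assumptions: `Config`, `store`, and `reloadOnSignal` are hypothetical names, and the `load` function stands in for re-reading the config from etcd. The mutex is still needed because queries read the config while the signal handler replaces it.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// Config is a hypothetical subset of the SkyDNS configuration.
type Config struct {
	Domain string
	TTL    uint32
}

// store guards the active configuration with a read/write mutex.
type store struct {
	mu  sync.RWMutex
	cfg Config
}

func (s *store) get() Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cfg
}

func (s *store) set(c Config) {
	s.mu.Lock()
	s.cfg = c
	s.mu.Unlock()
}

// reloadOnSignal swaps in a freshly loaded config for every signal received.
// A failed load keeps the previous config. It returns when sigs is closed.
func reloadOnSignal(s *store, sigs <-chan os.Signal, load func() (Config, error)) {
	for range sigs {
		c, err := load()
		if err != nil {
			fmt.Fprintln(os.Stderr, "reload failed, keeping old config:", err)
			continue
		}
		s.set(c)
	}
}

func main() {
	s := &store{cfg: Config{Domain: "skydns.local.", TTL: 3600}}
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP)

	// Hypothetical loader; a real one would re-read the config key from etcd.
	load := func() (Config, error) { return Config{Domain: "skydns.local.", TTL: 30}, nil }

	done := make(chan struct{})
	go func() { reloadOnSignal(s, sigs, load); close(done) }()

	// Simulate "kill -HUP <pid>" with a direct send, then shut the loop down.
	sigs <- syscall.SIGHUP
	signal.Stop(sigs)
	close(sigs)
	<-done
	fmt.Println("ttl after reload:", s.get().TTL)
}
```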
README is out of date
Is the ETCD_MACHINES list just used at SkyDNS initial start-up, followed by a gathering of the full list of etcd machines (/v2/machines) and then SkyDNS uses any/all of them?
OR
Does the ETCD_MACHINES list contain the only nodes that SkyDNS will ever contact?
A bit of background on this. We have a design decision where our remote ETCD machines are on DHCP and the individual nodes may come and go in the future.
Keep up the great work!
consul.io is an alternative for Etcd. It has a REST API.
It can be very useful for stacks that use it.
Right now it is assumed the requester's buffer is large enough. We should explicitly check for this in the code.
We only speak HTTP with etcd at the moment.
All "non-standard" qtypes return NXDOMAIN instead of the sometimes more appropriate NODATA. Not sure what harm this does.
Add instrumentation to get a profile and optimize from there.
PASS
coverage: 57.4% of statements
ok github.com/miekg/skydns 1.112s
should be higher
Probably a dumb question but we need to create an internal CNAME (within an SRV) for an external A/CNAME record (happens to be an AWS ELB).
E.g. service1.blah.internal -> service1-elb1-2062473951.eu-west-1.elb.amazonaws.com
I suspect this requires recursion. Is it possible for SkyDNS to allow this?
It would be great if a Dockerfile was provided to build SkyDNS containers :)
When SkyDNS queries etcd it creates a slice of Service structs; these structs are then converted into SRV, A or AAAA RRs. This is wasteful. A better idea is to drop Service altogether and create SRV records from the get-go; then, when an SRV needs to be converted to an A record, we can reuse some of the memory.
Guesstimate of speed increase: factor of 2. Together with yesterday's speedup this should bring the performance up to 6000 qps.
Need to get locking out of the fast path.
We use Hosted Graphite, which requires an API key to be prepended to every metric. Could a METRIC_PREFIX environment variable be supported?
SkyDNS should accept a username and group and drop privileges accordingly after getting a listen socket on :53.
Right now, inflight is only used for RRSIGs, this should be extended to outgoing queries and the cache.
This is more of a question than an issue.
I'm using SkyDNS for service registration and discovery. DNS-SD is an existing standard (with an RFC!) that describes one method of using DNS for service discovery and registration.
Would it be useful to adopt DNS-SD as the proposed/preferred way to use SkyDNS for service registration and discovery?
DNS-SD may be too heavy-weight for the scenarios that SkyDNS is intended to serve, but maybe there's useful stuff there (I'm just starting to digest DNS-SD myself).
Also, maybe you (@miekg) have already studied and understood DNS-SD and integrated as much of it into SkyDNS as you think is appropriate. Hence why this is a question.
The recommended way to set up an etcd cluster is via the discovery API. Skydns2 should support the same and use a discovery API token to populate ETCD_MACHINES.
https://coreos.com/docs/cluster-management/setup/etcd-cluster-discovery/
The roundtrip to etcd is probably a performance drag. A local cache will make this work better. This cache can be shared with the NSEC3 and signature caching we already do.
Getting rid of command line args was nice, but for stuff like TLS we do need them. Also, for quick and dirty runs of SkyDNS it might make sense to bring back some flags. The most notable, I would think, are: domain, dns and maybe nameservers.
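A minimal sketch of what a local cache in front of etcd could look like. The `cache` type, its method names, and the TTL are assumptions for illustration only; the sketch makes no attempt to share storage with the NSEC3 or signature caches, and it holds the lock during the fetch for simplicity, which serializes lookups.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	value   string
	expires time.Time
}

// cache is a small TTL cache in front of a slower lookup such as etcd.
// It never evicts expired entries, so a real version would need a size bound.
type cache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
}

func newCache(ttl time.Duration) *cache {
	return &cache{ttl: ttl, entries: make(map[string]entry)}
}

// get returns the cached value for key and calls fetch only on a miss or after expiry.
func (c *cache) get(key string, fetch func(string) (string, error)) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
		return e.value, nil
	}
	v, err := fetch(key)
	if err != nil {
		return "", err // errors are not cached
	}
	c.entries[key] = entry{value: v, expires: time.Now().Add(c.ttl)}
	return v, nil
}

func main() {
	fetches := 0
	// Hypothetical stand-in for the round trip to etcd.
	fetch := func(key string) (string, error) {
		fetches++
		return `{"host":"10.0.0.1"}`, nil
	}
	c := newCache(10 * time.Second)
	c.get("/skydns/local/skydns/east/a1", fetch)
	c.get("/skydns/local/skydns/east/a1", fetch)
	fmt.Println("fetches:", fetches)
}
```

The trade-off is staleness: a record changed in etcd can keep being served from the cache until its entry expires, unless a watch invalidates it.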
Flags then take precedence over stuff found in etcd.
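One way to get that precedence with the standard `flag` package is to overlay only the flags that were explicitly set. The sketch below is an assumption-laden illustration: `Config`, `applyFlags`, and the flag names `-domain`, `-addr` and `-nameservers` are picked for the example and need not match what SkyDNS ends up using.

```go
package main

import (
	"flag"
	"fmt"
)

// Config is a hypothetical subset of the SkyDNS configuration.
type Config struct {
	Domain      string
	DNSAddr     string
	Nameservers string
}

// applyFlags overlays the flags explicitly given in args on top of base,
// where base holds the values read from etcd.
func applyFlags(base Config, args []string) (Config, error) {
	fs := flag.NewFlagSet("skydns", flag.ContinueOnError)
	domain := fs.String("domain", "", "domain to anchor requests to")
	addr := fs.String("addr", "", "ip:port to bind to")
	ns := fs.String("nameservers", "", "comma separated list of forwarders")
	if err := fs.Parse(args); err != nil {
		return base, err
	}
	// Visit walks only the flags that were actually set on the command line,
	// so unset flags never overwrite values that came from etcd.
	fs.Visit(func(f *flag.Flag) {
		switch f.Name {
		case "domain":
			base.Domain = *domain
		case "addr":
			base.DNSAddr = *addr
		case "nameservers":
			base.Nameservers = *ns
		}
	})
	return base, nil
}

func main() {
	fromEtcd := Config{Domain: "skydns.local.", DNSAddr: "127.0.0.1:53"}
	cfg, err := applyFlags(fromEtcd, []string{"-domain", "example.test."})
	fmt.Println(cfg.Domain, cfg.DNSAddr, err)
}
```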
Use a watcher in etcd to get notified when the config changes and just reload?
Currently SkyDNS assumes all records returned are in-zone so it will happily sign the records. With CNAME processing this assumption might change.
I set up skydns following the guide. It works fine for resolving a hostname to an IP address.
But it cannot resolve an IP address to a hostname.
Do I need some additional steps to make skydns support this? Or is this a limitation of skydns?
Environment:
I run skydns on CentOS 6.4.
br//lan
Use this as a factor when setting the weight in SRV responses.
Please enable Travis builds. That would help avoid things like #26.
We don't know the ordering of names in etcd, so DNSSEC with NSEC won't work. Make it work with NSEC3 and white lies.
Hey everybody,
We are running a protected etcd instance (both SSL, and client certificates as well), and I cannot get SkyDNS to connect to it.
I am starting skydns with the following parameters:
ETCD_MACHINES=https://134.119.11.176:4001 ETCD_TLSKEY=/home/etcd/thisHost.key ETCD_TLSPEM=/home/etcd/thisHost.crt /bin/skydns
... and it aborts with the following error:
2014/07/08 18:37:12 failure to connect: Require both cert and key path
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x50 pc=0x4c37d2]
goroutine 1 [running]:
runtime.panic(0x6fe3a0, 0xac85e8)
/usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
github.com/coreos/go-etcd/etcd.(*Client).SyncCluster(0x0, 0x17)
/home/sebastian/go/src/github.com/coreos/go-etcd/etcd/client.go:284 +0x22
main.NewClient(0xc21000a800, 0x1, 0x1, 0x0)
/home/sebastian/go/src/github.com/skynetservices/skydns/client.go:33 +0x2b3
main.main()
/home/sebastian/go/src/github.com/skynetservices/skydns/main.go:73 +0x51
goroutine 3 [runnable]:
github.com/stathat/go.(*Reporter).processReports(0xc210048480)
/home/sebastian/go/src/github.com/stathat/go/stathat.go:364
created by github.com/stathat/go.NewReporter
/home/sebastian/go/src/github.com/stathat/go/stathat.go:93 +0x117
goroutine 4 [runnable]:
github.com/stathat/go.(*Reporter).processReports(0xc210048480)
/home/sebastian/go/src/github.com/stathat/go/stathat.go:364
created by github.com/stathat/go.NewReporter
/home/sebastian/go/src/github.com/stathat/go/stathat.go:93 +0x117
... continues like this ...
I am sure the files exist etc, can I somehow help to debug this? I currently do not really know where to start looking.
Greets & thanks, Sebastian
Passing the nameservers command line argument surrounded by double quotes from a non-shell environment (e.g. a systemd unit file):
ExecStart=skydns -nameservers="8.8.8.8:53"
Results in the following error when a forwarded request is attempted:
ERROR | failure to forward request "dial udp: unknown port udp/53\""
Passing the argument without double quotes fixes the issue. Some validation checks are needed here.
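A sketch of the kind of validation check that would catch this at startup instead of at forward time. `validateNameserver` is a hypothetical helper, and requiring a literal IP address plus a numeric port is an assumption about what a valid nameserver entry looks like.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validateNameserver checks that s is a literal IP address and a numeric port,
// for example 8.8.8.8:53 or [::1]:53.
func validateNameserver(s string) error {
	host, port, err := net.SplitHostPort(s)
	if err != nil {
		return fmt.Errorf("nameserver %q: %v", s, err)
	}
	if net.ParseIP(host) == nil {
		return fmt.Errorf("nameserver %q: %q is not an IP address", s, host)
	}
	n, err := strconv.Atoi(port)
	if err != nil || n < 1 || n > 65535 {
		return fmt.Errorf("nameserver %q: invalid port %q", s, port)
	}
	return nil
}

func main() {
	// The second value carries the literal quotes that systemd passed through.
	for _, ns := range []string{"8.8.8.8:53", "\"8.8.8.8:53\"", "8.8.8.8"} {
		fmt.Printf("%-16s -> %v\n", ns, validateNameserver(ns))
	}
}
```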
If I query an SRV record with 4 results behind it using Resolv::DNS in Ruby (1.8.7 or 2.1.2), SkyDNS responds as expected.
However, if I try to resolve a record returning 5 or more results, the Ruby Resolv::DNS lookup fails. It's also worth noting that testing Ruby's Resolv::DNS with the _xmpp-server._tcp.gmail.com records works correctly (i.e. I don't think this is a Resolv::DNS issue, I think it's SkyDNS).
Also worth noting is that records with 5 or more results that are hosted externally (but forwarded by SkyDNS), e.g. _xmpp-server._tcp.gmail.com, also work as expected.
Here is some test code that could be used...
require 'resolv'
require 'pp'

puts "Test using google served records..."
resolver = Resolv::DNS.new(:nameserver => ['127.0.0.1'],
                           :search => ['skydns.local'],
                           :ndots => 1)
hosts = resolver.getresources('_xmpp-server._tcp.gmail.com', Resolv::DNS::Resource::IN::SRV)
pp hosts

puts "Test using SkyDNS served records..."
resolver = Resolv::DNS.new(:nameserver => ['127.0.0.1'],
                           :search => ['skydns.local'],
                           :ndots => 1)
hosts = resolver.getresources('something.with.five.or.more.records.skydns.local', Resolv::DNS::Resource::IN::SRV)
pp hosts