skynetservices / skydns

This project forked from miekg/skydns2

DNS service discovery for etcd

License: MIT License

Go 99.86% Dockerfile 0.14%

skydns's Issues

Get scalability figures

To what number of hosts can SkyDNS scale?
Document etcd watches? How do they influence scalability?

Feature request: support for local data

After looking for some time at a number of orchestration and service discovery tools, I noticed something.

Containers don't know the host IP-addresses. They know the IP-address of the container on the docker0 bridge and the IP-address of the host on the bridge, but nothing else.

Even so, you will need to find a way to register the IP-address of the host with SkyDNS when you have containers on multiple hosts that you want to talk to each other.

It also makes things more complicated when you want to publish the public IP-address of a load balancer at a public authoritative nameserver.

My idea was, maybe SkyDNS or SkyDock can help with that.

What if at startup, or regularly, we collect the IP-addresses of the host and store them as local values in SkyDNS? I know of at least 2:

The IP-address of the host on the tenant network (you can have multiple hosts behind a NAT)
The IP-address of the host on the public network (or the IP-address of the NAT)

If the Docker host is connected directly to the Internet both would be the same.

This would be a local value in a similar fashion to id.server or version.bind of the CHAOS-class:

$ dig +short @open.nlnetlabs.nl. id.server txt chaos
"dicht.nlnetlabs.nl"

With an authoritative or recursive DNS server, these are not part of the normal DNS zones or database; they are specific to that DNS server and pretty much static.

But in this case I think it would be better if you could query it with simple A and AAAA queries, without resorting to TXT records and the CHAOS class, for example like so:

hostip.dockerhost.skydns.local.
publicip.dockerhost.skydns.local.

Now when a container starts it can query for its own address. You don't have to add a (possibly static) argument to the command line or set an environment variable.

I think that kind of information is clearly host-specific and should not be stored in etcd.

It is just a small part of a solution, but I think it could be useful.

exponential backoff when etcd is not available

When starting up, or while running, keep retrying connections to etcd with exponential backoff, up to a fixed limit. This makes starting etcd and skydns in parallel easier.

Add a more informative error message when skydns fails to read configuration

I kept getting failures to load the configuration from etcd. I modified the code to add an extra print:

config.log.Info(err)

This produced the error message below, which helped me figure out what was wrong. Perhaps similar verbose logging should be added?

501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

dns.Client is readonly and can be shared

Need to double check, but I think the dns.Client used with Exchange can be made global; this drops the memory usage somewhat.

Docker container won't connect to etcd in another container on same host

Perhaps this isn't an issue with skydns or the Dockerfile, but just a user error, but I can't for the life of me get skydns to connect to etcd in another running container on the same host machine. For instance, under boot2docker:

Run etcd:

$ docker run --rm -it -p 4001:4001 coreos/etcd
[etcd] Jun 13 23:43:10.461 WARNING   | Using the directory 7fbc5734a046.etcd as the etcd curation directory because a directory was not specified.
[etcd] Jun 13 23:43:10.461 INFO      | 7fbc5734a046 is starting a new cluster
[etcd] Jun 13 23:43:10.466 INFO      | etcd server [name 7fbc5734a046, listen on :4001, advertised url http://127.0.0.1:4001]
[etcd] Jun 13 23:43:10.466 INFO      | peer server [name 7fbc5734a046, listen on :7001, advertised url http://127.0.0.1:7001]
[etcd] Jun 13 23:43:10.466 INFO      | 7fbc5734a046 starting in peer mode
[etcd] Jun 13 23:43:10.466 INFO      | 7fbc5734a046: state changed from 'initialized' to 'follower'.
[etcd] Jun 13 23:43:10.466 INFO      | 7fbc5734a046: state changed from 'follower' to 'leader'.
[etcd] Jun 13 23:43:10.466 INFO      | 7fbc5734a046: leader changed from '' to '7fbc5734a046'.

Run skydns

$ boot2docker ip
The VM's Host only interface IP address is: 192.168.59.103

$ curl http://192.168.59.103:4001/version
etcd 0.4.3+git

$ curl -XPUT http://192.168.59.103:4001/v2/keys/skydns/config -d value='{"nameservers": ["8.8.8.8:53","8.8.4.4:53"]}'
{"action":"set","node":{"key":"/skydns/config","value":"{\"nameservers\": [\"8.8.8.8:53\",\"8.8.4.4:53\"]}","modifiedIndex":4,"createdIndex":4},"prevNode":{"key":"/skydns/config","value":"","modifiedIndex":3,"createdIndex":3}}

$ docker run --rm -it -p 5300:53 -p 5300:53/udp skydns -machines=http://192.168.59.103:4001
[skydns] Jun 13 23:43:53.109 INFO      | falling back to default configuration, could not read from etcd:501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
[skydns] Jun 13 23:43:53.109 INFO      | ready for queries

And this should confirm that any other container should be able to reach the etcd container:

$ docker run --rm -it busybox wget http://192.168.59.103:4001/version
Connecting to 192.168.59.103:4001 (192.168.59.103:4001)
version              100% |*************************************************************|    14   0:00:00 ETA

limit the DNSSEC cache

Make the DNSSEC cache limit the number of signatures it will cache.

Fix SRV.Prio

Skynet1 gives back a different prio based on the region. We don't have the concept of a region anymore, so we need to think about what we are going to do.

Allow weight to be set

Right now, weight is computed automatically. It would be nice to allow a weight factor to be set, so we can use that when calculating it.
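
One way a weight factor could enter the calculation, as a sketch only. It assumes the automatic weight is an even split of 100 across the returned targets (an assumption, not taken from the SkyDNS code) and that each service carries a hypothetical integer factor defaulting to 1.

```go
package main

import "fmt"

// clamp treats missing or non-positive factors as 1 (the assumed default).
func clamp(f int) int {
	if f < 1 {
		return 1
	}
	return f
}

// weights splits 100 across the targets in proportion to their factors.
func weights(factors []int) []uint16 {
	total := 0
	for _, f := range factors {
		total += clamp(f)
	}
	out := make([]uint16, len(factors))
	for i, f := range factors {
		out[i] = uint16(100 * clamp(f) / total)
	}
	return out
}

func main() {
	fmt.Println(weights([]int{1, 1, 1})) // no factor set: even split
	fmt.Println(weights([]int{2, 1, 1})) // first target gets a double share
}
```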

Surprising behavior of underscore-prefixed records

I noticed that wildcard queries behave differently for underscore-prefixed records. Example:

root@06d9a36fb32b:/# dig +short SRV _zookeeper._tcp.test.cluster.local
10 33 2181 zookeeper-1.test.cluster.local.
10 33 2181 zookeeper-2.test.cluster.local.
10 33 2181 zookeeper-3.test.cluster.local.
root@06d9a36fb32b:/# dig +short SRV _tcp.test.cluster.local
root@06d9a36fb32b:/# dig +short SRV zookeeper.tcp.test.cluster.local
10 33 2181 zookeeper-1.test.cluster.local.
10 33 2181 zookeeper-2.test.cluster.local.
10 33 2181 zookeeper-3.test.cluster.local.
root@06d9a36fb32b:/# dig +short SRV tcp.test.cluster.local
10 33 2181 zookeeper-1.test.cluster.local.
10 33 2181 zookeeper-2.test.cluster.local.
10 33 2181 zookeeper-3.test.cluster.local.
root@06d9a36fb32b:/# dig +short zookeeper-1.test.cluster.local
e11ccacf6d174348a1578f045886f982.hosts.cluster.local.
172.31.46.183
root@06d9a36fb32b:/# dig +short _zookeeper._tcp.test.cluster.local
root@06d9a36fb32b:/# dig +short 1._zookeeper._tcp.test.cluster.local
zookeeper-1.test.cluster.local.
e11ccacf6d174348a1578f045886f982.hosts.cluster.local.
172.31.46.183

Why the difference in behavior of dig +short SRV _tcp.test.cluster.local vs dig +short SRV tcp.test.cluster.local?

Multiple hosts behind one record

Does SkyDNS support setting multiple IPs behind the same record? I can see from the readme that it supports round robin response for A and AAAA records but no mention of how to set them with multiple hosts behind them.

I've tried...

curl -XPUT http://127.0.0.1:4001/v2/keys/skydns/local/skydns/west/production/rails/2 -d value='[{"host":"10.9.3.1"}, {"host":"10.9.3.2"}]'

But I get the below error...

INFO      | failed to parse json: json: cannot unmarshal array into Go value of type main.Service

Is this feature available or is my JSON structure incorrect?
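
For what it's worth, the error message can be reproduced outside SkyDNS. The sketch below uses a simplified stand-in for the Service type (only the host field is modelled, which is an assumption) and shows that a JSON array cannot be decoded into a single struct, while one object per value decodes cleanly. Storing each host under its own key (for example .../rails/1 and .../rails/2) would fit that shape; whether SkyDNS then returns both for the parent name is not confirmed here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Service is a simplified stand-in for the type SkyDNS decodes into.
type Service struct {
	Host string `json:"host"`
}

// decode parses one etcd value into a single Service.
func decode(value string) (Service, error) {
	var s Service
	err := json.Unmarshal([]byte(value), &s)
	return s, err
}

func main() {
	// An array value fails to decode into a struct.
	_, err := decode(`[{"host":"10.9.3.1"}, {"host":"10.9.3.2"}]`)
	fmt.Println("array:", err)

	// One JSON object per value decodes without error.
	for _, v := range []string{`{"host":"10.9.3.1"}`, `{"host":"10.9.3.2"}`} {
		s, err := decode(v)
		fmt.Println("object:", s.Host, err)
	}
}
```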

Thanks for sharing your work.

multiple domains

There is no inherent reason why skydns could not host multiple domains: skydns.local, skydns.test, etc. Only the search path in etcd would change.

write_timeout ?

Do we actually need the write timeout?

Server not responding from address on which it was queried

I'm running skydns2 on an instance with multiple ips (10.0.10.10 and 10.0.10.11) and it's configured to listen on 0.0.0.0:53. Regardless of whether I query it at .10 or .11, it responds from the .11 address (which the client either drops or never receives when it's querying the .10 address) -- here's the tcpdump. I'm new to Go and can't figure out if it's possible to get the destination address of a received UDP packet, so please forgive me if this isn't currently possible to implement.

18:50:23.529981 IP 10.11.0.6.49372 > 10.0.10.10.domain: 54270+ A? google.com. (28)
18:50:23.530212 IP 10.0.10.11.59249 > 10.0.0.2.domain: 54270+ A? google.com. (28)
18:50:23.531308 IP 10.0.0.2.domain > 10.0.10.11.59249: 54270 11/0/0 A 173.194.121.6, A 173.194.121.7, A 173.194.121.8, A 173.194.121.9, A 173.194.121.14, A 173.194.121.0, A 173.194.121.1, A 173.194.121.2, A 173.194.121.3, A 173.194.121.4, A 173.194.121.5 (204)
18:50:23.531525 IP 10.0.10.11.domain > 10.11.0.6.49372: 54270 11/0/0 A 173.194.121.6, A 173.194.121.7, A 173.194.121.8, A 173.194.121.9, A 173.194.121.14, A 173.194.121.0, A 173.194.121.1, A 173.194.121.2, A 173.194.121.3, A 173.194.121.4, A 173.194.121.5 (314)

Thanks!

systemd

It'd be great if skydns supported systemd socket activation. For example, if no -addr is specified, then it would check to see if systemd has supplied any sockets, otherwise go with default 127.0.0.1:53.

I looked at go-systemd; unfortunately it currently only supports TCP sockets. However, there is an open issue for UDP support.
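
A rough sketch of the proposed selection logic, without go-systemd. It relies on the documented systemd convention that passed descriptors are announced through the LISTEN_PID and LISTEN_FDS environment variables and start at descriptor 3. The function names activationFDs and chooseListener are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// listenFDsStart is the first descriptor systemd passes (SD_LISTEN_FDS_START).
const listenFDsStart = 3

// activationFDs returns the descriptor numbers systemd passed to this
// process, or nil when the process was not socket-activated.
func activationFDs(getenv func(string) string, pid int) []int {
	if p, err := strconv.Atoi(getenv("LISTEN_PID")); err != nil || p != pid {
		return nil
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || n <= 0 {
		return nil
	}
	fds := make([]int, 0, n)
	for i := 0; i < n; i++ {
		fds = append(fds, listenFDsStart+i)
	}
	return fds
}

// chooseListener applies the order from the request: an explicit -addr
// wins, then systemd sockets, then the default address.
func chooseListener(addrFlag string, fds []int) string {
	switch {
	case addrFlag != "":
		return "bind " + addrFlag
	case len(fds) > 0:
		return fmt.Sprintf("use %d systemd socket(s)", len(fds))
	default:
		return "bind 127.0.0.1:53"
	}
}

func main() {
	fds := activationFDs(os.Getenv, os.Getpid())
	fmt.Println(chooseListener("", fds))
}
```

Turning a descriptor into something usable should be possible with os.NewFile followed by net.FilePacketConn (UDP) or net.FileListener (TCP), but that part is left out of the sketch.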

Wildcards

Using wildcards (*) in the middle of the query, as could be done in SkyDNS version 1, is not supported anymore.

What's the reasoning behind it?

Resolve external targets in CNAMEs to IP addresses too?

Responses like this might also benefit from resolving web.google.nl. from within SkyDNS:

a1.east.skydns.local. 3600 IN SRV 10 100 80 web.google.nl.

Performance

I am using dnsperf to benchmark SkyDNS 2, and I am running into some weird performance issues I wanted to bring up. There seems to be some kind of leak.

I am running SkyDNS with the following config:

etcdctl set skydns/config '{"dns_addr":"10.8.171.85:5353","domain":"skydns.local","ttl":30,"nameservers": ["208.67.222.222","208.67.220.220"],"rcache":10,"rcache-ttl":10}'

I am running dnsperf from another machine on my network. I have one record in SkyDNS with no ttl set (so it should be using the default of 30 seconds), and my input file looks up its A record.

If I start SkyDNS and then immediately test, the results are really encouraging.

Statistics:

  Queries sent:         10000
  Queries completed:    10000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 10000 (100.00%)
  Average packet size:  request 39, response 55
  Run time (s):         0.257366
  Queries per second:   38855.171235

  Average Latency (s):  0.002519 (min 0.000762, max 0.017845)
  Latency StdDev (s):   0.001086

I can run this and get similar results over and over again. I ran it 30 or 40 times within a minute, and the results were all very similar. Other times I ran it once every ten seconds; it doesn't matter. After 60-70 seconds though, performance goes in the toilet:

Statistics:

  Queries sent:         10000
  Queries completed:    10000 (100.00%)
  Queries lost:         0 (0.00%)

  Response codes:       NOERROR 10000 (100.00%)
  Average packet size:  request 39, response 55
  Run time (s):         5.400238
  Queries per second:   1851.770237

  Average Latency (s):  0.053890 (min 0.005920, max 0.254507)
  Latency StdDev (s):   0.018682

Performance never recovers until I stop the process and start it again, and then we are super fast again. I have replicated this pattern over 15 times, so I'm pretty confident the results are accurate.

Allow for zone delegation

It might make sense to delegate subdomains to different skydns servers. This would mean allowing NS records to be stored inside skydns.

It might make even more sense if there is local caching inside skydns, as this would relax the per-server memory requirements.

kill -HUP will reread configuration

The machines in the etcd raft cluster are autodiscovered; this required a mutex around this config in SkyDNS. For the rest of the config we can do the same, but I think it is easier to let SkyDNS react to the HUP signal and re-read the config on that.
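
A sketch of the HUP approach using only the standard library. The Config fields, the store type and the load callback are illustrative assumptions; load stands in for whatever re-reads the configuration (for example from etcd). The example sends itself a SIGHUP, so it is Unix-only.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// Config holds the reloadable settings; the fields are illustrative.
type Config struct {
	Domain string
	TTL    uint32
}

// store guards the active configuration with a mutex.
type store struct {
	mu  sync.RWMutex
	cfg Config
}

func (s *store) get() Config {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cfg
}

func (s *store) set(c Config) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cfg = c
}

// watchHUP reloads the configuration each time SIGHUP arrives and
// signals on reloaded after every attempt.
func watchHUP(s *store, load func() (Config, error), reloaded chan<- struct{}) {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGHUP)
	go func() {
		for range sig {
			if c, err := load(); err == nil {
				s.set(c) // keep the old config when loading fails
			}
			reloaded <- struct{}{}
		}
	}()
}

func main() {
	s := &store{cfg: Config{Domain: "skydns.local", TTL: 30}}
	reloaded := make(chan struct{})
	watchHUP(s, func() (Config, error) {
		return Config{Domain: "skydns.local", TTL: 60}, nil
	}, reloaded)

	// Send ourselves a HUP to demonstrate the reload.
	syscall.Kill(os.Getpid(), syscall.SIGHUP)
	<-reloaded
	fmt.Println(s.get().TTL)
}
```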

Fix README

README is out of date

Initial config question

Is the ETCD_MACHINES list used only at SkyDNS initial start-up, followed by a gathering of the full list of etcd machines (/v2/machines), after which SkyDNS uses any/all of them?
OR
Does the ETCD_MACHINES list contain the only nodes that SkyDNS will ever contact?

A bit of background on this. We have a design decision where our remote ETCD machines are on DHCP and the individual nodes may come and go in the future.

Keep up the great work!

Add a consul backend

Consul (consul.io) is an alternative to etcd. It has a REST API.
It can be very useful for stacks that use it.

Check/test issues like message too big and truncation

Right now it is assumed the requester's buffer is large enough. We should explicitly check for this in the code.
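
As a sketch of the check, assuming the sizes of the packed header/question and of each answer record are already known: plain DNS over UDP is limited to 512 bytes, and an EDNS0 requester advertises its own buffer size (values below 512 are treated as 512). The function names are made up for this example, and the byte counts in main are arbitrary.

```go
package main

import "fmt"

// minUDPSize is the limit for DNS over UDP without EDNS0.
const minUDPSize = 512

// bufferSize returns how many bytes the requester can accept over UDP.
func bufferSize(hasEDNS0 bool, advertised int) int {
	if !hasEDNS0 || advertised < minUDPSize {
		return minUDPSize
	}
	return advertised
}

// fit reports how many records fit within limit bytes and whether the
// reply has to be marked as truncated (TC bit).
func fit(headerLen int, recordLens []int, limit int) (n int, truncated bool) {
	used := headerLen
	for _, l := range recordLens {
		if used+l > limit {
			return n, true
		}
		used += l
		n++
	}
	return n, false
}

func main() {
	records := []int{120, 120, 120, 120, 120} // arbitrary record sizes
	fmt.Println(fit(40, records, bufferSize(false, 0)))
	fmt.Println(fit(40, records, bufferSize(true, 4096)))
}
```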

https etcd client does not work

we only speak http with etcd at the moment

query with TXT for known to exist data return NODATA

All "non-standard" qtypes return NXDOMAIN instead of the sometimes more appropriate NODATA. Not sure what harm this does.
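
The distinction, sketched with a toy data model (the records type is an assumption, not how SkyDNS stores data): NXDOMAIN (rcode 3) means the name does not exist at all, while NODATA is a NOERROR reply (rcode 0) with an empty answer section, meaning the name exists but has no records of the requested type.

```go
package main

import "fmt"

const (
	rcodeSuccess   = 0 // NOERROR
	rcodeNameError = 3 // NXDOMAIN
)

// records maps a name to the record types stored for it.
type records map[string]map[string][]string

// lookup returns the rcode and answers for a query.
func (r records) lookup(name, qtype string) (rcode int, answers []string) {
	types, ok := r[name]
	if !ok {
		return rcodeNameError, nil // the name does not exist: NXDOMAIN
	}
	// The name exists; an empty answer here is NODATA.
	return rcodeSuccess, types[qtype]
}

func main() {
	db := records{
		"a1.east.skydns.local.": {"A": {"10.0.0.1"}},
	}
	fmt.Println(db.lookup("a1.east.skydns.local.", "A"))
	fmt.Println(db.lookup("a1.east.skydns.local.", "TXT"))
	fmt.Println(db.lookup("missing.skydns.local.", "TXT"))
}
```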

Make SkyDNS faster

Add instrumentation to get a profile and optimize from there.

increase test coverage

PASS
coverage: 57.4% of statements
ok github.com/miekg/skydns 1.112s

should be higher

CNAME to external CNAME

Probably a dumb question but we need to create an internal CNAME (within an SRV) for an external A/CNAME record (happens to be an AWS ELB).

E.g. service1.blah.internal -> service1-elb1-2062473951.eu-west-1.elb.amazonaws.com

I suspect this needs recursion. Is it possible for SkyDNS to allow this?

Add Dockerfile

It would be great if a Dockerfile was provided to build SkyDNS containers :)

Drop Service in server.go

When SkyDNS queries etcd it creates a slice of Service structs, which are then converted into SRV, A or AAAA RRs. This is wasteful; a better idea is to drop Service altogether and create SRV records from the get-go. Then, when an SRV needs to be converted to an A record, we can reuse some of the memory.

Guesstimate of speed increase: a factor of 2. Together with yesterday's speedup this should bring the performance up to 6000 qps.

cache.go is a mutex hell

Need to get locking out of the fast path.

Support a configurable metric prefix

We use Hosted Graphite, which requires an API key to be prepended to every metric. Could a METRIC_PREFIX environment variable be supported?

Drop permissions after listening

Skydns should accept a username and group and drop privileges accordingly after getting a listen socket on :53.

use inflight for outgoing queries and caching too

Right now, inflight is only used for RRSIGs; this should be extended to outgoing queries and the cache.
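
For readers unfamiliar with the idea: an inflight (single-flight) group makes concurrent requests for the same key share one piece of work instead of each doing it separately. The sketch below is a generic minimal version written for this page, not the SkyDNS implementation; the key format and the returned address are placeholders.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// call tracks one lookup that is currently in progress.
type call struct {
	done chan struct{}
	val  string
	err  error
}

// group deduplicates concurrent lookups for the same key.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// do runs fn once per key at a time; callers arriving while a call is
// in flight wait for it and share its result.
func (g *group) do(key string, fn func() (string, error)) (val string, shared bool, err error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		<-c.done // wait for the in-flight call to finish
		return c.val, true, c.err
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	close(c.done)
	return c.val, false, c.err
}

func main() {
	var g group
	entered := make(chan struct{})
	release := make(chan struct{})

	// First caller starts the (slow) upstream query.
	go g.do("example.org. A", func() (string, error) {
		close(entered)
		<-release
		return "192.0.2.1", nil
	})

	<-entered
	time.AfterFunc(50*time.Millisecond, func() { close(release) })
	// Second caller arrives while the first is in flight and shares its result.
	val, shared, err := g.do("example.org. A", func() (string, error) {
		return "unused", nil
	})
	fmt.Println(val, shared, err)
}
```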

Memory consumption

root@msk1:~# top | grep skydns
 1922 root      20   0 1664m 1,0g 3268 S     1  6,6  33:05.06 skydns

Skydns uses more than 1 GB of memory on my server. Is this normal?

Make it easy to support DNS-SD

This is more of a question than an issue.

I'm using SkyDNS for service registration and discovery. DNS-SD is an existing standard (with an RFC!) for one method for using DNS as a service discovery and registration mechanism.

Would it be useful to adopt DNS-SD as the proposed/preferred way to use SkyDNS for service registration and discovery?

DNS-SD may be too heavy-weight for the scenarios that SkyDNS is intended to serve, but maybe there's useful stuff there (I'm just starting to digest DNS-SD myself).

Also, maybe you (@miekg) have already studied and understood DNS-SD and integrated as much of it into SkyDNS as you think is appropriate. Hence why this is a question.

Support etcd's discovery api

The recommended way to set up an etcd cluster is via the discovery API. Skydns2 should support the same and use a discovery API token to populate ETCD_MACHINES.

https://coreos.com/docs/cluster-management/setup/etcd-cluster-discovery/

Reverse DNS

See: crosbymichael/skydock#61

Local etcd cache

The roundtrip to etcd is probably a performance drag. A local cache will make this work better. This cache can be shared with the NSEC3 and signature caching we already do.

command line args

Getting rid of command line args was nice, but for stuff like TLS we do need them. Also, for quick and dirty runs of SkyDNS it might make sense to bring back some flags. The most notable ones, I would think, are: domain, dns and maybe nameservers.

Flags then take precedence over stuff found in etcd.
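
A sketch of that precedence rule, assuming a config with just the three settings mentioned above (the struct and its field names are illustrative): start from what etcd returned and overwrite every field for which a flag was given.

```go
package main

import "fmt"

// Config holds the settings discussed above; the field names are illustrative.
type Config struct {
	Domain      string
	DNSAddr     string
	Nameservers []string
}

// merge returns the etcd config with every non-empty flag value on top.
func merge(fromEtcd, fromFlags Config) Config {
	out := fromEtcd
	if fromFlags.Domain != "" {
		out.Domain = fromFlags.Domain
	}
	if fromFlags.DNSAddr != "" {
		out.DNSAddr = fromFlags.DNSAddr
	}
	if len(fromFlags.Nameservers) > 0 {
		out.Nameservers = fromFlags.Nameservers
	}
	return out
}

func main() {
	etcd := Config{Domain: "skydns.local", DNSAddr: "127.0.0.1:53", Nameservers: []string{"8.8.8.8:53"}}
	flags := Config{DNSAddr: "0.0.0.0:5353"} // only the dns flag was given
	fmt.Printf("%+v\n", merge(etcd, flags))
}
```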

Reload configuration

Use a watcher in etcd to get notified when the config changes and just reload?

only sign inzone records

Currently SkyDNS assumes all records returned are in-zone so it will happily sign the records. With CNAME processing this assumption might change.

resolve an IP address to hostname

I set up skydns following the guide. It works fine for resolving a hostname to an IP address.
But it cannot resolve an IP address to a hostname.
Do I need some additional steps to make skydns support it? Or is this a limitation of skydns?
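
For context, resolving an IP address to a hostname is done with a PTR query for a name under in-addr.arpa, built from the reversed octets. The sketch below only shows how that query name is formed for IPv4; the function name is made up, and it says nothing about whether SkyDNS answers such queries.

```go
package main

import (
	"fmt"
	"net"
)

// reverseName builds the in-addr.arpa name used for a PTR lookup of an
// IPv4 address.
func reverseName(ip string) (string, error) {
	v4 := net.ParseIP(ip).To4()
	if v4 == nil {
		return "", fmt.Errorf("%q is not an IPv4 address", ip)
	}
	return fmt.Sprintf("%d.%d.%d.%d.in-addr.arpa.", v4[3], v4[2], v4[1], v4[0]), nil
}

func main() {
	name, err := reverseName("10.0.10.11")
	fmt.Println(name, err)
}
```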

Environment:
I run skydns on CentOS 6.4.

br//lan

Allow services to set Weight

Use this as a factor when setting the weight in SRV responses.

Enable travis

Please enable Travis builds. It would help avoid things like #26.

Fix DNSSEC

We don't know the ordering of names in etcd, so DNSSEC with NSEC won't work. Make it work with NSEC3 and white lies.

connection to SSL-protected etcd not working

Hey everybody,

We are running a protected etcd instance (both SSL, and client certificates as well), and I cannot get SkyDNS to connect to it.

I am starting skydns with the following parameters:

ETCD_MACHINES=https://134.119.11.176:4001 ETCD_TLSKEY=/home/etcd/thisHost.key ETCD_TLSPEM=/home/etcd/thisHost.crt /bin/skydns

... and it aborts with the following error:

2014/07/08 18:37:12 failure to connect: Require both cert and key path
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x50 pc=0x4c37d2]

goroutine 1 [running]:
runtime.panic(0x6fe3a0, 0xac85e8)
    /usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
github.com/coreos/go-etcd/etcd.(*Client).SyncCluster(0x0, 0x17)
    /home/sebastian/go/src/github.com/coreos/go-etcd/etcd/client.go:284 +0x22
main.NewClient(0xc21000a800, 0x1, 0x1, 0x0)
    /home/sebastian/go/src/github.com/skynetservices/skydns/client.go:33 +0x2b3
main.main()
    /home/sebastian/go/src/github.com/skynetservices/skydns/main.go:73 +0x51

goroutine 3 [runnable]:
github.com/stathat/go.(*Reporter).processReports(0xc210048480)
    /home/sebastian/go/src/github.com/stathat/go/stathat.go:364
created by github.com/stathat/go.NewReporter
    /home/sebastian/go/src/github.com/stathat/go/stathat.go:93 +0x117

goroutine 4 [runnable]:
github.com/stathat/go.(*Reporter).processReports(0xc210048480)
    /home/sebastian/go/src/github.com/stathat/go/stathat.go:364
created by github.com/stathat/go.NewReporter
    /home/sebastian/go/src/github.com/stathat/go/stathat.go:93 +0x117

... continues like this ...

I am sure the files exist etc, can I somehow help to debug this? I currently do not really know where to start looking.

Greets & thanks, Sebastian

nameservers quotes

Passing the nameservers command line argument surrounded by double quotes from a non-shell environment (e.g. a systemd unit file)

ExecStart=skydns -nameservers="8.8.8.8:53"

Results in the following error when a forwarded request is attempted:

ERROR     | failure to forward request "dial udp: unknown port udp/53\""

Passing the argument without double quotes fixes the issue. Some validation checks are needed here.
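
A sketch of what such a validation check could look like, using only the standard library. It assumes nameservers are given as IP:port, as in the examples on this page; the function name validateNameserver is made up for this example.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validateNameserver checks that s has the form IP:port.
func validateNameserver(s string) error {
	host, port, err := net.SplitHostPort(s)
	if err != nil {
		return fmt.Errorf("nameserver %q: %v", s, err)
	}
	if net.ParseIP(host) == nil {
		return fmt.Errorf("nameserver %q: %q is not an IP address", s, host)
	}
	if n, err := strconv.Atoi(port); err != nil || n < 1 || n > 65535 {
		return fmt.Errorf("nameserver %q: invalid port %q", s, port)
	}
	return nil
}

func main() {
	// The second value keeps its literal quotes, as systemd would pass it.
	for _, ns := range []string{"8.8.8.8:53", `"8.8.8.8:53"`, "8.8.8.8"} {
		fmt.Println(ns, "->", validateNameserver(ns))
	}
}
```

Rejecting the value at startup would surface the stray quotes immediately, instead of only when the first request is forwarded.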

SRV record lookups returning more than 4 results are not queryable by Ruby DNS resolver

If I query an SRV record with 4 results behind it using Resolv::DNS in Ruby (1.8.7, or 2.1.2), SkyDNS responds as expected.

However, if I try to resolve a record returning 5 or more results, the Ruby Resolv::DNS lookup fails. It's also worth noting that testing Ruby's Resolv::DNS with _xmpp-server._tcp.gmail.com records works correctly (i.e. I don't think this is a Ruby Resolv::DNS issue, I think it's SkyDNS).

Also worth noting is that records with more than 5 results hosted externally (but forwarded by SkyDNS), e.g. _xmpp-server._tcp.gmail.com, also work as expected.

Here is some test code that could be used...

require 'resolv'
require 'pp'

puts "Test using google served records..."
resolver = Resolv::DNS.new(:nameserver => ['127.0.0.1'],
                :search => ['skydns.local'],
                :ndots => 1)
hosts = resolver.getresources('_xmpp-server._tcp.gmail.com', Resolv::DNS::Resource::IN::SRV)
pp hosts

puts "Test using SkyDNS served records..."
resolver = Resolv::DNS.new(:nameserver => ['127.0.0.1'],
                :search => ['skydns.local'],
                :ndots => 1)
hosts = resolver.getresources('something.with.five.or.more.records.skydns.local', Resolv::DNS::Resource::IN::SRV)
pp hosts