
Comments (10)

jprovaznik avatar jprovaznik commented on August 22, 2024

Hi Judd, I couldn't reproduce this locally. Could you please paste the contents of /etc/resolv.conf? I assume "nc -v -u ip_addr 53" works, right? When you run the query, does anything interesting show up in the logs (journalctl)?

from openshift-on-openstack.

juddmaltin-dell avatar juddmaltin-dell commented on August 22, 2024

Yes, nc works.

I'm seeing a lot of bad domain name lookups in the logs that include the openstacklocal domain name, seemingly created by packstack or cloud-config.

Mar 30 10:47:18 openshift-infra.example.com dnsmasq[9850]: query[A] openshift-lb.example.com.openstacklocal from 172.24.4.13
Mar 30 10:47:18 openshift-infra.example.com dnsmasq[9850]: forwarded openshift-lb.example.com.openstacklocal to 172.24.4.13
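To see how often such suffixed names are being queried, the dnsmasq log can be filtered for them. The following is only a sketch: the function name queries_with_suffix is made up for illustration, and it assumes log lines in the "query[TYPE] name from addr" form shown above.

```shell
#!/bin/sh
# Print "count name" for every queried name that ends in the given suffix.
# Reads dnsmasq log lines on stdin; only "query[...]" lines are counted,
# so "forwarded" and "reply" lines are ignored.
# usage: queries_with_suffix SUFFIX < logfile
queries_with_suffix() {
    suffix=$1
    awk -v suffix="$suffix" '
        {
            for (i = 1; i < NF; i++) {
                # The queried name is the field right after "query[TYPE]".
                if ($i ~ /^query\[/) {
                    name = $(i + 1)
                    n = length(name) - length(suffix)
                    if (n >= 0 && substr(name, n + 1) == suffix) count[name]++
                }
            }
        }
        END { for (name in count) print count[name], name }
    '
}
```

It could be fed from the journal, for example with `journalctl -u dnsmasq | queries_with_suffix .openstacklocal`. When several names match, the order of the output lines is not defined.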

Very interesting...

[root@openshift-infra log]# hostname
openshift-infra.example.com
[root@openshift-infra log]# hostname --fqdn
openshift-infra.example.com
[root@openshift-infra log]# cat /etc/resolv.conf
# Generated by NetworkManager
search example.com
nameserver 192.168.0.3
nameserver 172.24.4.13
#nameserver 8.8.4.4
#nameserver 8.8.8.8
[root@openshift-infra log]# systemctl restart dnsmasq
[root@openshift-infra log]# systemctl restart dnsmasq^C
[root@openshift-infra log]# cat /etc/dnsmasq.conf
# this file is generated/overwritten by os-collect-config
strict-order
domain-needed
local=/example.com/
bind-dynamic
resolv-file=/etc/resolv.conf
log-queries
[root@openshift-infra log]# cat /etc/hosts
#
# Initialize the hosts file for dnsmasq on the infra host
#
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

172.24.4.13 openshift-infra.example.com openshift-infra
172.24.4.14 openshift-lb openshift-lb.example.com
172.24.4.15 openshift-openshift-master-1.example.com openshift-openshift-master-1 #openshift
172.24.4.16 openshift-openshift-master-0.example.com openshift-openshift-master-0 #openshift
172.24.4.17 openshift-openshift-master-2.example.com openshift-openshift-master-2 #openshift
172.24.4.18 openshift-openshift-node-w85u3o8f.example.com openshift-openshift-node-w85u3o8f #openshift
172.24.4.19 openshift-openshift-node-fp0d5341.example.com openshift-openshift-node-fp0d5341 #openshift
172.24.4.20 openshift-openshift-node-ye2291cb.example.com openshift-openshift-node-ye2291cb #openshift
172.24.4.21 openshift-openshift-node-i9gp091p.example.com openshift-openshift-node-i9gp091p #openshift


jprovaznik avatar jprovaznik commented on August 22, 2024

I think the problem is caused by the presence of "nameserver 192.168.0.3" in /etc/resolv.conf, which makes dnsmasq loop queries back to itself. This nameserver should not be in resolv.conf; any chance this was a manual change?
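A quick way to spot this kind of self-reference is to compare the active nameserver entries against the node's own addresses. This is a minimal sketch, not part of the deployment scripts; the function name self_nameservers is hypothetical and the caller supplies the list of local addresses.

```shell
#!/bin/sh
# Print every active "nameserver" entry in a resolv.conf-style file that is
# also one of the given local addresses. Commented-out entries such as
# "#nameserver 8.8.8.8" are ignored because their first field differs.
# usage: self_nameservers FILE ADDR...
self_nameservers() {
    file=$1
    shift
    awk '$1 == "nameserver" { print $2 }' "$file" | while read -r ns; do
        for addr in "$@"; do
            if [ "$ns" = "$addr" ]; then
                echo "$ns"
            fi
        done
    done
}
```

On the infra node it might be called as `self_nameservers /etc/resolv.conf $(hostname -I)`, assuming a hostname implementation that supports -I. Any address it prints is a candidate for the loop described above.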


jprovaznik avatar jprovaznik commented on August 22, 2024

If "nameserver 172.24.4.13" is also the infra node's IP, try removing that one too.


juddmaltin-dell avatar juddmaltin-dell commented on August 22, 2024

I added that nameserver entry myself. I have now removed both.
I rebooted the infra node, and that resolved the UDP 53 access problems.
Now I have a new interface created by libvirt, on 192.168.122.0/24:

[root@openshift-infra ~]# grep -r 192.168.122 /etc
/etc/libvirt/qemu/networks/default.xml:  <ip address='192.168.122.1' netmask='255.255.255.0'>
/etc/libvirt/qemu/networks/default.xml:      <range start='192.168.122.2' end='192.168.122.254'/>
[root@openshift-infra ~]#
[root@openshift-infra ~]# grep -r 192.168.122 /var/log*
/var/log/anaconda/syslog:19:14:57,818 INFO dhclient: DHCPACK from 192.168.122.1 (xid=0x1eac0ae1)
/var/log/anaconda/syslog:19:14:57,831 INFO NetworkManager: <info>    address 192.168.122.121
/var/log/anaconda/syslog:19:14:57,831 INFO NetworkManager: <info>    gateway 192.168.122.1
/var/log/anaconda/syslog:19:14:57,831 INFO NetworkManager: <info>    server identifier 192.168.122.1
/var/log/anaconda/syslog:19:14:57,831 INFO NetworkManager: <info>    nameserver '192.168.122.1'
/var/log/anaconda/syslog:19:14:57,846 INFO dhclient: bound to 192.168.122.121 -- renewal in 1640 seconds.
/var/log/anaconda/journal.log:Nov 02 19:14:35 localhost dhclient[675]: DHCPOFFER from 192.168.122.1
/var/log/anaconda/journal.log:Nov 02 19:14:35 localhost dhclient[675]: DHCPACK from 192.168.122.1 (xid=0x2a903f1e)
/var/log/anaconda/journal.log:Nov 02 19:14:37 localhost dhclient[675]: bound to 192.168.122.121 -- renewal in 1630 seconds.

But fine, whatever, that doesn't seem horrible.

The heat data collection doesn't seem to have run to completion:

Mar 30 11:07:29 openshift-infra os-collect-config: 2016-03-30 11:07:29.206 10048 WARNING os_collect_config.heat [-] No auth_url configured.
Mar 30 11:07:29 openshift-infra os-collect-config: 2016-03-30 11:07:29.206 10048 WARNING os_collect_config.request [-] No metadata_url configured.
Mar 30 11:07:29 openshift-infra os-collect-config: 2016-03-30 11:07:29.206 10048 WARNING os-collect-config [-] Source [request] Unavailable.
Mar 30 11:07:29 openshift-infra os-collect-config: 2016-03-30 11:07:29.206 10048 WARNING os_collect_config.local [-] /var/lib/os-collect-config/local-data not found. Skipping
Mar 30 11:07:29 openshift-infra os-collect-config: 2016-03-30 11:07:29.206 10048 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-config/local-data'])

and on the master nodes, the DNS issues persist (even after master node reboot):

Mar 30 11:11:18 openshift-openshift-master-0 atomic-openshift-node: E0330 11:11:18.505105    9753 common.go:197] Failed to obtain ClusterNetwork: Get https://openshift-lb.example.com:8443/oapi/v1/clusternetworks/default: dial tcp: lookup openshift-lb.example.com: no such host
Mar 30 11:11:18 openshift-openshift-master-0 atomic-openshift-node: F0330 11:11:18.505138    9753 node.go:175] SDN Node failed: Get https://openshift-lb.example.com:8443/oapi/v1/clusternetworks/default: dial tcp: lookup openshift-lb.example.com: no such host

removing all nameservers from /etc/resolv.conf upsets dnsmasq a bit:

Mar 30 11:12:41 openshift-infra dnsmasq[15148]: using local addresses only for domain example.com
Mar 30 11:12:41 openshift-infra dnsmasq[15148]: no servers found in /etc/resolv.conf, will retry
Mar 30 11:12:41 openshift-infra dnsmasq[15148]: read /etc/hosts - 11 addresses

but UDP 53 is open:

[root@openshift-openshift-master-0 ~]# iptables -L | grep udp
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:domain

and also open on the infra server:

[root@openshift-infra ~]# iptables -L | grep udp
ACCEPT     udp  --  anywhere             anywhere             udp dpt:domain


juddmaltin-dell avatar juddmaltin-dell commented on August 22, 2024

os-collect-config is NOT happy.

Mar 30 13:38:39 openshift-infra os-collect-config: 2016-03-30 13:38:39.479 10048 WARNING os_collect_config.heat [-] No auth_url configured.
Mar 30 13:38:39 openshift-infra os-collect-config: 2016-03-30 13:38:39.480 10048 WARNING os_collect_config.request [-] No metadata_url configured.
Mar 30 13:38:39 openshift-infra os-collect-config: 2016-03-30 13:38:39.480 10048 WARNING os-collect-config [-] Source [request] Unavailable.
Mar 30 13:38:39 openshift-infra os-collect-config: 2016-03-30 13:38:39.480 10048 WARNING os_collect_config.local [-] /var/lib/os-collect-config/local-data not found. Skipping
Mar 30 13:38:39 openshift-infra os-collect-config: 2016-03-30 13:38:39.480 10048 WARNING os_collect_config.local [-] No local metadata found (['/var/lib/os-collect-c


jprovaznik avatar jprovaznik commented on August 22, 2024

Hi, I'll take a closer look at all the bits above later (I'm under time pressure at the moment), but here are a couple of notes that might help:

  • There is no need to reboot the server after changes in /etc/resolv.conf.
  • It's usually good to add additional nameservers to resolv.conf (other than the node's own IP address, to avoid the looping issue). You can pass -P dns_nameserver=x.x.x.x,y.y.y.y,... when creating the stack. Then hostname resolution works for all hostnames, not only the ones served by dnsmasq.
  • I think os-collect-config is actually quite happy. The output above is just warnings about fetching data from various sources (these warning messages are IMO very confusing, as almost everybody gets tricked by them). We use heat-api-cfn, so you only care about os_collect_config.cfn; see /etc/os-collect-config.conf for what the settings look like.
  • os-collect-config doesn't re-run config scripts on node reboot, only on heat data change. On reboot that is neither desired nor needed: the node should just work after a reboot (if it doesn't, there is a bug in the deployment setup which we should fix).
  • The real issue is probably DNS resolution on the master node, which doesn't seem to work. Can you connect to port 53 from the master node? If so, what is the content of /etc/resolv.conf on the master node, and does it point to the infra node? Can you install the bind-utils package on the master node and try "dig @ip_from_resolv.conf openshift-lb.example.com"?
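The dig check in the last point can be repeated for every nameserver listed in resolv.conf. The following is a rough sketch under a few assumptions: dig from bind-utils is installed, the function name check_nameservers and the DIG override variable are invented for illustration, and a non-zero dig exit status is treated as "no answer".

```shell
#!/bin/sh
# Query NAME against every active nameserver in a resolv.conf-style file and
# report which servers return an answer.
# DIG may be set to another command (hypothetical hook, useful for testing).
# usage: check_nameservers NAME [FILE]
check_nameservers() {
    name=$1
    file=${2:-/etc/resolv.conf}
    awk '$1 == "nameserver" { print $2 }' "$file" | while read -r ns; do
        # +short prints only the answer records; +time and +tries keep an
        # unreachable server from stalling the loop for long.
        # stdin is redirected so the query cannot consume the server list.
        if answer=$(${DIG:-dig} "@$ns" +short +time=2 +tries=1 "$name" 2>/dev/null </dev/null) \
            && [ -n "$answer" ]; then
            echo "$ns: $answer"
        else
            echo "$ns: no answer"
        fi
    done
}
```

On the master node, `check_nameservers openshift-lb.example.com` would then show whether the infra node's dnsmasq responds at all, separately from the other nameservers in the file.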


juddmaltin-dell avatar juddmaltin-dell commented on August 22, 2024

master node:

/etc/resolv.conf

nameserver 172.24.4.13
nameserver 8.8.4.4
nameserver 8.8.8.8

master node:

[root@openshift-openshift-master-0 ~]# dig @172.24.4.13 openshift-lb.example.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.24.4.13 openshift-lb.example.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
[root@openshift-openshift-master-0 ~]# dig @172.24.4.13 +tcp openshift-lb.example.com
;; Connection to 172.24.4.13#53(172.24.4.13) for openshift-lb.example.com failed: host unreachable.
[root@openshift-openshift-master-0 ~]# ping 172.24.4.13
PING 172.24.4.13 (172.24.4.13) 56(84) bytes of data.
64 bytes from 172.24.4.13: icmp_seq=1 ttl=63 time=0.208 ms
64 bytes from 172.24.4.13: icmp_seq=2 ttl=63 time=0.334 ms
^C
--- 172.24.4.13 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.208/0.271/0.334/0.063 ms
[root@openshift-openshift-master-0 ~]# ping 192.168.0.3
PING 192.168.0.3 (192.168.0.3) 56(84) bytes of data.
64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=0.482 ms
64 bytes from 192.168.0.3: icmp_seq=2 ttl=64 time=0.247 ms
^C
--- 192.168.0.3 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.247/0.364/0.482/0.119 ms
[root@openshift-openshift-master-0 ~]# iptables -L | grep udp
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:domain
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:24224
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:hpoms-dps-lstn
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:netsupport
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:10255
ACCEPT     udp  --  anywhere             anywhere             state NEW udp dpt:4789
[root@openshift-openshift-master-0 ~]#


jprovaznik avatar jprovaznik commented on August 22, 2024

Hi Judd, is this problem still present? If so, I can paste more detailed steps for tracing DNS queries so we can compare with your output; otherwise I'm out of ideas :(.


jprovaznik avatar jprovaznik commented on August 22, 2024

This has been inactive for a while; if the issue occurs again, please reopen.
