contiv-experimental / demo
Easy cut demos to try contiv [DEPRECATED]
License: Other
The k8s demo needs an automatic restart script or make target. I'm told there's one in the swarm demo already.
When running the installer, the Python script that reads cfg.yml fails because it cannot find the yaml module. I suggest installing pyyaml as part of the installer script.
This can be done with pip install pyyaml, but only if the user has python-pip installed. The alternative is to install it with apt-get (on Debian/Ubuntu the package is python-yaml).
Please see the output below:
contiv@contivB1:~$ bash net_demo_installer
Parsing config file...
Traceback (most recent call last):
  File "./genInventoryFile.py", line 4, in <module>
    import yaml
ImportError: No module named yaml
Fatal: error parsing ./cfg.yml and generating inventory file
contiv@contivB1:~$
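One way the script could fail more helpfully is to wrap the import and print install instructions. This is only a sketch: `require_module` is a hypothetical helper, not something that exists in genInventoryFile.py.

```python
import importlib


def require_module(name, pip_package, apt_package):
    """Import a module, or exit with install instructions if it is missing.

    Hypothetical helper; the package names are passed in by the caller.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        raise SystemExit(
            "Fatal: Python module '{0}' is not installed.\n"
            "Install it with 'pip install {1}' or 'sudo apt-get install {2}'.".format(
                name, pip_package, apt_package
            )
        )


# In the inventory script this would stand in for the bare `import yaml`:
#     yaml = require_module("yaml", "pyyaml", "python-yaml")
```

The user then sees which package to install instead of a bare traceback.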
Please modify the net_demo_installer script and the k8s setup script to take contiv_network_version from the command line or an environment variable. Right now we hard-code the version string in our scripts, which is not ideal.
By default, the contrib scripts assume ACI mode of operation for the k8s cluster, but they do not set up netplugin in aci fabric mode.
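A rough sketch of the lookup order this request implies: command line first, then environment, then the built-in default. The flag name `--contiv-network-version` and the variable name `CONTIV_NETWORK_VERSION` are assumptions for illustration. The installer itself is a shell script, so this only shows the logic.

```python
import argparse

# Version string reported elsewhere in this list as the one in net_demo_installer.
DEFAULT_VERSION = "v0.1-03-16-2016.13-43-59.UTC"


def resolve_version(argv, env):
    """Pick the version: command-line flag, then environment, then default."""
    parser = argparse.ArgumentParser(add_help=False)
    # Hypothetical flag name; the real scripts define no such option today.
    parser.add_argument("--contiv-network-version", default=None)
    args, _unknown = parser.parse_known_args(argv)
    if args.contiv_network_version:
        return args.contiv_network_version
    # Hypothetical environment variable name; an empty value falls through.
    return env.get("CONTIV_NETWORK_VERSION") or DEFAULT_VERSION
```

A caller would pass `sys.argv[1:]` and `os.environ`. Taking both as parameters keeps the function easy to test.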
[admin@k8master pods]$ netctl global info
Fabric mode: default <-- not aci in aci-fabric-mode
Vlan Range: 1-4094
Vxlan range: 1-10000
admin05@nxmonit-05-135:~$ netctl network ls
ERRO[0000] Get http://netmaster:9999/api/networks/: dial tcp: lookup netmaster on 171.70.168.183:53: no such host
admin05@nxmonit-05-135:~$
I brought up the contiv cluster manually because of the OVS issue with ./net_demo_installer.
I am not sure whether some of the dependencies are unmet.
admin05@nxmonit-05-135:~$ sudo docker version
Client:
Version: 1.11.1
API version: 1.23
Go version: go1.5.4
Git commit: 5604cbe
Built: Tue Apr 26 23:38:55 2016
OS/Arch: linux/amd64
Server:
Version: 1.11.1
API version: 1.23
Go version: go1.5.4
Git commit: 5604cbe
Built: Tue Apr 26 23:38:55 2016
OS/Arch: linux/amd64
admin05@nxmonit-05-135:~$
admin05@nxmonit-05-135:~$ etcdctl cluster-health
member 493c117890e44f30 is healthy: got healthy result from http://172.23.145.135:2379
member 70448b100ff5839a is healthy: got healthy result from http://172.23.145.136:2379
member 70559519761b575d is healthy: got healthy result from http://172.23.145.134:2379
cluster is healthy
admin05@nxmonit-05-135:~$
Right now the demo installer runs commands like apt-get, so it only works on Ubuntu hosts. The installer should be able to run on CentOS as well.
poc-net and poc-epg are set up by default by setup_k8s_cluster.sh, but they are not created after clean_restart.sh.
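As a sketch of the distribution check this would need, the installer could read /etc/os-release and pick the package manager from the ID field. The function names are hypothetical, and the real installer is a shell script, so this only illustrates the branching.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an /etc/os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def install_command(os_id, packages):
    """Build the package-install command for the given distribution ID."""
    if os_id in ("ubuntu", "debian"):
        return ["apt-get", "install", "-y"] + list(packages)
    if os_id in ("centos", "rhel", "fedora"):
        return ["yum", "install", "-y"] + list(packages)
    raise ValueError("unsupported distribution: %r" % os_id)
```

For example, `install_command(parse_os_release(open("/etc/os-release").read())["ID"], ["git", "wget"])` would give the command to run under sudo.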
The URL should be https://repos.fedorapeople.org/repos/openstack/openstack-kilo/rdo-release-kilo-2.noarch.rpm; the playbook currently requests rdo-release-kilo-1.noarch.rpm, which returns a 404:
TASK [contiv : get openstack kilo rpm] *****************************************
fatal: [k8node-02]: FAILED! => {"changed": false, "dest": "/tmp/rdo-release-kilo-1.noarch.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "https://repos.fedorapeople.org/repos/openstack/openstack-kilo/rdo-release-kilo-1.noarch.rpm"}
fatal: [k8node-01]: FAILED! => {"changed": false, "dest": "/tmp/rdo-release-kilo-1.noarch.rpm", "failed": true, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "state": "absent", "status_code": 404, "url": "https://repos.fedorapeople.org/repos/openstack/openstack-kilo/rdo-release-kilo-1.noarch.rpm"}
Hi all,
While running the net_demo_installer script I am getting an error at task [docker | add docker's public key for CS-engine (debian)] (see end of output).
Is there a solution available?
Thank you in advance
Sebastian
labor@ubuntu15-1:~$ ./net_demo_installer
Using version: v0.1-07-14-2016.07-06-17.UTC
W: Failed to fetch http://ppa.launchpad.net/ansible/ansible/ubuntu/dists/wily/main/binary-i386/Packages 403 Forbidden
E: Some index files failed to download. They have been ignored, or old ones used instead.
sudoExec apt-get install wget git build-essential python-dev software-properties-common -y
sudo -E apt-get install wget git build-essential python-dev software-properties-common -y
Reading package lists... Done
Building dependency tree
Reading state information... Done
build-essential is already the newest version.
python-dev is already the newest version.
git is already the newest version.
software-properties-common is already the newest version.
wget is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
sudoExec apt-add-repository -y ppa:ansible/ansible
sudo -E apt-add-repository -y ppa:ansible/ansible
gpg: keyring `/tmp/tmpjrlt0wyy/secring.gpg' created
gpg: keyring `/tmp/tmpjrlt0wyy/pubring.gpg' created
gpg: requesting key 7BB9C367 from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpjrlt0wyy/trustdb.gpg: trustdb created
gpg: key 7BB9C367: public key "Launchpad PPA for Ansible, Inc." imported
gpg: Total number processed: 1
gpg: imported: 1 (RSA: 1)
OK
sudoExec apt-get install -y ansible
sudo -E apt-get install -y ansible
Reading package lists... Done
Building dependency tree
Reading state information... Done
ansible is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
[[ Ubuntu =~ ^CentOS ]]
++ which ansible
'[' /usr/bin/ansible == '' ']'
set +x
Parsing config file...
==== Contiv Netplugin Demo Installer ====
Netplugin Cluster will be set up on the following servers in Standalone mode:
10.10.0.164
Ready to proceed(y/n)? y
[netplugin-node]
Setting up services on nodes
PLAY [devtest] ****************************************************************
skipping: no hosts matched
PLAY [volplugin-test] *********************************************************
skipping: no hosts matched
PLAY [cluster-node] ***********************************************************
skipping: no hosts matched
PLAY [cluster-control] ********************************************************
skipping: no hosts matched
PLAY [service-master] *********************************************************
skipping: no hosts matched
PLAY [service-worker] *********************************************************
skipping: no hosts matched
PLAY [netplugin-node] *********************************************************
GATHERING FACTS ***************************************************************
ok: [node1]
TASK: [base | upgrade system (debian)] ****************************************
ok: [node1]
TASK: [base | install base packages (debian)] *********************************
ok: [node1] => (item=ntp,unzip,bzip2,curl,python-software-properties,bash-completion,python-selinux,e2fsprogs,openssh-server)
TASK: [base | install epel release package (redhat)] **************************
skipping: [node1]
TASK: [base | install/upgrade base packages (redhat)] *************************
skipping: [node1]
TASK: [base | install and start ntp] ******************************************
skipping: [node1]
TASK: [docker | check docker version] *****************************************
changed: [node1]
TASK: [docker | create docker daemon's config directory] **********************
ok: [node1]
TASK: [docker | setup docker daemon's environment] ****************************
ok: [node1]
TASK: [docker | add docker's public key for CS-engine (debian)] ***************
failed: [node1] => {"failed": true}
msg: Failed to download key at https://sks-keyservers.net/pks/lookup?op=get&search=0xee6d536cf7dc86e2d7d56f59a178ac6c6238f52e: Request failed: <urlopen error EOF occurred in violation of protocol (_ssl.c:590)>
FATAL: all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit @/home/labor/site.retry
node1 : ok=6 changed=1 unreachable=0 failed=1
I am running the script on Ubuntu 15.10
TASK [base : upgrade system (debian)] ******************************************
ok: [node1]
fatal: [node3]: FAILED! => {"changed": false, "failed": true, "msg": "Could not fetch updated apt files"}
fatal: [node2]: FAILED! => {"changed": false, "failed": true, "msg": "Could not fetch updated apt files"}
Error log:
TASK [include_vars] ************************************************************
ok: [node1] => (item=contiv_network)
ok: [node2] => (item=contiv_network)
ok: [node3] => (item=contiv_network)
ok: [node2] => (item=contiv_storage)
ok: [node1] => (item=contiv_storage)
ok: [node2] => (item=swarm)
ok: [node1] => (item=swarm)
ok: [node3] => (item=contiv_storage)
ok: [node2] => (item=ucp)
ok: [node1] => (item=ucp)
ok: [node3] => (item=swarm)
ok: [node1] => (item=docker)
ok: [node2] => (item=docker)
ok: [node3] => (item=ucp)
ok: [node1] => (item=etcd)
ok: [node3] => (item=docker)
ok: [node2] => (item=etcd)
ok: [node3] => (item=etcd)
TASK [include] *****************************************************************
included: /home/admin/ansible/roles/ucarp/tasks/cleanup.yml for node1, node2, node3
fatal: [node1]: FAILED! => {"failed": true, "reason": "'item' is undefined"}
...ignoring
fatal: [node2]: FAILED! => {"failed": true, "reason": "'item' is undefined"}
...ignoring
fatal: [node3]: FAILED! => {"failed": true, "reason": "'item' is undefined"}
...ignoring
included: /home/admin/ansible/roles/contiv_storage/tasks/cleanup.yml for node1, node2, node3
included: /home/admin/ansible/roles/swarm/tasks/cleanup.yml for node1, node2, node3
included: /home/admin/ansible/roles/ucp/tasks/cleanup.yml for node1, node2, node3
included: /home/admin/ansible/roles/etcd/tasks/cleanup.yml for node1, node2, node3
included: /home/admin/ansible/roles/nfs/tasks/cleanup.yml for node1, node2, node3
included: /home/admin/ansible/roles/docker/tasks/cleanup.yml for node1, node2, node3
NO MORE HOSTS LEFT *************************************************************
to retry, use: --limit @./ansible/cleanup.retry
PLAY RECAP *********************************************************************
node1 : ok=9 changed=0 unreachable=0 failed=1
node2 : ok=9 changed=0 unreachable=0 failed=1
node3 : ok=9 changed=0 unreachable=0 failed=1
Both issues were resolved once I set up passwordless sudo.
Here is the ansible bug I filed: ansible/ansible#18475
Need to add to k8s demo instructions README:
Make sure ifconfig is installed (sudo yum install net-tools)
Names suggested in cluster_defs.json.README (k8master and k8node-XX) are NOT optional (names are hardcoded in verification step).
I ran the script and everything came up. My VMs then shut down for some reason, and once I brought them back up, docker swarm and netmaster were down. I tried running the script again and consistently hit this error:
TASK [docker : ensure docker is started] ***************************************
fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.\n"}
Currently clean_restart.sh is used to upgrade the contiv binaries. clean_restart.sh erases all contiv configuration (networks, EPGs, etc.), so all kube pods need to be recreated. It would be good to separate the upgrade and clean functionality so that an upgrade retains the contiv config.
The current script brings up a Docker Swarm + Contiv cluster of machines / VMs, and netplugin and netmaster run as binaries.
Change it so that they run as containers.
This causes an issue when running docker-py:
admin06-18:~$ sudo docker -H :2375 version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:20:08 UTC 2015
OS/Arch: linux/amd64
Server:
Version: swarm/1.2.0
API version: 1.22
Go version: go1.5.4
Git commit: a6c1f14
Built: Wed Apr 13 05:58:31 UTC 2016
OS/Arch: linux/amd64
admin@06-18:~$
Rerunning prepare.sh and setup_k8s_cluster.sh on a k8s setup that is already up causes a cluster communication failure, because the netmaster DNS entry is not present in /etc/hosts.
At the end of install, the installer needs to provide a note on how the DOCKER_HOST should be set.
The previous installer provided this info and it is necessary for a user who has no idea how the docker cluster is set up by ansible.
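A sketch of what such a closing note could look like. Port 2375 is taken from the `docker -H :2375` command quoted elsewhere in this list. Which node's address to print is an assumption, and `docker_host_note` is a hypothetical name.

```python
def docker_host_note(master_ip, port=2375):
    """Build the closing message that tells the user how to set DOCKER_HOST."""
    endpoint = "tcp://{0}:{1}".format(master_ip, port)
    return (
        "Install complete. To use the cluster, point the docker client at it:\n"
        "  export DOCKER_HOST={0}".format(endpoint)
    )
```

The installer would print this string as its last step.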
After a successful install, I ran netctl to create a policy, rules and a group, but traffic is not being affected by the rules as expected.
I then followed the instructions from docs.contiv.io to try to reproduce the traffic-blocking behaviour. That has not been successful either: all traffic passes as normal and ignores every applied rule.
I believe contiv might have an issue, or the instructions for setting up rules have changed.
The current script fetches the aci-gw container via the ansible playbook using the latest tag. Add support for specifying a version, so that we can name a specific aci version in the script and it pulls the container with that tag from docker hub.
On CentOS 7.x, the steps in the net_demo_installer script install ansible 1.9.4, and the setup does not succeed with that version. We need to install ansible 2.0.x using pip instead.
Current steps to install ansible:
if [[ ${OS_TYPE} =~ ^CentOS ]]; then
    sudoExec yum install -y epel-release epel-testing git wget
    sudoExec yum install -y ansible
    docker_version="1.10.3"
fi
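A pre-flight check along these lines could catch the wrong ansible version before any playbook runs. It is a sketch that assumes only the 2.0.x series is acceptable, since the reports in this list name 1.9.4 and 2.1.1.1 as failing. The function names are hypothetical.

```python
import re


def parse_version(text):
    """Extract the dotted version from text such as 'ansible 2.0.2.0'."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported_ansible(text):
    """True only for the 2.0.x series (assumed to be the working range)."""
    return parse_version(text)[:2] == (2, 0)
```

The input would be the first line of `ansible --version`. The installer could stop with a clear message when the check returns False.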
Is this expected? When we delete a tenant from Contiv, the tenant is not deleted from ACI.
We are running an older version in those scripts. If anyone wants to make this demo work, I think they should use the new version.
Currently, any use of VPCs, even single-legged, can cause the endpoints to receive MAC addresses but not IPs.
Currently the k8s demo setup scripts work on all nodes of the cluster. It would be good to have a script that can set up a list of nodes and add them to an existing cluster.
Add a -s option to the net_demo_installer script so that it does not ask for any manual intervention during installation.
When deploying multiple ACI tenants with a shared subnet assigned to them all, each tenant is unaware of the other tenants' assignments within that subnet.
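The behaviour being asked for could look roughly like this: with -s the confirmation prompt is skipped and treated as "y". Only the `-s` letter comes from the request. The long name `--silent` and the helper names are assumptions.

```python
import argparse


def parse_args(argv):
    """Parse installer options; only the silent switch is sketched here."""
    parser = argparse.ArgumentParser(prog="net_demo_installer")
    parser.add_argument(
        "-s", "--silent", action="store_true",
        help="do not prompt; assume 'y' for every confirmation",
    )
    return parser.parse_args(argv)


def confirm(prompt, silent, ask=input):
    """Return True if the user agrees, or immediately when running silently."""
    if silent:
        return True
    return ask(prompt).strip().lower() in ("y", "yes")
```

The "Ready to proceed(y/n)?" step would then become `confirm("Ready to proceed(y/n)? ", args.silent)`.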
It appears the /etc/etcd/etcd.conf file isn't being properly configured for startup.
Made the following modifications:
ETCD_INITIAL_CLUSTER=svlngen4-fab2-container1=http://netmaster:2380
to:
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:4001"
ETCD_INITIAL_CLUSTER=svlngen4-fab2-container1=http://svlngen4-fab2-container1.cisco.com:2380
TASK [contiv_network : set aci mode] *******************************************
fatal: [node2]: FAILED! => {"changed": true, "cmd": "contivctl network global set --fabric-mode aci", "delta": "0:00:00.088297", "end": "2016-06-23 11:34:16.300682", "failed": true, "rc": 3, "start": "2016-06-23 11:34:16.212385", "stderr": "time="2016-06-23T11:34:16-07:00" level=error msg="Resource with id: \"global\" already exists [github.com/contiv/netplugin/netmaster/resources.(*StateResourceManager).DefineResource stateresourcemanager.go 127]\n" ", "stdout": "", "stdout_lines": [], "warnings": []}
When contiv_network_version in the net_demo_installer script is changed to v0.1-05-22-2016.17-46-35.UTC, the installer does not finish with a success message.
Working version:
contiv_network_version="v0.1-03-16-2016.13-43-59.UTC"
docker_version="1.9.1"
Non-working version:
contiv_network_version="v0.1-05-22-2016.17-46-35.UTC"
docker_version="1.11.1"
Error message:
TASK [contiv_network : set aci mode] *******************************************
fatal: [node1]: FAILED! => {"changed": true, "cmd": "contivctl net global set --fabric-mode aci", "delta": "0:00:00.013386", "end": "2016-05-23 14:34:26.545144", "failed": true, "rc": 3, "start": "2016-05-23 14:34:26.531758", "stderr": "time="2016-05-23T14:34:26-07:00" level=error msg="Invalid fabric mode\n" ", "stdout": "", "stdout_lines": [], "warnings": []}
fatal: [node2]: FAILED! => {"changed": true, "cmd": "contivctl net global set --fabric-mode aci", "delta": "0:00:00.010618", "end": "2016-05-23 14:34:26.547567", "failed": true, "rc": 3, "start": "2016-05-23 14:34:26.536949", "stderr": "time="2016-05-23T14:34:26-07:00" level=error msg="Invalid fabric mode\n" ", "stdout": "", "stdout_lines": [], "warnings": []}
to retry, use: --limit @./ansible/site.retry
Submitted a fix, simple missing closing quote.
31578d3
A few dependencies that should probably be checked for in the Ansible Script:
Install fails due to a lack of python 2 on hosts.
Install fails due to a lack of unzip on hosts.
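A sketch of such a dependency check. It looks on the local PATH, whereas the real check would have to run on each target host, for example as an early Ansible task. The executable name `python2` is an assumption, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Tools named in the report above; "python2" as the executable name is assumed.
REQUIRED_TOOLS = ("python2", "unzip")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

If the returned list is not empty, the installer can name the missing packages up front instead of failing partway through a play.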
It seems that there has to be a way to eliminate the need for a dedicated interface for management traffic.
This is the follow up from contiv/ansible#138
When bringing up a new cluster on the same fabric, Contiv doesn't pull down existing tenants on the ACI Fabric.
The current version in the net_demo_installer script is:
v0.1-03-16-2016.13-43-59.UTC
which is an old one.
Reference issues: contiv/ansible#259 and contiv/ansible#264
The epel-testing to epel-release repo transition won't work now. Ansible 2.1.1.1 gets installed, and our ansible scripts then fail with this error:
TASK [include] *****************************************************************
included: /home/admin/ansible/roles/ucarp/tasks/cleanup.yml for node1, node2
fatal: [node1]: FAILED! => {"failed": true, "reason": "'item' is undefined"}
...ignoring
fatal: [node2]: FAILED! => {"failed": true, "reason": "'item' is undefined"}
Ran the k8s demo scripts. Noticed there are two entries for netmaster in the /etc/hosts file, one for the node I'm on and one for a different node. I assume the other node had been installed as master for a previous demo.
There are other duplicate entries in the hosts file as well.
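One way to keep reruns from piling up entries is to rewrite the file so each name maps to exactly one address. This is a minimal sketch that works on the file's text. `set_host_entry` is a hypothetical helper, not part of the demo scripts, and it drops trailing comments on any line it rewrites.

```python
def set_host_entry(hosts_text, hostname, ip):
    """Return hosts_text with exactly one line mapping hostname to ip."""
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are its names.
        if len(fields) >= 2 and hostname in fields[1:]:
            others = [name for name in fields[1:] if name != hostname]
            if others:
                # Keep the other names that shared the line.
                kept.append(" ".join([fields[0]] + others))
            continue
        kept.append(line)
    kept.append("{0} {1}".format(ip, hostname))
    return "\n".join(kept) + "\n"
```

Applying the function twice gives the same result as applying it once, so it is safe to call on every run of the setup scripts.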
Restart a netplugin k8s node. After restart:
a) netplugin service is not started
b) firewall rules are not saved
After manual restart of netplugin service, below error is seen (possibly due to iptables rules not saved)
Jun 1 06:58:32 contiv-k8-11 netplugin: time="Jun 1 06:58:32.303980899" level=error msg="Failed to add endpoint &{EndpointID:41.1.1.3:default EndpointType:internal EndpointGroup:1 IpAddr:41.1.1.3 IpMask:255.255.255.255 Vrf:default MacAddrStr:02:02:29:01:01:03 Vlan:1 Vni:0 OriginatorIp:10.193.246.18 PortNo:4 Timestamp:2016-06-01 06:58:27.294865438 -0700 PDT EndpointGroupVlan:1} to master &{HostAddr:10.193.246.8 HostPort:9001}. Err: Could not connect to server"
Need to update documentation for k8s setup, listing some items:
The parsed result of the yaml file always puts the numerically higher IP address as node 1 instead of following the order set in the yaml file. For example, if the yaml file lists the IPs ending in 74 and 78 as node 1 and node 2 respectively, the parsed result has 78 and 74 as node 1 and node 2.
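The expected behaviour can be stated as a small function: node names follow the order in which the addresses were listed, with no sorting. The source does not show how genInventoryFile.py stores the hosts, so this is only a sketch. The list input and the example addresses are assumptions.

```python
def name_nodes(ips):
    """Assign node1, node2, ... to the IPs in the order they were listed."""
    return [("node%d" % index, ip) for index, ip in enumerate(ips, start=1)]
```

If the script reads the hosts from a mapping, note that Python 2 dictionaries do not preserve insertion order. That would be one possible cause, though it is a guess.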
If you try to set up the k8s cluster after running cleanup_machines.sh, the following issues are seen:
a) prepare.sh fails saying etcd cannot be uninstalled
b) The setup script fails saying the netmaster, netplugin and aci-gw services cannot be started because the service does not exist. systemd needs a reload after copying the .service files, which were deleted by cleanup_machines.sh
c) The docker service is stopped, and is not restarted by the setup scripts on the master node
When an interface is not specified for a node, the script does not derive one for it and leaves the variable blank. This results in failures later, during the ansible configuration.
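Until the script can derive an interface, it could at least fail early with a clear message. A sketch, assuming the parsed config can be reduced to a mapping of node name to interface name. That shape is hypothetical and not taken from cfg.yml.

```python
def check_interfaces(nodes):
    """Raise ValueError naming every node whose interface is missing or blank."""
    missing = [name for name, iface in nodes.items() if not (iface or "").strip()]
    if missing:
        raise ValueError(
            "no interface specified for: " + ", ".join(sorted(missing))
        )
```

Calling this right after parsing the config would surface the problem before any ansible task runs.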
I am not sure how a later version of OVS got installed on one of the nodes. But even after running net_demo_installer -c/net_demo_installer -r it does not work.
Logs:
TASK [contiv_network : install ovs-common (debian)] ****************************
fatal: [node1]: FAILED! => {"changed": false, "failed": true, "msg": "A later version is already installed"}
admin04@monit-04-176:~$ sudo ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.4.0
Compiled Oct 16 2015 09:22:33
OpenFlow versions 0x1:0x4
Working nodes:
admin04@monit-04-175:~$ sudo ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.3.1
Compiled Jun 12 2015 00:09:03
OpenFlow versions 0x1:0x4
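A check like the following could flag the mismatch before the playbook reaches the OVS task. It parses the `ovs-ofctl --version` output shown above. Treating 2.3.1 as the expected version is an assumption based on the working nodes; the source does not confirm which version the playbook installs.

```python
import re

# Version reported by the working nodes; assumed to be what the playbook expects.
EXPECTED_OVS = (2, 3, 1)


def parse_ovs_version(output):
    """Extract (major, minor, patch) from `ovs-ofctl --version` output."""
    match = re.search(r"\(Open vSwitch\)\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognized ovs-ofctl output")
    return tuple(int(group) for group in match.groups())


def is_newer_than_expected(output, expected=EXPECTED_OVS):
    """True when the installed OVS is newer than the expected version."""
    return parse_ovs_version(output) > expected
```

When this returns True, the installer could tell the user to downgrade or remove the newer OVS packages instead of stopping at "A later version is already installed".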