nuagenetworks / nuage-metroae
Nuage Networks Metro Automation Engine
Home Page: http://devops.nuagenetworks.net
License: Apache License 2.0
The feature idea is to have a playbook that unpacks the official Nuage binaries so that the deploy playbooks can reference and use them.
It will take as input a directory or s3 link, and unpack the files to a particular destination directory.
It could be transformed to a role if desired, so it can be deployed against a jumphost and have src/dest variables.
Possible enhancement: Deployed VSD/VSC VMs should have autostart enabled so they start automatically after reboot of the host or a power outage.
Issue:
When deploying a new NSGV, we cannot connect via virsh console.
Reason:
Missing information in the nsgv.xml.j2 regarding serial connectivity.
Proposal Changes:
Add the following part in the nsgv.xml.j2 template.
<console type='pty' tty='/dev/pts/16'>
<source path='/dev/pts/16'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
This error only happens in VSD HA mode, not in standalone.
The configuration is in build_vars.yaml (see attachment).
build_vars.yaml.txt
Install playbook looks like this:
With the HA option you have the following proxy issues on the VMs:
On VSD:
/opt/ejabberd/bin/ejabberdctl connected_users
Proxy is not connected
On Proxy:
/var/log/vns/na.log :
[02-May-2017 23:22:13.812] [LOG] Client reconnects
[02-May-2017 23:22:13.815] [LOG] Client is connected
[02-May-2017 23:22:13.816] [LOG] Client is disconnected true { Error
at Connection.onStanza (/opt/notification/node_modules/node-xmpp-core/lib/connection.js:316:21)
at StreamParser. (/opt/notification/node_modules/node-xmpp-core/lib/connection.js:213:14)
at emitOne (events.js:96:13)
at StreamParser.emit (events.js:188:7)
at SaxLtx. (/opt/notification/node_modules/node-xmpp-core/lib/stream_parser.js:56:22)
at emitOne (events.js:96:13)
at SaxLtx.emit (events.js:188:7)
at SaxLtx._handleTagOpening (/opt/notification/node_modules/ltx/lib/sax/sax_ltx.js:30:18)
at SaxLtx.write (/opt/notification/node_modules/ltx/lib/sax/sax_ltx.js:92:26)
at StreamParser.write (/opt/notification/node_modules/node-xmpp-core/lib/stream_parser.js:125:21)
stanza:
Stanza {
name: 'stream:error',
parent: null,
attrs: { 'xmlns:stream': 'http://etherx.jabber.org/streams' },
children: [ [Object] ],
nodeType: 1,
nodeName: 'error' } }
Restarting the services does not solve the issue.
On the Elastic VM:
No firewall rules are added.
On VSDs:
Statistics collection is not activated on any VSD.
During the upgrade of a VSD cluster, one of the VSDs would not come up into an operational state. monit reported an authentication error, and Metro failed. Metro could have had some kind of error handling for this case. After a reboot, the root user could start monit, but no services would start, and there was nothing in the install log. We had to start the services, shut them down, and reboot before it worked. Maybe Metro could look for flags in the install log so that it would have complained. As it was, when one of the VSDs didn't come up, we had to edit the Ansible code by hand to make it work.
Customers are often sensitive to the applications and packages that get installed on their lab systems. Any packages that get installed during an upgrade must be removed when the upgrade is complete. Care must be taken not to uninstall a package that was already present before the upgrade.
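One way to meet this requirement is to snapshot the installed package list before the upgrade and remove only the difference afterward. The following is a minimal Python sketch of that idea, assuming the two package lists have already been gathered (for example from `rpm -qa`); the function name and the sample package names are hypothetical and not part of Metro.

```python
def packages_to_remove(before_upgrade, after_upgrade):
    """Return packages present after the upgrade that were absent before it."""
    # Set difference leaves anything that already existed on the host untouched.
    return sorted(set(after_upgrade) - set(before_upgrade))


# Hypothetical sample data for illustration only.
before = ["bash", "openssh", "python-netaddr"]
after = ["bash", "openssh", "python-netaddr", "sshpass", "pexpect"]
print(packages_to_remove(before, after))
```

Only the packages that appeared during the upgrade are returned, so a package that was present beforehand is never a removal candidate.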
The vsc-deploy playbook provisions the configuration on the VSCs. If there are problems with the cf1:\config.cfg (and possibly bof.cfg) file, the configuration on the VSC cannot be loaded and vsc-postdeploy will fail.
Could we add a check to make sure the configuration was loaded properly?
This is a snippet from a failed boot on my VSC:
Initializing VMM
Virtual address sharing is disabled
Time from clock is TUE FEB 21 19:24:23 2017 UTC
Initial DNS resolving preference is ipv4-only
Attempting to exec primary configuration file:
'cf1:\config.cfg' ...
System Configuration
MAJOR: CLI #1009 An error occurred while processing a CLI command -
File cf1:\config.cfg, Line 11: Command "server 0.centos.pool.ntp.org" failed.
CRITICAL: CLI #1002 The system configuration is missing or incomplete because an error occurred while processing the configuration file.
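A check along these lines could scan the captured boot output for the error markers seen in the snippet above. This is only a sketch in Python: how the console output is captured is left open, the function name is hypothetical, and the marker strings are taken from this one log rather than from any documented list of VSC messages.

```python
def find_config_errors(console_output):
    """Return console lines that report a failed configuration load."""
    # Markers copied from the failed boot log; other releases may differ.
    markers = ("MAJOR: CLI", "CRITICAL: CLI")
    return [
        line.strip()
        for line in console_output.splitlines()
        if line.strip().startswith(markers)
    ]


sample = """Attempting to exec primary configuration file:
'cf1:\\config.cfg' ...
MAJOR: CLI #1009 An error occurred while processing a CLI command -
File cf1:\\config.cfg, Line 11: Command "server 0.centos.pool.ntp.org" failed.
CRITICAL: CLI #1002 The system configuration is missing or incomplete because an error occurred while processing the configuration file.
"""
errors = find_config_errors(sample)
if errors:
    print("VSC configuration did not load cleanly:")
    for line in errors:
        print("  " + line)
```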
Issue:
We have seen that the MAC address (nsgv_mac) is a mandatory parameter in the build_vars.yml file.
Proposal:
We think this parameter should be optional instead of mandatory.
Example of desired change:
{% if item.nsgv_mac is defined %}
nsgv_mac: '{{ item.nsgv_mac }}'
{% endif %}
Kind Regards
Guillermo
As of this writing, we support one and only one build.yml file. If a customer wants to use the same Ansible host to deploy to multiple places, they need to do something like keep multiple clones, one per deployment target, or keep multiple build.yml files that they copy over build.yml as needed. This is unwieldy.
Note that the Metro GUI may solve this for us, hiding the build.yml manipulation under the covers.
Metro, as of v2.1.1, uses the configured hostname as the VM name on KVM. (Is there an equivalent on VMware?) Some installations may want them to be different, or we could be using Metro to upgrade an installation that was done manually.
The VM name must be optional. If it is not specified, default to the hostname.
Please implement this one asset at a time, e.g. submit a PR for VSD, then a PR for VSC, etc.
Please investigate whether this is an issue on VMware.
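The requested fallback can be sketched in Python as follows. The key name `vmname` is a hypothetical choice for the optional setting; the source does not say what the variable should be called.

```python
def vm_name_for(entry):
    """Use the optional VM name if given, otherwise fall back to the hostname."""
    # 'vmname' is a hypothetical key; an empty or missing value means "use hostname".
    return entry.get("vmname") or entry["hostname"]


print(vm_name_for({"hostname": "vsd1.example.com"}))
print(vm_name_for({"hostname": "vsd1.example.com", "vmname": "lab-vsd-1"}))
```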
Issue:
We have seen that it is not possible to generate variables when only the "mynsgvs" parameters are defined.
Reason: The conditional uses Ansible's default AND logic, which checks for both the myvnsutils AND mynsgvs parameters.
Proposal: We suggest changing the logic of the conditional statement used when filling in variables for NSG and Util.
Changes:
We have put an “OR” condition instead of an “AND” in the role “build”, task “get_paths”, “VNS utility /NSGV”.
Example:
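A minimal Python sketch of the proposed logic, as an illustration only and not the actual task from the build role (the function name is hypothetical):

```python
def should_fill_vns_vars(build_vars):
    """Generate VNS variables when either list is present and non-empty."""
    # With 'and' here, a deployment that defines only mynsgvs would be skipped.
    return bool(build_vars.get("myvnsutils") or build_vars.get("mynsgvs"))


print(should_fill_vns_vars({"mynsgvs": [{"hostname": "nsgv1"}]}))
print(should_fill_vns_vars({}))
```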
Kind Regards
Guillermo
In the current version, the main folder contains a very large set of playbooks, which creates a certain level of complexity and confusion.
A suggestion is to restructure this in a way that
build.yml, install_xxx.yml
playbooks folder is created with an internal structure covering all the high-level playbooks that refer to the roles of this repo
playbooks/nuage - contains pure Nuage components delivered as part of the software distribution (i.e. VSD, VSC, VRS, VCIN, VSTAT, NSGV, VNS-Util, etc.)
playbooks/ci - contains playbooks for the continuous integration labs.
playbooks/openstack - contains playbooks to deploy osc and compute nodes
playbooks/mesos - contains playbooks to deploy mesos and associated docker hosts
Feedback welcome
We now require Ansible 2.2 for full support. Update the version check appropriately.
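When updating the check, it helps to compare version components numerically rather than as strings. A small Python sketch of such a comparison, assuming the installed version is available as a dotted string (the function names are hypothetical and this is not the existing Metro check):

```python
def version_tuple(version):
    """Convert '2.2.1.0' into (2, 2, 1, 0) for numeric comparison."""
    # Non-numeric parts such as '0rc1' are dropped in this simplified sketch.
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(installed, minimum="2.2"):
    # Tuple comparison avoids the string-comparison trap where '2.10' < '2.2'.
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("2.2.1.0"))
print(meets_minimum("2.1.0.0"))
```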
If build.yml doesn't include a section for nsg-v, for example, nuage-unpack role still expects to find nsg-v files and errors if they are not present. We shouldn't require files that aren't used.
I think there has been a change in the workflow for deploying VSD for 4.0R6. We need to conditionally execute based on VSD versions.
In our current automated test environment, we run several tests of the build role for several kvm-based scenarios. The purpose of these tests is to make sure the variables are being processed correctly. We do not have tests of vcenter-based scenarios. Please implement tests for vcenter builds for 4.0.R8 and 5.0.1.
In the current version, build_vars.yml can only accommodate the same WAN/LAN bridges for every NSG defined under the mynsgvs section of the file.
It is common to have different WAN/LAN bridges on different NSGs deployed in a single sweep. Thus it is necessary to have a port <-> bridge setting in each NSGV instance.
A quick fix from @GuillermoMM was to provide the following config:
- hostname: NSGV_BRANCH_2
  target_server_type: "kvm"
  target_server: 10.167.62.5
  bootstrap_method: zfb_external
  iso_path: '/tmp/'
  port1_bridge: br12
  port2_bridge: br10
  port3_bridge: br1nsg11
Naming was not an issue back then; it might be good to express it like this (for 6 ports):
port_bridges:
  - br12
  - br_dummy
  - br1nsg11
  - br_dummy
  - br_dummy
  - br_dummy
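To show how a list form like this could feed per-port settings, here is a hedged Python sketch that maps a `port_bridges` list onto `port1_bridge` .. `portN_bridge` keys. The helper name, the padding behavior, and the use of `br_dummy` as the filler are assumptions made for illustration.

```python
def expand_port_bridges(port_bridges, port_count=6, filler="br_dummy"):
    """Map a list of bridges to port1_bridge..portN_bridge keys."""
    if len(port_bridges) > port_count:
        raise ValueError("more bridges than ports")
    # Pad short lists so that every port is attached to some bridge.
    padded = list(port_bridges) + [filler] * (port_count - len(port_bridges))
    return {"port%d_bridge" % (i + 1): bridge for i, bridge in enumerate(padded)}


per_port = expand_port_bridges(["br12", "br_dummy", "br1nsg11"])
for key in sorted(per_port):
    print(key, per_port[key])
```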
Metro, in v2.1.1, assumes that the VSD has one and only one network interface. When doing an upgrade from a configuration in which the VSD has added network interfaces, the additional interfaces will not exist when the upgrade completes.
We must add support in VSD upgrade that will ensure that the post-upgrade network configuration on VSD matches the pre-upgrade network configuration. If it has 2 NICs before the upgrade, it must have 2 after the upgrade.
We should change the code that supports the Jenkins jobs (in ./test/) such that we pull binary files from another location such as stratos.
In v2.1.1, when storing backup folders on the Ansible deployment host, the folders are stored in different /tmp/ paths for different Nuage components. Consistent paths and folder names would help.
Issue severity
Critical
Type
Bug
Description
When using vcenter as the server for a VSD, both non_heat.yml and vcenter.yml are executed, causing the install and all tasks to be executed twice.
Code reference
https://github.com/nuagenetworks/nuage-metro/blob/6390b882d013c35ae607a5e6646c992e1656d5dd/roles/vsd-deploy/tasks/main.yml#L2-L12
From @jonasvermeulen:
---------- Forwarded message ----------
From: Jonas Vermeulen [email protected]
Date: Mon, Oct 10, 2016 at 5:28 AM
Subject: Supporting different target types
To: Brian Castelli [email protected], Philippe DELLAERT [email protected]
Hi Brian,
I'm evaluating the use of metro for deploying VSD/VSC/Proxy/NSGV on top of
Unfortunately I noticed the "xxxx-deploy" playbooks and associated roles have a mixture of image manipulation tasks, image deployment tasks and inside-OS installation/configuration tasks.
As such, these roles cannot be reused when the target-type will be another hypervisor/cloud type.
My suggestion would be to use pre-tasks with conditional includes to prepare the image and deployment, before calling the role.
Example is at
https://github.com/openstack/openstack-ansible/blob/master/playbooks/os-horizon-install.yml
Another suggestion is to
Philippe might have some more suggestions.
At the moment it is more a structural change, not really changing any of the tasks, but it would affect the way how all files are laid out of course. So just looking to get your view, and see what you think.
If you like, we can also discuss over the github Issue board so it becomes open to everyone's view.
It would be great to have Fedora be a supported deployment host. Is anyone successfully using it?
With the use of Metro in VNS deployments, it is apparent there is a need for abstracting the way NSG(V)s are modeled and deployed.
Basically Nuage VSD uses the concept of NSG Templates (nsgatewaytemplates) to model a group of NSGs. This includes the ports it has, the VSCs it talks to, the underlays it is connected to, etc. It would also be the perfect place to define what Linux bridges a virtual NSG should connect to.
As such, I propose to split up NSGV roles into
mynsgtemplates - comprises all information for each group of NSGs. As such you can define templates for NSG-BR, NSG-UBR, NSG with one/dual uplink, etc.
mynsgvs - which can then be a list of NSGVs referring to the Metro name or the UUID of a pre-existing template. You could also define here what bootstrap method to use.
The NSGV for AWS is already written with the concept of using an external pre-provisioned UUID, but I think the same should apply for the nsgvs.
During the upgrade in the lab, we found that we had vports configured for VMs that didn't exist. Before the upgrade, the health check reports 200 vports. After the upgrade it reported 150 vports. But it wasn't a mistake. 150 was the proper number. We need to enhance the vport health checks to complain when it detects vports that will not be present after the upgrade.
From 4.0R6 up both dockermon and lib networking plugins are supported. We need to add lib networking support for image-unpack, build and vrs-deploy roles.
Once the VSD cluster setup is alive, haproxy can load balance VSD api requests. Currently metro does not support any haproxy configuration.
Description:
We see that for customer testing, they would like to deploy NSGV with 6 ports as well.
Proposal Change:
Is it possible to adapt the roles to include 6 ports?
Thanks a lot in advance,
Guillermo
Issue severity
Minor
Type
Optimisation
Description
The Elastic search package is not unpacked during nuage-unzip because it checks whether both the image package and the backup package are present.
The backup package should only be required if an upgrade is requested. During a fresh install, this package is not required.
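The intended condition can be sketched in Python as follows. The function and parameter names are hypothetical; the actual check lives in the nuage-unzip role and is not reproduced here.

```python
def should_unpack_vstat(image_present, backup_present, upgrade):
    """Decide whether the stats (Elasticsearch) package can be unpacked."""
    if not image_present:
        return False
    # The backup package matters only when an upgrade was requested.
    return backup_present or not upgrade


# Fresh install without a backup package: unpacking should still proceed.
print(should_unpack_vstat(image_present=True, backup_present=False, upgrade=False))
```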
Code reference
https://github.com/nuagenetworks/nuage-metro/blob/6390b882d013c35ae607a5e6646c992e1656d5dd/roles/nuage-unzip/tasks/main.yml#L118
Syntax errors in the build_vars.yml file can be tricky to find and fix. This issue has the following tasks:
Investigate methods for doing this check. I would like to kick off the check from within the build.yml playbook, but I'm concerned that a syntax error will prevent that from happening when the vars file is loaded.
Present options with a recommended choice
Implement and test the feature
Currently the build playbook executes the nuage-unpack role. This is unnecessary for an upgrade. The problem is that the build role depends on variables set by the nuage-unpack role. Think about combining the two roles into one and making it such that we can run build without any binary files for an upgrade operation.
Issue severity
Major
Type
Bug
Description
vcin-destroy fails in a vCenter environment because of a bad check (non-existent variable)
Error
fatal: [vcin01.phd.eu.nuagedemo.net]: FAILED! => {"failed": true, "msg": "The conditional check 'not vcin_vm_facts.failed' failed. The error was: error while evaluating conditional (not vcin_vm_facts.failed): 'dict object' has no attribute 'failed'\n\nThe error appears to have been in '/home/pdellaer/GitHub/nuage-metro/roles/vcin-destroy/tasks/vcenter.yml': line 17, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - name: Power off the VCIN VM\n ^ here\n"}
Code reference
https://github.com/nuagenetworks/nuage-metro/blob/6390b882d013c35ae607a5e6646c992e1656d5dd/roles/vcin-destroy/tasks/vcenter.yml#L37
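The error shows that the registered result does not always carry a `failed` key, so the conditional needs a default. The Python sketch below illustrates a tolerant check; it assumes that a missing key means the lookup did not fail, and the sample dictionary is hypothetical. In an Ansible conditional, the Jinja2 `default` filter can serve the same purpose.

```python
def lookup_failed(result):
    """Treat a missing 'failed' key as 'did not fail' (an assumption)."""
    return bool(result.get("failed", False))


vcin_vm_facts = {"changed": False}  # hypothetical result without a 'failed' key
if not lookup_failed(vcin_vm_facts):
    print("VM facts gathered; safe to continue with power off")
```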
Hi,
The ./metro-ansible nuage_unzip.yml is failing due to a file structure change in 5.0.1.
Below is the temporary fix I used with the master branch:
1--> fix the unzip for 5.0.R1:
tar xzvf Nuage-VNS-Utils-5.0.1_4.tar.gz
md5sum -c vns-util-5.0.1_4.qcow2.md5
mv * /images/5.0.R1/unziped/vns/utils/
this fixed the unzip path for the nsg file : ncpe_centos7.qcow2
2--> VNS Utility/ NSGV path file change:
edit the file:
vi /home/nuage/metro/nuage-metro/roles/build/tasks/get_paths.yml
change the following two rows:
- { subdir: "vns/utils/", pattern: "vns-util-*.qcow2" } --> util qcow2 is not there, unzip failure !
- { subdir: "vns/", pattern: "ncpe_centos7.qcow2" } --> path was vns/nsg/
Thanx.
Niek van der Ven
The nsgv-destroy role has the same play, "Destroy the images directory", executed twice: the first time in the included nsgv_destroy_helper.yml and then in kvm.yml.
Issue severity
Major
Type
Bug/Enhancement
Description
When installing 5.0.1, the HA deployment fails because the pass-phrase-less SSH requirement now applies to a different user (no longer root; the vsd user requires pass-phrase-less SSH).
Error
[root@vsd02 ~]# cat /opt/vsd/logs/install.log
Info: no migration files found
/opt/vsd/vsd-deploy.sh -1 vsd01.phd.eu.nuagedemo.net -t 2 -x xmpp.phd.eu.nuagedemo.net -y
Note: Forwarding request to 'systemctl is-enabled ntpd.service'.
enabled
synchronised to NTP server (10.189.1.254) at stratum 7
time correct to within 8130 ms
polling server every 64 s
25-05-17 12:26:15 ERROR: fail pass-phrase-less ssh as vsd to [email protected]
Error: fail /opt/vsd/vsd-deploy.sh -1 vsd01.phd.eu.nuagedemo.net -t 2 -x xmpp.phd.eu.nuagedemo.net -y
Issue severity
Major
Type
Bug
Description
vstat-destroy fails in a vCenter environment because of a bad check (non-existent variable)
Error
fatal: [ela01.phd.eu.nuagedemo.net]: FAILED! => {"failed": true, "msg": "The conditional check 'not vstat_vm_facts.failed' failed. The error was: error while evaluating conditional (not vstat_vm_facts.failed): 'dict object' has no attribute 'failed'\n\nThe error appears to have been in '/home/pdellaer/GitHub/nuage-metro/roles/vstat-destroy/tasks/vcenter.yml': line 17, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n- block:\n - name: Power off the Stats VM\n ^ here\n"}
Code reference
https://github.com/nuagenetworks/nuage-metro/blob/6390b882d013c35ae607a5e6646c992e1656d5dd/roles/vstat-destroy/tasks/vcenter.yml#L37
Hi team,
in the nsgv-predeploy role (in its kvm part) we do not check whether an NSGV is already defined with the same hostname before running the plays.
This leads to the following situation:
Suppose I provision in VSD 2 new NSGVs which happen to have the same hostnames as ones already defined on the hypervisor.
Currently the playbook will go through each step (except for defining a new VM, where we use when: inventory_hostname not in virt_vms.list_vms), resulting in a return code of 0.
So an end user won't see the real reason why their NSGs stay in a non-bootstrapped state.
I would suggest stopping the playbook immediately if one tries to define NSGVs with hostnames that are already defined.
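The suggested early check amounts to intersecting the requested hostnames with the VMs already defined on the hypervisor. A minimal Python sketch, with a hypothetical helper name and made-up sample hostnames:

```python
def already_defined(requested_hostnames, defined_vms):
    """Return requested NSGV hostnames that already exist on the hypervisor."""
    existing = set(defined_vms)
    return [name for name in requested_hostnames if name in existing]


# Hypothetical sample data; defined_vms would come from the hypervisor's VM list.
clashes = already_defined(["NSGV_BRANCH_1", "NSGV_BRANCH_2"], ["NSGV_BRANCH_2", "vsd1"])
if clashes:
    print("Stop: NSGV already defined on target: " + ", ".join(clashes))
```

Failing at this point, before any other step runs, would give the end user the real reason instead of a run that appears successful.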
Re-use of variables in build.yml could eliminate fat-finger errors.
One example is the dns_domain variable, which is defined at the end of build.yml.
If dns_domain is defined explicitly, we can re-use it in the hostname variables of different components.
For example, consider the myvsds:hostname definition:
# current version
myvsds:
- { hostname: vsd1.example.com,
# <cropped>
# with dns_domain re-use
myvsds:
- { hostname: "vsd1.{{ dns_domain }}",
# <cropped>
# dns_domain defined explicitly
dns_domain: example.com
The same steps could apply to other variables used in build.yml, like vsd_fqdn used in the myvscs definition, etc.
The current UPGRADE.md file is missing the folders/paths that must be present when running build_upgrade.yml.
The following paths are needed while performing vsd, vsc, vstat upgrades/rollbacks
Currently, there is only one global var that controls whether to install dockermon. This change would make it more flexible to choose on which VRS nodes to install dockermon.
From #162
We only see one thing when running the playbooks:
TASK [vns-deploy : Get output of 'show vswitch-controller xmpp-server detail'] *****************************************************
[WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{ groups['vnsutils'] is defined and groups['vnsutils'] }}
fatal: [metro-vsc1.nuage.stgt]: FAILED! => {"failed": true, "msg": "The conditional check 'xmpp_detail.stdout[0].find('Functional') != -1' failed. The error was: error while evaluating conditional (xmpp_detail.stdout[0].find('Functional') != -1): 'dict object' has no attribute 'stdout'"}
The playbook fails at this point, but a manual check reveals that the XMPP session is up, so we continued the playbooks after vns-deploy, i.e. with vns-postdeploy and from there everything works like a charm.
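The failure comes from the conditional indexing `stdout` on a result that has none. A guarded version of the same test can be sketched in Python; the function name is hypothetical and the sample output strings are invented for illustration, since the real command output is not shown here.

```python
def xmpp_is_functional(result):
    """Return True only if command output exists and reports 'Functional'."""
    # A missing or empty 'stdout' is treated as "not functional" instead of an error.
    stdout = result.get("stdout") or []
    return bool(stdout) and "Functional" in stdout[0]


print(xmpp_is_functional({}))
print(xmpp_is_functional({"stdout": ["State : Functional"]}))
```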
Perhaps it's already included, but I did not see a script step to check that the VSC can reach the NTP servers and is sync'ed with them
Current VSD installation happens in sequence one node after the other.
Issue:
We see that when you try to deploy a new NSGV, we will always trigger the ZFB.yml.
Proposal:
We consider that this should be optional since in some cases the zero factor bootstrap needs to be done by an external script or perhaps there is no need for ZFB and a manual activation is desired.
Proposal Changes:
1.- Options in build_vars for ZFB = true/false
2.- In case ZFB=true, we will need a flag (True/False) that indicates whether it is to be done by METRO or by a third party.
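The two proposed flags combine into three outcomes, which can be sketched in Python as below. The flag names `zfb` and `by_metro` and the returned labels are hypothetical; the proposal does not fix any naming.

```python
def bootstrap_action(zfb, by_metro=True):
    """Decide who performs the bootstrap, based on the two proposed flags."""
    if not zfb:
        return "manual"  # no ZFB at all; activation is done by hand
    return "metro" if by_metro else "external"


print(bootstrap_action(zfb=True, by_metro=True))
print(bootstrap_action(zfb=True, by_metro=False))
print(bootstrap_action(zfb=False))
```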
Thanks a lot in advance.
Guillermo
We spin up stand-alone stats VMs only. Need to support cluster.