sunet / cnaas-nms
Campus Network as-a-Service - Network Management System (Campus network automation software)
License: Other
Is your feature request related to a problem? Please describe.
We need the ability to run IPv6 in linknets. Currently it is only possible to add IPv4 addresses in linknets.
Describe the solution you'd like
It needs to be possible to add these settings to linknets:
Describe alternatives you've considered
We've considered deriving the IPv6 address from the IPv4 address, but this is not optimal for several reasons. Most of all, it is not possible if we want to go for an IPv6-only underlay.
Describe the bug
When running a repository refresh for settings, we get this error message:
{
"status": "error",
"message": "Error in repository: local variable 'e' referenced before assignment"
}
Using the logs we traced it to settings.py line 255 (in check_settings_syntax). We believe it is caused by the nested try-except blocks using the same variable name e. The outer try-except contains a for loop, and inside the for loop is the second try-except.
In Python, the variable bound to the exception (e in this case) is deleted when the except block exits. So if the inner try-except triggers, it rebinds e to its own exception and deletes it on exiting its block. At that point e no longer exists, so on the next iteration of the loop an UnboundLocalError occurs when e is referenced on line 255.
As of writing, line 255 is the code below:
if len(e.errors()) == 2 and num == 1 and error['type'] == 'type_error.none.allowed':
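A minimal, self-contained sketch of the failure mode (this is not the actual check_settings_syntax code): Python implicitly deletes the name bound in an except clause when that block exits, so an inner except reusing the same name leaves the outer name unbound.

```python
# Sketch of the bug, not the real settings.py code: an inner
# try-except reuses the outer exception variable name 'e'.
def outer():
    try:
        raise ValueError("outer error")
    except ValueError as e:
        for num in range(2):
            try:
                raise KeyError("inner error")
            except KeyError as e:  # rebinds 'e'; Python deletes it on exit
                pass
            # 'e' is already unbound here after the first iteration
        return len(e.args)  # raises UnboundLocalError

try:
    outer()
except UnboundLocalError as exc:
    print(exc)  # message wording varies by Python version
```

Renaming the inner exception variable (e.g. except KeyError as inner_e) avoids the clobbering, which matches the proposed change below.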
To Reproduce
I'm not sure how to give concrete steps, as you might need a similar settings_fields.py override file and settings repository (explained more at the bottom of the report).
Steps to reproduce the behavior:
alias curlJ curl -k -s -H "Authorization: Bearer $JWT_AUTH_TOKEN" -H "Content-Type: application/json"
curlJ https://hostname/api/v1.0/repository/settings -d '{"action": "refresh"}' -X PUT | jq
Expected behavior
Return message from the API should give the real reason why the settings refresh did not work, i.e. what was wrong with the settings/settings_fields.
Screenshots
Not a screenshot, but the relevant lines from the log
2021-11-15 14:50:23,726 DEBG 'uwsgi' stdout output:
[2021-11-15 14:50:23,725] INFO in git: Trying to acquire lock for devices to run refresh repo: 64
2021-11-15 14:50:23,726 DEBG 'uwsgi' stdout output:
2021-11-15 14:50:24,061 DEBG 'uwsgi' stdout output:
[2021-11-15 14:50:24,061] DEBUG in git: Clearing redis-lru cache for settings
2021-11-15 14:50:24,089 DEBG 'uwsgi' stdout output:
[2021-11-15 14:50:24,088] DEBUG in settings: unhashable type: 'dict'
2021-11-15 14:50:24,089 DEBG 'uwsgi' stdout output:
[2021-11-15 14:50:24,089] ERROR in git: Exception while scheduling job for refresh repo: local variable 'e' referenced before assignment
Traceback (most recent call last):
File "/opt/cnaas/venv/lib/python3.7/site-packages/redis_lru/lru.py", line 46, in inner
return self[key]
File "/opt/cnaas/venv/lib/python3.7/site-packages/redis_lru/lru.py", line 71, in __getitem__
raise KeyError()
KeyError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./cnaas_nms/db/settings.py", line 247, in check_settings_syntax
ret_dict = f_root(**settings_dict).dict()
File "pydantic/main.py", line 362, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 4 validation errors for f_root
vrfs
none is not an allowed value (type=type_error.none.not_allowed)
vxlans -> ztp -> dns_servers_6
field required (type=value_error.missing)
vxlans -> ansatt -> dns_servers_6
value is not a valid list (type=type_error.list)
slaac_dns_servers
value is not a valid list (type=type_error.list)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./cnaas_nms/db/git.py", line 88, in refresh_repo
result = _refresh_repo_task(repo_type)
File "./cnaas_nms/db/git.py", line 170, in _refresh_repo_task
get_settings()
File "/opt/cnaas/venv/lib/python3.7/site-packages/redis_lru/lru.py", line 48, in inner
result = func(*args, **kwargs)
File "./cnaas_nms/db/settings.py", line 613, in get_settings
verified_settings = check_settings_syntax(settings, settings_origin)
File "./cnaas_nms/db/settings.py", line 255, in check_settings_syntax
if len(e.errors()) == 2 and num == 1 and error['type'] == 'type_error.none.allowed':
UnboundLocalError: local variable 'e' referenced before assignment
2021-11-15 14:50:24,092 DEBG 'uwsgi' stdout output:
[2021-11-15 14:50:24,092] INFO in git: Releasing lock for devices from refresh repo job: 64
2021-11-15 14:50:24,156 DEBG 'uwsgi' stdout output:
[2021-11-15 14:50:24,156] INFO in app: User: cnaas, Method: PUT, Status: 500, URL: https://hostname/api/v1.0/repository/settings, JSON: {'action': 'refresh'}
2021-11-15 14:50:24,160 DEBG 'uwsgi' stdout output:
[pid: 17|app: 0|req: 5/5] 172.30.0.1 () {40 vars in 813 bytes} [Mon Nov 15 14:50:23 2021] PUT /api/v1.0/repository/settings => generated 103 bytes in 444 msecs (HTTP/1.1 500) 4 headers in 198 bytes (3 switches on core 999)
Additional context
This happened while overriding settings_fields.py, so it's likely that there was something wrong with this file that caused exceptions to occur in the first place. The settings repo also contains a lot of changes that were not tested, so there are likely issues there as well, reflected by the logs containing several errors with the settings. This error occurred while trying to test the settings_fields override file with our settings repository and figure out what is wrong, so I can't say for certain whether both are broken or not.
I think this local variable error will only occur if there is something wrong with either the settings or the settings_fields. If there are no errors, the exceptions won't happen in the first place and there won't be a chance for the e variable to be referenced.
Proposed change
Make the inner try-except block use a different variable name for the exception than the outer block.
Describe the bug
SQLAlchemy QueuePool limit reached
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No exception should be raised; the job should just slow down.
Outputs
File "./cnaas_nms/confpush/sync_devices.py", line 339, in push_sync_device
dev: Device = session.query(Device).filter(Device.hostname == hostname).one()
...
sqlalchemy.exc.TimeoutError: QueuePool limit of size 50 overflow 0 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/13/3o7r)
Environment:
Additional context
Possibly related to quite a big job database?
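A stdlib-only sketch of the desired behaviour (illustrative names, not SQLAlchemy's API): a bounded pool where a checkout past the limit blocks until a connection is returned, instead of raising a TimeoutError the way QueuePool does once size plus overflow is exhausted.

```python
# Sketch: a bounded pool whose checkout blocks rather than raising.
import queue
import threading
import time

class BlockingPool:
    def __init__(self, size):
        self._q = queue.Queue(maxsize=size)
        for i in range(size):
            self._q.put(f"conn-{i}")  # placeholder connection objects

    def checkout(self):
        return self._q.get()  # blocks until a connection is available

    def checkin(self, conn):
        self._q.put(conn)

pool = BlockingPool(size=1)
c1 = pool.checkout()

def worker(results):
    results.append(pool.checkout())  # waits until c1 is returned

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
time.sleep(0.1)
pool.checkin(c1)   # release; the waiting worker now proceeds
t.join(timeout=2)
print(results)     # ['conn-0']
```

With SQLAlchemy itself, the usual knobs are pool_size, max_overflow and pool_timeout on create_engine; whether blocking indefinitely is actually desirable depends on the job model.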
Is your feature request related to a problem? Please describe.
From the dashboard view you can see the number of unsynchronized devices, but there is no way to sync only these devices.
Describe the solution you'd like
It would be nice if there was a way to synchronize just these devices without having to navigate to "groups" and then find a relevant group or groups to sync. Maybe there could be a "Sync" button next to the unsynchronized devices in the dashboard view.
Additional context
Is your feature request related to a problem? Please describe.
We are now at Python 3.7. We would like to move to the most recent version possible.
Describe the solution you'd like
Preferably Python 3.12; otherwise 3.11 would also be good.
Is your feature request related to a problem? Please describe.
We use the fcntl module, which is Unix-only. This creates a problem for making the Docker images more lightweight and for people developing on non-Linux machines. fcntl locks specific files, and it is only used in scheduler.py.
Describe the solution you'd like
We could replace the entire scheduler with the package apscheduler.
Other solutions retaining the scheduler
We could use a different package; online I found the following suggestions: portalocker, waitress. We could also check whether the file locking is necessary at all, or whether we would prefer to solve it in a different way.
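As an illustration of what a portable lock could look like without a new dependency (portalocker wraps the same primitives), here is a hedged stdlib-only sketch, assuming only a non-blocking advisory lock is needed:

```python
# Sketch: cross-platform non-blocking file lock using only the stdlib.
import os
import tempfile

if os.name == "nt":
    import msvcrt

    def try_lock(fh):
        """Non-blocking exclusive lock on Windows (locks 1 byte)."""
        try:
            msvcrt.locking(fh.fileno(), msvcrt.LK_NBLCK, 1)
            return True
        except OSError:
            return False
else:
    import fcntl

    def try_lock(fh):
        """Non-blocking exclusive lock on POSIX."""
        try:
            fcntl.lockf(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError:
            return False

# The scheduler would hold the lock for its whole lifetime.
lockfile = os.path.join(tempfile.gettempdir(), "scheduler.lock")
fh = open(lockfile, "w")
print(try_lock(fh))  # True if no other process holds the lock
```

Note that POSIX lockf locks are per-process, so a second try_lock within the same process still succeeds; the exclusion only applies across processes, which is what the scheduler check needs.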
The NAPALM -> PyEZ -> ncclient dependency chain seems to support Python 3.11/3.12 since recently:
napalm-automation/napalm#2020
Juniper/py-junos-eznc#1276
Is your feature request related to a problem? Please describe.
We have had issues regarding the session.py module. The module automatically runs code for connecting to the database on import. So when we want to test just a function (a unit test, if you will) and the module under test imports from session.py, the program will crash if there is no database to connect to. This of course also happens if the module you test imports another module which imports session.py, so it can be quite frustrating.
Describe the solution you'd like
We want session.py to not automatically attempt to connect to the database when imported. The simplest solution would be to put the code in a function and call it when it's needed. As far as I can see, the only code that references any of the variables created is the sqla_session function, so creating a function and calling it from sqla_session is a solution.
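A sketch of the proposed shape (names and the engine stand-in are illustrative, not the real session.py): the connection setup moves into a function that is called on first use, so importing the module touches no database.

```python
# Sketch of lazy engine initialisation instead of import-time setup.
from contextlib import contextmanager

_engine = None

def get_engine():
    """Create the database engine on first call, not at import time."""
    global _engine
    if _engine is None:
        # In the real module this would wrap sqlalchemy.create_engine(...)
        _engine = object()
    return _engine

@contextmanager
def sqla_session():
    engine = get_engine()  # connecting happens here, not on import
    yield engine

# Importing this module no longer requires a running database;
# unit tests that never call sqla_session() need no db at all.
```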
Is your feature request related to a problem? Please describe.
I want to configure LACP with two links to servers or other things connected to my access switches.
Describe the solution you'd like
I need a way to define which access ports should be in the same LACP group. A simple integer ID should be enough: if it's defined, use the ID for the LACP config; if it's undefined, don't configure LACP. If it has the special value -1, configure an MLAG based on the port number of the switch.
Describe alternatives you've considered
Match something in the interface description to generate LACP config
Additional context
At least two customers wanted this functionality
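The proposed semantics could be sketched like this (function and key names are hypothetical, not an existing CNaaS-NMS API):

```python
def lacp_settings(port_num, lacp_id=None):
    """Derive LACP config from an optional integer ID on an access port."""
    if lacp_id is None:
        return None                             # no LACP configured
    if lacp_id == -1:
        return {"mlag": True, "id": port_num}   # MLAG keyed on port number
    return {"mlag": False, "id": lacp_id}       # explicit LACP group ID

print(lacp_settings(7))        # None
print(lacp_settings(7, -1))    # {'mlag': True, 'id': 7}
print(lacp_settings(7, 12))    # {'mlag': False, 'id': 12}
```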
os_version for some Cisco devices can exceed 64 chars and won't fit into the database (max 64 chars)
Example os_version: "C2960X Software (C2960X-UNIVERSALK9-M), Version 15.2(2)E7, RELEASE SOFTWARE (fc3)" (81 characters)
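Until the column is widened, a defensive truncation at write time would avoid the database error; a quick sketch (the helper name is hypothetical):

```python
OS_VERSION_MAX = 64  # current database column width

def safe_os_version(raw: str, max_len: int = OS_VERSION_MAX) -> str:
    """Truncate the reported os_version so it fits the database column."""
    return raw if len(raw) <= max_len else raw[:max_len]

v = 'C2960X Software (C2960X-UNIVERSALK9-M), Version 15.2(2)E7, RELEASE SOFTWARE (fc3)'
print(len(v))                   # 81
print(len(safe_os_version(v)))  # 64
```

The long-term fix is of course to widen the column instead, e.g. via a migration that alters it to a longer string type.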
Is your feature request related to a problem? Please describe.
When copy-pasting part of a switch name to search under "Devices", it is sometimes in capital letters and then I get no hit on my search.
Describe the solution you'd like
It would be nice if the search was case insensitive instead.
Additional context
Is your feature request related to a problem? Please describe.
We sometimes need to make experimental changes to our templates or settings, and test these in single instances of CNaaS-NMS. This would be much easier if we could follow a normal Git workflow and make separate branches for these experiments. However, the current version of CNaaS-NMS will only check out the default branch of any referenced repo, which forces us to take the more laborious route of forking a separate repo for this purpose.
Describe the solution you'd like
We would like a solution where we can specify a different branch name than the default as part of the GITREPO_TEMPLATES or GITREPO_SETTINGS environment variables.
An example of a URL pattern we would like: GITREPO_SETTINGS=https://git.example.org/cnaas/settings.git#alternate-branch should cause CNaaS-NMS to clone the settings repo from the alternate-branch branch of the repo at https://git.example.org/cnaas/settings.git
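Splitting such a value could be done with a URL fragment, sketched here (the real CNaaS-NMS config parsing may differ):

```python
# Sketch: split an optional '#branch' fragment off a git repo URL.
from urllib.parse import urlsplit

def split_repo_url(url):
    """Return (repo_url, branch); branch is None when no #fragment is given."""
    parts = urlsplit(url)
    branch = parts.fragment or None
    repo = parts._replace(fragment="").geturl()
    return repo, branch

print(split_repo_url("https://git.example.org/cnaas/settings.git#alternate-branch"))
# ('https://git.example.org/cnaas/settings.git', 'alternate-branch')
print(split_repo_url("https://git.example.org/cnaas/settings.git"))
# ('https://git.example.org/cnaas/settings.git', None)
```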
Is your feature request related to a problem? Please describe.
The docker containers are now quite bulky; they are based on full Linux images. We would like to use something that takes less space.
Describe the solution you'd like
We would like to use a python:3.12-slim base image or something similar.
Is your feature request related to a problem? Please describe.
Integration tests take a long time to run, and issues with them are hard to debug.
Describe the solution you'd like
It would also be good if a short summary of errors could be posted to the GitHub PR somehow.
Describe alternatives you've considered
Additional context
Is your feature request related to a problem? Please describe.
I want to define custom behavior for some VXLANs, like special multicast behavior only on specific VXLANs. It would be nice to have a well-defined way to define "tags" that specify custom behavior/config per VXLAN.
Describe the solution you'd like
Allow a list of alphanumeric strings to be added per VXLAN in global/vxlans.yml
Describe alternatives you've considered
Matching on description/name/vrf
Additional context
Is your feature request related to a problem? Please describe.
We use the field "infra_ip" for devices to generate the lo0 IP address. We also need to be able to set a similar field for IPv6.
Describe the solution you'd like
We want an additional field to be available for devices, for instance called "infra_ip6". This field should be of type IPv6 address. The field should of course also be available as a device setting in templates as "infra_ip6".
Describe alternatives you've considered
We've considered automatically generating IPv6 address from the IPv4 address, but this does not work for us.
Additional context
The attached image shows a snippet from the documentation and marks the field that we want added as an IPv6 field.
The file lock used by scheduler/scheduler.py to detect whether a scheduler is already running does not work properly when flask/werkzeug is started with debug=True. In debug mode the scheduler module seems to be loaded once and then immediately reloaded, without releasing the file lock in between. This results in the scheduler starting with "is_mule = True", since it thinks a wsgi mule has already acquired the file lock. This is not a problem when running in wsgi mode (in docker etc), only when running in standalone API mode. Possible solutions:
Workaround: Set debug=False when starting werkzeug
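Another possible guard, sketched below: werkzeug's reloader sets WERKZEUG_RUN_MAIN=true in the reloaded child process, so the lock could be taken only there (the function name is hypothetical):

```python
# Sketch: only acquire the scheduler lock in the process that will
# actually serve requests under werkzeug's debug reloader.
import os

def should_acquire_lock(debug: bool, environ=None) -> bool:
    environ = os.environ if environ is None else environ
    if not debug:
        return True  # no reloader, single process
    # Under the reloader, only the child has WERKZEUG_RUN_MAIN set.
    return environ.get("WERKZEUG_RUN_MAIN") == "true"

print(should_acquire_lock(False, {}))                            # True
print(should_acquire_lock(True, {}))                             # False (reloader parent)
print(should_acquire_lock(True, {"WERKZEUG_RUN_MAIN": "true"}))  # True (reloaded child)
```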
Is your feature request related to a problem? Please describe.
We constantly find ourselves writing new template tags to make our template logic more concise. Some of these functions are accepted upstream, while some of them are very specific to our own setup and do not belong upstream.
Currently, we need to patch CNaaS-NMS locally to add our own template tags, which becomes rather tedious every time we need to do it, or every time we need to upgrade CNaaS-NMS.
Describe the solution you'd like
It would be much better if there was a mechanism or option to configure a list of third party Python modules with template tag functions, to import at runtime and make available when rendering Jinja2 templates.
Describe alternatives you've considered
Keep using the tedious manual patch management.
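One shape such a mechanism could take, sketched with a fabricated in-process plugin module (module and attribute names are invented; CNaaS-NMS has no such option today):

```python
# Sketch: load template filter functions from configured plugin modules.
import importlib
import sys
import types

def load_template_filters(module_names):
    """Import each named module and collect the callables it lists in TEMPLATE_FILTERS."""
    filters = {}
    for name in module_names:
        mod = importlib.import_module(name)
        for fname in getattr(mod, "TEMPLATE_FILTERS", []):
            filters[fname] = getattr(mod, fname)
    return filters

# Fabricate a third-party plugin module just for this demo.
plugin = types.ModuleType("my_site_filters")
plugin.TEMPLATE_FILTERS = ["shout"]
plugin.shout = lambda s: s.upper()
sys.modules["my_site_filters"] = plugin

filters = load_template_filters(["my_site_filters"])
print(filters["shout"]("hello"))  # HELLO
```

The resulting dict could then be merged into the Jinja2 environment, e.g. env.filters.update(filters), before templates are rendered.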
Describe the bug
When doing a dry-run against a group of switches and one of them is unreachable, you get a long Python stack trace in the NMS GUI.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Instead of getting 50 lines of stack trace, it would be preferable to handle this with some kind of message saying the device could not be reached.
Environment:
CNaaS-NMS version: 1.2.1
While working on #179, I ran hadolint against all Dockerfiles.
Not much came up, and this warning looks like it's an easy fix against unpleasant surprises.
$ hadolint docker/redis/Dockerfile
redis/Dockerfile:1 DL3007 warning: Using latest is prone to errors if the image will ever update. Pin the version explicitly to a release tag
While testing the automated dev-setup for the frontend I wound up taking a look at the swagger-interface (runs on localhost:443) to the backend API. I have some (pedantic) comments.
In most cases you have two endpoints to work with one type of data, for instance /devices to list all, and /device/<something>. But not for groups; there you only have /groups.
Furthermore, the descriptions of several of the endpoints seem copy-pasted: /job, /jobs and /joblock are all described as "API for handling jobs".
Third, why /joblocks instead of /job_locks, when you have /device_<something>? I read it as "job blocks" at first.
Fourth, there is no example response body, which would be very useful for somebody trying to figure out the API without having possibly dangerous access. I assume the swagger is auto-generated and that it is a hassle to extend it to do the right thing, but in my experience it is very much worth it.
Fifth, how are you planning to do versioning of the API? I assure you, you'll want to have a way to version the API.
Is your feature request related to a problem? Please describe.
There is still too much configuration by hand when setting up a new CNaaS configuration on the database side. We could automate this if a database schema definition is part of the Alembic migrations.
Describe the solution you'd like
Add the database schema to the Alembic migrations
Using the default docker-compose setup, the cnaas-nms API is unable to detect its own version.
Steps to reproduce the behavior:
cd docker
Create a docker-compose.override.yaml file with custom settings. This override sets BUILDBRANCH=uninett/lab-deploy, among other things.
docker-compose build, docker-compose up -d
curl -k -s -H "Authorization: Bearer $JWT_AUTH_TOKEN" $CNAAS_URL/api/v1.0/system/version | jq
Expected: git_version shows a commit hash, branch and timestamp. Instead, the API returns:
{
"status": "success",
"data": {
"version": "1.3.0dev1",
"git_version": "Unhandled exception"
}
}
With some manual debugging, I was able to trace the issue back to a file permission/ownership problem on the .git directory in the container.
I will suggest a patch to fix this.
Describe the bug
Through conversation with @indy-independence, we have learned that any environment variable prefixed with TEMPLATE_SECRET_, exported to the CNaaS-NMS API process, will be made available as a variable in Jinja templates.
This fact is not documented in the official documentation, but it should be.
Additional context
#215 further modifies how these variables are made available to templates.
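For documentation purposes, the behaviour amounts to something like the following sketch (the exact key naming inside CNaaS-NMS may differ):

```python
# Sketch: collect TEMPLATE_SECRET_* environment variables for templates.
import os

PREFIX = "TEMPLATE_SECRET_"

def collect_template_secrets(environ=None):
    """Expose TEMPLATE_SECRET_* env vars, keyed by the rest of the name."""
    environ = os.environ if environ is None else environ
    return {k[len(PREFIX):]: v for k, v in environ.items() if k.startswith(PREFIX)}

env = {"TEMPLATE_SECRET_BGP_PASSWORD": "hunter2", "PATH": "/usr/bin"}
print(collect_template_secrets(env))  # {'BGP_PASSWORD': 'hunter2'}
```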
Is your feature request related to a problem? Please describe.
At this point in time the user must set up their own certificate chain to verify tokens produced by the auth-poc-service. This is not very secure, as a lot of deployments probably use the built-in certificates.
Describe the solution you'd like
Refactor the jwt_security decorator to use the dynamically retrieved configuration and certificates used in issue #310.
In various Dockerfiles, you install 3rd party Python libraries like this:
$ pip install dependency1 dependency2 ..
Sooner or later, these dependencies will need different versions of a sub-dependency, and then pip will fail because of a version conflict.
I recommend that you put these dependencies in requirements-files with version numbers and install them like this:
$ pip install -r requirements-for-subsystem-1.txt
You can generate a requirements file with pip freeze, or better: pip-compile from pip-tools.
By using the same requirements-files everywhere you can make reproducible deploys, and also test on the actual versions that will be deployed. Much recommended!
Is your feature request related to a problem? Please describe.
Logging is done via the log.py file, where get_logger is used to get a logger object. This object has a handler for writing to redis. If redis is not running, logging will not work in cnaas-nms, as it will crash when trying to connect to redis. So if you try to run tests that exercise code that logs, you are required to have a redis session.
An overall problem is that library code that logs also sets up logging handlers. Handlers should be set up by the program using the library, for instance the main program.
Describe the solution you'd like
Remove the setup of handlers in get_logger(), and instead require that handlers are set up by the main program using the code. This way, tests can simply decide not to set up the redis logging, and can then run without needing a redis session.
Describe the bug
Dist ZTP init fails with "Neighbor device <> not synchronized" even though neighbors were synchronized before starting init.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
ZTP init success
Outputs
"Neighbor device <> not synchronized"
Environment:
Additional context
We have all of our CNaaS equipment in the .c.uninett.no domain. This is not supported by CNaaS-NMS, rendering us unable to use any setting that is validated by Pydantic using host_schema fields.
The source of the problem is this:
>>> import re
>>> from cnaas_nms.db import settings_fields as sf
>>> pattern = re.compile(sf.FQDN_REGEX)
>>> pattern.match("test.c.uninett.no")
>>> pattern.match("test.cd.uninett.no")
<re.Match object; span=(0, 18), match='test.cd.uninett.no'>
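A label pattern that accepts single-character labels would fix this; a sketch follows (the actual FQDN_REGEX in settings_fields.py is different, this is only an illustration):

```python
# Sketch: an FQDN pattern that allows single-character labels like "c".
import re

# Each label: one alphanumeric char, optionally followed by up to 61
# chars of [a-z0-9-] ending in an alphanumeric; final TLD of 2+ letters.
FQDN_RE = re.compile(
    r"^([a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$", re.IGNORECASE
)

print(bool(FQDN_RE.match("test.c.uninett.no")))   # True
print(bool(FQDN_RE.match("test.cd.uninett.no")))  # True
print(bool(FQDN_RE.match("-bad.example.org")))    # False
```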
./integrationtests.sh fails with an error while building the API container:
ERROR: Cannot install -r requirements.txt (line 12) and napalm==3.1.0 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested napalm==3.1.0
nornir 2.4.0 depends on napalm<3 and >=2
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
ERROR: Service 'cnaas_api' failed to build : The command '/bin/sh -c /opt/cnaas/cnaas-setup.sh $BUILDBRANCH' returned a non-zero code: 1
The error seems related to changes in the pip dependency resolver. See https://pip.pypa.io/en/latest/user_guide/#changes-to-the-pip-dependency-resolver-in-20-3-2020
A temporary fix is adding --use-deprecated=legacy-resolver to the pip call.
A long-term fix seems to be updating requirements.txt with newer, compatible versions.
Problem statement
It should be possible to dynamically configure the CNaaS-NMS API and frontend application to be compatible with any OIDC provider. This could be Google, Authy, Facebook, Microsoft and/or SURFconext. Usually this is done by feeding the application a .well-known endpoint that configures the auth(n|z) of your application.
Describe the solution you'd like
Implement the authlib library to configure the api: https://authlib.org/
The dynamic configuration should look something like this
oauth.register(
"connext",
server_metadata_url=settings.OIDC_CONF_WELL_KNOWN_URL,
client_id=settings.OIDC_CLIENT_ID,
client_secret=settings.OIDC_CLIENT_SECRET,
client_kwargs={"scope": "openid"},
response_type="id_token token",
response_mode="query",
)
The Authlib library provides a mechanism to dynamically retrieve tokens and download the verification certificates into the application.
Is your feature request related to a problem? Please describe.
CNaaS-NMS has an option for configuring dhcp_relays. Because (for example) Juniper has a different config tree for DHCPv6, we have to 'detect' in Jinja whether a dhcp relay is IPv4 or IPv6. This makes the Jinja2 templates hacky and ugly.
Describe the solution you'd like
That is why it would be cleaner to just include an option dhcpv6_relays as a list of IPv6 addresses.
Describe alternatives you've considered
The alternative is to solve it in Jinja2 templates, but that can become hacky and cumbersome, at least for Juniper.
Additional context
None
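Even without separate lists, the detection the templates currently do could at least move out of Jinja into Python, sketched here with the stdlib ipaddress module (the helper name is hypothetical):

```python
# Sketch: split a mixed dhcp_relays list into IPv4 and IPv6 lists.
import ipaddress

def split_relays(relays):
    """Return (v4_relays, v6_relays) from a mixed list of address strings."""
    v4 = [r for r in relays if ipaddress.ip_address(r).version == 4]
    v6 = [r for r in relays if ipaddress.ip_address(r).version == 6]
    return v4, v6

print(split_relays(["10.0.0.1", "2001:db8::1", "192.0.2.5"]))
# (['10.0.0.1', '192.0.2.5'], ['2001:db8::1'])
```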
Is your feature request related to a problem? Please describe.
We need the ability to run IPv6 in the management domain. The management domain does not allow setting ipv6_gw as a field, so we are unable to generate a management VXLAN with IPv6 addresses.
Describe the solution you'd like
It should be possible to add an ipv6_gw field to the management domain.
Describe alternatives you've considered
Deriving ipv6_gw from the IPv4 address (ipv4_gw) is not a good solution for us.