netgroup-polito / crownlabs
Kubernetes-based Remote Laboratories
Home Page: https://crownlabs.polito.it
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
When a user has multiple VMs open in the same browser, it is hard to tell which tab corresponds to a particular VM.
Describe the solution you'd like
It could be useful to show the ID of the VM, followed by its description, in the title of the noVNC tab.
Additional context
I have no idea if this is possible or where to configure it, but I think it could be useful
#373 introduced support for persistent VMs, making it possible to create VMs whose disk is kept even when the VM is powered off. However, users still need to upload the base image of their VMs, which is a long and error-prone process (and not available to all users).
This feature request proposes to explore the capability of Kubevirt to pause/unpause VMs to allow the dynamic creation of snapshots of a running VM. This would allow users to modify their own VM and create a new "base" image to build upon, without having to create a new VM offline and upload the disk.
Prettier ensures that the TS code is formatted according to the project specifications. It should be added as a mandatory check when proposing a new PR.
Items:
./node_modules/.bin/prettier --check */**/*.{js,json,css}
npm run prettify
Currently, VMs work in isolation. There are cases in which different VMs should be allocated on the same LAN, sharing the same broadcast domain.
This feature request suggests introducing support for services that require multiple VMs running at the same time, possibly with a given network setup.
In some cases, a console mode is enough. This would greatly reduce the CPU requirements on the cluster.
It would be helpful to support also SSH-only backend services, which may require a different "ingress controller" that is able to demultiplex SSH sessions toward the involved VM.
'Logout' hangs the VM: processes are still active, but the user can no longer login and is forced to reboot the machine.
So, better to disable this option from the machine. This should be possible by configuring the VM to work in "kiosk mode".
So far, courses and users are managed through Python scripts that have to be launched manually. It would be nice to expose this workflow via APIs, without having to rewrite all the Python code, which is complete and pretty stable.
Proposal: create a kopf-based operator which consumes dedicated CRDs for courses, students and professors (define APIs to generate CRDs and instrument the existing code to handle CRD reconciliation).
Improvements:
Bug fix:
This looks redundant, as the password is well-known for everybody.
It seems that TigerVNC server supports the "noauth" mode:
https://serverfault.com/questions/376302/tigervnc-ssh-without-a-vnc-password
Not clear if this has to be coupled with a modification to the NoVNC code in order to support this feature:
novnc/noVNC#551
Sometimes, it is impossible to connect to the remote desktop of a newly created VM and the 500 Internal Server Error is returned.
The current workaround consists in deleting the VM instance and creating a new one. The problem should not persist.
Currently, every ingress resource providing access to tenant VMs triggers the creation of a Certificate resource to fetch the certificate and fill the corresponding secret (through the cert-manager.io/cluster-issuer annotation).
Although all ingresses point to the same domain (i.e. crownlabs.polito.it), the Order is nonetheless issued and each certificate request counts as a renewal, thus hitting the Let's Encrypt rate limits (5 renewals per week per domain).
Apparently, this issue is not causing any visible problem right now (not sure why). Yet, it will break everything when the certificate needs to be renewed (the current one expires on June 15th).
Possible solutions identified up to now:
kubed, to automatically synchronize secrets between namespaces [1], [2]. This appears to be the solution suggested by cert-manager [3].
reflector, to automatically synchronize secrets between namespaces [4]. It seems to include a cert-manager extension.
A CronJob to copy the secret generated by one single Certificate to all tenant namespaces.
Using the crownlabs.polito.it certificate as the default one in nginx. I would discard this solution since it does not work for different domains (e.g. test).
Relevant cert-manager issue [5].
Any comments? Suggestions?
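As a rough illustration of the CronJob option, the core of such a job could be a function that takes the single secret produced by the Certificate and prepares one copy per tenant namespace. This is only a sketch operating on plain dictionaries: the secret name, the namespace names and the way the copies would be applied to the cluster are assumptions, not part of the current setup.

```python
import copy


def replicate_secret(source: dict, namespaces: list[str]) -> list[dict]:
    """Build one copy of `source` for every namespace other than its own.

    Only name, type and data are carried over, so server-generated metadata
    (uid, resourceVersion, ...) never leaks into the copies.
    """
    origin = source["metadata"]["namespace"]
    copies = []
    for namespace in namespaces:
        if namespace == origin:
            continue  # the original secret already lives here
        copies.append({
            "apiVersion": "v1",
            "kind": "Secret",
            "type": source.get("type", "kubernetes.io/tls"),
            "metadata": {
                "name": source["metadata"]["name"],
                "namespace": namespace,
            },
            "data": copy.deepcopy(source["data"]),
        })
    return copies
```

The resulting manifests would then have to be created or updated in each tenant namespace by whatever client the job uses; that part is deliberately left out.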
The email should come from a well-identified sender. Currently it comes from "netgroup,[email protected]", which doesn't identify who is sending the message. Better to change the sender to "CrownLabs service administrator [email protected]", so that recipients can easily identify what the message is about.
The email with the link to register to the service lasts only 12 hours. It should be extended to 48 hours. Otherwise professors will spend a huge amount of time re-registering students who did not click on the link in time.
The email received by the student MUST explicitly state the username to be used to register. Currently this information is missing, so students have no idea what their account is.
The login page (https://auth....) should be changed in order to ask explicitly for the username. Currently it asks for username or email address; if you try to log-in with the latter, the service refuses the access. Alternatively (honestly, preferred) the login procedure should support the email as well.
Not mandatory, but it would be nice to send another email telling the person that the registration procedure has been completed successfully, possibly with a link to update their personal data (password, user, etc).
Sometimes, the CrownLabs dashboard becomes unresponsive and the user is forced to close the browser tab. The problem appears to be related to the keycloak token, since it is triggered when it is necessary to re-enter the credentials in a different page (e.g. to access the desktop of a VM or NextCloud after a certain time).
Since the bug involves long waiting times, one simple way to reproduce it is (with reference to the chrome browser):
Minor aspect: when a VM instance is destroyed, the secrets associated with the ingresses are not deleted, polluting the namespace.
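One possible way to avoid leftover secrets is to rely on Kubernetes garbage collection: a namespaced object that lists another object of the same namespace in its ownerReferences is deleted when the owner is deleted. The sketch below only shows how such a reference could be attached to a secret manifest; which resource should act as the owner (here a LabInstance) and who would set the reference are assumptions.

```python
def with_owner(secret: dict, owner: dict) -> dict:
    """Return a copy of `secret` that declares `owner` in its ownerReferences.

    The input manifests are left untouched, and calling the function twice
    with the same owner does not duplicate the reference.
    """
    reference = {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": owner["metadata"]["uid"],
    }
    metadata = dict(secret["metadata"])
    others = [
        ref for ref in metadata.get("ownerReferences", [])
        if ref["uid"] != reference["uid"]
    ]
    metadata["ownerReferences"] = others + [reference]
    return {**secret, "metadata": metadata}
```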
This feature request proposes to implement a mechanism to automate and simplify the creation of new VM images for CrownLabs, as well as the subsequent conversion and upload to the target Docker registry.
The workflow currently envisioned for the creation of a new VM image (originated as a trade-off, hence not yet optimal) involves the following steps:
ImageConversionRequest, specifying the URL the VM can be downloaded from and the name to be assigned to the resulting image.
ImageConversionRequest, creating a Job to download, convert and push the specified image to the Docker registry. The image name must be prepended by an identifier of the creator, to prevent name overlapping.
In order to implement steps 4 to 6, the tasks currently identified involve the following.
ImageConversionRequest CRD.
Job to download, process and upload the VM image to the Docker Registry. Essentially, the tasks to be performed are those implemented by this script. Possible containers composing the job are:
ImageConversionRequest;
qcow2 format;
ImageConversionRequest resource, creates a new Job according to the previous specifications to process the CrownLabs VM image. The operator should also update the status of the resource, to provide feedback regarding how the conversion is proceeding. It is proposed to use kopf to develop the operator using python. An example of the usage of python to interact with Kubernetes resources is provided by this script.
ImageConversionRequest resources (both by the operator and a subset of end-users).
ImageConversionRequest, specifying the URL the image can be downloaded from and the destination name.
ImageConversionRequest resources, to provide feedback regarding how the conversion is proceeding and whether it succeeded or failed.
VMs running on clusters may have full Internet access. This may pose non-trivial security risks: since traffic is NATted, it may not be possible to trace malicious actions carried out on the Internet back to the originating VM.
This feature request proposes the creation of a "per-session logger", per-VM, which is able to save a record containing the most important information associated to each TCP connection established from the VM to the outside world.
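To make the idea of a per-connection record more concrete, the sketch below shows one possible shape for it, together with a JSON-lines serialization that could be appended to a per-VM log. All field names are hypothetical, and how the data would be collected is out of scope here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    """One TCP connection opened by a VM towards the outside world."""
    vm_id: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    started_at: str  # ISO 8601 timestamp, UTC
    ended_at: str    # ISO 8601 timestamp, UTC
    bytes_sent: int
    bytes_received: int


def to_log_line(record: ConnectionRecord) -> str:
    """Serialize a record as a single JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_log_line(line: str) -> ConnectionRecord:
    """Rebuild a record from a line produced by `to_log_line`."""
    return ConnectionRecord(**json.loads(line))
```

Keeping the source port and both timestamps in the record is what would allow a NATted flow observed outside the cluster to be matched back to a specific VM.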
Is your feature request related to a problem? Please describe.
Describe the solution you'd like
Additional context
Related to #213
Describe the bug
When a labinstance referencing a labtemplate whose labNum field is non-numeric is created, the operator crashes.
To Reproduce
Steps to reproduce the behavior:
labNum to any non-numeric string
Expected behavior
The reconciliation should be completed correctly or, alternatively, the insertion of the incorrect labtemplate should fail in the first place.
Additional context
In the short term, I would focus on fixing this bug to enable non-numeric values. Apparently, it is due to a mismatch between the CRD definition considering the labNum value of type string:
labNum is of type numeric:
Yet, I believe that in the longer term it is necessary to develop a cleaner and simpler version of the labtemplate CRD. Then, this field may no longer be necessary.
When a heavy load is started on the VM, the machine starts responding very slowly. However, the user is not aware of this and may therefore conclude that the cluster is not working properly.
Add a widget to the dashboard that shows the user's current usage in terms of CPU, memory and network, in order to warn when some critical parameter exceeds a given threshold.
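The threshold check behind such a widget can be very small. The sketch below assumes usage values and thresholds are expressed as fractions between 0 and 1 and keyed by metric name; both the metric names and the numbers are illustrative only.

```python
def exceeded(usage: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the names of the metrics whose usage is above their threshold.

    Metrics without a reported usage are treated as idle (0.0).
    """
    return sorted(
        metric
        for metric, limit in thresholds.items()
        if usage.get(metric, 0.0) > limit
    )
```

For example, with thresholds of 0.9 for CPU and memory, a VM reporting 95% CPU and 40% memory would only trigger the CPU warning.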
The cloud-init script fails to configure and mount the NextCloud disk in xubuntu 18.04
2020-04-20 16:35:37,943 - stages.py[INFO]: Loaded datasource DataSourceNoCloud - DataSourceNoCloud [seed=/dev/vdb][dsmode=net]
2020-04-20 16:35:38,084 - stages.py[INFO]: Applying network configuration from fallback bringup=False: {'ethernets': {'enp1s0': {'dhcp4': True, 'set-name': 'enp1s0', 'match': {'macaddress': '26:54:06:c4:9b:b0'}}}, 'version': 2}
{'type': 'physical', 'name': 'enp1s0', 'mac_address': '26:54:06:c4:9b:b0', 'match': {'macaddress': '26:54:06:c4:9b:b0'}, 'subnets': [{'type': 'dhcp4'}]}
{'enp1s0': {'dhcp4': True, 'set-name': 'enp1s0', 'match': {'macaddress': '26:54:06:c4:9b:b0'}}}
2020-04-20 16:35:38,850 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
in "<unicode string>", line 8, column 6:
- [https://crownlabs.polito.it/clou ...
^
found unexpected ':'
in "<unicode string>", line 8, column 11:
- [https://crownlabs.polito.it/cloud/rem ...
^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 16:35:38,862 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
in "<unicode string>", line 8, column 6:
- [https://crownlabs.polito.it/clou ...
^
found unexpected ':'
in "<unicode string>", line 8, column 11:
- [https://crownlabs.polito.it/cloud/rem ...
^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 16:35:38,862 - util.py[WARNING]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 140, in handle_part
self._merge_part(payload, headers)
File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 116, in _merge_part
(payload_yaml, my_mergers) = self._extract_mergers(payload, headers)
File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 95, in _extract_mergers
mergers_yaml = mergers.dict_extract_mergers(payload_yaml)
File "/usr/lib/python3/dist-packages/cloudinit/mergers/__init__.py", line 83, in dict_extract_mergers
raw_mergers = config.pop('merge_how', None)
AttributeError: 'NoneType' object has no attribute 'pop'
2020-04-20 16:35:39,478 - __init__.py[INFO]: Created new group lxd
2020-04-20 16:35:48,094 - cc_apt_configure.py[INFO]: No custom template provided, fall back to builtin
2020-04-20 17:13:30,313 - stages.py[INFO]: Loaded datasource DataSourceNoCloud - DataSourceNoCloud [seed=/dev/vdb][dsmode=net]
2020-04-20 17:13:30,960 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
in "<unicode string>", line 8, column 6:
- [https://crownlabs.polito.it/clou ...
^
found unexpected ':'
in "<unicode string>", line 8, column 11:
- [https://crownlabs.polito.it/cloud/rem ...
^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 17:13:30,986 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
in "<unicode string>", line 8, column 6:
- [https://crownlabs.polito.it/clou ...
^
found unexpected ':'
in "<unicode string>", line 8, column 11:
- [https://crownlabs.polito.it/cloud/rem ...
^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 17:13:30,987 - util.py[WARNING]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 140, in handle_part
self._merge_part(payload, headers)
File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 116, in _merge_part
(payload_yaml, my_mergers) = self._extract_mergers(payload, headers)
File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 95, in _extract_mergers
mergers_yaml = mergers.dict_extract_mergers(payload_yaml)
File "/usr/lib/python3/dist-packages/cloudinit/mergers/__init__.py", line 83, in dict_extract_mergers
raw_mergers = config.pop('merge_how', None)
AttributeError: 'NoneType' object has no attribute 'pop'
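The warnings above point at an unquoted URL inside a YAML flow sequence (`- [https://...`): a plain scalar containing ':' is rejected in that context, so the whole cloud-config part fails to load. One way to avoid this when generating the cloud-init data is to double-quote every item of the sequence. The sketch below does that with the standard json module, relying on the fact that a JSON array of strings is also a valid YAML flow sequence; the URL, mount point and filesystem type in the usage note are placeholders, not the values actually used by CrownLabs.

```python
import json


def flow_entry(items: list[str]) -> str:
    """Render `items` as one YAML block-list entry holding a flow sequence.

    Every item is emitted as a double-quoted scalar, so characters such as
    ':' inside a URL are no longer interpreted by the YAML parser.
    """
    return "- " + json.dumps(items)
```

Calling `flow_entry(["https://example.org/dav", "/media/example", "davfs"])` yields `- ["https://example.org/dav", "/media/example", "davfs"]`.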
CrownLabs includes a set of different services, each one with its strengths and weaknesses. Furthermore, it leverages an innovative interaction with the API server, in which each client (running in the browser) directly interacts with the API server itself, without an intermediate backend.
This feature request suggests carrying out a proper security assessment to make sure that the current installation is secure enough to be offered on the public Internet.
A better name would help people understand what this folder is about.
I suggest changing it to "myfolder" or (probably better) "MyFolder", whichever you feel is more appropriate.
Noticed that the crown favicon is displayed on the base url but after entering the app and logging in the favicon becomes the default one of the theme. @palexster could this be related to the ingress configuration?
Objective: CrownLabs is quite effective at hosting laboratories and managing different courses. In particular, desktop sharing is very useful to support multiple people working at the same time (e.g., teammates), help from instructors, and help from other people (e.g., other students) who can assist a student who is stuck on a problem.
However, desktop sharing has to be disabled during exams, allowing only professors to connect to students' desktops.
So, it would be nice to support an "exam mode" in CrownLabs, where:
This issue is probably related to #168. When we logout from Crownlabs the same operation isn't performed in NextCloud: the session with NextCloud remains open and it is still possible to access our storage.
Describe the bug
It is impossible to upload new files to NextCloud (both via the web interface and through webdav). Indeed, the file appears to be uploaded but then NextCloud displays a popup reading "An unknown error has occurred" and the file is deleted. Instead, empty folders and files can be created from the web interface correctly.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The file should be uploaded correctly
Desktop (please complete the following information):
Additional context
One of the database pods is returning strange logs (may or may not be related to this issue):
$ kubectl logs -n nextcloud nextcloud-db-cluster-1
/var/run/postgresql:5432 - rejecting connections
2020-05-15 09:07:29,766 INFO: Lock owner: nextcloud-db-cluster-2; I am nextcloud-db-cluster-1
2020-05-15 09:07:29,767 INFO: Still starting up as a standby.
2020-05-15 09:07:29,768 INFO: Lock owner: nextcloud-db-cluster-2; I am nextcloud-db-cluster-1
2020-05-15 09:07:29,768 INFO: does not have lock
2020-05-15 09:07:29,769 INFO: establishing a new patroni connection to the postgres cluster
2020-05-15 09:07:30,492 INFO: establishing a new patroni connection to the postgres cluster
2020-05-15 09:07:30,518 WARNING: Retry got exception: 'connection problems'
2020-05-15 09:07:30,519 INFO: Error communicating with PostgreSQL. Will try again later
Additionally, if I remember correctly, the nextcloud-db-cluster-1 pod was forcefully killed while draining the node to upgrade the system OS.
After creating the VM, the dashboard leaves the VM as "In progress", even if the VM is correctly ready and accessible:
apiVersion: instance.crown.team.com/v1
kind: LabInstance
metadata:
  creationTimestamp: "2020-03-30T13:19:01Z"
  generation: 1
  name: landc-lab1-alex.palesandro-5877
  namespace: tenant-alex-palesandro
  resourceVersion: "17484434"
  selfLink: /apis/instance.crown.team.com/v1/namespaces/tenant-alex-palesandro/labinstances/landc-lab1-alex.palesandro-5877
  uid: c930f429-a7ae-42f8-97c3-b46307ff9e4a
spec:
  labTemplateName: landc-lab1
  labTemplateNamespace: course-landc
  studentId: alex.palesandro
status:
  observedGeneration: 1
  phase: VmiReady
  url: https://crownlabs.polito.it/aa4b6712-63d4-41c9-9cd0-ff67ad5f7b77
I would expect that after the login, the VM becomes green.
The filter on namespaces is very simple at the moment: we only use a namespacePrefix argument, which restricts the reconcile method to LabInstances in the namespaces with that prefix.
It would be nice to have a more generic whitelist, for example something similar to the matchLabels used by Kubernetes.
@palexster is working on this in #149
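For reference, the matchLabels semantics is simple to express: an object matches when every key/value pair of the selector is present among its labels, and an empty selector matches everything. The sketch below applies that rule to namespace labels; the label keys in the usage note are invented for the example and do not reflect what #149 implements.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True when every key/value pair in `selector` appears in `labels`."""
    return all(labels.get(key) == value for key, value in selector.items())


def namespaces_to_reconcile(
    namespaces: dict[str, dict[str, str]], selector: dict[str, str]
) -> list[str]:
    """Filter a {namespace name: labels} mapping with a matchLabels-style selector."""
    return sorted(
        name for name, labels in namespaces.items() if matches(selector, labels)
    )
```

With a selector such as `{"example.org/managed": "true"}`, only the namespaces carrying that exact label would be reconciled, regardless of their name prefix.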
Currently, the content of the persistent folder on VMs is lost when we stop and restart the machine. The content of this folder should persist across VM reboots.
The content of the persistent folder is not shared among VMs of different labs. This folder should be a property of the user, hence should be mounted on all VMs belonging to the same user.
So far, when we run kubectl get labinstances.instance.crown.team.com we obtain just the basic information about the resources. It would be nice to customize the CRD definition in order to have at least:
This issue defines the path to refactor the code of the webservice folder.
The path described aims to provide a way to refactor the code while still being able to develop new features. This is not an ideal solution, because there will be some new features that will not match the new refactor style until it is fully completed. Also, some time will be spent to refactor the new features in the 'new' way. After the refactoring is complete all new parts of the code will need to respect the new guidelines and architecture.
from this point forward all coding guidelines and ESLint must be respected
/app) #318
npm run prettify-all
The adoption of ESLint will be incremental: first disabled, to become familiar with it; then enabled with some rules turned off; later, further rules could be added for better code quality.
Complete checklist proposed here
Useful links:
The VM is listed in the "starting" state (orange hourglass) forever, without recognizing that the VM is actually ready and the user can connect to it.
Bug reported mostly on Firefox and Safari; it has also appeared sporadically on Chrome.
Temporary solution: log out of Crownlabs, re-enter again, and usually the VM has become green.
In order to go public, we have to move those files to CrownOps, since they are specific to our infrastructure.
The CrownLabs webpage started to consume too many resources (in my case almost 3GB of RAM and all the CPU cores).
Currently there are no instructions about how to create a "void" VM here.
I would suggest to move the procedure we have for Computer Network courses here, at least for what concerns the initial steps.
The names of the resources are based on the LabTemplate name. There are no checks for possibly troublesome names (for example maximum length, accepted characters...), and there are also some mismatches in the resource names (for example, the LabInstance has a random number different from the one of all the other resources).
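A minimal sketch of the kind of check that is missing, assuming the generated names must be valid DNS labels (lowercase alphanumerics and '-', starting and ending with an alphanumeric character, at most 63 characters), which is the constraint Kubernetes applies to several resource names. Deriving every name from one shared suffix would also address the mismatch mentioned above. The helper names are hypothetical.

```python
import re

# DNS label: lowercase alphanumerics or '-', alphanumeric at both ends.
DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63


def is_valid_label(name: str) -> bool:
    """Check `name` against the DNS label rules."""
    return len(name) <= MAX_LABEL_LENGTH and DNS_LABEL.fullmatch(name) is not None


def resource_name(template: str, suffix: str) -> str:
    """Build `<template>-<suffix>`, refusing names the cluster would reject."""
    name = f"{template}-{suffix}"
    if not is_valid_label(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name
```

Generating the suffix once per LabInstance and passing it to every call would give all the related resources the same number.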
VNCServer does not behave correctly and presents a blank screen when the VM is scheduled on a node with medium load. In particular, vncserver is started long before the network becomes available in the VM.
Strategy:
Currently our laboratories run in VMs. Each virtual machine requires its own underlying operating system, running on virtualized hardware. VMs, however, can take up a lot of system resources: each VM runs not just a full copy of an operating system, but also a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles and implies low scalability.
A possible solution is to use Containers.
With containers, instead of virtualizing the underlying computer like a virtual machine (VM), just the OS is virtualized. Docker containers are easily portable and less resource-intensive than virtual machines and this is what we need in our dynamic environment. When Docker runs a container, it runs an image inside it. This image is usually built by executing Docker instructions, which add layers on top of existing image or OS distribution.
We can use Containers to follow two different flavors:
Build a full image with a Desktop Environment and allow users to use it like we are doing now with vms.
This way we can build our laboratories from the same base layer, which allows us to use less storage in the registry.
The picture below shows the resources consumed by a container running on my laptop with Ubuntu 18.04, the LXDE desktop environment and a noVNC setup. As you can see, it uses about 300 MB of memory when idle and around 470 MB with Firefox running.
If you want try yourself you can run this command
docker run -p 6080:80 -v /dev/shm:/dev/shm dorowu/ubuntu-desktop-lxde-vnc
and when container starts browse http://127.0.0.1:6080/
We can also change the DE and use, for example, XFCE4, or share audio, but only if users run Linux distros. For more information you can see here
This is the best one in my opinion.
Instead of building an entire image with a Desktop Environment, we can use containers to deploy the specific application for the specific laboratory. An example is to run GNS3 in a container, as done here. With this method we will have some performance improvement, but we lose functionality:
CrownLabs would no longer be a desktop exposer but only an application exposer, so we would lose a lot of the great work already done.
We must study how to install the specific application that we will want to expose and all the minimum dependencies required to work.
How can we share, through noVNC, not only a single application window but also the pop-up windows, such as the terminals of the various routers in GNS3?
Needed to monitor how the system behaves with respect to one of the most important parameters for user experience, i.e., how fast the service starts.
Describe the bug
When deleting a student's labinstance from the professor view, the deletion fails with a "Resource not found, probably you have already destroyed it" error message. The instance is then removed from the list, but not actually deleted: it is still present after reloading the page. Conversely, this operation completes correctly if the instance was created by the professor himself.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The instance should be deleted correctly.
Additional context
Looking at the DELETE request, the problem is related to the target namespace of the operation. Indeed, it is always the one associated with the professor, while it should be the one where the instance resides. This is one example of a wrong URL (incorrectly pointing to my namespace): https://apiserver.crown-labs.ipv6.polito.it/apis/instance.crown.team.com/v1/namespaces/tenant-marco-iorio/labinstances/benchmarks-xubuntu-base-oc-0000.
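The fix essentially amounts to building the URL from the metadata of the instance being deleted instead of from the namespace of the logged-in user. The sketch below illustrates this in Python (the dashboard itself is not written in Python); the API path follows the URL quoted above, while the function name and the example values in the usage note are illustrative.

```python
from urllib.parse import quote

API_PATH = "/apis/instance.crown.team.com/v1"


def labinstance_url(api_server: str, instance: dict) -> str:
    """Build the URL of a LabInstance from the namespace stored in its own metadata."""
    metadata = instance["metadata"]
    namespace = quote(metadata["namespace"], safe="")
    name = quote(metadata["name"], safe="")
    return f"{api_server.rstrip('/')}{API_PATH}/namespaces/{namespace}/labinstances/{name}"
```

A DELETE issued against `labinstance_url(server, instance)` then targets the student's namespace even when the request is sent from the professor view.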
When the load in the VM is very high, connecting through VNC becomes difficult.
Try to prioritize the VNC daemon (and the companion services, such as websockify) in order to see if this improves the behavior of the machine.
Currently VMs mount the persistent disk as block storage.
Move to NFS, so that multiple VMs can mount the same disk, which can be shared among different labs.
The login button should be encapsulated in the new blog page and the login page should be replaced by the blog.
At the moment we watch the service which exposes the VM to know if it's in Ready state. This control loop sometimes gives some problems and in case of errors the goroutine continues to be executed forever.
To do: think about a different method to get the actual status of the VM.
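Whatever method is eventually chosen, one way to keep a check from running forever is to bound it with a deadline. The sketch below is language-agnostic in spirit (the operator itself uses goroutines) and simply polls a readiness callback until it succeeds or the timeout expires; the callback, the default timeout and the default interval are all assumptions.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout` seconds have elapsed.

    Returns True on success and False on timeout, so the caller can mark the
    instance as failed instead of leaving a background task running forever.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `clock` and `sleep` parameters exist only to make the loop testable without real waiting.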
This issue is probably related to #166. The logout button of NextCloud doesn't properly work. When we try to logout, there is a period of time in which the page loads something, then the NextCloud page refreshes without being logged out (the session remains open).