netgroup-polito / crownlabs

Kubernetes-based Remote Laboratories

License: Apache License 2.0

Dockerfile 1.67% Makefile 0.29% Go 59.29% CSS 0.32% JavaScript 4.22% Shell 4.00% Python 2.16% FreeMarker 0.21% Mustache 1.65% HTML 0.24% TypeScript 25.36% Less 0.43% C 0.01% C++ 0.01% Smarty 0.15%

golang kubebuilder kubernetes kubevirt novnc reactjs

crownlabs's Issues

Better title of VM noVNC tab

Is your feature request related to a problem? Please describe.
When a user has multiple VMs open in the same browser it is hard knowing the tab corresponding to a particular VM.

Describe the solution you'd like
It could be useful to provide in the title of the noVNC tab the id of the VM and after its description.

Additional context
I have no idea if this is possible or where to configure it, but I think it could be useful

[New feature] Add snapshots to VMs

#373 introduced support for persistent VMs, making it possible to create VMs whose disk is kept even when the VM is turned off. However, users still need to upload the base image of their VMs, which is a long and error-prone process (and not available to all users).
This feature request proposes exploring KubeVirt's ability to pause/unpause VMs in order to create snapshots of a running VM on demand. A user could then modify their own VM and create a new "base" image to build upon, without having to create a new VM offline and upload the disk.

Add Prettier as mandatory check in CI Pipeline

Prettier ensures that the TS code is formatted according to the project specifications. It should be added as a mandatory check on every new PR.
Items:

Add the check as a new GitHub check: ./node_modules/.bin/prettier --check */**/*.{js,json,css}
Document how to run Prettier over the project: npm run prettify
Document how to enable the git pre-commit hook

[New feature] Support for inter-VM LAN

Currently, VMs work in isolation. There are cases in which different VMs should be allocated on the same LAN, sharing the same broadcast domain.
This feature request suggests introducing support for services that require multiple VMs running at the same time, possibly with a given network setup.

[New feature] Support for SSH-only VMs

In some cases, console mode is enough. This would greatly reduce the CPU requirements on the cluster.
It would be helpful to also support SSH-only backend services, which may require a different "ingress controller" able to demultiplex SSH sessions toward the involved VM.

Disable 'logout' button from VM

'Logout' hangs the VM: processes are still active, but the user can no longer log in and is forced to reboot the machine.
It is therefore better to disable this option in the machine. This should be possible by configuring the VM to work in "kiosk mode".

Add APIs for Course Creation Workflow

So far, courses and users are created through Python scripts that have to be launched manually. It would be nice to expose this workflow via APIs, without having to rewrite all the Python code, which is complete and pretty stable.

Proposal: create a kopf-based operator which consumes dedicated CRDs for courses, students and professors (define APIs to generate the CRDs and instrument the existing code to handle CRD reconciliation).

Suboptimal layout of the running VMs window in the professor view

Improvements:

Change the size of the running VMs list in the professor view: it is better to use the full screen height
~~Change padding and margin, it is better to have a narrower list.~~
Change the text color in the list: it is now black on a green background; we could keep black and use a lighter green (lime green, for example)
The GitHub logo has disappeared

Bug fix:

The button to kill a VM has disappeared

Avoid authentication in VNC connection

The authentication looks redundant, as the password is known to everybody.
It seems that TigerVNC server supports the "noauth" mode:
https://serverfault.com/questions/376302/tigervnc-ssh-without-a-vnc-password

Not clear if this has to be coupled with a modification to the NoVNC code in order to support this feature:
novnc/noVNC#551

The connection to a new VM fails with 500 Internal Server Error

Problem description

Sometimes, it is impossible to connect to the remote desktop of a newly created VM and the 500 Internal Server Error is returned.

Workaround

The current workaround consists in deleting the VM instance and creating a new one. The problem should not persist.

Additional information

The bug is related to oauth2-proxy: checking the network tab of the browser developer tools, the authentication process is retried multiple times, and the oauth2-proxy logs show that the cookie is considered invalid (invalid signature).
oauth2-proxy is configured to use cookie storage, where it saves the entire token returned by Keycloak.
oauth2-proxy automatically splits cookies larger than 4K, since they are not supported by most browsers. Unfortunately, this feature does not work well (many issues, we had to temporarily merge an existing PR to fix a bug and the developers seem willing to deprecate it and adopt alternative solutions).
The problem is not deterministic (it appears with some instances but not with others). Yet, once hit, it afflicts all users (at least the ones with token >4K - the length depends on the number of courses a user can access).
Currently, the only alternative solution to the cookie storage consists in using a redis backend, introducing quite a lot of complexity given the number of oauth2-proxy instances we have to manage.
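To make the failure mode easier to picture, here is a minimal sketch of the general cookie-splitting idea. It is not the oauth2-proxy implementation: the function names, the `_0`, `_1`, ... suffix scheme and the 4096-byte limit are assumptions made for illustration only.

```python
# Illustrative only: NOT the oauth2-proxy implementation.
MAX_COOKIE_BYTES = 4096  # assumed per-cookie limit (name + "=" + value)


def split_cookie(name: str, value: str, limit: int = MAX_COOKIE_BYTES) -> dict[str, str]:
    """Split an ASCII cookie value into numbered cookies that each fit in `limit`."""
    if len(name) + 1 + len(value) <= limit:
        return {name: value}
    chunks: dict[str, str] = {}
    index = 0
    pos = 0
    while pos < len(value):
        chunk_name = f"{name}_{index}"  # hypothetical naming scheme
        room = limit - len(chunk_name) - 1  # space left after "name="
        if room <= 0:
            raise ValueError("cookie name too long for the given limit")
        chunks[chunk_name] = value[pos:pos + room]
        pos += room
        index += 1
    return chunks


def join_cookie(name: str, cookies: dict[str, str]) -> str:
    """Rebuild the original value from a single cookie or from its numbered parts."""
    if name in cookies:
        return cookies[name]
    parts = []
    index = 0
    while f"{name}_{index}" in cookies:
        parts.append(cookies[f"{name}_{index}"])
        index += 1
    if not parts:
        raise KeyError(name)
    return "".join(parts)
```

With a scheme like this, a lost or stale chunk yields a truncated or mixed value, and a signature check on the reassembled cookie would then fail. Whether this is the mechanism behind the errors above has not been confirmed.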

Dry Run - Problems

High number of Failed VMs to investigate
Lack of metrics scraping from Kubevirt
Lack of dedicated Dashboard to analyze platform consumption
Lack of ServiceMonitor on created VMs
Timeout of 5 minutes to have updated interface (i.e. timeout watching)
Lack of real network traffic dashboards
Nginx metrics scraping
NoVNC disconnects when using the interface
Multiple VMs of the same type share the same ingress

Hitting let's encrypt rate limits for crownlabs.polito.it

Currently, every ingress resource providing access to tenant VMs triggers the creation of a Certificate resource to fetch the certificate and fill the corresponding secret (through the cert-manager.io/cluster-issuer annotation).

Although all ingresses point to the same domain (i.e. crownlabs.polito.it), an Order is issued each time and the certificate request counts as a renewal. As a result, we are hitting the Let's Encrypt rate limits (5 renewals per week per domain).

Apparently, this issue is not causing any visible problem right now (not sure why). Yet, it will break everything when the certificate needs to be renewed (the current one expires on June 15th).

Possible solutions identified up to now:

Set-up kubed to automatically synchronize secrets between namespaces [1], [2]. This appears to be the solution suggested by cert-manager [3].
Set-up reflector to automatically synchronize secrets between namespaces [4]. It seems to include a cert-manager extension.
Configure a CronJob to copy the secret generated by one single Certificate to all tenant namespaces.
Configure the crownlabs.polito.it certificate as the default one in nginx. I would discard this solution since it does not work for different domains (e.g test).

Relevant cert-manager issue [5].
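As a rough illustration of the CronJob option, the core of the copy step could look like the sketch below. It only manipulates manifests as plain dictionaries; how they are read from and applied to the cluster is left out. The function name and the example namespaces are hypothetical.

```python
import copy


def replicate_secret(secret: dict, namespaces: list[str]) -> list[dict]:
    """Return one copy of `secret` per target namespace.

    Cluster-assigned metadata (uid, resourceVersion, ...) is dropped so that
    each copy can be applied as a fresh object in its namespace.
    """
    source_ns = secret["metadata"].get("namespace")
    copies = []
    for ns in namespaces:
        if ns == source_ns:
            continue  # the source namespace already holds the secret
        clone = copy.deepcopy(secret)
        clone["metadata"] = {"name": secret["metadata"]["name"], "namespace": ns}
        copies.append(clone)
    return copies
```

With this approach a single Certificate resource is enough, so only one Order is issued no matter how many tenant namespaces exist.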

Any comments? Suggestions?

Operator Improvements

List here and add some possible improvements we could make on the operator

VM ready status watcher: #138
Resources names: #139
Namespace filter: #140
Expose metrics about its execution and statistics about launched VMs
Increase the level of information obtained by getting a LabInstance ( #145)

Improve registration/login experience

The email should come from a well-identified sender. Currently it comes from "netgroup,[email protected]", which doesn't identify who is sending the message. It would be better to change the sender to "CrownLabs service administrator [email protected]", so that recipients can easily tell what the message is about.
The link in the registration email is valid for only 12 hours. This should be extended to 48 hours. Otherwise professors will spend a huge amount of time re-registering students who did not click on the link in time.
The email received by the student MUST explicitly state the username to be used to register. Currently this information is missing, so students have no idea what their account is.
The login page (https://auth....) should be changed to ask explicitly for the username. Currently it asks for username or email address; if you try to log in with the latter, the service refuses access. Alternatively (honestly, preferred), the login procedure should support the email as well.
Not mandatory, but it would be nice to send another email informing the user that the registration procedure has been completed successfully, possibly with a link to update their personal data (password, user, etc).

The CrownLabs dashboard stops responding

Problem description

Sometimes, the CrownLabs dashboard becomes unresponsive and the user is forced to close the browser tab. The problem appears to be related to the Keycloak token, since it is triggered when the credentials have to be re-entered on a different page (e.g. to access the desktop of a VM or NextCloud after a certain time).

How to reproduce

Since the bug involves long waiting times, one simple way to reproduce it is (with reference to the chrome browser):

Open the crownlabs dashboard
Open the developer tools -> Application -> Cookies -> https://auth.crown-labs/...
Delete the KEYCLOAK_IDENTITY/SESSION cookies
Wait a few seconds and the page will stop responding

LabInstance does not clean secrets upon deletion

Minor aspect: when a VM instance is destroyed, the secrets associated with the ingresses are not deleted, polluting the namespace.

[Feature] Automating the conversion of CrownLabs VM Images

This feature request proposes to implement a mechanism to automate and simplify the creation of new VM images for CrownLabs, as well as the subsequent conversion and upload to the target Docker registry.

Workflow overview

The workflow currently envisioned for the creation of a new VM image (originated as a trade-off, hence not yet optimal) involves the following steps:

The final user (e.g. professor) downloads to his own PC a ready-to-customize base image available on the CrownLabs website (with all the tools required for CrownLabs already installed).
The final user executes the VM in VirtualBox to customize the base image and install additional software required for his needs.
The final user uploads the resulting VM image to a public location (e.g. Google Drive).
The final user logs into the CrownLabs dashboard and creates a new ImageConversionRequest, specifying the URL the VM can be downloaded from and the name to be assigned to the resulting image.
A Kubernetes operator reacts to the creation of a new ImageConversionRequest, creating a Job to download, convert and push the specified image to the Docker registry. The image name must be prepended by an identifier of the creator, to prevent name overlapping.
The final user is notified about the success or the failure of the conversion task through his dashboard.
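The naming rule in the operator step (prepending an identifier of the creator) could be sketched as follows. The function name and the slug rules are assumptions for illustration; the actual naming scheme is not defined in this issue.

```python
import re


def prefixed_image_name(creator: str, image: str) -> str:
    """Build '<creator>/<image>' using only lowercase letters, digits and '-'."""

    def slug(text: str) -> str:
        cleaned = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
        if not cleaned:
            raise ValueError(f"cannot derive a valid name from {text!r}")
        return cleaned

    return f"{slug(creator)}/{slug(image)}"
```

Two users choosing the same image name then end up with distinct repositories, since the creator part differs.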

Practical tasks to perform

In order to implement the steps 4 to 6, the tasks currently identified involve the following.

Back-end

Define the ImageConversionRequest CRD.
Define a Job to download, process and upload the VM image to the Docker Registry. Essentially, the tasks to be performed are those implemented by this script. Possible containers composing the job are:
1. curlimages/curl, to download the image from the url specified in the ImageConversionRequest;
2. diraimondo/virt-sparsify or equivalent, to convert the image to the qcow2 format;
3. kaniko, to build the docker image and push it to the registry;
Develop an operator which, upon creation of an ImageConversionRequest resource, creates a new Job according to the previous specifications to process the CrownLabs VM image. The operator should also update the status of the resource, to provide feedback regarding how the conversion is proceeding. It is proposed to use kopf to develop the operator using python. An example of the usage of python to interact with Kubernetes resources is provided by this script.
Define the different resources necessary to run the operator, including the permissions to access ImageConversionRequest resources (both by the operator and a subset of end-users).
Write a README file describing the operator and the overall approach.

Front-end

Develop a form to create a new ImageConversionRequest, specifying the URL the image can be downloaded from and the destination name.
Track the status of existing ImageConversionRequest resources, to provide feedback regarding how the conversion is proceeding and whether it succeeded or failed.

Possible Follow-ups

Integrate the available object storage to prevent the users from being required to upload the images to a third-party service.

[New feature] Logging external network connections

VMs running on clusters may have full Internet access. This may pose non-trivial security risks: since traffic is NATted, it may not be possible to trace back to the originating VM when malicious actions are carried out on the Internet.
This feature request proposes the creation of a per-VM "per-session logger", able to save a record containing the most important information associated with each TCP connection established from the VM to the outside world.
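As a starting point for discussion, a record for one TCP connection might look like the sketch below. All field names are hypothetical; the issue does not specify which information has to be stored.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    """One outbound TCP connection, as seen before NAT (hypothetical fields)."""

    vm_id: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    started_at: str  # ISO 8601 timestamp


def to_log_line(record: ConnectionRecord) -> str:
    """Serialize the record as a single JSON line, suitable for append-only logs."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping the pre-NAT source address together with the VM identifier is what would make it possible to map an external report back to a specific VM.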

[Feature] Add Github star button

Is your feature request related to a problem? Please describe.

Our repo does not get much GitHub traffic

Describe the solution you'd like

Adding a GitHub star button would make it easier for users to star the project and get in touch with it
By adding this button we could remove the GitHub logo in the footer (which is never displayed)

Additional context

It is fairly easy to add the button to the UI
It is a bit late since now fewer people will use CrownLabs but could still be useful

Related to #213

The labinstance operator crashes if labNum is non numeric

Describe the bug
When a labinstance referencing a labtemplate with field labNum non numeric is created, the operator crashes.

To Reproduce
Steps to reproduce the behavior:

Create a new labtemplate, setting labNum to any non numeric string
Create a new labinstance, referring to the previously created labtemplate
Check the status of the operator

Expected behavior
The reconciliation should be completed correctly or, alternatively, the insertion of the incorrect labtemplate should fail in the first place.

Additional context
In the short term, I would focus on fixing this bug to enable non-numeric values. Apparently, it is due to a mismatch between the CRD definition considering the labNum value of type string:

CrownLabs/operators/labInstance-operator/labTemplate/crd/bases/template.crown.team.com_labtemplates.yaml

Lines 47 to 48 in 630f0ea

 labNum:
   type: string

and the API, where labNum is of type numeric:

CrownLabs/operators/labInstance-operator/labTemplate/api/v1/labtemplate_types.go

Line 28 in 630f0ea

LabNum resource.Quantity `json:"labNum,omitempty"`

Yet, I believe that in the longer term it is necessary to develop a cleaner and simpler version of the labtemplate CRD. Then, this field may be no longer necessary

Show resource consumption on the user's dashboard

When a heavy load is started on the VM, the machine starts responding very slowly. However, the user is not aware of this, and may conclude that the cluster is not working properly.
Add a widget to the dashboard that shows the user's current usage in terms of CPU, memory and network, and informs them when some critical parameter exceeds a given threshold.
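The threshold check behind such a widget could be as simple as the sketch below. The function name, the resource keys and the 90% default are assumptions; how the usage figures are collected is out of scope here.

```python
def exceeded(usage: dict[str, float], limits: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Return the resources whose usage is above `threshold` times their limit."""
    warnings = []
    for name, limit in limits.items():
        used = usage.get(name, 0.0)
        if limit > 0 and used / limit > threshold:
            warnings.append(name)
    return warnings
```

For instance, with a limit of 2 CPU cores and 4 GB of memory, a VM using 1.9 cores and 2 GB would be flagged for CPU only.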

The NextCloud disk is not mounted in xubuntu 18.04

Problem description

The cloud-init script fails to configure and mount the NextCloud disk in xubuntu 18.04

Logs

cloud-init log

2020-04-20 16:35:37,943 - stages.py[INFO]: Loaded datasource DataSourceNoCloud - DataSourceNoCloud [seed=/dev/vdb][dsmode=net]
2020-04-20 16:35:38,084 - stages.py[INFO]: Applying network configuration from fallback bringup=False: {'ethernets': {'enp1s0': {'dhcp4': True, 'set-name': 'enp1s0', 'match': {'macaddress': '26:54:06:c4:9b:b0'}}}, 'version': 2}
{'type': 'physical', 'name': 'enp1s0', 'mac_address': '26:54:06:c4:9b:b0', 'match': {'macaddress': '26:54:06:c4:9b:b0'}, 'subnets': [{'type': 'dhcp4'}]}
{'enp1s0': {'dhcp4': True, 'set-name': 'enp1s0', 'match': {'macaddress': '26:54:06:c4:9b:b0'}}}
2020-04-20 16:35:38,850 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
  in "<unicode string>", line 8, column 6:
      - [https://crownlabs.polito.it/clou ... 
         ^
found unexpected ':'
  in "<unicode string>", line 8, column 11:
      - [https://crownlabs.polito.it/cloud/rem ... 
              ^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 16:35:38,862 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
  in "<unicode string>", line 8, column 6:
      - [https://crownlabs.polito.it/clou ... 
         ^
found unexpected ':'
  in "<unicode string>", line 8, column 11:
      - [https://crownlabs.polito.it/cloud/rem ... 
              ^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 16:35:38,862 - util.py[WARNING]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 140, in handle_part
    self._merge_part(payload, headers)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 116, in _merge_part
    (payload_yaml, my_mergers) = self._extract_mergers(payload, headers)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 95, in _extract_mergers
    mergers_yaml = mergers.dict_extract_mergers(payload_yaml)
  File "/usr/lib/python3/dist-packages/cloudinit/mergers/__init__.py", line 83, in dict_extract_mergers
    raw_mergers = config.pop('merge_how', None)
AttributeError: 'NoneType' object has no attribute 'pop'
2020-04-20 16:35:39,478 - __init__.py[INFO]: Created new group lxd
2020-04-20 16:35:48,094 - cc_apt_configure.py[INFO]: No custom template provided, fall back to builtin
2020-04-20 17:13:30,313 - stages.py[INFO]: Loaded datasource DataSourceNoCloud - DataSourceNoCloud [seed=/dev/vdb][dsmode=net]
2020-04-20 17:13:30,960 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
  in "<unicode string>", line 8, column 6:
      - [https://crownlabs.polito.it/clou ... 
         ^
found unexpected ':'
  in "<unicode string>", line 8, column 11:
      - [https://crownlabs.polito.it/cloud/rem ... 
              ^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 17:13:30,986 - util.py[WARNING]: Failed loading yaml blob. Invalid format at line 8 column 6: "while scanning a plain scalar
  in "<unicode string>", line 8, column 6:
      - [https://crownlabs.polito.it/clou ... 
         ^
found unexpected ':'
  in "<unicode string>", line 8, column 11:
      - [https://crownlabs.polito.it/cloud/rem ... 
              ^
Please check http://pyyaml.org/wiki/YAMLColonInFlowContext for details."
2020-04-20 17:13:30,987 - util.py[WARNING]: Failed at merging in cloud config part from part-001
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 140, in handle_part
    self._merge_part(payload, headers)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 116, in _merge_part
    (payload_yaml, my_mergers) = self._extract_mergers(payload, headers)
  File "/usr/lib/python3/dist-packages/cloudinit/handlers/cloud_config.py", line 95, in _extract_mergers
    mergers_yaml = mergers.dict_extract_mergers(payload_yaml)
  File "/usr/lib/python3/dist-packages/cloudinit/mergers/__init__.py", line 83, in dict_extract_mergers
    raw_mergers = config.pop('merge_how', None)
AttributeError: 'NoneType' object has no attribute 'pop'

[New feature] Security assessment of CrownLabs

CrownLabs includes a set of different services, each one with its strengths and weaknesses. Furthermore, it leverages an innovative interaction with the API server, in which each client (running in the browser) directly interacts with the API server itself, without an intermediate backend.
This feature request suggests carrying out a proper security assessment to make sure that the current installation is secure enough to be offered on the public Internet.

Change name of "persistent" disk into "myfolder"

A better name would help people understand what this folder is for.
I suggest changing it to "myfolder" or (probably better) "MyFolder", whichever you feel is more appropriate.

Favicon doesn't show in React part of Crownlabs

Noticed that the crown favicon is displayed on the base url but after entering the app and logging in the favicon becomes the default one of the theme. @palexster could this be related to the ingress configuration?

Exam Mode

Objective: CrownLabs is quite effective at hosting laboratories and managing different courses. In particular, desktop sharing is very useful to support several people working at the same time (e.g., teammates), help from instructors, and help from other people (e.g., other students) who can assist a student who is stuck on a problem.
However, desktop sharing has to be disabled during exams, allowing only professors to connect to a student's desktop.
So, it would be nice to support an "exam mode" in CrownLabs, where:

Instances are only accessible to candidates and to professors for assistance
Network traffic to the Internet and to other VMs is blocked
Students are able to hand in their work (e.g., through Shared Storage)

Selected LabInstance index

User type: Professor
Device: Any

When selecting a LabInstance, multiple laboratories are sometimes selected at the same time because wrong indexes are assigned to the rows. This happens when you are a Professor, i.e. when there are LabInstances both as a normal user and as a privileged one.

Crownlabs logout

This issue is probably related to #168. When we log out of CrownLabs, the same operation isn't performed in NextCloud: the session with NextCloud remains open and it is still possible to access our storage.

Upload to NextCloud no longer works

Describe the bug
It is impossible to upload new files to NextCloud (both via the web interface and through webdav). Indeed, the file appears to be uploaded but then NextCloud displays a popup reading "An unknown error has occurred" and the file is deleted. Instead, empty folders and files can be created from the web interface correctly.

To Reproduce
Steps to reproduce the behavior:

Go to https://crownlabs.polito.it/cloud
Log-in
Try to upload a new file
NextCloud will show a popup reading "An unknown error has occurred"

Expected behavior
The file should be uploaded correctly

Desktop (please complete the following information):

OS: Ubuntu/Debian
Browser Firefox/Chrome
Version Up-to-date

Additional context
One of the database pods is returning strange logs (may or may not be related to this issue):

$ kubectl logs -n nextcloud nextcloud-db-cluster-1

/var/run/postgresql:5432 - rejecting connections
2020-05-15 09:07:29,766 INFO: Lock owner: nextcloud-db-cluster-2; I am nextcloud-db-cluster-1
2020-05-15 09:07:29,767 INFO: Still starting up as a standby.
2020-05-15 09:07:29,768 INFO: Lock owner: nextcloud-db-cluster-2; I am nextcloud-db-cluster-1
2020-05-15 09:07:29,768 INFO: does not have lock
2020-05-15 09:07:29,769 INFO: establishing a new patroni connection to the postgres cluster
2020-05-15 09:07:30,492 INFO: establishing a new patroni connection to the postgres cluster
2020-05-15 09:07:30,518 WARNING: Retry got exception: 'connection problems'
2020-05-15 09:07:30,519 INFO: Error communicating with PostgreSQL. Will try again later

Additionally, if I remember correctly, the nextcloud-db-cluster-1 pod has been killed forcefully while draining the node to upgrade the system O.S.

LabInstance does not become green (ready) on the dashboard, even if VM is ready

After creating the VM, the dashboard leaves the VM as "In progress", even if the VM is correctly ready and accessible:

apiVersion: instance.crown.team.com/v1
kind: LabInstance
metadata:
  creationTimestamp: "2020-03-30T13:19:01Z"
  generation: 1
  name: landc-lab1-alex.palesandro-5877
  namespace: tenant-alex-palesandro
  resourceVersion: "17484434"
  selfLink: /apis/instance.crown.team.com/v1/namespaces/tenant-alex-palesandro/labinstances/landc-lab1-alex.palesandro-5877
  uid: c930f429-a7ae-42f8-97c3-b46307ff9e4a
spec:
  labTemplateName: landc-lab1
  labTemplateNamespace: course-landc
  studentId: alex.palesandro
status:
  observedGeneration: 1
  phase: VmiReady
  url: https://crownlabs.polito.it/aa4b6712-63d4-41c9-9cd0-ff67ad5f7b77

I would expect that after the login, the VM becomes green.

Namespace filter

The namespace filter is very simple at the moment: we only use a namespacePrefix argument, which restricts the reconcile method to LabInstances in namespaces with that prefix.
It would be nice to have a more generic whitelist, for example something similar to the matchLabels used by Kubernetes.
@palexster is working on this in #149
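For comparison, the two filtering strategies can be sketched as follows. This is only an illustration of the matching logic, not the operator code; the function names and the example label are hypothetical.

```python
def selected_by_prefix(namespace: str, prefix: str) -> bool:
    """Current behaviour: reconcile only namespaces whose name starts with `prefix`."""
    return namespace.startswith(prefix)


def selected_by_labels(labels: dict[str, str], match_labels: dict[str, str]) -> bool:
    """matchLabels-style behaviour: every requested key must be present with the same value.

    As with Kubernetes label selectors, an empty `match_labels` selects everything.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())
```

A label-based filter decouples the selection from the naming convention: a namespace can be opted in or out by editing its labels, without being renamed.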

Persistent disk improvements

Currently, the content of the persistent folder on VMs is lost when we stop and restart the machine. The content of this folder should persist across VM reboots.
The content of the persistent folder is not shared among VMs of different labs. This folder should be a property of the user, hence should be mounted on all VMs belonging to the same user.

Increase the level of information when getting a LabInstance

So far, when we run kubectl get labinstances.instance.crown.team.com we obtain just the basic information about the resources. It would be nice to customize the CRD definition in order to show at least:

Status
Node
URL

/webservice refactoring

This issue defines the path to refactor the code of the webservice folder.

Evolution path

The path described aims to provide a way to refactor the code while still being able to develop new features. This is not an ideal solution, because there will be some new features that will not match the new refactor style until it is fully completed. Also, some time will be spent to refactor the new features in the 'new' way. After the refactoring is complete all new parts of the code will need to respect the new guidelines and architecture.

Phase 1 - cleaning

set up ESLint (for the moment just to get used to it; not required for commits)
clean package.json
- redistribute packages between dependencies and devDependencies
- remove unnecessary packages
optimize MUI imports (change non-defaults to defaults and use @material-ui/core)
remove unnecessary components (Home.js)
completely remove bootstrap, w3c and unused CSS files

Phase 2 - prepare to major code changes

remove unnecessary CSS
setup ESlint (on the pre-commit hook and GitHub action)
add coding guidelines to repo guide for contributing

from this point forward all coding guidelines and ESLint must be respected

Phase 3 - write documentation and refactor code

move dashboard to a subroute (could be /app) #318
create a single component for lists, which will be filled with different props
improve mobile visualization
make only one watcher active at the time (prof or student)
write what the GUI has to do
write component tree of new project
write documentation of the project and start refactoring: these 2 steps can be done in parallel. There is no need to have all the documentation written before starting the refactoring; a part of the docs can be written and then implemented. This is to avoid freezing development.
confirm development pipeline in order not to do->decide->redo:
1. propose mockup (tools or code) to other team members on Slack
2. team accepts final mockup on Slack
3. define react implementation (components, components tree if design is too complex)
4. implementation

Additional out-of-order refinements

look further into the feasibility of a /login page: due to the tight integration with Keycloak it is not easy to build it in React, but it could be possible to build it using a Keycloak template
check for bundle size optimization with webpack tree-shaking (especially for K8s_library)
improve the compatibility of npm run prettify-all

Note on ESLint

The adoption of ESLint will be incremental: first non-blocking, to get familiar with it, then enforced with some rules disabled; further rules could be added in the future for better code quality

Optimize repo SEO to reach community

Complete checklist proposed here

add PR template
add issue template
add code of conduct
add contributing guidelines

VM not reported as "started" even after long time

The VM is listed in the "starting" state (orange hourglass) forever; the dashboard does not recognize that the VM is actually ready and that the user can connect to it.
The bug is reported mostly on Firefox and Safari; in some sporadic cases it is also present on Chrome.

Temporary solution: log out of CrownLabs and log in again; usually the VM has become green.

Move Infrastructure files to CrownOps

In order to go public, we have to move those files to CrownOps, since they are specific to our infrastructure.

Cluster-setup directory move
Add README.md to cluster-config explaining why those files are here and why they are useful for a user.
Move cluster-config to another name? Infrastructure?
Move user generation script away from infrastructure part to an application folder
Rename folders by function and not by technology
Move ingresses to pointed feature

Crownlabs webpage drains resources

The CrownLabs webpage started to drain too many resources (in my case almost 3GB of RAM and all the CPU cores)

Add VM creation sample

Currently there are no instructions about how to create a "void" VM here.
I would suggest moving the procedure we have for the Computer Network courses here, at least as far as the initial steps are concerned.

Names of resources created by the operator

The names of the resources are based on the LabTemplate name. There are no checks for potentially troublesome names (for example maximum length or accepted characters), and there are also some mismatches between resource names (for example, the LabInstance has a random number different from the one used by all the other resources).
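One possible way to derive safe and consistent names is sketched below. It assumes the strictest common constraint, a DNS label of at most 63 lowercase alphanumeric characters or '-', and reuses the same random suffix for every resource of an instance. The function name and its parameters are hypothetical.

```python
import re

MAX_LABEL_LENGTH = 63  # assumed limit: DNS label length
VALID_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def resource_name(template: str, student: str, suffix: str) -> str:
    """Build '<template>-<student>-<suffix>' as a valid, length-bounded DNS label."""
    if not re.fullmatch(r"[a-z0-9]+", suffix):
        raise ValueError("suffix must be lowercase alphanumeric")
    raw = f"{template}-{student}".lower()
    base = re.sub(r"[^a-z0-9-]+", "-", raw).strip("-")
    room = MAX_LABEL_LENGTH - len(suffix) - 1  # keep space for '-<suffix>'
    base = base[:room].rstrip("-")
    if not base:
        raise ValueError("template and student do not yield a valid name")
    return f"{base}-{suffix}"
```

Passing the same suffix when naming the LabInstance and all the resources derived from it would also remove the mismatch between the random numbers.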

VNCserver shows blank screen when nodes are on medium load

VNCServer does not behave correctly and presents a blank screen when the VM is scheduled on a node with medium load. In particular, vncserver is started long before the network becomes available in the VM.

Strategy:

Let the VNC service, and consequently noVNC, wait for network-online.target before being started.
Reduce the number of enabled services in the VM to reduce disk operations (mandb, log-rotate)

[New feature] Container-based lab-instances

Problem

Currently our laboratories run in VMs. Each virtual machine requires its own operating system on top of virtualized hardware, so VMs can take up a lot of system resources: each one runs not just a full copy of an operating system, but also a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of RAM and CPU cycles and implies low scalability.

Possible Solutions

A possible solution is to use Containers.
With containers, only the OS is virtualized, rather than the underlying computer as with a virtual machine (VM). Docker containers are easily portable and less resource-intensive than virtual machines, which is what we need in our dynamic environment. When Docker runs a container, it runs an image inside it. This image is usually built by executing Docker instructions, which add layers on top of an existing image or OS distribution.

We can use containers in two different flavors:

First flavor: build a full image with a desktop environment and let users work with it as they do now with VMs.
This way we can build all our laboratories from the same base layer, which reduces the storage used in the registry.
As a reference, a container running on my laptop with Ubuntu 18.04, the LXDE desktop environment and a noVNC setup uses about 300 MB of memory when idle and around 470 MB with Firefox open.

If you want to try it yourself, you can run this command
docker run -p 6080:80 -v /dev/shm:/dev/shm dorowu/ubuntu-desktop-lxde-vnc and, once the container starts, browse to http://127.0.0.1:6080/
We can also change the DE and use, for example, XFCE4, or share audio, although the latter works only if users run a Linux distro. For more information you can see here
In my opinion, this is the best option.
Second flavor: instead of building an entire image with a desktop environment, we can use containers to deploy only the specific application needed by each laboratory. An example is running GNS3 in a container, as done here. This method brings some performance improvement, but we lose functionality:
CrownLabs would no longer be a desktop exposer but only an application exposer, so much of the work done so far would be lost.
We must study how to install each application we want to expose, together with the minimum set of dependencies it requires.
How can we share through noVNC not only the main window of the application, but also pop-up windows such as the terminals of the various routers in GNS3?

Potential Benefits

Improved Scalability
Faster boot
Layered Filesystems for lighter downloads and better caching

Potential Risks

VMs are more secure than containers. Containers carry more security risks and vulnerabilities because they share the host kernel, whereas in a virtual machine you do not get direct access to the host resources and the hypervisor restricts what the VM can use. We must therefore check whether containers introduce new security problems and how to solve them.
Adapting our environment to run containers instead of VMs requires new testing and bug fixing.
Some actions cannot be performed in a container with normal privileges (e.g., tcpdump issues, binding on 0.0.0.0)

Possible steps

Adapt the LabTemplate/LabInstance API to support both VM and container workloads
- Extract the kubevirt details
- Add the Type field
Adapt the operator logic to create different objects depending on whether the target laboratory is a VM or a container.
Define Pod Security Policies (PSP) to constrain the pod capabilities.
Adapt VM generation scripts to work with containers.
Frontend: Show whether the laboratory is a VM or a container, along with accessory information
Operator: Introduce a different readiness check for container laboratories
Define a base Dockerfile which respects all the security best practices
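The first two steps (a Type field in the API, and operator logic that branches on it) can be sketched as follows. This is a Python illustration, not the operator's Go code: `WorkloadType`, `LabTemplateSpec`, `objects_to_create` and the list of object kinds are all assumptions; only `VirtualMachineInstance` (KubeVirt) and `Deployment`, `Service`, `Ingress` (Kubernetes) are real kind names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class WorkloadType(Enum):
    VM = "vm"
    CONTAINER = "container"


@dataclass
class LabTemplateSpec:
    name: str
    image: str
    # Defaulting to VM keeps templates written before the field existed valid.
    type: WorkloadType = WorkloadType.VM


def objects_to_create(spec: LabTemplateSpec) -> List[str]:
    """Return the kinds a (hypothetical) operator would create for `spec`."""
    common = ["Service", "Ingress"]  # assumed to be shared by both flavors
    if spec.type is WorkloadType.VM:
        return ["VirtualMachineInstance"] + common
    if spec.type is WorkloadType.CONTAINER:
        return ["Deployment"] + common
    raise ValueError(f"unsupported workload type: {spec.type}")


print(objects_to_create(LabTemplateSpec("gns3", "example/gns3", WorkloadType.CONTAINER)))
```

Keeping the KubeVirt-specific details out of the shared part of the spec is what makes the second branch possible without touching existing VM templates.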

Add a metric about VM starting time in Prometheus

This is needed to monitor one of the most important parameters for the user experience, i.e., how quickly the service starts.
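A rough sketch of what such a metric could record, assuming the startup time is measured from the creation of the instance to the moment the VM is reported ready. It uses only the Python standard library and mimics the cumulative "less than or equal" buckets of a Prometheus histogram; the bucket bounds and the `StartupHistogram` class are hypothetical, and a real implementation would use the Prometheus client library of the operator's language instead.

```python
from bisect import bisect_left
from datetime import datetime

# Hypothetical bucket upper bounds, in seconds.
BUCKETS = [10, 30, 60, 120, 300]


def startup_seconds(created: datetime, ready: datetime) -> float:
    """Elapsed time between instance creation and VM readiness."""
    return (ready - created).total_seconds()


class StartupHistogram:
    """Histogram with cumulative buckets, in the style Prometheus uses."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = list(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # last slot is +Inf
        self.total = 0.0

    def observe(self, seconds: float) -> None:
        # Index of the first bucket whose upper bound is >= the observation.
        self.counts[bisect_left(self.buckets, seconds)] += 1
        self.total += seconds

    def cumulative(self):
        """Per-bucket counts of observations <= each upper bound."""
        out, running = [], 0
        for count in self.counts:
            running += count
            out.append(running)
        return out


h = StartupHistogram()
h.observe(startup_seconds(datetime(2020, 1, 1, 9, 0, 0), datetime(2020, 1, 1, 9, 0, 42)))
print(h.cumulative())
```

A histogram rather than a single gauge makes it possible to look at percentiles of the startup time, which is usually what matters for the user experience.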

Minor problems in the CrownLabs dashboard

Problem description

The logout button redirects the user to the login form instead of the home page.
The GitHub image in the footer is not displayed correctly (maybe it is related to the ingress configuration).
The Student/Professor Area button stays selected after being clicked.

Impossible to delete student VMs from the professor view

Describe the bug
When deleting a student's LabInstance from the professor view, the deletion fails with a "Resource not found, probably you have already destroyed it" error message. The instance is then removed from the list, but it is not actually deleted: it is still present after reloading the page. Conversely, the operation completes correctly if the instance was created by the professor himself.

To Reproduce
Steps to reproduce the behavior:

Go to crownlabs.polito.it
Open the professor view
Select one running laboratory (not yours)
Click the delete button
See the Toastr error

Expected behavior
The instance should be deleted correctly.

Additional context
Looking at the DELETE request, the problem is related to the target namespace of the operation: it is always the one associated with the professor, while it should be the one where the instance resides. This is one example of a wrong URL (incorrectly pointing to my namespace): https://apiserver.crown-labs.ipv6.polito.it/apis/instance.crown.team.com/v1/namespaces/tenant-marco-iorio/labinstances/benchmarks-xubuntu-base-oc-0000.
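The fix amounts to building the URL from the metadata of the instance being deleted rather than from the logged-in user's namespace. A minimal sketch in Python (the actual frontend is not written in Python; `labinstance_url` and the `tenant-some-student` namespace are hypothetical, while the API server, group and version are taken from the URL above):

```python
from urllib.parse import quote

API_SERVER = "https://apiserver.crown-labs.ipv6.polito.it"
GROUP_VERSION = "instance.crown.team.com/v1"


def labinstance_url(instance: dict) -> str:
    """Build the resource URL from the instance's OWN namespace and name."""
    meta = instance["metadata"]
    return (
        f"{API_SERVER}/apis/{GROUP_VERSION}"
        f"/namespaces/{quote(meta['namespace'], safe='')}"
        f"/labinstances/{quote(meta['name'], safe='')}"
    )


# Hypothetical student instance, as it could appear in the professor view.
student_instance = {
    "metadata": {
        "namespace": "tenant-some-student",
        "name": "benchmarks-xubuntu-base-oc-0000",
    }
}
print(labinstance_url(student_instance))
```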

Prioritize VNC connection in the user's VM?

When the load in the VM is very high, the VNC connection becomes hard to use.
Try giving a higher priority to the VNC daemon (and its companion services, such as websockify) to see whether this improves the behavior of the machine.

[EPIC] Move VM persistent storage to Webdav

Currently VMs mount the persistent disk as block storage.
Move to NFS, so that multiple VMs can mount the same disk, which can be shared among different labs.

Integrate CrownSite in Crownlabs login process

The login button should be encapsulated in the new blog page and the login page should be replaced by the blog.

VM status watch

At the moment we watch the service that exposes the VM to know whether it is in the Ready state. This control loop sometimes causes problems and, in case of errors, the goroutine keeps running forever.
To do: think about a different method to get the actual status of the VM.
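Whatever source of truth is chosen for the VM status, the check should be bounded so that it cannot run forever. A minimal sketch of a polling loop with a deadline, in Python for illustration (the operator itself is written in Go, where a context with a timeout would play the same role); `wait_until_ready` and its default values are hypothetical.

```python
import time


def wait_until_ready(is_ready, timeout=120.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `is_ready()` until it returns True or `timeout` seconds elapse.

    Returns True when ready and False on timeout. Exceptions raised by
    `is_ready` propagate to the caller, so the loop never outlives an error.
    `clock` and `sleep` are injectable to make the function easy to test.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(interval, remaining))


# Usage with a stand-in check that becomes ready on the third poll.
polls = []
print(wait_until_ready(lambda: polls.append(1) or len(polls) >= 3,
                       timeout=1.0, interval=0.01))
```

On timeout the caller can mark the instance as failed instead of leaving a background task alive indefinitely.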

NextCloud logout

This issue is probably related to #166. The logout button of NextCloud does not work properly. When we try to log out, the page loads for a while, then the NextCloud page refreshes without the user being logged out (the session remains open).
