arrebol's Issues
create a cli
we need job submission and status
create static resource pool
Fix REST API
The REST API has several inconsistencies that may confuse users. Some are listed below, but the whole API should be reviewed for similar problems.
- The system returns 400 (HTTP Bad Request) when a GET request uses an invalid Job Id. It should return 404 (HTTP Not Found).
- The system returns 201 (HTTP Created) when a GET request uses a valid Job Id. It should return 200 (HTTP OK).
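The expected behaviour can be sketched as below. This is a minimal illustration, not Arrebol's code: the in-memory `JOB_STORE`, the `state` field, and the function names are hypothetical.

```python
from http import HTTPStatus

# Hypothetical in-memory job store, used only for illustration.
JOB_STORE = {
    "c15933c0-1aa1-4244-9edd-ef8f47205ce2": {"label": "mk-dir", "state": "RUNNING"},
}


def get_job(job_id, jobs=JOB_STORE):
    """Return (status, body) for a GET on a single job."""
    job = jobs.get(job_id)
    if job is None:
        # Unknown id: the resource does not exist, so 404 rather than 400.
        return HTTPStatus.NOT_FOUND, {"error": f"job {job_id} not found"}
    # Successful read: 200, since nothing was created.
    return HTTPStatus.OK, job


def create_job(job_id, job, jobs=JOB_STORE):
    """Return (status, body) for a POST that creates a job."""
    jobs[job_id] = job
    # 201 is reserved for requests that actually create a resource.
    return HTTPStatus.CREATED, {"id": job_id}
```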
create a new doc to explain arrebol operation
add version endpoint
Retrieve the execution status of all jobs in a given queue that have a given label
Add an endpoint to the API that gets the jobs of a queue filtered by their label.
The endpoint: /api/queues/{queue_id}/jobs?label=awesome_job
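A client could build the request URL for this endpoint as in the sketch below. The base URL and the helper name are assumptions made for illustration; only the path and the `label` query parameter come from the issue.

```python
from urllib.parse import quote, urlencode


def jobs_by_label_url(base_url, queue_id, label):
    """Build the URL that lists the jobs of a queue filtered by label."""
    # Escape the queue id so it cannot break out of its path segment.
    path = f"/api/queues/{quote(str(queue_id), safe='')}/jobs"
    # urlencode takes care of spaces and other reserved characters in the label.
    return f"{base_url.rstrip('/')}{path}?{urlencode({'label': label})}"
```

For instance, `jobs_by_label_url("http://localhost:8080", "default", "awesome_job")` gives `http://localhost:8080/api/queues/default/jobs?label=awesome_job`.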
Fetch worker deploy scripts
The worker deploy scripts are in a separate repository. It is necessary to move them here.
Using the same TaskSpec id in different jobs
Using the same id for a TaskSpec in different jobs can probably cause exceptions.
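One way to investigate this is to detect ids shared between jobs before submission. The sketch below assumes jobs are dictionaries with a `tasksSpecs` list whose entries carry an `id`, as in the job model example elsewhere in these issues; the function name is hypothetical.

```python
from collections import defaultdict


def find_shared_task_ids(jobs):
    """Map each TaskSpec id used by more than one job to the indexes of those jobs."""
    owners = defaultdict(set)
    for index, job in enumerate(jobs):
        for spec in job.get("tasksSpecs", []):
            owners[spec["id"]].add(index)
    # Keep only ids that appear in at least two different jobs.
    return {task_id: sorted(idx) for task_id, idx in owners.items() if len(idx) > 1}
```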
Fix API doc
The current API doc of the system is obsolete.
Allow tasks to be executed in an isolated way
With Docker, for example.
create job model
support TaskSpec in the job model
```json
{
  "label": "mk-dir",
  "tasksSpecs": [
    {
      "id": "TaskNumber-0-36b8d41a-8611-4468-93ee-40f4140c7555",
      "spec": {
        "image": "ubuntu",
        "requirements": {
          "DockerRequirements": "DockerMemory == 1024 && DockerCPUWeight == 512"
        }
      },
      "commands": [
        "mkdir test-dir",
        "cd test-dir",
        "touch test.file"
      ],
      "metadata": {
        "owner": "admin",
        "group": "admin"
      }
    }
  ]
}
```
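A job model that accepts this payload might look like the sketch below. The class and function names are hypothetical and the real model may differ; the field names follow the JSON above.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    id: str
    spec: dict
    commands: list
    metadata: dict = field(default_factory=dict)


@dataclass
class Job:
    label: str
    tasks_specs: list


def parse_job(text):
    """Build a Job from a JSON document shaped like the example payload."""
    raw = json.loads(text)
    tasks = [
        TaskSpec(
            id=item["id"],
            spec=item.get("spec", {}),
            commands=item.get("commands", []),
            metadata=item.get("metadata", {}),
        )
        for item in raw.get("tasksSpecs", [])
    ]
    return Job(label=raw["label"], tasks_specs=tasks)
```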
allow to execute remote commands
allow to add jobs to specific queue
allow to list available queues
add cancel job endpoint
Use Digest from Image
Use the digest of a Docker image in the image field of a task and verify that it works.
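As a starting point for the verification, the sketch below checks whether an image reference is pinned by a sha256 digest (the `name@sha256:<64 hex characters>` form) and splits it. The helper name is hypothetical, and the pattern is a simplification that only handles sha256 digests.

```python
import re

# name, then "@sha256:", then exactly 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"^(?P<name>[^@\s]+)@sha256:(?P<hex>[0-9a-f]{64})$")


def split_image_digest(image):
    """Return (name, digest) for a digest-pinned image, or None otherwise."""
    match = _DIGEST_RE.match(image)
    if match is None:
        return None
    return match.group("name"), "sha256:" + match.group("hex")
```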
remote user is not set when installing worker
When deploying the worker, even if the user sets remote_user in host.conf, the user actually used is ubuntu, which is hard-coded in the ansible.cfg file. Also, deploy-worker.yml assumes that the software will be installed in /home/ubuntu. It should be /home/<remote_user>.
Code refactoring based on the Google Java style
Currently this repository does not follow any coding style. It was decided that it must follow the Google Java Style Guide.
Arrebol deployed in a K8s cluster accessing a K8s API server
Arrebol has a K8s worker type, which uses the Kubernetes API server to submit jobs. For this, we created a proxy server with the kubectl proxy command, which proxies the Kubernetes API to the localhost interface; we then put the proxy address in the Arrebol configuration file.
Given this context, when we deploy Arrebol in a K8s cluster, it also needs access to a proxy service on localhost. For this, a K8s proxy must run in a sidecar container in the Arrebol pod, so that processes in any container of the pod can access it. This solution is described in the K8s documentation.
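The sidecar layout can be sketched as a pod manifest built in Python. Containers in the same pod share a network namespace, which is why the proxy is reachable on localhost. The image names are placeholders supplied by the caller, the pod and container names are hypothetical, and port 8001 is only an assumed default.

```python
import json


def arrebol_pod_with_proxy(arrebol_image, kubectl_image, proxy_port=8001):
    """Build a pod manifest with Arrebol plus a kubectl proxy sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "arrebol"},
        "spec": {
            "containers": [
                {"name": "arrebol", "image": arrebol_image},
                {
                    # Sidecar: exposes the K8s API on localhost:<proxy_port>
                    # for every container in this pod.
                    "name": "kubectl-proxy",
                    "image": kubectl_image,
                    "command": ["kubectl", "proxy", f"--port={proxy_port}"],
                },
            ]
        },
    }


def to_json(manifest):
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```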
Implement: REST API for Task Resource
Implement the following ENDPOINTS:
- POST Task (add a new task to list of pending tasks)
- POST TaskList
- DELETE Task (remove a task from list of pending tasks if it is there)
- GET TaskStatus (get task's status)
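The operations behind these endpoints can be sketched with an in-memory pending list. The class name, the status strings, and the return conventions are assumptions for illustration only.

```python
class PendingTasks:
    """In-memory stand-in for the task resource behind the REST endpoints."""

    def __init__(self):
        self._pending = {}  # task id -> task payload
        self._status = {}   # task id -> status string (hypothetical values)

    def post_task(self, task):
        """POST Task: add one task to the pending list and return its id."""
        self._pending[task["id"]] = task
        self._status[task["id"]] = "PENDING"
        return task["id"]

    def post_task_list(self, tasks):
        """POST TaskList: add several tasks and return their ids in order."""
        return [self.post_task(task) for task in tasks]

    def delete_task(self, task_id):
        """DELETE Task: remove the task only if it is still pending."""
        if task_id not in self._pending:
            return False
        del self._pending[task_id]
        self._status[task_id] = "REMOVED"
        return True

    def get_status(self, task_id):
        """GET TaskStatus: return the status, or None for an unknown id."""
        return self._status.get(task_id)
```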
Sync docs with the new code.
As in the title.
create a new doc to give an overview of arrebol design
add the support for system tests
The system test suite reads a configuration file that gives the URL of an Arrebol deployment to test, runs the test cases against that system, and reports the results.
Import new golang code
As in the title.
allow to retrieve jobs by label
In the current version, Arrebol retrieves a job's information using the job ID returned when the job was created. However, it should also be possible to retrieve jobs by label. Note that several jobs may share the same label, so the query returns a list of matching jobs. For example, suppose there are three jobs:
A {
id: 'a2-s3-d7-w8',
label: 'downloading files',
...
}
,
B {
id: '3s-a2-03-pd',
label: 'testing the network',
...
}
and
C {
id: 'ad-p0-w8-e9',
label: 'downloading files',
...
}
If the query looks for jobs with the label 'downloading files', it should return a list containing jobs A and C.
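The lookup described above amounts to a filter over the stored jobs. The sketch below uses the three example jobs; the function name and the list-of-dictionaries storage are assumptions.

```python
def jobs_with_label(jobs, label):
    """Return every job whose label matches exactly, in storage order."""
    return [job for job in jobs if job.get("label") == label]


# The three jobs from the example above (other fields omitted).
EXAMPLE_JOBS = [
    {"id": "a2-s3-d7-w8", "label": "downloading files"},    # A
    {"id": "3s-a2-03-pd", "label": "testing the network"},  # B
    {"id": "ad-p0-w8-e9", "label": "downloading files"},    # C
]
```

Calling `jobs_with_label(EXAMPLE_JOBS, "downloading files")` returns the entries for A and C, and an unknown label returns an empty list.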
Adding Sphinx to documentation
Use Sphinx to generate the documentation.
Arrebol does not have access to the Worker host
Amanda Calatrava deployed Arrebol using Docker Swarm and reported a network problem: Arrebol does not have access to the worker host, which is on the same network.
When she deploys the containers by hand, using the 'bridge' network instead of 'overlay', she can ping the worker successfully.
Create a new task executor to work with K8s
In addition to the Docker and Raw versions, Arrebol now needs to be able to use a Kubernetes cluster as a worker to process its jobs. There is a Java client library for Kubernetes at https://github.com/kubernetes-client/java; it would be important to use it to support communication between Java and the K8s API.
Implement persistence of objects
The following models need to be stored in a database:
- Tasks
- Resources
After persistence is implemented, Arrebol-Service will also need to read from the datastore before starting, in order to map active resources.
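The two steps (store the models, then read active resources back at startup) can be sketched with SQLite. The table layout, column names, and function names are assumptions; the issue does not name a database.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the datastore and create the two tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources "
        "(id TEXT PRIMARY KEY, address TEXT NOT NULL, active INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def save_task(conn, task_id, state):
    """Insert a task or overwrite its state."""
    conn.execute("INSERT OR REPLACE INTO tasks VALUES (?, ?)", (task_id, state))
    conn.commit()


def save_resource(conn, resource_id, address, active=True):
    """Insert a resource or overwrite its address and active flag."""
    conn.execute(
        "INSERT OR REPLACE INTO resources VALUES (?, ?, ?)",
        (resource_id, address, int(active)),
    )
    conn.commit()


def load_active_resources(conn):
    """Startup step: return {id: address} for every active resource."""
    rows = conn.execute(
        "SELECT id, address FROM resources WHERE active = 1 ORDER BY id"
    )
    return dict(rows.fetchall())
```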
Update worker ansible files for Saps
Setting up a worker to run a Saps task requires more configuration than the default worker.
We currently have documented instructions for preparing a worker for Saps, and Ansible scripts that set up a default worker.
It would be useful for the Ansible scripts to perform this setup for the Saps worker as well.
add create job endpoint
Add requirements in job k8s for CPU and RAM info
Currently, the version of Arrebol that works with the K8s cluster does not use the requirements to specify the minimum and maximum CPU and RAM that the K8s worker pod container needs or may consume.
Useful link: Managing Resources for Containers
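In Kubernetes, minimums map to a container's `resources.requests` and maximums to `resources.limits`, each with `cpu` and `memory` entries. The sketch below builds that block from task requirements; the function name and its parameters are hypothetical, and how Arrebol's requirement strings would be parsed into these values is left open.

```python
def container_resources(cpu_request, memory_request, cpu_limit=None, memory_limit=None):
    """Build the `resources` block of a K8s container spec.

    Values are Kubernetes quantity strings, e.g. "500m" for CPU
    or "512Mi" for memory.
    """
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    limits = {}
    if cpu_limit is not None:
        limits["cpu"] = cpu_limit
    if memory_limit is not None:
        limits["memory"] = memory_limit
    if limits:
        # Only emit `limits` when at least one maximum was given.
        resources["limits"] = limits
    return resources
```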
arrebol is showing low performance on long runs
After a long run (three days), Arrebol is showing low performance. For example, SAPS (a client of the Arrebol service) was blocked for several minutes while finishing a job submission (see the log below):
2020-01-28 11:43:25 INFO ArrebolRequestsHelper:91 - Building JSON body of Job ...
2020-01-28 11:43:25 INFO ArrebolRequestsHelper:97 - JSON body: {
"label": "9d6bba88-2291-11ea-80dd-fa163e1143c1",
"tasks_specs": [
{
"label": "9d6bba88-2291-11ea-80dd-fa163e1143c1#fogbow/preprocessor@sha256:145ca9ebe92cafed4f06ff0ba0823944b6e63f02e03f0024e82b8558c55f4a5b",
"requirements": {
"image": "fogbow/preprocessor@sha256:145ca9ebe92cafed4f06ff0ba0823944b6e63f02e03f0024e82b8558c55f4a5b"
},
"commands": [
"rm -rf /nfs/9d6bba88-2291-11ea-80dd-fa163e1143c1/preprocessing",
"mkdir -p /nfs/9d6bba88-2291-11ea-80dd-fa163e1143c1/preprocessing",
"bash /home/saps/run.sh /nfs/9d6bba88-2291-11ea-80dd-fa163e1143c1 landsat_7 215065 2011-07-22"
],
"metadata": {}
}
]
}
2020-01-28 11:51:55 INFO ArrebolRequestsHelper:64 - Job was submitted with success to Arrebol.
2020-01-28 11:51:55 DEBUG Scheduler:432 - Result submited job: c15933c0-1aa1-4244-9edd-ef8f47205ce2
We believe there is high contention on shared data structures on the Arrebol side (they are protected by mutexes).
We also attached the Arrebol log here, so that one can debug and find the blocked threads:
https://drive.google.com/open?id=13OqMLrSbqWJ_ztWJthHpayCQvogqxkV9