funcx-web-service's Issues
Merge status and result endpoints
We should merge these two so that the status endpoint returns the result in a details field of its output.
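A minimal sketch of what a merged status response could look like. The status names ("SUCCEEDED", "FAILED") and the "result"/"exception" keys are hypothetical; the only idea taken from the issue is that the result appears under a details field of the status output.

```python
from typing import Any, Dict, Optional


def build_status_response(task_id: str, status: str, result: Optional[Any] = None,
                          exception: Optional[str] = None) -> Dict[str, Any]:
    """Assemble one status payload; finished tasks carry their outcome under 'details'."""
    response: Dict[str, Any] = {"task_id": task_id, "status": status}
    # Only attach details once the task has finished, so pending responses stay small.
    if status == "SUCCEEDED":
        response["details"] = {"result": result}
    elif status == "FAILED":
        response["details"] = {"exception": exception}
    return response
```

Clients would then poll a single route rather than calling status first and result second.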
Clean up prod options in config
Create a test environment
We need a separate test environment we can use to work through updates.
Heartbeat endpoints
Use heartbeats to maintain the online status of an endpoint. If heartbeats stop arriving, tear down the endpoint's ZMQ server.
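One possible shape for the service side is sketched below. It records the last heartbeat per endpoint and calls a teardown hook for endpoints that have gone quiet. The class name, the timeout value, and the `teardown` callback (which would stop the endpoint's ZMQ server) are all assumptions.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Track the last heartbeat per endpoint and tear down endpoints that go silent."""

    def __init__(self, timeout_s: float, teardown: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.teardown = teardown  # hypothetical hook, e.g. stop the endpoint's ZMQ server
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def beat(self, endpoint_id: str) -> None:
        # Record (or refresh) the time of the most recent heartbeat.
        self.last_seen[endpoint_id] = self.clock()

    def is_online(self, endpoint_id: str) -> bool:
        seen = self.last_seen.get(endpoint_id)
        return seen is not None and self.clock() - seen <= self.timeout_s

    def reap(self) -> List[str]:
        """Tear down every endpoint whose last heartbeat is older than the timeout."""
        now = self.clock()
        expired = [ep for ep, seen in self.last_seen.items() if now - seen > self.timeout_s]
        for ep in expired:
            del self.last_seen[ep]
            self.teardown(ep)
        return expired
```

A background thread could call `reap()` periodically. Injecting the clock keeps the logic easy to test.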
Allow registration of existing containers
Add REST endpoints to the web service to allow containers to be added (e.g., send a path to a docker file) and get back a uuid for it. At the same time we should add a new parameter to the function registration endpoint so we can push a container uuid and map it internally. These changes will allow DLHub to register its containers during the servable publication flow and associate the container with the servable's run function.
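The sketch below illustrates the proposed flow: register a container, get a UUID back, then pass that UUID when registering a function. In-memory dicts stand in for the database tables. All names and fields here are hypothetical.

```python
import uuid
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the container and function tables.
containers: Dict[str, Dict[str, str]] = {}
functions: Dict[str, Dict[str, Optional[str]]] = {}


def register_container(location: str, container_type: str = "docker") -> str:
    """Store a container reference (e.g. a path to a Dockerfile) and return its new UUID."""
    container_uuid = str(uuid.uuid4())
    containers[container_uuid] = {"location": location, "type": container_type}
    return container_uuid


def register_function(function_code: str, container_uuid: Optional[str] = None) -> str:
    """Register a function, optionally mapping it to a previously registered container."""
    if container_uuid is not None and container_uuid not in containers:
        raise ValueError(f"unknown container {container_uuid}")
    function_uuid = str(uuid.uuid4())
    functions[function_uuid] = {"code": function_code, "container": container_uuid}
    return function_uuid
```

Under this scheme, DLHub would call `register_container` during servable publication and pass the returned UUID when it registers the servable's run function.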
Add test_endpoint endpoint
Provide a REST route to check whether a funcX endpoint is online.
Insert new users on first login
We should insert the user into the user table in the database the first time they log in via either the SDK or the website.
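A minimal "insert on first login" sketch using `sqlite3` so that it runs standalone. The table and column names are assumptions. `ON CONFLICT ... DO NOTHING` is accepted by both PostgreSQL and SQLite 3.24+. With psycopg2, the placeholders would be `%s` instead of `?`.

```python
import sqlite3


def ensure_user(conn: sqlite3.Connection, username: str, globus_identity: str) -> int:
    """Insert the user on first login and return their row id on every login."""
    # The conflict clause turns the insert into a no-op for returning users;
    # it relies on a UNIQUE constraint on globus_identity (hypothetical column).
    conn.execute(
        "INSERT INTO users (username, globus_identity) VALUES (?, ?) "
        "ON CONFLICT (globus_identity) DO NOTHING",
        (username, globus_identity),
    )
    row = conn.execute(
        "SELECT id FROM users WHERE globus_identity = ?", (globus_identity,)
    ).fetchone()
    conn.commit()
    return row[0]
```

Doing the insert and lookup on every login avoids a separate "does this user exist?" round trip and is safe if two logins race.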
Web sockets for web service
Add WebSockets to the web service so that it can push function results back to the client.
Dev ops and continuous integration
Use AWS CodePipeline with a GitHub hook to deploy the master branch to Elastic Beanstalk whenever it changes.
Validate POST inputs
We should validate the inputs to the various POST routes. Request parsers look like a nice way to do this.
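The kind of checks a request parser would perform can be sketched with the standard library alone. The required field names (`endpoint_id`, `function_id`) are assumed from the `client.run()` call further down this page and may not match the actual route.

```python
import uuid
from typing import Any, List

REQUIRED_UUID_FIELDS = ("endpoint_id", "function_id")  # assumed field names


def validate_run_request(body: Any) -> List[str]:
    """Return a list of validation errors for a run-style POST body (empty if valid)."""
    if not isinstance(body, dict):
        return ["request body must be a JSON object"]
    errors: List[str] = []
    for field in REQUIRED_UUID_FIELDS:
        value = body.get(field)
        if value is None:
            errors.append(f"missing required field '{field}'")
            continue
        try:
            uuid.UUID(str(value))  # raises ValueError for malformed UUIDs
        except ValueError:
            errors.append(f"'{field}' is not a valid UUID")
    return errors
```

Returning every error at once lets the route reply with a single 400 that lists all the problems.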
Improve throughput performance
We seem to hit a wall with ZMQ when sending tasks too quickly from the Redis worker to the ZMQ broker. It looks like it gets caught in a REQ/REP deadlock, but I'm not exactly sure why.
Track completed tasks in the database
We'll end up deleting tasks from redis, so we should persist them in the postgres database.
Tracking Endpoint Status
We should have some way of tracking the status of an endpoint, since many client-side applications will allow users to enter their own endpoints (and there may be typos, endpoints may go offline, etc.). This could work as follows:
- Add a flag to Redis that says whether a given endpoint is online or offline. This flag can be updated whenever we receive a heartbeat.
- Add a get_endpoint_status route to the web service.
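A sketch of the Redis flag, assuming a redis-py client. Each heartbeat rewrites the key with an expiry (`set(..., ex=...)`), so a missing key means the endpoint has gone quiet and nothing has to write "offline" explicitly. The key prefix and TTL are hypothetical.

```python
from typing import Any

HEARTBEAT_TTL_S = 90  # assumed: tolerate a few missed heartbeats before expiring


def status_key(endpoint_id: str) -> str:
    return f"ep_status:{endpoint_id}"  # hypothetical key naming scheme


def record_heartbeat(redis_client: Any, endpoint_id: str) -> None:
    """Mark the endpoint online; the key expires on its own if heartbeats stop."""
    redis_client.set(status_key(endpoint_id), "online", ex=HEARTBEAT_TTL_S)


def get_endpoint_status(redis_client: Any, endpoint_id: str) -> str:
    """Report 'online' while the flag exists and 'offline' once it has expired."""
    value = redis_client.get(status_key(endpoint_id))
    if isinstance(value, bytes):  # redis-py returns bytes unless decode_responses=True
        value = value.decode()
    return "online" if value == "online" else "offline"
```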
Remove _ prefixed functions
Functions should not have _ prefixes.
Raw JSON inputs and outputs
To accommodate several use cases (Automate, MERF, anything that doesn't use the SDK), we need a mechanism to pass through and act on raw JSON inputs. The initial plan is to add a flag to the submission request that specifies which serialization mechanism to use, and have the web service append it to the task_header. The serializer will then use it to guide deserialization of the inputs and, later, serialization of the outputs.
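The plan above can be sketched as a small registry keyed by the flag's value. The task header is modeled as a dict, and the method names "json" and "pickle" are assumptions; the issue only specifies that a flag travels in the header and drives both directions of serialization.

```python
import json
import pickle
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: the submission flag names one of these methods.
SERIALIZERS: Dict[str, Tuple[Callable[[Any], bytes], Callable[[bytes], Any]]] = {
    "json": (lambda obj: json.dumps(obj).encode(), lambda data: json.loads(data.decode())),
    "pickle": (pickle.dumps, pickle.loads),
}


def attach_serializer(task_header: Dict[str, Any], method: str) -> Dict[str, Any]:
    """Web-service side: record the requested serialization method in the header."""
    if method not in SERIALIZERS:
        raise ValueError(f"unsupported serialization method: {method}")
    return {**task_header, "serializer": method}


def deserialize_inputs(task_header: Dict[str, Any], data: bytes) -> Any:
    _, loads = SERIALIZERS[task_header["serializer"]]
    return loads(data)


def serialize_outputs(task_header: Dict[str, Any], result: Any) -> bytes:
    dumps, _ = SERIALIZERS[task_header["serializer"]]
    return dumps(result)
```

Keeping JSON as a first-class option also matters for non-SDK clients, because `pickle.loads` must never be applied to input from untrusted callers.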
Toggle online/offline on heartbeat requests
In code, update a database field (to be added) recording whether the endpoint is online or offline. Heartbeats are already in place.
Use Globus groups to enable multi-tenancy
Allow a Globus group to be specified for an endpoint and then restrict access to users within that group.
Refactor all the auth checking into an @authenticated decorator
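A framework-agnostic sketch of such a decorator. Here `introspect_token` is a hypothetical stand-in for the real Globus Auth token check. In Flask, the wrapper would read `flask.request.headers` rather than take the headers as an argument.

```python
import functools
from typing import Any, Callable, Dict, Optional, Tuple


def introspect_token(token: str) -> Optional[str]:
    """Hypothetical stand-in for Globus Auth token introspection; returns a user id."""
    return "user-123" if token == "valid-token" else None


def authenticated(handler: Callable[..., Any]) -> Callable[..., Any]:
    """Reject requests without a valid bearer token; pass the user id to the handler."""
    @functools.wraps(handler)
    def wrapper(headers: Dict[str, str], *args: Any, **kwargs: Any) -> Any:
        auth = headers.get("Authorization", "")
        if not auth.startswith("Bearer "):
            return {"error": "missing bearer token"}, 401
        user_id = introspect_token(auth[len("Bearer "):])
        if user_id is None:
            return {"error": "invalid token"}, 401
        return handler(user_id, *args, **kwargs)
    return wrapper


@authenticated
def list_functions(user_id: str) -> Tuple[Dict[str, Any], int]:
    return {"user": user_id, "functions": []}, 200
```

Each route then declares its auth requirement in one line, and `functools.wraps` preserves the handler's name for routing.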
Forwarder service
We need a mechanism for the web service to receive requests for endpoint registration, and create a forwarder for each endpoint. The funcx endpoint_rearchitect branch has this in development, where we have a REST interface that manages forwarder creation. We could use this as-is on the web-service as an internal micro-service.
We'll need the forwarder service to live on dedicated EC2 nodes with Elastic IPs to which endpoints can connect (with ports 30000 to 60000 open). We could add a load balancer later, once the forwarder nodes need to scale.
Detect and retry idle tasks
We need to iterate through the cache and find tasks still in a pending state so that we can restart them. To start with, we can poll once a minute and scan the "task:" set for tasks that have been pending for some time.
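The scan can be sketched over a plain dict standing in for Redis. The `status` and `created_at` fields and the five-minute threshold are assumptions. Against a real redis-py client, the keys would be enumerated with `scan_iter(match="task:*")`.

```python
import time
from typing import Any, Dict, List, Optional

PENDING_TIMEOUT_S = 300  # assumed threshold for "pending for some time"


def find_stale_tasks(tasks: Dict[str, Dict[str, Any]], now: Optional[float] = None,
                     timeout_s: float = PENDING_TIMEOUT_S) -> List[str]:
    """Return ids of tasks that have been pending for longer than timeout_s seconds."""
    now = time.time() if now is None else now
    stale = []
    for key, task in tasks.items():
        if not key.startswith("task:"):
            continue  # mirror a scan restricted to the "task:" keyspace
        if task.get("status") == "pending" and now - task["created_at"] > timeout_s:
            stale.append(key[len("task:"):])
    return stale
```

A poller would call this once a minute (e.g. in a loop with `time.sleep(60)`) and resubmit whatever it returns.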
REST api to terminate endpoints
To terminate endpoints cleanly, we need support in both the web service and the forwarder for terminating endpoints and checking their liveness.
Move all auth components into a dedicated module
Move all auth related flask routines and other utility functions into "auth.py"
Make task payload a dictionary
Currently the task header encodes a bunch of information such as task_id, container_id, container_address, serialization method. We should put all this info into the payload as a dict to make it more extensible.
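A sketch of a dict-based task message, serialized as JSON. The field names follow the ones listed in the issue, while the JSON encoding and the `payload` key are assumptions.

```python
import json
from typing import Any, Dict, Optional


def build_task_message(task_id: str, serialized_args: str,
                       container_id: Optional[str] = None,
                       container_address: Optional[str] = None,
                       serializer: str = "json") -> bytes:
    """Pack routing metadata and the task body into one extensible dict."""
    message: Dict[str, Any] = {
        "task_id": task_id,
        "container_id": container_id,
        "container_address": container_address,
        "serializer": serializer,
        "payload": serialized_args,
    }
    return json.dumps(message).encode()


def parse_task_message(raw: bytes) -> Dict[str, Any]:
    # Consumers read fields by name, so adding a key later doesn't break older readers.
    return json.loads(raw.decode())
```

Compared with a fixed header layout, a new field only means a new key.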
Properly encode database insertions
Some of our database insertions do not use the proper insert tools, so the inputs are not properly quoted and escaped. This is most noticeable with function definitions, where characters such as single quotes cause the insert to fail.
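The usual fix is to pass values as bound parameters and let the driver handle escaping. The sketch uses `sqlite3` so it runs standalone (psycopg2 works the same way with `%s` placeholders in `cursor.execute(sql, params)`). The table layout is hypothetical.

```python
import sqlite3


def insert_function(conn: sqlite3.Connection, name: str, code: str) -> int:
    """Insert a function definition using bound parameters instead of string formatting."""
    # The driver quotes the values, so quotes inside the code cannot break the statement
    # (and cannot be used for SQL injection).
    cur = conn.execute("INSERT INTO functions (name, code) VALUES (?, ?)", (name, code))
    conn.commit()
    return cur.lastrowid
```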
Apply static-checkers to function fields.
There are a number of out-of-the-box static checkers available, so it would be nice to let users see what's wrong with their functions before they submit them to our service.
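As a first pass that needs no third-party checker, the standard library's `ast` module can catch syntax errors and confirm that a function is actually defined. Richer checkers (pyflakes, flake8, etc.) could be layered on top. The checks and messages below are illustrative only.

```python
import ast
from typing import List


def check_function_source(source: str) -> List[str]:
    """Return human-readable problems found before the function is submitted."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: {exc.msg}"]
    problems: List[str] = []
    if not any(isinstance(node, ast.FunctionDef) for node in tree.body):
        problems.append("no top-level function definition found")
    return problems
```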
Multiple endpoints
Create a new broker for each endpoint and return port numbers.
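A sketch of handing out broker ports per endpoint. It uses the 30000 to 60000 range mentioned for the forwarder nodes as its default. Two ports per broker (e.g. one for tasks, one for results) is an assumption, and this simple version never reclaims ports.

```python
from typing import Dict, Tuple


class PortAllocator:
    """Hand out a (task_port, result_port) pair per endpoint from a fixed range."""

    def __init__(self, low: int = 30000, high: int = 60000) -> None:
        self.next_port = low
        self.high = high
        self.assigned: Dict[str, Tuple[int, int]] = {}

    def allocate(self, endpoint_id: str) -> Tuple[int, int]:
        # Re-registering an endpoint returns the ports it already has.
        if endpoint_id in self.assigned:
            return self.assigned[endpoint_id]
        if self.next_port + 1 > self.high:
            raise RuntimeError("port range exhausted")
        ports = (self.next_port, self.next_port + 1)
        self.next_port += 2
        self.assigned[endpoint_id] = ports
        return ports
```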
Switch over to uuids for endpoints
Use uuids instead of user friendly endpoint names.
Remove duplicated code between APIs
There is a lot of duplication between the execute and status endpoints, and the automate API is essentially exactly the same. We should refactor some of this to make it cleaner and easier to extend.
Add Automate Action Provider API
We need to integrate the funcX platform with Automate. To do this it must expose an Automate Action Provider.
Refactor api and automate api
These two do essentially the same thing but return different structures. We should either extract the common parts or standardize around the automate api and return that structure for all requests.
Support public group endpoints
We currently support shared endpoints by checking if a user is a member of a group associated with an endpoint. However, this doesn't work for public groups. We need to also check the group policies to see if they are set to public to determine whether someone is allowed to use it. This is needed for demo endpoints that anyone can use.
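The access rule described above can be sketched as follows. Group membership and the set of public groups are passed in as plain Python data; in the service they would come from Globus Groups. Treating the endpoint owner as always allowed is an assumption.

```python
from typing import Dict, Optional, Set


def can_use_endpoint(user_id: str, endpoint_owner: str, endpoint_group: Optional[str],
                     group_members: Dict[str, Set[str]], public_groups: Set[str]) -> bool:
    """Allow the owner, anyone if the endpoint's group is public, or members of its group."""
    if user_id == endpoint_owner:
        return True
    if endpoint_group is None:
        return False
    # Check the group policy first: a public group admits everyone (e.g. demo endpoints).
    if endpoint_group in public_groups:
        return True
    return user_id in group_members.get(endpoint_group, set())
```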
Move service to Elastic Beanstalk
Our current Web service is a basic Flask app replicated using gunicorn with 3 threads. To make the front-end more scalable we should deploy the app using Elastic Beanstalk and add a load balancer. This will require taking some of the lookup-logic (e.g., finding the function code etc.) and moving it toward the "workers" that pull tasks from the Redis store.
Integrate forwarder service to the top level API
We have a new forwarder microservice that we want the top-level web service to trigger as part of the endpoint registration process. Ideally, these changes should be merged into dev and tested there before being merged into master, since they will most likely break the current API.
There's an example of how the forwarder API needs to be invoked here: https://github.com/funcx-faas/funcX/blob/endpoint_rearchitect_%238/funcx/endpoint/endpoint.py#L114
Some minimal docs on the forwarder service are here: https://github.com/funcx-faas/funcx-web-service/tree/dev/forwarder
Attach an Automate API
Finish adding automate requirements including introspect and validation.
Add heartbeats to forwarder
We want the forwarder to send heartbeats to endpoints and terminate itself if the endpoint stops responding. This will allow endpoint re-registration as we will safely cleanup zombie forwarders, such that endpoints can create new ones on-demand.
Set up test webservice
Put demo web-GUI over the web-service
Build a simple web-GUI in Flask that can authenticate users, let users add/view their existing functions (and runtimes?), and their resource configs. Should dynamically return function results in the GUI like in AWS Lambda.
Restrict access to forwarder nodes.
Currently, forwarder nodes can be reached from the web over the REST ports. They should only be accessible from within the security group, so that only the Elastic Beanstalk (EBS) nodes can make REST calls to them.
Only allow https
The user facing service should deny any incoming http connections.
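One way to deny plain HTTP at the application layer is a small WSGI middleware, sketched below. It assumes the service sits behind a load balancer that terminates TLS and sets `X-Forwarded-Proto`, which should only be trusted if all traffic arrives through that balancer. In Flask, it would be applied with `app.wsgi_app = https_only(app.wsgi_app)`.

```python
from typing import Callable, Iterable


def https_only(app: Callable) -> Callable:
    """Wrap a WSGI app so plain-HTTP requests are refused with 403."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Behind a TLS-terminating load balancer, the client's scheme arrives in
        # X-Forwarded-Proto; otherwise fall back to the WSGI url scheme.
        scheme = environ.get("HTTP_X_FORWARDED_PROTO", environ.get("wsgi.url_scheme", "http"))
        if scheme.lower() != "https":
            body = b"HTTPS required"
            start_response("403 Forbidden", [("Content-Type", "text/plain"),
                                             ("Content-Length", str(len(body)))])
            return [body]
        return app(environ, start_response)
    return middleware
```

Many deployments redirect to HTTPS instead of refusing. This sketch follows the issue and denies the request outright.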
Redis integration for task tracking.
We want to track tasks within a Redis store for reliability. The plan is to insert a Redis store between the web service and the broker so that tasks cannot be lost once they have been received.
Clean up task tracking
The create_task function should be split into two functions that either create tasks or insert results.
Initiate code pipeline in response to releases
CodePipeline currently runs every time there is a push to the master branch. We should change this so it only runs on releases.
Fix user, endpoint, and function caching
We should do this with a module so we can extend how we manage these caches.
Support task cancellation
We need to support cancelling tasks through the API.
Have forwarder recover from malicious inputs
Malicious bots etc. can send invalid input to the forwarder. We need to catch failures during deserialization, such as the one below, so that the forwarder doesn't stop.
```
Traceback (most recent call last):
  File "/home/ubuntu/funcx_test/lib/python3.6/site-packages/funcx/executors/high_throughput/executor.py", line 391, in _queue_management_worker
    msg = pickle.loads(serialized_msg)
_pickle.UnpicklingError: invalid load key, '\x00'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/funcx_test/lib/python3.6/site-packages/funcx/executors/high_throughput/executor.py", line 395, in _queue_management_worker
    raise BadMessage("Message received could not be unpickled")
parsl.executors.errors.BadMessage: Message received could not be unpickled
```
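The traceback shows the worker thread dying after it raises `BadMessage`. A minimal sketch of the alternative is to catch the failure, log it, and tell the caller to skip the message. The function name is hypothetical. It returns a flag alongside the value so that a legitimately pickled `None` is not mistaken for a failure.

```python
import logging
import pickle
from typing import Any, Tuple

logger = logging.getLogger(__name__)


def safe_unpickle(serialized_msg: bytes) -> Tuple[bool, Any]:
    """Return (True, message) on success, or (False, None) if the bytes can't be unpickled."""
    try:
        return True, pickle.loads(serialized_msg)
    except Exception:
        # Garbage from scanners or bots raises UnpicklingError, EOFError, etc.;
        # log it and let the caller drop the message instead of killing the thread.
        logger.warning("Dropping message that could not be unpickled", exc_info=True)
        return False, None
```

In the queue-management loop, this would become `ok, msg = safe_unpickle(raw)` followed by `continue` when `ok` is false. This only keeps the thread alive. Unpickling untrusted bytes can still execute arbitrary code, so restricting who can reach the forwarder ports remains necessary.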
Caching for users, functions, and endpoints
...to reduce the number of DB accesses and therefore web service latency. Keep in mind that the endpoint DB access also checks whether the endpoint is online.
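A minimal time-based cache that such a module could start from. The `loader` callable would wrap the existing DB query. The TTL values are assumptions. Because endpoint lookups also check online status, endpoint entries would need a short TTL (or the online check kept outside the cache).

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Minimal time-based cache for user, function, and endpoint lookups."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return a cached value, calling loader() (e.g. a DB query) on a miss or expiry."""
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_s:
            return hit[1]
        value = loader()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call when the underlying row changes (e.g. a function is updated).
        self._entries.pop(key, None)
```

Keeping the cache behind a small class like this is what makes it easy to swap in another backend later (e.g. Redis).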
Web service throws Globus 500 error on all client .run() commands after the first
This is @annawoodard's coffea+funcx use case (uses master branch of funcx as of 10/10/19).
I set up the endpoint using the normal funcx-endpoint configure and funcx-endpoint start commands. If I register and then .run() a hello-world function, it executes and returns successfully on the client's first .run() call.
But it fails on the second (and every subsequent) attempt with the following:
```
---------------------------------------------------------------------------
GlobusAPIError                            Traceback (most recent call last)
<ipython-input-61-df318ff7cb66> in <module>()
      1 payload = [1, 2, 3]
      2 
----> 3 task_id = client.run(*payload, endpoint_id=ndt3_uuid, function_id=func_uuid)
      4 print(task_id)

~/.local/lib/python3.6/site-packages/funcx/sdk/client.py in run(self, endpoint_id, function_id, asynchronous, *args, **kwargs)
    162 
    163         # Send the data to funcX
--> 164         r = self.post(servable_path, json_body=data)
    165         if r.http_status is not 200:
    166             raise Exception(r)

~/.local/lib/python3.6/site-packages/globus_sdk/base.py in post(self, path, json_body, params, headers, text_body, response_class, retry_401)
    286             text_body=text_body,
    287             response_class=response_class,
--> 288             retry_401=retry_401,
    289         )
    290 

~/.local/lib/python3.6/site-packages/globus_sdk/base.py in _request(self, method, path, params, headers, json_body, text_body, response_class, retry_401)
    551                 "request completed with (error) response code: {}".format(r.status_code)
    552             )
--> 553             raise self.error_class(r)
    554 
    555 

GlobusAPIError: (500, 'Error', '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n root@localhost to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n</body></html>\n')
```
Fix Redis interactions in forwarder
The forwarder throws an exception because it tries to increment the total core hours counter by a float. In addition, we call redis_client.close(), but the Redis client we use doesn't have a close() method.
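For the counter, Redis's `INCRBY` only accepts integers, while `INCRBYFLOAT` (`incrbyfloat` in redis-py) accepts fractional amounts. A minimal sketch follows; the key name is hypothetical.

```python
from typing import Any

CORE_HOURS_KEY = "total_core_hours"  # hypothetical key name


def add_core_hours(redis_client: Any, hours: float) -> float:
    """Add a fractional number of core hours to the running total and return the new total."""
    # INCRBY rejects non-integer amounts; INCRBYFLOAT is the float-safe equivalent.
    return float(redis_client.incrbyfloat(CORE_HOURS_KEY, hours))
```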
Asynchronous tasks
Add support back in for async tasks.
Support DLHub integration
Add endpoints to determine which executors an endpoint should be starting to let DLHub models be served via funcX.