funcx-web-service's People

Contributors

avresl, benclifford, bengalewsky, dependabot[bot], joshbryan-globus, knagaitsev, leeteresamaria, ryanchard, sirosen, tginsbu, theodore-ando, tskluzac, yadudoc, yongyanrao, zhuozhaoli

funcx-web-service's Issues

Heartbeat endpoints

Use heartbeats to maintain the online status of an endpoint. If the heartbeats stop, tear down the ZMQ server for that endpoint.
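
A minimal sketch of heartbeat-based liveness tracking. The `HeartbeatMonitor` class, its method names, and the 30-second timeout are assumptions made for illustration, not part of the funcX code base. It records the last heartbeat per endpoint and reports which endpoints have gone quiet, so the caller can tear down their ZMQ servers.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per endpoint (hypothetical helper)."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # endpoint_id -> time of last heartbeat

    def beat(self, endpoint_id):
        """Record a heartbeat from an endpoint."""
        self.last_seen[endpoint_id] = self.clock()

    def is_online(self, endpoint_id):
        """An endpoint is online if it sent a heartbeat within the timeout."""
        seen = self.last_seen.get(endpoint_id)
        return seen is not None and self.clock() - seen <= self.timeout_s

    def expired(self):
        """Endpoints whose heartbeats stopped; candidates for teardown."""
        now = self.clock()
        return [ep for ep, seen in self.last_seen.items()
                if now - seen > self.timeout_s]
```

A periodic loop could call `expired()` and shut down the server for each returned endpoint.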

Allow registration of existing containers

Add REST endpoints to the web service to allow containers to be added (e.g., send a path to a docker file) and get back a uuid for it. At the same time we should add a new parameter to the function registration endpoint so we can push a container uuid and map it internally. These changes will allow DLHub to register its containers during the servable publication flow and associate the container with the servable's run function.

Insert new users on first login

We should insert the user into the user table in the database the first time they log in via either the SDK or the website.

Web sockets for web service

Add web sockets to the web service. This is needed to allow the service to push function results back to the client.

Validate POST inputs

We should validate the inputs to the various post routes. Request parsers look like a nice way to do this.
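
As a rough illustration of the kind of check a request parser would perform, here is a standard-library sketch. The field names in `SUBMIT_SCHEMA` and the assumption that identifiers are UUID strings are hypothetical; the real routes may expect different fields.

```python
import uuid

# Hypothetical schema for a task-submission body: field name -> expected type.
SUBMIT_SCHEMA = {"endpoint": str, "func": str, "payload": str}


def validate_submit(body):
    """Return a list of error strings; an empty list means the body is valid."""
    if not isinstance(body, dict):
        return ["body must be a JSON object"]
    errors = []
    for field, expected in SUBMIT_SCHEMA.items():
        if field not in body:
            errors.append(f"missing field: {field}")
        elif not isinstance(body[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    # Identifier fields are assumed to be UUID strings.
    for field in ("endpoint", "func"):
        value = body.get(field)
        if isinstance(value, str):
            try:
                uuid.UUID(value)
            except ValueError:
                errors.append(f"{field} is not a valid UUID")
    return errors
```

A route handler could return a 400 response carrying this list whenever it is non-empty.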

Improve throughput performance

We seem to hit a wall with ZMQ when sending tasks too quickly from the redis worker to the zmq broker. It looks like it gets caught up in a REQ/REP deadlock, but I'm not exactly sure why.

Tracking Endpoint Status

We should have some way of tracking the status of an endpoint, since many client-side applications will allow users to enter their own endpoints (and there may be typos, endpoints may go offline, etc.). This could work as follows:

  1. Add a flag to Redis that says whether a given endpoint is online/offline. This can be edited when we heartbeat.
  2. Add an endpoint to the web service to get_endpoint_status.
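
The two steps above can be sketched as follows. A plain dict stands in for Redis here, and the `ep_status:` key prefix and the response shape are assumptions for illustration only.

```python
def set_endpoint_status(store, endpoint_id, online):
    """Step 1: flip the flag, e.g. whenever a heartbeat arrives or times out."""
    store[f"ep_status:{endpoint_id}"] = "online" if online else "offline"


def get_endpoint_status(store, endpoint_id):
    """Step 2: what a get_endpoint_status route could return."""
    status = store.get(f"ep_status:{endpoint_id}")
    # An endpoint that was never registered (e.g. a typo) reports "unknown".
    return {"endpoint_id": endpoint_id, "status": status or "unknown"}
```

Returning "unknown" for missing keys lets clients distinguish a mistyped endpoint ID from one that is registered but offline.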

Raw JSON inputs and outputs

To accommodate several use cases (Automate, MERF, anything without the SDK) we need a mechanism to pass through and act on raw JSON inputs. The initial plan is to have a flag in the submission request that specifies which serialization mechanism to use, and to have the web service append it to the task_header. The serializer will then use this flag to guide deserialization, and again when serializing the outputs.
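
A sketch of how a serializer flag could drive both directions. The `serializer` header field, the `SERIALIZERS` table, and the base64-wrapped pickle format are assumptions for illustration; they are not funcX's actual wire format.

```python
import base64
import json
import pickle

# Hypothetical registry: flag value -> (serialize, deserialize).
SERIALIZERS = {
    "json": (json.dumps, json.loads),
    "pickle": (
        lambda obj: base64.b64encode(pickle.dumps(obj)).decode("ascii"),
        lambda text: pickle.loads(base64.b64decode(text)),
    ),
}


def build_task(payload, serializer="pickle"):
    """Serialize the payload and record the method in the task header."""
    if serializer not in SERIALIZERS:
        raise ValueError(f"unknown serializer: {serializer}")
    dumps, _ = SERIALIZERS[serializer]
    return {"header": {"serializer": serializer}, "body": dumps(payload)}


def read_task(task):
    """Deserialize the body using the method named in the header."""
    _, loads = SERIALIZERS[task["header"]["serializer"]]
    return loads(task["body"])
```

The same header value can be reused on the way back so that results are returned in the format the caller submitted.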

Forwarder service

We need a mechanism for the web service to receive requests for endpoint registration, and create a forwarder for each endpoint. The funcx endpoint_rearchitect branch has this in development, where we have a REST interface that manages forwarder creation. We could use this as-is on the web-service as an internal micro-service.

We'll need the forwarder service to live on dedicated EC2 nodes with elastic-ips to which endpoints can connect (open port 30k to 60k). We could later add a load balancer once we need the forwarder nodes to scale.

Detect and retry idle tasks

We need to iterate through the cache and find tasks still in a pending state such that we can restart them. To start with we can poll once a minute and scan through the "task:" set looking for those that are pending and have been for some time.
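
A minimal sketch of the scan, assuming each cached task is a mapping with hypothetical `status` and `submitted_at` fields and that "some time" means five minutes. A dict stands in for the cache.

```python
import time


def find_stale_tasks(tasks, max_age_s=300.0, now=None):
    """Return keys of tasks that have been pending longer than max_age_s."""
    now = time.time() if now is None else now
    return sorted(
        key for key, task in tasks.items()
        if key.startswith("task:")
        and task.get("status") == "pending"
        and now - task.get("submitted_at", now) > max_age_s
    )
```

A once-a-minute poller could call this function and resubmit each returned task.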

REST api to terminate endpoints

To terminate endpoints cleanly, we need support in both the web-service and the forwarder for terminating endpoints and checking their liveness.

Make task payload a dictionary

Currently the task header encodes a bunch of information such as task_id, container_id, container_address, serialization method. We should put all this info into the payload as a dict to make it more extensible.
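
One possible shape for such a payload, sketched with JSON encoding. The key names mirror the fields listed above, but the exact layout and the choice of JSON are assumptions.

```python
import json


def make_payload(task_id, container_id, container_address, serializer, body):
    """Bundle the routing metadata and the task body into one dict."""
    return {
        "task_id": task_id,
        "container_id": container_id,
        "container_address": container_address,
        "serializer": serializer,
        "body": body,
    }


def encode(payload):
    """Turn the payload dict into bytes for the wire."""
    return json.dumps(payload).encode("utf-8")


def decode(raw):
    """Recover the payload dict from bytes."""
    return json.loads(raw.decode("utf-8"))
```

Because readers look fields up by name, a new key can be added later without breaking consumers that ignore it.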

Properly encode database insertions

Some of our database insertions are not using the proper insert tools, meaning the inputs are not stringified and escaped. This is most notable on function definitions, where things like single quotes cause the insertion to fail.
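
The usual fix is a parameterized query, where the driver handles quoting. The sketch below uses the standard-library `sqlite3` module and a made-up `functions` table purely for illustration; the service's actual database, driver, and schema may differ, and other DB-API drivers use a different placeholder syntax.

```python
import sqlite3


def insert_function(conn, name, code):
    """Insert a function definition using placeholders, not string formatting."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO functions (name, code) VALUES (?, ?)",
            (name, code),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE functions (name TEXT, code TEXT)")

# Source text containing single quotes is stored unchanged.
source = "def hello():\n    return 'hello'\n"
insert_function(conn, "hello", source)
stored = conn.execute("SELECT code FROM functions WHERE name = ?", ("hello",)).fetchone()[0]
```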

Apply static-checkers to function fields.

There are a number of 'out-of-the-box' static checkers available, so it would be nice if we could let users see what's wrong with their functions before they submit them to our service.

Remove duplicated code between APIs

There is a lot of duplication between the execute and status endpoints, and the automate API is essentially exactly the same. We should refactor some of this to make it cleaner and easier to extend.

Refactor api and automate api

These two do essentially the same thing but return different structures. We should either extract the common parts or standardize around the automate api and return that structure for all requests.

Support public group endpoints

We currently support shared endpoints by checking if a user is a member of a group associated with an endpoint. However, this doesn't work for public groups. We need to also check the group policies to see if they are set to public to determine whether someone is allowed to use it. This is needed for demo endpoints that anyone can use.
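
The extended check could look roughly like this. The `group_policies` mapping and its `is_public` key are hypothetical stand-ins for whatever the group service actually returns.

```python
def can_use_endpoint(user_groups, endpoint_group, group_policies):
    """Allow access for group members, or for anyone if the group is public."""
    if endpoint_group in user_groups:
        return True
    # Assumed policy shape: {group_id: {"is_public": bool}}
    policy = group_policies.get(endpoint_group, {})
    return bool(policy.get("is_public", False))
```

Checking membership first avoids a policy lookup for the common case of a user who already belongs to the group.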

Move service to Elastic Beanstalk

Our current Web service is a basic Flask app replicated using gunicorn with 3 threads. To make the front-end more scalable we should deploy the app using Elastic Beanstalk and add a load balancer. This will require taking some of the lookup-logic (e.g., finding the function code etc.) and moving it toward the "workers" that pull tasks from the Redis store.

Integrate forwarder service to the top level API

We have a new forwarder micro-service that we want the top level web service to trigger as part of the endpoint registration process. Ideally these changes are merged to dev and tested there before merging to master, which will most likely break the current API.

There's an example of how the forwarder API needs to be invoked here: https://github.com/funcx-faas/funcX/blob/endpoint_rearchitect_%238/funcx/endpoint/endpoint.py#L114

Some minimal docs on the forwarder service are here: https://github.com/funcx-faas/funcx-web-service/tree/dev/forwarder

Add heartbeats to forwarder

We want the forwarder to send heartbeats to endpoints and terminate itself if the endpoint stops responding. This will allow endpoint re-registration, because zombie forwarders are cleaned up safely and endpoints can create new ones on demand.

Put demo web-GUI over the web-service

Build a simple web-GUI in Flask that can authenticate users and let them add and view their functions (and runtimes?) and their resource configs. It should return function results dynamically in the GUI, as AWS Lambda does.

Restrict access to forwarder nodes.

Currently forwarder nodes can be reached from the web over the REST ports. They should only be accessible from within the security group, so that only the EBS nodes can make REST calls to them.

Only allow https

The user facing service should deny any incoming http connections.
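
One way to enforce this inside the application is a small WSGI middleware, sketched below. The `require_https` name is hypothetical, and trusting the `X-Forwarded-Proto` header is an assumption that only holds when the app sits behind a proxy or load balancer that sets it.

```python
def require_https(app):
    """Wrap a WSGI app so that plain-http requests are rejected."""

    def middleware(environ, start_response):
        # Prefer the proxy-supplied scheme if present (assumes a trusted proxy).
        scheme = environ.get(
            "HTTP_X_FORWARDED_PROTO", environ.get("wsgi.url_scheme", "http")
        )
        if scheme != "https":
            body = b"HTTPS required\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        return app(environ, start_response)

    return middleware
```

With Flask this could be applied as `app.wsgi_app = require_https(app.wsgi_app)`; the same rule can also be enforced at the load balancer instead.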

Redis integration for task tracking.

We want to track tasks within a Redis store for reliability. The plan is to insert a redis store between the Web service and the broker such that we can't lose tasks once they have been received.

Clean up task tracking

The create_task function should be split into two functions that either create tasks or insert results.

Have forwarder recover from malicious inputs

Malicious bots etc. can send invalid input to the forwarder. We need to catch failures during deserialization, such as the one below, so that the forwarder doesn't stop.

Traceback (most recent call last):
  File "/home/ubuntu/funcx_test/lib/python3.6/site-packages/funcx/executors/high_throughput/executor.py", line 391, in _queue_management_worker
    msg = pickle.loads(serialized_msg)
_pickle.UnpicklingError: invalid load key, '\x00'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/funcx_test/lib/python3.6/site-packages/funcx/executors/high_throughput/executor.py", line 395, in _queue_management_worker
    raise BadMessage("Message received could not be unpickled")
parsl.executors.errors.BadMessage: Message received could not be unpickled

Web service throws Globus 500 error on all client .run() commands after first

This is @annawoodard's coffea+funcx use case (uses master branch of funcx as of 10/10/19).

I set up the endpoint using the normal funcx-endpoint configure and funcx-endpoint start. If I register then .run() a hello-world function, it executes and returns successfully on the first .run() command by the client.

But it fails on the second (and subsequent tries) with the following:

---------------------------------------------------------------------------
GlobusAPIError                            Traceback (most recent call last)
<ipython-input-61-df318ff7cb66> in <module>()
      1 payload = [1, 2, 3]
      2 
----> 3 task_id = client.run(*payload, endpoint_id=ndt3_uuid, function_id=func_uuid)
      4 print(task_id)

~/.local/lib/python3.6/site-packages/funcx/sdk/client.py in run(self, endpoint_id, function_id, asynchronous, *args, **kwargs)
    162 
    163         # Send the data to funcX
--> 164         r = self.post(servable_path, json_body=data)
    165         if r.http_status is not 200:
    166             raise Exception(r)

~/.local/lib/python3.6/site-packages/globus_sdk/base.py in post(self, path, json_body, params, headers, text_body, response_class, retry_401)
    286             text_body=text_body,
    287             response_class=response_class,
--> 288             retry_401=retry_401,
    289         )
    290 

~/.local/lib/python3.6/site-packages/globus_sdk/base.py in _request(self, method, path, params, headers, json_body, text_body, response_class, retry_401)
    551             "request completed with (error) response code: {}".format(r.status_code)
    552         )
--> 553         raise self.error_class(r)
    554 
    555 

GlobusAPIError: (500, 'Error', '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator at \n root@localhost to inform them of the time this error occurred,\n and the actions you performed just before this error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n</body></html>\n')

Fix Redis interactions in forwarder

The forwarder throws an exception because it tries to increment the total core hours counter with a float. In addition, we call redis_client.close(), but redis doesn't have a close function.
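
For the counter, Redis provides a float-aware increment (`INCRBYFLOAT`, exposed in redis-py as `incrbyfloat`), whereas the integer increment rejects non-integer amounts. The sketch below assumes a hypothetical key name and uses an in-memory stand-in for the client so that it runs without a Redis server.

```python
class FakeRedis:
    """In-memory stand-in exposing only the method used below."""

    def __init__(self):
        self.data = {}

    def incrbyfloat(self, name, amount=1.0):
        self.data[name] = float(self.data.get(name, 0.0)) + amount
        return self.data[name]


def add_core_hours(client, hours, key="funcx_total_core_hours"):
    """Add a possibly fractional number of core hours to the running total."""
    return client.incrbyfloat(key, float(hours))
```

With a real client, the `FakeRedis` instance would be replaced by the forwarder's existing Redis connection.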

Support DLHub integration

Add REST routes that determine which executors a funcX endpoint should start, so that DLHub models can be served via funcX.
