Comments (6)
Thanks for the input, guys. I've ended up using threading as well, but with a less refined solution than @pm3310 suggests. I will try this out and confirm.
from sagify.
Hey @m4nuC
Great question! Please find an example solution below:
```python
from __future__ import absolute_import
import os
# Do not remove the following line
import sys; sys.path.append("..")  # NOQA
import logging
import concurrent.futures

_MODEL_PATH = os.path.join('/opt/ml/', 'model')  # Path where all your model(s) live

log = logging.getLogger(__name__)


class ModelLoader:
    def __init__(self, load_method):
        log.info("setting up ModelLoader")
        # Invoke the load method asynchronously so the process does not block.
        self.model = None
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.future = self.executor.submit(load_method)
        self.future.add_done_callback(self.done)

    def get_model(self):
        if self.future.done():
            return self.model
        log.error("get_model called before ready")
        raise Exception("model not loaded")

    def done(self, future):
        """Callback invoked when the model load completes. Sets us to ready status."""
        log.info("model load done")
        self.model = future.result()
        log.info("shutting down executor")
        self.executor.shutdown(wait=False)

    def get_ready(self):
        return self.future.done()


class ModelService(object):
    model = None

    @staticmethod
    def load_model():
        # Load your slow-loading model here
        from sklearn.externals import joblib
        return joblib.load(os.path.join(_MODEL_PATH, 'model.pkl'))

    @classmethod
    def get_model(cls):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            cls.model = ModelLoader(cls.load_model)
        return cls.model

    @classmethod
    def predict(cls, input):
        """For the input, do the predictions and return them."""
        return cls.model.get_model().predict(input)


ModelService.get_model()


def predict(json_input):
    """
    Prediction given the request input
    :param json_input: [dict], request input
    :return: [dict], prediction
    """
    # TODO Transform json_input and assign the transformed value to model_input
    try:
        model_input = json_input['features']
        prediction = ModelService.predict(model_input)
        result = {'prediction': prediction.item()}
        return result
    except Exception as e:
        return {"error": str(e)}
```
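For illustration, here is a minimal, self-contained sketch of the same background-loading pattern, with a stub loader standing in for the slow `joblib.load` call (the names mirror the snippet above, trimmed of the service layer):

```python
import concurrent.futures
import time


class ModelLoader:
    """Kick off a slow model load on a background thread at construction time."""

    def __init__(self, load_method):
        self.model = None
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.future = self.executor.submit(load_method)
        self.future.add_done_callback(self._done)

    def _done(self, future):
        # Runs once the load finishes; publish the model and release the thread.
        self.model = future.result()
        self.executor.shutdown(wait=False)

    def get_ready(self):
        return self.future.done()


def slow_load():
    time.sleep(0.2)  # stands in for a multi-minute joblib.load
    return "trained-model"


loader = ModelLoader(slow_load)  # returns immediately; the load runs in the background
while loader.model is None:      # poll until the callback has published the model
    time.sleep(0.01)
print(loader.model)              # -> trained-model
```

Note that polling `loader.model` rather than `future.done()` avoids the narrow window in which the future is already done but the done-callback has not yet assigned `self.model`.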
Essentially, it spawns a thread to load a slow-loading model. @ilazakis Do you think we can make the ModelLoader a Sagify utility?
Please let me know, @m4nuC, if it solved your issue.
Thanks
Adding some model-loading specific utilities sounds sensible @pm3310, yes.
Loading the model on a different thread keeps the synchronous API call from blocking, but if the model still needs 10 minutes to load, the end user still gets nothing back; the client, or a proxy or similar in between, will time out anyway. We could return a specific "loading model, please try again in X minutes" response to mitigate the bad experience.
One thing we could do to solve the actual problem is to tie the loading of the model to the deploy command. Training and deploying take time anyway, so if we load the model right after the deploy command, it will already be warm for whoever calls the predict endpoint first.
Open to any suggestions.
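As a hypothetical sketch of the "please try again" idea (with a stub `ModelService` in place of a real model), the predict handler could catch a not-yet-loaded error and return a retry hint instead of letting the call fail:

```python
class ModelNotYetLoadedException(Exception):
    pass


class ModelService:
    ready = False  # flipped to True once the background load completes

    @classmethod
    def predict(cls, features):
        if not cls.ready:
            raise ModelNotYetLoadedException("model not loaded")
        return sum(features)  # stand-in for a real model's predict()


def predict(json_input):
    try:
        return {'prediction': ModelService.predict(json_input['features'])}
    except ModelNotYetLoadedException:
        # Tell the client to retry instead of letting the request time out.
        return {'error': 'model is still loading, please try again shortly',
                'retry_after_seconds': 60}


print(predict({'features': [1, 2]}))  # -> {'error': ..., 'retry_after_seconds': 60}
ModelService.ready = True
print(predict({'features': [1, 2]}))  # -> {'prediction': 3}
```

A real endpoint would pair this payload with a 503 status and a `Retry-After` header; the field names here are illustrative, not part of sagify.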
@ilazakis I like the idea of returning a specific "loading model, please try again in X minutes" response to mitigate the bad experience.
Works great indeed. Thanks @pm3310
I have made a small modification to handle the case where the model is not yet loaded, using a custom exception. However, I am not sure it's idiomatic Python. See below.
```python
from __future__ import absolute_import
import os
# Do not remove the following line
import sys; sys.path.append("..")  # NOQA
import logging
import concurrent.futures

_MODEL_PATH = os.path.join('/opt/ml/', 'model')  # Path where all your model(s) live

log = logging.getLogger(__name__)


class ModelNotYetLoadedException(Exception):
    def __init__(self, message):
        super().__init__(message)


class ModelLoader:
    def __init__(self, load_method):
        log.info("setting up ModelLoader")
        # Invoke the load method asynchronously so the process does not block.
        self.model = None
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.future = self.executor.submit(load_method)
        self.future.add_done_callback(self.done)

    def get_model(self):
        if self.future.done():
            return self.model
        log.error("get_model called before ready")
        raise ModelNotYetLoadedException("model not loaded")

    def done(self, future):
        """Callback invoked when the model load completes. Sets us to ready status."""
        log.info("model load done")
        self.model = future.result()
        log.info("shutting down executor")
        self.executor.shutdown(wait=False)

    def get_ready(self):
        return self.future.done()


class ModelService(object):
    model = None

    @staticmethod
    def load_model():
        # Load your slow-loading model here
        from sklearn.externals import joblib
        return joblib.load(os.path.join(_MODEL_PATH, 'model.pkl'))

    @classmethod
    def init_model(cls):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            cls.model = ModelLoader(cls.load_model)
        return cls.model

    @classmethod
    def predict(cls, input):
        """For the input, do the predictions and return them."""
        try:
            return cls.model.get_model().predict(input)
        except ModelNotYetLoadedException:
            return 'model not yet loaded'


ModelService.init_model()


def predict(json_input):
    """
    Prediction given the request input
    :param json_input: [dict], request input
    :return: [dict], prediction
    """
    # TODO Transform json_input and assign the transformed value to model_input
    try:
        model_input = json_input['features']
        prediction = ModelService.predict(model_input)
        result = {'prediction': prediction.item()}
        return result
    except Exception as e:
        return {"error": str(e)}
```
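On the idiomatic-Python question: when a custom exception adds no behaviour of its own, the usual pattern is to subclass `Exception` with just a docstring (or `pass`). The explicit `__init__` override is redundant, since `Exception.__init__` already accepts and stores a message. A minimal sketch:

```python
class ModelNotYetLoadedException(Exception):
    """Raised when predict() is called before the background load has finished."""


try:
    raise ModelNotYetLoadedException("model not loaded")
except ModelNotYetLoadedException as e:
    print(e)  # -> model not loaded
```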
@m4nuC Perfect solution ;-)