microprofile-fault-tolerance's Introduction

Eclipse MicroProfile Fault Tolerance

Introduction

It is increasingly important to build fault-tolerant microservices. Fault tolerance is about leveraging different strategies to guide the execution and result of some logic. Retry policies, bulkheads, and circuit breakers are popular concepts in this area. They dictate whether and when executions should take place, and fallbacks offer an alternative result when an execution does not complete successfully.

Overview

Fault Tolerance provides developers with the following strategies for dealing with failure:

  • Timeout: Define a maximum duration for execution

  • Retry: Attempt execution again if it fails

  • Bulkhead: Limit concurrent execution so that failures in that area can’t overload the whole system

  • CircuitBreaker: Automatically fail fast when execution repeatedly fails

  • Fallback: Provide an alternative solution when execution fails

Fault Tolerance provides an annotation for each strategy which can be placed on the methods of CDI beans. When an annotated method is called, the call is intercepted and the corresponding fault tolerance strategies are applied to the execution of that method.

Documentation

For links to the latest maven artifacts, Javadoc and specification document, see the latest release.

Example

Apply the retry and fallback strategies to doWork(). It will be executed up to two additional times if it throws an exception. If all executions throw an exception, doWorkFallback() will be called and its result returned instead.

@ApplicationScoped
public class FaultToleranceBean {

   @Retry(maxRetries = 2)
   @Fallback(fallbackMethod = "doWorkFallback")
   public Result doWork() {
      return callServiceA(); // This service usually works but sometimes
                             // throws a RuntimeException
   }

   private Result doWorkFallback() {
      return Result.emptyResult();
   }
}

From elsewhere, inject the FaultToleranceBean and call the method:

@ApplicationScoped
public class TestBean {

    @Inject private FaultToleranceBean faultToleranceBean;

    public void test() {
        Result theResult = faultToleranceBean.doWork();
    }
}
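
Conceptually, the retry and fallback combination behaves roughly like the plain-Java sketch below. It is only an illustration of the semantics described above, not how any implementation works internally; RetryFallbackSketch, callWithRetryAndFallback and its parameters are hypothetical names that do not exist in the Fault Tolerance API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper, not part of the MicroProfile Fault Tolerance API.
final class RetryFallbackSketch {

    // Runs the action once, then up to maxRetries additional times while it
    // keeps throwing. If every attempt fails, the fallback result is returned.
    static <T> T callWithRetryAndFallback(Callable<T> action, int maxRetries, Supplier<T> fallback) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Ignore the failure and move on to the next attempt, if any.
            }
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Callable<String> alwaysFails = () -> {
            calls.incrementAndGet();
            throw new IllegalStateException("service A is down");
        };
        Supplier<String> fallback = () -> "fallback";
        String result = callWithRetryAndFallback(alwaysFails, 2, fallback);
        // One initial attempt plus two retries, then the fallback.
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

With maxRetries = 2, an action that always throws is invoked three times before the fallback value is returned.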

Configuration

The annotation parameters can be configured via MicroProfile Config. For example, imagine you have the following code in your application:

package org.microprofile.readme;

@ApplicationScoped
public class FaultToleranceBean {

   @Retry(maxRetries = 2)
   public Result doWork() {
      return callServiceA(); // This service usually works but sometimes
                             // throws a RuntimeException
   }
}

At runtime, you can configure maxRetries to be 6 instead of 2 for this method by defining the config property org.microprofile.readme.FaultToleranceBean/doWork/Retry/maxRetries=6.

Alternatively, you can configure maxRetries to be 6 for all instances of Retry in your application by specifying the property Retry/maxRetries=6.
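
The two property forms can be pictured as a lookup from the most specific key to the least specific one. The sketch below is plain Java over a Map and does not use MicroProfile Config; RetryConfigLookupSketch and resolveMaxRetries are hypothetical names, and the assumption that the method-specific key wins over the application-wide key is stated here for illustration only.

```java
import java.util.Map;

// Hypothetical lookup, shown only to illustrate the shape of the property keys.
final class RetryConfigLookupSketch {

    // Resolves maxRetries for one method: the method-specific key is checked
    // first, then the application-wide key, then the value from the annotation.
    static int resolveMaxRetries(Map<String, String> config,
                                 String className,
                                 String methodName,
                                 int annotationValue) {
        String methodKey = className + "/" + methodName + "/Retry/maxRetries";
        String globalKey = "Retry/maxRetries";
        if (config.containsKey(methodKey)) {
            return Integer.parseInt(config.get(methodKey));
        }
        if (config.containsKey(globalKey)) {
            return Integer.parseInt(config.get(globalKey));
        }
        return annotationValue;
    }

    public static void main(String[] args) {
        Map<String, String> config =
                Map.of("org.microprofile.readme.FaultToleranceBean/doWork/Retry/maxRetries", "6");
        int resolved = resolveMaxRetries(config, "org.microprofile.readme.FaultToleranceBean", "doWork", 2);
        System.out.println(resolved);
    }
}
```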

Contributing

Do you want to contribute to this project? Find out how you can help here.

microprofile-fault-tolerance's Issues

Running the TCK - don't assume it's running on a server

The TCK seems to assume it's running on a server somewhere. In Hammock, I'm just using an embedded Weld/OWB container, so there is no separate server runtime. However, the packaging assumes that nothing else goes into the WAR. To work around this, I had to introduce an archive processor in Arquillian to add my classes. Not sure if that's a good approach.

Asynchronous handling

To support asynchronous method invocation, there are a few options. We need to agree on which one is the best approach or add more ideas.

  • Use the callback mechanism

The method must declare the void return type, while the return value is passed via the callback.

@Asynchronous(MyCallback.class)
public void myService(){
...
}

public class MyCallback implements AsyncCallback{
..
}

public interface AsyncCallback<T>{
      boolean isComplete();
      T getObject();
}

The result will be passed to MyCallback

  • Always return a Future: enforce that the method returns a Future. This is easier for the calling point, but not very clean for the developer creating the method. The method should return null or a fake Future, since the interceptor will do the real job. This approach also prevents using an existing class method in async mode (adding @Asynchronous with a portable extension, for instance).

  • Keep the method signature unchanged, but the returned value will be a proxy

  • Introduce a specific Future in the API: AsyncFuture<>, as is done for EJB asynchronous calls.

  • Use the Object return type. More versatile, but not clean from the calling side

  • Accept any return type, have the interceptor return null and send the Future<> to a monitoring bean. (Is this effectively the same as the first option?)

Create interceptors for JAX-RS client

Some functionality would be nice to have in JAX-RS client instances, since JAX-RS is already in MP.
They could be registered for:

  • individual client instances with client.register() method
  • globally via an SPI we would need to define - either with a default config, or via a name which could be set with client.setProperty("MICROPROFILE_NAME", "my-client-name")

This would be useful for circuit breakers, retries, etc.

It would be also useful to define producers for client instances so that a JAX-RS client could be injected, with reasonable defaults configurable with fault-tolerance annotations:

@Inject @Retry @CircuitBreaker Client webClient;

Maybe we should create a new proposal for JAX-RS extensions and keep this functionality separate from the general Fault Tolerance proposal?

Clarify usage of Timeout without asynchronous invocation

Usage of @Timeout in synchronous mode isn't very clear from Javadocs and the spec - especially all the consequences.

The TCK tests in TimeoutTest expect that the implementation will interrupt the thread when the timeout is reached. However, this isn't specified anywhere in the Javadocs or the PDF spec. Moreover, interrupting a thread only works in some scenarios. It doesn't work if:

  • the thread is blocked on blocking I/O (database, file read/write); an exception is thrown only when waiting on an NIO channel
  • the thread isn't waiting (CPU intensive task) and isn't checking for being interrupted
  • the thread will catch the interrupted exception (with a general catch block) and will just continue processing, ignoring the interrupt

In the above cases (the first one more likely than the others), it's technically impossible to suspend execution and throw a timeout exception after the specified timeout. It should be clarified that these limitations apply and that the execution of the operation may, in the worst case, complete as if the timeout wasn't specified, throwing an exception after it completes if it took longer than the timeout.
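
The case where the interrupt is swallowed can be reproduced without any Fault Tolerance implementation. The sketch below is plain Java; SyncTimeoutSketch, runWithInterruptAfter and stubbornTask are hypothetical names, and interrupting the calling thread from a timer is only an assumption about how a synchronous timeout might be implemented.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical demonstration, not taken from any Fault Tolerance implementation.
final class SyncTimeoutSketch {

    // Runs the task on the calling thread and interrupts that thread once the
    // timeout elapses. Returns how long the task actually took, in milliseconds.
    static long runWithInterruptAfter(Runnable task, long timeoutMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        Thread caller = Thread.currentThread();
        long start = System.nanoTime();
        ScheduledFuture<?> interrupter =
                timer.schedule(() -> caller.interrupt(), timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            task.run();
        } finally {
            interrupter.cancel(false);
            timer.shutdownNow();
            Thread.interrupted(); // clear a leftover interrupt flag, if any
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Works for about 200 ms and swallows every interrupt it receives.
    static void stubbornTask() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);
        while (System.nanoTime() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException ignored) {
                // The interrupt is ignored and the work simply continues.
            }
        }
    }

    public static void main(String[] args) {
        long elapsed = runWithInterruptAfter(SyncTimeoutSketch::stubbornTask, 50);
        System.out.println("timeout was 50 ms, task ran for " + elapsed + " ms");
    }
}
```

Although the interrupt is delivered after 50 ms, the task still runs for its full duration, so the timeout can only be reported once the task has returned.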

In a future version, it should be reconsidered whether, when processing completes but misses the timeout, an exception should be thrown or processing should continue without any notice.

Istio Integration - overwrite Istio's retry and timeout

Istio Integration discussion

Istio Timeout and Retry

When MP FT works with Istio, Timeout and Retry are two problematic areas. At the moment, the most restrictive Timeout will be honoured. As for Retry, if MP FT maxRetries = 3 and Istio says 5, the app will retry 15 times.
To fix these conflicts, we need to provide a way to switch off just Istio's timeout and retries.

Istio provides two special HTTP headers, “x-envoy-upstream-rq-timeout-ms” and “x-envoy-max-retries”. If the runtime (e.g. Liberty, WildFly Swarm) detects that MP FT is present, it can put these headers on the request with the value 0. This turns off Istio's timeout and max retries while keeping the rest of Istio's FT capabilities.

One possible solution is to ask the app to set the headers, which is not clean, as the app has no idea it will run in Istio. The better solution would be for the MP FT runtime to set the headers when it detects an Istio environment. This might be achieved via an HttpFilter or a ClientRequestFilter.
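
As a rough illustration of what setting the headers would look like on an outgoing request, the sketch below uses the JDK's java.net.http client (Java 11 or later) rather than a JAX-RS ClientRequestFilter. IstioHeaderSketch, its method name and the example URL are hypothetical, and the header values simply follow the proposal in this issue.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; a real runtime would more likely do this in a request filter.
final class IstioHeaderSketch {

    // Builds a GET request carrying the two Envoy headers with the value 0,
    // as proposed in this issue, to switch off the mesh-level timeout and retries.
    static HttpRequest withIstioRetryAndTimeoutDisabled(URI target) {
        return HttpRequest.newBuilder(target)
                .header("x-envoy-upstream-rq-timeout-ms", "0")
                .header("x-envoy-max-retries", "0")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                withIstioRetryAndTimeoutDisabled(URI.create("http://service-b.example/books"));
        System.out.println(request.headers().map());
    }
}
```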

Clarification on CDI and Interceptor spec restrictions

We should state in the spec that interceptor bindings must be applied to beans and will only be activated on business method invocations.
The clarification should also state that Fault Tolerance needs at least one interceptor binding on the method or class to have FT enabled on the invocation.

CompletionStage return type support in Asynchronous

As mentioned in the Asynchronous doc, the method annotated with Asynchronous must return a Future.
We should also consider supporting java.util.concurrent.CompletionStage.

New Java EE 8 APIs, such as javax.enterprise.event.Event.fireAsync(event) in CDI 2.0 and JAX-RS 2.1, support the CompletionStage return type.
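
The practical difference for callers can be sketched with the JDK alone: a Future has to be waited on, while a CompletionStage lets the caller attach follow-up work. AsyncReturnTypesSketch and its methods are hypothetical names, and CompletableFuture.supplyAsync stands in for whatever an implementation would do to run the method asynchronously.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical comparison of the two return types, using only the JDK.
final class AsyncReturnTypesSketch {

    static Future<String> loadAsFuture() {
        return CompletableFuture.supplyAsync(() -> "result");
    }

    static CompletionStage<String> loadAsStage() {
        return CompletableFuture.supplyAsync(() -> "result");
    }

    // With Future, the caller blocks in get() before it can transform the value.
    static String viaFuture() {
        try {
            return loadAsFuture().get().toUpperCase();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // With CompletionStage, the transformation is registered and runs once
    // the value is available; the caller does not have to block.
    static CompletionStage<String> viaStage() {
        return loadAsStage().thenApply(s -> s.toUpperCase());
    }
}
```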

Fallback should not be placed on class level

The Fallback annotation provides a fallback method for a method invocation. It does not make sense at class level: there won't be a universal method for all of the class's methods to fall back to.

How to add a test that requires an environment variable

We need to add a TCK test for the following statement:
Setting the environment variable MP_Fault_Tolerance_NonFallback_Enabled to false means that Fault Tolerance is disabled, except for Fallback. If the environment variable is absent or has the value true, MicroProfile Fault Tolerance is enabled if any annotations are specified.

How can we set the environment variable in the TCK and then restart the server using Arquillian?

Create Fallback functionality

When an invocation fails, provide the ability to fall back to another invocation that may be more reliable, though it does not necessarily meet the full need.

Define events for Faults

Whenever there is a fault-tolerance error, an event should be fired.

  • Client application can easily observe and process the failsafe data
  • Hooks for other libraries/dashboard will be easier to create
  • Hooks should be asynchronous - CDI 2. Blocking execution of business code in order to handle event observers is not acceptable.

An example of an event fired after an exception is thrown can be found in FortEE.

In the case of MicroProfile Fault Tolerance, there could be more types of events:

  • Circuit opened
  • Circuit closed
  • Exception thrown
  • Timeout reached

Disable individual Fault Tolerance annotation using external config

Based on the current state of Fault Tolerance configuration, it is not possible to disable a specific annotation individually or globally.

Possible solutions could be:

  1. Introduce an enabled element with default value true in annotation to disable it using external config source.
    e.g : com.acme.test.MyClient/serviceA/CircuitBreaker/enabled=false
  2. The annotation can be disabled via system properties in the naming convention of <classname>/<methodname>/<annotation>=false
    e.g : com.acme.test.MyClient/serviceA/CircuitBreaker=false

API should define exception

The API should define one or more exceptions for common error cases (operation in timeout with defined fallback or circuit breaker, for instance).

Fallbacks should fail deployment when return types don't match

The following TCK assertion is bad:

    public void testFallbackFailure() {
        try {
            fallbackClient.serviceB();
            Assert.fail("serviceB should throw an IllegalArgumentException in testFallbackFailure");
        }
        catch(RuntimeException ex) {
            Assert.assertTrue(ex instanceof IllegalArgumentException, "serviceB should throw an IllegalArgumentException in testFallbackFailure");
        }
        Assert.assertEquals(fallbackClient.getCounterForInvokingCountService(), 5, "The max number of execution should be 5");

    }

IMHO, we should fail deployment if the fallback handler doesn't match the return type of the method, rather than throw an IllegalArgumentException.
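
A start-up check of this kind could be sketched with reflection. This is only an illustration of the idea, restricted to no-argument methods on the same class; FallbackReturnTypeCheckSketch, SampleBean and validate are hypothetical names, and a real implementation would more likely report the problem through the container's deployment mechanism.

```java
import java.lang.reflect.Method;

// Hypothetical validation, shown only to illustrate failing early on a mismatch.
final class FallbackReturnTypeCheckSketch {

    // Hypothetical bean used only to exercise the check below.
    static final class SampleBean {
        String serviceB() { return "primary"; }
        String matchingFallback() { return "fallback"; }
        Integer mismatchedFallback() { return 0; }
    }

    // Throws IllegalStateException when the fallback's return type differs
    // from the return type of the guarded method.
    static void validate(Class<?> beanClass, String methodName, String fallbackName) {
        try {
            Method method = beanClass.getDeclaredMethod(methodName);
            Method fallback = beanClass.getDeclaredMethod(fallbackName);
            if (!method.getReturnType().equals(fallback.getReturnType())) {
                throw new IllegalStateException(
                        fallbackName + " returns " + fallback.getReturnType().getName()
                        + " but " + methodName + " returns " + method.getReturnType().getName());
            }
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("Method not found: " + e.getMessage(), e);
        }
    }
}
```

Running validate once per annotated method at start-up would surface the mismatch before any request reaches serviceB().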

Need 3rd party clearance for dependencies

Per the Initial Contribution Questionnaire process, we need to file Contribution Questionnaires for all of the 3rd party software that we are dependent on.
https://dev.eclipse.org/ipzilla/show_bug.cgi?id=12628

The Process is documented here:
https://www.eclipse.org/projects/handbook/#ip-third-party

I grepped the pom.xml files for this component and found the following dependencies... We will need to get these filed before we could release fault-tolerance-1.0. Also, some of these dependencies (all?) may already be covered by the Config component via their Issue:
eclipse/microprofile-config#170

<dependency>
    <groupId>javax.enterprise</groupId>
    <artifactId>cdi-api</artifactId>
    <version>1.2</version>
</dependency>
<dependency>
    <groupId>org.jboss.arquillian</groupId>
    <artifactId>arquillian-bom</artifactId>
    <version>1.1.12.Final</version>
    <scope>import</scope>
    <type>pom</type>
</dependency>

<dependency>
    <groupId>org.asciidoctor</groupId>
    <artifactId>asciidoctorj-pdf</artifactId>
    <version>${asciidoctorj-pdf.version}</version>
</dependency>

<dependency>
  <groupId>javax.enterprise</groupId>
  <artifactId>cdi-api</artifactId>
  <version>1.2</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <!-- actually only referenced in JavaDoc -->
  <groupId>org.apache.geronimo.specs</groupId>
  <artifactId>geronimo-annotation_1.2_spec</artifactId>
  <version>1.0</version>
  <scope>provided</scope>
  <optional>true</optional>
</dependency>

<dependency>
  <groupId>org.apache.geronimo.specs</groupId>
  <artifactId>geronimo-atinject_1.0_spec</artifactId>
  <version>1.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.testng</groupId>
  <artifactId>testng</artifactId>
  <version>6.9.9</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.jboss.arquillian.testng</groupId>
  <artifactId>arquillian-testng-container</artifactId>
  <version>${arquillian.version}</version>
</dependency>

<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-all</artifactId>
  <version>1.3</version>
  <scope>compile</scope>
</dependency>

<dependency>
  <groupId>org.jboss.shrinkwrap</groupId>
  <artifactId>shrinkwrap-api</artifactId>
  <scope>compile</scope>
</dependency>

Introduce an annotation to configure default operation handler

The API should introduce an annotation to configure the default (i.e. non-fallback) operation handler.
This should be done for multiple reasons:

  • To provide the same level of configuration for the default operation as we have for fallback
  • To allow users to create handler classes with implementation-specific features
  • To mark a method for Fault Tolerance handling with default behaviour (default timeout, fallback, etc.)

This annotation would work like @Fallback; we may only have to rename the FallbackHandler interface to OperationHandler.

I suggest the following names for this annotation:

  • @FaultToleranceOperation
  • @FaultToleranceHandler
  • @FaultToleranceConfiguration (if we'd like to add impl-specific config in a Map member)

I chose the term 'operation' to avoid 'command', which is too "Hystrix-ish", but of course another term could be chosen here.

Explore how we should integrate MP config spec

Right now we don't use the Config spec, while some elements in the spec are related to configuration (e.g. environment variables). IMO, we should check that we are not doing the Config spec's job with those.

Is CircuitBreakerClientWithRetry.serviceC implicitly async?

I'm looking at the definition of this class:

    @CircuitBreaker(successThreshold = 2, requestVolumeThreshold = 4, failureRatio = 0.75, delay = 50000)
    @Retry(retryOn = {RuntimeException.class, TimeoutException.class}, maxRetries = 7)
    @Timeout(500)

It seems like the addition of Timeout makes this invocation asynchronous. However, it's a request-scoped bean, which can't be async.

Avoid the use of ChronoUnit

The usage of ChronoUnit has to do with specific dates and times. Since we're dealing with operations, we should be using TimeUnit.

Service Mesh usage

On Thursday, May 25, 2017 at 2:44:43 PM UTC-7, [email protected] wrote: Ok, I'm beginning to see the syntax. I feel there needs to be more of a description about the workflow a method is invoking, and this has only increased given the service mesh for micro services announcement by Google, IBM, and Lyft, with support from Red Hat and others:

https://istio.io/blog/istio-service-mesh-for-microservices.html
https://github.com/istio

A question is how the MicroProfile effort fits into a service mesh architecture? I can see both externalization of fault-tolerance and injection of either policies or service proxies needing to be supported. ...

and here is another based on some type of injection of service proxies that are registered with the mesh:

public class MyAbstractGateway { 

@ServiceReference(name="dbResults", endpoint = "/books") 
WebTarget dbResults;

@Asynchronous
@ServiceReference(name="images", endpoint = "/books/images")
WebTarget images;

@ServiceReference(name="reviews", endpoint = "/books/reviews")
WebTarget reviews;

@Workflow(
    name = "display-book-info",
    services = {
        @ServiceReference(name="dbResults", endpoint = "/books"),
        @ServiceReference(name="images", endpoint = "/books/images"),
        @ServiceReference(name="reviews", endpoint = "/books/reviews")
    }
)
@GET
@Path("frontend-gateway")
@Produces("application/json")
public String myGatewayMethod() {
    dbResults.request().get(...);

    return null;
}

}

public class MyAbstractGateway {

@Retry(maxRetries=3, delay=1)
@ServiceReference(name="dbResults", endpoint = "/books")
WebTarget dbResults;

@Asynchronous
@CircuitBreaker(delay = 5, delayUnit = ChronoUnit.SECONDS)
@ServiceReference(name="images", endpoint = "/books/images")
WebTarget images;

@Retry(maxRetries=3, delay=1)
@ServiceReference(name="reviews", endpoint = "/books/reviews")
WebTarget reviews;

Better instructions on how to run the TCK

The instructions for running the TCK require a test suite. You can actually do this entirely in Maven config, which makes it much easier to run.

  1. Add a dependency on the TCK
        <dependency>
            <groupId>org.eclipse.microprofile.fault.tolerance</groupId>
            <artifactId>microprofile-fault-tolerance-tck</artifactId>
            <version>1.0-SNAPSHOT</version>
            <scope>test</scope>
        </dependency>
  2. Add the Maven coordinates to the Surefire plugin
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19.1</version>
                <configuration>
                    <dependenciesToScan>
                        <dependency>org.eclipse.microprofile.fault.tolerance:microprofile-fault-tolerance-tck</dependency>
                    </dependenciesToScan>
                </configuration>
            </plugin>

Then the TCK can run in your build.

Add a maxDuration to the Retry in CircuitbreakerRetryTest

The Service that is called in testCircuitOpenWithMultiTimeouts will time out a number of times (one for each retry) before the Circuit is opened. We should add a maxDuration to the @Retry to ensure that all the timeouts will be completed before the circuit is opened.

Shouldn't use a colon in a configuration property key

The current spec says that Fault Tolerance configuration is done via system properties with keys in the following format

ft:<classname>/methodname/Annotation/parameter

Notice that it starts with ft: ... While that colon might be fine when specified on the command line, there's a good chance it will actually be stored in a Java properties file, e.g. bootstrap.properties.

In a Java properties file, a colon is treated the same as an equals sign and so cannot be used in a key name unless it is escaped, i.e.

ft:com.acme.test.MyClient/serviceB/Retry/maxRetries=100

results in a key of ft and a value of com.acme.test.MyClient/serviceB/Retry/maxRetries=100

I suggest using a different character for that ft prefix such as hash (ft#) or dollar (ft$). I think hash looks better but it is of course used to represent a comment line (when it is the first char).
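
The parsing behaviour described above can be checked directly with java.util.Properties. The class and method names in this sketch are hypothetical; only Properties.load and getProperty are JDK API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demonstration of how java.util.Properties splits a line at a colon.
final class ColonInPropertyKeySketch {

    // Parses the given text exactly as a .properties file would be parsed.
    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Unescaped colon: the key ends at "ft" and the rest becomes the value.
        Properties unescaped = parse("ft:com.acme.test.MyClient/serviceB/Retry/maxRetries=100\n");
        System.out.println(unescaped.getProperty("ft"));
        // prints "com.acme.test.MyClient/serviceB/Retry/maxRetries=100"

        // Escaped colon (written as \: in the file): the whole prefix stays in the key.
        Properties escaped = parse("ft\\:com.acme.test.MyClient/serviceB/Retry/maxRetries=100\n");
        System.out.println(escaped.getProperty("ft:com.acme.test.MyClient/serviceB/Retry/maxRetries"));
        // prints "100"
    }
}
```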

TCK fails 'Unsatisfied dependencies'

arquillianBeforeClass(org.eclipse.microprofile.fault.tolerance.tck.FallbackTest)  Time elapsed: 0.032 sec  <<< FAILURE!
org.jboss.weld.exceptions.DeploymentException: 
WELD-001408: Unsatisfied dependencies for type MyBean with qualifiers @Default
  at injection point [BackedAnnotatedField] @Inject private org.eclipse.microprofile.fault.tolerance.tck.retry.clientserver.FallbackA.myBean
  at org.eclipse.microprofile.fault.tolerance.tck.retry.clientserver.FallbackA.myBean(FallbackA.java:0)

Looking at the JAR, it's not a valid bean archive (based on the SE spec portions at least). It has no beans.xml.

Allow FT annotations on Final class?

In the CDI spec, final classes cannot be proxied. As a consequence, no interceptor bindings can be added to final classes. In MP FT, the annotations are effectively interceptor bindings. Should we just document this limitation or try to figure out how to support MP FT on final classes?

BulkHead

Provide a way to configure the maximum number of threads accessing a resource.
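
One common way to cap concurrent access is a counting semaphore. The sketch below is a minimal plain-Java illustration of that idea, not a proposal for the API; BulkheadSketch and call are hypothetical names, and rejecting immediately when no permit is free is an assumption (queueing would be another option).

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore-based bulkhead, for illustration only.
final class BulkheadSketch {

    private final Semaphore permits;

    BulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the action if a permit is free; otherwise rejects the call at once
    // so that callers do not pile up behind a slow resource.
    <T> T call(Supplier<T> action) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }
}
```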

Inconsistent usage of threshold and ratio

In order to configure successes, you use a successThreshold() and requestVolumeThreshold() to specify how many requests need to be successful.

In order to configure failures, you use failureRatio() and requestVolumeThreshold()

It's assumed that successThreshold is a count of successes, whereas for failures you need to calculate this number. As a user, this is confusing: in one case I need to calculate a threshold, whereas in the other I just need to list counts. It would be better to use only one or the other.
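
The calculation a user has to do for the failure side can be written out as follows. ThresholdVersusRatioSketch and failuresNeededToOpen are hypothetical names, and rounding up is an assumption about how a fractional result would be treated.

```java
// Hypothetical helper showing the arithmetic behind failureRatio.
final class ThresholdVersusRatioSketch {

    // Converts a failure ratio over a rolling window of requests into the
    // number of failures within that window that would open the circuit.
    static int failuresNeededToOpen(double failureRatio, int requestVolumeThreshold) {
        return (int) Math.ceil(failureRatio * requestVolumeThreshold);
    }

    public static void main(String[] args) {
        // failureRatio = 0.75 with requestVolumeThreshold = 4 means 3 failures.
        System.out.println(failuresNeededToOpen(0.75, 4));
    }
}
```

By contrast, successThreshold = 2 is already a plain count and needs no conversion.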

force open a circuit breaker at runtime

Use case:
If a service fails in a production environment, the product team should be able to force open the circuit breaker at runtime using external config, without needing to restart the application.

Force open:
A new property, circuitBreaker.forceOpen, can be introduced. If true, it forces the circuit breaker into an open (tripped) state in which it will reject/fallback all requests.

At runtime:
MP Config (dynamic configuration functionality) will also help to achieve this at runtime.
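
The intended behaviour of such a flag can be sketched in plain Java. ForceOpenSketch and its methods are hypothetical names; the flag is flipped through a setter here, whereas the proposal is to drive it from external configuration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical wrapper showing only the force-open switch, not a full circuit breaker.
final class ForceOpenSketch {

    private final AtomicBoolean forceOpen = new AtomicBoolean(false);

    // In the proposal this value would come from dynamic configuration.
    void setForceOpen(boolean value) {
        forceOpen.set(value);
    }

    // While forced open, the protected action is never invoked and the
    // fallback is used for every request.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (forceOpen.get()) {
            return fallback.get();
        }
        return action.get();
    }
}
```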

Define a FaultToleranceDefinitionException

Or something along those lines.

Instead of directly relying on CDI's DeploymentException, we should have our own for definition errors and throw that if something isn't defined properly.
