
microprofile fault tolerance

License: Apache License 2.0

Java 99.61% Shell 0.14% Ruby 0.17% Batchfile 0.07%

microprofile-fault-tolerance's Introduction

Eclipse MicroProfile Fault Tolerance

Introduction

It is increasingly important to build fault-tolerant microservices. Fault tolerance is about leveraging different strategies to guide the execution and result of some logic. Retry policies, bulkheads, and circuit breakers are popular concepts in this area. They dictate whether and when executions should take place, and fallbacks offer an alternative result when an execution does not complete successfully.

Overview

Fault Tolerance provides developers with the following strategies for dealing with failure:

Timeout: Define a maximum duration for execution
Retry: Attempt execution again if it fails
Bulkhead: Limit concurrent execution so that failures in that area can’t overload the whole system
CircuitBreaker: Automatically fail fast when execution repeatedly fails
Fallback: Provide an alternative solution when execution fails

Fault Tolerance provides an annotation for each strategy which can be placed on the methods of CDI beans. When an annotated method is called, the call is intercepted and the corresponding fault tolerance strategies are applied to the execution of that method.

Documentation

For links to the latest maven artifacts, Javadoc and specification document, see the latest release.

Example

Apply the retry and fallback strategies to doWork(). It will be executed up to two additional times if it throws an exception. If all executions throw an exception, doWorkFallback() will be called and its result returned instead.

@ApplicationScoped
public class FaultToleranceBean {

   @Retry(maxRetries = 2)
   @Fallback(fallbackMethod = "doWorkFallback")
   public Result doWork() {
      return callServiceA(); // This service usually works but sometimes
                             // throws a RuntimeException
   }

   private Result doWorkFallback() {
      return Result.emptyResult();
   }
}

From elsewhere, inject the FaultToleranceBean and call the method:

@ApplicationScoped
public class TestBean {

    @Inject private FaultToleranceBean faultToleranceBean;

    public void test() {
        Result theResult = faultToleranceBean.doWork();
    }
}
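
To make the retry-then-fallback behaviour described above concrete, here is a minimal plain-Java sketch of the same control flow. It is not how a Fault Tolerance implementation works internally (real implementations use CDI interceptors); RetryWithFallback and its call method are hypothetical names introduced only for illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of the MicroProfile Fault Tolerance API.
final class RetryWithFallback {

    // Runs the action once, then up to maxRetries more times while it throws.
    // If every attempt throws, the fallback result is returned instead.
    static <T> T call(Supplier<T> action, int maxRetries, Supplier<T> fallback) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Ignore this failure and try again; once the attempts are
                // used up, control falls through to the fallback below.
            }
        }
        return fallback.get();
    }
}
```

With maxRetries = 2, an action that always throws is attempted three times in total before the fallback supplies the result.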

Configuration

The annotation parameters can be configured via MicroProfile Config. For example, imagine you have the following code in your application:

package org.microprofile.readme;

@ApplicationScoped
public class FaultToleranceBean {

   @Retry(maxRetries = 2)
   public Result doWork() {
      return callServiceA(); // This service usually works but sometimes
                             // throws a RuntimeException
   }
}

At runtime, you can configure maxRetries to be 6 instead of 2 for this method by defining the config property org.microprofile.readme.FaultToleranceBean/doWork/Retry/maxRetries=6.

Alternatively, you can configure maxRetries to be 6 for all instances of Retry in your application by specifying the property Retry/maxRetries=6.
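
The two property forms above can be pictured as a lookup with a fallback chain. The sketch below uses a plain Map in place of MicroProfile Config and assumes that the method-specific key wins over the global key, which in turn wins over the value written in the annotation. FtConfigLookup and resolveInt are hypothetical names, not part of any API.

```java
import java.util.Map;

// Hypothetical helper; a Map stands in for the real configuration source.
final class FtConfigLookup {

    // Resolves an integer annotation parameter, most specific key first.
    static int resolveInt(Map<String, String> config,
                          String className, String methodName,
                          String annotation, String parameter,
                          int annotationValue) {
        // e.g. org.microprofile.readme.FaultToleranceBean/doWork/Retry/maxRetries
        String methodKey = className + "/" + methodName + "/" + annotation + "/" + parameter;
        // e.g. Retry/maxRetries
        String globalKey = annotation + "/" + parameter;

        String raw = config.get(methodKey);
        if (raw == null) {
            raw = config.get(globalKey);
        }
        // Fall back to the value declared on the annotation itself.
        return raw == null ? annotationValue : Integer.parseInt(raw.trim());
    }
}
```

For example, with only Retry/maxRetries=6 defined, resolving maxRetries for doWork() yields 6 rather than the annotated 2.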

Contributing

Do you want to contribute to this project? Find out how you can help here.

microprofile-fault-tolerance's Issues

Bulkheads should only apply to components that can be accessed from multiple contexts

The bulkhead section does not mention that bulkheads only apply to components that can be accessed from multiple contexts. It should be impossible to apply a bulkhead to a @RequestScoped bean, for instance.

Update the bnd version and properly process package version

Use the latest bnd with the correct versioning scheme

Running the TCK - don't assume it's running on a server

The TCK seems to assume it's running on a server somewhere. In Hammock, I'm just using an embedded Weld/OWB container, so there is no separate server runtime. However, the packaging assumes that nothing else goes into the WAR. To work around this, I had to introduce an archive processor in Arquillian to add my classes. Not sure if that's a good approach.

Asynchronous handling

To support asynchronous method invocation, there are a few options. We need to agree on which one is the best approach, or add more ideas.

Use the callback mechanism

The method must declare the void return type, while the return value is passed via the callback.

@Asynchronous(MyCallback.class)
public void myService(){
...
}

public class MyCallback implements AsyncCallback{
..
}

public interface AsyncCallback<T>{
      boolean isComplete();
      T getObject();
}

The result will be passed to MyCallback

Always return a Future: require the method to return Future. This is the easier approach for the calling point, but it is not very clean for the developer creating the method. The method should return null or a dummy Future, since the interceptor will do the real job. This approach also prevents using an existing class method in async mode (for instance, by adding @Asynchronous with a portable extension).
Keep the method signature unchanged, but the returned value will be a proxy.
Introduce a specific Future in the API: AsyncFuture<>, like it's done for EJB asynchronous calls.
Use the Object return type. More versatile, but not clean from the calling side.
Accept any return type, have the interceptor return null, and send the Future<> to a monitoring bean. (Is this effectively the same as the first option?)

Create Retry annotation and Retry functionality

The ability to retry an invocation repeatedly, for some number of iterations, based on some exception or cause.

Create Bulkhead functionality

The ability to create bulkheads or areas of surviving failures.

Consider using Awaitility

John brought up that our TCK might benefit from Awaitility. Use this issue to prototype, and then refactor the current TCK.

Create interceptors for JAX-RS client

Some functionality would be nice to have in JAX-RS client instances, since JAX-RS is already in MP.
They could be registered for:

individual client instances with client.register() method
globally via an SPI we would need to define - either with a default config, or via a name which could be set with client.setProperty("MICROPROFILE_NAME", "my-client-name")

This would be useful for circuit breakers, retries, etc.

It would be also useful to define producers for client instances so that a JAX-RS client could be injected, with reasonable defaults configurable with fault-tolerance annotations:

@Inject @Retry @CircuitBreaker Client webClient;

Maybe we should create a new proposal for JAX-RS extensions and keep this functionality separate from the general Fault Tolerance proposal?

Create baseline for fault tolerance for Asynchronous functionality

Add Asynchronous annotation

Add a programmatic API (i.e. built-in CDI beans to launch FT operation) to the spec

Right now, the spec API is driven only through annotations, which brings some limitations, since an FT operation can only be bound to a method.
Users may want to launch different operations in the same method.
I'd like to introduce a built-in bean, injectable via a FaultToleranceController interface, that allows such usage.

MP config API coordinates are wrong

The group id and artifact id are not the latest ones for the config API.

Clarify usage of Timeout without asynchronous invocation

Usage of @Timeout in synchronous mode isn't very clear from Javadocs and the spec - especially all the consequences.

The TCK tests in TimeoutTest expect that the implementation will interrupt the thread when the timeout is reached. However, this isn't specified anywhere in the Javadocs or the PDF spec. Moreover, interrupting a thread only works in some scenarios. It doesn't work if:

the thread is blocked on blocking I/O (database, file read/write), an exception is thrown only in case of waiting for a NIO channel
the thread isn't waiting (CPU intensive task) and isn't checking for being interrupted
the thread will catch the interrupted exception (with a general catch block) and will just continue processing, ignoring the interrupt

In the above cases (the first one more likely than the others), it's technically impossible to suspend execution and throw a timeout exception after the specified timeout. It should be clarified that these limitations apply and that the execution of the operation may, in the worst case, complete as if the timeout wasn't specified, throwing an exception after it completed if it took longer than the timeout.

In a future version, it should be reconsidered whether in such a case, when processing completes successfully but misses the timeout, an exception should be thrown or processing should continue without any notice.
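
The cooperative nature of interruption can be seen in a small plain-Java sketch. It is only an illustration of the mechanism discussed above, not the TCK's or any implementation's code; InterruptingTimeout and runWithTimeout are hypothetical names. The caller stops waiting after the timeout and requests an interrupt, but the task only stops early if it reacts to that interrupt.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper illustrating timeout by interruption.
final class InterruptingTimeout {

    // Returns true if the task completed within the timeout, false otherwise.
    static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only requests an interrupt. A task blocked in
            // Thread.sleep() notices it; a busy loop that never checks the
            // interrupt flag keeps running on the worker thread.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }
}
```

A task that sleeps receives an InterruptedException shortly after the timeout. A task that catches and ignores that exception, or that never blocks, would run to completion regardless, which is the limitation the issue asks the spec to document.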

Istio Integration - overwrite Istio's retry and timeout

Istio Integration discussion

Istio Timeout and Retry

When MP FT works with Istio, Timeout and Retry are two problematic areas. At the moment, the most restrictive Timeout will be honoured. As for Retry, if MP FT has maxRetries = 3 and Istio says 5, the app will retry 15 times.
To resolve these conflicts, we need to provide a way to switch off just Istio's timeout and retries.

Istio provides two special HTTP headers, “x-envoy-upstream-rq-timeout-ms” and “x-envoy-max-retries”. If the runtime (e.g. Liberty or WildFly Swarm) detects that MP FT is present, it can put these HTTP headers on the request with the value of 0. In this way, it can turn off Istio's timeout and max retries while keeping the rest of Istio's FT capabilities.

One possible solution is to ask the app to set the headers, which is not clean, as the app has no idea it will run in Istio. The better solution would be for the MP FT runtime to set the headers when it detects an Istio environment. This might be achieved via an HttpFilter or a ClientRequestFilter.
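
As a rough illustration of the header-based approach, the sketch below adds the two headers named above, with the value 0, to a copy of a request's headers. A plain Map stands in for a real request object, and IstioOverrideHeaders and withOverrides are hypothetical names. In a real runtime this logic would live in something like a JAX-RS ClientRequestFilter, which is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a Map stands in for the outgoing request's headers.
final class IstioOverrideHeaders {

    // Returns a copy of the headers with Istio's timeout and retries set to 0.
    static Map<String, String> withOverrides(Map<String, String> requestHeaders) {
        Map<String, String> result = new LinkedHashMap<>(requestHeaders);
        result.put("x-envoy-upstream-rq-timeout-ms", "0");
        result.put("x-envoy-max-retries", "0");
        return result;
    }
}
```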

Create baseline project structure

Create base POMs based on what's in config.

Automatically configure the fault tolerance policy

https://groups.google.com/forum/#!topic/microprofile/u4HkS-HqTIs has the full details on configuring the Fault Tolerance Policies.

Create timeout annotation

Clarification on CDI and Interceptor spec restrictions

We should state in the spec that interceptor bindings should be applied on beans and will only be activated on business method invocations.
The clarification should also state that Fault Tolerance needs at least one interceptor binding on the method or class to have FT enabled on the invocation.

CompletionStage return type support in Asynchronous

As mentioned in the Asynchronous doc, a method annotated with Asynchronous must return a Future.
We should also consider supporting java.util.concurrent.CompletionStage.

New Java EE 8 APIs, such as javax.enterprise.event.Event.fireAsync(event) in CDI 2.0, and JAX-RS 2.1, support the CompletionStage return type.
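
The practical difference between the two return types can be sketched with the JDK alone. This is only an illustration of the java.util.concurrent types, not of the @Asynchronous annotation; AsyncReturnTypes and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Future;

// Hypothetical example class contrasting the two return types.
final class AsyncReturnTypes {

    // Future: the caller can only block on get() (or poll isDone()).
    static Future<String> asFuture() {
        return CompletableFuture.completedFuture("result");
    }

    // CompletionStage: the caller can chain follow-up steps without blocking.
    static CompletionStage<Integer> asStage() {
        CompletionStage<String> stage = CompletableFuture.supplyAsync(() -> "result");
        return stage.thenApply(String::length);
    }
}
```

A caller of asStage() can keep composing with thenApply, thenCompose, or exceptionally, whereas a caller of asFuture() has to wait for the value before doing anything with it.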

Fallback should not be placed on class level

The Fallback annotation provides a fallback method for a method invocation. It does not make sense to place it at class level. There won't be a universal method for all of a class's methods to fall back to.

How to add a test which requires an environment variable

We need to add a tck to test the following statement:
Setting the environment variable MP_Fault_Tolerance_NonFallback_Enabled to false means that Fault Tolerance is disabled, except for Fallback. If the environment variable is absent or set to true, MicroProfile Fault Tolerance is enabled if any annotations are specified.

How to set the environment variable in tck and then restart the server by using arquillian?

Provide a way to switch off FT policy except retry

Istio integration discussion lists a few options to support Istio integration.

If a microservice app is deployed to Istio, Istio would like to switch off MicroProfile FT feature except fallback. We need to provide a way to do so, e.g. a config property.

Update the package name and reset the default value

Update the package name to end with .inject. Reset the corresponding default value to sync up with Hystrix default values.

Create Fallback functionality

When an invocation fails, provide the ability to fall back to another invocation that may be more reliable, even if it does not necessarily satisfy the full need.

Define events for Faults

Whenever there is a fault-tolerance error, an event should be fired.

Client application can easily observe and process the failsafe data
Hooks for other libraries/dashboard will be easier to create
Hooks should be asynchronous - CDI 2. Blocking execution of business code in order to handle event observers is not acceptable.

An example with an event fired after an exception is thrown can be found in FortEE.

In the case of MicroProfile Fault Tolerance, there could be more types of events:

Circuit opened
Circuit closed
Exception thrown
Timeout reached

Does configuration require the annotation to be present on a method?

If a Retry annotation is not present on a method, but I configure the method to have retry (or another policy) via configuration, does that configuration still apply?

Disable individual Fault Tolerance annotation using external config

Based on the current state of Fault Tolerance configuration, it is not possible to disable a specific annotation individually or globally.

Possible solution could be :

Introduce an enabled element with default value true in annotation to disable it using external config source.
e.g : com.acme.test.MyClient/serviceA/CircuitBreaker/enabled=false
The annotation can be disabled via system properties in the naming convention of <classname>/<methodname>/<annotation>=false
e.g : com.acme.test.MyClient/serviceA/CircuitBreaker=false

API should define exception

The API should define one or more exceptions for common error cases (for instance, an operation timing out with a defined fallback, or a circuit breaker).

Fallbacks should fail deployment when return types don't match

The following TCK assertion is bad

    public void testFallbackFailure() {
        try {
            fallbackClient.serviceB();
            Assert.fail("serviceB should throw an IllegalArgumentException in testFallbackFailure");
        }
        catch(RuntimeException ex) {
            Assert.assertTrue(ex instanceof IllegalArgumentException, "serviceB should throw an IllegalArgumentException in testFallbackFailure");
        }
        Assert.assertEquals(fallbackClient.getCounterForInvokingCountService(), 5, "The max number of execution should be 5");

    }

IMHO, we should fail deployment if the fallback handler doesn't match the return type of the method. Not throw an illegal argument exception.

Need 3rd party clearance for dependencies

Per the Initial Contribution Questionnaire process, we need to file Contribution Questionnaires for all of the 3rd party software that we are dependent on.
https://dev.eclipse.org/ipzilla/show_bug.cgi?id=12628

The Process is documented here:
https://www.eclipse.org/projects/handbook/#ip-third-party

I grepped the pom.xml files for this component and found the following dependencies... We will need to get these filed before we could release fault-tolerance-1.0. Also, some of these dependencies (all?) may already be covered by the Config component via their Issue:
eclipse/microprofile-config#170

<dependency>
    <groupId>javax.enterprise</groupId>
    <artifactId>cdi-api</artifactId>
    <version>1.2</version>
</dependency>
<dependency>
    <groupId>org.jboss.arquillian</groupId>
    <artifactId>arquillian-bom</artifactId>
    <version>1.1.12.Final</version>
    <scope>import</scope>
    <type>pom</type>
</dependency>

<dependency>
    <groupId>org.asciidoctor</groupId>
    <artifactId>asciidoctorj-pdf</artifactId>
    <version>${asciidoctorj-pdf.version}</version>
</dependency>

<dependency>
  <groupId>javax.enterprise</groupId>
  <artifactId>cdi-api</artifactId>
  <version>1.2</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <!-- actually only referenced in JavaDoc -->
  <groupId>org.apache.geronimo.specs</groupId>
  <artifactId>geronimo-annotation_1.2_spec</artifactId>
  <version>1.0</version>
  <scope>provided</scope>
  <optional>true</optional>
</dependency>

<dependency>
  <groupId>org.apache.geronimo.specs</groupId>
  <artifactId>geronimo-atinject_1.0_spec</artifactId>
  <version>1.0</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.testng</groupId>
  <artifactId>testng</artifactId>
  <version>6.9.9</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.jboss.arquillian.testng</groupId>
  <artifactId>arquillian-testng-container</artifactId>
  <version>${arquillian.version}</version>
</dependency>

<dependency>
  <groupId>org.hamcrest</groupId>
  <artifactId>hamcrest-all</artifactId>
  <version>1.3</version>
  <scope>compile</scope>
</dependency>

<dependency>
  <groupId>org.jboss.shrinkwrap</groupId>
  <artifactId>shrinkwrap-api</artifactId>
  <scope>compile</scope>
</dependency>

Mark all annotations explicitly interceptor bindings

Mark all annotations as interceptor bindings so that stereotypes can be used. This will also lead end users to refer to interceptor behavior.

Update all contributors details to the NOTICE file

Create CircuitBreaker Functionality

The ability to fail an invocation fast based on known failure state of the component, both annotation and programmatic fashions.

Introduce an annotation to configure default operation handler

The API should introduce an annotation to configure the default (i.e. non-fallback) operation handler.
This should be done for multiple reasons:

To provide the same level of configuration for the default operation as we have for fallback
To allow users to create a handler class with implementation-specific features
To mark a method for Fault Tolerance handling with default behaviour (default timeout, fallback, etc.)

This annotation would work like @Fallback; we may only have to rename the FallbackHandler interface to OperationHandler.

I suggest the following names for this annotation:

@FaultToleranceOperation
@FaultToleranceHandler
@FaultToleranceConfiguration (if we'd like to add impl-specific config in a Map member)

I chose the term 'operation' to avoid 'command' which is too much "Hystrix-ish", but of course another term could be chosen here.

Explore how we should integrate MP config spec

Right now we don't use the config spec, while some elements in this spec are related to configuration (e.g. the environment variable). IMO, we should check that we are not doing the config spec's job with those.

change package name to org.eclipse.microprofile.faulttolerance

Update the package name based on the mailing list discussion

Is CircuitBreakerClientWithRetry.serviceC implicitly async?

I'm looking at the definition of this class

    @CircuitBreaker(successThreshold = 2, requestVolumeThreshold = 4, failureRatio = 0.75, delay = 50000)
    @Retry(retryOn = {RuntimeException.class, TimeoutException.class}, maxRetries = 7)
    @Timeout(500)

It seems like the addition of timeout makes this invocation asynchronous. However, it's a request-scoped bean, which can't be async.

MP_Fault_Tolerance_NonFallback_Enabled uses non-standard naming conventions

MP_Fault_Tolerance_NonFallback_Enabled uses non-standard _ separators. It should be using . separators

Avoid the use of ChronoUnit

The usage of ChronoUnit has to do with specific dates and times. Since we're dealing with operations, we should be using TimeUnit.

Remove the namespace for configuring the FT parameters

Remove the namespace ft$, based on the thread discussion

Service Mesh usage

On Thursday, May 25, 2017 at 2:44:43 PM UTC-7, [email protected] wrote: Ok, I'm beginning to see the syntax. I feel there needs to be more of a description about the workflow a method is invoking, and this has only increased given the service mesh for micro services announcement by Google, IBM, and Lyft, with support from Red Hat and others:

https://istio.io/blog/istio-service-mesh-for-microservices.html
https://github.com/istio

A question is how the MicroProfile effort fits into a service mesh architecture? I can see both externalization of fault-tolerance and injection of either policies or service proxies needing to be supported. ...

and here is another based on some type of injection of service proxies that are registered with the mesh:

public class MyAbstractGateway { 

@ServiceReference(name="dbResults", endpoint = "/books") 
WebTarget dbResults;

@Asynchronous
@ServiceReference(name="images", endpoint = "/books/images")
WebTarget images;

@ServiceReference(name="reviews", endpoint = "/books/reviews")
WebTarget reviews;

@Workflow(
    name = "display-book-info",
    services = {
        @ServiceReference(name="dbResults", endpoint = "/books"),
        @ServiceReference(name="images", endpoint = "/books/images"),
        @ServiceReference(name="reviews", endpoint = "/books/reviews")
    }
)
@GET
@Path("frontend-gateway")
@Produces("application/json")
public String myGatewayMethod() {
    dbResults.request().get(...);

    return null;
}

}

public class MyAbstractGateway {

@Retry(maxRetries=3, delay=1)
@ServiceReference(name="dbResults", endpoint = "/books")
WebTarget dbResults;

@Asynchronous
@CircuitBreaker(delay = 5, delayUnit = ChronoUnit.SECONDS)
@ServiceReference(name="images", endpoint = "/books/images")
WebTarget images;

@Retry(maxRetries=3, delay=1)
@ServiceReference(name="reviews", endpoint = "/books/reviews")
WebTarget reviews;

}

Better instructions on how to run the TCK

The instructions for running the TCK require a test suite. You can actually do this entirely in Maven config, which makes it much easier to run.

Add a dependency on the TCK

        <dependency>
            <groupId>org.eclipse.microprofile.fault.tolerance</groupId>
            <artifactId>microprofile-fault-tolerance-tck</artifactId>
            <version>1.0-SNAPSHOT</version>
            <scope>test</scope>
        </dependency>

Add the maven coordinates to surefire plugin

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.19.1</version>
                <configuration>
                    <dependenciesToScan>
                        <dependency>org.eclipse.microprofile.fault.tolerance:microprofile-fault-tolerance-tck</dependency>
                    </dependenciesToScan>
                </configuration>
            </plugin>

Then the TCK can run in your build.

Add a maxDuration to the Retry in CircuitbreakerRetryTest

The Service that is called in testCircuitOpenWithMultiTimeouts will time out a number of times (one for each retry) before the Circuit is opened. We should add a maxDuration to the @Retry to ensure that all the timeouts will be completed before the circuit is opened.

Add ElementType.TYPE to annotations @Target

Not having these prevents creation of CDI interceptors in implementations.

Shouldn't use a colon in a configuration property key

The current spec says that Fault Tolerance configuration is done via system properties with keys in the following format

ft:<classname>/methodname/Annotation/parameter

Notice that it starts with ft: ... while that colon might be fine when specified on the command line, there's a good chance it will actually be stored in a java properties file, e.g. bootstrap.properties

In a java properties file, a colon is treated the same as an equals and so can not be used in a key name unless it is escaped. i.e.

ft:com.acme.test.MyClient/serviceB/Retry/maxRetries=100

results in a key of ft and a value of com.acme.test.MyClient/serviceB/Retry/maxRetries=100

I suggest using a different character for that ft prefix such as hash (ft#) or dollar (ft$). I think hash looks better but it is of course used to represent a comment line (when it is the first char).
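
The parsing behaviour described above can be checked directly with java.util.Properties. The sketch below loads the example line with and without the colon escaped; ColonInPropertyKey and parse are hypothetical names used only for this demonstration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper that parses a single properties-file line.
final class ColonInPropertyKey {

    static Properties parse(String line) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Unescaped colon: the key ends at "ft" and the rest becomes the value.
        Properties unescaped = parse("ft:com.acme.test.MyClient/serviceB/Retry/maxRetries=100");
        System.out.println(unescaped.getProperty("ft"));

        // Escaped colon (a backslash in the file): the full key survives.
        Properties escaped = parse("ft\\:com.acme.test.MyClient/serviceB/Retry/maxRetries=100");
        System.out.println(escaped.getProperty("ft:com.acme.test.MyClient/serviceB/Retry/maxRetries"));
    }
}
```

The first lookup yields the whole remainder of the line as the value of the key ft, while the second yields 100 under the intended key.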

TCK fails 'Unsatisfied dependencies'

arquillianBeforeClass(org.eclipse.microprofile.fault.tolerance.tck.FallbackTest)  Time elapsed: 0.032 sec  <<< FAILURE!
org.jboss.weld.exceptions.DeploymentException: 
WELD-001408: Unsatisfied dependencies for type MyBean with qualifiers @Default
  at injection point [BackedAnnotatedField] @Inject private org.eclipse.microprofile.fault.tolerance.tck.retry.clientserver.FallbackA.myBean
  at org.eclipse.microprofile.fault.tolerance.tck.retry.clientserver.FallbackA.myBean(FallbackA.java:0)

Looking at the JAR, it's not a valid bean archive (based on the SE spec portions at least). It has no beans.xml.

Allow FT annotations on Final class?

In the CDI spec, final classes cannot be proxied. As a consequence, no interceptor bindings can be added to final classes. In MP FT, the annotations are effectively interceptor bindings. Should we just document this limitation or try to figure out how to support MP FT on final classes?

BulkHead

Provide a way to configure the maximum number of threads accessing a resource.
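
One common way to limit concurrent access is a counting semaphore. The sketch below is a minimal illustration of that idea under the assumption that excess callers are rejected immediately rather than queued; it is not the spec's Bulkhead design, and SemaphoreBulkhead is a hypothetical name.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical semaphore-based bulkhead; rejects callers when it is full.
final class SemaphoreBulkhead {

    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T call(Supplier<T> action) {
        // tryAcquire() does not block: if no permit is free, fail fast.
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead full");
        }
        try {
            return action.get();
        } finally {
            // Always return the permit, even if the action throws.
            permits.release();
        }
    }
}
```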

Inconsistent usage of threshold and ratio

In order to configure successes, you use a successThreshold() and requestVolumeThreshold() to specify how many requests need to be successful.

In order to configure failures, you use failureRatio() and requestVolumeThreshold()

It's assumed that the successThreshold is a count of successes, whereas for failures you need to calculate this number. As a user, this is confusing: in one case I need to calculate a threshold, whereas in the other I just need to list out counts. It would be better to use only one or the other.
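
The calculation a user has to do for the failure side can be sketched as follows. It assumes the circuit opens once the failures among the last requestVolumeThreshold requests reach failureRatio of that number; BreakerThresholds and failuresToOpen are hypothetical names, not part of the API.

```java
// Hypothetical helper showing the arithmetic behind failureRatio.
final class BreakerThresholds {

    // Smallest failure count in the rolling window that reaches the ratio.
    static int failuresToOpen(int requestVolumeThreshold, double failureRatio) {
        // The small epsilon guards against floating-point products such as
        // 10 * 0.3 landing just above 3 and being rounded up to 4.
        return (int) Math.ceil(requestVolumeThreshold * failureRatio - 1e-9);
    }
}
```

For requestVolumeThreshold = 4 and failureRatio = 0.75 this gives 3 failures, while successThreshold = 2 is already expressed directly as a count.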

force open a circuit breaker at runtime

UseCase :
In case of failure of service in production environment, Product team should be able to force open the circuit breaker at runtime using external config without the need for restarting the application.

Force open :
A new property circuitBreaker.forceOpen can be introduced, if true, forces the circuit breaker into an open (tripped) state in which it will reject/fallback all requests.

At Runtime :
MP Config (dynamic configuration functionality) will also help to achieve this at runtime.
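
A minimal sketch of the proposal: a guard consults a dynamic flag before every call, so flipping the flag takes effect without a restart. The flag is modelled as a BooleanSupplier, which in practice could read the proposed circuitBreaker.forceOpen property from a dynamic config source. ForceOpenGuard is a hypothetical name, and the normal open/closed state machine of a circuit breaker is deliberately left out.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical guard that short-circuits calls while forceOpen is true.
final class ForceOpenGuard {

    private final BooleanSupplier forceOpen;

    ForceOpenGuard(BooleanSupplier forceOpen) {
        this.forceOpen = forceOpen;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        // The flag is re-read on every call, so a config change at runtime
        // is picked up by the next invocation.
        if (forceOpen.getAsBoolean()) {
            return fallback.get();
        }
        return action.get();
    }
}
```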

Define a FaultToleranceDefinitionException

Or something along those lines.

Instead of directly relying on CDI's DeploymentException, we should have our own for definition errors and throw that if something isn't defined properly.