cra's Introduction

Common Runtime for Applications

Common Runtime for Applications (CRA) is a software layer (library) that makes it easy to create and deploy distributed dataflow-style applications on top of resource managers such as Kubernetes and YARN, or in stand-alone cluster execution. Currently, we support stand-alone execution (just deploy an .exe on every machine in your cluster) as well as execution in a Kubernetes/Docker environment.

CRA has been used to build both offline and streaming analytics platforms such as Quill and online microservice fabrics such as Ambrosia.

After you clone the source code, check out the wiki at https://github.com/Microsoft/CRA/wiki for detailed step-by-step instructions on building CRA and running your first sample distributed application, either locally or in a Kubernetes (Windows) cluster using a Docker image of CRA. The wiki shows how to do this on Azure Container Service (ACS), but CRA should work on any other Kubernetes cluster as well.

Contributing

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

cra's People

Contributors

arcademode, badrishc, cmeiklejohn, ibrahimsabek, microsoft-github-policy-service[bot], microsoftopensource, mike-barnett, msftgits, rohankadekodi, saj9191

cra's Issues

CRA instances in Kubernetes fail to reconnect

Hi!

I've been using CRA as a base to build a stream processor and am running into a strange problem in Kubernetes. When instances start up, all works well: connections are set up and are functional. However, it gets interesting when failures are introduced.

By default there seems to be no support for detecting a dropped TCP connection in CRA, so I built a simple ping-pong/keepalive timeout mechanism to detect a broken connection on either side.
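The timeout half of such a keepalive mechanism can be sketched roughly as follows. This is an illustration only, not the code used in this report and not part of CRA: the class name KeepAliveMonitor and its members are hypothetical, and the sketch assumes it is driven from a single thread (the ping/pong messages themselves are left out).

```csharp
using System;

// Hypothetical helper (not a CRA type): tracks when the peer was last heard
// from and reports a timeout once the silence exceeds a configured limit.
public sealed class KeepAliveMonitor
{
    private readonly TimeSpan _timeout;
    private readonly Func<DateTime> _now;
    private DateTime _lastSeen;

    // The clock is injectable so the logic can be exercised without waiting.
    public KeepAliveMonitor(TimeSpan timeout, Func<DateTime> now = null)
    {
        _timeout = timeout;
        _now = now ?? (() => DateTime.UtcNow);
        _lastSeen = _now();
    }

    // Call whenever a pong (or any other data) arrives from the peer.
    public void RecordPong()
    {
        _lastSeen = _now();
    }

    // True when nothing has been heard from the peer for longer than the timeout.
    public bool IsTimedOut
    {
        get { return _now() - _lastSeen > _timeout; }
    }
}
```

A connection loop would call RecordPong() on every received pong and tear down and re-establish the connection once IsTimedOut becomes true.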

Steps to reproduce:

  1. Create two vertices with a one way connection.
  2. Spin up the Kubernetes cluster with those vertices in it.
  3. Delete the pod of the sending vertex (using kubectl, for example).
  4. Kubernetes will recreate the deleted pod and it will restart.

Now this is where things get funky: both ends get reconnected, but the connection does not work. It times out, gets reconnected, times out again, and so on.

My suspicion is that the changed IP of the killed vertex is not reflected in the living vertex. So it reconnects to the killed vertex's old IP, while the killed vertex connects to the correct IP but gets ignored for some reason?
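If that suspicion is right, the surviving side would need to look up the peer's current address on every reconnect attempt instead of reusing a cached one. The sketch below shows only that idea. Everything in it is hypothetical: the Reconnector class and the two delegates stand in for whatever address lookup and connect logic is actually in use, and nothing here is confirmed to match how CRA resolves addresses.

```csharp
using System;

// Hypothetical sketch (not CRA code): re-resolve the peer's address before
// each attempt so that a restarted pod's new IP is picked up.
public static class Reconnector
{
    // lookupAddress: returns the peer's *current* address (assumed to read
    //                fresh metadata on each call rather than a cached value).
    // tryConnect:    attempts a connection and reports success.
    // Returns the address that worked, or null if every attempt failed.
    public static string ConnectWithFreshAddress(
        string peerName,
        Func<string, string> lookupAddress,
        Func<string, bool> tryConnect,
        int maxAttempts)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string address = lookupAddress(peerName); // never reuse the old value
            if (tryConnect(address))
            {
                return address;
            }
        }
        return null;
    }
}
```

Logging the address returned by the lookup on each attempt would also confirm or rule out the stale-IP theory.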

Please let me know if any clarification is needed or how I can circumvent this issue.

Thanks!

DLL version mismatches between NuGet package and CRA.ClientLibrary DLL

When I create a simple .NET Framework console program that depends on the CRA NuGet package and instantiates a CRAWorker object, the program crashes with the following exception:

System.IO.FileLoadException: 'Could not load file or assembly 'Microsoft.WindowsAzure.Storage, Version=9.3.2.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)'

I inspected the DLLs in my program's bin directory and the solution's packages directory with the "ildasm" disassembler. The CRA NuGet package pulls in a dependency on WindowsAzure.Storage NuGet package version 9.2.0, which includes Microsoft.WindowsAzure.Storage.dll version 9.2.0.0. However, the copy of CRA.ClientLibrary.dll included with the CRA NuGet package has a manifest that lists a dependency on Microsoft.WindowsAzure.Storage.dll version 9.3.2.0.
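The same comparison can be made without ildasm by reading the manifest through reflection. The following is a minimal sketch; the ReferenceLister class is a hypothetical helper, while Assembly.GetReferencedAssemblies, Assembly.LoadFrom, and AssemblyName.GetAssemblyName are standard .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper: lists the name and version of every assembly that a
// given assembly's manifest says it depends on.
public static class ReferenceLister
{
    public static List<string> ListReferences(Assembly assembly)
    {
        var lines = new List<string>();
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
        {
            lines.Add(reference.Name + " " + reference.Version);
        }
        return lines;
    }
}
```

For example, ReferenceLister.ListReferences(Assembly.LoadFrom("CRA.ClientLibrary.dll")) shows the versions CRA.ClientLibrary expects, and AssemblyName.GetAssemblyName("Microsoft.WindowsAzure.Storage.dll").Version shows the version of the copy actually on disk (both paths assume the files are in the current directory).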

If I manually upgrade my program's WindowsAzure.Storage NuGet dependency to version 9.3.2, it fixes this specific problem, but the program then crashes with a similar error for the Aqua dependency. The Aqua error is also fixed by manually upgrading the version of the aqua-core NuGet package from 4.2.0 to 4.5.1.

Below I've included the code for a minimal repro, but you can also reproduce the bug by changing the CRA.Worker project in this repo to depend on the CRA NuGet package instead of the CRA.ClientLibrary project in its solution.

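using CRA.ClientLibrary; // CRAWorker lives in this namespace

namespace CRATest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Dummy arguments: instance name, IP address, port, storage connection string.
            // The FileLoadException is thrown as soon as the worker is created and started.
            CRAWorker worker = new CRAWorker("dummyInstance", "127.0.0.1", 1500, "dummyConnectionString");
            worker.Start();
        }
    }
}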

Generalize metadata storage provider

Currently, we are hard-coded to use Azure Tables for storage of metadata. We would like to generalize the storage provider so that one can plug in any storage provider (possibly outside Azure) such as etcd, ZooKeeper, FASTER, Cassandra, etc.
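One possible shape for such an abstraction is a small key-value interface with one implementation per backend. The sketch below is only an illustration under that assumption: IMetadataStore and InMemoryMetadataStore are hypothetical names, not existing CRA types, and the real metadata model may need richer queries than this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: the minimal operations a metadata backend
// (Azure Tables, etcd, ZooKeeper, ...) would have to provide.
public interface IMetadataStore
{
    void Put(string table, string key, string value);
    bool TryGet(string table, string key, out string value);
    List<KeyValuePair<string, string>> GetAll(string table);
    bool Delete(string table, string key);
}

// In-memory implementation, useful for local runs and tests.
public sealed class InMemoryMetadataStore : IMetadataStore
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, Dictionary<string, string>> _tables =
        new Dictionary<string, Dictionary<string, string>>();

    public void Put(string table, string key, string value)
    {
        lock (_gate)
        {
            Dictionary<string, string> rows;
            if (!_tables.TryGetValue(table, out rows))
            {
                rows = new Dictionary<string, string>();
                _tables[table] = rows;
            }
            rows[key] = value; // insert or overwrite
        }
    }

    public bool TryGet(string table, string key, out string value)
    {
        lock (_gate)
        {
            Dictionary<string, string> rows;
            if (_tables.TryGetValue(table, out rows))
            {
                return rows.TryGetValue(key, out value);
            }
            value = null;
            return false;
        }
    }

    public List<KeyValuePair<string, string>> GetAll(string table)
    {
        lock (_gate)
        {
            Dictionary<string, string> rows;
            // Return a snapshot so callers can enumerate without holding the lock.
            return _tables.TryGetValue(table, out rows)
                ? new List<KeyValuePair<string, string>>(rows)
                : new List<KeyValuePair<string, string>>();
        }
    }

    public bool Delete(string table, string key)
    {
        lock (_gate)
        {
            Dictionary<string, string> rows;
            return _tables.TryGetValue(table, out rows) && rows.Remove(key);
        }
    }
}
```

With an interface like this, the worker would receive a store instance at construction time instead of a connection string tied to one backend.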

High latencies when one CRA worker has connections with many offline CRA workers

When a CRA worker has established connections with several other CRA workers, and many of those other workers are offline, its connections with the online workers take a long time to get established and have high latency.

I have noticed this behavior when working with AMBROSIA applications with a client/server architecture. A typical configuration is 10 clients that are each connected to a single server through CRA. If I initially spin up all 10 clients and the server, kill all nodes, and then start up the server and a single client, their connection takes several seconds to finish being established, and that connection has high latency even after it is set up.

Resource Temporarily Unavailable while it really is..?

For a few days I have been playing around with the Microsoft.CRA framework. While the codebase is not up to date (among other things, the Docker build on master doesn't work), I managed to get a sample project running after some time.

This was in a local Kubernetes cluster connected to Azure Storage. Launching the application went via the recovery route, i.e. first 'deploy' to Azure Storage and then let each instance in the Kubernetes cluster pull in its binaries etc. (I have yet to figure out how to expose the containers so that the cra-instances can be reached from outside the Kubernetes cluster.)

Now I wanted to experiment with some custom code, and this has left me utterly puzzled: I keep getting Azure Storage errors when the cra-instances try to pull their endpoint/vertex etc. data from the Azure tables. Please see below. It's very hard to spot the error, as it takes 5 minutes before it gets thrown. Interestingly, the error only gets thrown in the Kubernetes cluster; when I try to run a cra-instance from the host OS (debugging from Visual Studio), it just hangs forever after showing the startup message.
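Because the failure takes minutes to surface, a quick standalone probe run from inside the container can narrow things down sooner. The sketch below separates name resolution from the TCP connect and gives up after a short timeout. EndpointProbe is a hypothetical helper, and the host in the usage note is a placeholder for the storage account's table endpoint.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical diagnostic helper (not part of CRA): reports whether a host
// can be resolved and reached over TCP within a short timeout.
public static class EndpointProbe
{
    public static async Task<string> ProbeAsync(string host, int port, TimeSpan timeout)
    {
        IPAddress[] addresses;
        try
        {
            // Step 1: DNS resolution, bounded by the timeout.
            Task<IPAddress[]> lookup = Dns.GetHostAddressesAsync(host);
            if (await Task.WhenAny(lookup, Task.Delay(timeout)) != lookup)
            {
                return "dns-timeout";
            }
            addresses = await lookup;
        }
        catch (SocketException e)
        {
            return "dns-failed: " + e.SocketErrorCode;
        }

        using (var client = new TcpClient())
        {
            try
            {
                // Step 2: TCP connect to any of the resolved addresses.
                Task connect = client.ConnectAsync(addresses, port);
                if (await Task.WhenAny(connect, Task.Delay(timeout)) != connect)
                {
                    return "connect-timeout";
                }
                await connect; // rethrows if the connect failed
                return "ok";
            }
            catch (SocketException e)
            {
                return "connect-failed: " + e.SocketErrorCode;
            }
        }
    }
}
```

For example, EndpointProbe.ProbeAsync("myaccount.table.core.windows.net", 443, TimeSpan.FromSeconds(5)) with the real account name in place of "myaccount" indicates whether the pod can resolve and reach the table endpoint at all.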

Perhaps the issue lies in the Docker containers I am running the application from (containers based on the .NET Core 2.1 runtime), though this seems unlikely, as the example did work in them and the custom code does not work from the host OS either.

Some help would be greatly appreciated!

PS. the interest in this particular framework is because I feel it is a perfect fit to use in a part of my MSc thesis.

Unhandled Exception: System.AggregateException: One or more errors occurred. (Resource temporarily unavailable) ---> Microsoft.WindowsAzure.Storage.StorageException: Resource temporarily unavailable ---> System.Net.Http.HttpRequestException: Resource temporarily unavailable ---> System.Net.Sockets.SocketException: Resource temporarily unavailable
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.CreateConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.WaitForCreatedConnectionAsync(ValueTask`1 creationTask)
   at System.Threading.Tasks.ValueTask`1.get_Result()
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncUnbuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteAsyncInternal[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext, CancellationToken token)
   --- End of inner exception stack trace ---
   at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteAsyncInternal[T](RESTCommand`1 cmd, IRetryPolicy policy, OperationContext operationContext, CancellationToken token)
   at CRA.ClientLibrary.TableExtensions.ExecuteQueryAsync[T](CloudTable table, TableQuery`1 query, CancellationToken ct, Action`1 onProgress) in C:\projects\CRA-master\src\CRA.ClientLibrary\Utilities\AssemblyUtils.cs:line 32
   at CRA.DataProvider.Azure.AzureVertexInfoProvider.GetAll() in C:\projects\CRA-master\src\CRA.ClientLibrary\AzureProvider\AzureVertexInfoProvider.cs:line 22
   at CRA.DataProvider.Azure.AzureVertexInfoProvider.GetAllRowsForInstance(String instanceName) in C:\projects\CRA-master\src\CRA.ClientLibrary\AzureProvider\AzureVertexInfoProvider.cs:line 39
   at CRA.ClientLibrary.CRAWorker.RestoreVerticesAndConnections() in C:\projects\CRA-master\src\CRA.ClientLibrary\Main\CRAWorker.cs:line 899
   at CRA.ClientLibrary.CRAWorker.StartServer() in C:\projects\CRA-master\src\CRA.ClientLibrary\Main\CRAWorker.cs:line 959
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at CRA.ClientLibrary.CRAWorker.<Start>b__21_0() in C:\projects\CRA-master\src\CRA.ClientLibrary\Main\CRAWorker.cs:line 138
   at System.Threading.Thread.ThreadMain_ThreadStart()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

Edit: fix formatting of stacktrace
