DevExpress Data Processing API (MVP)

DevExpress Data Processing API is a prototype (MVP) of a new product and is therefore not yet a part of the DevExpress product line.

DevExpress Data Processing API is a .NET library that lets you transform your data into the form you need. It covers ETL and data analysis tasks.

[Image: flow]

Typical scenarios:

  • Add ETL (Extract, Transform, Load) capabilities to .NET applications.
  • Use data shaping (grouping, sorting, filtering, applying analytics functions) before you display data in a UI application, regardless of platform.

The main features of the presented library:

  • Connect to different data sources (relational databases, web services, Excel spreadsheets, JSON data, and so on) using a unified interface.
  • Process data at runtime in the application memory.
  • Embed business logic written in .NET at any point in your data processing.
  • Use functions to clean and structure data alongside analytical functions.
  • Debug your app using a wide range of API methods.
  • Transform your data quickly from raw data to final output.

This example is created to collect feedback and usage data. If you are interested in DevExpress Data Processing API, leave a comment under our blog post or create a ticket in our Support Center.

About the Example

This repository contains a .NET 5.0 solution with three projects that show how to use the DevExpress Data Processing API.

To launch the example, you need to update your DevExpress NuGet packages to v21.2.4. You can find the instructions for upgrading in the following section: Install DevExpress Controls Using NuGet Packages.

For .NET Framework, reference DevExpress.DataProcessingApi.MVP.NetFramework.dll and the DevExpress libraries v21.2.4 in your project.

ConsoleExample

View file: Program.cs

This project transforms user survey data in JSON format and user data from an XLSX file as follows:

  • Joins these two data flows into a single data source.
  • Unfolds the "Feature list" column, which contains an array of data. The Unfold operation creates a new row for each item in the array.
  • Aggregates data by "RegionCountryName" and "Feature list".
  • Calculates the top 3 achievements for each country.
  • Sorts data.
  • Uploads data to an XLSX file.
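
The steps above can be sketched with plain LINQ to show what each stage computes. This is not the library's API (the actual calls are in Program.cs). The `Survey` and `User` records, the `UserId` join key, the `TopPerCountry` helper, and the sample rows are hypothetical; only the "RegionCountryName" and "Feature list" names come from the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SurveySketch
{
    // Hypothetical row shapes; the real example reads them from JSON and XLSX files.
    public sealed record Survey(int UserId, string[] FeatureList);
    public sealed record User(int UserId, string RegionCountryName);

    public static List<(string Country, string Feature, int Count)> TopPerCountry(
        IEnumerable<Survey> surveys, IEnumerable<User> users, int top = 3)
    {
        return surveys
            // Join both inputs on the (assumed) user key.
            .Join(users, s => s.UserId, u => u.UserId,
                  (s, u) => (Country: u.RegionCountryName, Features: s.FeatureList))
            // Unfold: one output row per array item.
            .SelectMany(r => r.Features, (r, f) => (r.Country, Feature: f))
            // Aggregate by country and feature.
            .GroupBy(r => r)
            .Select(g => (g.Key.Country, g.Key.Feature, Count: g.Count()))
            // Keep the top N rows per country.
            .GroupBy(r => r.Country)
            .SelectMany(g => g.OrderByDescending(r => r.Count).ThenBy(r => r.Feature).Take(top))
            // Sort the final result.
            .OrderBy(r => r.Country).ThenByDescending(r => r.Count).ThenBy(r => r.Feature)
            .ToList();
    }

    static void Main()
    {
        var users = new[] { new User(1, "Germany"), new User(2, "Germany"), new User(3, "Japan") };
        var surveys = new[]
        {
            new Survey(1, new[] { "Join", "Sort" }),
            new Survey(2, new[] { "Join" }),
            new Survey(3, new[] { "Filter" }),
        };
        foreach (var row in TopPerCountry(surveys, users))
            Console.WriteLine($"{row.Country}: {row.Feature} ({row.Count})");
    }
}
```

The final step of the example (writing the result to an XLSX file) is left out here because it depends on the library's output functions.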

The resulting XLSX file:

[Image: survey-result]

CodeSamples

View file: Program.cs

This project contains unit tests that cover different scenarios.

You can use Test Explorer in Visual Studio to launch and debug these code samples, or launch them in a console application (see the Main() function for details).

[Image: test-explorer]

PerformanceExample

View file: Program.cs

This example compares different data processing technologies:

  • DevExpress Data Processing API
  • Microsoft LINQ
  • Microsoft Parallel LINQ (PLINQ)

More about performance

How to Work with this API

Common Concepts

The common algorithm:

  1. Create a new data flow (DataFlow) and use one of the functions to load data (for example, FromCsv or FromDatabase).
  2. Use functions to clean and structure your data and apply analytical functions (for example, ProcessColumn, AddColumn, Join, Aggregate, and so on).
  3. Define the output data format (for example, ToExcel, ToDataTable).
  4. Execute the previously defined data flow to generate resulting data (Execute).
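
The define-then-execute pattern in the steps above resembles deferred execution in LINQ, which can serve as a rough analogy. The sketch below uses only standard LINQ, not the library's DataFlow type; `Load`, `Define`, `RowsRead`, and the sample rows are hypothetical names made up for illustration. Whether the library defers work in exactly this way is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredSketch
{
    // Counts how many source rows have actually been read.
    public static int RowsRead;

    // Step 1: a lazy source. The iterator body runs only when it is enumerated.
    public static IEnumerable<(string Region, decimal Amount)> Load()
    {
        var rows = new[] { ("EU", 10m), ("US", 5m), ("EU", 7m), ("US", -1m) };
        foreach (var row in rows)
        {
            RowsRead++;
            yield return row;
        }
    }

    // Step 2: describe the transformations. This builds a query and reads nothing.
    public static IEnumerable<(string Region, decimal Total)> Define() =>
        Load()
            .Where(r => r.Amount > 0)
            .GroupBy(r => r.Region)
            .Select(g => (Region: g.Key, Total: g.Sum(r => r.Amount)))
            .OrderBy(r => r.Region);

    static void Main()
    {
        var query = Define();
        Console.WriteLine($"Rows read after definition: {RowsRead}");

        // Steps 3-4: choose an output shape (a list here) and run the pipeline.
        var result = query.ToList();
        Console.WriteLine($"Rows read after execution: {RowsRead}");
        foreach (var (region, total) in result)
            Console.WriteLine($"{region}: {total}");
    }
}
```

Separating the definition from the execution lets an engine inspect the whole chain of operations before any data is processed.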

Load Data

  • From a CSV file: FromCsv
  • From a database: FromDatabase
  • From Excel spreadsheets (XLSX and XLS files): FromExcel
  • From a Web Service (JSON): FromJsonFile and FromJsonUrl
  • From a .NET object: FromObject

Transform Data

  • Join data from different sources: Join
  • Unfold array values and display a new data row for every element in the array: Unfold
  • Add columns: AddColumn (using a criteria operator or in code)
  • Modify column data: ProcessColumn
  • Filter data: Filter (using a criteria operator or in code)
  • Sort data: Sort
  • Manage columns: SelectColumns, RenameColumns, RemoveColumn/RemoveColumns

Analyze Data

Upload Data

Debug

  • Get data for each step in the processed data flow: Debug

Performance

  • Data is stored by column, as in a column-oriented DBMS. This approach allows us to optimize data analysis operations, such as aggregation and joins across different sources.
  • The data engine supports multi-threaded calculation to handle large amounts of data efficiently.
  • Data operations are organized into an optimized graph.
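
To illustrate the first point, here is a minimal sketch of row-oriented versus column-oriented storage. It does not show the library's internal data structures; the `Row` and `Table` types and the `ToColumns` and `SumUnits` helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ColumnStoreSketch
{
    // Row-oriented layout: each record keeps all of its fields together.
    public sealed record Row(string Region, int Units, double Price);

    // Column-oriented layout: one array per column, aligned by row index.
    public sealed class Table
    {
        public string[] Region = Array.Empty<string>();
        public int[] Units = Array.Empty<int>();
        public double[] Price = Array.Empty<double>();
    }

    public static Table ToColumns(IReadOnlyList<Row> rows) => new Table
    {
        Region = rows.Select(r => r.Region).ToArray(),
        Units = rows.Select(r => r.Units).ToArray(),
        Price = rows.Select(r => r.Price).ToArray(),
    };

    // An aggregate over one column scans a single contiguous array
    // and never touches the other columns.
    public static long SumUnits(Table t)
    {
        long sum = 0;
        foreach (var u in t.Units) sum += u;
        return sum;
    }

    static void Main()
    {
        var table = ToColumns(new[] { new Row("EU", 3, 9.5), new Row("US", 4, 8.0) });
        Console.WriteLine(SumUnits(table));
    }
}
```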

Our experiments showed that DevExpress Data Processing API can match or outperform Parallel LINQ in aggregation tasks (grouping and sum calculation).
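
For reference, the kind of aggregation task mentioned here (grouping and summing) looks roughly like this in LINQ and Parallel LINQ. This is a sketch with generated data, not the code from PerformanceExample; `SumByKey`, `SumByKeyParallel`, and the row shape are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class AggregationSketch
{
    // Sequential LINQ: group rows by key and sum the values in each group.
    public static Dictionary<int, long> SumByKey(IEnumerable<(int Key, int Value)> rows) =>
        rows.GroupBy(r => r.Key)
            .ToDictionary(g => g.Key, g => g.Sum(r => (long)r.Value));

    // PLINQ: the same query, partitioned across threads by AsParallel().
    public static Dictionary<int, long> SumByKeyParallel(IEnumerable<(int Key, int Value)> rows) =>
        rows.AsParallel()
            .GroupBy(r => r.Key)
            .ToDictionary(g => g.Key, g => g.Sum(r => (long)r.Value));

    static void Main()
    {
        var rows = Enumerable.Range(0, 2_000_000)
            .Select(i => (Key: i % 100, Value: i))
            .ToArray();

        var sw = Stopwatch.StartNew();
        var sequential = SumByKey(rows);
        Console.WriteLine($"LINQ:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var parallel = SumByKeyParallel(rows);
        Console.WriteLine($"PLINQ: {sw.ElapsedMilliseconds} ms");

        Console.WriteLine($"Same totals: {sequential.All(kv => parallel[kv.Key] == kv.Value)}");
    }
}
```

Timings from a single run like this include JIT compilation and other warm-up costs, so treat them as a rough indication only.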

Note that the MVP implementation makes a number of assumptions, so it does not fully reflect the achievable performance. Performance can also depend on many factors (for example, just-in-time (JIT) compilation). If you encounter performance issues, please feel free to describe your scenario in our Support Center.

Documentation

Product Development Plans

If we decide to release the product, we plan to develop it in the following directions:

  • Support more popular data sources and a variety of upload methods.
  • Add more features to solve the most popular ETL and analytics problems.
  • Create tools for developers that simplify creating and debugging data flows, including tools built into Visual Studio.
  • Optimize performance.
  • Improve diagnostic logging and error output.
  • Integrate with DevExpress controls: WinForms, WPF, Blazor (as an ASP.NET Core backend).

Your opinion matters to us. Please share your thoughts in the comments on our blog post: .NET Data Processing API - First Community-Sourced Scenarios Addressed.

Contributors

dimarudnev, natakazakova
