tinesoft / spring-esdata-loader

Set of JUnit Rules/Extensions to easily load data to test your spring-data elasticsearch-based projects

Home Page: https://tinesoft.github.io/spring-esdata-loader/

License: MIT License

Java 98.75% Shell 1.25%
spring-data-elasticsearch junit4 integration-tests junit-jupiter elasticsearch spring-test spring hacktoberfest

spring-esdata-loader

spring-esdata-loader is a Java 8+ testing library that helps you write integration tests for your spring-data elasticsearch-based projects. It lets you easily load data into Elasticsearch using entity mappings (i.e. domain classes annotated with @Document, @Field, etc.) via dedicated JUnit 4 Rules or JUnit Jupiter Extensions.

The library reads all the metadata it needs (index name, index type, etc.) from the entity classes, uses it to create/refresh the index on the ES server, and feeds the index with data using the ElasticsearchOperations present in your test application context.
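
As a rough illustration of how index metadata can be read from an annotated entity class, here is a minimal, dependency-free sketch. The `Document` annotation below is a hypothetical stand-in declared inside the example (it is not Spring Data Elasticsearch's annotation), and `Author`, `EntityMetadataSketch` and `describeIndex` are illustrative names only; this is not the library's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class EntityMetadataSketch {

    // Returns "<indexName>/<type>" as declared by the annotation on the entity class.
    static String describeIndex(Class<?> entityClass) {
        Document document = entityClass.getAnnotation(Document.class);
        if (document == null) {
            throw new IllegalArgumentException(
                    entityClass.getName() + " is not annotated with @Document");
        }
        return document.indexName() + "/" + document.type();
    }

    public static void main(String[] args) {
        System.out.println(describeIndex(Author.class)); // prints author/Author
    }
}

// Hypothetical stand-in for an entity-mapping annotation, declared here only
// so that the sketch compiles without any external dependency.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Document {
    String indexName();
    String type() default "";
}

// Hypothetical domain class, mirroring the "author" index used in the data examples.
@Document(indexName = "author", type = "Author")
class Author {
    String id;
    String firstName;
    String lastName;
}
```

In a real project the annotation would come from spring-data-elasticsearch and the metadata lookup would be done for you by the library.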

Features

  • Simple API and no configuration required
  • Support for JUnit 4 via @LoadEsDataRule, @DeleteEsDataRule
  • Support for JUnit Jupiter via
    • @LoadEsDataConfig / @LoadEsDataExtension
    • @DeleteEsDataConfig / @DeleteEsDataExtension
  • Built-in support for gzipped data
  • Multiple data formats (dump, manual)
  • Written in Java 8
  • Based on Spring (Data, Test)

Dependencies

spring-esdata-loader relies on dependencies that you already have in your Spring (Boot) project if you are using Elasticsearch with Spring.

Installation & Usage

The library is split into 2 independent sub-modules, both available on JCenter and Maven Central:

  • spring-esdata-loader-junit4 for testing with JUnit 4
  • spring-esdata-loader-junit-jupiter for testing with JUnit Jupiter

To get started:

  1. Add the appropriate dependency to your Gradle or Maven project.

     JUnit 4 (Gradle):

         dependencies {
             testImplementation 'com.github.spring-esdata-loader:spring-esdata-loader-junit4:1.1.0'
         }

     JUnit 4 (Maven):

         <dependency>
             <groupId>com.github.spring-esdata-loader</groupId>
             <artifactId>spring-esdata-loader-junit4</artifactId>
             <version>1.1.0</version>
             <scope>test</scope>
         </dependency>

     JUnit Jupiter (Gradle):

         dependencies {
             testImplementation 'com.github.spring-esdata-loader:spring-esdata-loader-junit-jupiter:1.1.0'
         }

     JUnit Jupiter (Maven):

         <dependency>
             <groupId>com.github.spring-esdata-loader</groupId>
             <artifactId>spring-esdata-loader-junit-jupiter</artifactId>
             <version>1.1.0</version>
             <scope>test</scope>
         </dependency>

  2. Write your test class. You can have a look at:

Supported Data Formats

spring-esdata-loader currently supports 2 formats to load data into Elasticsearch: DUMP and MANUAL.

Dump data format

Here is an example:

{"_index":"author","_type":"Author","_id":"1","_score":1,"_source":{"id":"1","firstName":"firstName1","lastName":"lastName1"}}
{"_index":"author","_type":"Author","_id":"2","_score":1,"_source":{"id":"2","firstName":"firstName2","lastName":"lastName2"}}
{"_index":"author","_type":"Author","_id":"3","_score":1,"_source":{"id":"3","firstName":"firstName3","lastName":"lastName3"}}
{"_index":"author","_type":"Author","_id":"4","_score":1,"_source":{"id":"4","firstName":"firstName4","lastName":"lastName4"}}
{"_index":"author","_type":"Author","_id":"5","_score":1,"_source":{"id":"5","firstName":"firstName5","lastName":"lastName5"}}
{"_index":"author","_type":"Author","_id":"6","_score":1,"_source":{"id":"6","firstName":"firstName6","lastName":"lastName6"}}
{"_index":"author","_type":"Author","_id":"7","_score":1,"_source":{"id":"7","firstName":"firstName7","lastName":"lastName7"}}
{"_index":"author","_type":"Author","_id":"8","_score":1,"_source":{"id":"8","firstName":"firstName8","lastName":"lastName8"}}
{"_index":"author","_type":"Author","_id":"9","_score":1,"_source":{"id":"9","firstName":"firstName9","lastName":"lastName9"}}
{"_index":"author","_type":"Author","_id":"10","_score":1,"_source":{"id":"10","firstName":"firstName10","lastName":"lastName10"}}
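
In the example above, each line is a standalone JSON object whose `_source` field holds the actual document. The following sketch shows one way to pull that document out of a dump line using only the JDK. It assumes `_source` is the last key of the line, as in the example; real code would normally use a JSON parser instead. `DumpLineSketch` and its methods are illustrative names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;

final class DumpLineSketch {

    private static final String SOURCE_KEY = "\"_source\":";

    // Returns the raw JSON of the "_source" object of one dump line.
    // Simplification: relies on "_source" being the last key of the line.
    static String extractSource(String dumpLine) {
        String line = dumpLine.trim();
        int keyStart = line.indexOf(SOURCE_KEY);
        int lastBrace = line.lastIndexOf('}');
        if (keyStart < 0 || lastBrace <= keyStart) {
            throw new IllegalArgumentException("Not a dump line: " + dumpLine);
        }
        // Everything between the key and the outer closing brace is the document.
        return line.substring(keyStart + SOURCE_KEY.length(), lastBrace).trim();
    }

    // Extracts the documents of all non-empty lines, in order.
    static List<String> extractSources(List<String> dumpLines) {
        List<String> sources = new ArrayList<>();
        for (String line : dumpLines) {
            if (!line.trim().isEmpty()) {
                sources.add(extractSource(line));
            }
        }
        return sources;
    }

    public static void main(String[] args) {
        String line = "{\"_index\":\"author\",\"_type\":\"Author\",\"_id\":\"1\",\"_score\":1,"
                + "\"_source\":{\"id\":\"1\",\"firstName\":\"firstName1\",\"lastName\":\"lastName1\"}}";
        // prints {"id":"1","firstName":"firstName1","lastName":"lastName1"}
        System.out.println(extractSource(line));
    }
}
```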

You can use a tool like elasticdump (requires NodeJS) to extract existing data from your Elasticsearch server and then dump it into a JSON file.

$ npx elasticdump --input=http://localhost:9200/my_index --output=my_index_data.json

The above command runs elasticdump to extract data from an index named my_index on an ES server located at http://localhost:9200, and saves the result into a file named my_index_data.json.

If you change the --output part above to --output=$ | gzip > my_data.json.gz, the data will be gzipped automatically.
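
To give an idea of what reading gzipped dump data involves, here is a small sketch based on the JDK's `GZIPInputStream`. It decompresses the file on the fly and reads one line at a time, so the whole file never has to sit in memory. `GzipDumpSketch`, `countRecords` and `writeGzipped` are hypothetical names used for illustration; this is not the library's own loading code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDumpSketch {

    // Counts the non-empty lines (one record per line) of a gzipped dump file,
    // decompressing and reading it line by line.
    static long countRecords(Path gzippedFile) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzippedFile)), StandardCharsets.UTF_8))) {
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
            return count;
        }
    }

    // Demo helper: writes the given text to the target file, gzipped.
    static void writeGzipped(Path target, String content) throws IOException {
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            out.write(content.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("my_data", ".json.gz");
        try {
            writeGzipped(file, "{\"_id\":\"1\"}\n{\"_id\":\"2\"}\n");
            System.out.println(countRecords(file)); // prints 2
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```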

Manual data format

In this format, you specify your target data directly (no metadata like _index, _source, ...), as an Array of JSON objects.

This format is more suitable when you create test data from scratch (as opposed to dumping existing data from an ES server), because it is easier to tweak later on to accommodate future changes in your tests.

Here is an example:

[
    {"id":"1","firstName":"firstName1","lastName":"lastName1"},
    {"id":"2","firstName":"firstName2","lastName":"lastName2"},
    {"id":"3","firstName":"firstName3","lastName":"lastName3"},
    {"id":"4","firstName":"firstName4","lastName":"lastName4"},
    {"id":"5","firstName":"firstName5","lastName":"lastName5"},
    {"id":"6","firstName":"firstName6","lastName":"lastName6"},
    {"id":"7","firstName":"firstName7","lastName":"lastName7"},
    {"id":"8","firstName":"firstName8","lastName":"lastName8"},
    {"id":"9","firstName":"firstName9","lastName":"lastName9"},
    {"id":"10","firstName":"firstName10","lastName":"lastName10"}
]
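
Since manual data is just an array of plain objects, it can also be generated programmatically when you need more rows. The sketch below builds an array shaped like the example above for any number of authors. `ManualDataSketch` and `authorsJson` are hypothetical helper names, and the values are simple enough that no JSON escaping is needed; real code with arbitrary values should use a JSON library.

```java
import java.util.StringJoiner;

final class ManualDataSketch {

    // Builds a manual-format JSON array containing `count` authors with ids 1..count.
    static String authorsJson(int count) {
        StringJoiner array = new StringJoiner(",\n", "[\n", "\n]");
        array.setEmptyValue("[]"); // returned as-is when no author is added
        for (int i = 1; i <= count; i++) {
            array.add("    {\"id\":\"" + i + "\","
                    + "\"firstName\":\"firstName" + i + "\","
                    + "\"lastName\":\"lastName" + i + "\"}");
        }
        return array.toString();
    }

    public static void main(String[] args) {
        System.out.println(authorsJson(3));
    }
}
```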

Contributing

Contributions are always welcome! Just fork the project, work on your feature/bug fix, and submit it. You can also contribute by creating issues. Please read the contribution guidelines for more information.

License

Copyright (c) 2019 Tine Kondo. Licensed under the MIT License (MIT)

spring-esdata-loader's Issues

Feature Request: add the possibility to load data from a standard JSON file

Bug Report or Feature Request (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request

Desired functionality

It would be a real improvement to be able to load data from a standard JSON file, in addition to a JSON stream file (as is already the case).
It would be more human-readable and would make it easier to add data for test purposes.

What do you think about that, @tinesoft?

[Bug] Importing huge data is very CPU and memory intensive

Bug Report or Feature Request (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request

Loading a huge data file (200K+ entries) is very memory- and CPU-intensive (almost 100% of the CPU is used). Make sure that the streaming of each line/row being read is effective.

Tested on a Docker env, capped to 300M of memory and 1 CPU

Spring Versions?

- `spring boot` version: all
- or `spring`, `spring-test`, and `spring-data-elasticsearch` versions: all

Elasticsearch server Version?

6.7.0

OS Version?

Linux (CentOS)

Repro steps

Import a huge dataset (200K+ entries) in a Docker environment capped to 300M of memory and 1 CPU, for example.
