
elasticsearch-test-data's Introduction

Elasticsearch For Beginners: Generate and Upload Randomized Test Data

Because everybody loves test data.

OK, so what does this thing do? It lets you generate and upload randomized test data to your ES cluster so you can start running queries, see what performance is like, and verify that your cluster can handle the load.

It allows for easy configuring of what the test documents look like, what kind of data types they include and what the field names are called.

Cool, how do I use this?

Run Python script

Let's assume you have an Elasticsearch cluster running.

Python and Tornado are used. Run pip install tornado to install Tornado if you don't have it already.

It's as simple as this:

$ python --es_url=http://localhost:9200
[I 150604 15:43:19 es_test_data:42] Trying to create index http://localhost:9200/test_data
[I 150604 15:43:19 es_test_data:47] Guess the index exists already
[I 150604 15:43:19 es_test_data:184] Generating 10000 docs, upload batch size is 1000
[I 150604 15:43:19 es_test_data:62] Upload: OK - upload took:    25ms, total docs uploaded:    1000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    25ms, total docs uploaded:    2000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    19ms, total docs uploaded:    3000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    18ms, total docs uploaded:    4000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    27ms, total docs uploaded:    5000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    19ms, total docs uploaded:    6000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    15ms, total docs uploaded:    7000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    24ms, total docs uploaded:    8000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    32ms, total docs uploaded:    9000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    31ms, total docs uploaded:   10000
[I 150604 15:43:20 es_test_data:216] Done - total docs uploaded: 10000, took 1 seconds
[I 150604 15:43:20 es_test_data:217] Bulk upload average:           23 ms
[I 150604 15:43:20 es_test_data:218] Bulk upload median:            24 ms
[I 150604 15:43:20 es_test_data:219] Bulk upload 95th percentile:   31 ms

Without any command line options, it will generate and upload 1000 documents of the format name:str,age:int,last_updated:ts to an Elasticsearch cluster at http://localhost:9200, into an index called test_data.
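
The summary lines at the end of the sample run (average, median, 95th percentile) can be reproduced from the ten per-batch timings. The sketch below is an illustration only: the `percentile` helper and its linear interpolation are assumptions, not the script's confirmed method, and the values are truncated to whole milliseconds for display.

```python
import statistics


def percentile(values, pct):
    """Linearly interpolated percentile (an assumed method, not confirmed by the source)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


# Upload times in ms, copied from the ten "Upload: OK" lines of the sample run.
upload_times_ms = [25, 25, 19, 18, 27, 19, 15, 24, 32, 31]

summary = {
    "average": int(statistics.mean(upload_times_ms)),
    "median": int(statistics.median(upload_times_ms)),
    "p95": int(percentile(upload_times_ms, 95)),
}
print(summary)
```

With these inputs the truncated values come out as 23, 24 and 31 ms, which agrees with the summary lines of the sample run.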

Docker and Docker Compose

Requires Docker for running the app and Docker Compose for running a single Elasticsearch cluster with two nodes (es01 and es02).

  1. Set the vm.max_map_count kernel setting (the limit on memory-mapped areas) to at least 262144, otherwise the Elasticsearch instances will crash; see the docs
    $ sudo sysctl -w vm.max_map_count=262144
  2. Clone this repository
    $ git clone
    $ cd elasticsearch-test-data
  3. Run the Elasticsearch stack
    $ docker-compose up --detach
  4. Run the app and inject random data to the ES stack
    $ docker run --rm -it --network host oliver006/es-test-data  \
        --es_url=http://localhost:9200  \
        --batch_size=10000  \
        --username=elastic \
  5. Cleanup
    $ docker-compose down --volumes

Not bad but what can I configure?

Running the script with --help prints the full set of command line options. Here are the most important ones:

  • --es_url=http://localhost:9200 the base URL of your ES node, don't include the index name
  • --username=<username> the username when basic auth is required
  • --password=<password> the password when basic auth is required
  • --count=### number of documents to generate and upload
  • --index_name=test_data the name of the index to upload the data to. If it doesn't exist it'll be created with these options
    • --num_of_shards=2 the number of shards for the index
    • --num_of_replicas=0 the number of replicas for the index
  • --batch_size=### we use bulk upload to send the docs to ES, this option controls how many we send at a time
  • --force_init_index=False if True it will delete and re-create the index
  • --dict_file=filename.dic if provided the dict data type will use words from the dictionary file, format is one word per line. The entire file is loaded at start-up so be careful with (very) large files.
  • --data_file=filename.json|filename.csv if provided, all data in the file will be inserted into Elasticsearch. The file content has to be an array of JSON objects (the documents). If the file name ends in .csv, the data is automatically converted to JSON and inserted as documents.
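
To make the --batch_size option more concrete, here is a minimal sketch of how documents can be grouped into newline-delimited bulk request bodies. It is not the script's own code: `build_bulk_body` and `batches` are hypothetical helpers, and the action line assumes a recent Elasticsearch version (older versions may also expect a `_type`).

```python
import json


def build_bulk_body(docs, index_name):
    """Build one bulk request body: an action line, then the document, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = [{"name": "doc%d" % i, "age": i} for i in range(5)]
bodies = [build_bulk_body(batch, "test_data") for batch in batches(docs, 2)]
print(len(bodies), "bulk requests")
```

Each body would then be sent in a single request to the cluster's bulk endpoint, so a larger batch size means fewer, bigger requests.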

What about the document format?

Glad you're asking, let's get to the doc format.

The doc format is configured via --format=<<FORMAT>> with the default being name:str,age:int,last_updated:ts.

The general syntax looks like this:

<<field_name>>:<<field_type>>,<<field_name>>:<<field_type>>, ...

For every document, the script generates a random value for each of the configured fields.

Currently supported field types are:

  • bool returns a random true or false
  • ts a timestamp (in milliseconds), randomly picked between now +/- 30 days
  • ipv4 returns a random ipv4
  • tstxt a timestamp in the "%Y-%m-%dT%H:%M:%S.000-0000" format, randomly picked between now +/- 30 days
  • int:min:max a random integer between min and max. If min and max are not provided they default to 0 and 100000
  • str:min:max a word (as in, a string), made up of min to max random upper/lowercase and digit characters. min and max are optional, defaulting to 3 and 10
  • words:min:max a random number of strs, separated by space. min and max are optional, defaulting to 2 and 10
  • dict:min:max a random number of entries from the dictionary file, separated by space. min and max are optional, defaulting to 2 and 10
  • text:words:min:max a random number of words, separated by space, picked from a given list of words separated by -. The words are optional, defaulting to text1, text2 and text3; min and max are optional, defaulting to 1 and 1
  • arr:[array_length_expression]:[single_element_format] an array of entries with the format specified by single_element_format. array_length_expression can be either a single number or a pair of numbers separated by - (e.g. 3-7), defining the range from which a random length will be picked for each array (example: int_array:arr:1-5:int:1:250)
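
As an illustration of how a format string maps to a document, the sketch below handles four of the field types listed above (bool, ts, int, str) with the documented defaults. It is a simplified stand-in, not the script's implementation; `random_value` and `random_doc` are hypothetical names.

```python
import random
import string
import time

THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000


def random_value(field_type, args):
    """Return a random value for a small subset of the documented field types."""
    if field_type == "bool":
        return random.choice([True, False])
    if field_type == "ts":
        now_ms = int(time.time() * 1000)
        return now_ms + random.randint(-THIRTY_DAYS_MS, THIRTY_DAYS_MS)
    if field_type == "int":
        low = int(args[0]) if len(args) > 0 else 0
        high = int(args[1]) if len(args) > 1 else 100000
        return random.randint(low, high)
    if field_type == "str":
        low = int(args[0]) if len(args) > 0 else 3
        high = int(args[1]) if len(args) > 1 else 10
        alphabet = string.ascii_letters + string.digits
        return "".join(random.choice(alphabet) for _ in range(random.randint(low, high)))
    raise ValueError("unsupported field type in this sketch: %r" % field_type)


def random_doc(format_string):
    """Build one document from a name:type[:min:max] format string."""
    doc = {}
    for spec in format_string.split(","):
        name, field_type, *args = spec.split(":")
        doc[name] = random_value(field_type, args)
    return doc


doc = random_doc("name:str,age:int:18:99,active:bool,last_updated:ts")
print(doc)
```

The remaining types (ipv4, tstxt, words, dict, text, arr) would follow the same pattern of reading optional arguments and falling back to the defaults.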


Todo

  • document the remaining cmd line options
  • more different format types
  • ...

All suggestions, comments, ideas, pull requests are welcome!


elasticsearch-test-data's Issues

can't connect to es

root@DESKTOP-71H61H4:~/elasticsearch-test-data# python3 --es_url=http://localhost:9200
[I 210806 10:48:07 es_test_data:55] Trying to create index http://localhost:9200/test_data
Traceback (most recent call last):
File "", line 284, in
File "/usr/local/lib/python3.6/dist-packages/tornado/", line 530, in run_sync
return future_cell[0].result()
File "/usr/lib/python3.6/asyncio/", line 243, in result
raise self._exception
File "/usr/local/lib/python3.6/dist-packages/tornado/", line 234, in wrapper
yielded = ctx_run(next, result)
File "/usr/local/lib/python3.6/dist-packages/tornado/", line 162, in _fake_ctx_run
return f(*args, **kw)
File "", line 202, in generate_test_data
File "", line 57, in create_index
response = tornado.httpclient.HTTPClient().fetch(request)
File "/usr/local/lib/python3.6/dist-packages/tornado/", line 135, in fetch
functools.partial(self._async_client.fetch, request, **kwargs)
AttributeError: 'Task' object has no attribute 'fetch'
Exception ignored in: <bound method HTTPClient.__del__ of <tornado.httpclient.HTTPClient object at 0x7f0902ab3198>>
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tornado/", line 113, in __del__
File "/usr/local/lib/python3.6/dist-packages/tornado/", line 118, in close
AttributeError: 'Task' object has no attribute 'close'

Added a Dockerfile

@oliver006 Thank you for this great script, it works perfectly.

Just thought you might want to use this Dockerfile that I've created. Please note that I'm copying from tests/es_test_data so you'll need to remove the tests prefix if you build it from this repository.

DockerHub unfor19/es-test-data
Size unpacked: 47.9MB
Size @ DockerHub: 17.21MB


$ docker-compose up -d # docker-compose.yml at the bottom
$ docker run --rm -it --network host unfor19/es-test-data  \
   --es_url=http://localhost:9200  \
   --batch_size=10000  \
   --username=elastic \

[I 210317 22:22:19 es_test_data:54] Trying to create index http://localhost:9200/test_data
[I 210317 22:22:19 es_test_data:61] Looks like the index exists already
[I 210317 22:22:19 es_test_data:238] Generating 100000 docs, upload batch size is 10000
[I 210317 22:22:20 es_test_data:82] Upload: OK - upload took:   731ms, total docs uploaded:   10000
[I 210317 22:22:21 es_test_data:82] Upload: OK - upload took:   749ms, total docs uploaded:   20000
[I 210317 22:22:22 es_test_data:82] Upload: OK - upload took:   679ms, total docs uploaded:   30000
[I 210317 22:22:24 es_test_data:82] Upload: OK - upload took:   729ms, total docs uploaded:   40000
[I 210317 22:22:25 es_test_data:82] Upload: OK - upload took:   691ms, total docs uploaded:   50000
[I 210317 22:22:26 es_test_data:82] Upload: OK - upload took:   729ms, total docs uploaded:   60000
[I 210317 22:22:27 es_test_data:82] Upload: OK - upload took:   766ms, total docs uploaded:   70000
[I 210317 22:22:28 es_test_data:82] Upload: OK - upload took:   698ms, total docs uploaded:   80000
[I 210317 22:22:30 es_test_data:82] Upload: OK - upload took:   709ms, total docs uploaded:   90000
[I 210317 22:22:31 es_test_data:82] Upload: OK - upload took:   705ms, total docs uploaded:  100000
[I 210317 22:22:31 es_test_data:272] Done - total docs uploaded: 100000, took 12 seconds


### --------------------------------------------------------------------
### Docker Build Arguments
### Available only during Docker build - `docker build --build-arg ...`
### --------------------------------------------------------------------
ARG APP_NAME="tornado"
# Reminder- the ENTRYPOINT is hardcoded so make sure you change it (remove this comment afterwards)
### --------------------------------------------------------------------

### --------------------------------------------------------------------
### Build Stage
### --------------------------------------------------------------------
FROM python:"$PYTHON_VERSION"-slim-"${DEBIAN_VERSION}" as build


# Define env vars

# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade wheel setuptools

# Define workdir

# Install the app
RUN pip install --ignore-installed --no-warn-script-location --prefix="/dist" "$APP_NAME"=="$APP_VERSION"

WORKDIR /dist/

COPY tests/ .

# For debugging the Build Stage
CMD ["bash"]
### --------------------------------------------------------------------

### --------------------------------------------------------------------
### App Stage
### --------------------------------------------------------------------
FROM python:"$PYTHON_VERSION"-alpine"${ALPINE_VERSION}" as app

# Fetch values from ARGs that were declared at the top of this file

# Define env vars

# Define workdir

# Run as a non-root user
    addgroup -g "${APP_GROUP_ID}" "${APP_GROUP_NAME}" && \
    adduser -H -D -u "$APP_USER_ID" -G "$APP_GROUP_NAME" "$APP_USER_NAME" && \

# Copy artifacts from Build Stage
COPY --from=build --chown="$APP_USER_NAME":"$APP_GROUP_ID" /dist/ "$PYTHONUSERBASE"/

# The container runs the application, or any other supplied command, such as "bash" or "echo hello"
# CMD python -m ${APP_NAME}

# Use ENTRYPOINT instead CMD to force the container to start the application
ENTRYPOINT ["python", ""]


version: "3.7"

### ------------------------------------------------------------------
### Variables
### ------------------------------------------------------------------
  exposed-port: &exposed-port 9200
  es-base: &es-base
        soft: -1
        hard: -1
      - elastic
  data-path: &data-path /usr/share/elasticsearch/data      
  snapshots-repository-path: &snapshots-repository-path /usr/share/elasticsearch/backup
  volume-snapshots-repository: &volume-snapshots-repository 
    - type: volume
      source: snapshots-repository
      target: *snapshots-repository-path  
  services-es-env: &es-env-base
    "": "es-docker-cluster"
    "cluster.initial_master_nodes": "es01,es02"
    "bootstrap.memory_lock": "true"
    "ES_JAVA_OPTS": "-Xms512m -Xmx512m"
    "ELASTIC_PASSWORD": "esbackup-password"
    "": "true"
    "path.repo": *snapshots-repository-path
### ------------------------------------------------------------------

  es01: # master
    <<: *es-base
    container_name: es01
      <<: *es-env-base es01
      discovery.seed_hosts: es02
      - <<: *volume-snapshots-repository 
      - type: volume
        source: data01
        target: *data-path
      - published: *exposed-port
        target: 9200
        protocol: tcp
        mode: host

    <<: *es-base
    container_name: es02
      <<: *es-env-base es02
      discovery.seed_hosts: es01
      - <<: *volume-snapshots-repository 
      - type: volume
        source: data02
        target: *data-path

    driver: local
    driver: local
    driver: local

    driver: bridge

upload failed, error: stream closed

to be fair I am a cs rookie

[E 220930 18:28:48 es_test_data:76] upload failed, error: Stream closed
[E 220930 18:28:48 es_test_data:76] upload failed, error: Stream closed
[E 220930 18:28:48 es_test_data:76] upload failed, error: Stream closed
[E 220930 18:28:48 es_test_data:76] upload failed, error: Stream closed
[E 220930 18:28:48 es_test_data:76] upload failed, error: Stream closed
[E 220930 18:28:48 es_test_data:76] upload failed, error: Stream closed

help would be appreciated

RuntimeError: Cannot run the event loop while another loop is running

I got the following error when I run the command

python --es_url=http://localhost:9200 --index_name=test

The error:
[I 200622 08:48:49 es_test_data:52] Trying to create index http://localhost:9200/test
Traceback (most recent call last):
File "", line 281, in
File "C:\Program Files (x86)\Python\Python38-32\lib\site-packages\tornado\", line 532, in run_sync
return future_cell[0].result()
File "C:\Program Files (x86)\Python\Python38-32\lib\site-packages\tornado\", line 209, in wrapper
yielded = next(result)
File "", line 199, in generate_test_data
File "", line 54, in create_index
response = tornado.httpclient.HTTPClient().fetch(request)
File "C:\Program Files (x86)\Python\Python38-32\lib\site-packages\tornado\", line 107, in __init__
self._async_client = self._io_loop.run_sync(make_client)
File "C:\Program Files (x86)\Python\Python38-32\lib\site-packages\tornado\", line 526, in run_sync
File "C:\Program Files (x86)\Python\Python38-32\lib\site-packages\tornado\platform\", line 149, in start
File "C:\Program Files (x86)\Python\Python38-32\lib\asyncio\", line 316, in run_forever
File "C:\Program Files (x86)\Python\Python38-32\lib\asyncio\", line 560, in run_forever
File "C:\Program Files (x86)\Python\Python38-32\lib\asyncio\", line 554, in _check_running
raise RuntimeError(
RuntimeError: Cannot run the event loop while another loop is running

Upload: FAILED

Hi, I'm running this in localhost and the ES server is a vagrant box running locally, I'm forwarding port 9200 from guest to 9201 host. I'd like to get this working, see errors below. Any help would be appreciated. Thanks

python3 --es_url= git:master*
[I 210612 04:39:13 es_test_data:55] Trying to create index
[I 210612 04:39:13 es_test_data:60] Looks like the index exists already
[I 210612 04:39:13 es_test_data:228] Generating 100000 docs, upload batch size is 1000
[I 210612 04:39:13 es_test_data:81] Upload: FAILED - upload took: 152ms, total docs uploaded: 1000
[I 210612 04:39:13 es_test_data:81] Upload: FAILED - upload took: 5ms, total docs uploaded: 2000
[I 210612 04:39:13 es_test_data:81] Upload: FAILED - upload took: 42ms, total docs uploaded: 3000

generate failed

[root@es01 elasticsearch-test-data-master]# python --es_url= --username=elastic --password=elastic --count=10000 --num_of_shards=5 --num_of_replicas=2 --batch_size=1000 --index_name=index03
[I 220905 21:34:08 es_test_data:56] Trying to create index
Traceback (most recent call last):
File "", line 319, in
File "/usr/local/lib64/python3.6/site-packages/tornado/", line 530, in run_sync
return future_cell[0].result()
File "/usr/lib64/python3.6/asyncio/", line 243, in result
raise self._exception
File "/usr/local/lib64/python3.6/site-packages/tornado/", line 234, in wrapper
yielded = ctx_run(next, result)
File "/usr/local/lib64/python3.6/site-packages/tornado/", line 162, in _fake_ctx_run
return f(*args, **kw)
File "", line 215, in generate_test_data
File "", line 58, in create_index
response = tornado.httpclient.HTTPClient().fetch(request)
File "/usr/local/lib64/python3.6/site-packages/tornado/", line 135, in fetch
functools.partial(self._async_client.fetch, request, **kwargs)
AttributeError: 'Task' object has no attribute 'fetch'
Exception ignored in: <bound method HTTPClient.__del__ of <tornado.httpclient.HTTPClient object at 0x7f82db854160>>
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/tornado/", line 113, in __del__
File "/usr/local/lib64/python3.6/site-packages/tornado/", line 118, in close
AttributeError: 'Task' object has no attribute 'close'

Delete index not working

How to reproduce:

  • Specify the --force_delete_index=True option
  • Expected result: the index is deleted
  • Actual result: a 400 Bad Request error is returned by Elasticsearch, the index is not deleted


  • Remove ?refresh=true from the url variable in function delete_index
-        url = "%s/%s?refresh=true" % (tornado.options.options.es_url, idx_name)
+        url = "%s/%s" % (tornado.options.options.es_url, idx_name)

RuntimeError at newer tornado versions

Running under Python 3.7.3 and tornado==6.0.3 (version is not specified in requirements.txt) throws the following error:

$ python --es_url=http://localhost:9200 --count=100 --index_name=fluentd-test-data --num_of_shards=10 --num_of_replicas=1
[I 190709 17:04:16 es_test_data:46] Trying to create index http://localhost:9200/fluentd-test-data
Traceback (most recent call last):
  File "", line 276, in <module>
  File "/Users/me/.pyenv/versions/es-test-data-py37/lib/python3.7/site-packages/tornado/", line 532, in run_sync
    return future_cell[0].result()
  File "/Users/me/.pyenv/versions/es-test-data-py37/lib/python3.7/site-packages/tornado/", line 209, in wrapper
    yielded = next(result)
  File "", line 193, in generate_test_data
  File "", line 48, in create_index
    response = tornado.httpclient.HTTPClient().fetch(request)
  File "/Users/me/.pyenv/versions/es-test-data-py37/lib/python3.7/site-packages/tornado/", line 107, in __init__
    self._async_client = self._io_loop.run_sync(make_client)
  File "/Users/me/.pyenv/versions/es-test-data-py37/lib/python3.7/site-packages/tornado/", line 526, in run_sync
  File "/Users/me/.pyenv/versions/es-test-data-py37/lib/python3.7/site-packages/tornado/platform/", line 148, in start
  File "/Users/me/.pyenv/versions/3.7.3/lib/python3.7/asyncio/", line 529, in run_forever
    'Cannot run the event loop while another loop is running')
RuntimeError: Cannot run the event loop while another loop is running

Installing tornado==4.5.3 solves the issue. I found the workaround here: jupyter/notebook#3397
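
The traceback passes through tornado's blocking HTTPClient, whose constructor calls run_sync while generate_test_data is already running on an event loop. The underlying asyncio rule can be reproduced with the standard library alone. The sketch below is not taken from the script: `fetch_stub`, `nested_blocking_call` and `awaited_call` are hypothetical names, and the stub stands in for a real HTTP request.

```python
import asyncio


async def fetch_stub():
    """Stand-in for an HTTP request (hypothetical, performs no I/O)."""
    return "ok"


async def nested_blocking_call():
    """Try to drive a second event loop from inside a running one."""
    inner_loop = asyncio.new_event_loop()
    coro = fetch_stub()
    try:
        return inner_loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a warning
        return "RuntimeError: %s" % exc
    finally:
        inner_loop.close()


async def awaited_call():
    """Await the request on the loop that is already running."""
    return await fetch_stub()


nested_result = asyncio.run(nested_blocking_call())
awaited_result = asyncio.run(awaited_call())
print(nested_result)
print(awaited_result)
```

Awaiting the call on the loop that is already running avoids the nested loop. In tornado terms that presumably means using an asynchronous client instead of the blocking one, although which change the project adopted is not stated here.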

Index options not picked up


I'm trying to override the index config defaults using the below:

docker run --rm -it --network host oliver006/es-test-data  --es_url=http://... --batch_size=1000 --num_of_shards=1 --num_of_replicas=2 --index_name=test_data4

However it seems to ignore the options for shards & replicas

curl http://.../test_data4/  |  jq
{
  "test_data4": {
    "aliases": {},
    "mappings": {
      "properties": {
        "age": {
          "type": "long"
        },
        "last_updated": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1621856375498",
        "number_of_shards": "1",
        "number_of_replicas": "1",
        "uuid": "_LBis-bSTmOqmEIi2lnJNQ",
        "version": {
          "created": "7040199"
        },
        "provided_name": "test_data4"
      }
    }
  }
}

Can you confirm that this is working? Am I missing something?

Command line parameter not being used.

This is a good tool for generating bulk Elasticsearch test data. However, the parameters passed are not handled correctly, and the parameter names shown by the --help option are not consistent with the variables used in the script (- in the help output vs. _ in the script), for example index-name vs. index_name.
When specifying index_name, it still creates the index as test_data.

New feature requests

Nice work! At an initial glance, the first couple features I think would be great are:

  1. ability to provide a JSON file to describe the format for test messages
    ...I realise of course that this would require all data types (including nested objects) to be parseable

  2. Ability to specify different dicts and tie different field names to those dicts

  3. Generate (separate or combined) random street address data... although this might be covered by feature (2) above.

Error with default options against local cluster

I have a local Elasticsearch 2.3.2 instance and using Python 3.5...

I ran the following:

python --es_url=http://localhost:9200

Here's the output with exception:

[I 160515 23:59:36 es_test_data:47] Trying to create index http://localhost:9200/test_data
[I 160515 23:59:37 es_test_data:50] Creating index "test_data" done   b'{"acknowledged":true}'
[I 160515 23:59:37 es_test_data:217] Generating 10000 docs, upload batch size is 1000
Traceback (most recent call last):
  File "", line 274, in <module>
  File "C:\Python35\lib\site-packages\tornado\", line 453, in run_sync
    return future_cell[0].result()
  File "C:\Python35\lib\site-packages\tornado\", line 232, in result
  File "<string>", line 3, in raise_exc_info
  File "C:\Python35\lib\site-packages\tornado\", line 1014, in run
    yielded = self.gen.throw(*exc_info)
  File "", line 235, in generate_test_data
    yield upload_batch(upload_data_txt)
  File "C:\Python35\lib\site-packages\tornado\", line 1008, in run
    value = future.result()
  File "C:\Python35\lib\site-packages\tornado\", line 232, in result
  File "<string>", line 3, in raise_exc_info
  File "C:\Python35\lib\site-packages\tornado\", line 1017, in run
    yielded = self.gen.send(value)
  File "", line 68, in upload_batch
    result = json.loads(response.body)
  File "C:\Python35\lib\json\", line 312, in loads
TypeError: the JSON object must be str, not 'bytes'
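
For context: on Python 3.5, json.loads only accepts str, while the HTTP response body is bytes (as the b'{"acknowledged":true}' in the log shows). A minimal sketch of a decode step, assuming the body is UTF-8 encoded; `parse_response_body` is a hypothetical helper, not a function from the script.

```python
import json


def parse_response_body(body):
    """Decode a bytes body (assumed UTF-8) before handing it to json.loads."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


result = parse_response_body(b'{"acknowledged":true}')
print(result)
```

Python 3.6 and later accept bytes in json.loads directly, so the explicit decode mainly matters for older interpreters.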

It seems you can't specify field names containing a '.'


$ ... '--format=sourceSystem:text,sourceSystemCustomerId:text,tenant:text,firstName:text,lastName:text,email.address'

[I 190521 14:58:02 es_test_data:46] Trying to create index ......
[I 190521 14:58:02 es_test_data:51] Looks like the index exists already
[I 190521 14:58:02 es_test_data:220] Generating 1 docs, upload batch size is 1000
Traceback (most recent call last):
  File "", line 276, in <module>
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tornado/", line 576, in run_sync
    return future_cell[0].result()
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tornado/", line 261, in result
  File "/home/ubuntu/.local/lib/python2.7/site-packages/tornado/", line 326, in wrapper
    yielded = next(result)
  File "", line 223, in generate_test_data
    item = generate_random_doc(format)
  File "", line 155, in generate_random_doc
    f_key, f_val = get_data_for_format(f)
  File "", line 81, in get_data_for_format
    field_type = split_f[1]
IndexError: list index out of range

while this works

$ ... '--format=sourceSystem:text,sourceSystemCustomerId:text,tenant:text,firstName:text,lastName:text'
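
One possible reading of the traceback: the failing format ends in email.address with no :type suffix, so splitting that field on ':' yields a single element and split_f[1] raises the IndexError. Under that reading the dot itself is not the trigger. The sketch below checks a format string for fields without a type; `check_format` is a hypothetical helper, and the format strings are shortened from the ones in this report, with `:text` appended to the last field as an assumption about what was intended.

```python
def check_format(format_string):
    """Return the field specs that lack a ':type' part."""
    problems = []
    for spec in format_string.split(","):
        parts = spec.split(":")
        if len(parts) < 2 or not parts[1]:
            problems.append(spec)
    return problems


failing = "sourceSystem:text,lastName:text,email.address"
fixed = "sourceSystem:text,lastName:text,email.address:text"

print(check_format(failing))
print(check_format(fixed))
```

Whether the script and Elasticsearch then treat a dotted name as a plain field or as a nested object is a separate question that this check does not answer.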

Request Enhancements for better performance and stress testing.

Is it possible to reduce CPU usage by using predefined strings held in memory as field values instead of generating random strings each time? I ask because I observed 100% CPU utilization when running this tool; each random string generation seems to consume CPU cycles. Furthermore, because this is a single-threaded script, it does not make use of the available CPUs on multicore nodes, so I am not able to fully stress the Elasticsearch nodes. When the single thread's CPU utilization reaches 100%, indexing latency increases even though CPU, load, memory, and IOPS are not a bottleneck on the ES node. Could the script offer a multi-threading option?

In addition to inserts, an option for updates combined with search queries would make it even better at simulating realistic cases.
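
A rough sketch of the first idea, a pool of pre-generated strings that documents draw from instead of building a new random string per field. This is an illustration of the request, not an existing feature; `build_string_pool` and its parameters are hypothetical.

```python
import random
import string


def build_string_pool(pool_size, min_len=3, max_len=10, seed=None):
    """Pre-generate random strings once so documents can reuse them."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(min_len, max_len)))
        for _ in range(pool_size)
    ]


pool = build_string_pool(1000, seed=42)

# Each document picks an existing string instead of generating a new one.
docs = [{"name": random.choice(pool)} for _ in range(10000)]
print(len(docs), "docs drawn from", len(pool), "strings")
```

The trade-off is lower cardinality in the generated field, which may change how the index behaves compared with fully random values.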
