
collectors' Introduction

BenchFlow

BenchFlow is an open-source expert system providing a complete platform for automating performance tests and performance analysis. We know that not all developers are performance experts, yet in today's agile environments they need to deal with performance testing and performance analysis every day. In BenchFlow, users define objective-driven performance tests using an expressive, SUT-aware DSL implemented in YAML. BenchFlow then automates the end-to-end process of executing the performance tests and providing performance insights: it deploys the system under test relying on Docker technologies, distributes the simulated user load across different servers, handles errors, collects performance data, and computes performance metrics and insights.

Quick links: BenchFlow Documentation | TODO - also link to the documentation

TODO (try BenchFlow)

Purpose

TODO (BenchFlow has a strong focus on developer happiness & ease of use, and a batteries-included philosophy.)

Current project focus

The BenchFlow expert system is currently focused mainly on enabling performance benchmarks of Workflow Management Systems supporting the BPMN 2.0 modeling and execution language. Despite this main focus, most of its components are reusable and already general enough to support performance benchmarks of generic Web Services. We strongly encourage extending BenchFlow by adding missing functionalities specific to your particular benchmarking needs. TODO ([point to setup and getting started]). Website related to the current focus: http://benchflow.inf.usi.ch.

We currently have a temporary logo; a proper logo will come at some point in the future.

Upcoming project focus

(TODO) automated objective-driven performance testing, and integration in continuous software improvement lifecycle.

Features (Why BenchFlow?)

TODO (also link to the documentation)

  • definition of a performance benchmark/test through a dedicated DSL;
  • automation of the deployment of the System Under Test on distributed infrastructures using Docker;
  • reliable execution of the performance benchmark using Faban;
  • data collection and cleaning;
  • data analysis in the form of computed metrics and KPIs.

Use Cases

TODO (to show the uses of the tool, linking to an actual article explaining how to do that; point also to contributing for extending)

Installation or Upgrade

TODO (explain current project state in dev, and link to docs and uses to state that it is usable, but not 100% battle tested)

TODO (getbenchflow in container for client and docs for the rest [also links to Docker Hub if needed, at least in the developer documentation], current release. Explain: Docker as prerequisite)

(TODO, maybe at the top) Project Status: The project is currently in active development and is tested on Mac OS X for the client-side command line tools, and Ubuntu 14.04.2 LTS for the server-side tools. The main project branch is devel [maybe for now say that there are no releases yet, but that we are in the process of preparing the first release].

Getting Started

Prerequisites

TODO (simplest example, then links to the docs for advanced stuff)

Installing

TODO (needs help or customisation, write contacts)

Built With

TODO

Contributing

TODO (also related to extending to custom software, and links to developer documentation and TODOs)

Versioning

TODO (SemVer + link to docs)

Authors

TODO

License

Copyright © 2014-2017, Vincenzo Ferme, for his own and contributors' committed code and artefacts.

The license for all non-third-party code in the BenchFlow repositories is RPL-1.5, unless otherwise noted.

collectors' People

Contributors

cerfoglg, simonedavico, vincenzoferme


Forkers

simonedavico

collectors' Issues

Improved REST API with correct methods

Currently we assume that all calls to the API are done with GET. To have a proper RESTful API we need to accept the correct methods, which should be:

  • PUT for /store
  • POST for /start
  • PUT for /stop

If the wrong method is used, the collector should return an HTTP 405 METHOD NOT ALLOWED.
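
A minimal sketch of how a collector could enforce these methods with Go's net/http (the handler bodies are placeholders):

package main

import "net/http"

// requireMethod rejects requests that do not use the expected HTTP method
// with a 405 METHOD NOT ALLOWED, as described above.
func requireMethod(method string, h http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != method {
			w.Header().Set("Allow", method)
			http.Error(w, "405 METHOD NOT ALLOWED", http.StatusMethodNotAllowed)
			return
		}
		h(w, r)
	}
}

func main() {
	// Placeholder handlers standing in for the real collector logic.
	ok := func(w http.ResponseWriter, r *http.Request) { w.Write([]byte("ok")) }
	http.HandleFunc("/store", requireMethod("PUT", ok))
	http.HandleFunc("/start", requireMethod("POST", ok))
	http.HandleFunc("/stop", requireMethod("PUT", ok))
	http.ListenAndServe(":8080", nil)
}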

Make the implemented collectors clean and remove hard coded stuff

Remove:

  • Hard coded environment variables
  • Hard coded strings
  • Not needed folders and files

Add:

  • Dependency on envconsul to retrieve the environment variables
  • Consul service discovery

Manage:

  • Dependencies by only using Godeps

Docker:

  • A Go container that builds the project from Git, removing the dependency on the golang image;
  • Also have a Dockerfile for local development, using a base container to avoid duplicating the code cloning among Dockerfiles, if possible

Comments:

  • @Cerfoglg add some explanatory comments to the code, after refactoring it.

Investigate and Fix the Following Bugs

Zip Collector

If TO_ZIP is /, the call fails with Data read ‘22789403’ is not equal to the size ‘22757376’ of the input Reader. and no data are saved on Minio.

Logs Collector

No data are on Minio after the call.

Add CSV saving to mysqldump

Add extra functionality to the mysqldump collector to:

  • Save a given set of tables as CSV files. Each table needs to be saved in its own CSV file.
  • Save the definition of the given set of tables, i.e., the types of the columns (int, varchar, ...). Each table should have its own CSV file with a single row defining the data type of each column.

In the case of MySQL, the dump should be achieved by querying the database for an entire table, returned in tab-separated format, and altering the output to turn it into comma-separated values. Column types can be obtained the same way.

By saving our databases in CSV format we streamline the transform phase of ETL (Extract, Transform, Load): there is a single format for all possible databases, and the different database-specific collectors take care of dumping the different databases into CSV files.
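
As a rough illustration, the dump step could look like the following sketch, which queries the mysql CLI in --batch mode (tab-separated output) and rewrites the rows with encoding/csv; the function name, flags and credential handling are assumptions:

package collector

import (
	"encoding/csv"
	"os"
	"os/exec"
	"strings"
)

// dumpTableAsCSV dumps one table as a CSV file by converting the
// tab-separated output of the mysql CLI.
func dumpTableAsCSV(host, port, user, password, db, table, outPath string) error {
	// --batch prints tab-separated rows, one per line, with a header row.
	// Column types could be obtained the same way, e.g. by querying
	// information_schema.columns instead of the table itself.
	cmd := exec.Command("mysql",
		"-h", host, "-P", port, "-u", user, "-p"+password,
		"--batch", "-e", "SELECT * FROM "+table, db)
	raw, err := cmd.Output()
	if err != nil {
		return err
	}
	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()
	w := csv.NewWriter(out)
	defer w.Flush()
	for _, line := range strings.Split(strings.TrimRight(string(raw), "\n"), "\n") {
		if err := w.Write(strings.Split(line, "\t")); err != nil {
			return err
		}
	}
	return nil
}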

Load the data on Minio in the correct Bucket

The data must be loaded on the following bucket:

  • /benchmarks/benchmark_id/runs/INCREMENTAL_NUMBER

Each collector must store a zip of the collected information by using the following filename:

  • NameOfTheCollectorContainer_CollectorName (e.g., wfms-dbms_mysqldump)

    NOTE: for now let's assume that NameOfTheCollectorContainer is provided through an ENV variable and CollectorName is known by the collector

Currently we only test by executing a single run of a driver, hence INCREMENTAL_NUMBER can be assumed to be 1. In the future, the number generation must be handled somewhere, though maybe not in the collectors; the collectors will have to learn the INCREMENTAL_NUMBER in some way.
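
For illustration, a collector could assemble the object key as in the sketch below; BENCHFLOW_BENCHMARK_ID is an assumed variable name, while BENCHFLOW_CONTAINER_NAME mirrors the deployment template discussed later on this page:

package main

import (
	"fmt"
	"os"
)

func minioObjectKey() string {
	// Assumed variable name for the benchmark_id.
	benchmarkID := os.Getenv("BENCHFLOW_BENCHMARK_ID")
	// NameOfTheCollectorContainer, provided through an ENV variable as noted above.
	containerName := os.Getenv("BENCHFLOW_CONTAINER_NAME")
	collectorName := "mysqldump" // CollectorName is known by the collector itself
	runNumber := 1               // single run for now, so INCREMENTAL_NUMBER is 1

	// e.g. benchmarks/<benchmark_id>/runs/1/wfms-dbms_mysqldump.zip
	return fmt.Sprintf("benchmarks/%s/runs/%d/%s_%s.zip",
		benchmarkID, runNumber, containerName, collectorName)
}

func main() {
	fmt.Println(minioObjectKey())
}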

Make all collectors use gzip

For all collectors, when zipping a file, use gzip for compression.

By using gzip we remove the need for external zipping tools, as gzip is present on all Linux installations. In addition, gzip-compressed files can be used directly by Spark when creating a context, and even without Spark, Python ships a gzip package in its standard installation (meaning we also don't have to import extra Python modules for handling compressed data).

Gzip also offers a good compression ratio at very good speed, which makes it a good choice for the database dumps we are storing on Minio (CSV format).
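
For illustration, compressing a collected file requires only Go's standard compress/gzip package:

package collector

import (
	"compress/gzip"
	"io"
	"os"
)

// gzipFile compresses src into dst, e.g. dump.csv -> dump.csv.gz.
func gzipFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()

	// The gzip writer is closed before the file, flushing the footer.
	zw := gzip.NewWriter(out)
	defer zw.Close()

	_, err = io.Copy(zw, in)
	return err
}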

Enable Execution Logs Collection

We need a way to collect the execution logs of the collectors (for which we need to define the critical sections to log), in case something goes wrong. One proposal is to collect these logs in a file (directly or using a logs collector) and store them on Minio. This is useful for debugging purposes.

Requirements:

  • We should take care of the ephemeral execution of the collectors

Possible solution:

  • We might need a Gateway acting as a coordinator for the BenchFlow services, handling centralised log collection from them.

Consider applying the same approach to the monitors, so that we can enable their logging again.

Properly Comment the Code

Comment all the choices in the code for which a comment might help to understand the why. Although it is tricky to decide what to comment and what not to, a good rule of thumb is to comment all the code that, a couple of days after you wrote it, takes you more than 5 seconds to grasp. Algorithms must certainly be commented, with their most important steps documented.

See also #13

Make log collector an offline microservice

Make it so that the log collector contacts the Docker API and obtains the logs with a single request, rather than attaching to the container and continuously collecting the entries.

This way we don't need to attach to a container and potentially impact its performance, and log collecting becomes a single "collect" call, rather than starting and stopping like the stats. The Docker API returns the stdout and stderr of the container, which can be read and written down into a file.

Also provide the option to query the microservice for the logs starting from a given time. This is easy to implement, given that the Docker API for returning logs already provides a "since" option when queried.
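
A hedged sketch of such a one-shot collection using go-dockerclient's Logs call (collectLogs and its parameters are illustrative):

package collector

import (
	"os"
	"time"

	docker "github.com/fsouza/go-dockerclient"
)

// collectLogs fetches a container's logs with a single blocking request:
// no attach, no streaming. Follow is left false, so the call returns once
// the current logs have been written to the file.
func collectLogs(client *docker.Client, containerID string, since time.Time, outPath string) error {
	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()

	return client.Logs(docker.LogsOptions{
		Container:    containerID,
		OutputStream: out,
		ErrorStream:  out,
		Stdout:       true,
		Stderr:       true,
		Timestamps:   true,
		Since:        since.Unix(), // the "since" option mentioned above
	})
}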

Clean and define a service deployment template relying on Docker Compose

@simonedavico define the deployment descriptor we should use to add a BenchFlow service (e.g., collectors or monitors) to BenchFlow.

Here are some discussed examples (where the lines marked with # are generated):

#The service name should be "benchflowServiceName_BoundServiceName"
mysql:
  image: 'benchflow/collectors:mysql_dev'
  # container_name: mysql_db_TRIAL_ID
  environment:
    - KAFKA_HOST=${BENCHFLOW_ENV_KAFKA_IP}
    - MINIO_ALIAS=benchflow
    - MINIO_HOST=http://${BENCHFLOW_ENV_MINIO_IP}:${BENCHFLOW_ENV_MINIO_PORT}
    - MINIO_ACCESSKEYID=${BENCHFLOW_ENV_MINIO_ACCESSKEYID}
    - MINIO_SECRETACCESSKEY=${BENCHFLOW_ENV_MINIO_SECRETACCESSKEY}

    # - BENCHFLOW_EXPERIMENT_ID=camunda
    # - BENCHFLOW_TRIAL_ID=camunda_1O
    # - BENCHFLOW_TRIAL_TOTAL_NUM=1
    - MYSQL_DB_NAME=${BENCHFLOW_BENCHMARK_CONFIG_MYSQL_DB_NAME}
    - TABLE_NAMES=${BENCHFLOW_BENCHMARK_CONFIG_TABLE_NAMES}

    # the IP can be the local IP
    - MYSQL_HOST=${BENCHFLOW_BENCHMARK_BOUNDSERVICE_IP}
    - MYSQL_PORT=${BENCHFLOW_BENCHMARK_BOUNDSERVICE_PORT}
    - MYSQL_USER=${BENCHFLOW_BENCHMARK_CONFIG_MYSQL_USER}
    - MYSQL_USER_PASSWORD=${BENCHFLOW_BENCHMARK_CONFIG_MYSQL_USER_PASSWORD}

    # - BENCHFLOW_CONTAINER_NAME=mysql_db_TRIAL_ID
    - BENCHFLOW_COLLECTOR_NAME=mysql
    - BENCHFLOW_DATA_NAME=mysql

    # - "constraint:node==bull"
  expose:
    - 8080
  ports:
    - '8080' #192.168.41.128::8080
#The service name should be "benchflowServiceName_BoundServiceName"
stats:
  image: 'benchflow/collectors:stats_dev'
  # container_name: stats_camunda_TRIAL_ID
  environment:
    - KAFKA_HOST=${BENCHFLOW_ENV_KAFKA_IP}
    - MINIO_ALIAS=benchflow
    - MINIO_HOST=http://${BENCHFLOW_ENV_MINIO_IP}:${BENCHFLOW_ENV_MINIO_PORT}
    - MINIO_ACCESSKEYID=${BENCHFLOW_ENV_MINIO_ACCESSKEYID}
    - MINIO_SECRETACCESSKEY=${BENCHFLOW_ENV_MINIO_SECRETACCESSKEY}

    # - BENCHFLOW_EXPERIMENT_ID=camunda
    # - BENCHFLOW_TRIAL_ID=camunda_1O
    # - BENCHFLOW_TRIAL_TOTAL_NUM=1
    - CONTAINERS=${BENCHFLOW_BENCHMARK_BOUNDSERVICE_CONTAINER_NAME}

    # - BENCHFLOW_CONTAINER_NAME=stats_camunda_TRIAL_ID
    - BENCHFLOW_COLLECTOR_NAME=stats
    - BENCHFLOW_DATA_NAME=stats

    # - "constraint:node==lisa1"

  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
  expose:
    - 8080
  ports:
    - '8080' #192.168.41.105::8080

Define and Uniform the APIs of the Collectors

We need to define a common REST API to interact with collectors. This should be differentiated between offline and online collectors.

Current state:

offline:

  • zip defines a /data API to collect and store the data
  • logs defines a store API to collect and store the data
  • dump defines a data API to collect and store the data

online:

  • stats defines two APIs to start and stop the collection, where the stop API also stores the data

Solve stats "Done" channel error

In the stats collector, we are using this golang API for connecting to Docker and retrieving the stats: https://github.com/fsouza/go-dockerclient

In the code, we use this function to retrieve the stats: https://godoc.org/github.com/fsouza/go-dockerclient#Client.Stats

The function is blocking, meaning that once started, the goroutine can't be exited unless the function is stopped. It's possible to stop the function by signalling on the Done channel, which can be passed to the function inside the StatsOptions structure. Theoretically, sending a boolean to the channel should interrupt the function; however, when attempting to do so, an error is returned stating that the channel is closed. It's possible this is an issue with the API itself.
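
A minimal repro sketch of the behaviour described above ("some-container" is an illustrative container name):

package main

import (
	"fmt"
	"log"
	"time"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	client, err := docker.NewClient("unix:///var/run/docker.sock")
	if err != nil {
		log.Fatal(err)
	}

	statsC := make(chan *docker.Stats)
	done := make(chan bool)

	go func() {
		// Stats blocks until the Done channel is signalled or an error occurs.
		err := client.Stats(docker.StatsOptions{
			ID:     "some-container",
			Stats:  statsC,
			Stream: true,
			Done:   done,
		})
		log.Println("Stats returned:", err)
	}()

	for i := 0; i < 5; i++ {
		s := <-statsC
		if s == nil {
			break // the library closes the channel when it is done
		}
		fmt.Println("total_usage:", s.CPUStats.CPUUsage.TotalUsage)
	}

	// This send is where the "channel is closed" error was observed;
	// closing the channel instead (close(done)) is the variant to test.
	done <- true
	time.Sleep(time.Second) // give the goroutine time to report the outcome
}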

What needs to be done:

  • Debug the program step by step, and try writing a smaller, straightforward piece of code to test the interruption of the function directly.
  • If the issue is in the API, open an issue on the GitHub of the API itself to report this problem.
  • Wait for the answer and fix the issue
  • Check whether the same approach used for the stopChannel also works for the doneChannel; it depends on how the library handles this channel. So: one channel for all the goroutines, which gets closed when done.

Related issues: #1, #4, benchflow/monitors#1

Add a collector that collects and stores the Container Stats from the Docker Stats API

The required functionalities are:

  1. it must work from inside a container;
  2. it must collect the Stats of a list of containers identified by name and provided through an Environment variable. The containers' names are separated by ":";
  3. it must store the stats on a tmp file local to the container;
  4. it must define APIs to decide when to start and stop the data collection;
  5. as for the other collectors: it must define APIs to zip the data and store them on a remote S3 compatible datastore;
  6. it must work with the least possible impact on the monitored containers' performance.

Notes about CPU usage:

  • Compute the percentage usage relative to the number of cores assigned to a container, not according to the host
  • cpushares: can be a relative weight to other containers
  • total_usage: a plain CPU percentage is not feasible because Docker enables many options to share the CPU with other containers. We use the total_usage instead (see the sketch after this list).
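
A hedged sketch of that computation from two consecutive samples; the assignedCores parameter is an assumption about how the container's CPU allocation would be obtained:

package collector

import docker "github.com/fsouza/go-dockerclient"

// cpuPercent derives a CPU percentage from two consecutive stats samples,
// normalised by the cores assigned to the container rather than the host's.
func cpuPercent(prev, curr *docker.Stats, assignedCores float64) float64 {
	cpuDelta := float64(curr.CPUStats.CPUUsage.TotalUsage) - float64(prev.CPUStats.CPUUsage.TotalUsage)
	systemDelta := float64(curr.CPUStats.SystemCPUUsage) - float64(prev.CPUStats.SystemCPUUsage)
	if systemDelta <= 0 || cpuDelta < 0 || assignedCores <= 0 {
		return 0
	}
	// The total_usage and system_cpu_usage deltas give the fraction of the
	// whole host; rescale it to the cores the container may actually use.
	hostCores := float64(len(curr.CPUStats.CPUUsage.PercpuUsage))
	return (cpuDelta / systemDelta) * hostCores * 100 / assignedCores
}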

Some useful references:

  1. Powerful go-dockerclient and Stats API: https://godoc.org/github.com/fsouza/go-dockerclient#Client.Stats
  2. A test case that shows how to use the API with the client at point 1: https://github.com/fsouza/go-dockerclient/blob/34eaaf52874d8ce5d57be011a4852eb83d950125/container_test.go#L1630
  3. Docker Stats APIs: https://docs.docker.com/reference/api/docker_remote_api_v1.20/#get-container-stats-based-on-resource-usage

Develop a solution for Minio key hashing

In the format we use for the Minio keys, we append a hash to the key to speed up lookups when accessing Minio. We need a way to generate the hash for a given key that can be accessed regardless of the implementation language of our components.

The convenient solution is to develop an additional golang microservice we can query to generate the hash of a given key. This way we have a single microservice handling it, meaning we won't need to implement hashing in other languages, and changing the hash function can be done once, in a single location.

In addition, we need to select a hash function to use. Some good ideas to consider here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
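
A hedged sketch of such a microservice; the /hash route, the key query parameter, and the choice of an MD5-based 4-character prefix are all illustrative assumptions:

package main

import (
	"crypto/md5"
	"fmt"
	"net/http"
)

// hashHandler returns a short hash for the key passed as a query parameter.
func hashHandler(w http.ResponseWriter, r *http.Request) {
	key := r.URL.Query().Get("key")
	if key == "" {
		http.Error(w, "missing key parameter", http.StatusBadRequest)
		return
	}
	sum := md5.Sum([]byte(key))
	// A short prefix is enough to spread keys, in the spirit of the
	// S3 request-rate considerations linked above.
	fmt.Fprintf(w, "%x", sum[:2])
}

func main() {
	http.HandleFunc("/hash", hashHandler)
	http.ListenAndServe(":8080", nil)
}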

Add a collector that collects the log of a specified list of Docker containers

The collector is similar to the one that collects Docker Stats (#1), in the sense that it offers similar functionalities but on a different API.

The required functionalities are:

  1. it must work from inside a container;
  2. it must collect the Logs of a list of containers identified by name and provided through an Environment variable. The containers' names are separated by ":";
  3. it must store the logs on a tmp file local to the container;
  4. it must enable the timestamps option of the Docker Logs API;
  5. it must start the collection immediately after its start;
  6. as for the other collectors: it must define APIs to zip the data and store them on a remote S3 compatible datastore;
  7. it must work with the least possible impact on the monitored containers' performance.

Some useful references:

  1. Powerful go-dockerclient and Logs API: https://godoc.org/github.com/fsouza/go-dockerclient#Client.Logs
  2. A test case that shows how to use the API with the client at point 1: https://github.com/fsouza/go-dockerclient/blob/34eaaf52874d8ce5d57be011a4852eb83d950125/container_test.go#L1153
  3. Docker Logs APIs: https://docs.docker.com/reference/api/docker_remote_api_v1.20/#get-container-logs

Collect Network Usage for Containers using --net="host"

The Docker Stats API always returns zero for the network stats when you set --net="host". We should investigate a way to collect network statistics for containers using --net="host", probably by developing a dedicated container.

Some hints can be found on the following link: https://docs.docker.com/engine/articles/runmetrics/#network-metrics

The related Docker API issues are on the following page: https://github.com/docker/docker/labels/area%2Fapi

Collectors must Respond with JSON

Apply the same improvement made for the monitors, so that all the responses to clients are structured JSON objects, where the structure should be placed in the collectors' commons package. The structure can simply be:

{
  "status": "SUCCESS" or "FAILED",
  "message": "..."
}

Of course the same information should be printed on the standard output of the collectors. This applies to all the collectors, because everything must be logged and the client should be aware of what happens in the collectors.

Start from the changes made in #89 and improve the error handling, the structure of the returned message, the use of HTTP errors in case of internal errors, and so on.
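
A sketch of how this could look in the commons package (Response and Respond are illustrative names):

package commons

import (
	"encoding/json"
	"log"
	"net/http"
)

// Response is the JSON body returned by every collector endpoint.
type Response struct {
	Status  string `json:"status"` // "SUCCESS" or "FAILED"
	Message string `json:"message"`
}

// Respond writes the response as JSON, mirrors it on standard output,
// and sets an HTTP error code in case of internal errors.
func Respond(w http.ResponseWriter, code int, status, message string) {
	log.Printf("%s: %s", status, message) // everything must be logged
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	json.NewEncoder(w).Encode(Response{Status: status, Message: message})
}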

Improvements to the Stats Collector when using --net="host" to be Evaluated

Identify the case in which a container uses net="container" and the referenced container uses net="host", and use nethogs in this case too.

We now assume that the devices to monitor are all the ones available to the container, which should be all the interfaces available to the host, since we don't limit them. Example of command: nethogs -d 1 docker0 eth0 lo.

Investigate whether it is possible to monitor only the interfaces used by the monitored containers, to reduce the collected data and the load. For now we collect on all of them, so we get all the data.

Obtain container ID from stats collector and send it via Kafka

Our current implementation of stats collecting doesn't retrieve or make any use of the container's actual ID when collecting the stats, sending them to Minio, and signalling their presence to Kafka. This is an important piece of information, as it is required when the Spark Tasks Sender eventually executes the stats transformer and stores the collected data for the specific container ID.

Modify the current implementation so it:

  • Retrieves the linked containers' IDs
  • Uses the container ID to name files on Minio instead of the container name
  • Sends the container IDs over Kafka, altering the Minio key into a comma-separated list of keys, one key per container's collected stats

To identify the container for which we collected environment statistics, we currently rely on the file name of the statistics stored on Minio, which has the container_id as part of the name. We then "uniquely" identify a container's stats by grouping the statistics per experiment_id, trial_id and container_id. There is a (very) unlikely chance that we end up with the same container_id for two different containers that are part of a trial. In that case we must use our internally generated container_properties_id, once we also collect the container properties with (probably) a dedicated collector, which should store these data before any transformer needing them can start.

We should investigate how to guarantee the uniqueness of this association. A possibility is to store the container_id paired ("_" separated) with a hash obtained by combining: the container name (which we ensure is generated unique), the host's MAC address, the container_id itself, the experiment_id and the trial_id.
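
A hedged sketch of that pairing (SHA-1 is an illustrative choice of hash function):

package collector

import (
	"crypto/sha1"
	"fmt"
)

// uniqueContainerKey pairs the container_id with a hash of the combined
// identifiers proposed above, "_" separated.
func uniqueContainerKey(containerName, hostMAC, containerID, experimentID, trialID string) string {
	h := sha1.Sum([]byte(containerName + hostMAC + containerID + experimentID + trialID))
	return fmt.Sprintf("%s_%x", containerID, h)
}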

Impacted functionalities:

  • Stats Collector
  • Spark-Tasks-Sender
  • Stats Transformer
  • Cassandra schema
  • Stats Analysers
