netflix / inviso

License: Other

inviso's Introduction

Overview

Inviso is a lightweight tool for searching Hadoop jobs, visualizing their performance, and viewing cluster utilization.

Design and Components

REST API for Job History: REST endpoint that loads an entire job history file as a JSON object.

ElasticSearch: Search over jobs and correlate Hadoop jobs for Pig and Hive scripts.

Python Scripts: Scripts to index job configurations into ElasticSearch for querying. These scripts can accommodate a pub/sub model for use with SQS or some other queuing service to better distribute the load or allow other systems to know about job events.

Web UI: Provides an interface to search and visualize jobs and cluster data.

Requirements

JDK 1.7+
Apache Tomcat (7+)
ElasticSearch (1.0+)
Hadoop 2 Cluster
- Log aggregation must be enabled for task log linking to work
- The specific version of Hadoop may need to be set in the gradle build file
- Some functionality is available for Hadoop 1, but requires more configuration

QuickStart

Inviso is easy to set up given a Hadoop cluster. To get a quick preview, it is easiest to configure Inviso on the NameNode/ResourceManager host.

Pull down required resources and stage them

> wget http://<mirror>/.../apache-tomcat-7.0.55.tar.gz
> tar -xzf apache-tomcat-7.0.55.tar.gz
> rm -r apache-tomcat-7.0.55/webapps/*
> wget http://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.tar.gz
> tar -xzf elasticsearch-1.3.2.tar.gz

Clone the Inviso repository and build the java project

> git clone https://github.com/Netflix/inviso.git
> cd inviso
> ./gradlew assemble
> cd ..

Copy WAR files and link Static Web Pages

> cp inviso/trace-mr2/build/libs/inviso#mr2#v0.war apache-tomcat-7.0.55/webapps/
> ln -s `pwd`/inviso/web-ui/public apache-tomcat-7.0.55/webapps/ROOT

Start ElasticSearch and create Indexes

> ./elasticsearch-1.3.2/bin/elasticsearch -d
> curl -XPUT http://localhost:9200/inviso -d @inviso/elasticsearch/mappings/config-settings.json
  {"acknowledged":true}
> curl -XPUT http://localhost:9200/inviso-cluster -d @inviso/elasticsearch/mappings/cluster-settings.json
  {"acknowledged":true}
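The two `curl -XPUT` calls above succeed when the response body is `{"acknowledged":true}`. As a minimal sketch, assuming you capture that response body as a string in a script, the check could look like the following. The helper names `index_url` and `is_acknowledged` are illustrative and not part of Inviso.

```python
import json


def index_url(base_url, index_name):
    """Build the URL for an index, e.g. http://localhost:9200/inviso."""
    return "%s/%s" % (base_url.rstrip("/"), index_name)


def is_acknowledged(response_body):
    """Return True only if the body is a JSON object with acknowledged == true."""
    try:
        parsed = json.loads(response_body)
    except ValueError:
        # Not JSON at all (for example an HTML error page from a proxy).
        return False
    return isinstance(parsed, dict) and parsed.get("acknowledged") is True


# Example: the body printed by the curl commands in this step.
print(index_url("http://localhost:9200", "inviso"))
print(is_acknowledged('{"acknowledged":true}'))
```

Any other body, such as an error object, is treated as a failure, so a setup script can stop before starting Tomcat.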

Start Tomcat

> ./apache-tomcat-7.0.55/bin/startup.sh

Build virtual environment and index some jobs

> virtualenv venv
> source venv/bin/activate
> pip install -r inviso/jes/requirements.txt
> cd inviso/jes/
> cp settings_default.py settings.py
> python jes.py
> python index_cluster_stats.py

#Run in a cron or loop
> while true; do sleep 60s; python jes.py; done&
> while true; do sleep 60s; python index_cluster_stats.py; done&

Navigate to http://hostname:8080/
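The two background `while true` loops in the previous step can also be written as one small Python driver. This is only a sketch under stated assumptions: it runs the two scripts one after the other in a single loop (the shell version runs them as two independent loops), and `run_periodically` is a hypothetical helper, not something shipped with Inviso.

```python
import subprocess
import time


def run_periodically(commands, interval_seconds=60, iterations=None,
                     runner=subprocess.call, sleeper=time.sleep):
    """Sleep, run every command in order, and repeat.

    iterations=None loops forever; a number stops after that many rounds.
    runner and sleeper are injectable so the loop can be exercised quickly.
    """
    completed = 0
    while iterations is None or completed < iterations:
        sleeper(interval_seconds)
        for command in commands:
            runner(command)
        completed += 1
    return completed


# Intended use from inside inviso/jes/ (loops until interrupted):
# run_periodically([["python", "jes.py"],
#                   ["python", "index_cluster_stats.py"]])
```

A cron entry does the same job without a long-lived process; the loop form is mainly convenient for a quick preview.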

QuickStart - Docker Version

An alternative way to start Inviso is with Docker. If you already have Docker installed, you can run the following command:

docker run -d -p 8080:8080 savaki/inviso

This launches Inviso in a container and publishes it on port 8080.

Enjoy!

inviso's Issues

"Open Logs" doesn't work with a dfs.nameservices name

When I click the "Open Logs" link in a task's information on the profile page, I end up with "HTTP Status 500 - java.lang.reflect.InvocationTargetException". The root cause is

java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1

If I change the URL in the address bar at this point to use NamenodeHostname:8020, it works; that is, the replacement shown here works. I have confirmed that the node Inviso is running on has the correct Hadoop client configuration and can handle hdfs://nameservice1 style URLs. Any ideas how I can get Inviso to handle them as well?

replacing

?fs=hdfs://nameservice1&root

with

?fs=hdfs://NamenodeHostname:8020&root
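The manual workaround described in this issue amounts to swapping the value of the `fs` query parameter. A minimal sketch of that rewrite with Python 3's standard library follows. The full URL in the example, including the host `inviso-host`, the `/logs` path, and the `root` value, is a made-up placeholder, since the issue only quotes a fragment of the query string; `replace_fs_param` is likewise a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def replace_fs_param(url, new_fs):
    """Return url with the value of its 'fs' query parameter replaced."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    rewritten = [(key, new_fs if key == "fs" else value) for key, value in pairs]
    # Keep ':' and '/' unescaped so the result stays readable in the address bar.
    return urlunsplit(parts._replace(query=urlencode(rewritten, safe=":/")))


# Placeholder URL for illustration only.
before = "http://inviso-host:8080/logs?fs=hdfs://nameservice1&root=/tmp/logs"
print(replace_fs_param(before, "hdfs://NamenodeHostname:8020"))
```

This only automates the workaround; it does not address why the web application cannot resolve the nameservice itself.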

Cluster charts don't work

Hi,

I've configured both Python scripts and they work (I see new data in ElasticSearch). The 'Search' and 'Profiler' tabs work, but 'Cluster' doesn't.

Here is what I see:

When opening Cluster tab, there is only one request to ElasticSearch:

POST /inviso-cluster/metrics/_search?size=0

with form data:
{"query":{"bool":{"must":[{"range":{"timestamp":{"gte":1416559166153,"lte":1417768766154}}}]}},"aggs":{"clusters":{"terms":{"field":"cluster","size":50}}}}

and response:
{"took":5,"timed_out":false,"_shards":{"total":6,"successful":6,"failed":0},"hits":{"total":1077,"max_score":0.0,"hits":[]},"aggregations":{"clusters":{"buckets":[{"key":"hefajstos","doc_count":1077}]}}}

What could be wrong? I don't see any errors in either the Tomcat log or the ElasticSearch log.
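To reproduce the request quoted in this issue outside the browser, the body can be rebuilt and the response parsed as sketched here. The query shape and the sample response are copied from the issue; the function names `build_cluster_query` and `cluster_names` are hypothetical, and sending the request is left to whatever HTTP client you prefer.

```python
import json


def build_cluster_query(start_ms, end_ms, max_clusters=50):
    """Body of POST /inviso-cluster/metrics/_search?size=0 as quoted in the issue."""
    return {
        "query": {"bool": {"must": [
            {"range": {"timestamp": {"gte": start_ms, "lte": end_ms}}}
        ]}},
        "aggs": {"clusters": {"terms": {"field": "cluster", "size": max_clusters}}},
    }


def cluster_names(response):
    """Extract the cluster keys from the terms aggregation in a search response."""
    return [bucket["key"] for bucket in response["aggregations"]["clusters"]["buckets"]]


# Sample response taken from the issue text.
sample = json.loads(
    '{"took":5,"timed_out":false,'
    '"_shards":{"total":6,"successful":6,"failed":0},'
    '"hits":{"total":1077,"max_score":0.0,"hits":[]},'
    '"aggregations":{"clusters":{"buckets":[{"key":"hefajstos","doc_count":1077}]}}}'
)
print(json.dumps(build_cluster_query(1416559166153, 1417768766154)))
print(cluster_names(sample))
```

In the reported response the aggregation does return one cluster, so the indexed data appears to be present and the problem is more likely in what happens after this first request.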

Cluster not listed at tab #cluster

Strangely nothing is shown on the web app. In particular there is nothing to select under the Cluster drop-down menu and it just shows Nothing selected. So probably the webapp isn't correctly configured yet. I followed the guide and also changed settings.py to the following:

clusters = [
    Cluster(id='clustername', name='clustername', host=socket.getfqdn())
]

and

clusters = [
    Cluster(id='clustername', name='clustername', host='hostname')
]

to no avail. Any ideas?

cannot compile with 2.6.0

This may be related to gradle/maven, but I cannot get this to build for Hadoop 2.6.0. Changing the versions in trace-mr2/build.gradle to 2.6.0 breaks the build with the error

/srv/invisio/inviso/trace-mr2/src/main/java/com/netflix/bdp/inviso/fs/S3DelegateFS.java:29: error: package org.apache.hadoop.fs.s3native does not exist

Is there any other place I have to change the hadoop version? Or something else I am missing?

does inviso require genie and aws?

I am able to follow the instructions all the way to the "python jes.py" part in step 6, but that fails with the error shown here. I am trying to get this working with a locally installed cluster, not AWS. I also don't have Genie. What do I need to do to get this to work?

(venv)[skhehra@testvm jes]$ python jes.py
ERROR:inviso.jes:[Errno 32] Broken pipe
Traceback (most recent call last):
  File "jes.py", line 35, in main
    monitor.run()
  File "/home/skhehra/inviso/jes/inviso/monitor.py", line 295, in run
    for f in listing:
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/client.py", line 139, in ls
    recurse=recurse):
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/client.py", line 1072, in _find_items
    fileinfo = self._get_file_info(path)
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/client.py", line 1202, in _get_file_info
    return self.service.getFileInfo(request)
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/service.py", line 35, in <lambda>
    rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/service.py", line 41, in call
    return method(self.service, controller, request)
  File "/home/skhehra/venv/lib/python2.6/site-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
    self._StubMethod(inst, method, rpc_controller, request, callback))
  File "/home/skhehra/venv/lib/python2.6/site-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
    method_descriptor.output_type._concrete_class, callback)
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/channel.py", line 411, in CallMethod
    self.send_rpc_message(method, request)
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/channel.py", line 309, in send_rpc_message
    self.write_delimited(rpc_request_header)
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/channel.py", line 238, in write_delimited
    self.write(encoder._VarintBytes(len(data)))
  File "/home/skhehra/venv/lib/python2.6/site-packages/snakebite/channel.py", line 235, in write
    self.sock.send(data)
error: [Errno 32] Broken pipe

Missing jobs' information

When I search for a user's jobs, the "Stop", "Duration", "Links", "Workflow ID" and "Genie Name" fields are empty. Moreover, the "Job Status" shows "UNKNOWN".

Is this related to my Hadoop configuration?
Which file is responsible for gathering the job information, so that I can check it?

Error Walking through the README quick start

I'm following the instructions for the quick start and I get to the point where I run the jes.py

It fails with the following

python jes.py
ERROR:inviso.jes:[Errno 32] Broken pipe
Traceback (most recent call last):
  File "jes.py", line 35, in main
    monitor.run()
  File "/home/liamm/invisio/inviso/jes/inviso/monitor.py", line 295, in run
    for f in listing:
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/client.py", line 139, in ls
    recurse=recurse):
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/client.py", line 1072, in _find_items
    fileinfo = self._get_file_info(path)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/client.py", line 1202, in _get_file_info
    return self.service.getFileInfo(request)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/service.py", line 35, in <lambda>
    rpc = lambda request, service=self, method=method.name: service.call(service_stub_class.__dict__[method], request)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/service.py", line 41, in call
    return method(self.service, controller, request)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/google/protobuf/service_reflection.py", line 267, in <lambda>
    self._StubMethod(inst, method, rpc_controller, request, callback))
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/google/protobuf/service_reflection.py", line 284, in _StubMethod
    method_descriptor.output_type._concrete_class, callback)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/channel.py", line 411, in CallMethod
    self.send_rpc_message(method, request)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/channel.py", line 309, in send_rpc_message
    self.write_delimited(rpc_request_header)
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/channel.py", line 238, in write_delimited
    self.write(encoder._VarintBytes(len(data)))
  File "/home/liamm/invisio/venv/lib/python2.6/site-packages/snakebite/channel.py", line 235, in write
    self.sock.send(data)
error: [Errno 32] Broken pipe

Documentation for Search and Profiler

How do I use Search and Profiler? I just get a blank page. Can someone provide a brief example?

Document project roadmap and contribution stance

I'm looking at using Inviso for our clusters, but since it's been about 2 years since there were any notable changes, there are some obvious concerns about the status of the project.

It would be useful if you could document (in the README.md file?) the roadmap and status of the project (have there been further changes at Netflix that are yet to be released, or have there literally been no changes in this time?) and the project's stance towards contributions.

How does this work with the docker container?

Everything in the UI is just empty. I tried changing settings.py and then running jes.py, but it dies with 'connection refused'.
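A 'connection refused' error means nothing accepted the TCP connection at the host and port the script tried. The issue does not say which endpoint that was, so the following is only a generic sketch for checking candidates from wherever jes.py runs. The helper `can_connect` is hypothetical, and the host and port in the usage comment are placeholders to be replaced with the values from your own settings.py.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
    conn.close()
    return True


# Example with placeholder values:
# print(can_connect("localhost", 9200))
```

Inside a container, `localhost` refers to the container itself, so a service that is reachable from the host may still be refused from within the container.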

Can't run inviso with elastic search 2.3.2

Hello folks,

I'm trying to run inviso with elastic search 2.3.2.
While trying to run this command:
curl -XPUT http://localhost:9200/inviso -d @inviso/elasticsearch/mappings/config-settings.json

I get the following error:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Mapping definition for [_timestamp] has unsupported parameters: [store : true]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [default]: Mapping definition for [_timestamp] has unsupported parameters: [store : true]","caused_by":{"type":"mapper_parsing_exception","reason":"Mapping definition for [_timestamp] has unsupported parameters: [store : true]"}},"status":400}

How can we adjust Inviso to make it compatible with ElasticSearch 2.3.2?
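The error message names one specific problem: the `_timestamp` mapping carries a `store` parameter that this ElasticSearch version rejects. One possible experiment, sketched here, is to strip that parameter from the mapping before sending it. This is an untested assumption rather than a confirmed fix: the layout of config-settings.json is not shown in the issue, so the sample mapping is invented for illustration, `strip_timestamp_params` is a hypothetical helper, and other parts of the mapping may still be incompatible.

```python
import json


def strip_timestamp_params(node, params=("store",)):
    """Return a copy of node with the given keys removed from every
    "_timestamp" object, wherever it appears in the mapping."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key == "_timestamp" and isinstance(value, dict):
                value = {k: v for k, v in value.items() if k not in params}
            cleaned[key] = strip_timestamp_params(value, params)
        return cleaned
    if isinstance(node, list):
        return [strip_timestamp_params(item, params) for item in node]
    return node


# Invented sample; the real config-settings.json may be structured differently.
sample_mapping = {
    "mappings": {
        "_default_": {
            "_timestamp": {"enabled": True, "store": True},
            "properties": {"store": {"type": "string"}},
        }
    }
}
print(json.dumps(strip_timestamp_params(sample_mapping), sort_keys=True))
```

Only `store` entries directly inside a `_timestamp` object are removed; a regular field that happens to be named `store` is left alone.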

too many data points

Getting the following error even for relatively small (sub day long) selections:

Error with index_cluster_stats.py

I'm following the instructions for the QuickStart.

When I reach step 6 and run "python index_cluster_stats.py", it fails with the error shown here. I am confused by port 9026: it appears out of nowhere, and I do not know where it gets configured. Should I change the port?

ERROR:inviso.cluster:Error processing: cluster_1
ERROR:inviso.cluster:HTTPConnectionPool(host='taoran1', port=9026): Max retries exceeded with url: /ws/v1/cluster/apps?state=RUNNING (Caused by <class 'socket.error'>: [Errno 111] Connection refused)
Traceback (most recent call last):
  File "index_cluster_stats.py", line 90, in index_stats
    index_apps(es, cluster, info)
  File "index_cluster_stats.py", line 17, in index_apps
    apps = requests.get('http://%s:%s/ws/v1/cluster/apps?state=RUNNING' % (cluster.host, '9026'), headers = {'ACCEPT':'application/json'}).json().get('apps')
  File "/home/venv/lib/python2.6/site-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/home/venv/lib/python2.6/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/venv/lib/python2.6/site-packages/requests/sessions.py", line 456, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/venv/lib/python2.6/site-packages/requests/sessions.py", line 559, in send
    r = adapter.send(request, **kwargs)
  File "/home/venv/lib/python2.6/site-packages/requests/adapters.py", line 375, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='taoran1', port=9026): Max retries exceeded with url: /ws/v1/cluster/apps?state=RUNNING (Caused by <class 'socket.error'>: [Errno 111] Connection refused)

Also, when I ran "python jes.py", it printed the following. I do not know whether it is related to the error above.

INFO:inviso-monitor:Publishing event: (cluster_1) job_1413958950234_0008 2014-10-23T02:37:35+00:00
INFO:inviso-handler:Processing 1 events
WARNING:inviso-handler:No trace info available for hdfs://taoran1:9000/tmp/hadoop-yarn/staging/history/done/2014/10/23/000000/job_1413958950234_0008-1414031808636-hadoop-QuasiMonteCarlo-1414031855566-4-1-SUCCEEDED-default-1414031818339.jhist
INFO:inviso-handler:Indexing 1 documents
INFO:inviso-handler:Events complete.
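On the port question: the traceback in this issue shows index_cluster_stats.py (line 17) building the ResourceManager URL with the port '9026' written inline. Stock YARN serves the ResourceManager web services on port 8088 by default (the yarn.resourcemanager.webapp.address setting), so a refused connection on 9026 is plausible on a cluster that uses the default. A minimal sketch of making the port a parameter follows; `rm_apps_url` is a hypothetical helper, not part of Inviso, and the right port for a given cluster should be taken from its yarn-site.xml.

```python
def rm_apps_url(host, port=8088, state="RUNNING"):
    """Build the YARN ResourceManager REST URL that lists applications.

    The path matches the one in the traceback; only the port is configurable.
    """
    return "http://%s:%s/ws/v1/cluster/apps?state=%s" % (host, port, state)


# Host name taken from the issue; the port is the stock YARN default.
print(rm_apps_url("taoran1"))
# The value currently written inline in the script, for comparison.
print(rm_apps_url("taoran1", port=9026))
```

Opening the first URL in a browser or with curl is a quick way to confirm which port the ResourceManager actually listens on before editing the script.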