Compass

Chinese documentation (中文文档)

Compass is a big data task diagnosis platform. It aims to help users troubleshoot faster and to reduce the cost of abnormal tasks.

Key features:

  • Non-invasive, instant diagnosis: you can see diagnostic results without modifying your existing scheduling platform.

  • Supports multiple scheduling platforms (DolphinScheduler, Airflow, or self-developed platforms).

  • Supports troubleshooting for Spark 2.x/3.x and Hadoop 2.x/3.x.

  • Supports workflow-layer exception diagnosis, identifying various failures as well as abnormal baseline end times and durations.

  • Supports Spark engine-layer exception diagnosis, covering 14 types of exceptions such as data skew, large table scanning, and memory waste.

  • Supports custom log-matching rules and adjustable anomaly thresholds, so diagnostics can be tuned to actual scenarios.
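
The last feature refers to log-matching rules. As a rough illustration of the idea only (this README does not describe Compass's actual rule format or engine), a rule can be modeled as a diagnostic category paired with a regular expression. The Java/Spark exception class names are real, but the rule table and the `classify_log` helper are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical rule table, one "category|extended regex" entry per rule.
# Compass's real rule syntax and storage are different; this only shows the idea.
RULES=(
  'Memory overflow|java\.lang\.OutOfMemoryError'
  'Shuffle failure|org\.apache\.spark\.shuffle\.FetchFailedException'
  'SQL failure|org\.apache\.spark\.sql\.AnalysisException'
)

# classify_log: read a log on stdin and print the category of every matching rule.
classify_log() {
  local log rule category pattern
  log=$(cat)
  for rule in "${RULES[@]}"; do
    # Split on the first "|": category before it, regex after it.
    IFS='|' read -r category pattern <<<"$rule"
    if grep -Eq -- "$pattern" <<<"$log"; then
      printf '%s\n' "$category"
    fi
  done
}
```

For example, `printf 'ERROR Executor: java.lang.OutOfMemoryError: Java heap space\n' | classify_log` prints `Memory overflow`. Tuning then amounts to editing the table rather than the matching code.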

Compass currently supports the following diagnostic types:

| Diagnostic dimension | Diagnostic type | Description |
|---|---|---|
| Failure analysis | Run failure | Tasks that ultimately fail to run |
| | First failure | Tasks that have been retried more than once |
| | Long-term failure | Tasks that have failed to run in the last ten days |
| Time analysis | Baseline time abnormality | Tasks that end earlier or later than their historical normal end time |
| | Baseline time-consuming abnormality | Tasks that run much longer or shorter than their historical normal running time |
| | Long running time | Tasks that run for more than two hours |
| Error analysis | SQL failure | Tasks that fail because of SQL execution issues |
| | Shuffle failure | Tasks that fail because of shuffle execution issues |
| | Memory overflow | Tasks that fail because of memory overflow |
| Cost analysis | Memory waste | Tasks whose ratio of peak memory usage to total memory is too low |
| | CPU waste | Tasks whose ratio of driver/executor compute time to total CPU time is too low |
| Efficiency analysis | Large table scanning | Tasks that scan too many rows because no partition filter is applied |
| | OOM warning | Tasks where the cumulative memory of broadcast tables takes a high share of driver or executor memory |
| | Data skew | Tasks where, within a stage, the maximum amount of data processed by a single task is much larger than the median |
| | Job time-consuming abnormality | Tasks with a high ratio of idle time to job running time |
| | Stage time-consuming abnormality | Tasks with a high ratio of idle time to stage running time |
| | Task long tail | Tasks where, within a stage, the maximum task running time is much larger than the median |
| | HDFS stuck | Tasks where the tasks in a stage process data too slowly |
| | Too many speculative execution tasks | Tasks whose stages frequently launch speculative task attempts |
| | Global sorting abnormality | Tasks that run long because of a global sort |
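
The Data skew and Task long tail rows both compare a stage's maximum per-task value with its median. A minimal sketch of that comparison, assuming one value per line on stdin and a hypothetical multiplier `FACTOR` (Compass's real thresholds are configurable and its exact formula may differ):

```shell
#!/usr/bin/env bash
# is_skewed FACTOR: read one per-task metric per line on stdin (for example bytes
# read or run time within a single stage) and print "skewed" when
# max > FACTOR * median, otherwise "ok". FACTOR is a hypothetical tuning knob.
is_skewed() {
  local factor=$1
  sort -n | awk -v f="$factor" '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "ok"; exit }
      # Median of the sorted values: middle element, or mean of the two middle ones.
      if (NR % 2 == 1) median = v[(NR + 1) / 2]
      else median = (v[NR / 2] + v[NR / 2 + 1]) / 2
      # After sorting, the last element is the maximum.
      if (v[NR] > f * median) print "skewed"
      else print "ok"
    }'
}
```

With per-task inputs of 10, 12, 11 and 200, the median is 11.5, so `is_skewed 5` reports `skewed`. Using the median rather than the mean keeps a single outlier from inflating the baseline it is compared against.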

Get Started

1. Compile

git clone https://github.com/cubefs/compass.git
cd compass
mvn package -DskipTests
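
The build uses git and Maven (which needs a JDK), and the later steps run from the packaged dist/compass directory. A small pre-flight helper can catch missing tools before the build; the function name is hypothetical and it does not check tool versions:

```shell
#!/usr/bin/env bash
# check_build_prereqs CMD...: print each command missing from PATH to stderr;
# return non-zero if any of them is missing.
check_build_prereqs() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: verify tools, build, then confirm the launcher used in step 4 exists.
# check_build_prereqs git mvn java \
#   && mvn package -DskipTests \
#   && test -f dist/compass/bin/start_all.sh
```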

2. Configure

cd dist/compass

vi bin/compass_env.sh
# Scheduler MySQL
export SCHEDULER_MYSQL_ADDRESS="ip:port"
export SCHEDULER_MYSQL_DB="scheduler"
export SCHEDULER_DATASOURCE_USERNAME="user"
export SCHEDULER_DATASOURCE_PASSWORD="pwd"
# Compass MySQL
export COMPASS_MYSQL_ADDRESS="ip:port"
export COMPASS_MYSQL_DB="compass"
export SPRING_DATASOURCE_USERNAME="user"
export SPRING_DATASOURCE_PASSWORD="pwd"
# Kafka
export SPRING_KAFKA_BOOTSTRAPSERVERS="ip1:port,ip2:port"
# Redis
export SPRING_REDIS_CLUSTER_NODES="ip1:port,ip2:port"
# Zookeeper
export SPRING_ZOOKEEPER_NODES="ip1:port,ip2:port"
# Elasticsearch
export SPRING_ELASTICSEARCH_NODES="ip1:port,ip2:port"
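
Before starting Compass, it can help to confirm that every variable above is set and that the address variables are comma-separated host:port lists. A sketch, assuming bash and that bin/compass_env.sh has been sourced; the helper names are hypothetical and IPv6 addresses are not handled:

```shell
#!/usr/bin/env bash
# Variables exported by bin/compass_env.sh (names taken from that file).
REQUIRED_VARS=(
  SCHEDULER_MYSQL_ADDRESS SCHEDULER_MYSQL_DB
  SCHEDULER_DATASOURCE_USERNAME SCHEDULER_DATASOURCE_PASSWORD
  COMPASS_MYSQL_ADDRESS COMPASS_MYSQL_DB
  SPRING_DATASOURCE_USERNAME SPRING_DATASOURCE_PASSWORD
  SPRING_KAFKA_BOOTSTRAPSERVERS SPRING_REDIS_CLUSTER_NODES
  SPRING_ZOOKEEPER_NODES SPRING_ELASTICSEARCH_NODES
)

# is_host_port_list VALUE: true when VALUE looks like "host:port[,host:port...]".
is_host_port_list() {
  local re='^[^,:[:space:]]+:[0-9]+(,[^,:[:space:]]+:[0-9]+)*$'
  [[ $1 =~ $re ]]
}

# check_compass_env: report unset variables and malformed address lists.
check_compass_env() {
  local name value status=0
  for name in "${REQUIRED_VARS[@]}"; do
    value=${!name-}   # indirect expansion: the value of the variable named $name
    if [[ -z $value ]]; then
      echo "unset: $name" >&2
      status=1
    elif [[ $name == *_ADDRESS || $name == *SERVERS || $name == *_NODES ]] \
         && ! is_host_port_list "$value"; then
      echo "not a host:port list: $name=$value" >&2
      status=1
    fi
  done
  return "$status"
}
```

Run it as `source bin/compass_env.sh && check_compass_env`. Unedited placeholders such as `ip:port` are rejected because the port is not numeric, which is usually the mistake worth catching.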

vi conf/application-hadoop.yml
hadoop:
  namenodes:
    - nameservices: logs-hdfs # the value of dfs.nameservices
      namenodesAddr: [ "machine1.example.com", "machine2.example.com" ] # the value of dfs.namenode.rpc-address.[nameservice ID].[name node ID]
      namenodes: [ "nn1", "nn2" ] # the value of dfs.ha.namenodes.[nameservice ID]
      user: hdfs
      password:
      port: 8020
      # keywords that identify the scheduling platform's log paths on HDFS (used by task-application)
      matchPathKeys: [ "flume" ]

  yarn:
    - clusterName: "bigdata"
      resourceManager: [ "machine1:8088", "machine2:8088" ] # the value of yarn.resourcemanager.webapp.address
      jobHistoryServer: "machine3:19888" # the value of mapreduce.jobhistory.webapp.address

  spark:
    sparkHistoryServer: [ "machine4:18080" ] # Spark History Server web UI address (port set by spark.history.ui.port)
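
Compass reads job information from the ResourceManager, JobHistory Server, and Spark History Server configured above, so it is worth checking that those addresses answer. The sketch below probes their standard REST endpoints (YARN `/ws/v1/cluster/info`, MapReduce `/ws/v1/history/info`, Spark `/api/v1/applications`); the helper names are hypothetical, and the addresses are passed in by hand rather than parsed from the YAML:

```shell
#!/usr/bin/env bash
# probe_urls RM_LIST JHS SHS_LIST: print one REST URL per endpoint configured in
# application-hadoop.yml (host:port values; lists are comma-separated).
probe_urls() {
  local rm_list=$1 jhs=$2 shs_list=$3 host
  local -a rms shs
  IFS=',' read -ra rms <<<"$rm_list"
  IFS=',' read -ra shs <<<"$shs_list"
  for host in "${rms[@]}"; do
    echo "http://$host/ws/v1/cluster/info"   # YARN ResourceManager REST API
  done
  echo "http://$jhs/ws/v1/history/info"      # MapReduce JobHistory Server REST API
  for host in "${shs[@]}"; do
    echo "http://$host/api/v1/applications"  # Spark History Server REST API
  done
}

# check_endpoints RM_LIST JHS SHS_LIST: request each URL (following redirects,
# since a standby ResourceManager may redirect) and report those that fail or
# return an HTTP error status.
check_endpoints() {
  local url status=0
  while read -r url; do
    if ! curl -fsSL -o /dev/null --max-time 5 "$url"; then
      echo "unreachable: $url" >&2
      status=1
    fi
  done < <(probe_urls "$@")
  return "$status"
}
```

With the example values above: `check_endpoints "machine1:8088,machine2:8088" "machine3:19888" "machine4:18080"`.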

3. Initialize the database and tables

The Compass schema consists of two parts: compass.sql, and the tables that depend on the scheduling platform (dolphinscheduler.sql, airflow.sql, etc.).

  1. Execute document/sql/compass.sql first.

  2. If you use DolphinScheduler, execute document/sql/dolphinscheduler.sql; if you use Airflow, execute document/sql/airflow.sql.

  3. If you use a self-developed scheduling platform, refer to the task-syncer module to determine which tables need to be synchronized.
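
The steps above can be scripted with the standard mysql client. This is a sketch under several assumptions: it runs from the repository root, bin/compass_env.sh has been sourced, the database named by COMPASS_MYSQL_DB already exists, and the scheduler-specific tables are created in that same database. The helper names are hypothetical:

```shell
#!/usr/bin/env bash
# sql_files_for SCHEDULER: print, in execution order, the bundled scripts for a
# scheduler type ("dolphinscheduler", "airflow", or anything else).
sql_files_for() {
  echo "document/sql/compass.sql"
  case $1 in
    dolphinscheduler) echo "document/sql/dolphinscheduler.sql" ;;
    airflow)          echo "document/sql/airflow.sql" ;;
    *) echo "no bundled script for '$1'; see the task-syncer module" >&2 ;;
  esac
}

# init_compass_db SCHEDULER: run those scripts against the Compass MySQL database.
init_compass_db() {
  local host=${COMPASS_MYSQL_ADDRESS%:*} port=${COMPASS_MYSQL_ADDRESS##*:} f
  for f in $(sql_files_for "$1"); do
    echo "executing $f"
    # -p<password> is visible in the process list; prefer an option file in production.
    mysql -h "$host" -P "$port" -u "$SPRING_DATASOURCE_USERNAME" \
      -p"$SPRING_DATASOURCE_PASSWORD" "$COMPASS_MYSQL_DB" < "$f" || return 1
  done
}
```

For example, `init_compass_db dolphinscheduler` runs compass.sql and then dolphinscheduler.sql, stopping at the first script that fails. For a self-developed scheduler it runs only compass.sql.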

4. Deploy

./bin/start_all.sh

Documents

Architecture document

Deployment document

User Interface

[Screenshots: overview, tasks, onclick, application, CPU, memory]

Community

You are welcome to join the community to discuss using or developing Compass. Ways to get help:

  • Submit an issue.
  • Join the WeChat group: search for and add WeChat ID daiwei_cn or zebozhuang, and state your purpose in the verification message. Once verified, you will be invited to the community group.

License

Compass is licensed under the Apache License, Version 2.0. For details, see LICENSE and NOTICE.

Contributors

hoey94, mio0330, nilnon, wforget, yves-yuan, zebozhuang
