

Magpie

This is the code repository for the paper 'Magpie: Efficient Big Data Query System Parameter Optimization based on Pre-selection and Search Pruning Approach', which addresses automatic parameter optimization for big data systems.

Magpie recommends the best parameter configuration for a big data system (Flink, Spark, etc.) according to the user's performance target and the parameters and value ranges the user specifies.
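To give a sense of the scale involved, here is a minimal sketch that counts the configurations in the sample conf/config.yaml shown in Quick Start. The counts (16, 13, 5, 9 and 14 candidate values) are taken from that sample; the names `CANDIDATE_COUNTS` and `search_space_size` are illustrative and not part of Magpie.

```python
from functools import reduce
from operator import mul

# Number of candidate values per parameter in the sample conf/config.yaml.
# The dictionary itself is illustrative; Magpie reads the YAML file directly.
CANDIDATE_COUNTS = {
    "taskmanager.memory.process.size": 16,
    "taskmanager.numberOfTaskSlots": 13,
    "taskmanager.memory.network.fraction": 5,
    "taskmanager.memory.managed.fraction": 9,
    "parallelism.default": 14,
}


def search_space_size(counts):
    """Return the number of configurations in the full Cartesian product."""
    return reduce(mul, counts.values(), 1)


if __name__ == "__main__":
    print(search_space_size(CANDIDATE_COUNTS))  # 131040
```

With more than a hundred thousand combinations for only five parameters, running the job once per configuration would be impractical, which is the kind of cost a pre-selection and pruning approach aims to avoid.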

Prerequisites

    CentOS 7.5
    Java 1.8
    Python 3.6.3
    Hadoop 2.6.7
    Hive 2.3.4
    Flink 1.11.0
    Prometheus 2.19.2
    Pushgateway 1.2.0

When installing Java, Hadoop, Hive and Flink, make sure to set the user environment variables for them, such as JAVA_HOME, HADOOP_HOME, FLINK_HOME and PATH.
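A quick way to confirm these variables are set is a small check script. This is a sketch only: it covers the three *_HOME variables named above, and the function name `missing_env_vars` is made up for illustration.

```python
import os

# Variables named in this README; extend the tuple for your own setup.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "FLINK_HOME")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the variables that are unset or do not point to a directory."""
    env = os.environ if env is None else env
    return [name for name in required if not os.path.isdir(env.get(name, ""))]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing or invalid: " + ", ".join(missing))
    else:
        print("All required variables point to existing directories.")
```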

Before running the system, install the LightGBM Python package with the command: pip install lightgbm
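To verify that the package is visible to the Python interpreter you intend to use, a check along these lines may help. The helper name `is_installed` is an assumption for illustration; it only tests whether the module can be located, not whether it works.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("lightgbm"):
        print("lightgbm is available")
    else:
        print("lightgbm not found; run: pip install lightgbm")
```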

Before running the system, make sure that your job runs normally on the Flink cluster.

Quick Start

  1. Compile and package

    cd Magpie
    mvn clean install -Dmaven.test.skip=true
    
  2. System configuration: configure the Flink parameters and their candidate values, the monitored performance metrics, the performance target, the Flink job to execute, the job type and other settings in conf/config.yaml

    #Flink dir
    flink.dir: /env/flink-1.11.0
    #Flink parameters values
    parameters:
        taskmanager.memory.process.size: [2g,3g,4g,5g,6g,7g,8g,9g,10g,12g,14g,16g,18g,20g,24g,30g]
        taskmanager.numberOfTaskSlots: [2,3,4,5,6,7,8,9,10,11,12,16,20]
        taskmanager.memory.network.fraction: [0.05,0.1,0.15,0.2,0.25]
        taskmanager.memory.managed.fraction: [0.2,0.25,0.3,0.35,0.4,0.45,0.5,0.6,0.7]
        parallelism.default: [2,4,8,10,16,20,30,32,40,48,50,60,70,80]
    #performance target
    target: 1.0
    #Flink Job compute model
    flink.job.model: batch
    #job type
    flink.job.type: SQL
    #Flink job submit
    job.submit.cmd: ./bin/flink run -m yarn-cluster -c org.apache.flink.benchmark.Benchmark\
            ~/target/flink-tpcds-0.1-SNAPSHOT-jar-with-dependencies.jar\
            --database tpcds_bin_orc_100\
            --queries q7.sql
    
  3. Running

    ./bin/start.sh &
    

    After the system starts, you can check that the Flink job is running normally in the Flink Web UI (port 8081) or the Yarn Web UI (port 8088), and view job performance data in the Prometheus Web UI (port 9091). To stop the system, run ./bin/stop.sh

  4. Results: monitor the parameter search process and view the recommended configuration in the log output

    tail -f logs/task.log    # while the search is running
    tail -f logs/task.out    # after the run has finished
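The port checks mentioned in step 3 can also be scripted. The following is a minimal sketch that tests whether the three ports accept TCP connections; the port numbers come from this README, while the host `localhost` and the helper name `port_is_open` are assumptions to adapt to your cluster.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports are the ones named in step 3; "localhost" is an assumption.
    services = (("Flink Web", 8081), ("Yarn", 8088), ("Prometheus Web", 9091))
    for name, port in services:
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print("{}: port {} {}".format(name, port, state))
```

An open port only shows that something is listening; it does not confirm that the job itself is healthy, so the web pages are still worth a look.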
    

Contributors

haoeeeee, ronggu
