ivanaxu / ideeprec

DeepRec For Me https://github.com/alibaba/DeepRec

Home Page: https://deeprec.readthedocs.io/zh/latest/index.html

License: Apache License 2.0

Starlark 2.45% Shell 0.49% Batchfile 0.02% Python 33.30% Dockerfile 0.05% C++ 55.45% C 0.60% Java 0.58% CMake 0.15% Makefile 0.07% HTML 3.11% Cuda 0.14% Jupyter Notebook 1.93% MLIR 1.35% SWIG 0.11% Cython 0.01% LLVM 0.01% Objective-C 0.06% Objective-C++ 0.14% Ruby 0.01%

ideeprec's Introduction

བཀྲ་ཤིས་བདེ་ལེགས་ (Tashi Delek — greetings)

  • Practice: wakatime
  • Profile: website
  • Product: website
  • Progress: website
  • Poetry Daily

四 支

茶對酒,賦對詩,燕子對鶯兒。栽花對種竹,落絮對遊絲。四目頡,一足夔,鴝鵒對鷺鷥。半池紅菡萏,一架白荼蘼。幾陣秋風能應候,一犁春雨甚知時。智伯恩深,國士吞變形之炭;羊公德大,邑人豎墮淚之碑。

行對止,速對遲,舞劍對圍棋。花箋對草字,竹簡對毛錐。汾水鼎,峴山碑,虎豹對熊羆。花開紅錦繡,水漾碧琉璃。去婦因探鄰舍棗,出妻爲種後園葵。笛韻和諧,仙管恰從雲裏降;櫓聲咿軋,漁舟正向雪中移。

戈對甲,鼓對旗,紫燕對黃鸝。梅酸對李苦,青眼對白眉。三弄笛,一圍棋,雨打對風吹。海棠春睡早,楊柳晝眠遲。張駿曾爲槐樹賦,杜陵不作海棠詩。晉士特奇,可比一斑之豹;唐儒博識,堪爲五總之龜。

ideeprec's People

Contributors: ivanaxu

Forkers: nann-auto

ideeprec's Issues

part2 🍎 x0.4 Build with oneDNN + Eigen Threadpool worker pool, ABI=0 (CPU)

https://deeprec.readthedocs.io/zh/latest/oneDNN.html

oneDNN is Intel's open-source, cross-platform performance library for deep learning; its documentation lists the supported primitives. DeepRec ships with oneDNN support: adding the build option --config=mkl_threadpool to the DeepRec compile command enables oneDNN-accelerated operator computation. On machines that support the AVX512 instruction set (Skylake and later CPUs), adding --config=opt turns on the --copt=-march=native optimization by default, which can further speed up operator computation.

Tips: MKL-DNN was renamed DNNL, and later renamed again to oneDNN. Early TensorFlow used MKL to accelerate operators and gradually replaced it with oneDNN in later releases, but the old macro definitions are still kept.
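As a quick sanity check that oneDNN kernels are actually being dispatched after such a build, oneDNN's own verbose mode can be enabled via an environment variable before TensorFlow is imported. A minimal sketch (DNNL_VERBOSE applies to DNNL/oneDNN-era builds; MKLDNN_VERBOSE is the older MKL-DNN name):

```python
import os

# Must be set before TensorFlow is imported, so oneDNN reads it at load time.
# Each executed oneDNN primitive is then logged to stdout as a "dnnl_verbose" line.
os.environ["DNNL_VERBOSE"] = "1"      # oneDNN / DNNL builds
os.environ["MKLDNN_VERBOSE"] = "1"    # older MKL-DNN naming, kept for compatibility

# import tensorflow as tf  # any MatMul/Conv run now prints dnnl_verbose records
```

If no dnnl_verbose lines appear for compute-heavy ops, the wheel was likely built without --config=mkl_threadpool.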

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:


Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
        --config=gdr            # Build with GDR support.
        --config=verbs          # Build with libverbs support.
        --config=ngraph         # Build with Intel nGraph support.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=noignite       # Disable Apache Ignite support.
        --config=nokafka        # Disable Apache Kafka support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished

https://deeprec.readthedocs.io/zh/latest/DeepRec-Compile-And-Install.html

Build (GPU/CPU)

bazel build -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package

Build (GPU/CPU) with ABI=0

bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt //tensorflow/tools/pip_package:build_pip_package

Build with oneDNN + Eigen Threadpool worker pool (CPU)

bazel build -c opt --config=opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true //tensorflow/tools/pip_package:build_pip_package

Build with oneDNN + Eigen Threadpool worker pool, ABI=0 (CPU)

bazel build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --host_cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" -c opt --config=opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true //tensorflow/tools/pip_package:build_pip_package

Demo Submission in the Semi-Finals

Date: 2022-10-13 21:58:04
Score: 2082.8406
DIEN (s): 380.1314
DIN (s): 329.4982
DLRM (s): 345.6923
DeepFM (s): 341.6685
MMoE (s): 331.0767
WDL (s): 354.7736

part1 🌈 w0.1 CostModel-based Executor

https://deeprec.readthedocs.io/zh/latest/Executor-Optimization.html

CostModel-based Executor

By dynamically tracing the specified steps' Session.run calls, this feature collects and computes several groups of metrics, then uses a CostModel to derive a better scheduling strategy. It includes a critical-path-based scheduling policy and a policy that, guided by the CostModel, batches the execution of short-running ops.

Usage: first, specify which steps' Session.run calls to trace for metric collection; by default, metrics are collected for steps 100-200. The range can be customized with the following environment variables:

os.environ['START_NODE_STATS_STEP'] = "200"
os.environ['STOP_NODE_STATS_STEP'] = "500"

The example above traces the execution metrics of steps 200-500. If START_NODE_STATS_STEP is greater than STOP_NODE_STATS_STEP, tracing is disabled and the CostModel computation that follows will not run. In addition, the user script must add the following lines to enable the CostModel-based executor:

sess_config = tf.ConfigProto()
sess_config.executor_policy = tf.ExecutorPolicy.USE_COST_MODEL_EXECUTOR

with tf.train.MonitoredTrainingSession(
    master=server.target,
    ...
    config=sess_config) as sess:
  ...

inter_threads/intra_threads = 7/1

  • serving/processor/serving/model_config.cc
    (*config)->inter_threads = schedule_threads / (8/7); // 2

    (*config)->intra_threads = schedule_threads / (8/1); // 2
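A note on these divisors: in C++, 8/7 and 8/1 are integer divisions, so the expressions above reduce to schedule_threads / 1 and schedule_threads / 8 respectively — a 7/1 split in favor of inter-op threads. A small Python sketch, using // to mirror the C++ integer semantics (schedule_threads = 16 is an assumed example value):

```python
# Mirror the C++ integer arithmetic: 8/7 truncates to 1, 8/1 is 8.
schedule_threads = 16  # assumed example value; the real value comes from the config

inter_threads = schedule_threads // (8 // 7)  # 8 // 7 == 1  -> 16
intra_threads = schedule_threads // (8 // 1)  # 8 // 1 == 8  -> 2
```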

session_num = 2 & select_session_policy = "RR"

  • serving/processor/serving/model_config.h
  // session num of session group,
  // default num is 1
  int session_num = 2; // 1
  // In multi-session mode, we have two policy for
  // select session for each thread.
  // "RR": Round-Robin policy, threads will use all sessions in Round-Robin way
  // "MOD": Thread select session according unique id, uid % session_num
  std::string select_session_policy = "RR"; // MOD
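To illustrate the difference between the two policies (a hypothetical pure-Python sketch, not the processor's actual implementation): under "RR", successive calls rotate through all sessions, while under "MOD" a thread is pinned to sessions[uid % session_num]:

```python
import itertools

session_num = 2
sessions = [f"session-{i}" for i in range(session_num)]

# "RR": Round-Robin — successive requests take the next session in rotation.
rr = itertools.cycle(range(session_num))
rr_picks = [sessions[next(rr)] for _ in range(4)]
# -> ['session-0', 'session-1', 'session-0', 'session-1']

# "MOD": each thread is pinned to one session derived from its unique id.
def pick_mod(uid):
    return sessions[uid % session_num]

# Thread with uid=5 always gets sessions[5 % 2] == sessions[1].
```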

And More...

part1 🌈 w0.8 (based on w0.7)

Date: 2022-10-30 23:37:36
Score: 864000.0000
DIEN (s): 0.0000
DIN (s): 86.6291
DLRM (s): 123.3763
DeepFM (s): 118.7644
MMoE (s): 134.3123
WDL (s): 145.5450

Add configure & run / no configure & run

#!/bin/bash
# run.sh

#
echo
echo "> Run"

#
echo
echo ">> STEP@1"
cd /pro/DeepRec
ls -l

#
echo
echo ">> STEP@2"
# ./configure

#
echo
echo ">> STEP@3"
# bazel build -c opt --config=opt --config=mkl_threadpool --define build_with_mkl_dnn_v1_only=true //tensorflow/tools/pip_package:build_pip_package

echo
echo ">> STEP@4"
# ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /pkg/tensorflow_pkg

echo
echo ">> STEP@5"
pip uninstall tensorflow -y
pip install /pkg/tensorflow_pkg/tensorflow-1.15.5+deeprec2206-cp36-cp36m-linux_x86_64.whl

echo
echo ">> STEP@6"
cd /pro/DeepRec/tianchi
python run_models.py

#
echo "> Run"

The Semi-Finals (复赛)

Preliminary round (July 19 - October 8, 2022, UTC+8)

Submission window and rules: July 19, 10:00 - October 8, 18:00. Each team may submit results up to 5 times per day; the system evaluates in real time and returns the latest score every hour. Teams are ranked by evaluation score in descending order, and the leaderboard shows each team's best historical score in this stage.
Advancing teams: at the end of the preliminary round, the leaderboard as of October 11, 18:00 is final. The organizing committee will review and disqualify teams for cheating or other improper behavior, with vacated slots filled by the next teams in line. The top 80 teams whose preliminary scores meet the requirements and who pass real-name verification advance to the semi-finals.

Semi-final round (October 13 - November 14, 2022, UTC+8)

Submission window and rules: October 13, 10:00 - November 14, 18:00. Each team may submit results once per day; the system evaluates in real time and returns the latest score every hour. Teams are ranked by evaluation score in descending order, and the leaderboard shows each team's best historical score in this stage.
Advancing teams: at the end of the semi-finals, the leaderboard published on November 16, 18:00 is final. The organizing committee will review and disqualify teams for cheating or other improper behavior, with vacated slots filled by the next teams in line. The top 6 teams whose semi-final scores meet the requirements and who pass real-name verification advance to the finals. Teams ranked 7-16 receive an excellence award and are invited to attend the final on site.

part1 🌈 w0.5 sess_config.graph_options.optimizer_options.micro_batch_num = 4

https://deeprec.readthedocs.io/zh/latest/Auto-Micro-Batch.html

The AutoMicroBatch feature depends on the user enabling the corresponding graph-optimization option. Note that if the user configures batch_size=1024 with micro_batch_num=2, the run is effectively equivalent, in convergence terms, to training with batch_size=2048. If the user previously trained with batch_size=512 and enables this large-batch feature with micro_batch_num=4, then to keep convergence unchanged it is recommended to also reduce batch_size to 128. The user interface is as follows:

config = tf.ConfigProto()
config.graph_options.optimizer_options.micro_batch_num = 4
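The arithmetic behind that recommendation, as a quick sketch using the values from the paragraph above: the effective batch size is batch_size × micro_batch_num, so keeping that product constant preserves convergence behavior:

```python
def effective_batch(batch_size, micro_batch_num):
    """Effective global batch size under AutoMicroBatch."""
    return batch_size * micro_batch_num

# batch_size=1024 with micro_batch_num=2 behaves like plain batch_size=2048.
assert effective_batch(1024, 2) == 2048

# To keep a previous batch_size=512 run's convergence with micro_batch_num=4,
# shrink the per-step batch to 512 // 4 == 128.
assert effective_batch(128, 4) == 512
```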

inter_threads/intra_threads = 1/1

  • serving/processor/serving/model_config.cc
    (*config)->inter_threads = schedule_threads / (8/1); // 2

    (*config)->intra_threads = schedule_threads / (8/1); // 2

part1 🌈 w0.6 sess_config.*

https://deeprec.readthedocs.io/zh/latest/Smart-Stage.html

  • sess_config.graph_options.optimizer_options.do_smart_stage = True

https://deeprec.readthedocs.io/zh/latest/Auto-Fusion.html

  • sess_config.graph_options.optimizer_options.do_op_fusion = True

https://deeprec.readthedocs.io/zh/latest/Async-Embedding-Stage.html

  • sess_config.graph_options.optimizer_options.do_async_embedding = True
  • sess_config.graph_options.optimizer_options.async_embedding_threads_num = 4
  • sess_config.graph_options.optimizer_options.async_embedding_capacity = 4

https://deeprec.readthedocs.io/zh/latest/Executor-Optimization.html

  • sess_config.executor_policy = tf.ExecutorPolicy.USE_INLINE_EXECUTOR

https://deeprec.readthedocs.io/zh/latest/XLA.html

  • sess_config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

part1 🌈 w0.3 storage_size=[1024*1024*1024, 10*1024*1024*1024]

Below is the interface definition for EmbeddingVariable multi-tier storage:

@tf_export(v1=["StorageOption"])
class StorageOption(object):
  def __init__(self,
               storage_type=None,
               storage_path=None,
               storage_size=[1024*1024*1024]):
    self.storage_type = storage_type
    self.storage_path = storage_path
    self.storage_size = storage_size

Parameter descriptions:

storage_type: the storage type to use; for example, DRAM_SSD uses DRAM and SSD as the embedding storage. The supported storage types are listed in Section 4.
storage_path: when SSD storage is used, this must be set to the directory in which embedding data is saved.
storage_size: the storage capacity, in bytes, that each tier may use. For example, to use 1 GB of DRAM and 10 GB of PMem with DRAM+PMem, configure [1024*1024*1024, 10*1024*1024*1024]. The default is 1 GB per tier; the current implementation cannot limit SSD usage.
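The byte values in that 1 GB DRAM / 10 GB PMem example work out as follows (a quick arithmetic check):

```python
GB = 1024 * 1024 * 1024  # 1 GiB in bytes

# 1 GB DRAM tier, 10 GB PMem tier, as in the storage_size example above.
storage_size = [1 * GB, 10 * GB]

assert storage_size == [1073741824, 10737418240]
```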

inter_threads/intra_threads = 4/8

  • serving/processor/serving/model_config.cc
    (*config)->inter_threads = schedule_threads / (8/4); // 2
    (*config)->intra_threads = schedule_threads / (8/8); // 2

Note

sess_config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.