
eat_tensorflow2_in_30_days's Introduction

How to eat TensorFlow2 in 30 days?🔥🔥

Click here for Chinese Version(中文版)

《10天吃掉那只pyspark》

《20天吃掉那只Pytorch》

《30天吃掉那只TensorFlow2》

Fast Track (极速通道)

1. TensorFlow2 🍎 or Pytorch🔥

Conclusion first:

For engineers, TensorFlow2 should be the priority.

For students and researchers, Pytorch should be the first choice.

If time permits, the best option is to master both.

Reasons:

    1. Model deployment matters most in industry. At present, most Internet companies in China only support online deployment of TensorFlow models, not Pytorch. Moreover, industry values high availability: in most cases, mature and well-validated architectures are used, with little need for tweaking.
    2. For researchers, fast iteration and publication matter most, since they need to try many new model architectures. Pytorch is easier to use and debug than TensorFlow2, and it has dominated academia since 2019, so far more cutting-edge results are available in it.
    3. Overall, TensorFlow2 and Pytorch are now quite similar in style, so mastering one makes learning the other easy. Mastering both gives you access to many more open-source models and lets you switch between the frameworks with ease.

2. Keras🍏 and tf.keras 🍎

Conclusion first:

Keras stopped being updated after version 2.3.0, so use tf.keras instead.

Keras is a high-level API specification for deep learning frameworks. It helps users define and train deep learning networks in a more intuitive way.

The Keras library installed by pip implements this high-level API on top of multiple backends: tensorflow, theano, CNTK, etc.

tf.keras is the high-level API implemented inside TensorFlow only, built on top of TensorFlow's low-level APIs.

Most, but not all, of the functions in tf.keras are identical to those in the multi-backend Keras library; tf.keras is more tightly integrated with TensorFlow.

With Keras development absorbed into Google's TensorFlow, Keras will not be updated after version 2.3.0, so users should use tf.keras from now on instead of the Keras package installed by pip.
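As a minimal sketch of what using tf.keras looks like in practice (the layer sizes here are arbitrary illustrations, not from the book):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# The same high-level Keras API, imported from inside TensorFlow.
model = models.Sequential([
    layers.Dense(8, activation="relu", input_shape=(4,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# 4*8+8 weights in the first layer, 8+1 in the second.
print(model.count_params())  # 49
```

Code written against the standalone `keras` package usually only needs its imports changed to `tensorflow.keras` to migrate.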

3. What Should You Know Before Reading This Book 📖?

It is suggested that readers have fundamental knowledge of machine/deep learning and experience building models with Keras or TensorFlow 1.0.

Those with no machine/deep learning experience are strongly encouraged to read "Deep Learning with Python" alongside this book.

"Deep Learning with Python" was written by François Chollet, the creator of Keras. It is based on Keras and assumes no machine learning background.

"Deep Learning with Python" is easy to follow because it teaches through many examples. It contains no mathematical equations and focuses on cultivating intuition for deep learning.

4. Writing Style 🍉 of This Book

This is an introductory reference book that is extremely friendly to its readers. The authors' baseline goal is that no reader gives up because of difficulty; "Don't make the readers think" is the highest aim.

This book is mainly based on the official documents of TensorFlow together with its functions.

However, the authors thoroughly restructured the material and heavily optimized the examples.

Unlike the official documents, which mix tutorials and guides without a systematic logic, this book redesigns the content according to difficulty, readers' search habits, and the architecture of TensorFlow itself. The result is a clear, progressive learning path with easy access to the corresponding examples.

In contrast to the verbose official example code, the authors keep the examples as short as possible so they are easy to read and implement. What's more, most of the code cells can be dropped into your own projects immediately.

If the difficulty of learning TensorFlow through the official documents is rated 9, learning it through this book should reduce that to about 3.

This difference in difficulty is illustrated in the following figure:

5. How to Learn With This Book ⏰

(1) Study Plan

The authors wrote this book in their spare time, notably during the unexpected two-month COVID-19 "holiday". Most readers should be able to master all of the content within 30 days.

The time required each day is between 30 minutes and 2 hours.

This book can also be used as a library of reference examples to consult when implementing machine learning projects with TensorFlow2.

Click the blue captions to enter the corresponding chapter.

Date Contents Difficulties Est. Time Update Status
  Chapter 1: Modeling Procedure of TensorFlow ⭐️ 0hour
Day 1 1-1 Example: Modeling Procedure for Structured Data ⭐️⭐️⭐️ 1hour
Day 2 1-2 Example: Modeling Procedure for Images ⭐️⭐️⭐️⭐️ 2hours
Day 3 1-3 Example: Modeling Procedure for Texts ⭐️⭐️⭐️⭐️⭐️ 2hours
Day 4 1-4 Example: Modeling Procedure for Temporal Sequences ⭐️⭐️⭐️⭐️⭐️ 2hours
  Chapter 2: Key Concepts of TensorFlow ⭐️ 0hour
Day 5 2-1 Data Structure of Tensor ⭐️⭐️⭐️⭐️ 1hour
Day 6 2-2 Three Types of Graph ⭐️⭐️⭐️⭐️⭐️ 2hours
Day 7 2-3 Automatic Differentiation ⭐️⭐️⭐️ 1hour
  Chapter 3: Hierarchy of TensorFlow ⭐️ 0hour
Day 8 3-1 Low-level API: Demonstration ⭐️⭐️⭐️⭐️ 1hour
Day 9 3-2 Mid-level API: Demonstration ⭐️⭐️⭐️ 1hour
Day 10 3-3 High-level API: Demonstration ⭐️⭐️⭐️ 1hour
  Chapter 4: Low-level API in TensorFlow ⭐️ 0hour
Day 11 4-1 Structural Operations of the Tensor ⭐️⭐️⭐️⭐️⭐️ 2hours
Day 12 4-2 Mathematical Operations of the Tensor ⭐️⭐️⭐️⭐️ 1hour
Day 13 4-3 Rules of Using the AutoGraph ⭐️⭐️⭐️ 0.5hour
Day 14 4-4 Mechanisms of the AutoGraph ⭐️⭐️⭐️⭐️⭐️ 2hours
Day 15 4-5 AutoGraph and tf.Module ⭐️⭐️⭐️⭐️ 1hour
  Chapter 5: Mid-level API in TensorFlow ⭐️ 0hour
Day 16 5-1 Dataset ⭐️⭐️⭐️⭐️⭐️ 2hours
Day 17 5-2 feature_column ⭐️⭐️⭐️⭐️ 1hour
Day 18 5-3 activation ⭐️⭐️⭐️ 0.5hour
Day 19 5-4 layers ⭐️⭐️⭐️ 1hour
Day 20 5-5 losses ⭐️⭐️⭐️ 1hour
Day 21 5-6 metrics ⭐️⭐️⭐️ 1hour
Day 22 5-7 optimizers ⭐️⭐️⭐️ 0.5hour
Day 23 5-8 callbacks ⭐️⭐️⭐️⭐️ 1hour
  Chapter 6: High-level API in TensorFlow ⭐️ 0hour
Day 24 6-1 Three Ways of Modeling ⭐️⭐️⭐️ 1hour
Day 25 6-2 Three Ways of Training ⭐️⭐️⭐️⭐️ 1hour
Day 26 6-3 Model Training Using Single GPU ⭐️⭐️ 0.5hour
Day 27 6-4 Model Training Using Multiple GPUs ⭐️⭐️ 0.5hour
Day 28 6-5 Model Training Using TPU ⭐️⭐️ 0.5hour
Day 29 6-6 Model Deploying Using tensorflow-serving ⭐️⭐️⭐️⭐️ 1hour
Day 30 6-7 Call Tensorflow Model Using spark-scala ⭐️⭐️⭐️⭐️⭐️ 2hours
  Epilogue: A Story Between a Foodie and Cuisine ⭐️ 0hour

(2) Software environment for studying

All the source code has been tested in jupyter. It is suggested to clone the repository to a local machine and run it in jupyter for an interactive learning experience.

The authors suggest installing jupytext, which converts markdown files into ipynb so the markdown files can be opened in jupyter directly.

#For readers in mainland China, cloning from gitee is faster
#!git clone https://gitee.com/Python_Ai_Road/eat_tensorflow2_in_30_days

#It is suggested to install jupytext so that the markdown files of each chapter can be run as ipynb
#!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -U jupytext

#It is also suggested to install the latest version of TensorFlow to test the demonstration code in this book
#!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -U tensorflow
import tensorflow as tf

#Note: all the code in this book was tested under TensorFlow 2.1
tf.print("tensorflow version:",tf.__version__)

a = tf.constant("hello")
b = tf.constant("tensorflow2")
c = tf.strings.join([a,b]," ")
tf.print(c)
tensorflow version: 2.1.0
hello tensorflow2

6. Contact and support the author 🎈🎈

If you find this book helpful and want to support the author, please give this repository a star ⭐️ and don't forget to share it with your friends 😊

Please leave comments in the WeChat official account "算法美食屋" (Machine Learning Cookhouse) if you want to discuss the content with the author, who will try their best to reply in the limited time available.


Links

📚 Online gitbook: https://lyhue1991.github.io/eat_tensorflow2_in_30_days

🚀 GitHub repository: https://github.com/lyhue1991/eat_tensorflow2_in_30_days

🐳 kesci column: https://www.kesci.com/home/column/5d8ef3c3037db3002d3aa3a0

The project also partners with the Heywhale (和鲸) community: the project can be forked from the kesci column above and the code run directly in cloud notebooks, avoiding painful environment setup.

You can also reply with the keyword 加群 in the WeChat official account "算法美食屋" to join the reader discussion group.

eat_tensorflow2_in_30_days's People

Contributors

lyhue1991, nbwuzhe, neilteng


eat_tensorflow2_in_30_days's Issues

Question about input_shape

In the 1-1 structured-data modeling example, why is input_shape=(15,) rather than x_train.shape, i.e. input_shape=(891,15)?
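A quick numpy-only sketch of the answer (the x_train shape below is the hypothetical one from the question): Keras's input_shape describes a single sample, so the batch dimension (891) is excluded and only the 15 features are passed.

```python
import numpy as np

# Hypothetical training matrix with the shape from the question:
# 891 samples, 15 features each.
x_train = np.zeros((891, 15))

# input_shape describes ONE sample, so the batch dimension is dropped.
input_shape = x_train.shape[1:]
print(input_shape)  # (15,)
```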

3-1 Low-level API demo: building the data pipeline iterator

In the data_iter(features, labels, batch_size=8) function of the 3-1 Low-level API demo,
yield tf.gather(X,indexs), tf.gather(Y,indexs)
should presumably be written as
yield tf.gather(features,indexs), tf.gather(labels,indexs)
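For reference, here is a numpy-only sketch of such a batching generator, assuming the signature quoted above (the book's version uses tf.gather, but the indexing logic is the same). The point of the suggested fix is that the function's own parameters are gathered, not outer variables X and Y:

```python
import numpy as np

def data_iter(features, labels, batch_size=8):
    """Yield shuffled (features, labels) mini-batches.

    Uses the function's own parameters throughout, rather than
    accidentally closing over outer variables.
    """
    num_examples = len(features)
    indices = np.arange(num_examples)
    np.random.shuffle(indices)
    for start in range(0, num_examples, batch_size):
        batch_idx = indices[start:start + batch_size]
        yield features[batch_idx], labels[batch_idx]

X = np.arange(20).reshape(10, 2)
Y = np.arange(10)
batches = list(data_iter(X, Y, batch_size=4))
# 10 samples with batch_size=4 -> 3 batches of sizes 4, 4, 2
```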

Dataset

The time-series dataset seems to be missing.

Calling a tensorflow2.0 model from spark-scala raises an error

A question: the native SavedModelBundle and Session classes do not implement the Serializable interface, so calling

val broads = sc.broadcast(bundle)

directly raises the exception

Serialization stack: - object not serializable (class: org.tensorflow.SavedModelBundle, value: org.tensorflow.SavedModelBundle@6a1ebcff)

Adding the Serializable interface to the source code myself would require changing a lot of code. How does the book handle this?

5-4 cannot run

After creating the Linear class, instantiating it for the first time with linear = Linear(units=8) raises an error. I compared my code with the original repeatedly and found no differences.

class Linear(layers.Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units
        
    def build(self, input_shape):
        self.w = self.add_weight('w', shape=(input_shape[-1], self.units),
                                initializer='random_normal',
                                trainable=True)
        self.b = self.add_weight('b', shape=(self.units,),
                                initializer='random_normal',
                                trainable=True)
        super(Linear, self).build(input_shape)
        
    @tf.function
    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
    
    def get_config(self):
        config = super(Linear, self).get_config()
        config.update({'units':self.units})
        return config
linear = Linear(units=8)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-efa0d9cd5402> in <module>
----> 1 linear = Linear(units=8)
      2 print(linear.built)
      3 linear.build(input_shape=(None, 16))
      4 print(linear.built)

TypeError: __call__() missing 1 required positional argument: 'inputs'

Thanks!

3-3 High-level API demo: the DNN model example is the same as the mid-level API one

Following the linear regression example in the high-level API chapter, the modeling procedure should be:
(1) build the model with models.Sequential()
(2) add layers with add()
(3) define the loss, metric, and optimizer
(4) configure the training parameters with compile()
(5) train the model with model.fit()

But the DNN model in the high-level API example is built with build(), and its overall structure is basically identical to the DNN example in the mid-level API chapter. It feels like the wrong example may have been given.

Building models by subclassing

Hello. When building a model by subclassing, I want to embed that model inside another subclassed model.
The first approach is to write the embedded model as a subclass of the Layer class and override the get_config method.
The second approach is to write it as a subclass of the Model class and override the compute_output_shape method.

Do these two approaches behave the same, or what is the difference between them?

3-2 Mid-level API: train_model raises Internal: No unary variant device copy function found for direction...

The example code in this chapter seems to fail on the latest tensorflow with:

Traceback (most recent call last):
  File "demo.py", line 40, in <module>
    train_model(model,epochs = 200)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 608, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 678, in _call
    return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal:  No unary variant device copy function found for direction: 1 and Variant type_index: tensorflow::data::(anonymous namespace)::DatasetVariantWrapper
         [[{{node while_input_5/_12}}]]
         [[Func/while/body/_1/while/cond/then/_78/input/_91/_52]]
  (1) Internal:  No unary variant device copy function found for direction: 1 and Variant type_index: tensorflow::data::(anonymous namespace)::DatasetVariantWrapper
         [[{{node while_input_5/_12}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_model_342]

Function call stack:
train_model -> train_model

I removed the visualization parts of the example code to make the problem easier to reproduce:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers,losses,metrics,optimizers

n = 400

X = tf.random.uniform([n,2],minval=-10,maxval=10)
w0 = tf.constant([[2.0],[-3.0]])
b0 = tf.constant([[3.0]])
Y = X@w0 + b0 + tf.random.normal([n,1],mean = 0.0,stddev= 2.0)

ds = tf.data.Dataset.from_tensor_slices((X,Y)) \
     .shuffle(buffer_size = 100).batch(10) \
     .prefetch(tf.data.experimental.AUTOTUNE)

model = layers.Dense(units = 1)
model.build(input_shape = (2,))
model.loss_func = losses.mean_squared_error
model.optimizer = optimizers.SGD(learning_rate=0.001)

@tf.function
def train_step(model, features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features)
        loss = model.loss_func(tf.reshape(labels,[-1]), tf.reshape(predictions,[-1]))
    grads = tape.gradient(loss,model.variables)
    model.optimizer.apply_gradients(zip(grads,model.variables))
    return loss

@tf.function
def train_model(model,epochs):
    for epoch in tf.range(1,epochs+1):
        loss = tf.constant(0.0)
        for features, labels in ds:
            loss = train_step(model,features,labels)
        if epoch%50==0:
            tf.print("epoch =",epoch,"loss = ",loss)
            tf.print("w =",model.variables[0])
            tf.print("b =",model.variables[1])
train_model(model,epochs = 200)

The problem seems to be inside the train_model function. If the @tf.function decorator on train_model is removed, there is no problem. Could the reason be that a tf.data.Dataset cannot be iterated inside a tf.function?

I am using a tensorflow nightly build. Thanks.

1-2 Modeling procedure for images

#Use num_parallel_calls for parallel preprocessing and prefetch to improve performance
ds_train = tf.data.Dataset.list_files("./data/cifar2/train/*/*.jpg") \
    .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
    .shuffle(buffer_size = 1000).batch(BATCH_SIZE) \
    .prefetch(tf.data.experimental.AUTOTUNE)

ds_test = tf.data.Dataset.list_files("./data/cifar2/test/*/*.jpg") \
    .map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE) \
    .batch(BATCH_SIZE) \
    .prefetch(tf.data.experimental.AUTOTUNE)

After this processing, every label I print is identical. Where could the problem be?

3-2 errors when running on GPU; reportedly it runs fine on CPU

The error is as follows:
(0) Internal: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper [[{{node while_input_4/_12}}]] (1) Internal: No unary variant device copy function found for direction: 1 and Variant type_index: class tensorflow::data::`anonymous namespace'::DatasetVariantWrapper [[{{node while_input_4/_12}}]] [[Func/while/body/_1/input/_60/_20]]

Question about the 1-1 structured-data modeling procedure

In chapter 1-1, the author uses y_test = dftest_raw['Survived'].values, but dftest_raw has no Survived column, so this line raises an error.

Is the test data the author used the official test data, or a portion split off from the train data? Thanks!

A small error I encountered in chapter 6-3

6-3 contains the line gpus = tf.config.list_physical_devices("GPU").
Running it gives the error: module 'tensorflow_core._api.v2.config' has no attribute 'list_physical_devices'.
I changed it to tf.config.experimental.list_physical_devices("GPU") and that solved it. Others may run into this as well, so it might be worth updating.
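A hedged sketch of a version-tolerant lookup for this attribute (the SimpleNamespace objects below are stand-ins for tf.config, so the pattern can be shown without depending on a particular TensorFlow build):

```python
from types import SimpleNamespace

def find_list_physical_devices(config):
    """Return list_physical_devices from `config`, falling back to
    config.experimental.list_physical_devices on older TF 2.x builds."""
    fn = getattr(config, "list_physical_devices", None)
    if fn is None:
        fn = config.experimental.list_physical_devices
    return fn

# Dummy stand-in for tf.config on an older build that only has the
# experimental variant.
old_config = SimpleNamespace(
    experimental=SimpleNamespace(list_physical_devices=lambda kind: [])
)
gpus = find_list_physical_devices(old_config)("GPU")
print(gpus)  # []
```

With a real TensorFlow installed, `find_list_physical_devices(tf.config)("GPU")` would pick whichever attribute the installed version provides.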

Custom metric

@tf.function
def update_state(self, y_true, y_pred):
    y_true = tf.cast(tf.reshape(y_true, (-1,)), tf.bool)
    y_pred = tf.cast(100 * tf.reshape(y_pred, (-1,)), tf.int32)

    for i in tf.range(0, tf.shape(y_true)[0]):
        if y_true[i]:
            self.true_positives[y_pred[i]].assign(
                self.true_positives[y_pred[i]] + 1.0)
        else:
            self.false_positives[y_pred[i]].assign(
                self.false_positives[y_pred[i]] + 1.0)
    return (self.true_positives, self.false_positives)

In 2.1, the signature should also take sample_weight=None, and only a single value can be returned.

Windows log path problem

When using an absolute path on Windows, it needs to be written like

logdir = 'C:\\xx\\autograph\\%s' % stamp
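A small plain-Python illustration (the directory name is the hypothetical one from the issue): in an ordinary string literal, sequences such as \a are escape characters, so single backslashes corrupt the path; a raw string or doubled backslashes avoid this.

```python
stamp = "20200101-000000"

# A raw string keeps every backslash literally:
logdir = r'C:\xx\autograph\%s' % stamp

# Doubling the backslashes produces exactly the same string:
logdir2 = 'C:\\xx\\autograph\\%s' % stamp

print(logdir == logdir2)  # True
```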

The loss function in 5-5 is wrong

def focal_loss(gamma=2., alpha=.25):
    
    def focal_loss_fixed(y_true, y_pred):
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        loss = -tf.sum(alpha * tf.pow(1. - pt_1, gamma) * tf.log(1e-07+pt_1)) \
           -tf.sum((1-alpha) * tf.pow( pt_0, gamma) * tf.log(1. - pt_0 + 1e-07))
        return loss
    return focal_loss_fixed

This raises AttributeError: module 'tensorflow' has no attribute 'sum'. It presumably should be corrected to:

def focal_loss(gamma=2., alpha=.25):
    
    def focal_loss_fixed(y_true, y_pred):
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        loss = -tf.reduce_sum(alpha * tf.pow(1. - pt_1, gamma) * tf.math.log(1e-07+pt_1)) \
           -tf.reduce_sum((1-alpha) * tf.pow( pt_0, gamma) * tf.math.log(1. - pt_0 + 1e-07))
        return loss
    return focal_loss_fixed

Should estimator be added to the book?

estimator is an important api carried over from tf 1 to tf 2; in the hierarchy it belongs with the high-level apis and can define a model directly.
Would you consider adding this part to the book?
What was the reason for leaving it out?

Some Suggestions for '1-3' Maybe

Excuse me. QAQ But I hope to get suggestions!


Where the issue happens

Chapter 1-3, Modeling Procedure for Texts

# 构建词典
def clean_text(text):
    ...
    tf.strings.regex_replace(stripped_html,
         '[%s]' % re.escape(string.punctuation),'')

Issue Detail

In re.escape(string.punctuation), '', should the '' be ' '?
Otherwise, we get "himbut" from "him,but".
Additionally, I think we should remove "'" from string.punctuation.
Otherwise, we get "it s a good" from "it's a good".

My Edition for These Codes

def clean_text(text):
    # A string include all punctuations which has been escaped by re.
    # Use '\\' for escape of metacharacters.
    escaped_punctuation = re.escape(string.punctuation.replace("'", ""))
    lowercase = tf.strings.lower(text)
    stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
    cleaned_punctuation = tf.strings.regex_replace(stripped_html,
                                                   '[%s]' % escaped_punctuation, ' ')

    return cleaned_punctuation
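The merging effect described above can be reproduced with just the standard library's re and string modules:

```python
import re
import string

text = "him,but it's fine"

# Replacing punctuation with '' glues neighbouring words together:
merged = re.sub('[%s]' % re.escape(string.punctuation), '', text)
print(merged)   # "himbut its fine"

# Replacing with ' ' keeps the word boundary:
spaced = re.sub('[%s]' % re.escape(string.punctuation), ' ', text)
print(spaced)   # "him but it s fine"

# Dropping the apostrophe from the character class preserves "it's":
punct = string.punctuation.replace("'", "")
kept = re.sub('[%s]' % re.escape(punct), ' ', text)
print(kept)     # "him but it's fine"
```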

Loading problem for custom models built by subclassing Model

Saving the model

model.save('./data/tf_model_savedmodel', save_format="tf")

In my tests, the model can only be saved this way; it cannot be saved in keras's h5 format.

Loading the model

model_loaded = tf.keras.models.load_model('./data/tf_model_savedmodel')

error

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * Tensor("x:0", shape=(None, 200), dtype=int32)
    * Tensor("training:0", shape=(), dtype=bool)
  Keyword arguments: {}

Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='input_1')
    * True
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='x')
    * False
  Keyword arguments: {}

Option 3:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='x')
    * True
  Keyword arguments: {}

Option 4:
  Positional arguments (2 total):
    * TensorSpec(shape=(None, 200), dtype=tf.int32, name='input_1')
    * False
  Keyword arguments: {}

Loading successfully

load_model = tf.saved_model.load('./data/saved_model')

However, a model loaded this way is not compiled, so the model.xxx methods cannot be used directly.

Current workaround

Deploy the saved_model-format model via tensorflow serving in docker.

3-3 High-level API demo: a small error in splitting the dataset

If you start running the code from section "II. DNN binary classification model" and reach this point:

ds_train = tf.data.Dataset.from_tensor_slices((X[0:n*3//4,:],Y[0:n*3//4,:])) \
     .shuffle(buffer_size = 1000).batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()

ds_valid = tf.data.Dataset.from_tensor_slices((X[n*3//4:,:],Y[n*3//4:,:])) \
     .batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()

you get NameError: name 'n' is not defined. I believe the intent is that the training set should be 75% of the data and the validation set 25%, so I suggest changing it to

n = n_positive+n_negative
ds_train = tf.data.Dataset.from_tensor_slices((X[0:n*3//4,:],Y[0:n*3//4,:])) \
     .shuffle(buffer_size = 1000).batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()

ds_valid = tf.data.Dataset.from_tensor_slices((X[n*3//4:,:],Y[n*3//4:,:])) \
     .batch(20) \
     .prefetch(tf.data.experimental.AUTOTUNE) \
     .cache()


Meaning of "adds normal noise" in the @ comment?

In the data-preparation step of the 3-1 Low-level API demo there is a comment:

@ stands for matrix multiplication; adds normal noise

Specifically, it is the last line of the first code snippet under "1. Prepare data" in "I. Linear regression model" of 3-1 (a screenshot was attached, though it may not display).

The tensorflow API docs for matmul, however, say:
Since python >= 3.5 the @ operator is supported (see PEP 465). In TensorFlow, it simply calls the tf.matmul() function, so the following lines are equivalent:

d = a @ b @ [[10], [11]]
d = tf.matmul(tf.matmul(a, b), [[10], [11]])

I searched online and found nothing about "matrix multiplication adding normal noise". What does adding normal noise mean here, and where is it used? Is it related to the random in X = tf.random.uniform([n,2],minval=-10,maxval=10), or something else? Thanks!
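A numpy-only sketch of the line the comment annotates (shapes follow the 3-1 example): @ is ordinary matrix multiplication per PEP 465, and the "normal noise" is simply the separate additive term in the same expression, not a property of @ itself:

```python
import numpy as np

n = 400
rng = np.random.default_rng(0)

X = rng.uniform(-10, 10, size=(n, 2))
w0 = np.array([[2.0], [-3.0]])
b0 = np.array([[3.0]])

# '@' is plain matrix multiplication, nothing more:
assert np.allclose(X @ w0, np.matmul(X, w0))

# The "normal noise" is the extra term added to the product, which
# makes Y a noisy linear function of X for the regression example.
noise = rng.normal(0.0, 2.0, size=(n, 1))
Y = X @ w0 + b0 + noise
print(Y.shape)  # (400, 1)
```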

Suggest a virtual environment.

I suggest a virtual environment for this tutorial.

For one thing, it decouples changes in new tf releases from the development environment we use. It saves the authors' effort in answering tf-version-related problems and delegates them back to the tf developers.
It also saves readers the effort of figuring out missing packages. E.g., when I ran 5-1, it told me I was missing the package pillow, which is not explicitly imported.

Best
Neil

Would the author consider writing about the tft data pipeline?

tf2 adds support for the tfx extensions, of which I think tft is the most likely to be used in engineering work. Would the author consider adding a tutorial for this part?

A typical scenario: its integration with apache beam lets us unify the offline-training and online-serving data pipelines, so our model only has to consume the data the pipeline provides.

Day 1: error when training the model

history = model.fit(x_train,y_train,batch_size= 64,epochs= 30, validation_split=0.2) raises:

validation_split is only supported for Tensors or NumPy arrays, found following types in the input: [<class 'pandas.core.frame.DataFrame'>]
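A hedged sketch of the usual workaround (the DataFrame below is a synthetic stand-in for the preprocessed features in 1-1): convert the pandas objects to NumPy arrays before calling model.fit with validation_split.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the preprocessed features/labels in 1-1.
x_train = pd.DataFrame({"f%d" % i: np.zeros(8) for i in range(3)})
y_train = pd.Series(np.zeros(8))

# validation_split only accepts tensors or NumPy arrays, so convert
# the DataFrame/Series first, then pass x_arr/y_arr to model.fit(...).
x_arr = x_train.values   # equivalently: x_train.to_numpy()
y_arr = y_train.values

print(type(x_arr).__name__, x_arr.shape)  # ndarray (8, 3)
```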

1-3 TensorFlow errors at runtime

1-1 runs fine, but 1-3 errors out.

Software versions:
Ubuntu18.04
CUDA: 10.0
CuDNN: 7.6.5
TensorFlow-gpu: 2.1.0

Error message:

2020-04-29 10:23:08.233741: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-04-29 10:23:08.233797: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-04-29 10:23:08.233803: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-04-29 10:23:08.758262: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-29 10:23:08.764938: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.765330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-29 10:23:08.765483: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-29 10:23:08.766542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-29 10:23:08.767329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-29 10:23:08.767509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-29 10:23:08.768612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-29 10:23:08.769447: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-29 10:23:08.771963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-29 10:23:08.772061: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.772396: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.772661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-29 10:23:08.772903: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-29 10:23:08.797181: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-04-29 10:23:08.797494: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c37cbd2a0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-29 10:23:08.797521: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-29 10:23:08.870114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.870464: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x555c3850a540 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-29 10:23:08.870477: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2020-04-29 10:23:08.870591: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.870863: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-04-29 10:23:08.870889: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-29 10:23:08.870899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-29 10:23:08.870908: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-29 10:23:08.870916: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-29 10:23:08.870925: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-29 10:23:08.870933: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-29 10:23:08.870941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-29 10:23:08.870976: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.871252: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.871502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-29 10:23:08.871523: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-29 10:23:08.872180: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-29 10:23:08.872188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-04-29 10:23:08.872192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N 
2020-04-29 10:23:08.872253: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.872536: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-29 10:23:08.872803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6900 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
[b'the', b'and', b'a', b'of', b'to', b'is', b'in', b'it', b'i', b'this', b'that', b'was', b'as', b'for', b'with', b'movie', b'but', b'film', b'on', b'not', b'you', b'his', b'are', b'have', b'be', b'he', b'one', b'its', b'at', b'all', b'by', b'an', b'they', b'from', b'who', b'so', b'like', b'her', b'just', b'or', b'about', b'has', b'if', b'out', b'some', b'there', b'what', b'good', b'more', b'when', b'very', b'she', b'even', b'my', b'no', b'would', b'up', b'time', b'only', b'which', b'story', b'really', b'their', b'were', b'had', b'see', b'can', b'me', b'than', b'we', b'much', b'well', b'get', b'been', b'will', b'into', b'people', b'also', b'other', b'do', b'bad', b'because', b'great', b'first', b'how', b'him', b'most', b'dont', b'made', b'then', b'them', b'films', b'movies', b'way', b'make', b'could', b'too', b'any', b'after', b'characters']
Model: "cnn_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        multiple                  70000     
_________________________________________________________________
conv_1 (Conv1D)              multiple                  576       
_________________________________________________________________
maxpool_1 (MaxPooling1D)     multiple                  0         
_________________________________________________________________
conv_2 (Conv1D)              multiple                  4224      
_________________________________________________________________
maxpool_2 (MaxPooling1D)     multiple                  0         
_________________________________________________________________
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  6145      
=================================================================
Total params: 80,945
Trainable params: 80,945
Non-trainable params: 0
_________________________________________________________________
2020-04-29 10:23:12.802249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-29 10:23:12.968219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-29 10:23:13.365572: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-29 10:23:13.378753: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-29 10:23:13.378835: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node cnn_model/conv_1/conv1d}}]]
	 [[Nadam/ReadVariableOp_3/_20]]
2020-04-29 10:23:13.378885: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[{{node cnn_model/conv_1/conv1d}}]]
Traceback (most recent call last):
  File "/home/huxiaoyang/PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py", line 170, in <module>
    main()
  File "/home/huxiaoyang/PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py", line 166, in main
    train_model(model, ds_train, ds_test, epochs=6)
  File "/home/huxiaoyang/PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py", line 148, in train_model
    train_step(model, features, labels)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/home/huxiaoyang/miniconda3/envs/tf210/lib/python3.7/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node cnn_model/conv_1/conv1d (defined at /PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py:71) ]]
	 [[Nadam/ReadVariableOp_3/_20]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node cnn_model/conv_1/conv1d (defined at /PycharmProjects/eat_tensorflow2_in_30_days/1-3_text_data_modeling_process_example/example.py:71) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_step_4773]

Function call stack:
train_step -> train_step


Process finished with exit code 1
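The CUDNN_STATUS_INTERNAL_ERROR above is commonly reported on RTX cards when TensorFlow tries to grab nearly all GPU memory at startup. A frequently suggested workaround (not part of the original example) is to enable memory growth before any GPU op runs; a minimal sketch assuming TensorFlow 2.x:

```python
import tensorflow as tf

def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Must be called before any GPU op executes; returns the list of GPUs
    configured (empty on a CPU-only machine).
    """
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus

print(enable_memory_growth())
```

Calling this at the top of example.py, before building the model, is the usual placement.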

1-2: load_image function assigns labels incorrectly

![image](https://user-images.githubusercontent.com/55381998/79407314-c5b0ac00-7fcb-11ea-8546-54e90495fbf1.png)
All labels come out as 0, so the subsequent training accuracy is always 1.
Train for 100 steps, validate for 20 steps
Epoch 1/10
100/100 [==============================] - 16s 162ms/step - loss: 0.0116 - accuracy: 0.9904 - val_loss: 1.2626e-09 - val_accuracy: 1.0000
Epoch 2/10
100/100 [==============================] - 11s 106ms/step - loss: 5.7853e-09 - accuracy: 1.0000 - val_loss: 1.2602e-09 - val_accuracy: 1.0000
Epoch 3/10
100/100 [==============================] - 11s 105ms/step - loss: 5.7422e-09 - accuracy: 1.0000 - val_loss: 1.2595e-09 - val_accuracy: 1.0000
...
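If every label comes out 0, the path regex is likely failing to match (for example because of Windows path separators). A hedged sketch of a more robust labeling helper, assuming the two classes of the 1-2 example, automobile (label 1) and airplane (label 0); the function name `path_to_label` is illustrative:

```python
import tensorflow as tf

def path_to_label(img_path):
    # Match "automobile" anywhere in the path so both "/" and "\\"
    # separators work, then cast the boolean match to an integer label.
    is_auto = tf.strings.regex_full_match(img_path, ".*automobile.*")
    return tf.cast(is_auto, tf.int8)

print(path_to_label(tf.constant("./data/cifar2/train/automobile/0.jpg")))
print(path_to_label(tf.constant(".\\data\\cifar2\\train\\airplane\\0.jpg")))
```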

Why is the Valid Loss rising in 1-3?

In the original document:
Epoch=1,Loss:0.442317516,Accuracy:0.7695,Valid Loss:0.323672801,Valid Accuracy:0.8614
Epoch=2,Loss:0.245737702,Accuracy:0.90215,Valid Loss:0.356488883,Valid Accuracy:0.8554
Epoch=3,Loss:0.17360799,Accuracy:0.93455,Valid Loss:0.361132562,Valid Accuracy:0.8674
Epoch=4,Loss:0.113476314,Accuracy:0.95975,Valid Loss:0.483677238,Valid Accuracy:0.856
Epoch=5,Loss:0.0698405355,Accuracy:0.9768,Valid Loss:0.607856631,Valid Accuracy:0.857
Epoch=6,Loss:0.0366807655,Accuracy:0.98825,Valid Loss:0.745884955,Valid Accuracy:0.854

After my reproduction:
Epoch=1,Loss:0.679053724,Accuracy:0.55235,Valid Loss:0.572207093,Valid Accuracy:0.717
Epoch=2,Loss:0.467248648,Accuracy:0.7762,Valid Loss:0.491477,Valid Accuracy:0.7588
Epoch=3,Loss:0.349681437,Accuracy:0.8475,Valid Loss:0.514342368,Valid Accuracy:0.7628
Epoch=4,Loss:0.278649092,Accuracy:0.8863,Valid Loss:0.564446032,Valid Accuracy:0.763
Epoch=5,Loss:0.2197005,Accuracy:0.9159,Valid Loss:0.643948495,Valid Accuracy:0.7548
Epoch=6,Loss:0.163983703,Accuracy:0.94135,Valid Loss:0.770707726,Valid Accuracy:0.7524

As you can see, the Valid Loss rises steadily while the training accuracy keeps improving, which is the classic signature of overfitting.
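A common mitigation (not part of the original example) is to stop training once the validation loss stops improving; a sketch using a standard tf.keras callback:

```python
import tensorflow as tf

# Stop once Valid Loss has not improved for 2 consecutive epochs,
# and roll back to the weights from the best epoch.
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True)

# Hypothetical usage with the compiled model from 1-3:
# model.fit(ds_train, validation_data=ds_test, epochs=6,
#           callbacks=[early_stopping])
print(early_stopping.monitor, early_stopping.patience)
```

Note that the custom train_step loop in 1-3 does not use callbacks; this applies when training via model.fit, otherwise the same check can be coded by hand in the epoch loop.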

1-2: problem with num_parallel_calls during preprocessing

Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
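This error appears when a Python `if` tests a tensor inside a function traced by dataset.map(..., num_parallel_calls=...); in graph mode a tensor cannot be evaluated as a Python bool. The fix is tf.cond (or a vectorized op); a minimal sketch over scalar tensors, with an illustrative `preprocess` function:

```python
import tensorflow as tf

def preprocess(x):
    # Wrong inside a traced function: `if x > 0:` (tensor used as Python bool).
    # tf.cond keeps the branch inside the graph instead:
    return tf.cond(x > 0, lambda: x * 2, lambda: x * 0)

ds = tf.data.Dataset.range(-2, 3).map(
    preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)
print(list(ds.as_numpy_iterator()))  # negatives zeroed, positives doubled
```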

File path representation issue

Since Windows and Unix-like systems represent file paths differently, the example code must account for this to run successfully on both. Taking "1-2, Example: Modeling Procedure for Image Data" as an example, tf.strings.regex_full_match(img_path, "./automobile/.") needs to be changed to tf.strings.regex_full_match(img_path, ".automobile."), and logdir = "./data/keras_model/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") should also be built with os.path.join() instead of hard-coded separators.
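A portable way to build the log directory mentioned above, using only the standard library:

```python
import datetime
import os

# os.path.join picks the right separator for the current OS,
# so the same code runs on Windows and on Unix-like systems.
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
logdir = os.path.join("data", "keras_model", stamp)
print(logdir)
```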

TF Serving returns errors during prediction

TF Serving returns the following errors during prediction; please help take a look:

{ "error": "Malformed request: POST /v1/models/linear_model" }{ "error": "In[0] is not a matrix. Instead it has shape [3]\n\t [[{{node model/outputs/BiasAdd}}]]" }%
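The first error suggests the request URL is missing the :predict verb (TF Serving's REST API expects POST /v1/models/linear_model:predict). The second error, "In[0] is not a matrix. Instead it has shape [3]", means a single 1-D vector was sent where the model expects a batch (a matrix); wrapping the features in an outer list adds the batch dimension. A hedged sketch of building the request body with the standard library (the model name and feature values are placeholders):

```python
import json

# A single sample must still be wrapped in a batch: shape [1, 3], not [3].
payload = {"instances": [[1.0, 2.0, 3.0]]}
body = json.dumps(payload)
print(body)

# Hypothetical request, e.g. with curl:
# curl -d '{"instances": [[1.0, 2.0, 3.0]]}' \
#      -X POST http://localhost:8501/v1/models/linear_model:predict
```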

Eat Day 1: trying to build a custom model by subclassing the Model base class

class Model(models.Model):
    def __init__(self):
        super(Model, self).__init__()

    def build(self, input_shape):
        self.dense = tf.keras.layers.Dense(15, 20)
        self.dense = tf.keras.layers.Dense(20, 10)
        self.dense = tf.keras.layers.Dense(10, 1)
        super(Model, self).build(input_shape)

    def call(self, x):
        x = self.dense(x)
        x = tf.nn.relu(x)
        x = self.dense(x)
        x = tf.nn.relu(x)
        x = self.dense(x)
        x = tf.nn.sigmoid(x)
        return (x)

model = Model()
print(model)
model.build(input_shape=(15,))
model.summary()

Error: TypeError: Could not interpret activation function identifier: 20
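The error comes from Dense's signature: the first positional argument is units and the second is activation, so Dense(15, 20) tries to use 20 as an activation. Keras layers infer their input size when built, so only the output units are needed. Also, all three layers are assigned to the same attribute self.dense, leaving only the last one. A corrected sketch (renamed MyModel to avoid shadowing models.Model; sizes and activations follow the snippet above):

```python
import tensorflow as tf
from tensorflow.keras import models

class MyModel(models.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        # Each layer needs its own attribute; Dense takes only the output
        # units, the input size is inferred when the model is built.
        self.dense1 = tf.keras.layers.Dense(20, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10, activation="relu")
        self.dense3 = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, x):
        x = self.dense1(x)
        x = self.dense2(x)
        return self.dense3(x)

model = MyModel()
model.build(input_shape=(None, 15))  # None is the batch dimension
model.summary()
```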

Why can we use the parameter "input_shape = (2,)" when it is not defined in __init__?

In 4-3, this is not a bug, just a question I don't understand: why can we do model.add(Linear(units = 1, input_shape = (2,))) when input_shape is not a parameter of the init method?

class Linear(layers.Layer):
    def __init__(self, units=32, **kwargs):
#         super(Linear, self).__init__(**kwargs)
        super().__init__(**kwargs)
        self.units = units
    
    # The trainable parameters are defined in the build method.
    # Since input_shape is not needed outside the build method,
    # there is no need to store it in __init__.
    def build(self, input_shape): 
        self.w = self.add_weight("w",shape=(input_shape[-1], self.units),
                                 initializer='random_normal',
                                 trainable=True) # Parameter named "w" is compulsory or an error will be thrown out
        self.b = self.add_weight("b",shape=(self.units,),
                                 initializer='random_normal',
                                 trainable=True)
        super().build(input_shape) # Identical to self.built = True

    # The logic of forward propagation is defined in call method, and is called by __call__ method
    @tf.function
    def call(self, inputs): 
        return tf.matmul(inputs, self.w) + self.b
    
    # Use customized get-config method to save the model as h5 format, specifically for the model composed through Functional API with customized Layer
    def get_config(self):  
        config = super().get_config()
        config.update({'units': self.units})
        return config

tf.keras.backend.clear_session()

model = models.Sequential()
# Note: input_shape here is consumed by the base Layer's __init__ via **kwargs; no need to include None for the sample (batch) dimension.
model.add(Linear(units = 1,input_shape = (2,)))  
print("model.input_shape: ",model.input_shape)
print("model.output_shape: ",model.output_shape)
model.summary()
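To answer the question: Linear.__init__ forwards **kwargs to tf.keras.layers.Layer.__init__, and the base Layer accepts input_shape there, recording it (with the batch dimension prepended) for later building. A minimal sketch illustrating the forwarding with a trivial layer, assuming TensorFlow 2.x (the class name Passthrough is illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

class Passthrough(layers.Layer):
    def __init__(self, **kwargs):
        # No input_shape parameter here: **kwargs carries it on to
        # layers.Layer.__init__, which records it for model building.
        super().__init__(**kwargs)

    def call(self, inputs):
        return inputs

model = models.Sequential()
model.add(Passthrough(input_shape=(2,)))  # accepted thanks to **kwargs
print(model.input_shape)  # the batch dimension is prepended automatically
```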
