harryzhangog / deep-rl-notes

A collection of comprehensive notes on Deep Reinforcement Learning, customized for UC Berkeley's CS 285 (prev. CS 294-112)

TeX 100.00%

deep-rl-notes's Issues

Typo in 2.3.1

Hi Harry,

Thanks for taking these notes! They are clear and very helpful. I found a typo in 2.3.1, which should be $\pi(a_t | o_1, ..., o_t)$ rather than $\pi(a_t | o+1, ..., o_t)$

It should be an easy fix on line 20 of https://github.com/harryzhangOG/Deep-RL-Notes/blob/master/imitation.tex

Thanks a lot!

Typo in Chapter 10

Hi Harry, great notes! I just found some small typos in Chapter 10:

1. The sigma (page 60)

$p(x|z) = \mathcal{N}(\mu_{nn}(z),\mu_{nn}(z))$

The correct form should be:

$p(x|z) = \mathcal{N}(\mu_{nn}(z),\sigma_{nn}(z))$

2. The theta display (page 60)

\[
theta\leftarrow \argmaxA_\theta\frac{1}{N}\sum_i\mathbb{E}_{z\sim p(z|x_i)}\log p_\theta(x_i)
\]
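
For reference, the two expressions in this report can be written out cleanly as follows. This is only a sketch in the usual latent-variable notation, not a copy of the notes: the `\arg\max` spelling and the joint $p_\theta(x_i, z)$ inside the expectation are assumptions (the text quoted above shows $p_\theta(x_i)$).

```latex
% Decoder: both the mean and the spread are neural-network outputs.
\[
p(x \mid z) = \mathcal{N}\big(\mu_{nn}(z),\, \sigma_{nn}(z)\big)
\]
% Expected log-likelihood update. The joint p_\theta(x_i, z) is an
% assumption; the quoted display shows p_\theta(x_i).
\[
\theta \leftarrow \arg\max_\theta \frac{1}{N} \sum_i
  \mathbb{E}_{z \sim p(z \mid x_i)} \big[ \log p_\theta(x_i, z) \big]
\]
```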

typo

Hi, thanks for your great notes.
I just found a typo in Chapter 10.1.1:
The KL-divergence is written as $D_{KL}(q_i(x_i))||(p(z|x_i)$.
Should it instead be $D_{KL}(q_i(z)||p(z|x_i))$?
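
One way to see why the first argument should be a distribution over $z$ is to expand the definition. The sketch below uses the standard definition of the KL divergence and the usual evidence decomposition; the symbol $\mathcal{L}_i$ for the lower bound and $\mathcal{H}$ for entropy are assumptions and may not match the notes.

```latex
% Both arguments of the KL divergence are distributions over z.
\[
D_{KL}\big(q_i(z) \,\|\, p(z \mid x_i)\big)
  = \mathbb{E}_{z \sim q_i(z)} \big[ \log q_i(z) - \log p(z \mid x_i) \big]
\]
% Evidence decomposition: lower bound plus the same KL term.
\[
\log p(x_i) = \mathcal{L}_i(p, q_i) + D_{KL}\big(q_i(z) \,\|\, p(z \mid x_i)\big),
\qquad
\mathcal{L}_i(p, q_i)
  = \mathbb{E}_{z \sim q_i(z)} \big[ \log p(x_i \mid z) + \log p(z) \big]
  + \mathcal{H}(q_i)
\]
```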

Some typos.

Hi,

First of all, thanks so much for the notes! They are extremely useful.

I just wanted to point out a few typos:

Last equation of Ch. 7 (step size): $J(\theta)$ is missing after the last gradient in the denominator.
Algorithm 14 (DDPG), page 34: line 4 should not have a $\max_a$ in front of $Q$ (since we are finding the best state-action function by training a NN to learn the best action $\mu_{\theta}$). Also, although this is not a typo per se, I think after the summation in line 6 there should be a $\frac{d \mu_{\theta}}{d \theta}$ rather than $\frac{d a}{d \theta}$, as the "a" in that derivative is the output of the NN above.
Algorithm 5 (Online AC), page 22: line 5 should not have a $\sum_i$, as we are updating one value at a time.

I hope this helps! Thanks!
Javier.
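
The DDPG and online actor-critic points above can be made concrete with the updates in their commonly stated form. This is a hedged sketch, not the notes' Algorithm 14 or Algorithm 5: the critic parameters $\phi$, the target-network parameters $\phi'$ and $\theta'$, the step size $\beta$, and the discount $\gamma$ are assumed names.

```latex
% DDPG critic target: the actor \mu replaces the \max_a over actions.
\[
y_j = r_j + \gamma\, Q_{\phi'}\big(s'_j,\, \mu_{\theta'}(s'_j)\big)
\]
% DDPG actor update: the chain rule passes through \mu_\theta, hence
% d\mu_\theta / d\theta rather than da / d\theta.
\[
\theta \leftarrow \theta + \beta \sum_j
  \frac{d Q_\phi}{d a}\big(s_j,\, \mu_\theta(s_j)\big)\,
  \frac{d \mu_\theta}{d \theta}(s_j)
\]
% Online actor-critic target: one transition (s, a, s', r), so no sum.
\[
y = r + \gamma\, \hat{V}_\phi(s')
\]
```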

3.7.1 typo

The first equation seems to have its first parenthesis in the wrong place: https://youtu.be/VgdSubQN35g?list=PL_iWQOsE6TfU9DwANRsUZf0YUUJS_ySr0&t=152
