xiangyunhuang / data-analysis-in-action
:book: Data Analysis in Action Using R (R 语言数据分析实战, work in progress)
Home Page: https://bookdown.org/xiangyun/data-analysis-in-action/
Quarto 1.4 is about to be released, so I tried the pre-release build and found that the book no longer compiles. The error is:
Error running filter /opt/quarto/quarto-1.4.369/share/filters/main.lua:
/opt/quarto/quarto-1.4.369/share/filters/main.lua:4305: attempt to index a nil value (field 'content')
stack traceback:
/opt/quarto/quarto-1.4.369/share/filters/main.lua:226: in function </opt/quarto/quarto-1.4.369/share/filters/main.lua:216>
(...tail calls...)
[C]: in ?
[C]: in method 'walk'
/opt/quarto/quarto-1.4.369/share/filters/main.lua:148: in function </opt/quarto/quarto-1.4.369/share/filters/main.lua:138>
(...tail calls...)
/opt/quarto/quarto-1.4.369/share/filters/main.lua:725: in local 'callback'
/opt/quarto/quarto-1.4.369/share/filters/main.lua:738: in upvalue 'run_emulated_filter_chain'
/opt/quarto/quarto-1.4.369/share/filters/main.lua:773: in function </opt/quarto/quarto-1.4.369/share/filters/main.lua:770>
stack traceback:
/opt/quarto/quarto-1.4.369/share/filters/main.lua:148: in function </opt/quarto/quarto-1.4.369/share/filters/main.lua:138>
(...tail calls...)
/opt/quarto/quarto-1.4.369/share/filters/main.lua:725: in local 'callback'
/opt/quarto/quarto-1.4.369/share/filters/main.lua:738: in upvalue 'run_emulated_filter_chain'
/opt/quarto/quarto-1.4.369/share/filters/main.lua:773: in function </opt/quarto/quarto-1.4.369/share/filters/main.lua:770>
Error: Process completed with exit code 1.
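Upgrading is evidently not straightforward.
Under the requirements of reproducibility and a code-first workflow, the book should compile into all four output formats on macOS, in a Rocker-based Ubuntu container, and on GitHub Actions (Ubuntu). Testing the book is therefore quite involved. Taking screenshots of interactive graphics is used as the example below.
On macOS, the following document produces the screenshot without any trouble: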
---
title: "Untitled"
output:
  bookdown::epub_book:
    toc: yes
    epub_version: "epub3"
date: "2022-11-26"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Interactive graphics
```{r}
#| label: echarts4r
#| fig-cap: "Interactive graphics"
#| eval: !expr knitr::is_html_output()
library(echarts4r)
mtcars |>
  e_charts(qsec) |>
  e_line(mpg)
```
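In a Docker container (Ubuntu, from the Rocker project), however, the following error is reported, and the suggested software cannot be installed. (chromium-browser is already installed on the system, so it is unclear why `snap install chromium` is still required.)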
Quitting from lines 18-26 (Untitled.Rmd)
Error in launch_chrome(path, args) : Failed to start chrome. Error:
Command '/usr/bin/chromium-browser' requires the chromium snap to be installed.
Please install it with:
snap install chromium
Calls: <Anonymous> ... initialize -> <Anonymous> -> initialize -> launch_chrome
Execution halted
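On GitHub Actions, on the other hand, the screenshot in the generated EPUB book is a black screen.
What is known so far:
The R Markdown Cookbook describes how to use webshot
https://bookdown.org/yihui/rmarkdown-cookbook/html-widgets.html
Since knitr 1.32, the webshot2 package is used for screenshots by default; users can ask for webshot instead
https://github.com/yihui/knitr/blob/master/NEWS.md#changes-in-knitr-version-132
The GitHub Actions environment (https://github.com/actions/runner-images) ships with PhantomJS 2.1.1 by default, and all major browsers are installed
To summarize, on Ubuntu:
webshot requires phantomjs to be installed on the system
apt-get install phantomjs
webshot2 requires chromium to be installed on the system
snap install chromium
To get something that runs in every environment, I tried webshot instead of webshot2, and noticed that the R Markdown Cookbook also uses webshot.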
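The summary above can be sketched as a small check that picks a screenshot package from the system tools found on the PATH. This is only an illustration: `choose_webshot_backend()` is a hypothetical helper, not part of knitr, webshot, or webshot2, and the list of browser binary names is an assumption. Finding a `chromium-browser` binary does not guarantee that it starts, which is exactly the snap problem reported above.

```r
# Hypothetical helper: choose a screenshot package from the tools on PATH.
# Prefers webshot (PhantomJS), following the conclusion reached in this issue.
choose_webshot_backend <- function(
    phantomjs = Sys.which("phantomjs"),
    chrome = Sys.which(c("chromium", "chromium-browser", "google-chrome"))) {
  has_phantomjs <- any(nzchar(phantomjs))  # non-empty path means "found"
  has_chrome <- any(nzchar(chrome))
  if (has_phantomjs) {
    "webshot"
  } else if (has_chrome) {
    "webshot2"
  } else {
    NA_character_  # nothing usable was found
  }
}

backend <- choose_webshot_backend()

# Assumption, based on my reading of the knitr 1.32 NEWS entry: the chunk
# option `webshot` selects the package. Left commented out so that the
# sketch runs without knitr installed.
# if (!is.na(backend)) knitr::opts_chunk$set(webshot = backend)
```

The arguments default to what `Sys.which()` reports, but can be passed explicitly to see how the choice changes, for example `choose_webshot_backend(phantomjs = "", chrome = "/usr/bin/chromium")`.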
Taking the screenshot with webshot in Rocker reports a new problem:
Error in (function (url = NULL, file = "webshot.png", vwidth = 992, vheight = 744, :
webshot.js returned failure value: -6
Calls: <Anonymous> ... html_screenshot -> in_dir -> do.call -> <Anonymous>
Execution halted
> quarto::quarto_publish_site(
    name = "data-analysis-in-action", render = "none",
    server = "bookdown.org",
    account = "xiangyun",
    title = "Data Analysis in Action"
  )
Preparing to deploy site...Error: HTTP 404
GET https://bookdown.org/applications?filter=account_id:103&filter=name:data-analysis-in-action&count=100&offset=0
404 page not found
> extSoftVersion()
zlib bzlib
"1.2.13" "1.0.8, 13-Jul-2019"
xz PCRE
"5.4.1" "10.42 2022-12-11"
ICU TRE
"72.1" "TRE 0.8.0 (BSD)"
iconv readline
"glibc 2.37" "8.2"
BLAS
"FlexiBLAS OPENBLAS-OPENMP"
> La_library()
[1] "FlexiBLAS OPENBLAS-OPENMP"
> La_version()
[1] "3.11.0"
> capabilities()
jpeg png tiff tcltk X11 aqua http/ftp
TRUE TRUE TRUE TRUE FALSE FALSE TRUE
sockets libxml fifo cledit iconv NLS Rprof
TRUE FALSE TRUE TRUE TRUE TRUE TRUE
profmem cairo ICU long.double libcurl
TRUE TRUE TRUE TRUE TRUE
> l10n_info()
$MBCS
[1] TRUE
$`UTF-8`
[1] TRUE
$`Latin-1`
[1] FALSE
$codeset
[1] "UTF-8"
> libcurlVersion()
[1] "8.0.1"
attr(,"ssl_version")
[1] "OpenSSL/3.0.8"
attr(,"libssh_version")
[1] "libssh/0.10.4/openssl/zlib"
attr(,"protocols")
[1] "dict" "file" "ftp" "ftps" "gopher" "gophers" "http"
[8] "https" "imap" "imaps" "ldap" "ldaps" "mqtt" "pop3"
[15] "pop3s" "rtsp" "scp" "sftp" "smb" "smbs" "smtp"
[22] "smtps" "telnet" "tftp"
> grSoftVersion()
cairo cairoFT pango
"1.17.8" "" "1.50.14"
libpng jpeg libtiff
"1.6.37" "6.2" "LIBTIFF, Version 4.4.0"
> pcre_config()
UTF-8 Unicode properties JIT stack
TRUE TRUE TRUE FALSE
>
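Prefer data on society, the economy, culture, and history that come with a concrete background, for example data published by national statistics bureaus and government organizations. Real data in concrete settings let readers learn about actual social and economic conditions while they learn the techniques. Tell as much of a story as the data support, and try to draw readers into doing their own exploration, analysis, and research. I believe most people are interested in understanding the society they live in.
Currently the local directory data-raw/ holds the raw data and the code that processes it (not uploaded to this repository), and the processed data go in the directory data/ (uploaded to the repository). Files are named by content plus year and saved in R's native RDS format; some examples are listed below. The advantages are small file size and ease of sharing. When the manuscript is finalized, a separate R package could be created to hold the data.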
china-age-sex-2020.rds china-sex-ratio-2020.rds svn-trunk-log-2022.rds
china-household-sex-2020.rds gapminder-2020.rds usa-mortality-2020.rds
china-raise-illiteracy-2020.rds rversion-2022.rds
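A minimal sketch of this naming convention, assuming a hypothetical helper `rds_path()` and a made-up toy data frame (neither is taken from the repository). It writes to a temporary directory rather than to the real data/ folder.

```r
# Hypothetical helper: build "<content>-<year>.rds" inside a data directory.
rds_path <- function(content, year, dir = "data") {
  file.path(dir, sprintf("%s-%d.rds", content, as.integer(year)))
}

# Use a temporary directory as a stand-in for data/
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# Made-up numbers, only to have something to save
toy <- data.frame(
  age = c("0-4", "5-9"),
  male = c(10L, 12L),
  female = c(9L, 11L)
)

path <- rds_path("toy-age-sex", 2020, dir = data_dir)
saveRDS(toy, path)          # write the processed data
restored <- readRDS(path)   # read it back, as a chapter would
identical(toy, restored)    # the round trip preserves the object
```

Because an RDS file stores a single R object together with its attributes, column types survive the round trip, which is one reason to prefer it over CSV for processed data.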
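For the array type, introduce functions such as apply / stack / sweep and the common operations. Array operations come up often in MCMC and in model coding, and an array is more efficient than a data.frame. The appendix already contains a chapter on matrix operations.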
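As a rough idea of what such a section might cover, here is a small base R sketch using `apply()` and `sweep()` on a three-dimensional array. The array imitates MCMC output (iterations x chains x parameters), but the numbers and dimension names are made up for illustration.

```r
# Toy "posterior draws": 4 iterations x 2 chains x 3 parameters
draws <- array(
  1:24,
  dim = c(4, 2, 3),
  dimnames = list(NULL, paste0("chain", 1:2), c("a", "b", "c"))
)

# Mean of each parameter, pooling iterations and chains (margin 3)
param_mean <- apply(draws, 3, mean)

# Mean of each parameter within each chain (margins 2 and 3)
chain_param_mean <- apply(draws, c(2, 3), mean)

# Center every parameter by subtracting its pooled mean;
# sweep() uses subtraction as its default operation
centered <- sweep(draws, 3, param_mean)

param_mean
chain_param_mean
apply(centered, 3, mean)  # zero for every parameter after centering
```

The margin argument is what makes these functions convenient for arrays: the same call pattern works whether the summary runs over one dimension or several.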
The manuscript and code in this repository are tested in the environment provided by GitHub Actions
- uses: r-lib/actions/setup-r@v2
  with:
    use-public-rspm: true
    r-version: '4.2.1'
- uses: r-lib/actions/setup-r-dependencies@v2
Upstream has been updated; keep using v2
Fix the Node 12 warning so that publishing does not fail
- name: Install Quarto
  uses: quarto-dev/quarto-actions/setup@v2
  with:
    # To install LaTeX to build PDF book
    tinytex: true
    version: 1.1.251
- run: |
    quarto --version
    quarto pandoc --version
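Use the latest stable release of Quarto
The book uses many R packages, and some of them are under active development and still changing. Problems are recorded here so that they can be followed up. The packages listed are those with a large effect on the book, and they are also influential in their own right, for example ggplot2, spatstat, gt, and echarts4r.
Tick an item off once its problem is solved.
Changes in ggplot2 3.4.0 cause some of the downstream R packages used in this book to emit warnings; version 3.4.0 is expected to be released before the end of October 2022.
The book uses the development versions of the following R packages; once new stable versions are released, the DESCRIPTION file can be updated.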
ggsurvfit(aml_ggsurvfit, linewidth = 1) +
add_confidence_interval() +
add_risktable()
Warning messages:
1: In ggplot2::geom_blank() :
All aesthetics have length 1, but the data has 22 rows.
ℹ Did you mean to use `annotate()`?
2: In ggplot2::geom_blank() :
All aesthetics have length 1, but the data has 22 rows.
ℹ Did you mean to use `annotate()`?
3: In ggplot2::geom_blank() :
All aesthetics have length 1, but the data has 22 rows.
ℹ Did you mean to use `annotate()`?
4: In ggplot2::geom_blank() :
All aesthetics have length 1, but the data has 22 rows.
ℹ Did you mean to use `annotate()`?
5: In ggplot2::geom_blank() :
All aesthetics have length 1, but the data has 22 rows.
ℹ Did you mean to use `annotate()`?
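Simple tables can all be made with knitr::kable, and complex tables can all be made with the gt package. The main reasons are that gt is under active development, that gt and Quarto integrate closely, that gt has an ecosystem of extensions (for example gtExtras / gtreg), and that both gt and Quarto are maintained by RStudio with sustained, heavy investment.
For documentation on gt, gtsummary, gtreg, gtExtras, and related packages, see https://github.com/jthomasmock
For model output, consider gtsummary or modelsummary
In Word output, cross-references do not work. This needs support on both sides, from Quarto and from the table packages; Quarto 1.4 is expected to support it.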
cols_width()
LaTeX output is currently not supported, which is somewhat troublesome for complex tables; see rstudio/gt#634
WARNING: Unable to resolve crossref @tbl-anscombe-datasets
WARNING: Unable to resolve crossref @tbl-ucb-admissions
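Reported; see rstudio/gt#1140
Time series forecasting: classical methods and machine learning methods
Add a chapter on ranking problems to the machine learning part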
\def\bm#1{{\boldsymbol #1}}
Warning: Could not convert TeX math
\def\bm#1{{\boldsymbol #1}}
, rendering as TeX:
^
unexpected control sequence \def
expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
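Part I: Getting started with R
Part II: Basic methods in R
Part III: Machine learning with R
Big-data processing with SQL, including handling of common data skew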