googdown's Issues
Add GH issues template
Add revisions to document commit history
[figures] add / test support
[code-blocks] Prevent figure captions getting scrambled when uploading
At the moment, all spaces are removed. Not quite sure what's causing it.
Make find and replace pass optional
For easier debugging
[docx_template] Add links to git repo
Ideally, you'd alter the word template to have something like:
Repo | Reproducible source @2f793f92
With links to the repo, the .Rmd, and the .Rmd at that commit respectively. You'd want to make it work for GHE (so use customizable base URLs).
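A minimal sketch of how the three links might be assembled with a customizable base URL. The function name, its arguments, the default branch, and the example repository are all hypothetical; only the `/blob/<ref>/<path>` URL layout is GitHub's own.

```r
# Build the repo link, the link to the .Rmd, and the link to the .Rmd at a
# given commit. All names here are illustrative, not part of googdown.
build_repo_links <- function(repo, rmd_path, commit,
                             base_url = "https://github.com",
                             branch = "master") {
  base_url <- sub("/+$", "", base_url)            # tolerate a trailing slash
  repo_url <- paste(base_url, repo, sep = "/")
  list(
    repo             = repo_url,
    source           = paste(repo_url, "blob", branch, rmd_path, sep = "/"),
    source_at_commit = paste(repo_url, "blob", commit, rmd_path, sep = "/"),
    label            = paste0("@", substr(commit, 1, 8))
  )
}

# A GHE instance only needs a different base_url (hypothetical host below).
links <- build_repo_links("someuser/somerepo", "analysis/report.Rmd",
                          "2f793f92", base_url = "https://ghe.example.com/")
```

Keeping the base URL as an argument means the template code itself never needs to know whether it is pointing at github.com or an enterprise host.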
[standardize_rmd] Leaves extra space at the end of yaml block
[bookdown] Deleting figures does not work
Currently seems to break the pandoc AST during the diffing/merging stage.
[gd_pull] Add local copy of remotely added images
If a user pushes an Rmarkdown file, and then adds an image to the google doc remotely, this should be added to the document's local representation. Currently, references are added to the Rmarkdown, to files which do not exist.
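As a first step, the dangling references could at least be detected. The sketch below is an assumption about how that might look, not existing googdown code: it scans markdown lines for inline image references and reports the local paths that do not exist. The function name and the regular expression are illustrative, and reference-style images are not handled.

```r
# Return image paths referenced in markdown that are missing on disk.
# Hypothetical helper; only handles inline images of the form ![alt](path).
find_missing_images <- function(md_lines, base_dir = ".") {
  pattern <- "!\\[[^]]*\\]\\(([^) ]+)"
  refs  <- unlist(regmatches(md_lines, gregexpr(pattern, md_lines)))
  paths <- unique(sub("^!\\[[^]]*\\]\\(", "", refs))
  local <- paths[!grepl("^https?://", paths)]    # skip remote URLs
  local[!file.exists(file.path(base_dir, local))]
}

# Example: one image exists locally, one does not, one is remote.
demo_dir <- tempfile("gd_images_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "a.png"))
md <- c("![first](a.png)",
        "Text ![second](figs/b.png) and ![third](https://example.com/c.png)")
missing <- find_missing_images(md, base_dir = demo_dir)
```

Anything returned by such a check would be a candidate for downloading from the remote document before the Rmarkdown is rewritten.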
put brocks on cran
Currently all CI tests are failing because brocks isn't on CRAN. It would be more useful if they reflected the functionality of the package!
[standardize_rmd] `\n` gets turned into 'real' newline in code blocks
[footnotes] add / test support
YAML Headers
There are a few of these which would be useful, that you have yet to write:
- link_type (you have to choose between inline and reference)
- doc_url (probably easier for people than the doc_id)
- shared (bool)
[standardize_rmd] Don't remove newlines at the end of codeblocks
It would be fine (for you) to just enforce them, if that's easier
Read / Write Doc via JSON / Apps Script
Currently, you're using the google drive API to download / upload documents in odt/docx format. An alternative is to use google apps script to upload/download a JSON representation of the document. Downloading is already possible via https://github.com/krilor/gdoc2json (tested). However, uploading would probably require writing software which:
- Recursively parses the JSON tree of the uploaded and original documents
- Diffs them, extracting the nature and location of changes
- Converts the JSON diffs to function/method calls, which can be applied to the document structure to add/remove the differing elements
Which sounds like a lot of work for marginal benefit.
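To give a feel for the first two steps, here is a deliberately naive sketch of a recursive tree diff over nested R lists (the shape a parsed JSON document would take). It compares children by position only, so an insertion in the middle of a list shows up as a run of replacements; a real implementation would need sequence alignment. The function name and the example trees are made up for illustration.

```r
# Walk two nested lists in parallel and record where they differ.
# Returns a list of changes, each with an operation and a positional path.
diff_tree <- function(old, new, path = "") {
  if (is.list(old) && is.list(new)) {
    changes <- list()
    for (i in seq_len(max(length(old), length(new)))) {
      child <- paste0(path, "/", i)
      if (i > length(old)) {
        changes <- c(changes, list(list(op = "add", path = child)))
      } else if (i > length(new)) {
        changes <- c(changes, list(list(op = "remove", path = child)))
      } else {
        changes <- c(changes, diff_tree(old[[i]], new[[i]], child))
      }
    }
    return(changes)
  }
  # Leaves (or a leaf versus a subtree): either identical or replaced.
  if (identical(old, new)) list() else list(list(op = "replace", path = path))
}

old <- list(type = "paragraph", children = list("Hello", "world"))
new <- list(type = "paragraph", children = list("Hello", "there", "!"))
changes <- diff_tree(old, new)
```

The third step, turning each change into an Apps Script call against the live document, is where most of the work estimated above would lie.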
Advantages of Drive API relative to Apps Script:
- No extra code to write in a weird subset of JavaScript to make upload/download possible
- Most features supported
- Pandoc reader / writer already written and mature
- User does not have to trust a remote script (there is no way to verify that a remote script is what it says it is), or upload the scripts themselves (fiddly, error prone, tricky to automate, additional permissions / scopes)
Advantages of Apps Script relative to Drive API:
- Could modify existing document, rather than adding and deleting everything, as currently happens (thus removing any existing comments)
- The above would allow for much more frequent pushing / pulling
- Could detect code blocks, and inline
- Avoid styling via odt/docx format (awkward, indirect, lossy)
- Easier to version control JSON
Dumb Punctuation
Currently, you're running all this without the --smart-punctuation flag, which is the default in rmarkdown (and makes a lot of sense).
This means that this will cause you problems when diffing remote and local documents.
Happily, this is a deterministic find and replace, which can be applied to remote ASTs once downloaded.
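A minimal sketch of such a find and replace, assuming it is applied to text extracted from the downloaded document. The mapping covers the usual typographic substitutions (curly quotes, dashes, ellipsis); the function name is hypothetical, and the list may need extending for other characters Google Docs inserts.

```r
# Typographic characters and their plain ASCII markdown equivalents.
smart <- c("\u201c", "\u201d", "\u2018", "\u2019", "\u2014", "\u2013", "\u2026")
dumb  <- c("\"",     "\"",     "'",      "'",      "---",    "--",     "...")

# Replace each smart character with its dumb counterpart, one pass per pair.
to_dumb_punctuation <- function(text) {
  for (i in seq_along(smart)) {
    text <- gsub(smart[i], dumb[i], text, fixed = TRUE)
  }
  text
}

cleaned <- to_dumb_punctuation("\u201cHi\u201d \u2013 it\u2019s\u2026")
```

Because every pair is a fixed string substitution, running the pass twice gives the same result as running it once, which keeps the diffing step stable.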
Google Docs removes (blue/underlined) hyperlink formatting
Show diff on fail
At the moment, pulling changes back from the Google doc fails pretty often, and debugging is a bit of a pain. It would be useful if gd_pull showed the problematic diffs (either written to text files, printed to console, or both).
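One possible shape for this, sketched under assumptions: wrap the merge step in `tryCatch`, and on error write a crude line-level comparison to a file and to the console before re-raising. The wrapper name, its arguments, and the use of `setdiff` as a stand-in for a proper diff are all illustrative; `merge_fun` represents whatever function actually performs the merge.

```r
# Run merge_fun(local, remote); if it fails, report which lines differ.
with_diff_on_fail <- function(merge_fun, local_lines, remote_lines,
                              report_file = tempfile(fileext = ".txt")) {
  tryCatch(
    merge_fun(local_lines, remote_lines),
    error = function(e) {
      report <- c(
        paste("Merge failed:", conditionMessage(e)),
        sprintf("- local only:  %s", setdiff(local_lines, remote_lines)),
        sprintf("+ remote only: %s", setdiff(remote_lines, local_lines))
      )
      writeLines(report, report_file)             # written to a text file
      message(paste(report, collapse = "\n"))     # and printed to console
      stop("Pull failed; diff report written to ", report_file, call. = FALSE)
    }
  )
}

# A merge that succeeds passes straight through.
ok <- with_diff_on_fail(function(a, b) union(a, b), c("a", "b"), c("a", "c"))
```

A set difference ignores line order and duplicates, so it only narrows down where to look; something like diffobj would give a more faithful picture.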
Diff visualization capabilities
With several changes happening at once, in a long document it can be difficult to know exactly what changed. Additionally, as certain 'impossible' changes (e.g. those which edit certain types of dynamic content) are dropped, it's important to be able to verify that all the changes made remotely have made it back into the source. It's also important to be able to select changes that are perhaps unwanted (e.g. editing a value in a dynamically changed table will replace the codeblock with a markdown table of the new values, which a user probably doesn't want).
Trying to pass all of this information with messages or warnings would be pretty overwhelming. Managing and visualizing diffs is something that's already well handled by git, but this is a large technical burden to place on casual users.
Additionally, it's quite possible that work could continue to happen on the .Rmd source file after pushing, producing a merge problem. There are two possible solutions, which could be complementary.
Support git branching
- When new changes are pulled in, a branch is created from the commit of the .Rmd source, at the point the document was pushed
- Changes are made to this branch
- Users can merge however they please
- It's easy to compare the changes in the current and previous remote files, to verify that remote changes have made it to source
Support a basic HTML view
- diffobj already has some great tools for visualizing diffs in HTML. You could adapt these to show difference between remote markdown files, and source files.
- Adding interactivity via shiny would be possible, but a relatively large amount of work
[docx_template] Center equations
[equations] Backslashes get added to formula after round trip
Each time you push / pull, more slashes get added
Add support for document templates
And provide something that looks a bit like the standard Google docs file.
[odt_template] Not enough space before / after headings
[odt_template] Bullet points are weird / invisible at some levels
[tests] Round trips
Image hashing does not work when knitr parameter cache = TRUE
pandoc: use rmarkdown::pandoc_convert
Currently you're just shelling out to pandoc, which is alright, but doesn't have great error handling etc. rmarkdown::pandoc_convert is probably a better solution.
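A sketch of what the switch might look like for the markdown standardization step. `rmarkdown::pandoc_convert` and `rmarkdown::pandoc_available` are real functions; the wrapper name, its arguments, and the choice of `--wrap=none` as the only option are assumptions made for illustration.

```r
# Convert a file to standard pandoc markdown through rmarkdown rather than a
# raw system call. pandoc_convert() raises an R error if pandoc exits non-zero.
standardize_with_pandoc <- function(input,
                                    output = tempfile(fileext = ".md")) {
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("The rmarkdown package is required", call. = FALSE)
  }
  if (!rmarkdown::pandoc_available()) {
    stop("pandoc could not be found", call. = FALSE)
  }
  rmarkdown::pandoc_convert(
    input   = normalizePath(input, mustWork = TRUE),
    from    = "markdown",
    to      = "markdown",
    output  = output,            # a relative path is resolved against input's directory
    options = "--wrap=none"
  )
  output
}
```

Usage would be along the lines of `standardize_with_pandoc("test.Rmd")`, with any pandoc failure arriving as a condition that can be caught with `tryCatch`.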
Uploading with docx: Bookmark links are ugly
The blue ribbons to the left of all the headings.
Might be able to remove them by either:
- Disabling the implicit_header_references extension
- Making them 'invisible' in the document template
[gd_pull] Allow specification of a separate outputfile
So that pulling in new changes doesn't necessarily overwrite the existing file.
[docx_template] Make table captions look nicer
More like figure captions, if that's possible
Alert users to remotely edited dynamic content
If I have an rmarkdown document:
the meaning of the life is `r 40 + 2`.
Which renders to
the meaning of the life is 42.
Which is then remotely changed to
the meaning of the life is 43.
It's not clear what the best thing to do for the user is when the change to the document is being pulled in locally. Given this, it's probably best to have an interactive interface where the user is asked, or at the very least, some alerting.
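For the alerting option, a rough sketch: flag every line whose rendered text was changed remotely and whose source contains inline R code. It assumes the source, rendered, and remote versions are line-aligned character vectors, which will not hold in general; the function names are hypothetical.

```r
# TRUE for lines containing inline R code such as `r 40 + 2`.
has_inline_r <- function(lines) grepl("`r [^`]+`", lines)

# Indices of lines that were edited remotely and are backed by inline code.
flag_dynamic_edits <- function(source_lines, rendered_lines, remote_lines) {
  changed <- rendered_lines != remote_lines
  which(changed & has_inline_r(source_lines))
}

source_lines <- c("Some static text.",
                  "the meaning of the life is `r 40 + 2`.")
rendered     <- c("Some static text.",
                  "the meaning of the life is 42.")
remote       <- c("Some edited static text.",
                  "the meaning of the life is 43.")

flagged <- flag_dynamic_edits(source_lines, rendered, remote)
for (i in flagged) {
  message("Line ", i, ": remote edit touches inline R code: ", source_lines[i])
}
```

The edit to the static line is left alone, while the edit to the computed value is surfaced so the user can decide what to do with it.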
[gd_pull] Add final find & replace pass
A fairly common operation is to move a plot from the body of a document, to the appendix. This is not detected by diffing; this appears as independent additions and deletions. The result is that moving a plot to the appendix does not result in the relevant R code block moving, but rather being removed and then the plain markdown for the plot appearing in the appendix.
One way to 'fix' this would be to take a final find and replace pass over the document, once the diffing is complete. Because image files are currently identified sequentially, you'd need some way for them to be uniquely identifiable. One solution would be to write something which would pass over the document and rename image files (and references to them) as the hashes of the content.
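The renaming pass could be sketched as follows, using `tools::md5sum` from base R. The function name and return shape are assumptions. Two images with identical content would map to the same name, which this sketch does not guard against.

```r
# Rename each image to the MD5 hash of its content and update references.
rename_images_by_hash <- function(image_paths, md_lines) {
  hashes    <- unname(tools::md5sum(image_paths))
  ext       <- tools::file_ext(image_paths)
  new_paths <- file.path(dirname(image_paths), paste0(hashes, ".", ext))
  for (i in seq_along(image_paths)) {
    file.rename(image_paths[i], new_paths[i])
    md_lines <- gsub(image_paths[i], new_paths[i], md_lines, fixed = TRUE)
  }
  list(paths = new_paths, md = md_lines)
}

# Example with two small stand-in files in a temporary directory.
img_dir <- tempfile("gd_figs_")
dir.create(img_dir)
imgs <- file.path(img_dir, c("fig-1.png", "fig-2.png"))
writeLines("first plot", imgs[1])
writeLines("second plot", imgs[2])
expected_hashes <- unname(tools::md5sum(imgs))

result <- rename_images_by_hash(imgs, paste0("![](", imgs, ")"))
```

Since the name depends only on the file's bytes, an image keeps its identity wherever it ends up in the document, which is what the final find and replace pass needs.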
TODO
bookdown: Investigate
A Google doc is hardly a book, but the way bookdown handles figure captioning looks good (hopefully it's numbered). This isn't something that you currently use.
Not a bug, but a thing to investigate.
Can't add text immediately before / after a plot
Fails with a pandoc error
[gd_pull] Borks if bullet points immediately precede heading
This appears to be specific to the upload format --- this doesn't happen if uploading via pandoc exporting to MS Word (though that has other problems).
Current understanding:
- File is uploaded in odt form, with bullet points immediately preceding a heading
- Works and renders fine in Google docs
- Upon export via docx, the resulting file looks fine
- Upon converting said docx to markdown / json via pandoc, subsequent headings are for some reason numbered and indented
- This causes downstream problems for diffing/merging, and breaks everything
It's possible to avoid this by simply adding a sentence to the end of a list of bullet-points, although this isn't much fun.
As an example, this
This is a H1
============
- This
- That
- Other stuff
This is a H2
------------
This is a sentence
Becomes this
This is a H1
============
- This
- That
- Other stuff
1. This is a H2
------------
This is a sentence
It may be that this is some quirk in the template which you can 'remove' somehow (no idea how to go about this; using the standard pandoc template causes the same problems), or just that this is a Google bug which will never be fixed. Could try moving to docx as an upload format, but would rather not (I'm sure this would introduce lots of its own problems).
Note: This doesn't appear to be a problem with markdown alone, e.g.
echo -e "# H1\n\n- this\n- that\n\n## H2\n\nsentence" | pandoc -f markdown -t odt -o test.odt
pandoc -f odt -t markdown test.odt
Behaves as expected.
[citations] add / test support
Cannot delete output from code chunks
This was initially a design choice to get things going, but it would be very useful to be able to delete the output of code (inline or blocks) in the Google doc, and have the changes propagated back to the source .Rmd file.
Currently errors with
ul <- gd_pull("test.Rmd")
test.Rmd converted to standard pandoc markdown (with --wrap=none)
Downloading remote changes
Error in difflist[[l]] : subscript out of bounds
>
> traceback()
7: patch_strings(file1, file2, difflist)
6: eval(lhs, parent, parent)
5: eval(lhs, parent, parent)
4: patch_strings(file1, file2, difflist) %>% paste(collapse = "\n") %>%
fix_json() %>% writeLines(patched_file) at patching.R#4
3: patch(local1, remote2, offset_diff, output_file) at remote_to_local_diffs.R#53
2: remote_diff_to_local(remote1 = remote1_ast_path, local1 = local1_ast_path,
remote2 = remote2_ast_path, output_file = md_merged_ast) at gdoc_push_pull.R#65
1: gd_pull("test.Rmd")
>
[equations] add / test support
Workaround for figure captions
Both knitr chunks and pandoc's AST have a representation of a figure caption. However Google docs does not.
If you'd like to be able to edit figure captions remotely and have the changes pulled in locally, then you'll need to write an additional processor to detect what's a figure caption, and when it's been changed, and which strings go where.
Googdown thinks documents are synced even after unsuccessful pull
[standardize_rmd] Fix spacing
Currently, standardize_rmd produces rather 'squashed' looking files. You should probably write something at the end that forces some reasonable spacing conventions (e.g. before / after code-blocks and headings).
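A sketch of one possible final pass, assuming ATX-style (`#`) headings and backtick-fenced code blocks. It puts a blank line before and after each heading and code block, collapses repeated blank lines outside code, and leaves everything inside a fence untouched. The function name is hypothetical, and Setext headings and indented code are not handled.

```r
# Enforce blank lines around headings and fenced code blocks.
space_blocks <- function(lines) {
  out <- character(0)
  in_fence <- FALSE
  for (line in lines) {
    is_fence   <- grepl("^```", line)
    is_heading <- !in_fence && !is_fence && grepl("^#{1,6} ", line)
    prev_blank <- length(out) == 0 || !nzchar(out[length(out)])

    # Outside code, never emit two blank lines in a row.
    if (!in_fence && !nzchar(line) && prev_blank) next

    # Blank line before a heading or an opening fence.
    if ((is_heading || (is_fence && !in_fence)) && !prev_blank) {
      out <- c(out, "")
    }
    out <- c(out, line)

    # Blank line after a heading or a closing fence.
    if (is_heading || (is_fence && in_fence)) out <- c(out, "")
    if (is_fence) in_fence <- !in_fence
  }
  # Drop trailing blank lines.
  while (length(out) > 0 && !nzchar(out[length(out)])) out <- out[-length(out)]
  out
}

squashed <- c("# Title", "text", "```{r}", "x <- 1", "", "", "```",
              "more", "## Sub")
spaced <- space_blocks(squashed)
```

Blank lines inside the chunk survive unchanged, so this pass would not interfere with keeping newlines at the end of code blocks.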
Error on unsupported input
For example, formatting on code (blocks, and inline) is lost during the round trip.
When a user attempts to use such features, googdown should fail politely and early.
[gd_pull] Chunks with multiple plots get over-written with markdown source
Not immediately clear what's causing this.