googdown's People

Contributors

brendan-r

googdown's Issues

[docx_template] Add links to git repo

Ideally, you'd alter the word template to have something like:

Repo | Reproducible source @2f793f92

With links to the repo, the .Rmd, and the .Rmd at that commit respectively. You'd want this to work for GitHub Enterprise (GHE) as well, so use customizable base URLs.
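A minimal sketch of how the three links could be built with a configurable base URL. The function name, its parameters, and the default branch are hypothetical; the only thing assumed about the host is the usual GitHub-style `/<owner>/<repo>/blob/<ref>/<path>` layout, which GHE installs are expected to share.

```python
from urllib.parse import quote


def repo_links(base_url, owner, repo, rmd_path, commit, branch="master"):
    """Return (label, url) pairs for the repo, the .Rmd, and the .Rmd at a commit."""
    root = f"{base_url.rstrip('/')}/{owner}/{repo}"
    path = quote(rmd_path)  # escape spaces etc. but keep "/" separators
    return [
        ("Repo", root),
        ("Reproducible source", f"{root}/blob/{branch}/{path}"),
        (f"@{commit[:8]}", f"{root}/blob/{commit}/{path}"),
    ]
```

Passing the base URL in (rather than hard-coding github.com) is what would let the same template serve both public and enterprise hosts.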

[gd_pull] Add local copy of remotely added images

If a user pushes an Rmarkdown file, and then adds an image to the Google doc remotely, the image should be added to the document's local representation. Currently, the pulled Rmarkdown gains references to image files which do not exist locally.
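As a first step, the dangling references could at least be detected after a pull. The sketch below is an assumption about how that check might look, not googdown's code: `missing_images` and its `exists` parameter are hypothetical names, and only plain `![alt](path)` image syntax is recognised.

```python
import os
import re

IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
REMOTE_URL = re.compile(r"[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def missing_images(markdown_text, base_dir=".", exists=os.path.exists):
    """List image paths referenced in the markdown that are not on disk."""
    missing = []
    for target in IMAGE_REF.findall(markdown_text):
        if REMOTE_URL.match(target):
            continue  # a remote URL, nothing to check locally
        if not exists(os.path.join(base_dir, target)):
            missing.append(target)
    return missing
```

Each path returned is a candidate for downloading from the remote document before the merged .Rmd is written.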

Put brocks on CRAN

Currently all CI tests are failing because brocks isn't on CRAN. It would be more useful if they reflected the functionality of the package!

YAML Headers

There are a few of these which would be useful, and which you have yet to write:

  • link_type (you have to choose between inline and reference)
  • doc_url (probably easier for people than the doc_id)
  • shared bool
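For doc_url, the id could be recovered from the URL before anything else runs. This sketch assumes the common `https://docs.google.com/document/d/<id>/edit` URL shape; the function name is hypothetical.

```python
import re

DOC_ID = re.compile(r"/document/d/([A-Za-z0-9_-]+)")


def doc_id_from_url(doc_url):
    """Extract the document id from a Google Docs URL, or fail loudly."""
    match = DOC_ID.search(doc_url)
    if match is None:
        raise ValueError(f"No document id found in URL: {doc_url!r}")
    return match.group(1)
```

Failing with a clear message on an unrecognised URL seems preferable to passing a bad id on to the Drive API.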

Read / Write Doc via JSON / Apps Script

Currently, you're using the Google Drive API to download / upload documents in odt/docx format. An alternative is to use Google Apps Script to upload/download a JSON representation of the document. Downloading is already possible via https://github.com/krilor/gdoc2json (tested). However, uploading would probably require writing software which:

  • Recursively parses the JSON trees of the uploaded and original documents
  • Diffs them, extracting the nature and location of changes
  • Converts the JSON diffs to function/method calls, which can be applied to the document structure to add/remove the differing elements

Which sounds like a lot of work for marginal benefit.
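To give a sense of the first two steps, here is a naive recursive diff over two parsed JSON trees. It is only a sketch: `diff_json` is a hypothetical name, the inputs are assumed to be the result of `json.load`, and lists are compared by position, so an insertion in the middle of a list shows up as a run of changes rather than a single addition.

```python
def diff_json(old, new, path=()):
    """Yield (path, kind, old_value, new_value) for every difference."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            if key not in new:
                yield path + (key,), "removed", old[key], None
            elif key not in old:
                yield path + (key,), "added", None, new[key]
            else:
                yield from diff_json(old[key], new[key], path + (key,))
    elif isinstance(old, list) and isinstance(new, list):
        for i in range(max(len(old), len(new))):
            if i >= len(new):
                yield path + (i,), "removed", old[i], None
            elif i >= len(old):
                yield path + (i,), "added", None, new[i]
            else:
                yield from diff_json(old[i], new[i], path + (i,))
    elif old != new:
        yield path, "changed", old, new  # differing leaves or mismatched types
```

The third step, turning each (path, kind) pair into a call against the document structure, is the part that would need Apps Script and is not sketched here.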

Advantages of Drive API relative to Apps Script:

  • No extra code to write in a weird subset of JavaScript to make upload/download possible
  • Most features supported
  • Pandoc reader / writer already written and mature
  • User does not have to trust a remote script (there is no way to verify that a remote script is what it says it is), or upload the scripts themselves (fiddly, error prone, tricky to automate, additional permissions / scopes)

Advantages of Apps Script relative to Drive API:

  • Could modify existing document, rather than adding and deleting everything, as currently happens (thus removing any existing comments)
  • The above would allow for much more frequent pushing / pulling
  • Could detect code blocks and inline code
  • Avoid styling via odt/docx format (awkward, indirect, lossy)
  • Easier to version control JSON

Dumb Punctuation

Currently, you're running all this without the --smart-punctuation flag, although smart punctuation is the default in rmarkdown (and makes a lot of sense).

This mismatch causes problems when diffing remote and local documents.

Happily, this is a deterministic find and replace, which can be applied to remote ASTs once downloaded.
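A sketch of what that find and replace might look like. The character table is an assumption about which substitutions matter, and the AST walker assumes pandoc's JSON layout in which text nodes look like `{"t": "Str", "c": "..."}`; both function names are hypothetical.

```python
SMART_TO_PLAIN = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "--",   # en dash
    "\u2014": "---",  # em dash
    "\u2026": "...",  # ellipsis
}
_TABLE = str.maketrans(SMART_TO_PLAIN)


def dumb_punctuation(text):
    """Replace typographic punctuation with its plain ASCII spelling."""
    return text.translate(_TABLE)


def dumb_ast(node):
    """Apply dumb_punctuation to every Str node of a parsed pandoc JSON AST."""
    if isinstance(node, list):
        return [dumb_ast(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str" and isinstance(node.get("c"), str):
            return {**node, "c": dumb_punctuation(node["c"])}
        return {key: dumb_ast(value) for key, value in node.items()}
    return node
```

Because the mapping is one-directional and fixed, running it on the downloaded AST before diffing should make the remote text comparable with the local one.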

Show diff on fail

At the moment, pulling changes back from the Google doc fails pretty often, and debugging is a bit of a pain. It would be useful if gd_pull showed the problematic diffs (either written to text files, printed to console, or both).
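One possible shape for this, sketched with the standard library's difflib. `pull_with_report` and its `merge` argument are hypothetical stand-ins for whatever step of the pull actually fails; the report is printed and optionally written to a file before the original error is re-raised.

```python
import difflib


def describe_diff(local_text, remote_text, local_name="local", remote_name="remote"):
    """Return a unified diff of the two texts as one string."""
    lines = difflib.unified_diff(
        local_text.splitlines(),
        remote_text.splitlines(),
        fromfile=local_name,
        tofile=remote_name,
        lineterm="",
    )
    return "\n".join(lines)


def pull_with_report(local_text, remote_text, merge, log_path=None):
    """Run merge(); on failure, show the diff it choked on and re-raise."""
    try:
        return merge(local_text, remote_text)
    except Exception:
        report = describe_diff(local_text, remote_text)
        print(report)
        if log_path is not None:
            with open(log_path, "w", encoding="utf-8") as handle:
                handle.write(report + "\n")
        raise
```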

Diff visualization capabilities

With several changes happening at once, in a long document it can be difficult to know exactly what changed. Additionally, as certain 'impossible' changes (e.g. those which edit certain types of dynamic content) are dropped, it's important to be able to verify that all the changes made remotely have made it back into the source. It's also important to be able to select changes that are perhaps unwanted (e.g. editing a value in a dynamically changed table will replace the codeblock with a markdown table of the new values, which a user probably doesn't want).

Trying to pass all of this information with messages or warnings would be pretty overwhelming. Managing and visualizing diffs is something that's already well handled by git, but this is a large technical burden to place on casual users.

Additionally, it's quite possible that work could continue to happen on the .Rmd source file after pushing, producing a merge problem. There are two possible solutions, which could be complementary:

Support git branching

  • When new changes are pulled in, a branch is created from the commit of the .Rmd source, at the point the document was pushed
  • Changes are made to this branch
  • Users can merge however they please
  • It's easy to compare the changes in the current and previous remote files, to verify that remote changes have made it to source

Support a basic HTML view

  • diffobj already has some great tools for visualizing diffs in HTML. You could adapt these to show the differences between remote markdown files and source files.
  • Adding interactivity via shiny would be possible, but a relatively large amount of work
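For comparison, a static side-by-side HTML view takes very little code with Python's difflib. This is not diffobj, just an illustration of how small the non-interactive version can be; the function name and default labels are hypothetical.

```python
import difflib


def html_diff(source_md, remote_md, source_name="source.Rmd", remote_name="remote.md"):
    """Return a standalone HTML page showing the two texts side by side."""
    return difflib.HtmlDiff(wrapcolumn=80).make_file(
        source_md.splitlines(),
        remote_md.splitlines(),
        fromdesc=source_name,
        todesc=remote_name,
        context=True,   # only show changed regions...
        numlines=2,     # ...with two lines of context around each
    )
```

Writing the returned string to a file and opening it in a browser gives users something to check remote changes against without touching git.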

pandoc: use rmarkdown::pandoc_convert

Currently you're just shelling out to pandoc, which is alright, but doesn't have great error handling etc. rmarkdown::pandoc_convert is probably a better solution.

Alert users to remotely edited dynamic content

If I have an rmarkdown document:

the meaning of the life is `r 40 + 2`.

Which renders to

the meaning of the life is 42.

Which is then remotely changed to

the meaning of the life is 43.

When such a change is pulled in locally, it's not clear what the best outcome for the user is. Given this, it's probably best to have an interactive interface where the user is asked what to do or, at the very least, some alerting.
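Detecting the situation is the easier half. A rough sketch, assuming you have the source line, its locally rendered text, and the remote text for the same line: the literal text around each inline chunk is turned into a pattern, and whatever fills the chunk positions is compared. All names here are hypothetical.

```python
import re

INLINE_R = re.compile(r"`r\s+([^`]+)`")


def edited_inline_values(source_line, local_rendered, remote_rendered):
    """Return (expr, local_value, remote_value) for inline chunks edited remotely.

    Returns None when the text around the chunks changed as well, in which
    case this simple alignment cannot say which value belongs where.
    """
    exprs = INLINE_R.findall(source_line)
    literals = INLINE_R.split(source_line)[::2]  # text between the chunks
    pattern = re.compile("(.+?)".join(re.escape(part) for part in literals))
    local = pattern.fullmatch(local_rendered)
    remote = pattern.fullmatch(remote_rendered)
    if local is None or remote is None:
        return None
    return [
        (expr, lv, rv)
        for expr, lv, rv in zip(exprs, local.groups(), remote.groups())
        if lv != rv
    ]
```

A non-empty result is the point at which to ask the user, or at least warn, rather than silently keeping or dropping the remote value.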

[gd_pull] Add final find & replace pass

A fairly common operation is to move a plot from the body of a document to the appendix. Diffing does not detect this as a move; it appears as an independent deletion and addition. The result is that moving a plot to the appendix does not move the relevant R code block, but rather removes it and leaves the plain markdown for the plot in the appendix.

One way to 'fix' this would be to take a final find and replace pass over the document, once the diffing is complete. Because image files are currently identified sequentially, you'd need some way for them to be uniquely identifiable. One solution would be to write something which would pass over the document and rename image files (and references to them) as the hashes of the content.
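A sketch of the renaming step, kept free of file-system side effects so it is easy to test: it takes the markdown plus a mapping from referenced path to file bytes, and returns the rewritten markdown and the rename table. The function names, the 16-character digest length, and the choice of SHA-256 are all assumptions.

```python
import hashlib
import posixpath
import re

IMAGE = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)([^)]*\))")


def hashed_name(path, content):
    """Same directory and extension, with the stem replaced by a content hash."""
    digest = hashlib.sha256(content).hexdigest()[:16]
    return posixpath.join(posixpath.dirname(path), digest + posixpath.splitext(path)[1])


def rename_images_to_hashes(markdown_text, images):
    """images maps referenced path -> file bytes. Returns (new_text, renames)."""
    renames = {path: hashed_name(path, content) for path, content in images.items()}

    def swap(match):
        target = renames.get(match.group(2), match.group(2))
        return match.group(1) + target + match.group(3)

    return IMAGE.sub(swap, markdown_text), renames
```

The caller would then rename the files on disk according to `renames`. Identical images end up with identical names, which is what lets a later find and replace recognise a plot that merely moved.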

TODO

  • Write function to replace image filenames and references with the hashes of the file content (bfb9f49)
  • Integrate with existing pull and push functions, test
  • Fix #34
  • Write function to perform the final find and replace pass. This works better on the diffed-and-merged Rmd file than on the AST.

bookdown: Investigate

A Google doc is hardly a book, but the way bookdown handles figure captioning looks good (hopefully the figures are numbered). This isn't something that you currently use.

Not a bug, but a thing to investigate.

[gd_pull] Borks if bullet points immediately precede heading

This appears to be specific to the upload format --- this doesn't happen if uploading via pandoc exporting to MS Word (though that has other problems).

Current understanding:

  • File is uploaded in odt form, with bullet points immediately preceding a heading
  • Works and renders fine in Google docs
  • Upon export via docx, the resulting file looks fine
  • Upon converting said docx to markdown / json via pandoc, subsequent headings are for some reason numbered and indented
  • This causes downstream problems for diffing/merging, and breaks everything

It's possible to avoid this by simply adding a sentence to the end of a list of bullet-points, although this isn't much fun.

As an example, this

This is a H1
============

-   This

-   That

-   Other stuff

This is a H2
------------

This is a sentence

Becomes this

This is a H1
============

-   This

-   That

-   Other stuff

    1.  This is a H2
        ------------

This is a sentence

It may be that this is some quirk in the template which you can 'remove' somehow (no idea how to go about this; using the standard pandoc template causes the same problems), or just that this is a Google bug which will never be fixed. Could try moving to docx as an upload format, but would rather not (I'm sure this would introduce lots of its own problems).

Note: This doesn't appear to be a problem with markdown alone, e.g.

echo -e "# H1\n\n- this\n- that\n\n## H2\n\nsentence" | pandoc -f markdown -t odt -o test.odt
pandoc -f odt -t markdown test.odt

Behaves as expected.
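Until the cause is found, the problematic pattern could at least be flagged before upload, so the user can add the workaround sentence. The sketch below is a heuristic, not a markdown parser: it only looks at whether the last non-blank line before a heading starts with a bullet marker, so multi-line list items would be missed. All names are hypothetical.

```python
import re

BULLET = re.compile(r"^\s*[-*+]\s+\S")
ATX_HEADING = re.compile(r"^#{1,6}\s+\S")
SETEXT_UNDERLINE = re.compile(r"^(=+|-+)\s*$")


def lists_before_headings(markdown_text):
    """Return 1-based line numbers of headings that directly follow a bullet list."""
    lines = markdown_text.splitlines()
    hits = []
    last_content = None  # index of the most recent non-blank line
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        is_setext = (
            i + 1 < len(lines)
            and SETEXT_UNDERLINE.match(lines[i + 1]) is not None
            and BULLET.match(line) is None
        )
        is_heading = ATX_HEADING.match(line) is not None or is_setext
        if is_heading and last_content is not None and BULLET.match(lines[last_content]):
            hits.append(i + 1)
        last_content = i
    return hits
```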

Cannot delete output from code chunks

This was initially a design choice to get things going, but it would be very useful to be able to delete the output of code (inline or blocks) in the Google doc, and have the changes propagated back to the source .Rmd file.

Currently errors with

ul <- gd_pull("test.Rmd")
test.Rmd  converted to standard pandoc markdown (with --wrap= none )
Downloading remote changes
Error in difflist[[l]] : subscript out of bounds
>
> traceback()
7: patch_strings(file1, file2, difflist)
6: eval(lhs, parent, parent)
5: eval(lhs, parent, parent)
4: patch_strings(file1, file2, difflist) %>% paste(collapse = "\n") %>%
       fix_json() %>% writeLines(patched_file) at patching.R#4
3: patch(local1, remote2, offset_diff, output_file) at remote_to_local_diffs.R#53
2: remote_diff_to_local(remote1 = remote1_ast_path, local1 = local1_ast_path,
       remote2 = remote2_ast_path, output_file = md_merged_ast) at gdoc_push_pull.R#65
1: gd_pull("test.Rmd")
>

Workaround for figure captions

Both knitr chunks and pandoc's AST have a representation of a figure caption. However, Google Docs does not.

If you'd like to be able to edit figure captions remotely and have the changes pulled in locally, then you'll need to write an additional processor to detect what's a figure caption, and when it's been changed, and which strings go where.

[standardize_rmd] Fix spacing

Currently, standardize_rmd produces rather 'squashed' looking files. You should probably write something at the end that forces some reasonable spacing conventions (e.g. before / after code-blocks and headings).
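A sketch of such a final pass, under a deliberately simple convention: exactly one blank line before and after every fenced code block and every `#`-style heading, with runs of blank lines squeezed and the contents of code blocks left untouched. Underlined (setext) headings are not handled, and the function name is hypothetical.

```python
import re


def fix_spacing(markdown_text):
    """Put one blank line around ATX headings and fenced code blocks."""
    out = []
    in_fence = False
    for line in markdown_text.splitlines():
        is_fence = line.lstrip().startswith("```")
        if in_fence:
            out.append(line)  # keep code verbatim, blank lines included
            if is_fence:      # this is the closing fence
                in_fence = False
                out.append("")
            continue
        if line.strip() == "":
            if out and out[-1] != "":
                out.append("")  # squeeze runs of blank lines into one
            continue
        is_heading = re.match(r"#{1,6}\s", line) is not None
        if (is_heading or is_fence) and out and out[-1] != "":
            out.append("")
        out.append(line)
        if is_fence:
            in_fence = True
        elif is_heading:
            out.append("")
    while out and out[-1] == "":
        out.pop()
    return "\n".join(out) + "\n"
```

The pass is idempotent, so it can safely run at the end of every standardization without the file drifting.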

Error on unsupported input

For example, formatting on code (blocks, and inline) is lost during the round trip.

When a user attempts to use such features, googdown should fail politely and early.
