
verso's Introduction

verso|recto - A Different Approach to Literate Programming


Literate programming is the art of preparing programs for human readers.

Norman Ramsey, on the noweb homepage.

Literate Programming (LP) tries to address a common problem with software systems: by reading the code one can discover how a thing is done, but not why it is done that way. Every program has a "theory of operation" embedded within its logic, but this is often hidden. Comments within the code and documentation of APIs are helpful but insufficient for this task. Often the documentation generation systems provided with a programming language do not provide a way to contextualize the code they describe within the larger system. For programs which rely on a more advanced mathematical background, it is also often difficult to embed equations or other markup in the documentation.

Existing LP tools such as WEB, noweb, and Org Babel attempt to address this by taking the following approach:

  1. Embed the program source code within a prose document which explains the author's thinking.
  2. Provide mechanisms for organizing code into abstract chunks, which can be recombined in different orders than they are defined and referenced throughout the document.
  3. Process the combined document in one of two ways: either tangle the source code out of the document into a computer-friendly version, or weave the document into a form ready for humans to read.

Overall this is an improvement on inline documentation and can provide much more context to the reader than mainstream approaches. Unfortunately it does suffer some serious drawbacks as well:

  • The source code is embedded within a markup file, making it inaccessible to language-specific tooling such as editors, compilers, and static analysis tools. In order to use these the code must first be tangled.
  • Similarly, to build your literate program the end user must have the appropriate tools installed in addition to the language tooling needed to compile the source.
  • Most programmers have spent years working directly in the machine-friendly source representation rather than a literate style, introducing a barrier to entry to writing literate programs. Porting an existing project to a literate representation is nontrivial.

verso|recto takes a different approach. It considers the source file as a first-class citizen to be referenced by the documentation. Rather than embedding source code in documents, or documents in source code, lightweight annotations are used to mark sections of interest within the code which can be easily referenced by prose documents. There is no tangle step, and source files remain fully valid inputs for compilers, editors, and other tools. There is no need to translate line numbers or formatting between literate sources and what the compiler sees.

What traditional LP systems get right

A lot of things! Here are a few ideas that the traditional systems pioneered and verso|recto borrows:

  • Support a many-to-many relationship between LP documents and source files.
    • In noweb, a document may contain several root chunks which can each be sent to different source files, and multiple LP documents can be fed into the tangle tool at once (they are concatenated).
  • The more modern tools support multiple programming and markup languages.
  • noweb offers a pipelined architecture which makes it easy to insert processing stages to meet a user's needs.
  • They generate indices and cross references within the human-readable output.

Using verso|recto

verso|recto is driven by annotations within your source files, defining regions which can be referenced by other documents. These regions are called "fragments".

Annotating a file

Annotations are quite simple. To mark a region of code and make a fragment, simply add a pair of comments around the region with the symbols @< and >@ followed by a unique ID. The ID can be any string of alphanumeric characters and the characters /, _, or -, though it should be both unique within your project and valid in the source file you're annotating. (The period character (.) is reserved, as it is used for inserting metadata about fragments.) Other characters may be added to the "safe" list in the future. If a character you want to use is not listed here, please file an issue on GitHub (or better yet, send a PR)!
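To make the annotation format concrete, here is a minimal Python sketch of how fragments might be pulled out of an annotated file. It is not verso's actual implementation. The helper name `extract_fragments`, the regular expressions, the decision to leave the annotation comment lines out of the fragment body, and the choice to report the first body line as the start are all assumptions made for illustration.

```python
import re

# Assumed ID alphabet, taken from the description above:
# alphanumerics plus "/", "_", and "-".
OPEN = re.compile(r"@<([A-Za-z0-9/_-]+)")
CLOSE = re.compile(r">@([A-Za-z0-9/_-]+)")


def extract_fragments(text):
    """Return {id: (first_body_line_number, body)} for each annotated region.

    Hypothetical helper: overlapping fragments are rejected, and the
    comment lines that carry the annotations are not part of the body.
    """
    fragments = {}
    current_id, start, body = None, 0, []
    for number, line in enumerate(text.splitlines(), start=1):
        opened, closed = OPEN.search(line), CLOSE.search(line)
        if opened:
            if current_id is not None:
                raise ValueError(f"line {number}: fragment already open")
            current_id, start, body = opened.group(1), number + 1, []
        elif closed:
            if closed.group(1) != current_id:
                raise ValueError(f"line {number}: unexpected close")
            fragments[current_id] = (start, "\n".join(body))
            current_id = None
        elif current_id is not None:
            body.append(line)
    return fragments


# A Rust source file annotated with ordinary line comments.
SOURCE = """\
// @<greet/main
fn main() {
    println!("hello");
}
// >@greet/main
"""

fragments = extract_fragments(SOURCE)
```

Because the annotations live inside ordinary comments, the annotated file is still valid input for the compiler, which is the point of the approach.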

Referencing annotations

In order to insert a fragment in another file, add a line containing the symbol @@ followed by the ID of the annotation (e.g. @@12345). When the file is woven using the recto command (see the next section), the line will be replaced with the contents of the fragment. You can add any markup you like around the line to provide formatting.

To insert a group of fragments, a regular expression can be used after the @* symbol. All of the fragments whose ID matches the expression will be inserted in place of the symbol, in lexicographic order by their IDs.
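The two insertion forms can be sketched in a few lines of Python. This is an illustration only, not recto's code: the `weave` function and the sample fragment table are hypothetical, and it assumes that a reference sits alone on its line and that a pattern matches anywhere in the ID (via `re.search`).

```python
import re

# Hypothetical fragment table: ID -> fragment text.
FRAGMENTS = {
    "api/definition": "// definition goes here",
    "api/implementation": "// implementation goes here",
    "main": "fn main() {}",
}


def weave(prose, fragments):
    """Replace @@id and @*pattern lines with fragment contents."""
    woven = []
    for line in prose.splitlines():
        stripped = line.strip()
        if stripped.startswith("@@"):
            # Single insertion: look the fragment up by its exact ID.
            woven.append(fragments[stripped[2:]])
        elif stripped.startswith("@*"):
            # Pattern insertion: every matching ID, in lexicographic order.
            pattern = re.compile(stripped[2:])
            for frag_id in sorted(fragments):
                if pattern.search(frag_id):
                    woven.append(fragments[frag_id])
        else:
            woven.append(line)
    return "\n".join(woven)


PROSE = """\
The entry point is trivial:
@@main
The API has two parts:
@*^api/
"""

woven = weave(PROSE, FRAGMENTS)
```

Sorting the IDs before matching is what gives the lexicographic ordering described above, so a naming scheme such as `api/01-types`, `api/02-functions` can be used to control the order in which a group appears.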

Sometimes it is also desirable to refer to metadata about a fragment. Currently, verso|recto supports the following metadata insertion operators:

  1. Filename. @?id.file inserts the name of the file the fragment was drawn from.
  2. Line number. @?id.line inserts the line number on which the fragment began.
  3. Column number. @?id.col inserts the column number at which the fragment began. (Currently this value is always 0, as fragments always begin at the start of a line.)
  4. Quick location. @?id.loc inserts the file name, starting line number, and column number for the fragment in the format file (line:col). This is useful if you just want to quickly refer to the metadata without futzing with the formatting.
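As a rough illustration of the four operators, the following Python sketch expands `@?id.prop` references in a line of prose. The `Fragment` record, the `expand` function, and the sample file name and line number are hypothetical; only the property names and the `file (line:col)` format come from the list above.

```python
import re
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record of where a fragment came from."""
    file: str
    line: int
    col: int = 0  # the list above notes this is currently always 0


def metadata(fragment, prop):
    """Resolve one of the four properties described above."""
    if prop == "file":
        return fragment.file
    if prop == "line":
        return str(fragment.line)
    if prop == "col":
        return str(fragment.col)
    if prop == "loc":
        return f"{fragment.file} ({fragment.line}:{fragment.col})"
    raise KeyError(prop)


def expand(text, fragments):
    """Replace each @?id.prop reference in `text` with its value."""
    ref = re.compile(r"@\?([A-Za-z0-9/_-]+)\.(file|line|col|loc)")
    return ref.sub(lambda m: metadata(fragments[m.group(1)], m.group(2)), text)


fragments = {"greet/main": Fragment(file="src/main.rs", line=12)}
expanded = expand("See @?greet/main.loc for the entry point.", fragments)
# expanded == "See src/main.rs (12:0) for the entry point."
```

The sketch also shows why the period is reserved in IDs: it is the separator between the fragment ID and the property name.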

Weaving a document for human consumption

The verso command will read all of the files specified on the command line, extract their fragments, and output the result to stdout. In turn the recto command will read fragments from stdin. This makes the two programs easy to use together via pipes:

verso main.rs lib.rs | recto build chap1.tex chap2.tex blog/home.md
      ^       ^              ^     ^         ^         ^
      +-------+              |     +---------+---------+
      |                      |                         |
      |                      |                         |
      +--- Source files      +--- Output directory     +--- Prose files

Each of the woven files is written to the output directory, provided as the first argument, in the same relative location as given on the command line. So, for example, the file blog/home.md above will be written to build/blog/home.md when it is woven.
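The path mapping amounts to joining the output directory with each prose file's relative path. A small Python sketch, assuming the prose files are given as relative paths (the `output_path` helper is hypothetical):

```python
from pathlib import PurePosixPath


def output_path(out_dir, prose_file):
    """Join the output directory with the prose file's relative path."""
    return PurePosixPath(out_dir) / prose_file


prose_files = ["chap1.tex", "chap2.tex", "blog/home.md"]
targets = [str(output_path("build", name)) for name in prose_files]
# targets == ["build/chap1.tex", "build/chap2.tex", "build/blog/home.md"]
```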

Note that, although the two programs appear to run in parallel, verso won't send anything to recto until it has successfully extracted fragments from all of the source files it was given, and recto will not start weaving files together until it receives those fragments. Because of this, if verso fails, recto will also fail.
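One way to picture this all-or-nothing behavior: if the fragments travel as a single JSON document (the `jq` command quoted in the issues further down suggests an array of objects with an `id` field, but the `body` field and the `read_fragments` helper below are assumptions), then empty or truncated input fails to parse and nothing is woven.

```python
import json


def read_fragments(stream_text):
    """Parse the full fragment stream, or raise if it is incomplete."""
    fragments = json.loads(stream_text)
    return {fragment["id"]: fragment for fragment in fragments}


complete = '[{"id": "main", "body": "fn main() {}"}]'
truncated = complete[:20]  # what a reader might see if the writer died early

ok = read_fragments(complete)

try:
    read_fragments(truncated)
    failed = False
except json.JSONDecodeError:
    # A partial document is rejected as a whole; there is no partial result.
    failed = True
```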

Full symbology

For reference, here is a table with the full symbology. Note that in the (hopefully rare) case that your language has symbols which collide with the defaults used by verso|recto, you can override them by using the listed environment variables.

| Name            | Symbol | Description                       | Override Variable           |
|-----------------|--------|-----------------------------------|-----------------------------|
| Fragment Open   | @<     | Starts a named fragment.          | VERSO_FRAGMENT_OPEN_SYMBOL  |
| Fragment Close  | >@     | Ends a named fragment.            | VERSO_FRAGMENT_CLOSE_SYMBOL |
| Halt            | @!halt | Halts fragment extraction.        | VERSO_HALT_SYMBOL           |
| Insert Fragment | @@     | Insert a fragment by ID.          | RECTO_INSERTION_SYMBOL      |
| Insert Pattern  | @*     | Insert a fragment by ID pattern.  | RECTO_PATTERN_SYMBOL        |
| Insert Metadata | @?     | Insert metadata about a fragment. | RECTO_METADATA_SYMBOL       |
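The override mechanism follows the usual "environment variable with a default" pattern. The Python sketch below shows that pattern using the names and defaults listed above; the `resolve_symbols` helper and the replacement symbol `@{` are hypothetical, and how verso|recto treats an empty variable is not covered here.

```python
import os

# Variable names and default symbols as listed in the symbology table.
DEFAULTS = {
    "VERSO_FRAGMENT_OPEN_SYMBOL": "@<",
    "VERSO_FRAGMENT_CLOSE_SYMBOL": ">@",
    "VERSO_HALT_SYMBOL": "@!halt",
    "RECTO_INSERTION_SYMBOL": "@@",
    "RECTO_PATTERN_SYMBOL": "@*",
    "RECTO_METADATA_SYMBOL": "@?",
}


def resolve_symbols(env=os.environ):
    """Return the effective symbol for each variable, preferring overrides."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


# Example: a project whose language already uses "@<" picks another opener.
symbols = resolve_symbols({"VERSO_FRAGMENT_OPEN_SYMBOL": "@{"})
```

Only the overridden entry changes; every other symbol keeps its default.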

The Name

Recto and verso are respectively, the text written or printed on the "right" or "front" side and on the "back" side of a leaf of paper in a bound item such as a codex, book, broadsheet, or pamphlet.

Wikipedia - Recto and Verso

(Note that, conveniently, the verso program goes on the left of the pipe while recto goes on the right.)

Future Work

  • Add support for overlapping fragments.
  • Add support for custom formatting of annotation properties within the woven output.
  • Parallelize file processing in verso, and both reading from stdin and file reading in recto.
  • Add --fragments-from option to specify a source other than stdin for fragments.

verso's Issues

Feature requests

Thank you for this tool! It is a great idea and very elegantly executed: I love how minimal and clean the design is; it's really a joy to use, and my code is really improving as a result. Moreover, with this design, I imagine that it will be easy to extend to cover different use-cases.

Just jotting down here a couple of things I felt would be nice, so that I don't forget them (none are really necessary; the tool is fine without them):

  1. It may be nice for the error messages, like

    Error: (src/lib.rs:1212:7) DoubleOpen found a fragment open symbol while a fragment is already opened: 1212
    

    and

    Error: (src/lib.rs:1767:3) CloseBeforeOpen found a fragment close symbol when no fragment is active: 1767
    

    to mention the names (found foo open while bar was already opened, or found close for foo…). But the line numbers are enough for now.

  2. This won't be wanted in most cases, but to make sure I've covered everything, it may be nice for verso to show all the lines that are not covered by any fragment (or put them in automatically generated everything_else fragment(s) — this will mostly be blank lines and a few comments), and similarly for recto to report all the fragments that were defined but weren't referenced.

    Edit: And just to echo my point about how good the design is: I am already able to achieve the latter with

    for i in $(verso src/lib.rs src/bin.rs | jq -r '.[].id'); do grep -qF "@@$i" docs-src/about.md || echo "$i"; done
    

Pipes

verso main.rs lib.rs | recto build chap1.tex chap2.tex blog/home.md

Pipes are pretty neat for that, I like it!

I guess the downside is that pipefail isn't on by default in most shells, so it might confuse the users if verso fails but recto succeeds nevertheless?

This won't be an issue at the moment I guess since if verso terminates early, you wouldn't have valid JSON emitted which would make recto fail anyway. So just keeping it here, perhaps it's good to emphasise that pipes are safe to use in this case so people don't have to worry about it.

Nice project.

Not really an issue, but simply a comment. I've been thinking of turning the documentation for my open-source project (located at https://trane-project.github.io/) into some sort of literate programming book, and I was brainstorming something that could work with mdbook, which is what I am using right now. I came up with something similar to this: something that preprocesses the documentation files and attaches the sections from the code.

Glad to see that it's already been made and fits my purposes. Still have to try it out, but it looks very promising.

cargo clippy reports several fixable warnings

cargo 1.65.0 with clippy 0.1.65 reports several fixable warnings when run against verso 0.1.2

lib.rs generates 7 warnings.
recto.rs generates 2 warnings.
verso.rs generates 1 warning.

All warnings can be fixed with the option --fix.
cargo test run after fixing shows no errors.

Annotating: only specify annotation id once?

It would be nice not to have to repeat the annotation ID in the closing 'tag'.
E.g.:

import sys

def main():
    #@!annotation_1
    print("Hello")
    print("World")
    #!@
    print("!")
    #@!annotation_2
    sys.exit(1)
    #!@

If annotations overlap, you can still specify the ID on the closing tag so that it's parsed correctly:

import sys

def main():
    #@!annotation_1
    print("Hello!")
    #@!annotation_2
    print("World")
    #!@annotation_1
    print("!")
    sys.exit(1)
    #!@annotation_2

Btw, #! is a shebang, so it might make people nervous seeing it all around their code! Maybe a different symbol instead of bang? :)

Add Command Line help to recto and verso

Whilst the readme for the crate verso is clear on the arguments that recto and verso take, it would be nice to support the argument --help or -h to output the usage of recto and verso.

This might be achieved by using the clap crate or a similar CLI argument parsing tool.

Configurable Block Open / Close character sequences?

I've been using verso with the ABAP programming language and have stumbled across a few cases where the block open character sequence @< is used in ABAP code. This right royally confuses recto / verso. I've not experienced collisions with other symbols.

Would there be interest in letting the block open and close sequences be read from an optional configuration file, falling back to the standard sequences when no configuration is present?

Identifiers cannot use all of the characters that docs say they can

Hello.

Documentation says: "The ID can be any string of non-whitespace characters".

But when I try to use IDs with "special" characters, like '/', the annotations are not parsed completely.

For example,

I have these IDs: api/definition and api/implementation.

But then when I run verso, it stops reading at the special character ('/' in this case):

Read annotation api
Read annotation api

but should have read:

Read annotation api/definition
Read annotation api/implementation
