vijithassar / lit
a little preprocessor for literate programming
License: The Unlicense
lit-node is intended for use only with JavaScript and thus requires you to add a language specifier after the fence backticks. This allows you to include code written in other languages in your Markdown file (notably, Bash commands for installation) without compiling that content down to the JavaScript output.
That idea doesn't quite work for lit, which is intended to be language agnostic, but it does point toward new functionality that might be worth implementing.
lit could allow you to specify from the command line interface the code block identifier to compile. So for example, to make lit behave like lit-node as described above, you might tell it only to extract code blocks that are identified as js:
$ ./lit.sh --identifier "js"
Simple enough.
To this, I'll also suggest a few extensions:
This identifier functionality should only be available when using lit over stdio. That's because the default multi-file "compile everything" behavior bases several aspects of its behavior (which files to process, what to name output files) on the file extensions of the input files. If with this feature the same Markdown input file could include both js and sh code, then it's not always clear from the filename handling alone whether the output file should be called .sh or .js. This also doesn't provide a way to parse the file extension in cases where you use the complete language name as an identifier (which is preferable practice since it is clearer, and more literate!).
```javascript
// code goes here
```
This could be solved by including a registry of languages which map to file extensions (and comment characters); docco does this, but I'd prefer to avoid a registry of "officially supported" languages because this tool is flexible enough that it doesn't actually need that.
In most cases where I have tested, syntax highlighting for GitHub Flavored Markdown only cares about the characters that immediately follow the backticks.
This code block is identified as JavaScript, obviously:
```js
// code goes here
```
If there are additional characters after the language identifier, they'll be ignored and syntax highlighting will still be triggered accurately. For example:
Still identified as JavaScript:
```js a
// code goes here
```
Also still identified as JavaScript:
```js a
// code goes here
```
This means that we can allow lit to extract longer (more literate?) identifiers that extend the usual simple language identifiers:
Some JavaScript for the Node.js server:
```js server
// code goes here
```
Some JavaScript for the client:
```js client
// code goes here
```
There's no limit to how deep you can go with this:
Old version of the program:
```js client stable
// code goes here
```
Experimental new version:
```js client prototype
// code goes here
```
This allows the same literate Markdown file to be compiled in different ways, routed to different output files. For example, to build the same Markdown file into executable files for client, server, and unit tests (remember, I propose requiring stdio for this new identifier flag, hence the pipe syntax):
$ cat app.md | ./lit.sh --identifier "js client" > client.js
$ cat app.md | ./lit.sh --identifier "js server" > server.js
$ cat app.md | ./lit.sh --identifier "js test" > tests.js
For maximum ridiculous flexibility, I'd suggest that --identifier might as well be plural --identifiers and contain a comma-separated string of multiple code block identifiers, all of which would be included in the output.
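As a rough illustration of the proposal, here is a minimal sketch of identifier-based extraction. It is not taken from lit.sh: the function name extract_blocks is hypothetical, and it assumes that a fence's identifier is everything after the three backticks and that identifiers must match exactly rather than by prefix.

```shell
# extract_blocks is a hypothetical helper, not part of lit.sh.
# Usage: extract_blocks "js client,js test" < app.md
extract_blocks() {
  awk -v ids="$1" '
    BEGIN {
      # Build a lookup table from the comma-separated identifier list.
      n = split(ids, list, ",")
      for (i = 1; i <= n; i++) wanted[list[i]] = 1
    }
    /^```/ {
      if (in_block) {
        # Closing fence: leave the block.
        in_block = 0
        keep = 0
      } else {
        # Opening fence: the text after the backticks is the identifier.
        id = substr($0, 4)
        sub(/[ \t]+$/, "", id)
        in_block = 1
        keep = (id in wanted)
      }
      next
    }
    # Print only the lines inside blocks whose identifier was requested.
    in_block && keep { print }
  '
}
```

Because matching is exact in this sketch, asking for "js" would not pull in blocks tagged "js client"; whether a real implementation should match by prefix instead is an open design question.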
When the --before/-b flags are used to comment out Markdown content, awk prints a space after the character that begins the comment. This is fine when there is a line to comment out:
# let's **create a variable** now
# ```python
x = 'hello world'
# ```
But awk also inserts the space when there is nothing to comment out, which is to say, for blank lines in the Markdown content:
# MY PROGRAM
#
# let's **create a variable** now
# ```python
x = 'hello world'
# ```
This is happening because awk has a built-in variable called OFS, for "output field separator," which is set to the space character by default.
Linters often complain about whitespace at the end of a line, and the trailing space on empty comment lines triggers exactly that warning. Whether this problem occurs will depend on your language, your linter, and your specific linter configuration, but prohibiting this in developer tooling is very common.
Instead of relying on the awk OFS variable to supply the space, OFS should be set to an empty string so as not to pad empty comment lines, and then the space can be explicitly added.
This will require measuring the length of the Markdown content being commented out. If the length is greater than zero characters, the space should be added after the comment character, before the Markdown content is appended. If the length is zero characters, then it's just an empty line, and the space should not be added.
This is related to the need to generally clean up the awk logic.
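The conditional padding described in this issue could look roughly like the sketch below. It is a standalone illustration rather than the actual lit.sh code: the function name comment_markdown and the variable marker are hypothetical, and it assumes code blocks are delimited by lines that start with three backticks.

```shell
# comment_markdown is a hypothetical helper, not part of lit.sh.
# Usage: comment_markdown "#" < program.py.md
comment_markdown() {
  awk -v marker="$1" '
    # Empty OFS: print inserts nothing between its arguments.
    BEGIN { OFS = "" }
    # Fence lines are Markdown, so they are commented out too.
    /^```/ { in_code = !in_code; print marker, " ", $0; next }
    # Code passes through untouched.
    in_code { print; next }
    # Non-empty Markdown line: add the space explicitly.
    length($0) > 0 { print marker, " ", $0; next }
    # Blank Markdown line: marker only, no trailing space.
    { print marker }
  '
}
```

With this approach the space is a deliberate part of the output rather than a side effect of the separator, so blank lines come out as a bare comment marker.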
I know the interface and the overall structure of the program are not complicated, but it would be really beneficial if lit.sh showed the available options and some information about how to use them when it is called without arguments, something like:
$ ./lit.sh
--------------------------------------------------------------------------------
| lit.sh: a simple preprocessor for literate programming
| Example:
| ./lit.sh dev/myfile.cpp.md -o src/
|
| List of Available options
| -H shows this screen
| -o --output specifies the output directory
| ...
|
| For bugs and suggestions: github.com/vijithassar/lit
| ...
-------------------------------------------------------------------------------
So far lit always assumes your intention is to build the Markdown source into executable code, but there are also cases where isolating the documentation is useful – publishing a user manual on the web site for a project, for example. Such efforts are very much in line with the goals of lit and literate programming, though the "last mile" problem of actually exposing the documentation is obviously best left to other tools.
For these scenarios, it'd be useful for lit to accept a command line flag which inverts its usual behavior, sending documentation to the output and ignoring the code, or perhaps even retaining it but commenting it out.
This should be relatively easy to implement – just offset the awk comparison by 1 to invert the blocks that are captured from the input. (That link points to the compiled output script rather than the literate source, because unfortunately GitHub has no way to link to a specific line of Markdown!)
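To make the inversion concrete, here is a minimal standalone sketch that keeps the prose and drops the code. It does not reproduce the comparison used in lit.sh; the function name extract_docs is hypothetical, and it assumes fences are lines starting with three backticks.

```shell
# extract_docs is a hypothetical helper, not part of lit.sh.
# Usage: extract_docs < app.js.md > manual.md
extract_docs() {
  awk '
    # Each fence line flips between prose and code; the fence is dropped.
    /^```/ { in_code = !in_code; next }
    # Print only the lines that fall outside code blocks.
    !in_code { print }
  '
}
```

Swapping the final condition to in_code would give the usual code-only behavior, which is why the change is expected to be small.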
I'd love to use this for work, but the current license is a bit of a barrier. Would you be willing to at least dual license this under MIT (or the Unlicense, or CC0)?
The awk portion of this tool began as an adaptation of @trauber's gist that was simply reformatted across multiple lines for the sake of clarity, but it has changed a bit now that it has the ability to comment out the Markdown instead of stripping it. This should be reworked and cleaned up, most notably to use native awk variables when possible instead of treating the awk program code as a string and using Bash variable expansion to change it.
This should be implemented before the identifiers proposal.
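For reference, the difference between the two styles can be sketched as follows. This is an illustration only, not the code in lit.sh; the function name prefix_lines and the variable marker are hypothetical. It relies on the standard awk -v option, which assigns a variable before the program runs.

```shell
# prefix_lines is a hypothetical helper, not part of lit.sh.
# Usage: prefix_lines "//" < notes.txt
prefix_lines() {
  # Fragile alternative, shown for contrast and not executed:
  #   awk "{ print \"$1 \" \$0 }"
  # There the shell rewrites the awk program text itself, so quotes or
  # backslashes in the value can change the meaning of the program.
  #
  # Native variable: the program stays a fixed single-quoted string and
  # the value arrives as data.
  awk -v marker="$1" '{ print marker " " $0 }'
}
```

One caveat: awk still interprets backslash escape sequences in values passed with -v, so a marker containing backslashes would need extra care.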
Currently lit always assumes it is processing GitHub Flavored Markdown documents in which code and prose are distinguished with triple backticks. Allowing the user to specify an alternative token with a command line argument would enable literate programming using other document formats. Parsing LaTeX and HTML would require a sort of "soft match" mode in which the presence of part of the token would trigger a match, so that e.g. specifying the token pre> could be used to match both opening <pre> and closing </pre>. (Supporting the tab-indented code formatting of non-GitHub original Markdown would require dramatically different logic throughout and as a result should not be considered part of this feature proposal.)
Adding this feature would make the input document format configurable just like the programming language, which furthers the goal of making prose and code equal partners.
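A "soft match" of this kind might be sketched as below. This is a hypothetical illustration, not an existing lit feature: the function name extract_by_token is invented, and it assumes each delimiter sits on its own line and that any line containing the token toggles between prose and code.

```shell
# extract_by_token is a hypothetical helper, not part of lit.sh.
# Usage: extract_by_token "pre>" < page.html
extract_by_token() {
  awk -v token="$1" '
    # Substring test: "pre>" is found in both "<pre>" and "</pre>".
    index($0, token) > 0 { in_code = !in_code; next }
    # Print only the lines between a pair of delimiter lines.
    in_code { print }
  '
}
```

Using index() rather than a regular expression means the token is treated as a literal string, so characters such as backslashes in LaTeX delimiters would not need regex escaping.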