vijithassar / lit

a little preprocessor for literate programming

License: The Unlicense

Shell 99.77% Ruby 0.23%

bash shell documentation markdown preprocessor literate-programming

lit's Issues

dynamic selection of code blocks

lit-node is intended for use only with JavaScript and thus requires you to add a language specifier after the fence backticks. This allows you to include code written in other non-JavaScript languages in your Markdown file (notably, Bash commands for installation) without compiling that content down to the JavaScript output.

That idea doesn't quite work for lit, which is intended to be language agnostic, but it does point toward new functionality that might be worth implementing.

lit could allow you to specify from the command line interface the code block identifier to compile. So for example, to make lit behave like lit-node as described above, you might tell it only to extract code blocks that are identified as js:

$ ./lit.sh --identifier "js"

Simple enough.
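
A minimal sketch of how such filtering might work over stdio. The function name `extract_by_identifier` is hypothetical and not part of lit.sh, and matching the whole info string exactly is an assumption about the intended behavior:

```shell
# Hypothetical helper, not part of lit.sh: print only the contents of fenced
# blocks whose info string (the text after the opening fence) equals $1.
extract_by_identifier() {
  # Three backticks, written in octal so this example can sit inside a fence.
  fence=$(printf '\140\140\140')
  awk -v id="$1" -v fence="$fence" '
    index($0, fence) == 1 {
      if (in_block) { in_block = 0; keep = 0; next }  # closing fence
      in_block = 1
      info = substr($0, length(fence) + 1)            # text after the fence
      sub(/[ \t]+$/, "", info)                        # drop trailing whitespace
      keep = (info == id)                             # exact match (assumption)
      next
    }
    in_block && keep { print }
  '
}
```

With this sketch, `cat app.md | extract_by_identifier "js"` would emit only the blocks fenced as js and skip everything else, including blocks in other languages.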

To this, I'll also suggest a few extensions:

stdio

This identifier functionality should only be available when using lit over stdio. That's because the default multi-file "compile everything" behavior bases several of its decisions (which files to process, what to name output files) on the file extensions of the input files. With this feature, the same Markdown input file could include both js and sh code, so the filename alone would no longer determine whether the output file should be called .sh or .js. Filename handling also provides no way to derive the file extension when you use the complete language name as an identifier (which is preferable practice since it is clearer, and more literate!).

```javascript
// code goes here
```

This could be solved by including a registry of languages which map to file extensions (and comment characters); docco does this, but I'd prefer to avoid a registry of "officially supported" languages because this tool is flexible enough that it doesn't actually need that.

Extended Identifiers

In most cases where I have tested, syntax highlighting for GitHub Flavored Markdown only cares about the characters that immediately follow the backticks.

This code block is identified as JavaScript, obviously:
```js
// code goes here
```

If there are additional characters after the language identifier, they'll be ignored and syntax highlighting will still be triggered accurately. For example:

Still identified as JavaScript:
```js a
// code goes here
```

Also still identified as JavaScript:
```js a
// code goes here
```

This means that we can allow lit to extract longer (more literate?) identifiers that extend the usual simple language identifiers:

Some JavaScript for the Node.js server:
```js server
// code goes here
```

Some JavaScript for the client:
```js client
// code goes here
```

There's no limit to how deep you can go with this:

Old version of the program:
```js client stable
// code goes here
```

Experimental new version:
```js client prototype
// code goes here
```

This allows the same literate Markdown file to be compiled in different ways, routed to different output files. For example, to build the same Markdown file into executable files for client, server, and unit tests (remember, I propose requiring stdio for this new identifier flag, hence the pipe syntax):

$ cat app.md | ./lit.sh --identifier "js client" > client.js
$ cat app.md | ./lit.sh --identifier "js server" > server.js
$ cat app.md | ./lit.sh --identifier "js test" > tests.js

For maximum ridiculous flexibility, I'd suggest that --identifier might as well be plural --identifiers and contain a comma-separated string of multiple code block identifiers, all of which would be included in the output.
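
The plural form could be sketched as below. The function name `extract_by_identifiers` is hypothetical, and tolerating spaces around the commas is an assumption rather than something the proposal specifies:

```shell
# Hypothetical helper: $1 is a comma-separated list of block identifiers;
# the contents of every block matching any of them are printed in order.
extract_by_identifiers() {
  # Three backticks, written in octal so this example can sit inside a fence.
  fence=$(printf '\140\140\140')
  awk -v ids="$1" -v fence="$fence" '
    BEGIN {
      n = split(ids, list, ",")
      for (i = 1; i <= n; i++) {
        id = list[i]
        gsub(/^[ \t]+|[ \t]+$/, "", id)   # tolerate spaces around commas
        wanted[id] = 1
      }
    }
    index($0, fence) == 1 {
      if (in_block) { in_block = 0; keep = 0; next }  # closing fence
      in_block = 1
      info = substr($0, length(fence) + 1)
      sub(/[ \t]+$/, "", info)
      keep = (info in wanted)
      next
    }
    in_block && keep { print }
  '
}
```

For example, `cat app.md | extract_by_identifiers "js client, js test"` would combine the client and test blocks into one output stream.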

insert spaces after inline comments explicitly without awk OFS

When the --before/-b flags are used to comment out Markdown content, awk prints a space after the character that begins the comment. This is fine when there is a line to comment out:

# let's **create a variable** now
# ```python 
x = 'hello world'
# ```

But awk also inserts the space for line breaks when there is nothing to comment out, which is to say, for line breaks in the Markdown content:

# MY PROGRAM
# 
# let's **create a variable** now
# ```python
x = 'hello world'
# ```

This is happening because awk has a built-in variable called OFS, for "output field separator," which is set to the space character by default.

Linters often complain about trailing whitespace, and the space after the comment character on an otherwise empty comment line counts. Whether this problem occurs will depend on your language, your linter, and your specific linter configuration, but prohibiting trailing whitespace in developer tooling is very common.

Instead of relying on the awk OFS variable to supply the space, OFS should be set to an empty string so as not to pad empty comment lines, and then the space can be explicitly added.

This will require measuring the length of the Markdown content being commented out. If the length is greater than 0 characters, then the space should be added before the Markdown content is appended to the comment character. If the length of the Markdown content is zero characters, then it's just an empty line, and the space should not be appended.
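
A rough sketch of that length check, assuming the comment character arrives as an awk variable. The function name `comment_out_markdown` and the variable `marker` are hypothetical and do not reflect how lit.sh is currently written:

```shell
# Hypothetical helper: comment out Markdown lines with the marker in $1,
# pass code through untouched, and never pad empty comment lines.
comment_out_markdown() {
  # Three backticks, written in octal so this example can sit inside a fence.
  fence=$(printf '\140\140\140')
  awk -v marker="$1" -v fence="$fence" '
    BEGIN { OFS = "" }                 # no implicit space between print arguments
    index($0, fence) == 1 {            # fence lines are Markdown, so comment them
      in_block = !in_block
      print marker, " ", $0
      next
    }
    in_block { print; next }           # code passes through unchanged
    length($0) > 0 { print marker, " ", $0; next }  # explicit space before content
    { print marker }                   # empty Markdown line: marker only
  '
}
```

Run as `comment_out_markdown "#" < program.py.md`, the blank line in the second example above would come out as a bare `#` with nothing after it.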

This is related to the need to generally clean up the awk logic.

Echo available options for help

I know the interface and the overall structure of the program are not complicated, but it would be really helpful if lit.sh showed the available options, and some information about how to use them, when it is called without arguments. Something like:

$ ./lit.sh
--------------------------------------------------------------------------------
| lit.sh: a simple preprocessor for literate programming
| Example:  
| ./lit.sh dev/myfile.cpp.md -o src/
|
| List of Available options
| -H                     shows this screen
| -o --output            specifies the output directory
| ...
|
| For bugs and suggestions: github.com/vijithassar/lit
|  ...
-------------------------------------------------------------------------------
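
One way this could be wired up, as a sketch only. The option names are copied from the mock-up above and are not confirmed lit.sh flags, and `usage` and `main` are hypothetical function names:

```shell
# Hypothetical sketch: the options listed are taken from the mock-up in this
# issue, not from the real lit.sh interface.
usage() {
  cat <<'EOF'
lit.sh: a simple preprocessor for literate programming

Example:
  ./lit.sh dev/myfile.cpp.md -o src/

Options:
  -H             show this screen
  -o, --output   specify the output directory
EOF
}

# Print the help text when the script is invoked without arguments.
main() {
  if [ "$#" -eq 0 ]; then
    usage
    return 0
  fi
  # ... normal argument handling would go here ...
}
```

The script would then end with `main "$@"` so that a bare `./lit.sh` prints the help text instead of doing nothing.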

output documentation instead of code

So far lit always assumes your intention is to build the Markdown source into executable code, but there are also cases where isolating the documentation is useful, for example when publishing a user manual on a project's web site. Such efforts are very much in line with the goals of lit and literate programming, though the "last mile" problem of actually exposing the documentation is obviously best left to other tools.

For these scenarios, it'd be useful for lit to accept a command line flag which inverts its usual behavior, sending documentation to the output and ignoring the code, or perhaps even retaining it but commenting it out.

This should be relatively easy to implement – just offset the awk comparison by 1 to invert the blocks that are captured from the input. (That link points to the compiled output script rather than the literate source, because unfortunately GitHub has no way to link to a specific line of Markdown!)
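
A sketch of the inverted behavior. It uses a simple in/out toggle rather than the offset comparison mentioned above, since that logic lives in lit's own awk program, and the function name `extract_prose` is hypothetical:

```shell
# Hypothetical helper: print the documentation and drop the code, which is
# the inverse of the usual behavior.
extract_prose() {
  # Three backticks, written in octal so this example can sit inside a fence.
  fence=$(printf '\140\140\140')
  awk -v fence="$fence" '
    index($0, fence) == 1 { in_block = !in_block; next }  # drop fence lines
    !in_block { print }                                   # keep prose only
  '
}
```

Retaining the code as comments instead of dropping it would mean printing the skipped lines behind a comment marker rather than discarding them.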

Add a proper license

I'd love to use this for work, but the current license is a bit of a barrier. Would you be willing to at least dual license this under MIT (or the Unlicense, or CC0)?

clean up awk program

The awk portion of this tool began as an adaptation of @trauber's gist that was simply reformatted across multiple lines for the sake of clarity, but it has changed a bit now that it has the ability to comment out the Markdown instead of strip it. This should be reworked and cleaned up, most notably to use native awk variables when possible instead of treating the awk program code as a string and using Bash variable expansion to change it.
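
To illustrate the difference, here is a small contrast between the two styles. Both function names are hypothetical and neither is taken from lit.sh:

```shell
# Interpolated style: the shell variable is spliced into the awk program text,
# so the program itself changes with the value and quoting gets fragile.
comment_interpolated() {
  awk "{ print \"$1 \" \$0 }"
}

# Native style: the awk program is a fixed single-quoted string and the value
# is passed in with -v, so it is treated as data instead of code.
comment_with_var() {
  awk -v marker="$1" '{ print marker " " $0 }'
}
```

Both produce the same output for a plain marker such as `//`, but only the second keeps the awk source readable and constant as more options are added.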

This should be implemented before the identifiers proposal.

configurable token to demarcate code blocks

Currently lit always assumes it is processing GitHub Flavored Markdown documents in which code and prose are distinguished with triple backticks. Allowing the user to specify an alternative token with a command line argument would enable literate programming using other document formats. Parsing LaTeX and HTML would require a sort of "soft match" mode in which the presence of part of the token would trigger a match, so that e.g. specifying the token pre> could be used to match both opening <pre> and closing </pre>. (Supporting the tab-indented code formatting of non-GitHub original Markdown would require dramatically different logic throughout and as a result should not be considered part of this feature proposal.)
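
The "soft match" idea might look like the sketch below. It assumes each opening and closing delimiter sits on its own line, and the function name `extract_between_token` is hypothetical:

```shell
# Hypothetical helper: any line that contains the token in $1 toggles between
# prose and code, so one token can match both an opening and a closing tag.
extract_between_token() {
  awk -v token="$1" '
    index($0, token) > 0 { in_block = !in_block; next }  # delimiter line
    in_block { print }                                   # code between delimiters
  '
}
```

For instance, `extract_between_token "pre>" < page.html` would treat both `<pre>` and `</pre>` lines as delimiters. Code that shares a line with its delimiter would be dropped, which a fuller implementation would need to handle.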

Adding this feature would make the input document format configurable just like the programming language, which furthers the goal of making prose and code equal partners.
