NailIt
A quite minimal literate programming tool, capable of formatting code with explanation, linking to sections of code and backreferencing other sections of code.
The tool converts from a Markdown file into HTML, fit for reading online or printing, if you want.
The tool supports converting only one Markdown file at the moment, if you want to use multiple files you'll have to combine them somehow… cat *.md > onesource.md?
Also, the tool does not at the moment support appending code blocks "dynamically", so you have to uh… nail it I guess. It may be better that way, anyway—at least, in a "documentation" instead of "tutorial" setting.
Building
The repository is located at https://github.com/ZoomTen/nailit.
Requires Nim ≥1.6.x. The standard distribution should include the nimble tool, use nimble build to make a binary called nailit.
Usage
NailIt - a simple literate programming tool.
Usage:
nailit weave [--template=<template.html>] <source.md> [<out.html>]
nailit tangle <source.md> <destdir/>
nailit blocks <source.md>
nailit (-h | --help)
nailit --version
weave = generate a human-readable HTML document
from literate programs.
tangle = generate compileable source code from
literate programs.
blocks = see what blocks NailIt sees.
Literate program structure
Literate programs consist of code blocks and prose blocks. This tool accepts literate programs in the form of Markdown-formatted documents.
Code blocks are, well, the actual program source code. To make a code block, surround code with a single line of ``` before and after the code portion. The ``` line before the code can take one of four forms:
- ```: starts an unnamed block interpreted as plain text;
- ```lang: starts an unnamed block interpreted as code in lang;
- ```lang name of block: starts a block named name of block interpreted as code in lang;
- ``` name of block: starts a block named name of block interpreted as plain text. In this form, at least one space is needed before the block's name!
Code block names that start with a / will be interpreted as file output relative to the destdir specified when invoking nailit tangle.
Inside a code block, you can refer to other code blocks like so: @{Name of other code block}. They must live in its own line, with optional indentation. Indenting these references will add indentation to the inserted code block when tangling it, so you must keep that in mind when using whitespace-sensitive languages.
Prose blocks are paragraphs and other stuff around the code blocks that explain what the code does and why it does. They are formatted as as Markdown… Nim-flavored Markdown, at least. Because it uses Nim's own Markdown compiler, NailIt is relatively self-contained—you don't need an external tool for helping with the Markdown-to-HTML conversion, which is either a feature or a limitation depending on your point of view.
Limitations
(or, what similar programs might do that NailIt doesn't do)
- Code block references must live in its own line.
- No support for multiple source files.
- No support for creating multi-page output. Though, the single huge page could be made into one… if you print it to PDF through your browser :p
- No support for appending to code blocks, only replacing them (will output a warning).
- No support for syntax highlighting natively, although you may use a JavaScript-based solution like Prism.js.
- Does not automatically insert prose blocks inside of tangled code blocks as comments.
- Weaving and tangling must be done separately, e.g. nailit weave source.md source.html && nailit tangle source.md src/. There's not really a technical reason for this, I just want to know what I'm doing.
- No support for line numbers in the weaved output yet.
Design Considerations
- Allow code blocks to be written anywhere in the literate program and let NailIt compile them into a program that makes sense, per the Knuthian definition of literate programming.
- Allow flexibility in laying out a literate program.
- The files it reads should remain compatible with existing Markdown engines, like GitHub's renderer.
- Keep it simple and probably naive.
- Reference code block names directly in the output, rather than using section or index numbers, which should help readability by making it obvious what a code block is used in.
Styling Weaved Output
The body of the weaved output consists of HTML prose (not wrapped in anything… yet?) and code blocks, formatted like this:
<div id="codeblocktitle" class="code-block">
<header class="block-title">
<a href="#codeblocktitle">Code block title</a>
</header>
<pre>
<code class="cb-content">
Here's a bit of code...
</code>
<code class="cb-reference">
<a href="#someothercode">@ {Some other code}</a>
</code>
<code class="cb-content">
Here's another bit of code...
</code>
</pre>
<footer class="used-by">
Used by
<a href="#yetanotherpieceofcode">Yet another piece of code</a>,
<a href="#andanother">and another</a>
</footer>
</div>
The containing <div> will have language-* classes if a language is specified in the corresponding code block in the literate program. Additionally, the "used by" footer will not be present if a code block stands alone, not referenced by any other code block. And the header would disappear when using anonymous code blocks.
(yeah, I need to escape the @{code referencing} stuff… :\)
Source Code
This README contains NailIt's entire source code! However for convenience and bootstrapping, this repo also provides the sources generated off this README. It also serves as a practical explanation on what literate programs NailIt can process.
To make the compileable source code from this README, do:
nimble run -- tangle README.md .
To generate a literate program as HTML from this README, do:
nimble run -- weave README.md index.html
(The nimble run -- command is used here to make it more straight-forward, but you can instead build and just use ./nailit directly)
Entry point
The entry point to the program is about what you'd expect: Parse command line arguments, do stuff accordingly. The really nice docopt library is used to transform the command line help string into actual arguments the program can parse. The commands, at least, stay in-sync and self-documenting.
let args = """
@{command line arguments}
""".docopt(
version = "NailIt 0.2.0"
)
let blocks =
open($args["<source.md>"]).getBlocks()
if args["weave"].to_bool():
@{call weave command}
if args["tangle"].to_bool():
@{call tangle command}
if args["blocks"].to_bool():
@{call blocks command}
Blocks
Blocks are just text with attributes that make it either "part of the explanation" or "part of the code". Prose blocks are straight-forward, containing only content. Code blocks however, have additional metadata.
type
BlockType = enum
Prose
Code
Block = object
content: string
case kind: BlockType
of Code:
name: string
language: string
else:
discard
Parsing blocks from the document
Basically, parsing is done on a line-by-line basis. This function takes in a file input and spits out the list of blocks resulting from that file.
While parsing, the program looks for these specific patterns:
- codeBlockPtn scans for the start and end of code block definitions, in the 4 forms described earlier in this document.
- codeBlockRefPtn scans for code block references, like @{named block}
- codeBlockRefSpacesPtn is like codeBlockRefPtn, except it grabs whatever leading spaces are in it as well.
const
codeBlockPtn = re2"^```$|^```(\w+)$|^```(\w+)\s+(.+)$|^```\s+(.+)$"
codeBlockRefPtn = re2"(@\{(.+)\})"
codeBlockRefSpacesPtn = re2"(?m)^(\s*?)@\{(.+?)\}"
The two types of blocks in the markdown document live separately and cannot be nested, i.e. no code blocks in prose blocks and vice versa, no code blocks within code blocks, etc. On every line, when one of the code block patterns are found, a switch that asks "is the current block a code block?", is toggled.
The nature of this loop means that if a code block begins the document, it will come after an empty prose block. Not that it matters, anyway. Since the code block to be added is not actually inserted until it hits an ending ```, setting metadata for that code block is deferred.
The regex library I'm using expresses empty matches by having its begin index greater than the end index, but I wanna be lazy, so here's a helper function.
proc isEmptyMatch(s: Slice[int]): bool {.inline.} =
return (s.a > s.b)
Groups 2 (inside a named code block with language) and 3 (inside a named plain text code block) contain the name of the new block, so I'll check for both.
nextNameBuffer = (
if not (m.group(2).isEmptyMatch()): line[m.group(2)].strip()
elif not (m.group(3).isEmptyMatch()): line[m.group(3)].strip()
else: ""
)
As are the language identifier in groups 0 (inside an anonymous code block) and 1 (inside a named code block). Note here that group 0 really means the first group, and not "the entire match" as Python would have it.
nextLangBuffer = (
if not (m.group(0).isEmptyMatch()): line[m.group(0)]
elif not (m.group(1).isEmptyMatch()): line[m.group(1)]
else: ""
)
This helper function exists to handle things like spaces before and after the content, as well as potentially other issues should they come in the future.
proc addBlock(
blocks: var seq[Block],
parseAs: BlockType,
contentBuf: string,
nameBuf: string = "",
langBuf: string = ""
): void =
case parseAs
of Prose:
blocks.add Block(
kind: Prose,
content: contentBuf
)
of Code:
blocks.add Block(
kind: Code,
name: nameBuf,
content: (
@{trim spaces on either end of the content}
),
language: langBuf
)
Cleaning up block content
Here's where I trim the spaces. The final line is what will ultimately be the value for content. I do like how Nim lets me do this kinda thing.
var contentStripped = contentBuf
if contentStripped.len == 1:
contentStripped = ""
else:
if contentStripped[0] == '\n':
contentStripped = contentStripped[1 ..^ 1]
if contentStripped[^1] == '\n':
contentStripped = contentStripped[0 ..^ 2]
contentStripped
Weave
The weave command compiles an HTML page from a literate program.
First, the blocks are transformed into an HTML string of the entire contents using the weave function. Then, it is inserted into an HTML template using intoHtmlTemplate. This template is set via the option --template—although optional, as the command has a "default" template that it uses. If an output file (2nd argument) is not provided, the output will simply be in stdout.
let weaved = blocks.weave().intoHtmlTemplate(
inputTemplate = (
if args["--template"].kind == vkNone:
""
else:
open($args["--template"]).readAll()
),
title = $args["<source.md>"], # TODO
)
if args["<out.html>"].kind == vkNone:
echo weaved
else:
open($args["<out.html>"], fmWrite).write(weaved)
quit(0)
The weave function
Here's the function that turns the list of blocks processed earlier into an HTML string.
Counting code block references
A code block is usually referenced by other code blocks, so for every named code block I need to track how many times they're referenced or invoked in other code blocks. Just in case I need to show it.
for txblock in blocks:
case txblock.kind
of Code:
if not reflist.hasKey txblock.name:
reflist[txblock.name] = initCountTable[string](0)
of Prose:
discard
For each block I then add 1 to the reference count of each other code block referenced within this code block. Here I can also do some checking, warning you that you might have referenced a block that doesn't even exist at all.
for txblock in blocks:
case txblock.kind
of Code:
for m in txblock.content.findAll(codeBlockRefPtn):
let keyName = txblock.content[m.captures[1]]
# skip empty names
if keyName.len < 1: continue
if reflist.hasKey keyName:
reflist[keyName].inc txblock.name
else:
stderr.writeLine "WARNING: key " & keyName & " not found!"
of Prose:
discard
Generating the prose block HTML
Converting prose blocks to HTML is trivial: just use the rstToHtml function on the entire input and append it to the HTML. Although there is a bit of a quirk when the contents are not preceded with a blank line: the first paragraph will be text whereas the others would be surrounded in <p>. This can add pain to layout and styling, and so I've put a .. raw:: html hack to force the first paragraph to be surrounded in <p>.
let toParaHack = ".. raw:: html\n\n" & txblock.content
generatedHtml &=
toParaHack.rstToHtml(
{
roSupportMarkdown, roPreferMarkdown, roSandboxDisabled,
roSupportRawDirective,
},
modeStyleInsensitive.newStringTable(),
)
Generating the code block HTML
On the other hand, converting code blocks aren't so trivial. At minimum the code block needs to have escapes in order for them not to be interpreted as HTML code when I don't want it, which can lead to incorrect code displays. Then there's also the extra metadata that needs to be laid out so as to easily identify and navigate between them.
First I escape the common HTML characters, and then turn all code block references into links.
txblock.content
.replace("&", "&")
.replace("<", "<")
.replace(">", ">")
.replace(codeBlockRefPtn, nameAsLink)
The link-replacement is done by this helper function:
Every one of these links needs to refer to valid HTML identifiers, which, to make it consistent, I'll have to make a helper function to convert from the code block's name to a weird HTML identifier.
proc normalize(s: string): string =
return s
.replace("_","")
.replace(" ","")
.tolowerascii()
After I've converted the main content into something presentable, I can wrap it in an HTML container, having a title tab if the code block has a name, but a plain pre otherwise. In them I also add language information via a class, so that external tools or JavaScript would know what to do with them.
"<div class=\"code-block" & (
if txblock.language.strip() == "": ""
else: " language-" & txblock.language
) & "\" id=\"" & normName & "\">" &
"<header class=\"block-title\">" &
"<a href=\"#" & normName & "\">" & txblock.name & "</a>" &
"</header>" &
"<pre><code class=\"cb-content\">" &
escapedCode &
"</code></pre>"
"<div class=\"code-block" & (
if txblock.language.strip() == "": ""
else: " language-" & txblock.language
) & "\">" &
"<pre><code class=\"cb-content\">" &
escapedCode &
"</code></pre>"
The backlinks list take advantage of the whole block reference-counting thing from earlier. It can help navigate back and forth between sections of code, answering the question of "Hmm, where is this used?"
generatedHtml &= "<footer class=\"used-by\">Used by "
for i in reflist[txblock.name].keys:
let normI = i.normalize()
generatedHtml &=
"<a href=\"#" & normI & "\">" & i &
# " × " & $(reflist[txblock.name][i]) &
"</a> "
generatedHtml &= "</footer>"
Preparing the HTML output
What I have so far is the raw HTML of every block, now I just have to wrap it into a useable HTML document. And for this I'll want a template approach. The template must have both <!-- TITLE --> and <!-- BODY --> for it to be useable. If a template is not provided, it will just fall back onto a minimal, default one.
proc intoHtmlTemplate(weaved: string, inputTemplate: string = "", title: string = ""): string =
const defaultTemp = staticRead("default.html")
let temp = (
if inputTemplate.strip() == "": defaultTemp
else: inputTemplate
)
# <!-- TITLE --> is replaced with the source file name.
# <!-- BODY --> is replaced with the body of the document.
# The spellings need to exact.
return temp.replace("<!-- TITLE -->", title).replace("<!-- BODY -->", weaved)
This is the default HTML template, you can find it in the source under src/default.html. For styling, it assumes a css/screen.css and css/print.css to be available from the point of view of the rendered HTML file.
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title><!-- TITLE --></title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="css/screen.css" media="screen,projection,tv">
<link rel="stylesheet" href="css/print.css" media="print">
</head>
<body>
<!-- BODY -->
</body>
</html>
Tangle
Meanwhile, tangle here exports files from the literate program to make source code that can be compiled.
blocks.tangle(($args["<destdir/>"]))
quit(0)
The tangle function
This tangle function needs to do two things:
- Replace code block references with the actual code blocks.
- Save code blocks to files when it's warranted to do so.
Replacing code block references
First I'll want to go through every code block in document order and populate the code block mappings with the contents of their respective code blocks verbatim. There's no "append" feature, but there is a "replace" "feature" (no special syntax required), which will warn you when you're replacing a block.
for txblock in blocks:
case txblock.kind
of Code:
if txblock.name.len < 1: continue
if codeBlkMap.hasKey txblock.name:
stderr.writeLine "WARNING: replacing code block " & txblock.name
codeBlkMap[txblock.name] = txblock.content
of Prose:
discard
Then I'll go through the code block mappings again to replace the references with the actual content. Er, uh… this should probably be done recursively, but for small code stuff I think it works alright for now.
for codeBlk in codeBlkMap.mvalues: # :(
for _ in 0 .. codeBlk.findAll(codeBlockRefSpacesPtn).len: # :(
codeBlk = codeBlk.replace(codeBlockRefSpacesPtn, replaceReferencesWithContent)
The references are replaced in such a way that it retains the leading spaces used for the reference in every line of the replacement. For example, if a reference @{something} starts with 4 spaces, the entire thing to replace it will start every line with an additional 4 spaces. I think this can help in whitespace-sensitive languages by ensuring you don't accidentally change the indentation inside of a loop or something.
proc replaceReferencesWithContent(m: RegexMatch2, s: string): string =
let keyName = s[m.group(1)]
if codeBlkMap.hasKey keyName:
# indent each line with the same amount of spaces as
# the indentation of the references
let initialNLAndSpaces = s[m.group(0)]
if (
let initialSpaces = initialNLAndSpaces.replace("\n", "")
initialSpaces.len > 0
):
var
paddedCodeLines = initialSpaces
isInitialLine = true
for line in codeBlkMap[keyName].strip().splitLines():
if isInitialLine:
paddedCodeLines &= line & "\n"
isInitialLine = false
else:
paddedCodeLines &= initialSpaces & line & '\n'
return paddedCodeLines
return initialNLAndSpaces & codeBlkMap[keyName]
stderr.writeLine "WARNING: key " & keyName & " not found!"
return ""
Saving to files
NailIt will only save to files code blocks which start with a /. The / here means "your current working directory or your specified user directory."
for key in codeBlkMap.keys:
if key.len > 0 and key[0] == '/':
let outFileName = [dest, key[1 ..^ 1]].join($os.DirSep)
outFileName.parentDir.createDir()
outFileName.open(fmWrite).write(codeBlkMap[key])
stderr.writeLine "INFO: wrote to file " & outFileName.string
View Blocks
This blocks command is really just a debugging tool. It answers the question of "What does NailIt actually see when I give it my literate program?"
proc displayBlocks(blocks: seq[Block]) =
var num = 1
for b in blocks:
let blockTitle =
"Block " & (
case b.kind
of Prose: "P."
of Code: "C."
) & $num & (
case b.kind
of Prose: ""
of Code: " \"" & b.name & "\" (" & b.language & ")"
)
echo '-'.repeat(blockTitle.len)
echo blockTitle
echo '-'.repeat(blockTitle.len)
num += 1
echo b.content
echo '-'.repeat(blockTitle.len) & '\n'
Overall program structure
Finally, let's put this all together into the full code for the thing.
import regex
import std/[strutils, tables, strtabs, os]
import packages/docutils/[rst, rstgen]
import docopt