Update: 2014.04.08. Updated to match stable release of MarkdownTools 1.0. Install commands changed and mdmerge help output updated.

Markdown is, in certain circles, the way to create text content for electronic publication. It works very well if you are targeting a blog post, which makes sense because that is what it was created for back in 2004 by John Gruber.¹ Indeed, this very blog post was written in Markdown². But it can also be used for longer form works.

Some background

The name Markdown is a play on the term markup as used in HyperText Markup Language (HTML), or more impressively in Extensible Markup Language (XML), or more ominously in Standard Generalized Markup Language (SGML). Those markup standards are complex, verbose, fiddly, and decorate the text with all manner of markup tags, elements, attributes, processing instructions, and so on. Markdown, in contrast, reads very nearly like just the straight text. It does contain some formatting widgets, which honestly must be called markup, but it uses the minimal amount necessary to achieve its formatting goals. HTML, XML, and SGML are optimized to be read by machines; Markdown is optimized to be read by humans.

Markdown achieves its goal of readability and simplicity in large part by constraining the domain of what it can produce. If you are trying to produce a text document that contains headings, paragraphs, bolded or italicized text, bulleted lists, numbered lists, with some embedded hyperlinks, and perhaps some quoted text or source code examples, then Markdown will work very well. If you need to do a lot more than that – particularly if you need to generate HTML with particular characteristics like semantic element tags (e.g., article, section, cite) or identify parts of the content with particular CSS classes – then Markdown won’t be sufficient³.

There are extensions to the Markdown syntax that fill in some gaps. For example, one mistake that Gruber made in his specification was that text lines that end with two or more spaces will force a line break there. I think he was trying to figure out how to format text where line breaks are important (such as poetry); but having invisible spaces at the end of a line be meaningful to the formatting tool breaks the rule about Markdown being human readable. GitHub Flavored Markdown (GFM) instead says that end of line spaces are not meaningful but line breaks are always meaningful. If you want a long paragraph to be formatted with line wrapping, then you need to make it one long line with no hard line feeds in it. If you want a line break in your poem, just hit the return key at the end of each line.

The most powerful extension, in my opinion, is MultiMarkdown, written by Fletcher Penney. This adds support for a number of things missing from the classic Markdown specification. It supports definition lists, tables, footnotes, math formulas, and permits the insertion of <head> content in HTML output (so you can include links to CSS and JavaScript files). Pretty much everything you need to write an academic paper using Markdown syntax.

There are other extensions and syntax variants out there. At the moment there is no agreed upon minimal syntax and no agreement on the syntax for extended features. For example, GFM breaks lines at line feed characters and MultiMarkdown joins adjacent text lines (just like the original Markdown). MultiMarkdown has support for definition lists; GFM does not. There is still a lot of innovation going on in the Markdown community, so perhaps the time is not yet here for standardization; but I think the time is near.

If you are not familiar with Markdown, then it would be worth your while to spend just a few minutes, certainly no more than an hour, reading about Markdown and MultiMarkdown, and experimenting with it a bit. Here are some places to start:

classic (John Gruber’s Markdown syntax)
MultiMarkdown (Fletcher Penney’s syntax, MultiMarkdown version 4)
GFM (GitHub Flavored Markdown)

All other things being equal, I recommend using MultiMarkdown from the command line, and just being familiar with the differences between it and other popular variants.

Multi-file Documents

As I said in the opening paragraph, Markdown text excels when the goal is to produce a single blog post or even a short website article or small essay. But for any larger work, using a single file for the text input becomes unwieldy very quickly. Scrolling up and down a long text file can be frustrating and is certainly inefficient. Producing something like a book or multi-section academic paper with a single text file would be a nightmare.

People have handled this by simply concatenating files together before feeding the combined text into a Markdown processor to produce the HTML (or PDF, or whatever).

At some point Fletcher Penney wrote a short Perl script called mmd_merge.pl that takes a plain text index file containing a list of the Markdown files and concatenates them; the twist is that if the filename line is indented by 4 spaces then the heading level in the Markdown is incremented by 1, and if indented by 8 spaces, then headings are incremented by 2, and so on – so the file list can be an outline of the document structure. For quite a while, this index approach was the best way we had for managing a multi-file Markdown document.
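To make the indentation rule concrete, here is a minimal Python sketch of the idea. It is not Penney's script and not how mdmerge is implemented; the function names (`shift_headings`, `merge_index`) and the `read_file` callback are my own inventions for illustration. It assumes ATX-style headings (lines starting with `#`) and, for brevity, does not try to skip `#` lines inside code blocks.

```python
def shift_headings(text, levels):
    """Deepen every ATX heading in `text` by `levels` (adds that many '#')."""
    if levels == 0:
        return text
    shifted = []
    for line in text.splitlines():
        # Simplification: any line starting with '#' is treated as a heading.
        if line.startswith("#"):
            line = "#" * levels + line
        shifted.append(line)
    return "\n".join(shifted)


def merge_index(index_text, read_file):
    """Concatenate the files listed in an index, one filename per line.

    Each 4 spaces of indentation in front of a filename pushes that file's
    headings one level deeper. `read_file` is a hypothetical callback that
    maps a filename to its text, so the sketch can run without touching disk.
    """
    parts = []
    for raw in index_text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines in the index
        indent = len(raw) - len(raw.lstrip(" "))
        levels = indent // 4
        parts.append(shift_headings(read_file(raw.strip()), levels))
    return "\n\n".join(parts)
```

With an index whose second line is indented by 4 spaces, the top-level heading of that second file becomes a second-level heading in the merged text, which is what lets the index double as an outline.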

Taking another approach, a company called Leanpub introduced syntax for an embedded include statement within Markdown text. They had one kind of statement for including a file as normal Markdown, and another kind of statement for including a file as “code”, that is, as unformatted plain text. And, importantly, the includes could nest; a file included normally (not as code) could itself include other files, and those files could include others, and so on. Leanpub also supports the index file mechanism (they call it a “book file”, and the file is named book.txt); it is similar to mmd_merge but without the indentation structure.
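The nesting behaviour is easy to picture as a small recursive function. The Python sketch below uses made-up directives, `@include path` and `@include-code path`, purely as placeholders; they are not the Leanpub, Marked, or MultiMarkdown syntax, and `expand` and `read_file` are hypothetical names. The point is only the shape of the logic: normal includes are expanded recursively, code includes are inserted verbatim and never scanned for further includes.

```python
def expand(name, read_file, _stack=()):
    """Return the text of `name` with include directives expanded.

    `read_file` is a hypothetical callback mapping a filename to its text.
    `_stack` tracks the chain of files being expanded to detect cycles.
    """
    if name in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (name,)))
    out = []
    for line in read_file(name).splitlines():
        stripped = line.strip()
        if stripped.startswith("@include-code "):
            target = stripped[len("@include-code "):].strip()
            # Code is inserted as-is, indented 4 spaces so Markdown renders
            # it as a code block; it is not searched for nested includes.
            out.extend("    " + code for code in read_file(target).splitlines())
        elif stripped.startswith("@include "):
            target = stripped[len("@include "):].strip()
            # Normal includes may themselves contain includes, so recurse.
            out.append(expand(target, read_file, _stack + (name,)))
        else:
            out.append(line)
    return "\n".join(out)
```

The cycle check is my own addition; without it, two files that include each other would recurse until Python gives up.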

Very recently MultiMarkdown introduced its own style of embedded include statements, called file transclusions. I think the use of “transclusion” is probably a nod to Ted Nelson who coined the term back in 1982, as well as the terms hypertext and hypermedia, and who thought a lot about structuring information and interactions with references and dereferences (a.k.a. hyperlinks), long before the first web server was created.

Markdown viewers

Although creating content with Markdown is much easier than doing it directly in HTML⁴ (or, for the very unlucky, in XML or SGML), there is still the issue that you don’t really see what you are writing in published form. It is not WYSIWYG. And although Markdown does look awfully similar to plain text, it is really a markup language in that it contains formatting indicators embedded with the content. And all that means that you have to run the Markdown text through a processor to produce HTML and then view the HTML in a browser to see if what you thought you wrote is what the readers will actually see.

For a long time I had a setup where I had three windows open: a text editor in which I made changes to the Markdown text, a terminal window in which I periodically ran a simple script to process the Markdown text and produce an HTML document, and a browser window displaying that HTML document. Periodically I’d save the Markdown, run the script, and refresh the browser window to see what the final document looked like. Sometimes the formatting would be not what I expected and I’d go back and fiddle a bit with the Markdown and then repeat the save, run, refresh sequence until the document in the browser window looked correct.

Last year, 2013, that changed. I’m fortunate to do a lot of my writing on a Mac (on OS X). There is an application available for the Mac called Marked that eliminates most of those steps. It displays the formatted output in a window and also watches the Markdown file. If the file changes (i.e., if I save the Markdown file), then Marked reprocesses the file and updates the formatted document. It even scrolls to the first changed line. And it has some nice CSS stylesheets appropriate for Markdown-produced content, or you can supply your own CSS.

Marked is available in the Mac App Store. But there is a new version out called Marked 2; it is not available in the Mac App Store, but instead is sold directly by the developer, Brett Terpstra⁵, for all of $12⁶. This version supports index files (like Leanpub’s book.txt) and include files (using Leanpub syntax and similar⁷). That is huge, because I can now work with multi-file documents and see the complete document, formatted, in a viewer window. And Marked identifies which source document contributed to a particular bit of formatted text (use shift-I for that info).

Introducing mdmerge: A Multi-Document CLI tool

The Markdown environment and toolchain have matured to the point where longer form documents can be produced using Markdown as the content source.

Well, that’s not quite true.

What has been missing is a way of taking Markdown with index files and/or embedded include statements and processing it in a command script such as would be used in most automated publishing workflows or automated software build procedures. So, although I can view a combined document in Marked 2, I can’t generate the equivalent HTML from the command line (within, for example, the script I use to generate the HTML for this blog post).

I looked for a command line interface (CLI) tool that would take the same set of include file syntax that Marked 2 supports but found none. So I wrote one.

The tool is called mdmerge and is available for any system running Python 2.6 or later, or Python 3.3 or later⁸. If you have a Python 3 environment, use pip install MarkdownTools to install mdmerge; if you have a Python 2 environment then use pip install MarkdownTools2 to install mdmerge. The source code can be found on GitHub at: https://github.com/JeNeSuisPasDave/MarkdownTools.

mdmerge supports all the include statement syntax variants that Marked 2 supports and it also supports MultiMarkdown file transclusions. It supports both mmd_merge style index files and Leanpub style index/book files. Follow these links for details on the syntax of each form:

Here is the output from the command line help:

usage: mdmerge [-h] [--version]
               [--export-target {html,latex,lyx,opml,rtf,odf}]
               [--ignore-transclusions] [--just-raw] [--leanpub] [--book]
               [-o OUTFILE]
               [inFile [inFile ...]]

Concatenate and include multiple markdownfiles into a single file

positional arguments:
  inFile                One or more files to merge, or just '-' for STDIN. If
                        multiple files are provide then they will be treated
                        as if they were listed in an index file.

optional arguments:
  -h, --help            show this help message and exit
  --version             show the software version
  --export-target {html,latex,lyx,opml,rtf,odf}
                        Guide include file wildcard substitution
  --ignore-transclusions
                        MultiMarkdown transclusion specifications are
                        untouched
  --just-raw            Process only raw include specifications
  --leanpub             Any file called 'book.txt' will be treated as an index
                        file
  --book                Treat STDIN as an index file
  -o OUTFILE, --outfile OUTFILE
                        Specify the path to the output file

Given mdmerge, Marked 2, and a recent OS X system, there now is a complete Markdown toolchain for publication of long form documents from multiple Markdown source files.

¹ I assert that Markdown was heavily influenced by the syntax used for wiki page source. The wiki was invented in 1994-5 by Ward Cunningham and it included very simple markup much like Markdown. I’ve been running wikis at my workplace since the late 90s. Though I’ve never seen or heard Gruber acknowledge it, the influence of wiki page syntax on Markdown syntax is clear. The novelty is that Markdown is simpler than most wiki syntaxes, and Markdown was designed to be used in an offline publication toolchain, rather than in an interactive wiki page editor.
² Strictly speaking, the post was written in MultiMarkdown syntax, an extension to the original Markdown syntax.
³ Using Markdown along with some special processing applications might achieve that goal without abandoning the simplicity of creating content with Markdown text.
⁴ As a counter-example, John Siracusa famously writes directly in HTML and creates very long form content doing so. http://5by5.tv/hypercritical/33
⁵ Author of many useful utilities for writers and developers working on an OS X or iOS system.
⁶ Pricing at the time of this writing, 16 March 2014.
⁷ At the time of this writing the MultiMarkdown file transclusion syntax is not supported in Marked 2. I expect that it will be added, but do not know when.
⁸ Python 3 is not fully backward compatible with Python 2, so we are stuck with saying awkward things like that until such time as the transition to Python 3 (away from Python 2) is complete. Which may be another 10 years in coming; the latest version of OS X comes with Python 2.7 installed.