Workflow: Markdown to WordPress

My inaugural post explained how I generated WordPress-friendly HTML from MultiMarkdown input. I’ve since improved the workflow.

I found some additional formatting issues with the generated HTML. One was that every code block in the generated output included a newline between the last line of code and the terminating </code> tag. Another was an extra newline or two at the end of the file.

I also found that the footnote references were a problem when multiple posts were displayed on a single page (like on the home page of this blog) because there would be multiple #fn:1 references on a single web page.

And, finally, I got tired of opening the editor, selecting lines and copying them into the clipboard.

So I’ve now created a couple of scripts that automate everything from taking the Markdown (or MultiMarkdown) input to putting the WordPress-friendly HTML fragment¹ into the clipboard². You can find the scripts on GitHub in my collection of useful Bash scripts.

Interesting bits

Most of the Bash and Perl script code is mundane. There are sufficient comments to explain the intent of each bit of code. However, there are a few parts of the Bash script that might be of interest.

One is the way the actual directory of the shell script is determined. That is needed so that the actual directory of the Perl script can be determined. (The Perl script is expected to be in the same directory as the shell script, but the shell script could be invoked from any working directory – e.g., ../../produce-wp.sh -pb mypost.)

Here is the code to determine the path of the shell script from within the shell script:

# Get the actual directory of the produce script so that we
# can find the munge-wp.pl script
#
DIR_=$( cd "$( dirname "$0" )" && pwd )

I found that bit of trickery in a Stack Overflow question. The construct $(...) runs the commands inside the parens in a subshell and substitutes their output for the whole $(...) expression. $0 should be the name of the script as it was invoked, which might include a relative or absolute directory prefix. dirname removes the script name from that path, cd changes the working directory to that directory, and pwd prints the full path of that directory. Because the cd happens in a child shell, which exits when the $(...) finishes, the cd doesn’t affect the parent script.
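To see that the cd really stays inside the $(...), here is a minimal sketch. The temporary directory and the example.sh path are made up for the demonstration; they are not part of the actual scripts.

```shell
# Build a throwaway directory tree to stand in for a script location.
# (tmp_root and script_path are hypothetical names for this demo only.)
tmp_root=$( mktemp -d )
mkdir -p "$tmp_root/bin"
script_path="$tmp_root/bin/../bin/example.sh"

# Remember where the parent shell is before resolving the directory.
before=$PWD

# Same construct as in the produce script, with script_path standing in for $0.
script_dir=$( cd "$( dirname "$script_path" )" && pwd )

# The parent shell's working directory should be exactly what it was.
after=$PWD

echo "script dir: $script_dir"
if [ "$before" = "$after" ]; then
    echo "working directory unchanged"
fi

# Clean up the throwaway tree.
rm -rf "$tmp_root"
```

The resolved path comes back absolute and without the ../ detour, while the calling shell never leaves its own directory.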

A less interesting part is this code:

# Determine the footnote distinguishing hex byte
#
FNHEX_=$( printf '%s' "$FN_" | md5 | sed 's/^..*\(..\)$/\1/' )

which is used to get a couple of hex characters from the input filename. Those characters are then used (by the Perl script) to help make the footnote references unique.³ So a footnote reference id of "fn:1" might get converted to "fn:f7:1" for one file and "fn:a4:1" for another.
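The whole idea can be sketched in the shell alone. This is only an illustration under assumptions: the real rewriting is done by the Perl script, whose code isn't shown here, so the sed substitution below stands in for it. hex_tag is a hypothetical helper name, and the md5sum fallback is an assumption for systems that don't have the md5 command.

```shell
# hex_tag is a hypothetical helper: print the last two hex characters of
# the MD5 digest of its argument. The md5sum fallback is an assumption for
# systems that lack the md5 command.
hex_tag() {
    if command -v md5 >/dev/null 2>&1; then
        digest=$( printf '%s' "$1" | md5 )
    else
        digest=$( printf '%s' "$1" | md5sum | cut -d' ' -f1 )
    fi
    # Keep only the final two characters of the digest.
    printf '%s\n' "$digest" | sed 's/^.*\(..\)$/\1/'
}

# Derive the tag from a sample post name.
tag=$( hex_tag "mypost" )

# Insert the tag into every fn:N id in a sample HTML fragment.
sample='<a href="#fn:1">1</a>'
tagged=$( printf '%s\n' "$sample" | sed "s/fn:\([0-9][0-9]*\)/fn:${tag}:\1/g" )

echo "$tagged"
```

The same post name always yields the same tag, so regenerating a post doesn't change its footnote ids.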

The mechanism for making footnote references unique works well in practice, but it is not guaranteed to avoid every collision. Other than that, the output resolves all the formatting issues I’ve uncovered to date. ↩
On OS X the clipboard is called the pasteboard … at least in the Terminal environment. ↩
This works because the filename for each post is expected to be unique, and because md5 introduces a reasonable amount of entropy into the hex string it generates. If I find collisions, I could change this to take the last four characters instead of just two, which would significantly reduce the chance of footnote reference collisions. ↩