I was fascinated to read about 'literate programming', the idea of writing code so well commented that it serves as its own manual. Since I've been doing a lot of blogging about bits and pieces of Clojure code these last few days, I decided to write a utility that can convert a well-documented Clojure file into its own blog entry.
This is the result of running that utility on itself. Call it self-blogging code.
Since I'm blogging in HTML, the code takes an input Clojure file and transforms it into an output HTML file with syntax coloring of the code bits. I can then copy the HTML from the output file into my blog.
Step 1. I want to be able to use this utility from the command line (btw, check out JLine for adding bash-like command-line editing on unixy systems). So in true command-line fashion, I define a function that prints usage information for the utility:
(defn usage []
  (println "usage: html-transform <input clojure file> <output-html-file>"))
Step 2. I check at runtime that I get both input parameters (input Clojure file and output HTML file); otherwise I print the usage and quit the utility:
(if (< (count *command-line-args*) 2)
  (do
    (usage)
    (System/exit -1)))
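The same check can be written as a small pure function, which is easier to try at the REPL than something that calls System/exit. This is only a sketch: `parse-args` is a hypothetical helper, not part of the utility itself.

```clojure
;; Hypothetical helper: returns a map with the two file names,
;; or nil when fewer than two arguments were supplied.
(defn parse-args [args]
  (when (>= (count args) 2)
    {:input-file  (nth args 0)
     :output-file (nth args 1)}))

(parse-args ["in.clj" "out.html"]) ; => {:input-file "in.clj", :output-file "out.html"}
(parse-args ["in.clj"])            ; => nil
```

A nil result would then be the cue to print the usage and exit.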
Step 3. I define input and output file according to the command-line arguments:
(def input-file (nth *command-line-args* 0))
(def output-file (nth *command-line-args* 1))
(def input (java.io.InputStreamReader. (java.io.FileInputStream. input-file)))
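As an aside, the reader above uses the platform's default encoding and is never closed. If that matters for your files, one possible alternative is `clojure.java.io/reader` with an explicit encoding inside `with-open`. The sketch below assumes UTF-8 input, and `read-source` is a hypothetical name of mine, not something the utility defines.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: read a whole file as UTF-8 text and close
;; the reader when done.
(defn read-source [path]
  (with-open [rdr (io/reader path :encoding "UTF-8")]
    (slurp rdr)))
```

The rest of this post keeps the character-by-character reader, since the parsing loop consumes one character at a time.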
Step 4. Now, I'm going to define certain 'special' words that will be rendered differently from other text. 'Reserved' words are Clojure forms that are common enough to warrant their own special color. 'Definition' words are the forms that introduce user-defined values, functions and macros; the name that follows one of them will be rendered in bold letters to highlight it. Finally, 'punctuation' words are opening and closing parentheses, brackets and curly braces, plus the backquote, quote and ampersand.
(def reserved #{"def" "defn" "defmacro" "let" "letrec" "if" "cond" "when" "do" "recur" "loop"})
(def definitions #{"def" "defn" "defmacro"})
(def punc #{"(" ")" "[" "]" "`" "'" "&" "{" "}"})
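Sets make these lookups cheap and readable. Here is a small sketch of how a word could be classified against them; `word-kind` is a hypothetical helper for illustration only (the utility does this inside its coloring function instead).

```clojure
(def reserved #{"def" "defn" "defmacro" "let" "letrec" "if" "cond" "when" "do" "recur" "loop"})
(def definitions #{"def" "defn" "defmacro"})
(def punc #{"(" ")" "[" "]" "`" "'" "&" "{" "}"})

;; Hypothetical helper: name the category a word falls into.
(defn word-kind [word]
  (cond
    (contains? punc word)     :punctuation
    (contains? reserved word) :reserved
    :else                     :symbol))

(word-kind "defn") ; => :reserved
(word-kind "(")    ; => :punctuation
(word-kind "foo")  ; => :symbol

;; A set is also a function of its elements, so it works as a predicate:
(definitions "defn") ; => "defn"
(definitions "let")  ; => nil
```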
Next, a string-transformation function that replaces special HTML characters found in the code so as to render them HTML-ready:
(defn htmlize
  "replace certain characters in a string with their html equivalents to render it html-ready"
  [st]
  (.replaceAll (.replaceAll (.replaceAll st "&" "&amp;") ">" "&gt;") "<" "&lt;"))
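With chained replacements the order matters: the ampersand has to be handled first, otherwise the ampersands introduced by the other entities would be escaped a second time. A single-pass alternative avoids that concern. The sketch below uses `clojure.string/escape`; `htmlize-once` is a hypothetical name, and this is not how the utility itself does it.

```clojure
(require '[clojure.string :as str])

;; Hypothetical single-pass variant: each character is looked up once
;; in the map, so no replacement can be escaped twice.
(defn htmlize-once [st]
  (str/escape st {\& "&amp;" \< "&lt;" \> "&gt;"}))

(htmlize-once "(if (< a b) \"x & y\")")
;; => "(if (&lt; a b) \"x &amp; y\")"
```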
Another string-transformation function to extract text from user comments and render them as html paragraphs:
(defn text
  "convert a string into another string representing a list of html paragraphs"
  [st]
  (let [paraphed
        (reverse
          (loop [lines (.split st "\n") buffer "" result nil]
            (let [line (first lines)]
              (if line
                (let [line (.trim line)]
                  (if (= line "")
                    (if (= buffer "")
                      (recur (rest lines) "" result)
                      (recur (rest lines) "" (cons buffer result)))
                    (recur (rest lines) (.trim (str buffer " " line)) result)))
                (if (= buffer "")
                  result
                  (cons buffer result))))))]
    ;; apply str also copes with an empty list of paragraphs
    (apply str (map (fn [a] (str "<p>" a "</p>")) paraphed))))
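The rule here is: trim every line, treat blank lines as paragraph breaks, and join the remaining lines of each paragraph with a space. For comparison, here is a sketch of the same rule written with `clojure.string` and sequence functions. `paragraphs` is a hypothetical name, and I'm assuming it should agree with the loop above on ordinary input.

```clojure
(require '[clojure.string :as str])

;; Hypothetical alternative: group consecutive non-blank lines and
;; wrap each group in <p>...</p>.
(defn paragraphs [st]
  (->> (str/split-lines st)
       (map str/trim)
       (partition-by str/blank?)             ; runs of blank / non-blank lines
       (remove #(str/blank? (first %)))      ; drop the blank runs
       (map #(str "<p>" (str/join " " %) "</p>"))
       (apply str)))

(paragraphs "Hello\nworld\n\n  Second one  \n")
;; => "<p>Hello world</p><p>Second one</p>"
```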
Step 5. Define a few nifty-looking colors for rendering output Clojure code:
(def string-color "#889")
(def keyword-color "#458")
(def splice-color "#485")
(def reserved-word-color "#c50")
(def punctuation-color "#666")
(def definition-color "#d80")
(def symbol-color "#059")
Step 6. A function to color strings, keywords, symbols, punctuation, splices, definitions. This function also takes the previous colored word so as to render definition names in bold (need to detect that the previously colored word was 'def', 'defn' or 'defmacro'):
(defn color [st previous]
  (cond
    (= "" (.trim st))
    ""
    (.startsWith st "\"")
    (str "<font style=\"color: " string-color "\">" (htmlize st) "</font>")
    (.startsWith st ":")
    (str "<font style=\"color: " keyword-color "\">" (htmlize st) "</font>")
    (.startsWith st "~")
    (str "<font style=\"color: " splice-color "\">" (htmlize st) "</font>")
    (contains? reserved (. st trim))
    (str "<font style=\"color: " reserved-word-color "\">" (htmlize st) "</font>")
    (contains? punc (. st trim))
    (str "<font style=\"color: " punctuation-color "\">" (htmlize st) "</font>")
    :default
    (if (contains? definitions previous)
      (str "<font style=\"font-weight: bold;color: " definition-color "\">" (htmlize st) "</font>")
      (str "<font style=\"color: " symbol-color "\">" (htmlize st) "</font>"))))
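To see the 'previous word' trick on its own, here is a stripped-down sketch of just the last branch. The helper names `colored` and `color-symbol` are hypothetical, and the two hex values are copied from the color definitions in step 5.

```clojure
(def definitions #{"def" "defn" "defmacro"})

;; Hypothetical helper: wrap text in a colored <font> tag, optionally bold.
(defn colored [hex bold? text]
  (str "<font style=\"" (when bold? "font-weight: bold;") "color: " hex "\">" text "</font>"))

;; The name right after def/defn/defmacro is rendered bold.
(defn color-symbol [word previous]
  (if (contains? definitions previous)
    (colored "#d80" true word)
    (colored "#059" false word)))

(color-symbol "usage" "defn")
;; => "<font style=\"font-weight: bold;color: #d80\">usage</font>"
(color-symbol "println" "(")
;; => "<font style=\"color: #059\">println</font>"
```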
Finally, I define input and output streams, parse the input, transform it to HTML and send it to the output. This piece of code is a state-aware recursive loop over the Clojure source. I define three states: symbol, string and escape. While I'm in :symbol mode, I use (, ), [, ], { and } as word separators, along with whitespace. As soon as a double quote " is encountered, I switch to string mode and collect all characters into a string. If, while in string mode, I encounter a backslash \, I switch to escape mode for the single following character, which will be added to the string whether it's a double quote or not. From escape mode I can only go back to string mode, where encountering another double quote " will switch back down to symbol mode and send the string to the rendering engine.
The loop keeps count of opening and closing punctuation so as to know the nesting level of any word. This is so that zero-level strings can be processed differently: instead of rendering them surrounded by double quotes, I render them as HTML paragraphs. This is the literate programming that I was mentioning earlier: the code comments will become the text of the blog entry.
(let [out (java.io.PrintStream. (java.io.FileOutputStream. output-file))]
  (. out println (str
    "<html> <head> <style> body { font-family: sans-serif; background-color: #fff; font-size: 10pt; text-align: center; } #main { width: 400pt; margin-left: auto; margin-right: auto; text-align: left; padding: 10pt; } </style> </head> <body> <div id=\"main\">"
    (loop [buffer "" state :symbol word "" previous-word "" level 0]
      ;; .read returns -1 at end of input, so test the int before converting it to a char
      (let [next-code (.read input)]
        (if (neg? next-code)
          buffer
          (let [next-char (str (char next-code))]
            (cond
              (= :escaping state)
              (recur buffer :string (str word next-char) "" level)
              (= :string state)
              (cond
                (= next-char "\\")
                (recur buffer :escaping word "" level)
                (= next-char "\"")
                (if (= 0 level)
                  (recur (text (str buffer word)) :symbol "" "" level)
                  (recur (str buffer (color (str "\"" word "\"") "")) :symbol "" "" level))
                :default
                (recur buffer state (str word next-char) "" level))
              (= :symbol state)
              (cond
                (= next-char "\"")
                (recur (str buffer (color word previous-word)) :string "" word level)
                (= next-char "\n")
                (recur (str buffer (color word previous-word) "<br/>") state "" word level)
                (= next-char " ")
                (recur (str buffer (color word previous-word) " ") state "" word level)
                (or (= next-char "[") (= next-char "{") (= next-char "("))
                (recur (str buffer (color word previous-word) (color next-char "")) state "" word (+ level 1))
                (or (= next-char "]") (= next-char ")") (= next-char "}"))
                (recur (str buffer (color word previous-word) (color next-char "")) state "" word (- level 1))
                (= next-char "\t")
                (recur (str buffer " " (color word previous-word)) state "" word level)
                :default
                (recur buffer state (str word next-char) previous-word level)))))))
    " </div> </body> </html>")))
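The three-state scanner described above can also be tried in isolation on a plain string. What follows is a simplified sketch under my own assumptions, not the utility's loop: `scan` and `push-word` are hypothetical names, punctuation is only used as a separator and then dropped, and the nesting level is not tracked.

```clojure
;; Hypothetical helper: append a finished word unless it is empty.
(defn push-word [tokens word]
  (if (= word "") tokens (conj tokens [:word word])))

;; Hypothetical simplified scanner using the states :symbol, :string
;; and :escaping. Returns a vector of [:word ...] and [:string ...] pairs.
(defn scan [source]
  (loop [chars (seq source) state :symbol word "" tokens []]
    (if-let [c (first chars)]
      (case state
        ;; the character after a backslash is kept, whatever it is
        :escaping (recur (rest chars) :string (str word c) tokens)
        :string   (cond
                    (= c \\) (recur (rest chars) :escaping word tokens)
                    (= c \") (recur (rest chars) :symbol "" (conj tokens [:string word]))
                    :else    (recur (rest chars) :string (str word c) tokens))
        :symbol   (cond
                    (= c \")
                    (recur (rest chars) :string "" (push-word tokens word))
                    (contains? #{\( \) \[ \] \{ \} \space \newline \tab} c)
                    (recur (rest chars) :symbol "" (push-word tokens word))
                    :else
                    (recur (rest chars) :symbol (str word c) tokens)))
      ;; end of input: flush whatever is left (an unterminated string
      ;; ends up as a plain word in this sketch)
      (push-word tokens word))))

(scan "(println \"a \\\"b\\\" c\" x)")
;; => [[:word "println"] [:string "a \"b\" c"] [:word "x"]]
```

Working on a string with a pure function like this makes the state transitions easy to check at the REPL before wiring them to file streams.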
And that's it. Using this utility can speed up blogging with code, and enables automatic syntax coloring of the code. The utility is hosted here. Feel free to use it or abuse it as fits your own blogging, or non-blogging, needs. Cheerio!
Thursday, February 26, 2009
Literate programming in Clojure

4 comments:
I'm sure you've already been made aware, but your title is misspelled. One too many t's in 'literate'.
oops - lemme correct that...
This is nice, but it isn't really "literate programming". Literate programming works the opposite direction. You have tools which take a document describing a program in narrative fashion (often with bits of program code presented in the order that makes sense to explain it, rather than the order in which the compiler needs to see it), and the tools extract the code and reassemble the bits in an order that will compile.
I like it. And having your code make code bloggable itself being bloggable--nice, very meta.