I have previously told of my adventures trying to enrich Jekyll’s Markdown parsing abilities to allow for the wrapping of code samples in a <figure> element. My first attempt built upon Rouge and Kramdown and was going well, until I hit a roadblock I couldn’t figure out. Fortunately, a fresh eye was all it took to realise my mistakes and finish the work, as useless as it may be now.
The situation where I left it
When Jekyll publishes a post, it calls upon Kramdown to convert the Markdown to HTML, and Kramdown in turn calls upon Rouge to highlight (i.e. replace with complex HTML) the code samples it encounters. Rouge offers different formatters to be used, depending on the kind of syntax highlighting needed.
The original formatter used by Kramdown is Rouge::Formatters::HTMLLegacy, but actually it is more of a facade in front of four different formatters: HTML, HTMLInline, HTMLTable and HTMLPygments.
To wrap the Rouge-generated HTML in <figure> elements, I had decided to write a custom formatter for Rouge. My formatter inherited from HTML, ignoring the other three.
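As a rough illustration, a formatter along those lines could look like the sketch below. It is self-contained and does not load Rouge: SketchHTML is a stand-in for Rouge::Formatters::HTML, and the class names, the :caption option handling and the token format are all made up for the example.

```ruby
# Stand-in for Rouge::Formatters::HTML: it only emits <span> elements
# for tokens, with no <pre> or <code> wrapper of its own.
class SketchHTML
  def initialize(opts = {})
    @opts = opts
  end

  # In this sketch, tokens is an array of [css_class, text] pairs.
  def format(tokens)
    tokens.map { |css, text| %(<span class="#{css}">#{text}</span>) }.join
  end
end

# Hypothetical custom formatter: subclasses the span-only formatter and
# wraps its output in <figure>, adding a <figcaption> when one is given.
class SketchFigure < SketchHTML
  def format(tokens)
    caption = @opts[:caption]
    figcaption = caption ? "<figcaption>#{caption}</figcaption>" : ""
    "<figure>#{figcaption}#{super}</figure>"
  end
end

html = SketchFigure.new(caption: "hello.rb").format([["nb", "puts"], ["s", "'hi'"]])
puts html
```

Because the parent class only produces <span> elements, nothing in this chain ever emits <pre> or <code>.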
Unfortunately, this formatter didn’t render the HTML code I was expecting: the <figure> and <figcaption> elements were there, as was the highlighted code, but the latter was not wrapped in <pre> and <code> elements, as it should have been.
This issue didn’t happen with the HTMLLegacy formatter, so I took a quick look at its code.
My first mistake was to skip over the comments (rookie mistake) and focus on the first line of the initializer, leading me to believe that, indeed, HTML would be the formatter used in normal cases. Looking at their names, HTMLInline was obviously for inline code samples, HTMLTable for the complex rendering with line numbers (as hinted at by the conditional if opts[:line_numbers]), while HTMLPygments probably had something to do with a legacy fallback for users of Pygments, the precursor to Rouge.
I then tried to add the missing elements to my custom formatter, even though I couldn’t quite understand why they were missing in the first place. In retrospect, this was my second mistake: I was trying to stumble my way to a solution without taking the time to figure out the problem first.
Unsurprisingly, this didn’t work. Yes, the code was preformatted thanks to the extra HTML elements, but so were simple code spans – and those should not be wrapped in a <pre> element, only a <code> one.
Faced with this problem, I made yet a third mistake: I concluded that, since the HTML formatter was not adding the <pre> and <code> elements, they were under the responsibility of the Markdown converter (i.e. Kramdown), and not the syntax highlighter. So I went looking for their handling in Kramdown’s code, a code spelunking session that led me nowhere; in part because Kramdown’s source was only part of the actual code involved, especially when it comes to code blocks (Jekyll also loads up kramdown-parser-gfm), but mostly because there is no such code in the first place!
Solving the mystery
Lost in a dead end, I gave up and tried a different approach, with a different Markdown converter. But what had I missed back then?
Contrary to my initial, half-baked conclusion, Kramdown does rely on Rouge to wrap the syntax-highlighted code in a <code> element and, if needed, a <pre> element. Outputting the options passed from Kramdown to the formatter gave me a clue.
Alongside the expected options, including the caption, is one named :wrap. I remembered having seen it in the HTMLLegacy initializer.
Could it be that this HTMLPygments was not just a legacy formatter for obscure backward-compatibility edge cases? I had a look.
So there it was. In spite of its name, HTMLPygments is the real deal. (Interestingly, this piece of code shows a different pattern from the subclassing of Rouge::Formatters::HTML that the README suggests; instead, HTMLPygments is a decorator of the selected base formatter.)
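To make the decorator pattern concrete, here is a minimal, self-contained sketch. It is not Rouge’s source: SketchSpans and SketchWrapper are hypothetical stand-ins, and the css_class default is an assumption made for the example.

```ruby
# Stand-in for a base formatter that only yields highlighted spans.
class SketchSpans
  def stream(tokens)
    tokens.each { |css, text| yield %(<span class="#{css}">#{text}</span>) }
  end
end

# Decorator: it holds an inner formatter and surrounds whatever that
# formatter streams with wrapper elements, without subclassing it.
class SketchWrapper
  def initialize(inner, css_class = "highlight")
    @inner = inner
    @css_class = css_class
  end

  def stream(tokens, &block)
    yield %(<pre class="#{@css_class}"><code>)
    @inner.stream(tokens, &block)
    yield "</code></pre>"
  end
end

out = +""
SketchWrapper.new(SketchSpans.new).stream([["k", "def"]]) { |chunk| out << chunk }
puts out
```

The wrapper never needs to know which base formatter it received, which is what makes it possible to stack another decorator on top of it.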
Searching for a proper solution
Let’s recap. Kramdown’s converter calls up Rouge to turn a code block into a collection of specifically-crafted <span> elements. Because the expected result can vary, Rouge offers several formatters to craft these elements, and optionally wrap them in containing HTML elements such as <pre> and <code>. However, Kramdown’s converter doesn’t really care about choosing the right formatter; instead, it defers to a special one, HTMLLegacy, which does the selection for it, based on a few options, such as :wrap.
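The selection logic of such a facade can be modelled in a few lines. This is a simplified, self-contained sketch rather than Rouge’s code: the Struct-based stand-ins only report their names, and both the :inline_theme key and the default of true for :wrap are assumptions on my part.

```ruby
# Stand-ins for the four formatters behind the facade. Each one only
# reports its name, so that the selection logic stays visible.
SketchBase = Struct.new(:name) do
  def describe
    name
  end
end

SketchLayer = Struct.new(:name, :inner) do
  def describe
    "#{name}(#{inner.describe})"
  end
end

# Hypothetical facade: picks a base formatter, then stacks decorators
# on top of it depending on the options received.
def select_formatter(opts = {})
  formatter = opts[:inline_theme] ? SketchBase.new("HTMLInline") : SketchBase.new("HTML")
  formatter = SketchLayer.new("HTMLTable", formatter) if opts[:line_numbers]
  formatter = SketchLayer.new("HTMLPygments", formatter) if opts.fetch(:wrap, true)
  formatter
end

puts select_formatter.describe                     # HTMLPygments(HTML)
puts select_formatter(wrap: false).describe        # HTML
puts select_formatter(line_numbers: true).describe # HTMLPygments(HTMLTable(HTML))
```

Under these assumptions, the bare HTML formatter is only ever used on its own when wrapping is explicitly turned off.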
We want to use a custom formatter, but only when expecting certain results (namely: the rendering of a code block). Ideally, we would like to keep Kramdown’s normal behavior untouched, except for this addition of a <figure> element when rendering a code block. So what is Kramdown’s normal behavior?
It is hidden behind quite a bit of indirection, but basically, all options defined in Kramdown’s configuration for Rouge are passed down to the HTMLLegacy initializer. Furthermore, these options can be specified twice: once for the rendering of a code block and once for the rendering of a code span. This is a lot of behavior to preserve.
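To give an idea of what "specified twice" means in practice, here is a small sketch of how shared options and per-type overrides could be combined. The helper name, the shape of the configuration hash and the rule that a span is never wrapped are assumptions for the example, not a copy of Kramdown’s implementation.

```ruby
# Hypothetical helper: merges the shared options with the overrides
# found under :block or :span, depending on what is being rendered.
def highlighter_options(config, type)
  shared = config.reject { |key, _| %i[block span].include?(key) }
  merged = shared.merge(config.fetch(type, {}))
  # Assumption: a code span is never wrapped, whatever the configuration says.
  merged[:wrap] = false if type == :span
  merged
end

config = {
  css_class: "highlight",
  span: { line_numbers: false },
  block: { line_numbers: true }
}

p highlighter_options(config, :block)
p highlighter_options(config, :span)
```

Whatever comes out of this merge is what the facade receives, so any replacement for HTMLLegacy has to cope with both variants.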
- We could move the facade logic of HTMLLegacy to the converter, and have it choose the right formatter (including our custom one) based on the options passed, while respecting the configuration syntax (i.e. the different options for span and block).
- We could copy-paste this facade logic from HTMLLegacy to our custom formatter. That would leave it behind should HTMLLegacy evolve in a future Rouge upgrade, but this eventuality seems unlikely.
- We could re-open or extend HTMLLegacy so that an extra decorator is added to the formatter used when a caption is present (or, alternatively, every time a block is rendered).
The last option would be the least intrusive, and also the most acrobatic, since it would involve monkey-patching Rouge. It could look like this:
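Here is a minimal sketch of the technique, written against a stand-in class so that it runs without Rouge. SketchLegacy, its lambda-based formatter and the :caption option handling are all hypothetical; a real patch would target Rouge::Formatters::HTMLLegacy and decorate its actual inner formatter.

```ruby
# Stand-in for the facade being patched (not Rouge's actual class).
class SketchLegacy
  def initialize(opts = {})
    @formatter = ->(code) { "<pre><code>#{code}</code></pre>" }
  end

  def format(code)
    @formatter.call(code)
  end
end

# The patch: an anonymous module, prepended so that its #initialize runs
# around the original one and decorates the formatter it selected.
SketchLegacy.prepend(Module.new do
  def initialize(opts = {})
    super
    caption = opts[:caption]
    return unless caption

    inner = @formatter
    @formatter = lambda do |code|
      "<figure><figcaption>#{caption}</figcaption>#{inner.call(code)}</figure>"
    end
  end
end)

puts SketchLegacy.new.format("puts 1")
puts SketchLegacy.new(caption: "demo.rb").format("puts 1")
```

Without a caption the patched class behaves exactly as before, which is the whole appeal of prepending rather than rewriting.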
I admit, I like this approach, but this is mostly my ego speaking. I don’t get to use Module#prepend and anonymous modules that often, and monkey-patching is a bit exhilarating. Plus, it is indeed the least intrusive approach – it leaves the inner workings of Rouge as they are, and the custom Kramdown syntax highlighter required is mostly a carbon copy of the original (including the use of HTMLLegacy). However, monkey-patching is always risky, and more importantly, it doesn’t fix the underlying issue: HTMLLegacy, as its name implies, is a legacy formatter, introduced for backward-compatibility with Rouge 1.x. It would be better if Kramdown wasn’t using it in the first place.
(Note that Jekyll, for its highlight Liquid tag, does the right thing and instantiates the right formatter directly, instead of relying on this transitional prop.)
The subtleties of software design
Instead, let’s consider the other two options. The first one makes the Markdown converter responsible for adding the <pre> and <code> tags, while the second keeps this responsibility at the syntax highlighter level. As it happens, the Markdown specification is quite explicit as to how code blocks should be converted:
Rather than forming normal paragraphs, the lines of a code block are interpreted literally. Markdown wraps a code block in both <pre> and <code> tags.
So, relying on the syntax highlighter to do the wrapping seems like a mistake in the first place. Put differently, when converting a Markdown code block to HTML, the code should always end up wrapped in <pre> and <code> elements, even if there is no code highlighting being done.
In fact, this is exactly what Kramdown does when there is no highlighting.
If the code has been highlighted, it is wrapped in a <div>; if not, it is wrapped in the mandatory <pre> and <code> elements.
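The branching described here can be sketched as follows. This is a simplified model, not Kramdown’s converter: the method name, the callable highlighter and the escaping table are assumptions for the example, and attributes and indentation are left out.

```ruby
SKETCH_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Hypothetical converter step. highlighter is any callable that returns
# highlighted HTML, or nil when no highlighting applies.
def convert_code_block(code, highlighter)
  highlighted = highlighter.call(code)
  if highlighted
    # The highlighter is trusted to have produced its own <pre> and <code>.
    "<div>#{highlighted}</div>"
  else
    # No highlighting: the converter adds the mandatory wrappers itself.
    "<pre><code>#{code.gsub(/[&<>]/, SKETCH_ESCAPES)}</code></pre>"
  end
end

no_highlight = ->(_code) { nil }
fake_highlight = ->(code) { "<pre><code><span>#{code}</span></code></pre>" }

puts convert_code_block("a < b", no_highlight)
puts convert_code_block("a", fake_highlight)
```

The asymmetry is visible at a glance: only one of the two branches knows about <pre> and <code>.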
I can only speculate as to why Kramdown behaves so. My guess is that Rouge initially took upon itself to do the wrapping in <pre> and <code> elements, and Kramdown then had to take this over-zealous behaviour into account, and stayed like this even after Rouge fixed its rendering, probably because other systems now depend on it.
In any case, we could either use a custom converter for Kramdown (one that would not rely on Rouge for the wrapping), or change the way its Converter::Html converter works. Both options seem daunting.
Kramdown is very modular and configurable, but has no mechanism to allow the swapping of converters – Kramdown relies on metaprogramming to require the relevant converter based on the name of the method called for the conversion, so that #to_html instantiates a Converter::Html converter, and so on. To use a different HTML converter, we would have to either pretend that it converts to a different format (and somehow have Jekyll call #to_custom_html instead…) or hijack Kramdown’s converter-instantiating logic. Both options are way more intrusive than monkey-patching Rouge’s HTMLLegacy formatter.
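For readers unfamiliar with this kind of metaprogramming, here is a self-contained sketch of a method name being turned into a converter lookup. SketchDocument and SketchConverters are hypothetical, and the details (such as how the name is camelized) are assumptions rather than Kramdown’s actual code.

```ruby
module SketchConverters
  class Html
    def self.convert(source)
      "<p>#{source}</p>"
    end
  end

  class CustomHtml
    def self.convert(source)
      "<figure><p>#{source}</p></figure>"
    end
  end
end

class SketchDocument
  def initialize(source)
    @source = source
  end

  # Any call shaped like #to_something is routed to a matching converter.
  def method_missing(name, *args)
    converter = converter_for(name)
    converter ? converter.convert(@source) : super
  end

  def respond_to_missing?(name, include_private = false)
    !converter_for(name).nil? || super
  end

  private

  # "to_custom_html" resolves to SketchConverters::CustomHtml, or nil if unknown.
  def converter_for(name)
    match = name.to_s.match(/\Ato_([a-z][a-z_]*)\z/)
    return nil unless match

    const = match[1].split("_").map(&:capitalize).join
    SketchConverters.const_get(const, false) if SketchConverters.const_defined?(const, false)
  end
end

doc = SketchDocument.new("hi")
puts doc.to_html
puts doc.to_custom_html
```

With this design, the only way to reach a different converter is to call a different method, which is why swapping the HTML converter would mean changing what the caller (here, Jekyll) invokes.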
The intricacy of open source
But if relying on the syntax highlighter to add the <pre> and <code> elements is a mistake in the first place, why not contribute to Kramdown and submit a fix? In short: because I’m not too fond of Kramdown as a project.
I love contributing to open source – in fact, I consider that it is a privilege to be able to do so, and a duty to actually contribute if you can. However, I also consider that any contribution, even the smallest, is a form of commitment to the project.
Open source maintainers deserve respect; they (usually) welcome contributions, but in my opinion, the least one can do when contributing is to have regard for the maintainers’ leadership, opinions, choices, and the overall direction they want to give their project. In other words: when contributing to Rome, do as the Roman senators do.
I may be overly cautious, but I’m not too fond of opening a PR without being confident that it would be useful to the project, and not only to me, and that it would be in line with whatever the project maintainers have in mind. In other words, projects have a vibe, and I want to be in sync with it.
This probably sounds like a lot of overthinking, or possibly an excuse not to contribute, but it’s not. It’s basically a complicated way to say that I don’t want to contribute to projects whose philosophy or leadership I don’t feel good about, and that is exactly the case here.
I’ve complained about the complexity of Kramdown’s code base (and yes, I know how easy it is to criticise), but in itself this would not be enough to keep me from opening a small PR. However, to get a feel of the project, I took a look at the other PRs and the conversations around them, and didn’t really like what I saw. No major red flag, just a tone not to my liking.
And so, since neither the technical nor human aspects of this project vibe with me, I’d rather not get involved. It’s as simple as that.
Done beats perfect
I enjoy pursuing the best solution to a given problem – within reason. From my perspective – and I may well be wrong! – the best solution would be to move the responsibility of wrapping code blocks in <pre> and <code> elements from the syntax highlighter (Rouge) to the converter (Kramdown), and while we’re at it to also make the converter responsible for adding the <figure> elements around the converted code block. However, this would require working on Kramdown, which is something I don’t want to do.
And so, the second-best approach is the one I’ll go with – keep the wrapping of the highlighted code in <figure>, <pre> and <code> elements under the responsibility of Rouge, implemented through a small monkey-patch. It may not be ideal or perfect, but it will work, for a reasonable cost.