Over on my Feature Flags book site, I’m using my book’s Markdown content to generate the HTML for the site. I then use jSoup to inject a table of contents (TOC), which requires that I insert an identifier into each header element. And, now that I’m trying to use Pandoc to generate an EPUB (digital book) version, I need to make sure that my ColdFusion-based header identifiers match the ones that Pandoc will generate in the final EPUB.
The Pandoc documentation on “Headings and Sections” describes the algorithm that it uses to generate the heading identifiers:
- Remove all formatting, links, etc.
- Remove all footnotes.
- Remove all non-alphanumeric characters, except underscores, hyphens, and periods.
- Replace all spaces and newlines with hyphens.
- Convert all alphabetic characters to lowercase.
- Remove everything up to the first letter (identifiers may not begin with a number or punctuation mark).
- If nothing is left after this, use the identifier “section”.
The Pandoc documentation also provides a set of sample headings and the identifiers that it will generate. We can use these samples to test our ColdFusion algorithm. And, of course, we’ll make ample use of Regular Expressions to solve this problem.
In the following ColdFusion code, we’re looping over the samples provided by Pandoc and asserting that our ColdFusion-generated identifier matches the expected identifier:
<cfscript>

	// These values are provided in the Pandoc documentation on Headings and Sections.
	assertions = [
		{
			heading: "Heading identifiers in HTML",
			identifier: "heading-identifiers-in-html"
		},
		{
			heading: "Maître d'hôtel",
			identifier: "maître-dhôtel"
		},
		{
			heading: "*Dogs*?--in *my* house?",
			identifier: "dogs--in-my-house"
		},
		{
			heading: "[HTML], [S5], or [RTF]?",
			identifier: "html-s5-or-rtf"
		},
		{
			heading: "3. Applications",
			identifier: "applications"
		},
		{
			heading: "33",
			identifier: "section"
		}
	];

	// Let's test the Pandoc header assertions against our ColdFusion algorithm, yay!
	for ( assertion in assertions ) {

		identifier = generateIdentifier( assertion.heading );

		writeOutput("
			<p>
				Heading: #encodeForHtml( assertion.heading )# <br />
				Expected: #encodeForHtml( assertion.identifier )# <br />
				Received: #encodeForHtml( identifier )# <br />
				Pass: <b>#yesNoFormat( assertion.identifier == identifier )#</b>
			</p>
		");

	}

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I generate a Pandoc section identifier (ie, URL anchor) from the given heading text.
	*
	* ASSUMPTION: For this demo, I'm assuming that all formatting, links, and footnotes
	* have already been removed and that we're dealing with plain-text header values.
	*/
	public string function generateIdentifier( required string heading ) {

		var identifier = heading
			.trim()
			// Convert all alphabetic characters to lowercase.
			.lcase()
			// Replace all spaces and newlines with hyphens.
			.reReplace( "\s+", "-", "all" )
			// Remove all non-alphanumeric characters, except underscores, hyphens,
			// and periods.
			.reReplace( "[^\w.-]+", "", "all" )
			// Remove everything up to the first letter (identifiers may not begin with
			// a number or punctuation mark).
			.reReplace( "^[^a-z]+", "" )
		;

		// If nothing is left after this, use the identifier section.
		if ( ! identifier.len() ) {

			return( "section" );

		}

		return( identifier );

	}

</cfscript>
As a general rule, when using Regular Expressions to solve a problem, always move the “convert to lowercase” step as high up in the algorithm as you can. That way, you can simplify your patterns by using [a-z] instead of [a-zA-Z]; and, you can use .reReplace() instead of .reReplaceNoCase(), which will be more efficient.
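To make that rule concrete, here is a minimal sketch that contrasts the two approaches on one of the Pandoc sample headings. The variable names are only for illustration, and the pattern is a simplified stand-in rather than the full identifier algorithm:

<cfscript>

	// Sample input (taken from the Pandoc examples).
	value = "Heading identifiers in HTML";

	// Approach 1: leave the casing alone and lean on the case-insensitive function.
	// The letters survive, but the result still carries the original casing.
	withNoCase = value.reReplaceNoCase( "[^a-z]+", "-", "all" );

	// Approach 2: lowercase first, then use the plain [a-z] class with .reReplace().
	lowerFirst = value.lcase().reReplace( "[^a-z]+", "-", "all" );

	// NOTE: The == operator is case-insensitive in ColdFusion, so compare() is used
	// here to show the real (case-sensitive) difference between the two results.
	writeOutput( "Same casing: #yesNoFormat( compare( withNoCase, lowerFirst ) == 0 )# <br />" );
	writeOutput( "Same letters: #yesNoFormat( compare( withNoCase.lcase(), lowerFirst ) == 0 )#" );

</cfscript>

Both approaches keep the same letters; only the lowercase-first version yields a value that is already in its final form, so no later step has to think about casing.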
In this ColdFusion code, I’ve used Pandoc’s description of each step as a comment in the code so that you can see how each RegEx pattern maps to Pandoc’s intended outcome. If Regular Expressions seem like a foreign language to you, check out my video presentation on basic pattern usage. Once you start using patterns, you’ll find that they improve the quality of your developer life.
With that said, if we run this ColdFusion code, we get the following output:

As you can see, the heading identifiers generated by our ColdFusion Regular Expression replacements match the identifier assertions provided by Pandoc. At this point, I can update my Feature Flags site logic and not worry about the inter-chapter links breaking when I generate my EPUB.
Note: My Feature Flags site uses Flexmark to convert from Markdown to HTML in ColdFusion (during site bootstrapping and initialization), which is why the two algorithms need to be aligned. This way, I neither need to install Pandoc on my server nor do I need to commit the generated HTML to my source control.
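For context, here is a rough sketch of how the generated identifiers might be applied to the rendered HTML with jSoup. This is not the site’s actual code: it assumes the jSoup JAR is available on the ColdFusion class path, that generateIdentifier() from the demo above is in scope, and the sample HTML is made up for illustration:

<cfscript>

	// ASSUMPTION: this HTML stands in for the output of the Markdown conversion.
	html = "<h2>Heading identifiers in HTML</h2><p>Some chapter content.</p>";

	// ASSUMPTION: the jSoup JAR has been loaded by the ColdFusion application.
	dom = createObject( "java", "org.jsoup.Jsoup" ).parseBodyFragment( html );

	for ( node in dom.select( "h1, h2, h3, h4, h5, h6" ) ) {

		// Leave any explicitly provided identifiers alone.
		if ( ! node.hasAttr( "id" ) ) {

			// The .text() call returns the header content without nested tags, which
			// covers the plain-text assumption made by generateIdentifier().
			node.attr( "id", generateIdentifier( node.text() ) );

		}

	}

	writeOutput( encodeForHtml( dom.body().html() ) );

</cfscript>

This sketch does not handle repeated headings; two headers with the same text would receive the same identifier, so a real implementation would need to de-duplicate them in whatever way the EPUB output does.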