Markdown Lazy Links in Python

One of the things I am most excited about in Engineer 0.5.0 is the new support for Markdown Lazy Links. My implementation is a bit richer than Brett Terpstra’s original sample, though not quite as elegant. In particular, Engineer’s implementation lets you add lazy links to posts that already have numeric reference links. You can also optionally have Engineer transform the lazy links into numeric links during a build. This comes in handy if you anticipate reorganizing the post content later and want to make sure links don’t break.

It took some time to unpack Brett’s elegant regular expression into Python form, mostly because Ruby is very foreign to me and its regex engine has some default behaviors that differ from Python’s. In particular, it took a while to figure out exactly which flags to pass so that things behaved appropriately. I’m still not sure I got it completely right, though my unit tests pass and I’ve been using the plugin for some time, so I think it’s stable.
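For anyone making the same port, here is a small sketch of the two flags that mattered most. As far as I understand it, Ruby's `^` always matches at the start of every line, and its `/m` modifier makes `.` match newlines, whereas Python needs `re.MULTILINE` and `re.DOTALL` respectively. The sample text and URL are made up for illustration.

```python
import re

# Hypothetical sample: a lazy link definition on the second line.
text = "first line\n[*]: http://example.com\n"

# Without re.MULTILINE, ^ only matches at the very start of the string.
start_only = re.search(r'^\[\*\]:', text)
# With re.MULTILINE, ^ also matches immediately after each newline.
any_line = re.search(r'^\[\*\]:', text, re.MULTILINE)

# Without re.DOTALL, . never matches a newline, so .* cannot span lines.
single_line = re.search(r'first.*\[', text)
# With re.DOTALL, . matches newlines too, so .* can cross line boundaries.
across_lines = re.search(r'first.*\[', text, re.DOTALL)

print(start_only)            # None
print(any_line.start())      # 11
print(single_line)           # None
print(across_lines.group())  # 'first line' followed by a newline and '['
```

The lazy link expression needs both behaviors at once: `.*?` has to run across lines, and `^` has to find the start of the line holding the `[*]:` definition.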

I chose to use the VERBOSE regular expression form so it’s clearer how the expression works. Hopefully that will save someone some time if they’re looking to port it to another regular expression flavor. You can find the source in the GitHub repository, but I’m pasting the relevant class below as well. Note that this is an Engineer PostProcessor plugin, so some of the code is simply scaffolding for the plugin system. If you find a bug, please let me know, or even better, file an issue on GitHub.

import re

# PostProcessor is the plugin base class provided by Engineer's plugin system.


class LazyMarkdownLinksPlugin(PostProcessor):
    # Inspired by Brett Terpstra:
    # https://brettterpstra.com/2013/10/19/lazy-markdown-reference-links/
    _link_regex = re.compile(r'''
        (           # Start group 1, which is the actual link text
          \[        # Match a literal [
          [^\]]+    # Match anything except a literal ] - this will be the link text itself
          \]        # Match a literal ]
          \s*       # Any whitespace (including newlines)
          \[        # Match the opening bracket of the lazy link marker
        )           # End group 1
        \*          # Literal * - this is the lazy link marker
        (           # Start group 2, which is everything after the lazy link marker
          \]        # Literal ]
          .*?^      # Non-greedy match of anything up to the start of a line
          \[        # Literal [
        )           # End group 2
        \*\]:       # Match a literal *]: - the lazy link URL definition follows this
        ''', re.MULTILINE | re.DOTALL | re.UNICODE | re.VERBOSE)

    # Matches existing numeric reference definitions such as "[3]:"
    _counter_regex = re.compile(r'\[(\d+)\]:', re.UNICODE)
    _counter = 0

    @classmethod
    def _replace(cls, match):
        # Give the lazy link and its definition the next unused number
        cls._counter += 1
        sub_str = '%s%s%s%s]:' % (match.group(1), cls._counter, match.group(2), cls._counter)
        return sub_str

    @staticmethod
    def get_max_link_number(post):
        # Highest numeric reference already in the post, or 0 if there are none
        all_values = set([int(i) for i in LazyMarkdownLinksPlugin._counter_regex.findall(post)])
        return max(all_values) if all_values else 0

    @classmethod
    def preprocess(cls, post, metadata):
        from engineer.conf import settings

        logger = cls.get_logger()
        content = post.content_preprocessed
        cls._counter = cls.get_max_link_number(content)

        # This while loop ensures we handle overlapping matches
        while cls._link_regex.search(content):
            content = cls._link_regex.sub(cls._replace, content)
        post.content_preprocessed = content
        if getattr(settings, 'LAZY_LINKS_PERSIST', False):
            if not post.set_finalized_content(content, cls):
                logger.warning("Failed to persist lazy links.")
        return post, metadata
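If you want to try the expression outside of Engineer, the following is a minimal standalone sketch of the same idea. It is not the plugin itself: the names `number_lazy_links`, `LAZY_LINK`, and `NUMBERED_DEF` are hypothetical, the regex is a condensed form of the one above, and the sample post and its example.com URLs are invented. It uses `nonlocal`, so it assumes Python 3.

```python
import re

# Condensed form of the lazy link expression (names are hypothetical).
LAZY_LINK = re.compile(r'''
    (\[[^\]]+\]\s*\[)   # group 1: link text plus the opening [ of the marker
    \*                  # the lazy link marker
    (\].*?^\[)          # group 2: everything up to the [ that starts a definition line
    \*\]:               # the lazy definition marker
    ''', re.MULTILINE | re.DOTALL | re.VERBOSE)
NUMBERED_DEF = re.compile(r'\[(\d+)\]:')


def number_lazy_links(text):
    """Replace [*] lazy links with numbers, continuing after existing ones."""
    numbers = [int(n) for n in NUMBERED_DEF.findall(text)]
    counter = max(numbers) if numbers else 0

    def replace(match):
        nonlocal counter
        counter += 1
        return '%s%d%s%d]:' % (match.group(1), counter, match.group(2), counter)

    # One pass can swallow a second lazy link inside group 2, so repeat
    # until no lazy links remain.
    while LAZY_LINK.search(text):
        text = LAZY_LINK.sub(replace, text)
    return text


sample = (
    "See [Engineer][1] and [Brett's post][*].\n"
    "\n"
    "[1]: https://example.com/engineer\n"
    "[*]: https://example.com/lazy\n"
    "\n"
    "Then read [the docs][*].\n"
    "\n"
    "[*]: https://example.com/docs\n"
)
result = number_lazy_links(sample)
print(result)
```

With this sample, the existing `[1]` link is left alone and the two lazy links become `[2]` and `[3]`, each paired with the definition that follows it. The loop matters when two lazy links appear before their definitions, as in `[a][*] and [b][*]`: the first pass pairs `[a]` with the first definition and consumes `[b][*]` as part of group 2, so a second pass is needed to number `[b]`.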