tyler butler

Markdown Lazy Links in Python

One of the things I am most excited about in Engineer 0.5.0 is the new support for Markdown Lazy Links. My implementation is actually a bit richer than Brett Terpstra’s original sample, though it’s not quite as elegant as the original either. In particular, Engineer’s implementation allows you to add lazy links to posts that already have numeric reference links. Also, you can optionally have Engineer transform the lazy links into numeric links during a build. This can come in handy if you anticipate doing a lot of reorganizing of the post content at some point, and want to make sure links don’t break.

It took some time to unpack Brett’s elegant regular expression into the Python form, mostly because Ruby is very foreign to me, and its regex engine has some default behaviors that differ from Python’s. In particular, it took some time to figure out exactly what flags to pass in so that things behaved appropriately. I’m still not sure I got it completely right, though my unit tests seem to pass and I’ve been using the plugin for some time so I think it’s stable.
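To illustrate the kind of flag differences involved, here is a small sketch that is not taken from the plugin itself. As far as I understand it, Ruby's `^` matches at the start of every line by default, while Python needs `re.MULTILINE` for that. Python also needs `re.DOTALL` before `.` will match a newline. The sample text is made up for the demonstration.

```python
import re

text = "some text\n[*]: http://example.com"

# Without MULTILINE, ^ only matches at the very start of the string.
plain = re.search(r'^\[', text)
# With MULTILINE, ^ also matches right after each newline.
multiline = re.search(r'^\[', text, re.MULTILINE)

# Without DOTALL, . stops at a newline, so this cannot span two lines.
single_line = re.search(r'text.*\[', text)
# With DOTALL, . matches newlines too.
dotall = re.search(r'text.*\[', text, re.DOTALL)

print(plain, multiline, single_line, dotall)
```

Only the two flagged searches find a match, which is why the plugin's pattern is compiled with both flags.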

I chose to use the VERBOSE regular expression form so it’s clearer how the expression works. Hopefully that will save someone some time if they’re looking to port the thing to some other regular expression language. You can find the source in the GitHub repository, but I’m pasting the relevant class below as well. Note that this is an Engineer PostProcessor plugin, so some of the code is simply scaffolding for the plugin system. If you find a bug, please let me know, or even better, file an issue on GitHub.

import re


# PostProcessor is the plugin base class from Engineer's plugin system.
class LazyMarkdownLinksPlugin(PostProcessor):
    # Inspired by Brett Terpstra:
    # http://brettterpstra.com/2013/10/19/lazy-markdown-reference-links/
    _link_regex = re.compile(r'''
        (           # Start group 1, which is the actual link text
            \[          # Match a literal [
            [^\]]+      # Match anything except a literal ] - this will be the link text itself
            \]          # Match a literal ]
            \s*         # Any whitespace (including newlines)
            \[          # Match the opening bracket of the lazy link marker
        )           # End group 1
        \*          # Literal * - this is the lazy link marker
        (           # Start group 2, which is everything after the lazy link marker
            \]          # Literal ]
            .*?^        # Non-greedy match of anything (DOTALL) up to the start of a line (MULTILINE)
            \[          # Literal [
        )           # End Group 2
        \*\]:       # Match a literal *]: - the lazy link URL definition follows this
        ''', re.MULTILINE | re.DOTALL | re.UNICODE | re.VERBOSE)

    _counter_regex = re.compile(r'\[(\d+)\]:', re.UNICODE)
    _counter = 0

    @classmethod
    def _replace(cls, match):
        cls._counter += 1
        sub_str = '%s%s%s%s]:' % (match.group(1), cls._counter, match.group(2), cls._counter)
        return sub_str

    @staticmethod
    def get_max_link_number(post):
        all_values = set([int(i) for i in LazyMarkdownLinksPlugin._counter_regex.findall(post)])
        return max(all_values) if all_values else 0

    @classmethod
    def preprocess(cls, post, metadata):
        from engineer.conf import settings

        logger = cls.get_logger()
        content = post.content_preprocessed
        cls._counter = cls.get_max_link_number(content)

        # This while loop ensures we handle overlapping matches
        while cls._link_regex.search(content):
            content = cls._link_regex.sub(cls._replace, content)
        post.content_preprocessed = content
        if getattr(settings, 'LAZY_LINKS_PERSIST', False):
            if not post.set_finalized_content(content, cls):
                logger.warning("Failed to persist lazy links.")
        return post, metadata
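To see the transformation without installing Engineer, the following is a minimal standalone sketch of the same idea. It reuses the two regular expressions from the class above in compact form. The function name `number_lazy_links` and the sample post are made up for illustration and are not part of the plugin.

```python
import re

# Same pattern as the plugin, copied here so the sketch runs on its own.
LINK_REGEX = re.compile(r'''
    (\[[^\]]+\]\s*\[)   # group 1: link text plus the opening [ of the marker
    \*                  # the lazy link marker
    (\].*?^\[)          # group 2: everything up to the [ that starts a definition line
    \*\]:               # the lazy definition marker
    ''', re.MULTILINE | re.DOTALL | re.UNICODE | re.VERBOSE)
COUNTER_REGEX = re.compile(r'\[(\d+)\]:', re.UNICODE)


def number_lazy_links(text):
    """Replace each [*] marker/definition pair with the next free number."""
    numbers = [int(n) for n in COUNTER_REGEX.findall(text)]
    counter = max(numbers) if numbers else 0

    def replace(match):
        nonlocal counter
        counter += 1
        return '%s%d%s%d]:' % (match.group(1), counter, match.group(2), counter)

    # Repeat until no lazy markers remain, since matches can overlap.
    while LINK_REGEX.search(text):
        text = LINK_REGEX.sub(replace, text)
    return text


sample = (
    "See [Engineer][1] and [lazy links][*].\n"
    "\n"
    "[1]: https://example.com/engineer\n"
    "[*]: https://example.com/lazy\n"
)
print(number_lazy_links(sample))
```

Because the post already uses `[1]`, the lazy link is numbered `[2]`, and so is its definition line.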

Deploying Engineer Sites to Azure

A longstanding issue in the Engineer issue tracker concerns documenting some of the free/low-cost hosting options one has to host an Engineer site. GitHub Pages is a common request, and documenting that process is definitely on my to-do list, but I think there’s a better option: Azure.

I’ve been hosting tylerbutler.com on Azure for the past few months, and I have to say, I’m very pleased with it so far. It’s probably not the most cost-effective thing for my site, but if you have an MSDN subscription, which many .NET developers have (including me), then you have a monthly Azure credit that is almost certainly enough to cover the cost of deploying your Engineer site to Azure.

Deploying on Azure has the benefit of auto-scaling to handle traffic demands, though that’s not particularly compelling for Engineer-based sites since static sites by nature tend to be very scalable anyway. The truly compelling feature, in my opinion, is that it lets you maintain your built Engineer site in a Git or Mercurial repository, which is something that developers in particular really like. It is, after all, one of the nice things about GitHub Pages as well.

With some of the new features in Engineer version 0.5.0, I’ve got tylerbutler.com in a GitHub repository of its own, and every time I git push, the site is updated automatically thanks to Azure. Even better, thanks to Engineer’s support for multiple post directories, I can put my published posts inside the Git repository itself for safekeeping but still write posts from any device/app that integrates with Dropbox. This flexibility of post authoring was one of the key reasons I wrote Engineer originally; it’s important that I maintain that with whatever deployment architecture I choose.

Once you have everything set up on Azure, the basic flow looks like this:

  • Edit and maintain your site in a Git or Mercurial repository
  • Build your site, commit to Git/Mercurial
  • Push the repository to GitHub/Bitbucket
  • Azure site automatically updates itself in a few minutes
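Once the setup described in the rest of this post is done, the build, commit, and push steps of that flow could be scripted roughly like this. It is only a sketch: it assumes a Git repository with a remote already configured, it uses the `engineer build -s ./config.yaml` command shown later in this post, and the function names and default commit message are made up.

```python
import subprocess


def publish_commands(config="./config.yaml", message="Rebuild site"):
    """Return the commands for one build-commit-push cycle, in order."""
    return [
        ["engineer", "build", "-s", config],   # build into the output folder
        ["git", "add", "-A"],                  # stage source and built output
        ["git", "commit", "-m", message],      # record the new build
        ["git", "push"],                       # Azure picks up the push
    ]


def publish(config="./config.yaml", message="Rebuild site"):
    """Run each step, stopping at the first command that fails."""
    for command in publish_commands(config, message):
        subprocess.run(command, check=True)
```

Calling `publish()` from the root of the site folder runs all four steps. Note that `git commit` exits with an error when there is nothing new to commit, so the script stops before pushing in that case.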

In order to get this up and running, you can follow the steps below. Note that Engineer 0.5.0+ is needed. I’ll eventually get these instructions incorporated into the official Engineer docs, but I wanted to get the info out there for folks without delay.

Getting started

If you don’t yet have an Engineer site, you can initialize a new one with a content structure and configuration files especially for Azure using the following command (new in Engineer 0.5.0):

engineer init -m azure

If you have an existing site, you can simply use it, of course, but you may need to add your own .deployment file or configure the Azure deployment settings for your site yourself. There are more details below.

You can lay out your files however you wish, but the typical layout will look something like this:

/my-engineer-site
    - .deployment
    - config.yaml
    /content
    /templates
    /output
        /azure

Run the following command from the root of the folder to build your site for Azure:

engineer build -s ./config.yaml

The output will be written to ./output/azure/ by default. You can obviously change this in the Engineer settings, though note that you’ll have to make some other changes as well. Keep reading for further details.

Go to your Azure portal and create a new web site. Configure the site to automatically deploy from a GitHub/Bitbucket repository and connect it to your repository. You can also choose to use manual Git deployment if you wish, or even just FTP the site content, but I recommend the GitHub auto-publish route; it’s much simpler and automatic.

Now every time you push a new commit to GitHub/Bitbucket, your Azure site will update automatically with the contents of the ./output/azure/ folder. This magic works because of the .deployment file in your repository, which is created automatically by Engineer when you initialize a new site using engineer init -m azure. You can read more about this file in the GitHub wiki.

If you want to change the output folder to a different path, you can do that in your Engineer config file. However, in addition, you’ll need to tell Azure that the location of the site content within your repository is different. This is contained in – you guessed it – the .deployment file. Just change the value of the ‘project’ setting within that file.
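If you are adding the file to an existing site by hand, a small script like the one below can generate it. This is a sketch under assumptions: as I understand the format, the 'project' setting lives under a `[config]` section, and the exact contents Engineer writes may differ. The function name and the `output/azure` default are chosen for illustration.

```python
import configparser
from pathlib import Path


def write_deployment_file(site_root, project="output/azure"):
    """Write a .deployment file that points Azure at the built site folder."""
    config = configparser.ConfigParser()
    # Assumed layout: a [config] section with a single 'project' key.
    config["config"] = {"project": project}
    path = Path(site_root) / ".deployment"
    with path.open("w") as handle:
        config.write(handle)
    return path
```

For example, `write_deployment_file(".", project="public")` would point Azure at a hypothetical `public` folder instead of the default output path.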

If you prefer, you can remove the .deployment file altogether and configure the site root in the Azure portal directly. This is basically the same as the .deployment file approach, but if you have a single repository that contains multiple Engineer sites – or other types of sites, even – then using the settings in the Azure portal is preferable. From Scott Hanselman:

What’s nice about setting the “Project” setting via site configuration rather than via a .deployment file is that you can now push the same git repository containing two different web sites to two remote Azure web sites. Each Azure website should have a different project setting and will end up deploying the two different sites.

That is very cool. Scott has more details in his post on the topic.

Hopefully this helps you get your Engineer site up and running on Azure. As I mentioned before, I am going to be incorporating this information into the official Engineer docs as well. I’ve also gotten Engineer sites running on GitHub Pages, so I’ll cover that in a future post.

Engineer v0.5.0 Released

I released the next major version of Engineer, version 0.5.0, last month. I didn’t quite meet all of my goals with the release (not the least of which was the release date, which was five months later than I had optimistically planned), but it’s a major one nonetheless. As usual, full release notes are available at Read the Docs.

Unfortunately, upgrading may or may not work for you the normal way due to setuptools changes. There are more details in the upgrade docs.

Lots of cool stuff in this release, and more planned, so if you’ve not tried Engineer yet, now might be a good time.

Count Files In A Directory

Need to count files in a directory? Try this one line of PowerShell:

(dir | where {$_.GetType() -match "fileInfo"} | measure-object).count

You can also add a -r parameter (-r means recursive) to the first dir command [1] and get a complete count of all files in all subdirectories under the current path.

This comes in handy, though I do wish it was a little more succinct.
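For comparison, here is roughly the same count sketched in Python, the language used elsewhere on this page. The function name is made up. Unlike a plain dir, this version also counts hidden files, so the two numbers can differ.

```python
from pathlib import Path


def count_files(directory=".", recursive=False):
    """Count regular files in a directory, optionally including subdirectories."""
    root = Path(directory)
    entries = root.rglob("*") if recursive else root.iterdir()
    return sum(1 for entry in entries if entry.is_file())
```

`count_files()` covers the current directory only, and `count_files(recursive=True)` matches the -r variant.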


  1. And of course you can substitute ls for dir if you prefer, or even Get-ChildItem if you’re feeling particularly masochistic.