Migrating from WordPress to Jekyll – Part 2: **Everything** you need to know about Jekyll

So, as promised, here’s the second part of a two-part series on why and how I migrated this blog to Jekyll from the publishing platform I was using previously, WordPress. Here are the (technical) steps I had to take in order to complete the migration while preserving the site’s layout, usability and SEO characteristics.

Note: I know that there are several other articles on the subject; however, while I found many of them, I couldn’t easily find answers to questions such as where to find something, what data is available with Jekyll when building the site, and so on. So in this post, while I describe what I have done to migrate my blog, I have also tried to make those answers readily available for others who may run into the same issues I had when migrating.

It’s a pretty long post, with a lot of information on the WordPress => Jekyll migration, as well as on Jekyll in general, so I have organised the contents in various sections; if you are already familiar with Jekyll or are looking for a quick answer concerning something specific, feel free to jump to one section or another.

WordPress is a feature-rich blogging platform that boasts a massive community and tons of extensions, so if you also have a blog powered by WordPress but are unsure as to why you may want to migrate, have a look at my previous post, in which I explained why I decided to migrate anyway and why WordPress was not for me.

Your mileage may vary, of course, but if in the end you do want to migrate to Jekyll, herein you will find useful tips that will hopefully save you some time. On a side note, while the focus here is clearly on the WordPress => Jekyll migration, much of the information in this post can also be useful to users wanting to migrate from different blogging platforms, such as Blogger, Tumblr, Live Journal, and others.

Update 22 Apr 2011: I’ve published a follow up on how to integrate a dynamic contact form, powered by Sinatra.

Introduction

Assuming you already are thinking of abandoning WordPress for some reason, and are also thinking that Jekyll may be a good alternative for you, let’s first have a brief look at how Jekyll differs from WordPress, before looking at how to actually migrate. This may help you make an informed choice after all.

To recap from the previous post, Jekyll:

  • is a simple, blog-aware, static site generator. This in short means that it isn’t a CMS like WordPress, it doesn’t use a database, and all it does is basically generate a totally static site that can then be served directly by a web server like Nginx or Apache; if you have (hopefully) used a caching plugin with WordPress such as WP-SuperCache, then you already know that this means a faster site; deployment is also easier, in that all you’d have to do to deploy a Jekyll site is copy it to your production web server (although you may prefer using something like Rsync, Capistrano or Rake, as we’ll see later);
  • since the site is fully static and is preprocessed in advance (as opposed to WordPress with WP-SuperCache or similar where you still have dynamically built pages cached to static files), Jekyll does not need any server-side technologies (while for example WordPress needs PHP), therefore no application server is required either; however, we’ll see how “mixing” a fast static site generated by Jekyll with some server side technologies can still make sense in some cases;
  • publishing or updating an article or page basically means you need to rebuild the entire static site - at least using Jekyll the typical way – and this may sound bad. Rebuilding the site is almost immediate for this blog, for example, since it’s pretty new and doesn’t have many posts yet; however it may take a considerable amount of time to rebuild a site with many more posts and pages.

The three points above should carefully be taken into consideration when switching to Jekyll. For larger sites, it may be a case of better performance for the reader (Jekyll) vs faster publishing (WordPress). Depending on whether or not you are a Ruby developer and are used to tools such as Git or another SCM, you may prefer either WordPress’ administration interface or a developer’s tools to publish your articles. This, again, is very important and you need to be sure you’ll be happy with Jekyll’s more “manual” approach. Personally, I do prefer editing my posts with TextMate, versioning with Git, and both publishing and maintaining the site with Rake and Capistrano tasks. It may sound like a more complex workflow, but in reality it can be simpler and a lot quicker once you are used to shortcuts of any type between your shell and your text editor of choice.

However, since I bet many of the people who will end up reading this page are either non-developers or PHP developers who are used to different tools, it’s your call whether all this suits you or not.

Installing and using Jekyll

For starters, in order to use Jekyll as your new publishing platform, you’ll need to install it. Assuming you have a Ruby environment already configured, all you need for now is the jekyll gem, which you can install with gem install jekyll.

Once you’ve got the gem installed, you’ll need a folder which will contain the few files and folders Jekyll requires to work; at a minimum, Jekyll expects the following structure: the folders _includes, _layouts and _posts, plus the _config.yml configuration file.

So just three folders and a configuration file. As we’ll see later, my blog’s folder has got a lot more stuff because of various extensions, plugins and more. But the items listed above are the basic files and folders Jekyll needs to work. You can also find a lot of Jekyll sites hosted on Github, so you could even fork one of them and start from there (at the moment I am using my own Git server for this blog, but I will also push the code to Github when I have time).
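As a rough sketch, the minimal structure described above could be created with a few lines of Ruby. The helper name create_jekyll_skeleton is hypothetical, and the sketch works in a temporary folder so that nothing is written into a real site.

```ruby
require 'fileutils'
require 'tmpdir'

# Create the three folders and the (empty) configuration file inside +root+.
# Only the folder and file names come from the article; the helper is illustrative.
def create_jekyll_skeleton(root)
  %w[_includes _layouts _posts].each do |dir|
    FileUtils.mkdir_p(File.join(root, dir))
  end
  config = File.join(root, '_config.yml')
  # An empty file is enough here, since every setting is optional.
  FileUtils.touch(config) unless File.exist?(config)
  Dir.children(root).sort
end

# Build the skeleton in a throwaway folder and keep the resulting entry names.
SKELETON_ENTRIES = Dir.mktmpdir { |tmp| create_jekyll_skeleton(tmp) }
puts SKELETON_ENTRIES
```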

_includes contains text files, typically HTML, that can be included like “widgets” in pages or layouts; if you are familiar with Ruby on Rails development, they are like partials in the context of Jekyll. _layouts, similarly, contains the layouts which can be used to give a consistent look to your pages – as in WordPress, you can have different layouts, for example with or without sidebars, or with different sets of includes (widgets in WordPress). Jekyll uses Liquid as its templating language. If you have never used it, it’s basically a templating language with a designer-friendly syntax, and it can be extended if you know your way around Ruby – I’ll show some examples on how to extend Liquid later on.

_posts will contain text files written in Markdown (so with the .markdown extension) which represent -you guessed it- your actual posts. When you run the jekyll binary (installed together with the gem) to build the static site, Jekyll will process these text files through a Markdown filter (you can choose from a few filters available, as we’ll see later) to convert them to posts in HTML. In a similar way, it will process “raw” pages through Liquid to generate the static, HTML pages that will then be served to clients (we’ll see in greater detail later that not all the pages are processed with Liquid, depending on one particular condition). It is important to remember that Jekyll expects post files to have names in the format YYYY-MM-DD-title.markdown, since it uses the first part of a post’s filename to determine the publishing date for the post. Unfortunately, Jekyll as is doesn’t handle the time when a post was published, and this – as we’ll see later – can cause a few issues although it can easily be fixed.
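To make the naming convention concrete, here is a small sketch of how a date and a title slug can be read from a post’s filename. It is not Jekyll’s own code: the regular expression, the helper name parse_post_filename and the sample file names are assumptions made for illustration.

```ruby
require 'date'

# YYYY-MM-DD-title.markdown, as described above.
POST_FILENAME = /\A(\d{4})-(\d{2})-(\d{2})-(.+)\.markdown\z/

# Returns { date:, slug: } for a well-formed post filename, or nil otherwise.
def parse_post_filename(filename)
  match = POST_FILENAME.match(File.basename(filename))
  return nil unless match

  year = match[1].to_i
  month = match[2].to_i
  day = match[3].to_i
  # Reject impossible dates such as 30 February.
  return nil unless Date.valid_date?(year, month, day)

  { date: Date.new(year, month, day), slug: match[4] }
end

p parse_post_filename('_posts/2011-04-10-migrating-to-jekyll.markdown')
```

Note that only the date can be recovered this way; the filename carries no time of day, which is the limitation mentioned above.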

_config.yml, the configuration file, is not really required, in that the settings it contains can also be specified at runtime as command line arguments for the jekyll command. I prefer keeping in _config.yml those options that I always want to have enabled, while I leave out options that I may want to use or not at runtime, depending on what I am doing; more details in the next section.

Provided you already have some posts and have some layouts ready – we’ll see later how to create both – it’s extremely easy to generate the static site with Jekyll. All you have to do is run the jekyll command from within your blog’s folder. Jekyll will then generate a static copy of your site, ready to be served to users with any web server, in a subfolder named _site. There are various deployment options which we’ll see later, but in theory to deploy a Jekyll-generated site it’s enough to copy the _site folder to the production web server, and that’s it!

Configuration

As you may have guessed by the file’s extension, the configuration is stored in YAML format, which you will certainly be familiar with if you are a Ruby developer. All the settings you can put in there are optional, since -as said- you could also specify the same options as command line arguments for the jekyll binary; the end result is the same (for example, if you wanted to run Jekyll and also start a server at the same time, you’d have to run jekyll --server).

Here’s my current configuration file:

  • auto: if set to true, when you run the command jekyll it won’t just build the static site and then exit; instead, it will run in the foreground and monitor for changes in the posts or the other source files, and will automatically update the static site whenever changes occur; it can be useful for quick testing and debugging while you are either writing posts or making changes to the layouts, for example;
  • server: if set to true, Jekyll will generate the static site and then start a server to serve the static site it has generated (the default is Webrick); it’s not what you should use in production because of scalability issues, however it can be pretty useful in development;
  • lsi: its value (true or false) determines how Jekyll will figure out related posts to associate to each of your posts. The default setting is false, meaning that Jekyll will do a sort of very quick calculation to determine related posts; this way building the static site is really fast, but the results are pretty poor. You can get more accurate related posts by setting this option to true; this way Jekyll will use Latent semantic indexing to find related posts, but it has the downside that the generation of the static site can take a ridiculously long time compared to the other option, so especially with large sites it may not be a good idea (well, depending on how important related posts are to you). I must say that this is one of the very few things I don’t like about Jekyll so far; I used to use a great plugin for this in WordPress, which yields much more accurate results -IMHO- and didn’t really slow down things in a noticeable way;
  • markdown: as anticipated, Jekyll allows you to use Markdown syntax when you write posts (although HTML is also allowed and correctly recognised in the files); this option lets you specify which Markdown filter Jekyll should use. I initially used Maruku since I like its extensions to the original Markdown syntax (it’s a Markdown “superset”); however, unsurprisingly, I had to give up on it because of performance issues. Maruku is a pure Ruby Markdown interpreter, and as such it is very slow compared to other alternatives. The fastest interpreter these days still seems to be Rdiscount, which is for the most part written in C. Unfortunately using Rdiscount means that you’ll have to cope with a less featured Markdown syntax (and write some HTML by hand here and there), however the difference in terms of speed is massive, so big that even with a small site you’d be happy with the compromise (here’s a benchmark done by somebody which will give you an idea of the difference); of course if you still prefer a richer Markdown superset, the choice is yours, however remember that the static site needs rebuilding each time its contents change, so speed is really an extremely important factor here. From my tests with Maruku, also, I’ve noticed that Maruku is a lot more sensitive to syntax mistakes than Rdiscount is, in that it will very easily complain if the syntax is not 100% perfect, making editing a little annoying at times;
  • pygments: Jekyll supports syntax highlighting out of the box (while WordPress for example requires a plugin for this), through Pygments; many languages are supported and there are a lot of themes (basically, CSS styles) to choose from;
  • permalink: in WordPress, you have likely used a custom permalink structure to improve the SEO performance of your blog; of course, you can do this with Jekyll as well with this setting. Many people -me included- agree that the /:title permalink structure is best for SEO, however you need to choose wisely: ideally you should use exactly the same permalink structure you used to use with WordPress, otherwise your blog’s search engine rankings will be severely affected; you can of course change it whenever you want, but you must make sure your site instructs 301-permanent redirects from the old URLs to the new ones, in order to let search engines know in the correct way about the changes. Since Jekyll normally does not use a database nor a server side technology, this would only be possible if you have access to the web server’s configuration for your site (which you don’t, if you are using a third party’s hosting service) and are familiar with that kind of configuration; another option is to “mix” the static site generated by Jekyll with some server side technology so to be able to instruct these redirects with server side code (we’ll see later that I am using Sinatra together with Jekyll, although for other reasons);
  • paginate: it lets you choose how many posts to show per page (Jekyll supports pagination);
  • category_dir: similar to the same option in WordPress to choose the root (in the URL) for your category pages; I have this setting because of an extension which I’ll explain later;
  • category_title_prefix: it allows you to prepend some text to page titles in category pages, such as “Posts filed in ..”; this setting is also there because of an extension;
  • exclude: as you may have noticed earlier, the folders and files used by Jekyll to generate the static site have names that start with the underscore (“_”); this is because Jekyll recognises files and folders that contain files required for the generation of the static files, and will not copy them to the _site subfolder. So if you have files and folders that are required by Jekyll to generate the site or that you just need to keep in the site’s root, but you don’t want them to be available for download by clients, you have two options: the first is to give them names that start with the underscore; the second, when the first one is not possible or if you don’t like that limit, is to set the exclude option to an array of names which will be then also ignored by Jekyll. We’ll see later why I have added those names in my configuration file;
  • destination: as mentioned earlier, by default Jekyll will generate the static site in the _site subfolder, but you can also customise its name. In my case, for example, I am using Capistrano for my deployment tasks, and since Capistrano expects the folders images, javascripts and stylesheets to be under the folder public, otherwise it will show warnings / throw errors, I have configured the destination option to let Jekyll know that I want the static site to be generated in the public subfolder instead.
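To show how these options fit together, the sketch below parses a small YAML configuration from Ruby. The values are made up for illustration and are not the configuration used for this blog; in particular the names under exclude are placeholders. The helper copied_to_site? is a hypothetical name that only mirrors the underscore and exclude rules described above, not Jekyll’s real implementation.

```ruby
require 'yaml'

# Hypothetical values, for illustration only.
SAMPLE_CONFIG = <<~CONFIG
  auto: false
  server: false
  lsi: false
  markdown: "rdiscount"
  pygments: true
  permalink: "/:title"
  paginate: 10
  exclude:
    - Rakefile
    - Capfile
  destination: public
CONFIG

CONFIG = YAML.safe_load(SAMPLE_CONFIG)

# Simplified version of the rule described above: names starting with an
# underscore, or listed under "exclude", are not copied to the generated site.
def copied_to_site?(name, config)
  return false if name.start_with?('_')

  !Array(config['exclude']).include?(name)
end

puts CONFIG['destination']
puts copied_to_site?('index.html', CONFIG)
puts copied_to_site?('Rakefile', CONFIG)
```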

There may be other options available, but so far this is all I have needed so I haven’t investigated further.

Importing posts from WordPress

Perhaps the most important step among the first ones during the migration, is to import the posts you already have in your WordPress blog, into Jekyll. Importing essentially means that you will need to read the relevant data from WordPress’ database and use it to dynamically create in the _posts folder as many text files as the number of posts you have. This, of course, unless you are willing to rewrite your posts in Jekyll…

It doesn’t really matter what you use to automate this process, as long as you can create the text files Jekyll expects from the data in your WordPress database. If you, like me, work with Ruby, then Ruby will of course be the natural choice. Before migrating my posts, I saw on Jekyll’s site that there was a reference script in Ruby, but it didn’t cover some things I wanted to do so I wrote my own script; you could use it as it is or as a base for your own customised import process, but you need to make sure you have the required gems installed before proceeding.

Here’s why I used a different import script:

  • I wanted to import more data, such as featured images for posts (introduced with WordPress 3.0 if I remember rightly), categories, and tags;
  • I wanted to import existing drafts and distinguish them from the published posts;
  • I wanted to convert the posts to Markdown while migrating; as said earlier, you can leave HTML in the Markdown files, but I didn’t like the idea of having mixed HTML and Markdown posts in my _posts folder, plus I had to clean up the HTML generated by WordPress anyway and change things like the syntax highlighting code.

You can find the full script in this gist on Github; here I’ll highlight a few things in the code if you are familiar with Ruby.

As you can see, I have used two queries rather than a single query as seen in similar scripts for the importing. This is because as said I also wanted to import more data about posts, and I also wanted to import categories and tags as they were in WordPress; I could have perhaps written a single query anyway, but since I didn’t have many posts to import, it was easier to proceed this way rather than spending more time to figure out a proper join.

You can see here that I am also reading the featured image (if any), plus the status. This is because I then want to separate drafts from published posts (the sample script on Jekyll’s site does not take drafts into account).

This may look ugly… I could have certainly copied the featured images directly from the other site, but since I had the URLs already in the database, it was quicker to just download the files with the proper name to the expected target destination.

Here’s where I use the second query to import categories and tags (both referred to as “taxonomies” in WordPress) for each post. The autoslug is perhaps something temporary as I haven’t decided yet how to manage slugs and titles for categories and tags. In my WordPress blog, these taxonomies had human friendly titles which often differ from their related slugs. So for the time being I am importing everything as is so as not to affect SEO, although this means that I am basically storing -in posts- three values for each category or tag (title, slug, and autoslug). The autoslug is basically something that I am using for now for posts published after the migration, and it basically converts the human friendly title into a valid slug for use in URLs; the slug instead, if present, will have priority since it means that it was the slug used in the WordPress blog.
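The priority rule between slug and autoslug can be sketched as follows. This is not the code from the import script: the slug algorithm, the helper names autoslug and url_slug, and the sample taxonomies are assumptions made for illustration.

```ruby
# Turn a human friendly title into something usable in a URL:
# lowercase, runs of other characters collapsed to single dashes.
def autoslug(title)
  title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Prefer the slug imported from WordPress when there is one,
# and fall back to a slug generated from the title otherwise.
def url_slug(taxonomy)
  slug = taxonomy['slug']
  if slug.nil? || slug.empty?
    autoslug(taxonomy['title'])
  else
    slug
  end
end

puts url_slug({ 'title' => 'Tips & Tricks', 'slug' => 'tips' })
puts url_slug({ 'title' => 'Tips & Tricks' })
```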

Here you can see that I am also keeping the excerpts, and assign categories and tags to each post. In some other import scripts I have seen, others choose not to import the excerpts and to dynamically generate a “read more” intro by cutting the first N words from the post’s content, but I do prefer having excerpts better thought for both usability and SEO.

Finally, once the relevant data has been collected and processed, a text file for each post is created in either _posts or _drafts depending on whether it was a published post or still a draft. As mentioned previously, Jekyll expects these files to have names in the YYYY-MM-DD-post-title.markdown format; so it doesn’t understand the time when a post was published, but only the date.

Here’s a warning: if you leave this as it is (instead of implementing a fix or workaround to preserve the original date and time each post was published), beware that this will affect the Atom/RSS feed that Jekyll will generate for your site, in that posts won’t have a time associated with their published date, and therefore RSS readers and the like will treat all the old posts as new. So your feed’s subscribers will likely see all the old entries once again in their RSS clients. I realised this after I had completed the migration. If you have many posts, you may want to think about this as it may annoy your readers.

In the last code snippet, you can see that the YAML data generated from the relevant data for a post, is printed at the beginning of the file. This is called YAML Front Matter. Here’s the front matter for my previous post, as an example:

I have cut the other tags and all the categories to save space, since the concept should be clear. There’s one thing I’d suggest you keep in mind when setting the properties: there are some characters you would need to escape when writing string values, or they will break the processing of the page. So it’s just easier to enclose all text values in the front matter within double quotes, so then you know you only need to escape double quotes.
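The quoting advice can be illustrated with a short sketch that writes every text value between double quotes and escapes only double quotes (plus backslashes, which YAML also treats specially inside double quotes). The helper names yaml_quote and front_matter and the sample values are hypothetical, and the sketch assumes single-line values.

```ruby
require 'yaml'

# Wrap a single-line string in double quotes, escaping " and \ only.
def yaml_quote(text)
  '"' + text.gsub(/["\\]/) { |char| "\\#{char}" } + '"'
end

# Build a front matter block from a hash of single-line text values.
def front_matter(fields)
  lines = fields.map { |key, value| "#{key}: #{yaml_quote(value.to_s)}" }
  (['---'] + lines + ['---']).join("\n") + "\n"
end

SAMPLE_FRONT_MATTER = front_matter({
  'layout' => 'post',
  'title' => 'Why "static" sites: a short note',
  'excerpt' => 'Paths like C:\\temp and # signs survive too'
})

puts SAMPLE_FRONT_MATTER
```

Without the quotes, the colon in the sample title would be enough to confuse a YAML parser, which is the kind of breakage the advice above is meant to avoid.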

So to import the existing posts from WordPress, you can run the script as follows (in my case I had the script in the lib subfolder):

Provided you have all the dependencies installed, and can access the WordPress database with the credentials you have specified, after running the script you should have as many text files in the _posts and _drafts folders as the published posts and drafts you had in WordPress.

Converting the imported posts to Markdown

This is something that you may or may not want to do, depending on your needs. In my case, I didn’t like the idea of having new posts in Markdown mixed with old posts – imported from WordPress – in HTML. Plus, I had to clean up the HTML imported from WordPress anyway and change things such as the syntax highlighting. So I decided to convert all the existing posts and drafts to Markdown as well, so as to have all posts and drafts stored and managed in the same way.

Luckily, I found a Ruby library, DownmarkIt, ready to use. Unfortunately I still had to apply minor fixes to the Markdown it generated… but all in all it worked quite well; so if you also want to convert WordPress’ posts to Markdown files, grab the library first and put it in the same folder as the import script. Then you’ll have to make two small changes: first, you need to require the library; second, you’ll need to make sure that each imported post is first filtered through this library before being written to the destination text file:

That should be it. In my case the Markdown resulting from the conversion wasn’t perfect (perhaps there are better alternatives, but I didn’t want to invest more time on this), so I had to manually fix some little mistakes which would result in errors when parsing these Markdown files. But nothing really time consuming for me since I only had around twenty posts to check.

Layouts and includes

Once I had imported drafts and posts from my WordPress blog, the next step naturally was to recreate the same layout and look in Jekyll. The very first attempt I made was basically to try to reuse the same stylesheets and scripts as in the old blog, and just copy and paste HTML from the blog’s pages (looking at their source) into new Liquid layouts, extracting here and there portions of HTML into includes.

However there was a lot of clutter in the HTML produced by WordPress or, more likely, the plugins I had; plus, the plugins were also the source of some mess with JavaScript and CSS Stylesheets, although I had already done some optimisations for the WordPress blog. So in a word the operation didn’t feel “clean”, and I decided to restart from scratch with regards to layouts and includes, rewriting all the HTML.

As already mentioned earlier, Jekyll uses Liquid to process layouts and includes, and luckily Liquid makes it easy and quick to recreate layouts, nested layouts, and includes in a way that looks similar to how partials work in Rails, so I really like this kind of organisation of the markup in several files.

Jekyll will always parse with Liquid any file in the folders _posts, _includes and _layouts, by default. However Jekyll can also parse through Liquid any other file – regardless of its location – as long as the file contains, at the top, a YAML front matter; so if you want Jekyll to process a file with Liquid before the file gets copied to the static site (and that file is not a layout or include), you need to add a YAML front matter to that file. You don’t have to add actual data to that YAML section; even an “empty” front matter will cause Jekyll to parse that file with Liquid anyway. Example:

The interesting thing here is that any file with a front matter will be parsed with Liquid, and this is useful because it means that even CSS stylesheets and JavaScript files can be so processed, for example if you want to dynamically inject content or change something depending on the value of some variable, or things like these.
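The front matter rule can be sketched in a few lines of Ruby. This is a simplified stand-in and not Jekyll’s actual code: the regular expression and the helper names split_front_matter and processed_with_liquid? are assumptions, and the sample stylesheet is made up.

```ruby
# A front matter block: a line of three dashes at the very top of the file,
# optional YAML, then another line of three dashes.
FRONT_MATTER = /\A---[ \t]*\n(.*?)^---[ \t]*$\n?/m

# Returns [yaml, body]; yaml is nil when the file has no front matter,
# and an empty string when the front matter is present but empty.
def split_front_matter(source)
  match = FRONT_MATTER.match(source)
  return [nil, source] unless match

  [match[1], match.post_match]
end

# Under the rule described above, any front matter (even an empty one)
# marks the file for Liquid processing.
def processed_with_liquid?(source)
  !split_front_matter(source).first.nil?
end

stylesheet = "---\n---\nbody { color: red; }\n"
p split_front_matter(stylesheet)
p processed_with_liquid?("body { color: red; }\n")
```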

Another cool thing is that with Liquid it is extremely easy to include some widget or anyway a portion of HTML into a layout, with the include directive:

head.html will be expected to exist in _includes. Of course, you can include the same file multiple times in the same or different layout or page, or even in another include for nesting.

You can even have nested layouts with the same ease, and this is something I really like because it allows you to create a default layout, and then create variations with only the changes (for example a main layout with sidebars, and a layout without sidebars). Take the following layout, for example:

For the nested layout to appear in the parent layout, you need to use the content directive in the parent layout, as shown below:

The same applies to any page using a layout: you just need to specify a layout in the page’s front matter, and make sure the layout has the content directive. I like Liquid because it makes complex layouts extremely simple, since you can have just as many includes or nested layouts as you want.

Using static and dynamic data

We’ve already seen that we can add basically any kind of “static” data to a page or layout, by adding it to the YAML front matter. So for example say that we have a post file with this content:

We could have a layout named post (which has been specified in the post) like this:

So you can use the {{ }} syntax for any of the static data you specify in the YAML front matter, too. This means that each post will have its title (as well as other data), that will be then rendered in the relevant layout.
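To illustrate the idea of front matter data flowing into a layout, here is a toy substitution written in plain Ruby. It is deliberately not the Liquid gem and supports nothing beyond simple dotted placeholders; the helper name render_placeholders and the sample data are hypothetical.

```ruby
# Matches placeholders such as {{ page.title }}.
PLACEHOLDER = /\{\{\s*([\w.]+)\s*\}\}/

# Replace each placeholder with the value found by walking the nested hash;
# unknown names render as an empty string.
def render_placeholders(template, data)
  template.gsub(PLACEHOLDER) do
    keys = Regexp.last_match(1).split('.')
    data.dig(*keys).to_s
  end
end

layout = '<h1>{{ page.title }}</h1><p>{{ page.missing }}</p>'
puts render_placeholders(layout, { 'page' => { 'title' => 'Hello Jekyll' } })
```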

There’s a lot more you can do with Liquid (and as we’ll see you can even extend it); for more details, the official wiki is the best place to start with. I’ll show more examples along the way.

The “static” data you specify in the YAML front matter is not the only data you can use in your files. When you run the jekyll command to build the static site, Jekyll will automatically generate some hierarchical data on the posts, categories and so on that you can then use in any of the files that will be parsed with Liquid. Remember: you still need to add at least an empty YAML front matter to a file you want to be processed with Liquid, unless that file is in any of these folders: _posts, _includes, _layouts.

For example, in your site’s index page you will want to show the latest posts, right? So you can create an index.html file in the Jekyll folder containing something like this:

As you can see, in Liquid you can use loops as well; in the snippet above, you can also spot an example of Liquid functions to manipulate (in this case) dates. This is just a tiny example of course, but we’ll see more of these (again, see Liquid’s wiki for more info).

The most important “dynamic” data Jekyll makes available when generating a site, that you can use in your files is as follows:

  • site.categories, site.tags: return all the categories / tags detected when processing the posts
  • site.categories[page.category]: returns all the posts for the category of the current page; this can be useful when creating category pages, since it can be used to list all the posts for a particular category
  • site.tags[page.tag]: same as above, for tags
  • post.categories, post.tags
  • page.url, page.title, page.previous, page.previous.url, page.previous.title, page.next, page.next.url, page.next.title
  • site.related_posts
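To picture what a structure like site.categories holds, the sketch below groups a list of posts by category in plain Ruby. It only illustrates the shape of the data (category name mapped to its posts); the helper name index_by_category and the sample posts are made up, and this is not how Jekyll builds the data internally.

```ruby
# Map each category name to the list of posts filed under it.
def index_by_category(posts)
  index = Hash.new { |hash, key| hash[key] = [] }
  posts.each do |post|
    post[:categories].each { |category| index[category] << post }
  end
  index
end

# Hypothetical posts, for illustration only.
SAMPLE_POSTS = [
  { title: 'Migrating to Jekyll', categories: %w[jekyll ruby] },
  { title: 'Liquid tips', categories: %w[jekyll] }
].freeze

CATEGORY_INDEX = index_by_category(SAMPLE_POSTS)

# Comparable to looking up site.categories[page.category] on a category page.
puts CATEGORY_INDEX['jekyll'].map { |post| post[:title] }
```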

Once you know how to use both static and dynamic data, know which data is available to you, and know that you can also use loops and more in Liquid, you should now be able to build your own layouts, includes, etc. You may also want to search Github for some public Jekyll repositories and start your own blog from the source code of one of these, although I am afraid all the public examples I have seen are very basic.

When I have a little time, I will publish the source code for this blog as well since it’s a bit more complex than others so it covers more of what could or should be done with Jekyll.

Liquid: some more useful stuff

You already know about place holders to render data in layouts and pages, and have already seen loops. One thing that you may also happen to need is how to limit the items you want to iterate in a loop, or how to specify how many items you want to skip before iterating the remaining items.

Have a look at my index page: as you can see I have 4 featured posts with a particular layout, followed by 6 “teasers”, styled differently. To achieve this, my index.html page only contains this:

You’ve already seen include, so you know this means that the featured-posts.html file will be included in that place. But then there’s something new: Liquid also lets you set variables that you can then use in your layouts and pages. This is particularly useful, because it allows you to reuse the same portion of Liquid layout but make it render differently depending on the values of some variables.

So in my home page I first render the featured posts (we’ll see in a minute what’s in that include); then I assign all the site’s posts to the variable named teasers, and in the following two steps I specify in two other variables that I want to skip the first 4 posts (since they will already be rendered as “featured posts”), and that I want to only take the next 6. Finally, the teasers.html include is injected.

I use the same include in other layouts as well, but without limiting the number of posts to take or the number of posts to skip, so it’s nice that I can just set variables and then use the same include in multiple places.

Here’s what I have in the featured-posts.html include (I have omitted the HTML for each post since you already know how to render a post’s data from the previous section):

You can see that I can limit the loop to the first 4 posts. And here’s the content of the teasers.html include:

Where I say both how many posts to skip, and how many to take instead.
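For readers who think in Ruby, the effect of limiting and skipping items in a loop can be sketched with Array#drop and Array#first. This is only an analogue of the behaviour described above, not Liquid code; the helper name take_window and the sample post names are hypothetical.

```ruby
# Skip +offset+ items, then keep at most +limit+ of the remaining ones
# (or all of them when no limit is given).
def take_window(posts, offset: 0, limit: nil)
  remaining = posts.drop(offset)
  limit ? remaining.first(limit) : remaining
end

ALL_POSTS = (1..12).map { |number| "post-#{number}" }

# The first 4 posts are "featured"; the next 6 are rendered as teasers.
FEATURED = take_window(ALL_POSTS, limit: 4)
TEASERS = take_window(ALL_POSTS, offset: 4, limit: 6)

puts FEATURED.inspect
puts TEASERS.inspect
```

Calling the helper without a limit corresponds to reusing the same include elsewhere with no restriction on the number of posts.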

In the snippet above you can also see something that may look weird. My home page renders teasers in two columns, so to achieve this I basically inject alternately, after each teaser, an empty string (so, nothing basically) or a div with the following CSS styles:

By adding that div with these styles, I force every odd teaser on the left, and every even teaser on the right. Luckily Liquid supports both the typical if..then..else construct, as well as the forloop.last and cycle methods, used the way shown above.
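The alternation can be pictured in Ruby with an endlessly cycling enumerator. This is a sketch under assumptions, not the markup used on this blog: the CSS class name clear is invented, and skipping the separator after the last teaser is only a guess at how forloop.last is used.

```ruby
# Hypothetical separator markup; the class name is an assumption.
CLEARING_DIV = '<div class="clear"></div>'

# Append nothing after odd teasers and the clearing div after even ones,
# except after the very last teaser.
def teasers_with_separators(teasers)
  separators = ['', CLEARING_DIV].cycle
  teasers.each_with_index.map do |teaser, index|
    separator = separators.next
    last = index == teasers.size - 1
    last ? teaser : teaser + separator
  end
end

puts teasers_with_separators(%w[a b c d])
```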

Another thing useful to know is how to render links to the previous and next posts (if any). This is really good for SEO since it helps with internal linking. Below is the markup I am using at the moment, which should be pretty self explanatory.

Next, here’s how I render related posts on my post pages (again, it should be pretty clear how it works, so I’ll just paste it for reference):

Important: if you use the --lsi option to get more accurate related posts (at the cost of a slower generation of the static site), you’ll see that Jekyll suggests:

Notice: for 10x faster LSI support, please install http://rb-gsl.rubyforge.org/

So you could speed things up by just installing the gsl gem. However if you just try to install the gem, you may see errors since the gem requires a gsl package as well.

On Mac OS X

This is what I got at first on my Mac, which I use for development:

It turned out that I needed to install the gsl package first. It’s easy if you use for example Homebrew:

Then I could install the gem without any errors:

Linux

I got a similar error when I did the first setup for my new Jekyll blog on my production server, and the fix was also similar; here’s what I had to install on Ubuntu 10.04 server:

Once you have these dependencies sorted out, Jekyll should rebuild the site noticeably faster with the --lsi option than before (it does for me).

Lastly, I wanted to mention a few other Liquid directives I find useful to chain or transform data for rendering:

As you can see, it’s pretty easy to chain data in Liquid with capture: everything between capture’s opening and closing tags is concatenated and assigned to the variable specified in the opening tag. You can also see here some more examples of Liquid functions to prepend, append, and replace text. Then there’s the custom function urlencode; I’ll show later how I extended Liquid with this function.

All Liquid functions, as you can see, are used by appending a pipe to the data to transform, followed by the function’s name and any arguments it may require.

Comments

One thing that should be pretty clear by now is that, because Jekyll only generates a static site and doesn’t use a database or any dynamic content built with server side technologies, it doesn’t support comments out of the box. There are two ways to fix that: either you use some server side technology and integrate the management of comments into Jekyll’s static pages in one way or another, or (the recommended option) you just outsource comments.

Services like Disqus, Intense Debate and similar have become pretty popular these days and are great for comments, since they offer a much better experience than the one you’d normally have with WordPress. Even with WordPress you could improve the built in comments feature with plugins, but that often means a heavier WordPress install, and generally speaking it’s hard to achieve this way the same user experience that those third party services offer with minimum effort. Even Facebook have recently revamped their comments plugin and it’s now pretty good, so much so that even some large sites have started to use it. However I prefer Disqus over it (I have never liked Intense Debate too much), because of both its nice integration of what they call social media reactions, and the fact that it allows users to log in with multiple authentication providers, as opposed to Facebook, which at the moment (I think) only supports authentication with Facebook credentials.

Regardless of which one you choose, there are a few things you should know if you aren’t yet familiar with outsourcing a blog’s comments to a separate service:

  • since comments are -obviously- stored elsewhere, they will be loaded into your pages through a JavaScript remote call; it is very easy to integrate a third party commenting service since it only requires you to add some JavaScript code snippet to your layouts, but this also means that comments won’t be already present in the page while the page loads and is rendered, so comments may appear more slowly or not at all in the event the service you use is down;
  • the fact that comments are not present in the HTML documents of your pages, means that those comments basically “don’t exist” for search engines. Search engines, in fact, will only index the HTML of your pages, not JavaScript code nor JSON data that is loaded from the remote service to render your comments in the client’s browser; so this may be a negative point with regards to SEO depending on your point of view; some people think that comments may distract from the focus on the topic of a page (in the context of SEO), so perhaps it is better that they are excluded or ignored when search engines’ spiders index the site; other people instead think that the content of comments may also help drive search traffic to the site; if you think that comments add to the SEO of your site, it’s good to know that some services -like Disqus- create a comments page on their site for each page of your site that implements comments, and their pages basically instruct search engines of the canonical location of the comments, that is your own pages. So in theory this should fix some of the SEO related issues, and your site should be able to claim back some precious link juice;
  • while all these services usually require you to include just one script in your layouts, when comments are actually loaded these services make a fairly high number of additional HTTP requests to load whatever is needed to render the comments in your pages; this can indeed affect the general client side performance of your site, so you should also take it into consideration because a) users may perceive slowness, b) search engines now also take into account client side speed in the algorithms that calculate rankings;
  • most of these external commenting services, when used with WordPress through plugins, also take care of the synchronisation of all the comments between your WordPress install and their own data store; this means that a) you will always have a local “backup” of all your comments, although they are managed by an external service; b) if the commenting service is down at some time, you can temporarily switch it off and restore WordPress’ built in commenting until the problem is fixed, so your readers would have a limited user experience but the commenting feature wouldn’t be completely lost. This is something that of course is not possible with Jekyll out of the box, since Jekyll doesn’t even have a database. However nothing stops you from doing a local backup of all the comments every so often; most of these services (if not all), in fact, also offer APIs that you can use to download and synchronise comments with a local database, although this of course requires that you are familiar with some programming language. I am planning to do something like this in Ruby ASAP (unless there’s something ready for the purpose; I haven’t investigated this yet), and will then post about it when done;
  • if you have always used WordPress’ built in commenting feature, you will have all your comments in your WordPress database, therefore you will need to import all those comments into Disqus or another service before the same comments can be available for your Jekyll site. In my case I didn’t need to do this as I have used Disqus since the beginning (so Disqus already had all my comments when I migrated), but if you need to, the easiest way is to install Disqus’ plugin in your WordPress blog and let it synchronise all the comments to Disqus’ servers; this may take anywhere from a few minutes to hours or days depending on how many comments your blog has.

So, how do you integrate one of these services in your Jekyll blog? I’ll show here how easy it is with Disqus, so far my favourite. You can get the necessary info from your Disqus account, however here’s what I had to do as an example and for quicker reference.

First, I created the file _includes/comments.html, making sure it contains a div element with id set to disqus_thread, since Disqus will render the comments in this element; then I specified the id of my blog in Disqus (which is vitosjournal) and the URL to use as the association between some comments and the page where those comments should appear. You can see I used a simple Liquid placeholder with the page.url that Jekyll makes available for each of your pages. Finally, I have included Disqus’ script:

You can obtain a snippet similar to the one above from within your Disqus account (following Install => Universal Code). That’s it: you should now see the comments rendered correctly in your pages.

A second step was to get the View comments link next to each featured post or teaser (in the index or in the archive pages), to display the number of comments (and, for Disqus, of social media reactions) stored for each of them. You can achieve this by making sure that

a) you add another script which will take care of this:

b) each link to comments has a URL ending with the #disqus_thread fragment identifier; Disqus’ script will recognise these links and, by default, use the URL in the link’s href attribute as unique identifier for the comments count to display:

You can optionally add an attribute named data-disqus-identifier if you want to specify a different URL or use some different piece of information as unique identifier for your comments. A word of warning: if you specify the data-disqus-identifier, make sure the identifier is the correct one and Disqus correctly recognises it. In my case this didn’t work even though I had just taken the same identifiers I could see from my WordPress blog (I was using Disqus’ plugin already); however I haven’t had any issues whatsoever using the normal URLs instead.

That’s all for comments if you go for Disqus, but the process should be pretty similar with the other services too.

Search engine optimisation – Creating an XML site map

WordPress is not bad from an SEO point of view already out of the box, and you -like me- may have further improved its SEO performance with an SEO-oriented theme (mine was Thesis) and various plugins. With Jekyll, you need to take care of everything SEO-related yourself.

At least for the basics, here’s what I did: I first created an include, meta-tags.html, containing (among other things) these meta tags:

Then, in each page’s or post’s YAML front matter, I added the relevant information. Here’s, for example, the full content of my index.html:

As you can see, besides the title (recommended max length: 65-70 characters) and the meta description (recommended max length: ~150 characters), I also specify for each page how search engines’ robots should treat it. Possible values of the meta-robots setting include:

  • noodp: instructs search engines not to use the Open Directory Project as a source for descriptive snippets, as this may yield unexpected results;
  • noydir: same as above, but with regards to Yahoo! Directory;
  • noindex: tells spiders to basically ignore the current page;
  • noarchive: tells spiders not to cache/archive the contents of the current page;

I suggest you search for these settings if you aren’t familiar with them; I basically have noodp and noydir for all my pages, while on pages that I want to be ignored by search engines, I also add noindex and noarchive. This way, I can make sure search engines only have relevant information on my posts rather than duplicate content or information that would only look like clutter in the context of a search.

Besides meta-tags.html, I also have another include, link-rel.html, which contains links to stylesheets but also a link tag for canonical URLs:

As you can see, I am using the custom Liquid function canonical (we’ll see later how you can define custom functions), to adapt the page URL in some way. Please ignore that function for now since it’s something a bit more specific to my blog, the important thing is that you put in there the correct canonical URL for each page (in the YAML front matter, of course, so that include can be added to all the layouts with just a Liquid placeholder).

Canonical URLs are required to avoid issues with duplicate content in a number of cases, since duplicate content may affect the SEO performance of a site. In my WordPress blog, canonical URLs were very important since I often had very long posts (such as the one you are reading) that, because of their length, I had split into several sections, that is, across different pages. In that case, having a canonical URL pointing to the first page of a long article, for all its pages, would ensure that search engines would push ranking for the first page rather than diluting the SEO value of the article across its pages.

In reality, even with this kind of configuration there may still be problems with search engines. In fact, in Google’s Matt Cutts’ words:

A canonical page is the preferred version of a set of pages with highly similar content.

Also:

Must the content on a set of pages be similar to the content on the canonical version?

Yes. The rel=”canonical” attribute should be used only to specify the preferred version of many pages with identical content (although minor differences, such as sort order, are okay).

So Google makes it pretty clear that canonical URLs help avoid issues with duplicate content that may penalise the ranking of your pages, but the pages pointing to the same canonical URL must have identical (or almost identical) content. This isn’t the case of course with the content of a post split into different pages, so I had to change my plans with regards to in-post pagination and I would suggest you do the same if you also come from a similar WordPress setup.

Besides SEO, I have received enough feedback from readers to understand they don’t usually like to have to load different pages just to read the entire content of an article. It’s a slower reading experience for them and they may think you do this just to increase the number of page views, although you may instead do this simply to organise longer posts.

So, it looks like I had enough reasons to drop in-post pagination altogether. Now, long posts on this blog (such as the one you are reading) are still organised in sections, but each is actually just a single, long page with in-page hyperlinks. User experience should now be improved, and the site should be on safer ground as far as SEO is concerned.

Back to the canonical URLs: I am still using them, even though I no longer split long posts into multiple pages, and again I would suggest you do the same. Why? It is just safer in case you happen to have the same identical (or very similar) content on more than one page, but that’s not the only reason. Do you -like me- use a CDN? If yes, you may be surprised to hear that depending on the CDN you use, and on how you use it, the CDN may affect the SEO performance of your site! Try a site:your-cdn-hostname.com search in Google, and check for yourself if you have CDN-cached copies of your pages and posts that directly compete (as for SEO) with your site! If yes, you clearly have a duplicate content issue there.

Make sure that your CDN allows you to specify a custom robots.txt for your CDN distribution that will remain as you define it regardless of the content of your site’s own robots.txt. So for example in the root of my CDN distribution I have a robots.txt file with the following content:

This tells spiders to completely ignore the content of the whole CDN distribution (or “pull zone” if you are using a service similar to the one I use, MaxCDN). Instead, the robots.txt in the actual site that I do want spiders to index, contains:

This allows spiders to index the whole site without restrictions instead. You can also see that I am specifying the location of my site’s XML sitemap. In WordPress, you have likely -hopefully- used a plugin to create this site map that you could then let Google know about (optionally also through the Google Webmaster Tools).

It is a recommended task, since the XML site map helps Google understand the content you care about, and you can even tell it (as well as the other search engines I suppose) the priority with which it should index your pages and posts. You can easily have a fully functional XML site map with Jekyll too, by simply creating a sitemap.xml file in the root of your Jekyll folder. Here’s the content of mine:

You can see that I have specified only my Ruby CV page (you may want search engines to index other pages as well), and all the posts, assigning a higher priority to posts and asking search engines to come back to check for changes every day (this is only an indication of the frequency, you can’t really decide). You can also see that I have added an empty YAML front matter to ensure the file is processed with Liquid when Jekyll rebuilds the static site.
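To make the structure of such a site map more concrete, here is a minimal sketch that builds the same kind of entries in plain Ruby instead of a Liquid template. It is only an illustration under stated assumptions: the helper name, the example.com URLs, and the default changefreq and priority values are hypothetical, while the element names and the namespace are those of the sitemaps.org protocol.

```ruby
# Builds one <url> entry of an XML sitemap. The default changefreq and
# priority values are illustrative, not recommendations.
def sitemap_entry(loc, lastmod:, changefreq: "daily", priority: 0.8)
  escaped_loc = loc.gsub("&", "&amp;") # "&" must be escaped in XML
  <<~XML
    <url>
      <loc>#{escaped_loc}</loc>
      <lastmod>#{lastmod}</lastmod>
      <changefreq>#{changefreq}</changefreq>
      <priority>#{priority}</priority>
    </url>
  XML
end

# Hypothetical posts; in Jekyll these would come from site.posts.
posts = [
  { url: "http://example.com/2011/04/10/first-post/", date: "2011-04-10" },
  { url: "http://example.com/2011/04/12/second-post/", date: "2011-04-12" }
]

entries = posts.map { |post| sitemap_entry(post[:url], lastmod: post[:date]) }
puts %(<?xml version="1.0" encoding="UTF-8"?>)
puts %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">)
puts entries.join
puts "</urlset>"
```

In the Liquid version the loop over posts plays the role of the map call here, and the empty YAML front matter is what makes Jekyll process the placeholders.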

Sometimes I check that the site map is as expected after making changes to the site, by loading it directly in my browser. Since the file will be plain XML in the end, I have specified an XSL stylesheet so as to make it appear more user friendly when displayed in a browser. You can find my sitemap.xsl file in this gist if you want to do the same.

Syntax Highlighting

One of the nice things that Jekyll can handle out of the box (while in WordPress you’ll need a plugin), is syntax highlighting. As mentioned in the Configuration section, Jekyll uses Pygments for this; all you need to do is to turn the feature on, by either setting pygments to true in your _config.yml or as command line argument when you run jekyll to rebuild the static site.

It’s very easy to let Jekyll know you want to render a portion of text in your posts, as code:

Besides HTML, Pygments supports a long list of languages.

One optional step that I recommend is generating the CSS stylesheet for the theme you choose to use with Pygments. In my case, I didn’t much like the (many) themes available by default, and I went instead for a Railscasts inspired theme which I like much more.

First, I had to install the new theme so that Pygments would recognise it:

Then, to check that all went well, I started Python’s console and entered the commands you see below:

If you see that class returned, all is well. You can now export the correct styles to a new CSS stylesheet that you’ll then have to include in your layouts:

Atom/RSS feed

Similarly to what we’ve seen for the XML sitemap, when you switch from WordPress to Jekyll you’ll also have to remember that the Atom / RSS feed for your blog won’t be automatically generated and updated for you. However, it’s as easy to have a feed with Jekyll as it is to have an XML site map.

Just create a file with the same name and location as the feed in your WordPress blog (usually /feed/index.html), and paste the code below into it:

That’s it. Jekyll will keep this file updated whenever you publish new posts and clients will know about them.

Remember, however, that Jekyll doesn’t handle the time when posts are published, and this may cause your readers to see old posts coming back as soon as they hit your new feed for the first time (as mentioned earlier). In my case I realised this after I had already migrated, so it was too late.

If you want, it’s not difficult to fix: just add the time to the YAML front matter of each post, and then add a Liquid placeholder for it in the feed template you see above. This should work, although I haven’t tested it.
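As a small illustration of the idea, the sketch below combines a date and a time into the kind of timestamp an Atom feed expects. It is plain Ruby, not the feed template itself, and the "time" front matter key and the sample values are assumptions made for the example, not something Jekyll defines.

```ruby
require "time"

# Hypothetical front matter values for one post; the "time" key is an
# assumption for this example.
front_matter = { "date" => "2011-04-10", "time" => "18:30:00 +0200" }

# Combines date and time into the RFC 3339 timestamp used by Atom's
# <updated> element.
def atom_timestamp(front_matter)
  Time.parse("#{front_matter['date']} #{front_matter['time']}").xmlschema
end

puts atom_timestamp(front_matter)
```

With a full timestamp per post, feed readers can tell migrated posts apart from genuinely new ones instead of relying on the date alone.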

Extending Jekyll: plugins and extensions

One of the most interesting characteristics of Jekyll is that, provided you know your way around Ruby, you have full control over what it can do, in three ways. First, the most obvious one: you can just fork the code and make whichever changes you like, and then use your version of the gem rather than the original one.

However, while this is easy, I prefer the other two options, plugins and extensions, since they let me achieve the same without having to change Jekyll’s code. There are two main advantages with this:

  • you can update Jekyll’s gem whenever there’s a new version, without having to mess with the code because of the changes you had made on the previous version;
  • both plugins and extensions live in code files that you can easily share across projects.

Plugins

Jekyll expects your plugins to live in the _plugins folder. So far I have needed plugins for two particular needs:

  • to make Jekyll generate custom page types that it wouldn’t generate otherwise; for example, think of the archive pages (category pages, tag pages, monthly archives and so on), since Jekyll -surprisingly IMHO- doesn’t generate them by default even though it understands categories, tags, etc;
  • to register new, custom Liquid tags that can be used to pull in whatever content you want, in any place of your layouts, with just a single placeholder as we have seen for content, for example.

Archive pages

I was lucky enough to find some useful examples of plugins here; I basically took the plugin to generate category pages and customised it slightly since -as you may remember- each category in my site has multiple properties (slug, title and autoslug) vs a single text value. Then I created other plugins from that one to generate tag pages and monthly archives.

Besides these, I have also created a couple of other plugins:

Tag cloud

The first custom plugin I wrote lets me register the custom Liquid tag tag_cloud. If used as we’ve seen for other placeholders,

it will render … you guessed it.. a tag cloud. Here’s the code of the plugin:

As you can see it’s different from the other plugins in that it only registers the Liquid tag but it doesn’t create any files. If you’re curious how I calculate the “weight” for each tag depending on the number of posts associated to it, I am now using a formula I found here after some googling.
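For readers curious about what a tag “weight” calculation can look like, here is a minimal sketch. It uses simple linear scaling, which is an assumption for the sake of the example and not necessarily the formula used on this blog; the tag counts, the helper name and the size range are hypothetical too.

```ruby
# Hypothetical input: tag name => number of posts using that tag.
TAG_COUNTS = { "ruby" => 12, "jekyll" => 5, "seo" => 1 }

# Linearly scales each count to a font size (in percent) between
# min_size and max_size.
def tag_weights(counts, min_size: 80, max_size: 200)
  min, max = counts.values.minmax
  spread = [max - min, 1].max # avoid dividing by zero when all counts are equal
  counts.transform_values do |count|
    (min_size + (count - min) * (max_size - min_size) / spread.to_f).round
  end
end

puts tag_weights(TAG_COUNTS).inspect
```

A Liquid tag would then only need to render each tag as a link with the computed size as an inline style or a CSS class.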

Speeding up syntax highlighting

Another very useful plugin I was glad to have found, helps speed up the syntax highlighting. I found it here. It basically caches code that has already been processed with Pygments to files in the _cache folder, so next time Jekyll has to rebuild the static site it won’t have to reprocess the syntax highlighting for code snippets that are already cached and haven’t changed (the plugin checks the MD5 hash of each code snippet).

Since I don’t have many posts yet, perhaps it doesn’t make much difference for me, but since Pygments can be quite slow -I think- with large code snippets, I can see how it may help speed things up on larger blogs with lots of code snippets.
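The caching idea is easy to sketch in plain Ruby. The code below is not the plugin itself, just an illustration of hashing a snippet with MD5 and reusing a stored result; the helper name, the cache file naming and the stand-in highlighter are assumptions.

```ruby
require "digest/md5"
require "fileutils"
require "tmpdir"

# Returns the cached result for `code` if one exists; otherwise runs the
# (slow) block, stores its result under the snippet's MD5 hash, and returns it.
def cached_highlight(code, cache_dir)
  FileUtils.mkdir_p(cache_dir)
  path = File.join(cache_dir, "#{Digest::MD5.hexdigest(code)}.html")
  return File.read(path) if File.exist?(path)

  html = yield(code)
  File.write(path, html)
  html
end

Dir.mktmpdir do |dir|
  calls = 0
  # Stand-in for the real call to Pygments.
  highlighter = ->(code) { calls += 1; "<pre>#{code}</pre>" }
  2.times { cached_highlight("puts 1", dir, &highlighter) }
  puts "highlighter ran #{calls} time(s)"
end
```

Because the file name is derived from the snippet’s content, editing a snippet changes its hash and it is simply highlighted again on the next build.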

Extensions

Plugins aren’t the only way to extend Jekyll without touching its code. I found out here about a Ruby gem named jekyll_ext that lets you achieve similar results but in a slightly different way. Extensions are expected to live in _extensions, and work by changing or adding features through filters, thanks to metaprogramming; see this for more details.

At the moment I am using a single extension, and over the past few days I have realised that plugins are kind of the “official” way of extending Jekyll, so perhaps it may be pointless to use both plugins and extensions; however at the moment I am happy with my setup so I haven’t experimented for example with converting the single extension I currently have into a plugin.

There is one reason, though, why I may look into using just plugins in the future anyway: to make extensions work you need to generate the static site by running the wrapper ejekyll (installed together with the jekyll_ext gem), rather than jekyll. It’s just one more thing to remember perhaps, but this makes plugins a little cleaner as a solution for extending Jekyll, besides the fact that they are already baked into it.

Here’s the code for that extension:

Basically it helps me with two things:

  • it registers the custom Liquid functions canonical and urlencode which we’ve seen earlier
  • it creates a hierarchical data set with the posts organised by year, month, day, that I can use in my pages. It’s just the way I am using the data at the moment, so there may be better alternatives, but for me it works.
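The two ideas in the list above can be sketched in a few lines of plain Ruby. This is not the extension’s code: the module and method names and the sample posts are hypothetical, and the canonical function is left out because it is specific to this blog. With Liquid loaded, a module like this would be registered through Liquid::Template.register_filter.

```ruby
require "uri"
require "date"

# Hypothetical filter module; every public method of a registered module
# becomes usable as a Liquid filter after a pipe.
module TextFilters
  def urlencode(input)
    URI.encode_www_form_component(input.to_s)
  end
end

# Groups posts into a nested hash: year => month => day => [posts].
def posts_by_date(posts)
  posts.group_by { |post| post[:date].year }.transform_values do |in_year|
    in_year.group_by { |post| post[:date].month }.transform_values do |in_month|
      in_month.group_by { |post| post[:date].day }
    end
  end
end

# Hypothetical posts, standing in for Jekyll's post objects.
posts = [
  { title: "First",  date: Date.new(2011, 3, 13) },
  { title: "Second", date: Date.new(2011, 3, 20) },
  { title: "Third",  date: Date.new(2011, 4, 1) }
]

puts Object.new.extend(TextFilters).urlencode("ruby & jekyll")
puts posts_by_date(posts)[2011].keys.inspect
```

A nested structure like this makes it straightforward to loop over years, then months, then days in an archive page.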

Contact form

Besides comments, another typical feature of a blog that is basically lost when switching to Jekyll because of the lack of both a database and a server side technology, is that of a contact form.

You could outsource a contact form too, however for added privacy I have preferred adding the support for dynamic content to my static Jekyll blog, through Sinatra. Jekyll generates a completely static site, so it is a no brainer to integrate the two things; also, besides the contact form I may need some dynamic action for something else too in the future, so it’s useful to have Sinatra available while the normal blog is still a super fast static site.

Integrating Sinatra is pretty easy: Sinatra expects all static content that should be ready to be served to clients in the public folder. As mentioned in the configuration section, Jekyll by default generates the static site in _site, but it is easy to change this by setting the option destination in the _config.yml configuration file.

You could change the location of Sinatra’s public folder instead, but since -as we’ll see in the next section- having Sinatra makes Capistrano a natural choice for deployment, and because Capistrano expects the folders images, stylesheets and javascripts to be in public, it’s just easier to change the destination option in Jekyll instead.

Back to the contact form… there are surely various ways of doing this, but at the moment what I do is have an iframe in the contact page that renders the actual contact form, dynamically rendered by Sinatra. Nobody likes iframes, but for the time being it’ll do.

Conclusions

As you can easily guess, this post took me ages to write, but I wanted to write a useful and up to date reference on how to migrate to Jekyll and how to use it, since I am very happy so far with it and I bet an increasing number of people will prefer its approach over WordPress and other heavier solutions.

I think I have covered pretty much all that is needed for anyone to go from getting started with Jekyll to doing most things with it, but I am sure there is more to say on the subject. I’ll leave it here for now, but I am already thinking of a couple more posts on Jekyll that I am sure many will find useful; in particular, a more in depth look at various deployment options (besides just copying the static site, as mentioned), how to integrate a simple Sinatra-powered contact form, and some maintenance tips. So keep an eye on this blog if you are interested in knowing more about these topics.

In the meantime, I hope you’ll find in this post the information you were looking for, and if you have questions, suggestions or any thoughts, don’t be shy and let me know in the comments.



