Using Google Translate from the terminal

(Update Jan 14, 2011: If you have already used this tip and are back to this post because it’s no longer working, read on… Google have updated the Translate API v2, so I have made some changes to the end of the post accordingly.)

In a previous post, I showed a nice shortcut I use quite often to search for definitions, with Google’s define search feature, from the command line rather than from within a browser.

As I always have a few terminal windows open at any time, I often look for ways of using popular web services from the command line, as this can be fun and save time too; nowadays, many web services that have a UI meant to be consumed with a browser also expose APIs that allow developers to integrate these services into their applications or web mashups.

Google, in particular, offer APIs for just about every one of their web services, including the very popular Google Translate, which I use a lot, mostly to translate between English, Finnish and Italian. So, how can we use Google Translate from the command line?

In the previous example, we have seen how easy it is to fetch any kind of web page from the command line with utilities such as wget; we’ve also seen how we can manipulate and format the content returned, for example to adapt it for display in a terminal. We could do something similar with Google Translate; however, there is a quicker and better way to achieve the same result: the Google Translate API. This API can be consumed with the usual HTTP requests, but it returns a JSON response rather than a normal HTML web page, as it is designed to be integrated by developers into other applications and services.

We could manipulate this JSON response once again with filters based on regular expressions, as seen in the previous example, but there is an easier way of parsing this JSON directly from the command line: a utility called jsawk, which works in a very similar way to awk but specifically on JSON-formatted text.
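
As a quick taste of how jsawk will be used later in this post, here is a minimal sketch (the word property is just a made-up example): feed it a JSON array on standard input and pull a value out with a small JavaScript expression passed as an argument.

echo '[{"word":"ciao"}]' | jsawk -a "return this[0].word"
# should simply print: ciao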

The first step, naturally, is to make a request to the Google Translate API; the documentation shows that it is possible to consume the API in two ways: either with some JavaScript code, or by making a REST request directly to the service and getting back the result in JSON format, almost ready to use. This is what we are going to do.

First of all, you’ll need a Google Account -if you don’t have one already, create one- and you’ll also need to request an API key to be able to consume the service. You can request a key for free here.
The documentation tells us that we should issue requests with URLs in the format

https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=TEXT TO TRANSLATE&source=SOURCE LANGUAGE CODE&target=TARGET LANGUAGE CODE

So for example, to translate the word “hello” from English to French, with wget we can issue a request like

wget -qO- --user-agent firefox "https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=hello&source=en&target=fr"

As in the previous example, we need to specify a user agent, otherwise Google will return an empty response. You should see this result:

{
  "data": {
    "translations": [
      {
        "translatedText": "bonjour"
      }
    ]
  }
}

which is a JSON response, as expected. The next step is to parse this response and get the value of the translatedText property in the nested object data->translations.

However, jsawk seems to expect a JSON array of objects, therefore we’ll need to first manipulate this response and wrap it into [ ] brackets to obtain an array with a single item.

echo "[`wget -qO- --user-agent firefox \"https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=hello&source=en&target=fr\"`]"

With this wrapping, the JSON response becomes:

[ {
  "data": {
    "translations": [
      {
        "translatedText": "bonjour"
      }
    ]
  }
} ]

and is now ready to be parsed by jsawk. We need to get the value of the property data.translations[0].translatedText of the first item of the array (as you can see, data.translations is also an array):

echo "[`wget -qO- --user-agent firefox \"https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=hello&source=en&target=fr\"`]"  \
| jsawk -a "return this[0].data.translations[0].translatedText"

You should see just the word “bonjour” rather than the whole JSON response. One last step, which we have already seen in the previous example with the define search feature, is to make sure any HTML entities in the translated text are displayed correctly in the terminal:

echo "[`wget -qO- --user-agent firefox \"https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=hello&source=en&target=fr\"`]" \
| jsawk -a "return this[0].data.translations[0].translatedText" \
| perl -MHTML::Entities -pe 'decode_entities($_)'

At this point, you can automate this command by wrapping it within a shell function that accepts as arguments the source language ($1), the target language ($2) and the text you want to translate ($3):

translate() {
    echo "[`wget -qO- --user-agent firefox \"https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=$3&source=$1&target=$2\"`]" \
    | jsawk -a "return this[0].data.translations[0].translatedText" \
    | perl -MHTML::Entities -pe 'decode_entities($_)'
}

so you can use this function like this:

translate en fr "hello"
=> bonjour

You may also set up some aliases for the pairs of languages you translate from/to the most:

alias en2fr='translate en fr'
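
You can of course define a similar alias for each direction or language pair you use regularly, for example:

alias fr2en='translate fr en'
alias en2fi='translate en fi'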

So the example for English->French translation simply becomes:

en2fr "hello"
=> bonjour

As seen in this last example, remember to wrap the text to translate within double quotes if you are translating a phrase rather than a single word.

Now, to test that everything works, guess what the Finnish phrase “Hyvää Joulua kaikille” means. 🙂

Update Jan 14, 2011: I noticed today that the trick I described in this post was no longer working as it was; it looks like Google have updated version 2 of the Translate API, which is the version I used in the commands above. I hadn’t noticed, actually, that this version was still a “labs” version and not yet a proper release, as highlighted in the documentation:

Important: This version of the Google Translate API is in Labs, and its features might change unexpectedly until it graduates.

Funnily enough, the documentation itself doesn’t yet reflect some changes they’ve already made to the API. In particular, a request made to the same URL,

wget -qO- --user-agent firefox "https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=hello&source=en&target=fr"

now yields a JSON response in a slightly different format:

{
  "data": {
    "translations": [
      {
        "translated_text": "bonjour"
      }
    ]
  }
}

As you can see, the array’s gone and they’ve renamed the translatedText property to translated_text. So the new version of the translate function, still using the Google Translate API v2, would be:

translate() {
    wget -qO- --user-agent firefox "https://www.googleapis.com/language/translate/v2?key=YOUR KEY&q=$3&source=$1&target=$2" \
    | jsawk "return this.data.translations[0].translated_text" \
    | perl -MHTML::Entities -pe 'decode_entities($_)'
}

which is also a little bit simpler. However, since they’ve made it clear that the API may still change while it’s in Labs, it’s perhaps more convenient to stick with the Google Translate API v1 in the meantime – the result, in the end, should be the same. So the translate function, using v1 instead according to its documentation, becomes:

translate() {
    wget -qO- --user-agent firefox "https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=$3&langpair=$1|$2" \
    | jsawk "return this.responseData.translatedText" \
    | perl -MHTML::Entities -pe 'decode_entities($_)'
}

It doesn’t look to me like this version requires a key, as it seems to work just fine without one. Quick test:

en2fr 'Thanks, Google!'
Merci, Google!
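
One caveat with all the versions of the function above: the text in $3 is interpolated into the URL as-is, so phrases containing spaces or accented characters may not always reach Google intact. A variant of the v1 function that lets curl take care of the URL encoding, via its --data-urlencode option, might look like this (just a sketch, assuming curl is installed, as it is by default on OS X):

translate() {
    # same as before, but curl -G builds the query string for us and
    # --data-urlencode takes care of encoding spaces, accents, etc.
    curl -s -G -A firefox "https://ajax.googleapis.com/ajax/services/language/translate" \
         --data-urlencode "v=1.0" \
         --data-urlencode "q=$3" \
         --data-urlencode "langpair=$1|$2" \
    | jsawk "return this.responseData.translatedText" \
    | perl -MHTML::Entities -pe 'decode_entities($_)'
}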

Using Google’s ‘define’ search feature from your terminal

(Update 05/04/2012: Google have slightly changed the format of URLs for search, so I have updated the snippets to take this and other small changes into account)

Since I started writing content for this blog, and as English is not my mother tongue, I quite often find myself using various tools that help me either choose the right word for something I am trying to communicate, or check that the syntax of a sentence is correct, so that the content is readable enough.

One such tool is -unsurprisingly- Google: by searching for two different terms or phrases within double quotes (what Google calls “phrase search”), I can see which one yields the most results and is therefore more likely to be correct English. But even more useful is the define search feature: by prepending the text “define:” to your search term or query, you can instruct Google to search for and return definitions for that term or phrase from various sources, rather than a bunch of links.

I have been using define a lot lately, but at some point I got a bit tired of opening a new browser tab or window each time I had to double check the definition for a word (too much energy, you know…), so I have been toying with a little hack that now lets me use the same feature from within the terminal much more quickly, given that I always have at least one or two terminals open at any time.

There are a few command line utilities you can use to fetch web pages, with wget being one of the most popular. To fetch, for example, the definitions for the word “blog” from Google define using wget, all you need to do is type a command like the following:

wget -qO- http://www.google.co.uk/search\?q\=blog\&tbs\=dfn:1

where the option “-qO-” simply tells wget to output the content of the downloaded page directly to the screen (or STDOUT) rather than to a file. You’ll notice that wget seems to perform the request as expected; however, it shows no output. This is because -it seems- a user agent is required. So let’s try again, specifying a user agent such as “Firefox”:

wget -qO- -U "Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4" http://www.google.co.uk/search\?q\=blog\&tbs\=dfn:1

You should now see the HTML of the page as a browser would see it. Problem is, this is not really readable, is it? The next step is to strip all the HTML tags so that we keep only the actual content we are looking for: the definitions for our search term or phrase. We can do this easily by processing the HTML with grep, instructing it to return only the li HTML elements, since -you can check in the HTML- the li elements in the page correspond to the various definitions returned for your search query.

wget -qO- -U "Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4" http://www.google.co.uk/search\?q\=blog\&tbs\=dfn:1 \
| grep --perl-regexp --only-matching '(?<=<li style="list-style:none">)[^<]+'

In the pipe above, we tell grep to process wget’s output and use the regular expression provided as an argument to return only the parts of each matching line that match the pattern, that is -in this case- all the li elements present in the page returned by Google. If you try the command above, you will now see an output similar to the following for the word “blog”:

read, write, or edit a shared on-line journal
web log: a shared on-line journal where people can post diary entries about their personal experiences and hobbies; "postings on a blog are usually in chronological order"
A blog (a contraction of the term "web log") is a type of website, usually maintained by an individual with regular entries of commentary, descriptions of events, or other material such as graphics or video. Entries are commonly displayed in reverse-chronological order. ...
website that allows users to reflect, share opinions, and discuss various topics in the form of an online journal while readers may comment on posts. ...
blogger - a person who keeps and updates a blog
(cut)

This is a lot better, but we can still improve it further by adding line numbers (with the command nl) and making sure that HTML entities, if any, are displayed correctly in the terminal (we are not using a browser, after all). This can be done by using perl once again, and in particular its decode_entities() method:

wget -qO- -U "Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4" http://www.google.co.uk/search\?q\=blog\&tbs\=dfn:1 \
| grep --perl-regexp --only-matching '(?<=<li style="list-style:none">)[^<]+' \
| nl | perl -MHTML::Entities -pe 'decode_entities($_)'

You should now see a more readable output similar to the following:

1   read, write, or edit a shared on-line journal
2   web log: a shared on-line journal where people can post diary entries about their personal experiences and hobbies; "postings on a blog are usually in chronological order"
3   A blog (a contraction of the term "web log") is a type of website, usually maintained by an individual with regular entries of commentary, descriptions of events, or other material such as graphics or video. Entries are commonly displayed in reverse-chronological order. ...
4    website that allows users to reflect, share opinions, and discuss various topics in the form of an online journal while readers may comment on posts. ...
5   blogger - a person who keeps and updates a blog
(cut)

Now edit your .bash_profile file (or the equivalent for the shell you use – if it’s not bash, you may have to adapt the code slightly) and add this function:

define() {
    wget -qO- -U "Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4" http://www.google.co.uk/search\?q\=$@\&tbs\=dfn:1 \
    | grep -Po '(?<=<li style="list-style:none">)[^<]+' \
    | nl \
    | perl -MHTML::Entities -pe 'decode_entities($_)' 2>/dev/null;
}

Then, to finally use the trick from your terminal, all you have to do is enter a command like:

define blog
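
One caveat: as written, the define function interpolates $@ straight into the URL, so a multi-word query such as define phrase search ends up with spaces in the URL and the request may fail. A variant that joins the arguments and replaces spaces with ‘+’ first (just a sketch, not the original function) could look like this:

define() {
    # join all arguments into a single string, then turn spaces into '+'
    # so multi-word queries survive being embedded in the URL
    local query="$*"
    query="${query// /+}"
    wget -qO- -U "Mozilla/6.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4" "http://www.google.co.uk/search?q=${query}&tbs=dfn:1" \
    | grep -Po '(?<=<li style="list-style:none">)[^<]+' \
    | nl \
    | perl -MHTML::Entities -pe 'decode_entities($_)' 2>/dev/null
}

With this version, both define blog and define phrase search should work.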

I love this kind of trick, as it makes the use of our dear terminal even more productive. I am sure other Google search features -as well as other web services- can be just as useful when consumed from the terminal; we’ll have a look at some more examples later on.

Faster Internet browsing with alternative DNS servers and a local cache

It is no secret to power Internet users that DNS resolution is one of the factors that most affect the performance of our Internet browsing, and sometimes even a very fast broadband connection can become a pain if a poor DNS service is used. DNS -which stands for “Domain Name System”- is a protocol which makes use of a networked database, plus a particular set of services for querying that database, with the main function of translating human-friendly, symbolic hostnames such as “www.google.com” into the numerical addresses, or IPs, of the machines hosting a website or service accessible from the Internet or, generally speaking, a typical network.

Fast DNS servers usually make for a better user experience thanks to faster Internet browsing, even with today’s fast broadband services. With today’s media-rich websites, social networks and content mashups, in fact, each time a web page is downloaded, chances are that the page contains references to images or other content hosted on several different hosts, and therefore accessible from different hostnames / domain names. While developers may sometimes do this on purpose so that browsers can benefit from parallel downloading (see Steve Souders’ famous 14 rules for faster loading web sites), since each hostname requires a trip to the DNS server, even just a few different hostnames can negatively affect the overall page loading time if a low-performing DNS server is used.

ISPs usually offer their own free DNS servers together with their Internet connectivity service; this is what most people normally use, and in many cases it may be just fine. However, the DNS service offered by ISPs is often poor when it comes to performance. Luckily, nowadays there are quite a few alternative, more specialised DNS services which are also freely available and usually offer either improved performance (thanks to smarter caching and a generally better setup/design) or additional features that go beyond simple DNS resolution but still make use of DNS -most importantly, improved security, with filters at the DNS level that prevent users from reaching known malicious sites, protecting them against phishing and other threats. Some of these free services only promise to deliver better performance, such as Google Public DNS and DNS Advantage by UltraDNS, while others -such as Norton DNS or Comodo Secure DNS- focus mainly on the security benefits of having active filtering at the DNS level.

Then, among the more popular ones, there is also OpenDNS, which does it all. This was likely the first specialised DNS service, and it remained basically the only one of its kind for a while, until several others spotted the significant potential of DNS-based services as well as the new revenue opportunities. OpenDNS and others offer most of their services for free, while making money with “premium” services with additional features, as well as through NXDOMAIN hijacking: when a request is made for an unresolvable domain name, the browser won’t show the usual, expected error message; instead, as OpenDNS intercepts and processes the request, it detects that the domain name cannot be resolved (or that the actual website isn’t loading at the time) and by default redirects the user to their OpenDNS Guide page, with search results based on the mistyped or wrong domain name… plus some lovely ads, of course. This is a somewhat clever trick that makes them some decent money, so it shouldn’t come as a surprise that others have copied it, competitors as well as many ISPs who can’t miss the opportunity to make some more money the easy way. However, this approach has often been criticised when adopted by ISPs, since while OpenDNS makes it pretty clear how they make money out of their free DNS service, most others do not.

Out of the several DNS services around these days, OpenDNS still remains the most feature-rich of them all. Not only does it still offer the best DNS performance in many locations (as is the case for me in London, UK), it also offers quite a few other features that can be particularly useful when managing a network of computers or if you have small kids, thanks to security filtering and parental controls. If you are interested in these features, I recommend creating an account (it’s free!) and configuring the appropriate settings for your network(s) through their excellent dashboard.
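
Whichever of these services you are considering, it is worth checking how they actually perform from your own location before switching. A rough-and-ready way to do that (just a sketch, assuming dig is installed, as it is by default on OS X and most Linux distributions) is to ask each candidate resolver for the same name and compare the query times it reports:

# lower query times are better; run it a few times, since the first
# lookup may simply have missed that resolver's cache
for dns in 208.67.222.222 8.8.8.8 156.154.70.1; do
    echo -n "$dns: "
    dig @"$dns" www.google.com | grep "Query time"
done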


Regardless of which DNS server you use, it is still possible to improve your DNS experience, and thus your Internet browsing, a bit further by setting up a local DNS server to use as a cache. Some may argue that most operating systems and browsers already cache DNS query results locally, and while this is true, I have found that a local DNS server used as a cache still helps improve things, especially, of course, if this cache can also be shared with other clients on the same network (at home I use a local DNS cache as well as a caching proxy, Squid, to improve the overall browsing performance of my home clients).

Setting up a local DNS server is pretty easy and quick on Unix systems, provided you are familiar with the terminal. Here we’ll see how to do this on Snow Leopard, but it shouldn’t be too different on other Unix flavours once you have installed the DNS server we are going to use, BIND, with your package manager of choice (for Windows desktops, there was once upon a time a simple DNS server called TreeWalk DNS, but the project seems to have been abandoned years ago and the website is currently listed as malicious by Norton for some reason).
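
For instance, on a Debian or Ubuntu machine the install step would be something along these lines (a sketch; the package name may differ on other distributions):

# installs the BIND server (named) together with rndc and friends
sudo apt-get install bind9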

BIND is already installed on Mac OS X, although it is switched off by default. For starters, to prevent users on remote systems from being able to access and control our local BIND server, we need to make sure that a secret key is used to explicitly grant privileges to a host. BIND requires a daemon called named to be running on the system, while the utility rndc takes care of administering this daemon, with commands that will only work if the keys specified in the two configuration files /etc/named.conf and /etc/rndc.conf match.

It is possible to automate this little configuration and create a key file by executing the command


sudo rndc-confgen -a

Since BIND expects the key to be in /etc while the command above creates the key in /private/etc (at least on Snow Leopard 10.6.5), you can either


sudo mv /private/etc/rndc.key /etc/rndc.key

or


sudo vim /etc/named.conf  

and change the line include “/etc/rndc.key”; to include “/private/etc/rndc.key”;

Update: as reader David Glover reminds me in the comments, there is no need to move the file /private/etc/rndc.key to /etc/rndc.key since /etc is already a symlink to /private/etc; I can’t remember why I had done that while getting BIND to work on my system, but you should be able to safely skip that step. Thanks David.
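
If you are curious, you can quickly confirm this on your own machine:

ls -ld /etc
# on OS X this should show something like: /etc -> private/etc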

Next, we need to tell BIND which DNS servers it should forward queries to when it cannot answer them directly, either because they involve domain names not yet known locally or because the cached results have expired.

Open the file /etc/named.conf as sudo (unless you have it opened already from the previous step) with vim or your favourite editor, and add the following lines to the options section:

forwarders {
    208.67.222.222;
    208.67.220.220;
};

In this example, I am using OpenDNS’ servers, but you can use Norton’s public DNS (198.153.192.1, 198.153.194.1), Google Public DNS (8.8.8.8, 8.8.4.4), UltraDNS (156.154.70.1, 156.154.71.1) or whichever other DNS servers you prefer or that work best for you.
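
After editing the file, it is worth making sure it still parses correctly before going any further; BIND ships with a small syntax checker for this (assuming named-checkconf is available, as it normally is wherever BIND is installed):

# checks the configuration syntax; no output means no errors were found
sudo named-checkconf /etc/named.conf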

Now, depending on the version of OS X you are using, you may or may not need to create the following – just skip this step if you already have the folder /System/Library/StartupItems/BIND.

sudo mkdir -p /System/Library/StartupItems/BIND
sudo nano /System/Library/StartupItems/BIND/BIND

Copy the following lines in the file you’ve just created (unless it was already there), and save.

#!/bin/sh
. /etc/rc.common

if [ "${DNSSERVER}" = "-YES-" ]; then
    /usr/sbin/named
fi

Then make it executable

sudo chmod +x /System/Library/StartupItems/BIND/BIND

In the same folder, create the file

sudo vim /System/Library/StartupItems/BIND/StartupParameters.plist

and copy the following lines in it:

{
    Description = "DNS Server";
    Provides = ("DNS Server");
    OrderPreference = "None";
    Messages =
    {
        start = "Starting BIND…";
        stop = "Stopping BIND…";
    };
}

By default, the DNS server is set not to start at boot. Let’s change that by opening the file

sudo vim /etc/hostconfig

and changing the content so that it contains the line

DNSSERVER=-YES-

Save, then either reboot or load BIND manually for the current session with

sudo /System/Library/StartupItems/BIND/BIND
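
To double-check that the daemon actually came up, a quick look at the process list will do:

# the [n] in the pattern keeps the grep process itself out of the results
ps aux | grep '[n]amed'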

At this stage BIND should be up and running, but it is not being used yet. You will need to go to System Preferences > Network > Advanced > DNS, and replace all the current DNS servers with just 127.0.0.1 so that your local DNS server is used instead. To make sure this is working as expected, type in your terminal:

scutil --dns

You should see an output similar to this:

DNS configuration
resolver #1
domain : config
nameserver[0] : 127.0.0.1
order   : 200000
....
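
You can also verify that the cache is actually being hit (another quick sketch using dig): ask your local server for the same name twice; the first query is forwarded upstream, while the second should be answered from the cache with a query time close to 0 msec.

dig @127.0.0.1 www.google.com | grep "Query time"
dig @127.0.0.1 www.google.com | grep "Query time"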

Another thing that may be useful to know is how to flush the DNS cache should you need to do so for any reason:

sudo rndc -p 54 flush && dscacheutil -flushcache

You should now have, and be using, a local DNS cache, and your Internet browsing should feel faster. Please let me know in the comments if this is the case for you as well, or whether you see different results.