share_counts Ruby gem and social networks

This is a post about the share_counts Ruby gem I wrote. I was looking for a way to quickly check, in one go, how many times a URL has been shared on the most popular social networks and aggregators, but I couldn’t find any. So I wrote some code to query these social networks’ APIs, and I thought it might be useful to others, so why not gem it up? In fact, I’ve already had confirmation that it is useful to others: I published the gem only hours ago and, despite only talking about it now, almost 20 people have already downloaded it!

At the moment, the gem, named share_counts, supports the following social networks and aggregators:

  • Reddit
  • Digg
  • Twitter
  • Facebook (Shares and Likes)
  • LinkedIn
  • Google Buzz
  • StumbleUpon

I may add support for other networks if needed, and I will most likely extend the gem with other methods to leverage more of what these APIs offer, so stay tuned.

share_counts Ruby gem

Github repo: https://github.com/vitobotta/share_counts. On RubyGems: https://rubygems.org/gems/share_counts.

Once you have installed the gem with the usual gem install share_counts, it’s very easy to use. For example, if you want to check the Reddit score for a story, you can call the method reddit with the URL as argument:

ruby-1.9.2-p0 :001 > require "share_counts"
=> true

ruby-1.9.2-p0 :016 > ShareCounts.supported_networks
=> ["reddit", "digg", "twitter", "facebook", "fblike", "linkedin", "googlebuzz", "stumbleupon"]

ruby-1.9.2-p0 :002 > ShareCounts.reddit "http://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to reddit...
=> 5

It works the same way with the other supported networks. Facebook alone has two methods rather than one, since it has both “shares” and “likes”:

ruby-1.9.2-p0 :003 > ShareCounts.facebook "http://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to facebook...
=> 1

ruby-1.9.2-p0 :004 > ShareCounts.fblike "http://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to fblike...
=> 0

You can also get both shares and likes together:

ruby-1.9.2-p0 :007 > ShareCounts.fball "http://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to fball...
=> {"share_count"=>1, "like_count"=>0}

You can also get the share counts for all the supported services in one call, or specify which ones you are interested in:

ruby-1.9.2-p0 :005 > ShareCounts.all "http://vitobotta.com/awesomeprint-similar-production/"
Redis caching is disabled - Making request to reddit...
Redis caching is disabled - Making request to digg...
Redis caching is disabled - Making request to twitter...
Redis caching is disabled - Making request to facebook...
Redis caching is disabled - Making request to fblike...
Redis caching is disabled - Making request to linkedin...
Redis caching is disabled - Making request to googlebuzz...
Redis caching is disabled - Making request to stumbleupon...
=> {:reddit=>4, :digg=>1, :twitter=>2, :facebook=>1, :fblike=>0, :linkedin=>2, :googlebuzz=>0, :stumbleupon=>0}

ruby-1.9.2-p0 :006 > ShareCounts.selected "http://vitobotta.com/awesomeprint-similar-production/", [ :reddit, :linkedin ]
Redis caching is disabled - Making request to reddit...
Redis caching is disabled - Making request to linkedin...
=> {:reddit=>4, :linkedin=>2}

In these cases you’ll get back a hash instead.

At this point, you may have noticed the message “Redis caching is disabled” printed with each call; that’s because I had caching disabled. I’ve noticed that a) some of these social networks’ APIs aren’t available or working 100% of the time, and b) some of them may rate-limit you if you make too many requests in a short period of time, so the gem also supports caching with Redis.

Caching is disabled by default, since you may not be running Redis, you may want to use some other caching in your application, or you may not want caching at all. So if you do want to use caching, the first step is to enable it. By default, share_counts assumes that Redis is listening on 127.0.0.1:6379, but you can override this in a few ways: set the global variable $share_counts_cache in advance if you already have a reference to a Redis connection (and you’re using the same redis gem used by share_counts), pass that reference as an argument, or specify host and port when you enable caching:

# Using caching with the default settings
ruby-1.9.2-p0 :009 > ShareCounts.use_cache
=> #<Redis client v2.1.1 connected to redis://127.0.0.1:6379/0 (Redis v2.0.3)>

# Using an existing reference to a connection to Redis
$share_counts_cache = a_Redis_connection

# Same thing as above, but by passing the reference to the connection as argument
ruby-1.9.2-p0 :010 > ShareCounts.use_cache :redis_store => a_Redis_connection

# Specifying host and port for the connection to Redis
ruby-1.9.2-p0 :010 > ShareCounts.use_cache :host => "localhost", :port => 6379
=> #<Redis client v2.1.1 connected to redis://127.0.0.1:6379/0 (Redis v2.0.3)>

Cached share counts expire by default in 2 minutes, but you can again override this by setting the global variable $share_counts_cache_expire to a value in seconds.
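The expiry logic can be seen in isolation; this is a minimal sketch of how the gem picks the expiry when writing a count to Redis, with the global set to a hypothetical ten minutes:

```ruby
# Sketch: to_redis uses $share_counts_cache_expire if set, else the 2-minute default.
$share_counts_cache_expire = 600           # e.g. cache for ten minutes instead
expiry = $share_counts_cache_expire || 120 # the same fallback the gem applies
```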

So, let’s now compare using the gem with and without caching:

ruby-1.9.2-p0 :002 > require 'benchmark'
=> true

# Enabling caching
ruby-1.9.2-p0 :003 > ShareCounts.use_cache
=> #<Redis client v2.1.1 connected to redis://127.0.0.1:6379/0 (Redis v2.0.3)>

# First run, values are not cached
ruby-1.9.2-p0 :004 > Benchmark.realtime { ShareCounts.all "http://vitobotta.com/awesomeprint-similar-production/" }
Making request to reddit...
Making request to digg...
Making request to twitter...
Making request to facebook...
Making request to fblike...
Making request to linkedin...
Making request to googlebuzz...
Making request to stumbleupon...
=> 3.7037899494171143

# Now values are cached
ruby-1.9.2-p0 :005 > Benchmark.realtime { ShareCounts.all "http://vitobotta.com/awesomeprint-similar-production/" }
Loaded reddit count from cache
Loaded digg count from cache
Loaded twitter count from cache
Loaded facebook count from cache
Loaded fblike count from cache
Loaded linkedin count from cache
Loaded googlebuzz count from cache
Loaded stumbleupon count from cache
=> 0.003225088119506836

You can see which URLs have been cached, along with their share counts, using the cached method:

ruby-1.9.2-p0 :013 > ShareCounts.cached
=> {"http://vitobotta.com/awesomeprint-similar-production/"=>{:fblike=>0, :stumbleupon=>0, :linkedin=>2, :googlebuzz=>0, :facebook=>1, :twitter=>2, :digg=>1, :reddit=>5}}

You can also clear the cached values if needed:

ruby-1.9.2-p0 :013 > ShareCounts.cached
=> {"http://vitobotta.com/awesomeprint-similar-production/"=>{:fblike=>0, :stumbleupon=>0, :linkedin=>2, :googlebuzz=>0, :facebook=>1, :twitter=>2, :digg=>1, :reddit=>5}}
ruby-1.9.2-p0 :014 > ShareCounts.clear_cache
=> ["ShareCounts||fblike||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||stumbleupon||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||linkedin||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||googlebuzz||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||facebook||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||twitter||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||digg||http://vitobotta.com/awesomeprint-similar-production/", "ShareCounts||reddit||http://vitobotta.com/awesomeprint-similar-production/"]
ruby-1.9.2-p0 :015 > ShareCounts.cached
=> {}

Notes:

  • If a request fails for one network, its share count won’t be cached and will remain nil. This way you can easily tell whether a service’s API failed by simply checking whether its share count for the given URL is nil.
  • Since you may be already using Redis in your app for something else, the gem namespaces the keys so that if you clear the cache, only its keys will be deleted.
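The first note above can be sketched with some illustrative data (the counts here are made up, not real API responses); a nil value marks the network whose request failed:

```ruby
# Sketch: after a call like ShareCounts.all, a nil count means the request to
# that network failed, since failed counts are never cached.
counts = { :reddit => 4, :digg => nil, :twitter => 2 } # e.g. digg's API timed out
failed = counts.select { |network, count| count.nil? }.keys
# failed now lists the networks whose API calls did not succeed
```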

A look at the code

The code is on Github if you want to have a look. Here I’ll highlight a few things.

All the methods to retrieve share counts for each supported service are wrapped in the module ShareCounts, as you may already have guessed:

module ShareCounts

  extend Common
  extend Caching

  def self.supported_networks
    %w(reddit digg twitter facebook fblike linkedin googlebuzz stumbleupon)
  end

  def self.reddit url
    try("reddit", url) {
      extract_count from_json( "http://www.reddit.com/api/info.json", :url => url ),
      :selector => "data/children/data/score"
    }
  end

...

In particular, the try method will try to fetch the requested share count(s) by either making an HTTP request (or multiple requests depending on which share counts are being requested) or, if caching is enabled, from the cache.

def try service, url, &block
  cache_key = "ShareCounts||#{service}||#{url}"
  if cache_enabled?
    if result = from_redis(cache_key)
      puts "Loaded #{service} count from cache"
      result
    else
      puts "Making request to #{service}..."
      to_redis(cache_key, yield)
    end
  else
    puts "Redis caching is disabled - Making request to #{service}..."
    yield
  end
rescue Exception => e
  puts "Something went wrong with #{service}: #{e}"
end

Since most of these APIs follow a common pattern, HTTP requests are made with the assumption that APIs will return a JSON response, with or without a callback method; if a callback method is provided, the response is first manipulated to extract just the JSON data we need. The make_request method will attempt a request to a network’s API a maximum of three times, with a timeout of 2 seconds for each attempt. There’s a reason for this: while I was testing these APIs, I noticed that in most cases, if a request didn’t return within a couple of seconds, it either timed out after a long time or returned a 503 Service Unavailable status code. From this point of view, I must say I was surprised to see that Digg’s API was likely the least reliable of the bunch, returning a 503 too often, even though I wasn’t making many requests in a short period of time, so I doubt this was because of rate limiting. Anyway, the combination of the 2-second timeout and the three attempts means we expect a response from each service within a few seconds, and that’s a good compromise if you use caching.

To make requests, I am using one of my favourite gems, rest-client (from Github user archiloque’s fork, since it seems to be more up to date than the original by Heroku’s Adam Wiggins):

def make_request *args
  result = nil
  attempts = 1

  begin
    timeout(2) do
      url = args.shift
      params = args.inject({}) { |r, c| r.merge! c }
      response = RestClient.get url, { :params => params }

      # if a callback is specified, the expected response is in the format "callback_name(JSON data)";
      # with the response ending with ";" and, in some cases, "\n"
      result = params.keys.include?(:callback) \
      ? response.gsub(/^(.*);+\n*$/, "\\1").gsub(/^#{params[:callback]}\((.*)\)$/, "\\1") \
      : response
    end

  rescue Exception => e
    puts "Failed #{attempts} attempt(s)"
    attempts += 1
    retry if attempts <= 3
  end

  result
end

As for the extraction of the actual share counts from each API’s response, I was pleased to see a common pattern in the usage of JSON, so it was as easy as writing a simple method that “queries” the JSON data in a way that somewhat recalls XPath for XML. The arguments are the JSON data and a single-key hash of the form :selector => “where/the/share/count/is”:

def extract_count *args
  json = args.shift
  result = args.first.flatten.last.split("/").inject( json.is_a?(Array) ? json.first : json ) {
    |r, c| r[c].is_a?(Array) ? r[c].first : r[c]
  }
end

The code needed for caching with Redis is in a separate mix-in. If you haven’t used Redis yet, this code shows its most basic usage.

To initialise a connection and optionally specify host and port:

def use_cache *args
  arguments = args.inject({}) { |r, c| r.merge(c) }
  $share_counts_cache ||= arguments[:redis_store] ||
  Redis.new(:host => arguments[:host] || "127.0.0.1", :port => arguments[:port] || "6379")
end

To read from and write to Redis:

def from_redis(cache_key)
  value = $share_counts_cache.get(cache_key)
  return if value.nil?
  Marshal.load value
end

def to_redis(cache_key, value)
  $share_counts_cache.set cache_key, Marshal.dump(value)
  $share_counts_cache.expire cache_key, $share_counts_cache_expire || 120
  value
end

Then we have methods to return all the cached values used by the gem, and to clear those cached values:

def cached
  urls = ($share_counts_cache || {}).keys.select{|k| k =~ /^ShareCounts/ }.inject({}) do |result, key|
    data = key.split("||"); network = data[1]; url = data[2];
    count = from_redis("ShareCounts||#{network}||#{url}")
    (result[url] ||= {})[network.to_sym] = count unless ["all", "fball"].include? network
    result
  end
  urls
end

def clear_cache
  ($share_counts_cache || {}).keys.select{|cache_key| cache_key =~ /^ShareCounts/ }.each{|cache_key|
    $share_counts_cache.del cache_key}
end

As you can see, keys are “namespaced”, and I am using inject to build a hash of the cached URLs and their share counts.
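The key parsing can be sketched on its own; given some namespaced keys and counts (hardcoded here for illustration, where cached would read them from Redis), inject rebuilds the per-URL hash:

```ruby
# Sketch of how `cached` turns "ShareCounts||network||url" keys into a
# { url => { network => count } } hash.
keys = [
  "ShareCounts||reddit||http://example.com/post",
  "ShareCounts||twitter||http://example.com/post"
]
counts = { "reddit" => 5, "twitter" => 2 } # stand-in for from_redis lookups
urls = keys.inject({}) do |result, key|
  _, network, url = key.split("||")
  (result[url] ||= {})[network.to_sym] = counts[network]
  result
end
```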

APIs: a few exceptions to the “rule”

As said, most of the APIs for the supported social networks follow a common pattern in their usage of JSON data. However, there were two exceptions. The first is a small one, Google Buzz’s API: it returns a JavaScript object (instead of an array) whose only property is the URL specified as argument; the value of that property is the actual share count on Google Buzz. So in this case, rather than using the extract_count method as with the other JSON-based APIs, all I had to do was read the value of that property once the JSON response was parsed:

def self.googlebuzz url
  try("googlebuzz", url) {
    from_json("http://www.google.com/buzz/api/buzzThis/buzzCounter",
    :url => url, :callback => "google_buzz_set_count" )[url]
  }
end

The second exception is StumbleUpon. I was surprised and disappointed to see that they don’t have an API yet (unless I missed it); it looks like StumbleUpon is a little behind the competition on this front. Luckily, despite the lack of an API, it wasn’t much more difficult to fetch share counts for SU too: once I had identified the HTML returned when their button is displayed, I could use Nokogiri to extract the share count with XPath:

def self.stumbleupon url
  try("stumbleupon", url) {
    Nokogiri::HTML.parse(
    make_request("http://www.stumbleupon.com/badge/embed/5/", :url => url )
    ).xpath( "//body/div/ul/li[2]/a/span").text.to_i
  }
end

This was a quick look at the code as it is now, but I expect to add more methods to fetch more information from the APIs, so keep an eye on the Github repo if you plan on using this gem.

Also, if you have any suggestions on what to add or how to improve it, please let me know in the comments.

© Vito Botta