Bitwise operations in Ruby, and an example application to testing with Rspec

applying-bitwise-operations-to-rspec-testing

I often need to test that something should happen or should be possible depending on a variable number of conditions that are linked together in some arbitrary way. In one application, for example, I have a model called License, and one of the requirements is that it should be possible to ‘activate’ an existing license only if its status is either :new or :suspended. Without going into too much detail about the domain specific to this application, it is clear in this example that a license should be activable only if the following two conditions are met at the same time:

  • the license should exist, that is – in typical Rails terms – the license should be persistedand
  • the status of the license should be either :new or :suspended

So I have an instance method on the License model that looks like this:

persisted? is a method available on all ActiveRecord models while status_new? and status_suspended? are methods dynamically generated by a module that, included in a model, allows to manage any attribute as an ‘enum’ field that can only contain one from a list of possible values (so, in the example, License can have any of the statuses [:new, :active, :suspended, :revoked, :expired, :transferred]).

How would you go about testing that this method behaves as expected? One obvious way could be by taking into account all the possible combinations of true/false values that the three variables above (persisted?status_new?status_suspended?) could have. persisted? is a framework thing (and as such we won’t test it) and the methods dynamically generated concerning the license status are unit-tested separately, plus there are integration tests to ensure everything is working together properly. So it is safe, in this case, to just stub all the methods with the purpose of testing when a license can be activated, in isolation.

So we could have something like this:

One other way some would achieve the same thing is:

Either way, the bottom line is that we’ll likely have to somehow loop through all the possible combinations of boolean values that the three variables persisted?status_new?status_suspended? can have, and then we determine for each of these combinations what the expected behaviour is (in the example: whether the license can be activated or not). While both examples (as in “Rspec examples”) would work, both suffer from quite a bit of duplication and reduced readability, in that it is not very clear right away by just looking at the code what is the relationship between the conditions we’re taking into account. In our case, we have a “A and (B or C)” kind of relation that is instantly clear only if you read the title of the Rspec examples.

Another way of testing the same thing, which I prefer, is as follows:

As you can guess, thanks to the boolean_combinations method (more on that later) we’re still looping through all the possible combinations but with no duplication, and the example is more readable. The advantage is that from looking at just the code, you can understand right away what is the relationship between what I’m calling “the variables” and how the various conditions are linked together. In particular, the code

clearly says when a license should be activable. It’s good from a “driving code by tests/specs” standpoint in that it suggests the “code we wish we had”. Before going ahead, one note on having multiple expectations in the same example: I usually prefer to keep one expectation per example, as it is often recommended for clarity and simplicity; however in cases like the above all the expectations in the single example define only together the behaviour being tested. None of them, taken individually, would define any behaviour of the subject of the test, and the alternative would be several almost identical specs with perhaps some nested contexts (depending on the combinations of conditions being tested) that would IMO be overkill in such cases. This is particularly true if you have even more than 3 different variables involved in the conditions being tested.

So, back to the boolean_combinations method: how do we generate all the possible combinations of boolean values, given any number of variables? If you’ve studied -in particular- some electronics, the answer should be pretty obvious. What we need is the same kind of truth table often used to figure out how to simplify, or reduce, some logical operations (for example with Karnaugh maps or similar methods). Such a table looks like the following (for 3 variables as in our example) and should be pretty familiar:

One possible ‘Rubyish’ way is to generate this table is with the combination method available for arrays in Ruby. It expects the number of items you want for each combination, and will produce all the possible combinations with the given array of items, e.g.

In our example however we have a slightly different case, since we want combinations of 3 items each, but each of the items can have either true or false value. If we tried with

we would be out of luck because the array doesn’t have enough items to match the number of items required for each combination, therefore the result is an empty array. We can work around this by just ‘extending’ the original array, for example by multiplying it for the number of items we want in each combination:

The good thing is that this way we do get all combinations we are looking for; the bad thing is that we get lots of duplicates – the more items per combination, the more duplicates. So we’d have to use uniq on the result to remove those duplicates:

This does indeed produce all the combinations we are after, but it’s not terribly efficient, albeit it is short and simple.

Using bitwise logic

Another way to generate all the boolean combinations, in a perhaps less ‘Rubyish’ but more efficient way, is to use a bitwise operation.

First, we need to know in advance how many possible combinations we have with the given number of variables, since this will be required in the algorithm we’ll see shortly; one first way to figure out the number of combinations (or range) is with the shift-left operator (which shouldn’t be confused with same-looking operators on types other than integers):

As the name of this operator might suggest to some, what it does is shift each bit of the binary representation of the number to the left by n positions. The number of positions is simply the number of elements we want in each combination (that is also the number of variables we want to produce the boolean table for). So, in our example, we have:

The binary representation of 1 is 0001 (using some leading zeros for clarity), so if we shift each bit by 3 positions to the left, we get 1000 which is the binary representation of the number 8. Similarly, 1 << 2 means shifting each bit of 0001 by 2 positions to the left, so we get 0100, which is the binary representation of the number 4. You’d quickly guess that the operation 1 << n is equivalent to the operation 2**n (number two raised to the nth power), so the formula usually used to calculate the number of combination is instead:

as it’s a bit easier to remember and understand. In our case we have 2 ** 3 items = 8 combinations. The table we’ve seen earlier is equivalent to the following table:

Or also, using the variables in our example:

Interestingly, each combination of 1s and 0s on each line of the table is basically equivalent to the binary representation of the index I of the combination. So 4 = 101, 6 = 110, and so on.

So, if we look at this table we can see that for each combination with index I, each variable will have value true or false depending on whether their corresponding bit in the binary representation of I is set or not. For example, in the combination with index 7, all the three variables will have value true since their corresponding bits in the binary representation of 7, which is 111, are all set. Similarly, in the combination with index 5, persisted and status_suspended will have value true since their bits in the binary representation of 5 (101) are set, while status_new will be false because its bit in the binary representation of 5 isn’t set.

We can say the same thing this way too: given a combination with index I, and an element of the original array with index J, the element J will have value true in the combination I only if the Jnth bit (from the right) of the binary representation of I is set.

Given a number m, how do we check if the bit of position n of the binary representation of m is set? The “canonical” way do to this is to perform a binary and operation between m and the number that has only the bit of n position set. A binary and is simply an and operation performed bit by bit. So for example, if we wanted to check whether the 3rd bit of the binary representation of 16 (which is 10000) is set, we would do:

since 00100 (or 4) is the binary number having only the 3rd bit set. So the operation is equivalent to 16 & 4. In this example, the result of the bit-by-bit and operation is 0, meaning that the 3rb bit of 10000 (16) isn’t set – and in fact it is not. As another example, let’s check now if the 4th bit of 27 is set. The binary representation of 27 is 11011, while the binary number having only the 4th bit set is 10000, or 16. So we have:

The result is 10000 or 16, and the “rule” is that given:

  • m is a number
  • n is the position of one bit in the binary representation of m
  • o is the number that has only the nth bit set

the nth bit of the binary representation of m is set if the operation m & o yields a result that differs from zero.

Back to the table above, let’s check for example why persisted has a value of true in the combination by index 5 (“m“). Persisted, in the table, corresponds to the third bit (“n“) of the binary representation of 5, so we need to find out if the 3rd bit from the right in 101 (= 5) is set; the number with only the 3rd bit set (“o“) is 100, or 4, so the operation we need is (using directly numbers in decimal notation):

The result is != 0, meaning that the 3rd bit of 5 is indeed set and, in turn, that for that combination, persisted has value true.

Applying all of the above to write a first version of an algorithm to generate all the boolean combinations we wanted in first place, we get:

This indeed produces all the combinations we’re after:

We can simplify this code a bit. Firstly, with the bit reference operator fix[n] -> 0,1 available with Fixnum objects, we can get right away the nth bit of the binary representation of a given number, and then we return either true or false depending on whether that bit is set or not. So the code

is equivalent to

Secondly, we can slightly simplify the two nested loops with map:

I’m not sure of the performance difference (we are talking about tiny arrays here) but perhaps I would prefer the other, more readable version. So I keep a file in spec/support containing the boolean_combinations method, and use it in some Rspec examples as shown earlier:

There are various other operations that can be done with bitwise logic, including some tricks involving databases that I’ll perhaps show in some other post.

Update 16/08/2012: reader Luca Belmondo sent me a comment pointing out that the built in Array#repeated_permutation does exactly the same thing.

I hadn’t notice this method before, as well as Array#repeated_combination - it looks like both were introduced in Ruby 1.9 – so this is a fresh reminder that it’s always better to check out what’s already available before reinventing the wheel. Thanks Luca!




Have your say!

Please see my comment policy if this is your first time here or if you have any questions regarding comments.