FileVault: User’s home directory on an encrypted second drive

FileVault 2

Using encryption on a laptop gives you peace of mind that if the laptop gets lost or stolen, others won’t be able to snoop inside your precious data. To this end, I’ve been using FileVault for years to encrypt my home directory; so I was glad that the new version introduced with Lion – also known as FileVault 2 – can now also encrypt entire disks, not just the home. So if you are a Mac user you really have no more excuses not to use encryption on your Mac these days.

Unfortunately, while FileVault makes it easy to enable full disk encryption for the main drive, it’s not as straightforward to encrypt other drives. Besides, it is not possible to move a user’s home directory to an encrypted drive other than the main drive. The reason is that FileVault normally “unlocks” only the main disk before a user logs in, while any other disks that are also encrypted will only be unlocked after the user has logged in. This means that the user’s home directory won’t be available during the login process, if stored on a secondary encrypted drive, causing nasty errors.

On my main MBP I’m lucky enough to have two SSD drives installed, so I wanted to leave the first one (OCZ-VERTEX3 MI) to the OS, and dedicate the second one (OWC Mercury Extreme Pro SSD) to user data, while also having both drives fully encrypted with FileVault.

Here I’ll describe the procedure I followed to achieve this.

Enabling the root user

For starters, I recommend you enable the root user: not only does this make it easier to change the location of your home directory, but it also ensures that if something goes wrong (we’ll see later the most common scenario) you will more likely be able to recover your data or fix your user profile.

You can find easy instructions for this on Apple’s support website.

Encrypting the second drive

I’ll assume here that you’ve already enabled FileVault on the main drive (if not, read this).

Once the root user is enabled, ensure you are logged out ad log in again but as root (from the login window, select ‘other’ and enter ‘root’ as username and whatever password you have set for the root user) and open a terminal. Find the disk you want to encrypt and that will store the home directory with diskutil list:

> diskutil list development [2670cdc] untracked
/dev/disk0
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *240.1 GB disk0
1: EFI 209.7 MB disk0s1
2: Apple_CoreStorage 239.2 GB disk0s2
3: Apple_Boot Recovery HD 650.0 MB disk0s3
/dev/disk1
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *240.1 GB disk1
1: EFI 209.7 MB disk1s1
2: Apple_CoreStorage 239.7 GB disk1s2
3: Apple_Boot Boot OS X 134.2 MB disk1s3
/dev/disk2
#: TYPE NAME SIZE IDENTIFIER
0: Apple_HFS OS *238.9 GB disk2
/dev/disk3
#: TYPE NAME SIZE IDENTIFIER
0: Apple_HFS Data *239.4 GB disk3

In my case I have both drives already encrypted (see Apple_CoreStorage for both drives), but if I hadn’t yet encrypted my second drive, I’d have to run the command

diskutil cs convert /dev/disk1s2 -passphrase

in order to encrypt the partition on my second drive. cs stands for CoreStorage, which is the technology behind FileVault that handles encrypted volumes. The command above will ask for the password you want to use to encrypt the partition – make sure you remember it or keep a note about it somewhere safe, otherwise you won’t be able to access the contents of the encrypted partition later on if you forget it. diskutil will now start encrypting, or “converting” the selected drive, and this will take some time depending on how large the drive is and on how much data is already stored on it.

While diskutil is doing its thing (you can check the status of the conversion at any time with diskutil cs list), open another terminal session and install unlock (big thanks to the author Justin Ridgewell!) – this is required to have a secondary encrypted drive unlocked before logging in:

curl https://raw.github.com/jridgewell/Unlock/master/install.sh | bash

unlock will detect any encrypted drives other than the main one, and for each of them it will ask you if you want to unlock the drive before logging in. If you answer ‘yes’, you will be asked to enter the password required to unlock the drive and that you have set earlier when running the diskutil cs convert command.

Once unlock is installed, you can restart your Mac and then login again as root to proceed with the next step. Don’t worry if the conversion of the disk isn’t complete yet, as it will automatically be resumed once you have restarted.

Moving a user’s home directory

Once you have restarted and are again logged in as root, make a copy (for now) of your home directory to the newly encrypted (or encrypting) drive. For example, in my case the second drive is mounted as “Data”, therefore I copied the contents of my old home directory /Users/vito into /Volumes/Data/Users/vito. I suggest you make a copy rather than just moving your home directory to the new location, so to be able to recover your previous settings if something goes wrong.

When the copy is complete, open System Preferences -> Users & Groups and click on the lock to authenticate yourself and be able to make changes. Then right-click on the user whose home directory you have migrated, and click on Advanced options:

1-jpg-1c6b9a

You’ll see the current location of the home directory:

2-jpg-1c6b9a

In my case, since I have already migrated it, the current location is already /Volumes/Data/Users/vito. In your case it will likely be /Users/your-username. Click on choose, and select the copy of the home directory in the new location. Done that confirm the selection and log out; then login again with your usual user account, and if all went well you’ll see your usual desktop, dock icons, and all the rest. Just to be sure, open the terminal and type:

> cd ~ ; pwd development [2670cdc] modified untracked
/Volumes/Data/Users/vito

If the change was successful, pwd will return the new location of your home directory. At this point, I’d recommend you restart the system once or twice to confirm that the second drive gets always unlocked before logging in, and that once logged in your user account works fine with the home directory in the new location. I find unlock pretty reliable, but you can never know, so it’s safer to check a few times; once you’re happy that everything works as expected, you should be able to safely delete the original home directory to free that disk space.

If something goes wrong….

From my experience over the past weeks, the procedure I described usually just works. However if for some reason you Mac happens to freeze completely and you can’t shut it down cleanly (it has already happened twice to me since upgrading to Mountain Lion), you could be in trouble. After restarting and logging back in, you might see something like this:

3-jpg-1c6b9a

Surprise! It might appear like your stuff is gone. Don’t panic yet – it’s very likely your data is still where it was and in most cases this is quite simple to fix, provided you haven’t disabled the root user! (or have some other admin account available).

If you did disable the root user once encrypted the second drive and moved your home directory across, you will likely end up fiddling with your terminal in a recovery session desperately trying to figure out how to fix your user account, or you’ll otherwise end up restoring from a backup (you do backups, don’t you?).

If you have left the root user enabled as I recommend, fixing should be easy. Log out and login again but as root, and open your terminal. Run the following ls command first to see what’s currently mounted:

Vitos-MacBook-Pro:~ root# ls /Volumes/
Data MobileBackups OS

In my case, I would see a directory named Data since that is the name given to my second drive. If your Mac wasn’t shut down cleanly though, once restarted it could happen that the second drive is not be mounted in that directory. So what happens when you login as your normal user following a forced restart, is that Lion/ML looks for the user directory in /Volumes/Data/Users/vito (or whatever it is in your case) and because it can’t find it, it creates a new home folder in the that location.

Just to confirm, type the following to check the size of your home directory as well as of the mount point for the second drive:

Vitos-MacBook-Pro:~ root# du -hs /Volumes/Data/Users/vito/
7.6M /Volumes/Data/Users/vito/

Vitos-MacBook-Pro:~ root# du -hs /Volumes/Data/
7.6M /Volumes/Data/

You’ll see that both the home directory and the mount point for the encrypted second drive are very small – you might want to check the contests too just to be 100% sure that location doesn’t contain your actual home directory.

So, to fix, you’ll simply need to delete the mount point:

rm -rf /Volumes/Data/

Then log out and log in again with your normal user account. The second drive will be mounted correctly in its usual location, and everything will look normal again.

I like this setup since I like SSDs for obvious performance reasons, but these drive tend to be expensive, so both of my SSDs are kinda small having a capacity of 240GB each. So it’s nice to have OS and apps on one drive, and all the user data on the other one, rather than a full primary drive.

This trick worked really well for me; if you give it a try, please let me know if it does for you too.

Bitwise operations in Ruby, and an example application to testing with Rspec

I often need to test that something should happen or should be possible depending on a variable number of conditions that are linked together in some arbitrary way. In one application, for example, I have a model called License, and one of the requirements is that it should be possible to ‘activate’ an existing license only if its status is either :new or :suspended. Without going into too much detail about the domain specific to this application, it is clear in this example that a license should be activable only if the following two conditions are met at the same time:

  • the license should exist, that is – in typical Rails terms – the license should be persisted, and
  • the status of the license should be either :new or :suspended

So I have an instance method on the License model that looks like this:

class License < ActiveRecord::Base
...
def activable?
persisted? and (status_new? or status_suspended?)
end
...
end

persisted? is a method available on all ActiveRecord models while status_new? and status_suspended? are methods dynamically generated by a module that, included in a model, allows to manage any attribute as an ‘enum’ field that can only contain one from a list of possible values (so, in the example, License can have any of the statuses [:new, :active, :suspended, :revoked, :expired, :transferred]).

How would you go about testing that this method behaves as expected? One obvious way could be by taking into account all the possible combinations of true/false values that the three variables above (persisted?, status_new?, status_suspended?) could have. persisted? is a framework thing (and as such we won’t test it) and the methods dynamically generated concerning the license status are unit-tested separately, plus there are integration tests to ensure everything is working together properly. So it is safe, in this case, to just stub all the methods with the purpose of testing when a license can be activated, in isolation.

So we could have something like this:

describe "License" do
subject(:license) { build(:license) }
...
it "can only be activated if persisted, and new or suspended" do
license.stub(:persisted?).and_return(false)
license.stub(:status_new?).and_return(false)
license.stub(:status_suspended?).and_return(false)

license.should_not be_activable

# ... some other combinations

license.stub(:persisted?).and_return(true)
license.stub(:status_new?).and_return(true)
license.stub(:status_suspended?).and_return(false)

license.should be_activable

# ... some other combinations
end
...
end

One other way some would achieve the same thing is:

describe "License" do
subject(:license) { build(:license) }
...
it "can only be activated if persisted, and new or suspended" do
# license isn't persisted yet
[:new, :suspended].each { |status| license.status = status; license.should_not be_activable }
(License.statuses - [:new, :suspended]).each { |status| license.status = status; license.should_not be_activable }

license.save!
# license is persisted
[:new, :suspended].each { |status| license.status = status; license.should be_activable }
(License.statuses - [:new, :suspended]).each { |status| license.status = status; license.should_not be_activable }
end
...
end

Either way, the bottom line is that we’ll likely have to somehow loop through all the possible combinations of boolean values that the three variables persisted?, status_new?, status_suspended? can have, and then we determine for each of these combinations what the expected behaviour is (in the example: whether the license can be activated or not). While both examples (as in “Rspec examples”) would work, both suffer from quite a bit of duplication and reduced readability, in that it is not very clear right away by just looking at the code what is the relationship between the conditions we’re taking into account. In our case, we have a “A and (B or C)” kind of relation that is instantly clear only if you read the title of the Rspec examples.

Another way of testing the same thing, which I prefer, is as follows:

describe "License" do
subject(:license) { build(:license) }
...
it "can only be activated if persisted, and new or suspended" do
boolean_combinations(3).each do |persisted, _new, suspended|
license.stub(:persisted?).and_return(persisted)
license.stub(:status_new?).and_return(_new)
license.stub(:status_suspended?).and_return(suspended)

if persisted and (_new or suspended)
license.should be_activable
else
license.should_not be_activable
end
end
end
...
end

As you can guess, thanks to the boolean_combinations method (more on that later) we’re still looping through all the possible combinations but with no duplication, and the example is more readable. The advantage is that from looking at just the code, you can understand right away what is the relationship between what I’m calling “the variables” and how the various conditions are linked together. In particular, the code

...
if persisted and (_new or suspended)
license.should be_activable
else
license.should_not be_activable
end
...

clearly says when a license should be activable. It’s good from a “driving code by tests/specs” standpoint in that it suggests the “code we wish we had”. Before going ahead, one note on having multiple expectations in the same example: I usually prefer to keep one expectation per example, as it is often recommended for clarity and simplicity; however in cases like the above all the expectations in the single example define only together the behaviour being tested. None of them, taken individually, would define any behaviour of the subject of the test, and the alternative would be several almost identical specs with perhaps some nested contexts (depending on the combinations of conditions being tested) that would IMO be overkill in such cases. This is particularly true if you have even more than 3 different variables involved in the conditions being tested.

So, back to the boolean_combinations method: how do we generate all the possible combinations of boolean values, given any number of variables? If you’ve studied -in particular- some electronics, the answer should be pretty obvious. What we need is the same kind of truth table often used to figure out how to simplify, or reduce, some logical operations (for example with Karnaugh maps or similar methods). Such a table looks like the following (for 3 variables as in our example) and should be pretty familiar:

a | b | c
---+---+---
F | F | F
F | F | T
F | T | F
F | T | T
T | F | F
T | F | T
T | T | F
T | T | T

One possible ‘Rubyish’ way is to generate this table is with the combination method available for arrays in Ruby. It expects the number of items you want for each combination, and will produce all the possible combinations with the given array of items, e.g.

1.9.3p194 :038 > ['a', 'b', 'c'].combination(2).to_a
=> [["a", "b"], ["a", "c"], ["b", "c"]]
1.9.3p194 :039 >

In our example however we have a slightly different case, since we want combinations of 3 items each, but each of the items can have either true or false value. If we tried with

1.9.3p194 :040 > [true, false].combination(3).to_a
=> []

we would be out of luck because the array doesn’t have enough items to match the number of items required for each combination, therefore the result is an empty array. We can work around this by just ‘extending’ the original array, for example by multiplying it for the number of items we want in each combination:

1.9.3p194 :041 > ([true, false]*3).combination(3).to_a
=> [[true, false, true], [true, false, false], [true, false, true], [true, false, false], [true, true, false], [true, true, true], [true, true, false], [true, false, true], [true, false, false], [true, true, false], [false, true, false], [false, true, true], [false, true, false], [false, false, true], [false, false, false], [false, true, false], [true, false, true], [true, false, false], [true, true, false], [false, true, false]]

The good thing is that this way we do get all combinations we are looking for; the bad thing is that we get lots of duplicates – the more items per combination, the more duplicates. So we’d have to use uniq on the result to remove those duplicates:

1.9.3p194 :044 > ([true, false]*3).combination(3).to_a.uniq
=> [[true, false, true], [true, false, false], [true, true, false], [true, true, true], [false, true, false], [false, true, true], [false, false, true], [false, false, false]]

This does indeed produce all the combinations we are after, but it’s not terribly efficient, albeit it is short and simple.

Using bitwise logic

Another way to generate all the boolean combinations, in a perhaps less ‘Rubyish’ but more efficient way, is to use a bitwise operation.

First, we need to know in advance how many possible combinations we have with the given number of variables, since this will be required in the algorithm we’ll see shortly; one first way to figure out the number of combinations (or range) is with the shift-left operator (which shouldn’t be confused with same-looking operators on types other than integers):

number_of_combinations = 1 << n

As the name of this operator might suggest to some, what it does is shift each bit of the binary representation of the number to the left by n positions. The number of positions is simply the number of elements we want in each combination (that is also the number of variables we want to produce the boolean table for). So, in our example, we have:

number_of_combinations = 1 << 3

The binary representation of 1 is 0001 (using some leading zeros for clarity), so if we shift each bit by 3 positions to the left, we get 1000 which is the binary representation of the number 8. Similarly, 1 << 2 means shifting each bit of 0001 by 2 positions to the left, so we get 0100, which is the binary representation of the number 4. You’d quickly guess that the operation 1 << n is equivalent to the operation 2**n (number two raised to the nth power), so the formula usually used to calculate the number of combination is instead:

number_of_combinations = 2**number_of_items

as it’s a bit easier to remember and understand. In our case we have 2 ** 3 items = 8 combinations. The table we’ve seen earlier is equivalent to the following table:

a | b | c
---+---+---
0 | 0 | 0
0 | 0 | 1
0 | 1 | 0
0 | 1 | 1
1 | 0 | 0
1 | 0 | 1
1 | 1 | 0
1 | 1 | 1

Or also, using the variables in our example:

persisted | status_new | status_suspended combination index (I)
-----------+------------+----------------------------------------------
0 | 0 | 0 0
0 | 0 | 1 1
0 | 1 | 0 2
0 | 1 | 1 3
1 | 0 | 0 4
1 | 0 | 1 5
1 | 1 | 0 6
1 | 1 | 1 7

Interestingly, each combination of 1s and 0s on each line of the table is basically equivalent to the binary representation of the index I of the combination. So 4 = 101, 6 = 110, and so on.

So, if we look at this table we can see that for each combination with index I, each variable will have value true or false depending on whether their corresponding bit in the binary representation of I is set or not. For example, in the combination with index 7, all the three variables will have value true since their corresponding bits in the binary representation of 7, which is 111, are all set. Similarly, in the combination with index 5, persisted and status_suspended will have value true since their bits in the binary representation of 5 (101) are set, while status_new will be false because its bit in the binary representation of 5 isn’t set.

We can say the same thing this way too: given a combination with index I, and an element of the original array with index J, the element J will have value true in the combination I only if the Jnth bit (from the right) of the binary representation of I is set.

Given a number m, how do we check if the bit of position n of the binary representation of m is set? The “canonical” way do to this is to perform a binary and operation between m and the number that has only the bit of n position set. A binary and is simply an and operation performed bit by bit. So for example, if we wanted to check whether the 3rd bit of the binary representation of 16 (which is 10000) is set, we would do:

16 = 10000 &
4 = 00100
-------
00000 = 0 = 3rd bit not set

since 00100 (or 4) is the binary number having only the 3rd bit set. So the operation is equivalent to 16 & 4. In this example, the result of the bit-by-bit and operation is 0, meaning that the 3rb bit of 10000 (16) isn’t set – and in fact it is not. As another example, let’s check now if the 4th bit of 27 is set. The binary representation of 27 is 11011, while the binary number having only the 4th bit set is 10000, or 16. So we have:

27 = 11011 &
16 = 10000
-------
10000

The result is 10000 or 16, and the “rule” is that given:

  • m is a number
  • n is the position of one bit in the binary representation of m
  • o is the number that has only the nth bit set
    the nth bit of the binary representation of m is set if the operation m & o yields a result that differs from zero.

Back to the table above, let’s check for example why persisted has a value of true in the combination by index 5 (“m“). Persisted, in the table, corresponds to the third bit (“n“) of the binary representation of 5, so we need to find out if the 3rd bit from the right in 101 (= 5) is set; the number with only the 3rd bit set (“o“) is 100, or 4, so the operation we need is (using directly numbers in decimal notation):

1.9.3p194 :048 > 5 & 4
=> 4

The result is != 0, meaning that the 3rd bit of 5 is indeed set and, in turn, that for that combination, persisted has value true.

Applying all of the above to write a first version of an algorithm to generate all the boolean combinations we wanted in first place, we get:

def boolean_combinations(number_of_elements)
number_of_combinations = 2 ** number_of_elements
combinations = []

(0...number_of_combinations).each do |combination_index|
combination = Array.new(number_of_elements)

(0...number_of_elements).each do |element_index|
combination[element_index] = (combination_index & 2**element_index) != 0
end

combinations << combination.reverse
end

combinations
end

This indeed produces all the combinations we’re after:

1.9.3p194 :106 > boolean_combinations(3).each {|c| p c}; nil
[false, false, false]
[false, false, true]
[false, true, false]
[false, true, true]
[true, false, false]
[true, false, true]
[true, true, false]
[true, true, true]
=> nil

We can simplify this code a bit. Firstly, with the bit reference operator fix[n] -> 0,1 available with Fixnum objects, we can get right away the nth bit of the binary representation of a given number, and then we return either true or false depending on whether that bit is set or not. So the code

...
combination[element_index] = (combination_index & 2**element_index) != 0
...

is equivalent to

...
combination[element_index] = combination_index[element_index] == 1
...

Secondly, we can slightly simplify the two nested loops with map:

def boolean_combinations(number_of_elements)
(0...2**number_of_elements).map do |i|
(0...number_of_elements).map{ |j| i[j] == 1 }
end
end

I’m not sure of the performance difference (we are talking about tiny arrays here) but perhaps I would prefer the other, more readable version. So I keep a file in spec/support containing the boolean_combinations method, and use it in some Rspec examples as shown earlier:

describe "License" do
subject(:license) { build(:license) }
...
it "can only be activated if persisted, and new or suspended" do
boolean_combinations(3).each do |persisted, _new, suspended|
license.stub(:persisted?).and_return(persisted)
license.stub(:status_new?).and_return(_new)
license.stub(:status_suspended?).and_return(suspended)

if persisted and (_new or suspended)
license.should be_activable
else
license.should_not be_activable
end
end
end
...
end

There are various other operations that can be done with bitwise logic, including some tricks involving databases that I’ll perhaps show in some other post.

Update 16/08/2012: reader Luca Belmondo sent me a comment pointing out that the built in Array#repeated_permutation does exactly the same thing.

1.9.3p194 :004 > [true, false].repeated_permutation(3).to_a
=> [[true, true, true], [true, true, false], [true, false, true], [true, false, false], [false, true, true], [false, true, false], [false, false, true], [false, false, false]]

I hadn’t notice this method before, as well as Array#repeated_combination – it looks like both were introduced in Ruby 1.9 – so this is a fresh reminder that it’s always better to check out what’s already available before reinventing the wheel. Thanks Luca!