has_many :codes

A workaround for ActiveRecord’s partial updates not working properly with serialised columns (bug?)

Published  

Today, as I am currently working on a side project that uses a lot the tableless_model gem, I came across what looks like a bug in ActiveRecord’s support for partial updates, when used with models that have serialised columns.

Actually, this isn’t the first time I see this kind of issue, but I had completely forgotten about it and therefore the previous releases of the gem were also, although indirectly, affected. I have now pushed a new release (0.0.7) with a fix/workaround, so if you have downloaded and are using the gem, please update it. The workaround I have applied to the gem’s code, which I will describe together with the actual problem in this post, is a pretty simple hack that may also work in other cases, when using ActiveRecord models with serialised columns.

Some context

Since Rails 2.1, ActiveRecord supports without additional plugins a feature called partial updates. If enabled (which is the default setting in Rails), and you change some attributes -but not all- of a model’s instance, upon saving that instance ActiveRecord will generate a SQL update statement that will only include the fields that have actually changed, and that therefore need to be updated, rather than a full SQL update statement that would otherwise include all the fields if partial updates were switched off.

This feature, at least in theory, should help improve performance when updating models, by reducing the size of the SQL statements sent over to the database and then parsed. Partial updates may actually help in some cases, for example with text or -worse- blob columns (oh, my…) that may contain large amounts of data; in these cases, there is a benefit in not sending over the network large chunks of SQL text when those text/blob column haven’t changed and therefore do not need to be updated. But in most other cases, partial updates have little to no effect.

Partial updates work thanks to some other feature usually referred to as dirty attribute tracking. This feature basically keeps track of the state of each attribute of a model from when the model is initialised, to when the model is updated. It works, behind the scenes, by using an instance variable named @changed_attributes that is initially an empty hash when an instance of a model is initialised. Whenever (or almost, as we’ll see) the value of an attribute in the model is changed, its writer/setter method calls the write_attribute method with the new value for the attribute as argument. The write_attribute method then also updates the hash stored in @changed_attributes, adding to it a key/value pair having the name of the changed field as key, and an array in the form [ attribute’s previous value, attribute’s new value ], as value for that key. Borrowing once again the example I showed in the previous post for the tableless_model gem, we’d have something like this:

page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.changes
=> {}

page.title = "New title"
=> "New title"

page.changes
=> { "title" => [ "Original title", "New title" ] }

As you can see in this example, the hash with the changes is initially empty. Once we change the value of one attribute, then, that hash will only contain that single change with both the previous value and the new value for the attribute title.

When the model’s save method is called, eventually, ActiveRecord converts the changes in @changed_attributes into a SQL update statement similar to this, for the example above:

UPDATE `pages` SET `title` = 'New title', `updated_at` = '2011-01-12 21:05:01'
WHERE (`pages`.`id` = 1)

As you can see, only the field that changed is included in this update statement, apart from the updated_at field which is always updated automatically. As said, this could be useful in some cases: in the example, if the model Page had a large text column and partial updates were switched off, the SQL statement above would also include this text column, however huge its value could be, and all the others, regardless of the fact that only the title had changed. Once the SQL update statement is executed in the database, ActiveRecord will revert @changed_attributes once again to an empty hash.

A few notes:

  • you can switch partial updates off, if needed with:
ActiveRecord::Base.partial_updates = false
  • Apart from reduced network traffic, smaller SQL statements also mean less “noise” in your logs, depending on your logging settings.
  • Simultaneous updates to the same record with partial updates switched on could easily fail, unless you enable optimistic locking; for this to work, your model must also have an integer field named lock_version which is used for some sort of versioning.
  • Apart from the changes method, thanks to the dirty attribute tracking and partial updates, ActiveRecord models also expose other helper methods that can be used to determine the state of each attribute, whether or not a record has changed, and which attributes have changed (only names), if any:
page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.changed?
=> false

page.title = "test"
=> "test"

page.changed?
=> true

page.title_changed?
=> true

page.content_changed?
=> false

# attributes that have changed - only names
page.changed
=> ["title"]

# getting the change for a specific attribute
page.title_change
=> ["Original title", "title"]

Another thing worth noting about these helper methods, is that if an attribute is changed in some way that bypasses its normal setter, that attribute -somewhat surprisingly- won’t be listed among the changes that need to be saved. An example are string conversion methods:

page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.title
=> "Original title"

page.title.upcase!
=> "ORIGINAL TITLE"

page.changes
=> {}

As you can see, the tracking of changes didn’t work as you would have expected, in this case. I haven’t investigated really in depth this behaviour, which is one of several things concerning partial updates that I find quite weird; so if there’s a good reason for it, I don’t know yet. Anyway, it is possible to force the detection of this kind of changes by manually calling the method attribute_name_will_change!… before the change happens:

page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.title
=> "Original title"

page.title_will_change!
=> "Original title"

page.title.upcase!
=> "ORIGINAL TITLE"

page.changes
=> { "title" => [ "Original title", "ORIGINAL TITLE" ] }

As said, this is one of the weird things I have found so far concerning dirty attribute tracking and partial updates, which leads us to the original problem with gem.

The problem

Now that the context on partial updates is -hopefully- clear, it’s time to go back to the original problem I had to fix today with the tableless_model gem.

If you haven’t yet read the previous post in which I described the functionality of this gem, please do as this may clear up a few more things. You may also want to look at the [source language=”code”]/source for some more details.

In that post, we’ve seen an example on how to use this tiny gem to store a serialised, tableless model – SeoOptions – in a text column of the parent model, Page. The result was that thanks to serialisation, we were able to remove the one-to-one association between these two models and therefore we were also able to remove the seo_options table altogether, reducing database complexity and improving performance by eliminating the need for an additional query (if using :include or lazy loading) to fetch seo options for a given page.

We’ve also seen that to make this work, the gem is using the serialize class method available in ActiveRecord::Base. The problem I noticed once again today is that if serialize is used with an instance of another class, and one attribute of this instance is changed, page.changes won’t list this change in a similar way to what we’ve seen earlier for methods such as upcase!:

page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.seo
=> <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">

page.seo.title_tag = "111"
=> "111"

page.seo.title_tag
=> ""

page.changes
=> {}

As you can see here, the page doesn’t know about the change and also the change that first seemed to have been applied, seems gone once we access page.seo.title_tag again.

Unfortunately, because in this case we are modifying an instance of another class, and not an attribute of the same one, will_change! won’t work as it does for upcase!:

page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.seo
=> <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">

page.seo_will_change!

page.seo.title_tag = "111"
=> "111"

page.seo.title_tag
=> ""

page.changes
=> {"seo"=>[{"title_tag"=>"", "meta_description"=>"", "meta_keywords"=>"", "noindex"=>false, "nofollow"=>false, "noarchive"=>false}, {"title_tag"=>"", "meta_description"=>"", "meta_keywords"=>"", "noindex"=>false, "nofollow"=>false, "noarchive"=>false}]}

Here you can see that will_change! forces seo into the list of changes, however the actual change of the title_tag to “111” was basically lost.

The fix/workaround

To fix this issue, which basically means any changes made to a serialised class directly, are simply lost when you save the parent model, I have made a few changes to the tableless_model (please refer to the source code for more details).

Firstly, in the TablelessModel class I defined two writeable accessors:

class TablelessModel < Hash
  ...
  attr_accessor :__owner_object, :__serialized_attribute
  ...
end

Then, in the has_tableless class method injected in ActiveRecord::Base, I set these two accessors to respectively the instance of the parent model, and the name of the column that will store the instance of the tableless model in YAML-serialised format:

...
define_method column_name.to_s do
  instance = class_type.new(read_attribute(column_name.to_sym) || {})

  instance.__owner_object = self
  instance.__serialized_attribute = column_name

  instance
end
...

Lastly, in the writer method for the tableless model’s attributes (remember that TablelessModel is a specialised hash behind the scenes), there’s a call to the parent/owner object. Each time an attribute of the serialised tableless model is changed, the setter for the serialised column in the parent object is automatically called to update its content with the updated tableless model:

...
def []=(attribute_name, value)
  raise NoMethodError, "The attribute #{attribute_name} is undefined" unless self.class.attributes.has_key? attribute_name.to_s

  return_value = super(attribute_name.to_s, self.class.cast(attribute_name, value))

  if self.__owner_object
    # This makes the tableless model compatible with partial_updates:
    # whenever a property in the tableless model is changed, we force the parent/owner object to a changed state
    # by updating it with a new, updated instance of the tableless model
    self.__owner_object.send "#{self.__serialized_attribute.to_s}=".to_sym, self
  end

  return_value
end
...

The effect is similar to what would happen if we manually updated the serialised attribute in the parent model, using the options= setter (as per the previous example) and setting it to the new, updated, version of the tableless model:

page = Page.first
=> #<Page id: 1, status: "Published", title: "Original title", content: "...", seo: <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">>

page.seo
=> <#SeoOptions meta_description="" meta_keywords="" noarchive=false nofollow=false noindex=false title_tag="">

page.seo.title_tag = "CHANGED!"
=> "CHANGED!"

page.seo.title_tag
=> "CHANGED!"

page.changes
=> {"seo"=>[{"title_tag"=>"", "meta_description"=>"", "meta_keywords"=>"", "noindex"=>false, "nofollow"=>false, "noarchive"=>false}, {"title_tag"=>"CHANGED!", "meta_description"=>"", "meta_keywords"=>"", "noindex"=>false, "nofollow"=>false, "noarchive"=>false}]}

Bingo! As you can see, the change in the serialised model is no longer lost now, and will be persisted correctly to database when saving.

As I’ve said, I can see the benefits of partial updates in some particular cases, but this feature often causes more issues than the good it brings. I know of at least another issue caused by it, with models that have composite keys; however I will need to check if the problem still exists, and if it does, it will be subject for another post.

In the meantime, this workaround seems to be working as expected. So, beware of these issues with partial updates anyway, should you use serialised columns in your applications!

© Vito Botta