TIL: Queuing Sidekiq workers safely with the help of Isolator

November 15, 2021

Listening to a recent episode of the thoughtbot podcast, The Bike Shed, one of the hosts talked about a couple of gems that they'd come across that seemed pretty handy to me.

The first is isolator, a gem by the folks over at Evil Martians that detects non-atomic interactions within a database transaction. What do we mean by that? A simple example taken from the documentation that uses background jobs is something like the following:

User.transaction do
  user.update!(confirmed_at: Time.now)
  UserMailer.successful_confirmation(user).deliver_later
end

Some variation of the above is relatively common in many Rails codebases that I’ve worked with, more so when you start following the command pattern where you might have some service objects doing some work in a transaction.

Another reasonably common sight is to defer work in a Rails model's hooks, usually after_create, after_update, after_save, or some combination of those. These hooks all run while the database transaction is still open. An example:

class Comment < ApplicationRecord
  after_create :notify_author

  private

  def notify_author
    CommentMailer.comment_created(self).deliver_later
  end
end

Isolator raises an error at runtime whenever this kind of work happens inside a transaction.
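To make the idea concrete, here is a minimal plain-Ruby sketch of that kind of check. It is not how Isolator is implemented; FakeDatabase, FakeQueue, and EnqueuedInsideTransactionError are hypothetical names made up for illustration. The only point is that an enqueue can be refused while a transaction is still open.

```ruby
# Toy illustration only: this is NOT Isolator's implementation.
# FakeDatabase and FakeQueue are hypothetical stand-ins.
class EnqueuedInsideTransactionError < StandardError; end

class FakeDatabase
  attr_reader :open_transactions

  def initialize
    @open_transactions = 0
  end

  # Track nesting depth so callers can ask whether a transaction is open.
  def transaction
    @open_transactions += 1
    yield
  ensure
    @open_transactions -= 1
  end

  def in_transaction?
    @open_transactions.positive?
  end
end

class FakeQueue
  attr_reader :jobs

  def initialize(database)
    @database = database
    @jobs = []
  end

  # Refuse to enqueue while a transaction is still open.
  def enqueue(job_name)
    if @database.in_transaction?
      raise EnqueuedInsideTransactionError,
            "#{job_name} was enqueued inside a db transaction"
    end

    @jobs << job_name
  end
end

db = FakeDatabase.new
queue = FakeQueue.new(db)
queue.enqueue("WelcomeMailer") # fine: no transaction is open

begin
  db.transaction { queue.enqueue("ConfirmationMailer") }
rescue EnqueuedInsideTransactionError => e
  puts e.message
end
```

The real gem hooks into the database adapter and the job libraries for you, so none of this bookkeeping ends up in application code.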

Why should we avoid this?

Isolator appealed to me because I was in the middle of doing some of the plumbing behind migrating from DelayedJob to Sidekiq. The main difference here is that Sidekiq uses Redis as its backing data store. In contrast, DelayedJob uses a table in your database (I won't go into the pros and cons of either here).

When enqueuing with Sidekiq (and Redis) mid-transaction, there is a non-trivial chance that Sidekiq will pick up the job and start processing it before the transaction that queued the job has been committed to the database. In the case of the Comment model, it might mean that when the delayed mailer runs to notify the author about a new comment, that comment doesn't exist yet when the job starts processing.
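The race is easier to see in a toy model. The sketch below uses plain Ruby objects rather than Sidekiq or ActiveRecord: StagedStore and simulate_race are hypothetical names, and the "worker" is just a lambda that runs the instant the job is enqueued, which is the worst case described above.

```ruby
# Toy model of the race. StagedStore is a hypothetical stand-in for a
# database whose uncommitted writes are invisible to other connections.
class StagedStore
  def initialize
    @committed = {}
    @staged = nil
  end

  # Writes made inside the block only become visible once the block returns.
  def transaction
    @staged = {}
    yield
    @committed.merge!(@staged)
  ensure
    @staged = nil
  end

  def insert(id, row)
    (@staged || @committed)[id] = row
  end

  # What another connection (such as a worker process) would see.
  def find_committed(id)
    @committed[id]
  end
end

def simulate_race
  store = StagedStore.new
  seen_by_worker = []
  # Worst case: the worker picks the job up the moment it is enqueued.
  run_job_now = ->(comment_id) { seen_by_worker << store.find_committed(comment_id) }

  store.transaction do
    store.insert(1, { body: "First!" })
    run_job_now.call(1) # enqueued mid-transaction: the row is not committed yet
  end
  run_job_now.call(1)   # enqueued after commit: the row is visible

  seen_by_worker
end
```

In this model the first lookup comes back nil and the second finds the comment, which is the difference between enqueuing inside the transaction and enqueuing after it commits.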

How does isolator help here? The immediate benefit is knowing that this is happening in your dev or test environments. If we were to install the gem and create a new comment, we'd get an error something like the following:

Isolator::BackgroundJobError:
You are trying to enqueue background job inside db transaction.
In case of transaction failure, this may lead to data inconsistency and unexpected bugs.

To solve this, we want to change our model to queue up the mailer in a transactional callback like after_commit instead (we'll get to ad hoc transactions in a bit). We could rewrite the Comment model above like so:

class Comment < ApplicationRecord
  after_create_commit :notify_author
  ...
end

The other potential issue here is the data inconsistency that can arise when the transaction gets rolled back. Because we're now queuing jobs using a separate data source, rolling back a failed transaction won't pluck the job out of the queue. In cases where the job relies on data created in the transaction, it might mean that you have failing jobs because the data won't ever be there (super annoying, but you might have a forever-failing worker to remove). But the worst case might be that on record updates that don't get committed, you're firing off a job that will now operate on outdated data, and that could be bad.
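Here is a rough sketch of the rollback case, again in plain Ruby with made-up names (fake_transaction and simulate_rollback are hypothetical, and the array stands in for Redis). The transaction discards its own writes when it fails, but it has no way to reach into the queue.

```ruby
# Runs the block against a staging hash and only copies it into
# `committed_rows` if the block finishes without raising.
def fake_transaction(committed_rows)
  staged_rows = {}
  yield staged_rows
  committed_rows.merge!(staged_rows)
rescue RuntimeError
  # Rollback: staged rows are discarded. Nothing here can touch the queue.
  nil
end

def simulate_rollback
  committed_rows = {}
  job_queue = [] # stands in for Redis, a completely separate store

  fake_transaction(committed_rows) do |rows|
    rows[42] = { confirmed: true }
    job_queue << { job: "ConfirmationMailer", user_id: 42 } # pushed immediately
    raise "a later step failed" # aborts the transaction
  end

  { rows: committed_rows, queue: job_queue }
end
```

Calling simulate_rollback leaves no committed rows but one queued job, which is the orphaned job described above: it points at a change that never happened.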

In my situation, isolator helped quite a lot because, with the relatively robust test coverage that we have, I immediately was able to identify several potential gotchas that would've happened if I'd just flipped over to Sidekiq as is.

I can see a couple of other potential benefits to having this gem in our project. I find that having a new, concrete error around that particular pattern makes for a more straightforward explanation of what's happening and why doing this work within a transaction is a potential foot-gun, as opposed to starting at "my background job behaves weirdly in production sometimes". Having the errors helps a lot when it comes to code reviews as well. In the same vein as gems like strong_migrations, the author of the changes will get messages about these patterns ahead of time, and reviewers have one less thing to watch out for and can focus on other things.

What about those ad hoc transactions?

So changing the type of model hook solves our issue if we're queuing jobs up there, but what about that first example where we have a transaction we've started up manually in some other part of the code?

Well, the Evil Martians folks have something else for that problem. The after_commit_everywhere gem allows you to use similar transactional callbacks outside of your models, so something like this:

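# The gem's callbacks come from this module, so include it wherever you need them.
include AfterCommitEverywhere

ActiveRecord::Base.transaction do
  after_commit { puts "We're all done!" }
end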

I've not had a real opportunity to use this yet; I'm currently using the gem because it replaces the after_commit hooks provided by aasm, where we're queuing jobs on state changes. But I like the syntax in theory; I think it reads pretty well, and if you're doing a bit of work inside a transaction, having that block at the bottom would express the intent clearly.
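For intuition about what "run this after the commit" means, here is a hand-rolled sketch in plain Ruby. It is not the after_commit_everywhere implementation, and TinyTransaction and demo_after_commit are hypothetical names. The idea it illustrates is that callbacks get collected while the transaction is open, run only once it commits, and are simply dropped on rollback.

```ruby
# Hand-rolled illustration of deferred callbacks, not the gem's code.
class TinyTransaction
  def self.run
    tx = new
    yield tx
    tx.commit! # only reached when the block did not raise
  end

  def initialize
    @callbacks = []
  end

  # Remember the block instead of running it straight away.
  def after_commit(&block)
    @callbacks << block
  end

  def commit!
    @callbacks.each(&:call)
  end
end

def demo_after_commit
  log = []

  TinyTransaction.run do |tx|
    tx.after_commit { log << "enqueue mailer" }
    log << "update user"
  end

  begin
    TinyTransaction.run do |tx|
      tx.after_commit { log << "never runs" }
      raise "rolled back"
    end
  rescue RuntimeError
    log << "rollback"
  end

  log
end
```

Even though the callback is registered first, "enqueue mailer" lands in the log after "update user", and the callback registered in the failed transaction never runs at all.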

I'm keen to see how this gem shakes out post-migration and how helpful we find it having the safety net there while folks get used to the patterns. Let me know if you've used either gem or if you have any other tools that help you with similar gotchas.