Hunting down spooky JSON module redefinition via Oj

April 25, 2022

I've been spending some time recently to get on top of the dependabot alerts in our repo that have fallen by the wayside a little.

Many of these are often a quick look over the changelog and approve, assuming the tests all pass; there might be some QA involved if anything is particularly spooky-looking or load-bearing. Then some upgrades need a bit of a closer look because the PR bumps several versions, and there are some breaking changes in the API, or the behaviour is now different, and things break in the app, not usually a massive drama.

Before we get into it, I want to give a special thanks to Iain McCoy for doing some of the spelunking on this issue and picked up what was happening in the C side of things.

A new mystery

Something more interesting than I ran into recently came up while upgrading the version of fhir_client that we're using, from 4.0.3 to 5.0.3. Most of the changes looked relatively innocuous, except our test suite was now failing in a whole bunch of places with the following error:

undefined method `has_key?' for nil:NilClass

There was no specific place that was causing issues; it seemed to me whenever we were making a request via the client we had set up to hit an external API. Taking a quick trip to the issues for the project turned up an issue that looked suspiciously similar but with no actual outcome at the end of the thread barring some workarounds.

So it seems like I was on my own here. Let's take a look at the full stack trace from the failures:

undefined method `has_key?' for nil:NilClass
# /usr/local/bundle/gems/fhir_models-4.2.1/lib/fhir_models/bootstrap/json.rb:10:in `pretty_generate'
# /usr/local/bundle/gems/fhir_models-4.2.1/lib/fhir_models/bootstrap/json.rb:10:in `to_json'
# /usr/local/bundle/gems/fhir_client-5.0.3/lib/fhir_client/client.rb:397:in `request_payload'
# /usr/local/bundle/gems/fhir_client-5.0.3/lib/fhir_client/client.rb:527:in `post'
# /usr/local/bundle/gems/fhir_client-5.0.3/lib/fhir_client/sections/crud.rb:207:in `base_create'
# /usr/local/bundle/gems/fhir_client-5.0.3/lib/fhir_client/sections/crud.rb:177:in `create'

Cracking open the fhir_models gem with code $(bundle show fhir_models) on the update branch to get a look at the changes, I can see that the module having issues is pretty simple:

require 'JSON'
module FHIR
module JSON
def to_json(opts = nil)
JSON.pretty_generate(to_hash, opts)

I could confirm with pry that trying to generate the JSON with to_hash and opts (which was nil) was raising that error, and via rails console that trying to pass other random hashes of data to JSON.pretty_generate would also throw the same error when opts was nil, but would be fine if opts was an empty map.

The mystery arose when looking at the source for JSON.pretty_generate, where the definition of that method uses nil as the default value of opts, and the handling should very much take care of a nil value.

Things became a little clearer when trying to do the same test via irb:

$ irb
irb(main):001:0> require 'json'
=> true
irb(main):002:0> JSON.pretty_generate({a: 'b'}, nil)
=> "{\n \"a\": \"b\"\n}"

No error. I began to wonder if JSON was even the same thing here. My next lead was that there was something off in our app.

A case of mistaken identity

I knew that our app was using Oj for JSON serialisation. But I didn't initially notice anything spooky about pretty_generate. A colleague of mine (shoutout Iain) had keener eyes and pulled up the C file that dealt with something spooky sounding in mimic_json.c.

The Let's See Who This Really Is meme, where the mask, labelled JSON has been removed to reveal the villain, labelled OJ

Reading up a bit more about the Rails setup for the Oj gem, the documentation suggests adding the following to an initialiser in your app to have the gem "take over" (quotes theirs) many methods of the JSON gem. That info, partnered with the C file, illuminated us to the fact that the OJ.mimic_JSON method redefines the JSON.pretty_generate method (among others) with a different implementation, one that either doesn't handle a nil value for opts or changes the handling to be focused on an empty hash.

Taking a look back at our initialisers, I found one for Oj, one where we're calling Oj.optimize_rails, of which mimic_JSON is one of the steps.

Moving forward from a drop-in replacement

We find ourselves here because we broadly do a lot of JSON generation as part of our API. We use active_model_serializers, and the easiest way of getting the benefits of Oj's speedier JSON generation is to let it take over everything in one step making all of your serialisers more performant.

On the other hand, the problem is that you're now rolling with all of those redefined methods any time you use the JSON module, even when you're not a) aware of it or b) intending to use them.

In a happy coincidence, we're knee-deep in the middle of a migration of our serialisers to use Blueprinter instead of AMS. One benefit, in this case, is that it lets you explicitly set what generator you're using, like so:

require 'oj'
Blueprinter.configure do |config|
config.generator = Oj # default is JSON

This whole scenario has fast-tracked our plans a little to finish off our migration so that we can remove gotchas like this and unblock our upgrade of the fhir_client gem that we were looking for at initially.

I always enjoy a bit of a spelunk like this. I'd be keen to hear of any other interesting scenarios that you might have run into that are similar.