ActiveRecord models: How to remove data in a GDPR-compliant way

If dependent: :destroy in Rails ActiveRecord models isn't working out for you, but you still need to ensure GDPR compliance and actually remove the data, check out the WipeOut library.

Story time

A few years ago, we were introducing data retention policies in our applications as a result of a “small” change called GDPR. Until then, relations in our ActiveRecord models mostly used dependent: :destroy, raised an error, or did nothing at all. With GDPR and our ISO 27001 policies, this was unacceptable: it couldn’t be an ad-hoc, hope-for-the-best approach. Our team had to quickly - and confidently - deliver a solution before the GDPR deadline. To begin, we made sure all relations had dependent: :destroy, so that no data was left behind. No data, no problem, but…

Some time later, we also wanted to conduct data analysis to understand what’s happening in our apps. We were collecting events, but they weren’t always enough. We could add more events, but since the data already existed in our own database, why not simply use that? So we added a few database views (exposing tables directly has other problems…) and exposed them in DataStudio. With this method, we can control what’s visible in analytics and denormalize complex structures for easier analysis.

Our team soon realised that removing data doesn’t combine well with analytics based on the database. We lose historical data, it’s easy to forget about dependent: :destroy, and there’s another problem: removing data regarding “money stuff” is a no-go from a compliance standpoint. Some countries require up to 10(!) years of history in case of an audit. On the other hand, keeping data forever is an absolute no-go too because of GDPR.

We were wondering how to approach this problem. At the beginning, we wrote additional scripts to do this - delete some data, overwrite some, keep what’s left, etc. Although this worked for a while, new features were added over time, and it became too easy to forget or ignore the separate scripts. Code reviews mostly mitigated this, but code reviews shouldn’t have to focus on it! If someone missed something, data would be left over - a disaster waiting to happen. We also had similar requirements in other apps. Knowing our requirements, we decided to start from scratch and build the solution as a separate gem, which would make it easily reusable.

The library needed to offer:

  • A flexible DSL that is easy to use and understand, even for someone looking at it from outside (like an auditor)
  • Safety, making sure all fields/relations are handled
  • Extensibility, in case one of the apps has additional requirements

Introducing WipeOut

WipeOut helped greatly in maintaining data retention policies. It allows one to declaratively define a plan in its own DSL and validate it separately from the execution process, ensuring every single field is taken care of.

How do we use it?

Let’s start by defining an example model:

#  id         :integer          not null, primary key
#  name       :string           not null
#  banned_at  :datetime
class Tester < ApplicationRecord
  has_many :ranking_points
end

#  id         :integer          not null, primary key
#  source     :string           not null
#  points     :integer          default(0), not null
#  tester_id  :integer          not null
class RankingPoint < ApplicationRecord
  belongs_to :tester
end

In the example above, a tester has a name and a timestamp showing whether they are currently banned. They also have ranking points, which they receive for the testing they perform for us.

The name must be removed when the tester no longer works with us - it may contain personally identifiable information (PII). On the other hand, we want to keep their ranking history and the information about whether they were banned; this data will prove useful later on when analyzing our crowd.

Let’s discuss how we would do it the Rails way. We don’t actually want to remove the Tester, so the original #destroy or #delete is not an option for us. For simplicity in a blog post, I’ll add this to the model, but you might want to extract it into a separate service.

class Tester < ApplicationRecord
  # ...

  # Overwrites PII in place instead of deleting the row.
  def alternative_destroy
    update!(name: "[deleted]")
  end
end

This looks easy enough, but there’s a catch. Let’s expand our application with a new requirement: ranking points get a title field. It will be displayed to testers when they browse their ranking points history. Since this title may contain customer- or tester-specific information, it needs to be removed when the Tester no longer works with us.

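class AddTitleToRankingPoints < ActiveRecord::Migration[6.1]
  def change
    # A default is needed so the NOT NULL column can be added to a table that already has rows.
    add_column :ranking_points, :title, :string, null: false, default: ""
  end
end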

What happens in #alternative_destroy? Nothing. You would have had to know that it needed updating.

This title will forever remain in ranking points.
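To make the failure mode concrete, here is a minimal sketch in plain Ruby. The TesterRecord and RankingPointRecord structs are hypothetical stand-ins for the ActiveRecord models (no database involved), and the sample values are made up for illustration:

```ruby
# Hypothetical plain-Ruby stand-ins for the ActiveRecord models in this post.
RankingPointRecord = Struct.new(:source, :points, :title, keyword_init: true)

TesterRecord = Struct.new(:name, :banned_at, :ranking_points, keyword_init: true) do
  # Mirrors the hand-written method above: it only knows about :name.
  def alternative_destroy
    self.name = "[deleted]"
  end
end

tester = TesterRecord.new(
  name: "Jane Doe",
  banned_at: nil,
  ranking_points: [
    RankingPointRecord.new(source: "bug_report", points: 10, title: "Acme checkout bug")
  ]
)

tester.alternative_destroy

puts tester.name                        # the name was overwritten
puts tester.ranking_points.first.title  # the title is still there
```

Nothing fails and nothing warns us: the method keeps doing exactly what it was written to do, while the new column quietly stays outside its scope.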

How do we do this with WipeOut? In the most basic setup, it may have originally looked something like this:

TesterPlan = WipeOut.build_plan do
  wipe_out(:name) { "[deleted]" }

  ignore :banned_at

  relation :ranking_points do
    ignore :points, :source
  end
end

It uses a DSL to declaratively define how a given object should be handled. It also allows us to separate definition from execution; thus, it’s possible to run static validation that walks deeply through the object’s fields and relations, verifying that everything is covered by our plan. Ignoring is explicit; otherwise, we wouldn’t be able to fully validate models, attributes and relations.

What happens when we add the title field to the RankingPoint model while the WipeOut gem is in use?

$ TesterPlan.validate(Tester).valid?
=> false
$ TesterPlan.validate(Tester).errors
=> ["RankingPoint plan is missing attributes: :title"]

We can fix it by adding:

TesterPlan = WipeOut.build_plan do
  ....
  relation :ranking_points do
    ignore :points, :source
    wipe_out(:title) { "[deleted]" }
  end
end

You may be thinking, wait, there’s another column in ranking points, :tester_id, so why didn’t we receive any error about it? This is because WipeOut checks relations too. By default, it ignores foreign keys defined for belongs_to relations. This reduces the verbosity of the plan.
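If you’re curious how this kind of static check can work in principle, below is a toy sketch in plain Ruby. It is not WipeOut’s implementation: ToyPlan, errors_for and the SCHEMA hash are all hypothetical names invented for this illustration, and the schema is written by hand instead of being read from ActiveRecord. The toy skips the primary key and belongs_to foreign keys, which is an assumption based on the behaviour described in this post.

```ruby
# Toy plan: records handled attributes and nested plans for relations.
# Hypothetical illustration only, not the gem's real code.
class ToyPlan
  attr_reader :handled, :relations

  def initialize(&block)
    @handled = []
    @relations = {}
    instance_eval(&block)
  end

  # The replacement block is accepted but unused; validation only needs names.
  def wipe_out(*names)
    @handled.concat(names)
  end

  def ignore(*names)
    @handled.concat(names)
  end

  def relation(name, &block)
    @relations[name] = ToyPlan.new(&block)
  end

  # Walks the schema and collects every attribute or relation the plan misses.
  def errors_for(schema)
    required = schema[:attributes] - [:id] - schema.fetch(:foreign_keys, [])
    missing = required - handled
    errors = []
    unless missing.empty?
      errors << "#{schema[:model]} plan is missing attributes: #{missing.map(&:inspect).join(', ')}"
    end
    schema.fetch(:relations, {}).each do |name, nested_schema|
      nested_plan = relations[name]
      if nested_plan
        errors.concat(nested_plan.errors_for(nested_schema))
      else
        errors << "#{schema[:model]} plan is missing relation: #{name.inspect}"
      end
    end
    errors
  end
end

# Hand-written description of the two models, including the new :title column.
SCHEMA = {
  model: "Tester",
  attributes: [:id, :name, :banned_at],
  relations: {
    ranking_points: {
      model: "RankingPoint",
      attributes: [:id, :source, :points, :title, :tester_id],
      foreign_keys: [:tester_id]
    }
  }
}

plan = ToyPlan.new do
  wipe_out(:name) { "[deleted]" }
  ignore :banned_at

  relation :ranking_points do
    ignore :points, :source
  end
end

errors = plan.errors_for(SCHEMA)
puts errors
```

Because the plan is just data, the check never touches a record; it only compares what the plan declares with what the schema contains.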

We run validations as part of our tests on CI. We have plans for all models in our apps - they’re easy to find, understand and change. If someone adds something new and forgets to define it in the plan, they’ll receive an error.
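A CI guard for this can be very small. The sketch below relies only on the validate(...).valid? and .errors calls shown earlier; the assert_plan_valid! helper is a hypothetical name, and StubPlan and StubResult are stand-ins so the snippet runs without the gem or a database. In a real app you would pass your actual plan and model instead.

```ruby
# Hypothetical helper: raises when a plan does not cover a model completely.
def assert_plan_valid!(plan, model)
  result = plan.validate(model)
  return true if result.valid?

  raise ArgumentError, "#{model} plan is invalid: #{result.errors.join('; ')}"
end

# Stand-ins that mimic the validate/valid?/errors interface (not part of the gem).
StubResult = Struct.new(:errors) do
  def valid?
    errors.empty?
  end
end

StubPlan = Struct.new(:errors) do
  def validate(_model)
    StubResult.new(errors)
  end
end

# A complete plan passes silently.
assert_plan_valid!(StubPlan.new([]), "Tester")

# An incomplete plan fails loudly, which is what we want on CI.
begin
  assert_plan_valid!(StubPlan.new(["RankingPoint plan is missing attributes: :title"]), "Tester")
rescue ArgumentError => e
  puts e.message
end
```

Calling a helper like this once per plan from the test suite turns a forgotten column into a failing build rather than leftover data.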

Other features are also available:


We invite you to explore, and if you find it interesting, feel free to open an issue with feedback. The gem is available at https://github.com/GlobalAppTesting/wipe_out.

If you employ a different solution for your app, we’d love to read about it!
