Using Arel to build complex SQL expressions

February 10, 2013 Grant Hutchins

I write and maintain a gem called pg_search that makes it easy to build Active Record scopes that take advantage of PostgreSQL’s built-in full-text search functionality.

Part of generating these scopes involves taking user input (which come in as Ruby strings) and comparing it against columns in database tables to see which records match.

Escaping strings

Here’s a (simplified) example query that uses the trigram operator, %, to do a fuzzy text match.

SELECT * FROM "blog_posts" WHERE content % 'foo';

A naïve way to write this in an Active Record would be:

BlogPost.where("content % '#{query}'")

This is, of course, a bad way to write this query, because it would be trivial for a user to supply a query string that does something destructive.

query = "'; DELETE FROM blog_posts; SELECT '1"
BlogPost.where("content % '#{query}'")

This query breaks out of the quotes and generates multiple SQL queries. The second query would delete every blog post in the system.

SELECT * FROM "blog_posts" WHERE content % ''; DELETE FROM blog_posts; SELECT '1';

The most common way Rails developers work around this in Active Record is to use the built-in support for escaping strings by providing two arguments to #where.

query = "'; DELETE FROM blog_posts; SELECT '1"
BlogPost.where("content % ?", query)

That way, the BlogPost model would generate SQL with the evil query escaped. The following example is from PostgreSQL, where '' inside a string is an escaped single quote.

SELECT "blog_posts".* FROM "blog_posts"  WHERE (content % '''; DELETE FROM blog_posts; SELECT ''1');

Avoiding strings of SQL

Now, when you want to write a scope that matches by equality, Active Record offers a nicer syntax. For example, you could find all blog posts whose title is an exact match to a query using the Hash-based syntax. The two following lines are mostly equivalent.

BlogPost.where("title = ?", query)
BlogPost.where(title: query)

I prefer the Hash syntax because there are no string literals involved. Also, as an added bonus, Active Record uses a fully-qualified column name such as "blog_posts"."title" instead of just title, which makes it easier to use in multi-table expressions that use joins.

Also, it’s still possible to generated invalid SQL when using the two-parameter version.

BlogPost.where("title = '?'", query) # extra ' surrounds query

Luckily this example always generates invalid SQL, so it’s easy to catch. But it’s still frustrating that the most common syntax doesn’t prevent bugs from leaking all the way into the database.

Symbolic expressions

That’s where Arel steps in. Arel is a Relational Algebra gem that allows you to generate SQL queries directly from an abstract syntax tree (AST) of nodes. It’s what Active Record uses internally to build up expressions symbolically from the Hash syntax.

The #where method in Active Record can accept an Arel node. For the title equality example, it would look like this:

expression = BlogPost.arel_table[:title].eq(query)
BlogPost.where(expression)

Unfortunately, the documentation for Arel is pretty sparse. For instance, the #eq method I used above has no documentation at all.

However, digging into the source of the method gives some clues as to how it works.

# File 'lib/arel/predications.rb', line 15
def eq other
  Nodes::Equality.new self, other
end

It turns out that #eq, when called on one Arel node and passed another, creates an Arel::Nodes::Equality instance. Looking at the docs for this Equality class shows that it is a subclass of Arel::Nodes::Binary.

It turns out that an Arel::Nodes::Binary instance is a symbolic representation of a operation between two Arel nodes. In this case, you should think of “binary” as meaning “having two arguments”, as opposed to the Boolean concept of true and false.

In our example, the nodes are a database column, BlogPost.arel_table[:title], and a Ruby string, query. And Equality is the particular Binary relationship that deals with the equality operator, =.

It turns out there are other subclasses, such as Arel::Nodes::GreaterThan, for other binary operations.

Putting it all together

Anyway, coming back to our original example using trigrams, we want to find a way to create a Binary expression that Active Record will accept and turn into a SQL query that uses the % operator.

After hunting around with my pair JT Archie, we discovered a class named Arel::Nodes::InfixOperation. An “infix” operator is a binary operator such as = or + that is written between its arguments. For example, in SQL notation, you would write 1 + 2, not + 1 2 or 1 2 +.

It turns out that % is an infix operator in PostgreSQL. So we can create an InfixOperation instance directly. Looking at its constructor, we see that it takes three arguments named “operator”, “left”, and “right”.

# File 'lib/arel/nodes/infix_operation.rb', line 13
def initialize operator, left, right
  super(left, right)
  @operator = operator
end

So we can write our content-matching trigram scope this way:

content_column = BlogPost.arel_table[:content]
query = "Pivotal Labs"
expression = Arel::Nodes::InfixOperation.new("%", content_column, query)
BlogPost.where(expression)

This scope generates exactly the SQL we want:

SELECT "blog_posts".* FROM "blog_posts" WHERE ("blog_posts"."content" % 'Pivotal Labs')

Let’s clean this up and put it into a class method on BlogPost.

class BlogPost < ActiveRecord::Base
  def self.search_content(query)
    expression = Arel::Nodes::InfixOperation.new(
      "%",
      arel_table[:content],
      query
    )
    where(expression)
  end
end

Looking forward

As of now, my pg_search gem builds most of its SQL through creative use of Active Record SQL-escaping methods and Ruby string interpolations. As far as I know, I have done everything correctly. Thus, I feel mostly safe from SQL injection attacks and other bugs. But it’s hard to know whether or not I’ve missed a spot.

But one day soon, I hope to convert it all into Arel objects. I’ve already started, and I hope it won’t be long before a fully string-interpolation-free version is released. Then I can rest easier, knowing that a typo somewhere deep in my code won’t end up going directly into the database unescaped.

About the Author

Biography

Previous
How to Build an Awesome, Affordable, Flexible Standing Desk using Metroshelves
How to Build an Awesome, Affordable, Flexible Standing Desk using Metroshelves

I’ve written about using a standing desk; now let’s talk about building one. Commercial standing desks are ...

Next
"expect errors"
"expect errors"

Helps "expect errors" when compiling Ruby with clang Compiled Ruby with clang and the compile output contai...