SEO-friendly single-page apps in Rails

May 9, 2013 Herval Freire

TL;DR version

To make this whole process as simple as possible, a variation of the third approach (Rack middleware + Selenium Webdriver + no caching) is available here as a gem. Drop it into your project, install the dependencies, and may the SEO gods bless you!

The whole story

Much has been said about how hard it is to build a single-page app that responds well to crawlers. Most search crawlers don’t support Javascript, which means applications that rely entirely on client-side rendering will serve them blank pages.

Luckily, there are several approaches one can take to circumvent the lack of faith from certain crawlers – there’s obviously no “one size fits all” approach, so let’s take a minute to go through three of the most commonly used approaches, highlighting pros and cons of each one of them.

Render everything in two “modes” (the no script approach)

This strategy consists of rendering your app normally, BUT with pieces of “static” content already baked in (usually inside a <noscript> block). In other words, the client-side templates your app serves will have at least some level of server-side logic in them – hopefully, very little.

Although it’s a somewhat natural approach for a developer used to rendering templates and content on the server side, it leads to a scenario where everything has to be implemented twice – once in the javascript templates/views, once in the pages, making everything hard to maintain (and potentially out of sync) real quick. That Product detail view now includes a “featured review”? No problem, have its text rendered on the server-side (and returned by the JSON endpoint your client app will use to render it again in a Javascript template).

This DOES work well for mostly-static, single-content apps (eg.: blogs), where even the javascript app itself would benefit from having the content already pre-loaded (the javascript views/snippets/segments would effectively fiddle with blocks of content, instead of fetching them from the server).

It’s worth noting that you should NOT rely on rendering just bits of content when serving pages to bots, as some of them state that they expect the full content of the page. It’s also worth pointing out that Google uses the snapshots it takes when crawling to compose the tiny thumbnails you see on search results, so you want these to be as close to the real thing as possible – which just compounds the maintenance issues of this approach.

The hash fragment approach

This technique is fully supported by Googlebot alone (with limited support from some other minor search bots – Facebook’s bot works too, for instance) and is explained in detail here.

In short, the process happens as follows:
The search bot detects that there are hash parameters in your URL (eg.: www.example.com/something#!foo=bar)
The search bot then makes a SECOND request to your server, passing a special parameter (_escaped_fragment_) back. Eg.: www.example.com/something?_escaped_fragment_=foo=bar – it’s now up to your server-side implementation to return a static HTML representation of the page.
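To make the URL mapping above concrete, here is a minimal sketch of the transformation the bot performs. The method names escaped_fragment_url and escape_fragment are made up for illustration, and the set of characters being percent-escaped reflects my reading of Google's crawling scheme, so treat it as an assumption and check it against the official document.

```ruby
# Percent-escape the bytes the crawling scheme treats as special
# (assumed set: 0x00-0x20, '#', '%', '&', '+', 0x7F-0xFF).
def escape_fragment(fragment)
  fragment.bytes.map do |byte|
    if byte <= 0x20 || byte >= 0x7F || [0x23, 0x25, 0x26, 0x2B].include?(byte)
      format("%%%02X", byte)
    else
      byte.chr
    end
  end.join
end

# Turn a hash-bang URL into the URL the bot requests on its second visit.
# URLs without a "#!" are returned untouched.
def escaped_fragment_url(url)
  base, fragment = url.split("#!", 2)
  return url if fragment.nil?

  separator = base.include?("?") ? "&" : "?"
  "#{base}#{separator}_escaped_fragment_=#{escape_fragment(fragment)}"
end

puts escaped_fragment_url("www.example.com/something#!foo=bar")
```

Notice that when the URL already carries a query string, the special parameter is appended with & instead of ?.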

Notice that for pages that don’t have a hash-bang on their URL (eg.: your site root), this also requires that you add a meta tag to your pages, allowing the bot to know that those pages are crawlable.

<meta name="fragment" content="!">

Notice the meta tag above is mandatory if your URLs don’t use hash fragments (which is becoming the norm these days, due to the amazing adoption of html5 across browsers) – analogously, this is probably the only technique of these three that will work if you depend on hash-bang urls on your site (please don’t!).

You still have to figure out a way to handle the _escaped_fragment_ requests and render the content they are supposed to return (like the previous approach), but at least it takes that content out of the body of the pages served to regular users (reducing, but not eliminating, the duplication issue). This works somewhat well on sites where only part of the content is dynamic, and not so well on single-page apps. The lack of universal search bot support is also an obvious downside. Plus, you still have to pick a strategy to render the content without Javascript when the _escaped_fragment_ request arrives. Which leads us directly to the third approach…
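On the server side, the first step in handling one of these requests is recovering which page the bot is actually asking for. A minimal sketch, assuming you have the raw query string at hand (original_fragment and hashbang_path are hypothetical helper names, not part of Rails or Rack):

```ruby
ESCAPED_FRAGMENT_KEY = "_escaped_fragment_"

# Extract and unescape the fragment from a raw query string.
# Returns nil when the special parameter is not present.
def original_fragment(query_string)
  prefix = "#{ESCAPED_FRAGMENT_KEY}="
  pair = query_string.to_s.split("&").find { |part| part.start_with?(prefix) }
  return nil if pair.nil?

  raw = pair.delete_prefix(prefix)
  # Work on raw bytes so escaped non-ASCII bytes can be reassembled safely.
  raw.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }
     .force_encoding(Encoding::UTF_8)
end

# Rebuild the hash-bang path a regular browser would have used.
# An empty fragment (the meta tag case) maps back to the plain path.
def hashbang_path(path, query_string)
  fragment = original_fragment(query_string)
  return path if fragment.nil? || fragment.empty?

  "#{path}#!#{fragment}"
end

puts hashbang_path("/something", "_escaped_fragment_=foo=bar")
```

What you do with the recovered path (render a server-side template, or hand it to a headless browser) is still up to you.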

Crawl your own content when the client is a bot

Although this seems counterintuitive at first, this is one of the approaches Google suggests when you’re dealing with sites where most of the content is generated via javascript (all single-page apps fall into this category).

The idea is simple: if the requester of your content is a search bot, spawn a SECOND request to your own server, render the page using a headless browser (thankfully, there are many, many options to choose from in Ruby) and return that “pre-compiled” content back to the search bot. Boom.
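As a rough illustration of the “render it yourself” step, here is a sketch built around the selenium-webdriver gem. SnapshotRenderer is a made-up class name, the choice of Firefox is an assumption, and a real setup would also need a headless display or browser configuration plus some way of waiting until the Javascript has finished rendering. The driver is injected so the class can be exercised without a browser installed.

```ruby
# Loads a URL in a browser and returns the HTML after scripts have run.
class SnapshotRenderer
  # driver_factory: any callable returning an object that responds to
  # get(url), page_source and quit (a Selenium driver does).
  def initialize(driver_factory = nil)
    @driver_factory = driver_factory || method(:default_driver)
  end

  def render(url)
    driver = @driver_factory.call
    driver.get(url)
    driver.page_source
  ensure
    # Always release the browser, even if rendering failed.
    driver&.quit
  end

  private

  # Assumption: selenium-webdriver and Firefox are installed on the server.
  def default_driver
    require "selenium-webdriver"
    Selenium::WebDriver.for(:firefox)
  end
end

# Usage (requires the gem and a browser):
#   html = SnapshotRenderer.new.render("http://localhost:3000/products/1")
```

Spawning a browser per request is slow, which is one more reason to think about caching the result.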

The biggest advantage of this approach is that you don’t have to duplicate anything: with a single intercepting script, you can render any page and retrieve it as needed. Another positive point of this approach is that the content search bots will see is exactly what an end user would.

You can implement this rendering in several ways:

A before_filter on your controllers checks the user-agent making the request, then fetches the desired content and returns it. PROS: all vanilla-Rails approach. CONS: you’re hitting the entire rails stack TWICE.
Have a Rack middleware detect the user-agent and initiate the request for the rendered content. PROS: still self-contained in the app. CONS: need to be careful about which content will be served, since the middleware will intercept all requests.
Have the web server (nginx, apache) handle the user-agent and send requests to a different server/endpoint (eg.: example.com/static/original/route/here) that will serve the static content. PROS: only one request hits your app. CONS: requires poking around the underlying server infrastructure.
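The Rack middleware option could look roughly like the sketch below. It is not the code of the gem mentioned at the top of this post: CrawlerSnapshot, the bot pattern and the renderer argument are all assumptions made for illustration. The renderer is any callable that takes a URL and returns HTML, standing in for whatever headless browser call you use.

```ruby
# Rack middleware: serve pre-rendered HTML to search bots, pass everyone else through.
class CrawlerSnapshot
  # Assumed (and certainly incomplete) list of crawler user-agent fragments.
  BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider|facebookexternalhit/i

  def initialize(app, renderer:)
    @app = app
    @renderer = renderer
  end

  def call(env)
    return @app.call(env) unless snapshot_request?(env)

    html = @renderer.call(page_url(env))
    [200, { "Content-Type" => "text/html; charset=utf-8" }, [html]]
  end

  private

  # Only intercept GET requests for page-like paths coming from bots.
  # Assets (anything with a file extension) always go to the app.
  def snapshot_request?(env)
    env["REQUEST_METHOD"] == "GET" &&
      File.extname(env["PATH_INFO"].to_s).empty? &&
      env["HTTP_USER_AGENT"].to_s.match?(BOT_PATTERN)
  end

  # Simplification: the query string is ignored when rebuilding the URL.
  def page_url(env)
    "#{env['rack.url_scheme']}://#{env['HTTP_HOST']}#{env['PATH_INFO']}"
  end
end

# In a Rails app this would be wired up with something like:
#   config.middleware.use CrawlerSnapshot, renderer: ->(url) { ... }
```

The second request (the one made by the headless browser) carries a regular browser user-agent, so it falls through to the app instead of looping back into the middleware.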

As for how to store the server-side rendered content (again, from worst to best):

Re-render things on every request. PROS: no cache-validation hell. CONS: performance.
Cache rendered pages, optimally with a reasonable expiration time. PROS: much faster than re-fetching/rendering pages every time. CONS: cache maintenance might be an issue.
Cache rendered pages in a way that the web server can fetch them directly (eg.: save them as temp files). PROS: INSANELY FAST. CONS: huge cache maintenance overhead.
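The middle option (cache with an expiration time) can be sketched in a few lines of plain Ruby. SnapshotCache is a hypothetical name, the cache lives in a single process and is not thread-safe, so take it as an illustration of the idea rather than something to deploy. The clock is injectable only to make the expiration easy to try out.

```ruby
# In-memory cache of rendered pages with a fixed time-to-live.
class SnapshotCache
  def initialize(ttl_seconds:, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached HTML for key, or runs the block to render it
  # when the entry is missing or older than the TTL.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry[:html] if entry && now - entry[:stored_at] < @ttl

    html = yield
    @entries[key] = { html: html, stored_at: now }
    html
  end
end

cache = SnapshotCache.new(ttl_seconds: 3600)
html = cache.fetch("/products/1") { "<html>rendered once per hour</html>" }
puts html
```

In a Rails app, Rails.cache.fetch with the expires_in option gives you the same behaviour on top of whatever cache store you already have configured.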

There are a few obvious issues you have to keep in mind when using this approach:

Every request made by a search bot will consume two processes on your web server (since you’re calling yourself back to get content to return to the bot).
The render time is a major factor when search engines rank your content, so fast responses here are paramount (caching the static versions of pages, therefore, is very important).
Some hosting services (I’m looking at you, Heroku) might not support the tools you use to render the content on the server side (capybara-webkit, selenium, etc). Either switch servers or simply host your “staticfying” layer somewhere else.
