Using Pivotal GemFire/Apache Geode with Lucene Indexing for Application UI Typeahead

July 17, 2017 Kyle Dunn

Pivotal GemFire 9.1 is Here!

We’re pleased to announce that GemFire 9.1 has been released. Pivotal GemFire 9.1 is an in-memory data grid based on the recently released Apache Geode 1.2. More information on GemFire can be found on the GemFire page on Pivotal.io. The release is now available for download from Pivotal Network.

This release features the tight integration of Lucene with partitioned data in GemFire. The Lucene indexes are stored alongside the corresponding GemFire data partitions. Just like the data, the indexes are horizontally scalable: adding servers (data partitions) is supported automatically.

Use cases for this feature include searching through JSON documents and looking up records by partial name, social security number, or other attributes. One specific partial-lookup use case is the typeahead search, which progressively narrows the search results while the user is entering the search term. In this blog post, we will do a deep dive into this typeahead use case.

Motivation

Whether you're parsing through Google Maps results for dinner or fantasizing about the next vacation with Airbnb, the user-centric autocomplete (aka typeahead) is nearly as important as the search ability itself. These examples are just two of the familiar faces employing this feature; modern enterprise applications are likely to benefit from this capability as well.

Improving business productivity has always been the promise of IT, either with fully baked vendor offerings or home-grown software applications and automation infrastructure. While typeahead is a very small contributor to effective IT, having a boilerplate example proven out makes for an easier sell when trying to coax your Product Manager into accepting such a nicety into the backlog.

Approach

In alignment with the open source philosophy of sharing more (rather than less) code, many of the tech unicorns (Pivotal included) publish code in the skunkworks-gone-production genre. For typeahead, Twitter created an exceptional JavaScript library, aptly named typeahead.js. While the details of using the library are beyond the scope of this writing, simply put, it requires three things for basic functionality:

Data to search against
A mechanism to perform the search/match
Some HTML and JavaScript to invoke the search and display the results

For toy examples or static datasets, a purely JavaScript and HTML implementation is enough. In practical applications, a database and fuzzy search capabilities are necessary; for SQL backends, a predicate of the form LIKE '%mySearchString%', against an indexed column, can accomplish this. From a "boxes and lines" slide deck (aka marketecture) perspective, this will work. As pragmatic technologists, we know it's never quite this simple.
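To make the "query by LIKE" idea concrete, here is a minimal sketch of what a substring predicate amounts to when nothing smarter is available: a case-insensitive scan over every candidate value. The class name, method name, and sample addresses are made up for illustration. In many databases, a pattern with a leading wildcard cannot use an ordinary B-tree index, so the work done tends to look a lot like this loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class NaiveSubstringSearch {

    /**
     * Returns up to `limit` addresses that contain `term`, ignoring case.
     * Every address may be visited, so the cost grows with the dataset size.
     */
    static List<String> search(List<String> addresses, String term, int limit) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String address : addresses) {
            if (matches.size() >= limit) {
                break; // stop early once we have enough suggestions
            }
            if (address.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(address);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the real dataset.
        List<String> addresses = Arrays.asList("1600 Main St", "42 Maple Ave", "7 Mainland Rd");
        System.out.println(search(addresses, "main", 8)); // prints [1600 Main St, 7 Mainland Rd]
    }
}
```

This is perfectly fine for a few thousand static rows; it is the per-keystroke, per-user repetition of the scan that starts to hurt as volume grows.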

Gotchas

The nuances of implementing "query by LIKE" as the database grows, either in ingest rate or total volume, quickly become apparent. Our foremost concern: which users will be using this, and how many of them are there? An executive-only dashboard and a web-scale application have drastically different tolerances for deficiencies. Secondarily, how many records are in the dataset we're querying? How frequently does this dataset change? How fast does it grow in volume? Not coincidentally, these are common questions our Data Engineering team asks customers when tackling data architecture decisions. It's clear there is no one-size-fits-all way to provide a persistence layer for applications.

GemFire ♥ Lucene

The persistence layer chosen in this case is the in-memory data grid, Pivotal GemFire. This architecture decision affords the safety to leave many of the aforementioned concerns to the tool's feature set, instead of the implementation details. The requirement for a substring match capability opens the opportunity to highlight a new feature in GemFire: Lucene indexing.

Apache Lucene is a popular, Java-based indexing and search library. In GemFire 9.1 and Geode 1.2, this indexing and query capability has been integrated into GemFire/Geode regions (roughly equivalent to database tables). While its applicability for typeahead would be a more nuanced discussion, it's worth mentioning that Pivotal's Greenplum MPP database offers a similar indexing capability using the GPText add-on, but GPText is targeted at text analytics, not high-concurrency lookups.

Result

So after a long haul of context and whiteboarding, we've arrived at a decision for the minimum viable product (MVP in agile/startup lingo):

The data store is a Lucene-indexed GemFire region
The search and match mechanism is the GemFire-Lucene query API
The frontend UI component is implemented with typeahead.js, using the Handlebars JavaScript templating library to render suggestions

These design decisions manifest themselves in different parts of the code base:

Data store setup in the application configuration:

    @Bean
    public Region<Integer, Property> propertyRegion(Cache cache, LuceneService luceneService) {
        // Create the Lucene index on the "Address" field with the default analyzer.
        // The index must be defined before the region it covers is created.
        luceneService.createIndexFactory()
                .setFields("Address")
                .create("propIndex", "/Property");

        // Partitioned region; the explicit type arguments avoid a raw Region type.
        return cache.<Integer, Property>createRegionFactory(RegionShortcut.PARTITION)
                .create("Property");
    }

Lucene Search/Query in the repository implementation:

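    public Iterable<Property> getAFewByAddress(String address, Integer limit) throws Exception {
        // Query propIndex using "Address" as the default field, capped at `limit` results.
        LuceneQuery<Integer, Property> query = luceneService.createLuceneQueryFactory()
                .setLimit(limit)
                .create("propIndex", "/Property", address, "Address");

        // findValues() returns only the matching region values.
        return query.findValues();
    }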
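The query string handed to the Lucene query factory follows Lucene's query syntax, so what the user has typed so far may need some massaging before it behaves like a typeahead. The sketch below is one possible approach, not what the linked repository does: it lowercases the input (assuming the index analyzer lowercases terms, as Lucene's StandardAnalyzer does), escapes characters that are special in the query syntax, requires every word, and turns the last word into a prefix match with a trailing `*`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TypeaheadQueryBuilder {

    // Characters with special meaning in Lucene's query syntax.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    /** Backslash-escapes every special character in a single token. */
    static String escape(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /**
     * Builds a query string in which every word is required and the
     * last (possibly incomplete) word is matched as a prefix.
     * Returns an empty string for blank input.
     */
    static String toPrefixQuery(String userInput) {
        List<String> terms = new ArrayList<>();
        for (String token : userInput.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(escape(token));
            }
        }
        if (terms.isEmpty()) {
            return "";
        }
        int last = terms.size() - 1;
        terms.set(last, terms.get(last) + "*");
        return String.join(" AND ", terms);
    }

    public static void main(String[] args) {
        System.out.println(toPrefixQuery("123 Mai")); // prints 123 AND mai*
    }
}
```

The result would be passed as the query string argument in place of the raw `address` value. Note that this gives per-word prefix matching rather than true substring matching; whether that is good enough depends on how users actually search.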

typeahead.js JavaScript:

    $('#remote .typeahead').typeahead(
        {
            minLength: 3,
            highlight: true
        },
        {
            source: addressMatches, /* this function performs the REST call to the controller */
            display: displayField,
            limit: 8, /* typeahead.js datasets cap suggestions with `limit`, not `items` */
            templates: {
                empty: [
                    '<div class="empty-message">',
                    'unable to find any addresses that match the current query',
                    '</div>'
                ].join('\n'),
                suggestion: Handlebars.compile("<div><strong>{{address}}</strong> – elevation: {{elevationFeet}}'</div>")
            }
        });
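Client-side settings like the minimum query length and the suggestion cap are only hints; anyone can call the REST endpoint directly. A small guard on the server side keeps short or greedy requests from reaching the Lucene query at all. The following is a framework-free sketch with hypothetical names: `TypeaheadEndpoint` stands in for the controller logic, and the `search` function stands in for a repository call such as the one shown earlier.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

public class TypeaheadEndpoint {

    static final int MIN_LENGTH = 3;  // mirrors minLength in the typeahead.js options
    static final int MAX_RESULTS = 8; // mirrors the suggestion cap in the dataset options

    // Hypothetical stand-in for the repository: (query, limit) -> matches.
    private final BiFunction<String, Integer, List<String>> search;

    TypeaheadEndpoint(BiFunction<String, Integer, List<String>> search) {
        this.search = search;
    }

    /** Validates the request and only then delegates to the search function. */
    List<String> suggest(String rawQuery, int requestedLimit) {
        String query = rawQuery == null ? "" : rawQuery.trim();
        if (query.length() < MIN_LENGTH) {
            return Collections.emptyList(); // too short to be worth a query
        }
        // Clamp the limit to the range [1, MAX_RESULTS].
        int limit = Math.max(1, Math.min(requestedLimit, MAX_RESULTS));
        return search.apply(query, limit);
    }

    public static void main(String[] args) {
        // Fake search that just echoes the query, for demonstration only.
        TypeaheadEndpoint endpoint =
                new TypeaheadEndpoint((q, n) -> Collections.singletonList(q + " (limit " + n + ")"));
        System.out.println(endpoint.suggest("ma", 8));
        System.out.println(endpoint.suggest("main", 50));
    }
}
```

In a Spring application the same checks would typically live in the controller method, with the values returned as JSON for the `addressMatches` function to hand to typeahead.js.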

When you put a bow on all of this, what is left is this kind of app cool-factor:

All the code is available on GitHub here: https://github.com/kdunn-pivotal/gemfire-typeahead-spike/tree/lucene

Happy hacking!

About the Author

Kyle Dunn works professionally as a Data Engineer (aka data "nerd") for Pivotal Software, based in Denver, Colorado, USA. His professional background spans electric utility engineering, distributed systems/HPC, and more recently, RDBMS and data-driven workflows for enterprises. He earned a Bachelor of Science in Electrical Engineering from the University of Colorado, Denver, with academic publications ranging from heterogeneous cloud computing to direct current bus protection schemes. Find him on Twitter (@kdunn926) and LinkedIn (https://www.linkedin.com/in/dunnkyle).