Blockchain: Use Cases For Blockchain On Cloud Foundry

August 8, 2016 Jared Gordon

 

There has been more than a little buzz around the coming blockchain disruption in fintech. For those who are just tuning in, blockchain technologies underpin digital currencies such as bitcoin and ethereum. At their core, the principles behind blockchain are actually pretty straightforward. This blog discusses these concepts using a simple demo application and suggests practical uses for blockchain beyond tracking cryptocurrencies.

What Is A Blockchain?

According to Wikipedia:

A blockchain consists of blocks that hold timestamped batches of valid transactions. Each block includes the hash of the prior block in the blockchain, linking the two. The linked blocks form a chain, with only one (successor) block allowed to link to one other (predecessor) block, thus giving the database type its name.

To help illustrate this in a concrete manner, we’ve created a simplified demo blockchain. Instructions on how to build and run the chain are in the README, and the demo can be run directly from a command line or pushed to Cloud Foundry. The rest of this blog will make use of the demo to highlight various blockchain characteristics.

Our Use Case

Imagine you needed to implement the following:

Provide a system that can store Something. At any time, it must be possible for anyone to independently verify when “The Something” entered the system, and that it has not been modified. Whatever The Something represents must not be exposed publicly.

The usual technical approach to this would be to create a datastore, create credentials for users, have the users load information into the system securely, and then protect the datastore with passwords, encryption, firewalls, and restrictive interfaces.

But, what about the requirement for independent verification? At any time, anyone should be able to verify the integrity of an entry (or the whole datastore) using “external methods.” This is where the proverbial paradigm shifts:

  • We can’t trust the system to verify itself
  • We can’t rely upon the assurances of system creators or operators to determine data validity
  • We don’t want to rely upon some third party to certify that the data is correct and the system is secure

So, where do we start? How about with Hashes and Maps?

One-Way Cryptographic Hashes

A common way to ensure that some piece of data has not been tampered with is via the use of one-way hashes. Hashing is a method for creating a mathematical fingerprint of something, and a one-way hash is designed so that it is infeasible to recreate the original data from its hash. Ideally, collisions, where two different inputs result in the same hash, should also be extremely unlikely.

A commonly used hashing function, SHA-256, is useful for this purpose, and our demo includes the Hasher service, which provides SHA-256 hashes. To illustrate, we can use Hasher to hash the string “helloWorld”:

http://chain.cfapps.io/hasher/hash/helloWorld →
       11d4ddc357e0822968dbfd226b6e1c2aac018d076a54da4f65e1dc8180684ac3

We can then verify this hash “outside the system” by rehashing our data on anything that supports a SHA-256 hashing function. On a Mac, we can do the following:

echo -n helloWorld | shasum -a 256 →
       11d4ddc357e0822968dbfd226b6e1c2aac018d076a54da4f65e1dc8180684ac3

This supports our requirement—that we should be able to verify entries at any time via their hashes using tools external to the application. If our externally computed hash is different from the stored hash, someone has tampered with the data.
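The same external check can be scripted. The following is a minimal sketch using Python's standard hashlib module; the function names sha256_hex and verify are hypothetical helpers for illustration and are not part of the demo's Hasher service.

```python
import hashlib


def sha256_hex(data: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as lowercase hex."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify(data: str, stored_hash: str) -> bool:
    """Recompute the hash of the data and compare it with the stored hash."""
    return sha256_hex(data) == stored_hash.lower()


if __name__ == "__main__":
    stored = sha256_hex("helloWorld")
    print(stored)                         # 64 hex characters
    print(verify("helloWorld", stored))   # unchanged data
    print(verify("helloworld", stored))   # one character changed
```

Any change to the input, even the case of a single character, produces a different digest, so the second check fails.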

A First Attempt

Now that we have a way to verify data, we can try to create a way to store it. How about a simple dictionary?

The MapA class is our first approach—to add an entry to MapA, do something like this:

http://chain.cfapps.io/mapA/add/yourEntryGoesHere →
       7bc445de0ea68285157b8d9a64a40b8cd8e0fe5737355d17c93be223feb39d2e

The entry can be verified internally by the demo app via this call:

http://chain.cfapps.io/mapA/verify?entry=yourEntryGoesHere&hash=7bc445de0ea68285157b8d9a64a40b8cd8e0fe5737355d17c93be223feb39d2e → true

And, it can be verified externally via system libraries (such as in the previous Mac example):

echo -n yourEntryGoesHere | shasum -a 256 →
      7bc445de0ea68285157b8d9a64a40b8cd8e0fe5737355d17c93be223feb39d2e

All of the entries in MapA can then be viewed:

http://chain.cfapps.io/mapA →
     {
              "7bc445de0ea68285157b8d9a64a40b8cd8e0fe5737355d17c93be223feb39d2e": "yourEntryGoesHere",
              ...
       }
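To make the structure concrete, here is a rough in-memory sketch of a hash-to-entry dictionary. It is an assumption about how a store like MapA could work, not the demo's actual source; the class and method names are illustrative only.

```python
import hashlib
from typing import Dict


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class MapA:
    """Hypothetical stand-in for the demo's MapA: maps hash -> original entry."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def add(self, entry: str) -> str:
        """Store the entry under its SHA-256 hash and return that hash."""
        digest = sha256_hex(entry)
        self._store[digest] = entry
        return digest

    def verify(self, entry: str, digest: str) -> bool:
        """True only if the hash is stored and matches a fresh hash of the entry."""
        return digest in self._store and sha256_hex(entry) == digest

    def dump(self) -> Dict[str, str]:
        """Return a copy of the whole store, as it would be published."""
        return dict(self._store)


if __name__ == "__main__":
    store = MapA()
    digest = store.add("yourEntryGoesHere")
    print(store.verify("yourEntryGoesHere", digest))
    print(store.dump())  # note that the original entries are visible here
```

Publishing the output of dump() reveals every original entry alongside its hash.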

Fine, but when we publish the contents of this datastore later for public verification, won’t we be exposing sensitive user data? We could encrypt the “yourEntryGoesHere” part, but if something can be encrypted, it can be decrypted too. We need some other way to safeguard user data when we publish the datastore.

Protecting The Contents

Our next iteration is MapB, an improvement over MapA. Instead of storing the user data, MapB creates a random unique key, associates it with the hashed entry, and returns the key to the user. It is then up to the user to retain these keys for later interaction with the datastore. Here are some examples:

http://chain.cfapps.io/mapB/add/yourEntryGoesHere → akey
http://chain.cfapps.io/mapB/verify?key=&entry=yourEntryGoesHere → true
http://chain.cfapps.io/mapB →
{
       "4d9213c9-16cf…": "7bc445de0ea68285157b8d9a64a40b8cd8e0fe5737355d17c93be223feb39d2e",
       …
}
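A sketch of the same idea in code follows. It assumes the random key is a UUID (the truncated key in the output above looks like one, but that is a guess), and the class and method names are hypothetical rather than taken from the demo.

```python
import hashlib
import uuid
from typing import Dict


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class MapB:
    """Hypothetical stand-in for the demo's MapB: maps random key -> hash of entry."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def add(self, entry: str) -> str:
        """Store only the hash of the entry and return a random key for it."""
        key = str(uuid.uuid4())  # assumption: a random UUID serves as the key
        self._store[key] = sha256_hex(entry)
        return key

    def verify(self, key: str, entry: str) -> bool:
        """True if the key exists and its stored hash matches the entry."""
        return self._store.get(key) == sha256_hex(entry)

    def dump(self) -> Dict[str, str]:
        """Return a copy of the store; it contains keys and hashes only."""
        return dict(self._store)


if __name__ == "__main__":
    store = MapB()
    key = store.add("yourEntryGoesHere")
    print(store.verify(key, "yourEntryGoesHere"))
    print(store.dump())
```

The published store now holds only keys and hashes, so the user must keep both the key and the original entry in order to verify it later.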

Ensuring The Integrity Of The Store

Now we have a way to add, store, and share data. But how do we ensure the integrity of the datastore itself? What is keeping someone from tampering with our entries? We could hash all of our hashes to create a “root” signature for the store. However, this hash would need to be recomputed across the entire store whenever a new entry is made, and this would become more computationally expensive as the data grows. Luckily, we have Merkle trees to help us out. Using a Merkle tree, we can “hash our hashes” in intermediate layers in a tree structure, with each layer supporting the hashes of the layer below. This is depicted in the following diagram:

[Figure: a Merkle tree, with each layer holding the hashes of the layer below]

In a Merkle tree, a node’s hash is the hash of the combination of its children. The bottom (leaf) layer is where our key:hash pairs are stored. The root node’s hash, at the top of the tree, represents the overall signature of the entire collection. To validate any entry, we climb up the tree from the entry to the root, validating hashes as we go. The demo MerkleTree is implemented as a binary tree, which lets us use some interesting mathematical properties to manage the tree efficiently. For instance, the distance from any entry leaf to the root of the tree is about log2(n), rounded up. This means that for a tree with 1,000,000 entries, we need only 20 “hops” to the root to verify any individual entry. Our demo Merkle tree is simplified for illustrative purposes:

  • It is created in memory and goes away if the application is killed
  • It is not “partitioned” and parts cannot be pulled in or out of memory
  • There are limits on its size, constrained by the available resources of the machine where it is running

Nonetheless, we can use it to create large trees. To add a single entry:

http://chain.cfapps.io/merkleTree/add/foo → a key

There’s also a method to add random entries (within reason, see above caveats) to fill up the tree:

http://chain.cfapps.io/merkleTree/load/1000 →
{
  "hash": "08563569f313d2b33ff84b6414e703f4af3e7fda9756987660c25aa0477e4f0f",
  "leaves": 1000,
  "height": 10,
  "root": {
    "hash": "08563569f313d2b33ff84b6414e703f4af3e7fda9756987660c25aa0477e4f0f",
    "left": {
      "hash": "ff10c6ec8b30fdffdfdf7cc16edcaf7e36a11f1cea2405175958a073a4caea24",
      …
    }
    …
  }
}

We are now able to publish the tree without exposing user data, and we can externally verify a given entry or the entire tree as needed.
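For readers who want to see the climb from leaf to root spelled out, here is a minimal sketch of a binary Merkle tree with per-entry verification. It is not the demo's implementation. In particular, the rule for an odd-sized layer (pairing the last node with itself) and the use of plain SHA-256 entry hashes as leaves are assumptions made for this example, and all function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every layer of the tree, leaf layer first and root layer last."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        layer = levels[-1]
        if len(layer) % 2:  # assumption: an unpaired node is paired with itself
            layer = layer + [layer[-1]]
        levels.append([h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)])
    return levels


def proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect sibling hashes from leaf to root; the flag means 'sibling is on the left'."""
    path = []
    for layer in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(layer):  # unpaired node: its sibling is itself
            sibling = index
        path.append((layer[sibling], sibling < index))
        index //= 2
    return path


def verify(leaf: bytes, path: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the hashes along the path and compare the result with the root."""
    node = leaf
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


if __name__ == "__main__":
    leaves = [h(f"entry-{i}".encode()) for i in range(1000)]
    levels = build_levels(leaves)
    root = levels[-1][0]
    path = proof(levels, 42)
    print(len(path))                       # number of hops to the root
    print(verify(leaves[42], path, root))  # entry is intact
```

With 1,000 leaves the path has 10 hops, which matches the "height" reported in the output above. Verifying one entry needs only the sibling hashes along its path, not the other entries.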

For a full listing of the MerkleTree API, please refer to the GitHub README.

Adding A Time Component

How about the part of our use case that says “…verify when ‘The Something’ entered the system…”? How do we handle the concept of time? Is there a way to checkpoint the state of the tree at a given point?

This is where the concept of a “chain of blocks” comes into play. The idea is as follows:

  1. At an agreed upon “cadence” (say, every hour, or minute, or second, depending upon the granularity needed), we freeze the Merkle tree as a “block” (the red tree in the diagram below).
  2. We take the frozen tree block and add it to the end of the chain of previously frozen blocks (the blue tree).
  3. A new chain root entry (the green node) is created with the previous chain root entry on the left, and the new block’s root entry on the right.
  4. The chain’s root hash is recomputed to be a combination of its previous root hash and the hash of the block we just added.
  5. Then, we create a new, empty Merkle tree for the next cadence and start adding new transactions to it.
  6. This cycle continues for the life of the blockchain (the yellow tree, the purple tree… etc.).

[Figure: the chain of blocks, with each frozen Merkle tree joined to the chain under a new root node]

With this approach in place, we can determine when the entry was added to the chain based upon the cadence in which its block was appended.
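The root-combining step can be sketched in a few lines. This is an illustration under assumptions rather than the demo's Chain service: the class name, the 1-based cadence numbering, and the choice to use the first block's root directly as the initial chain root are all hypothetical.

```python
import hashlib
from typing import List, Optional


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Chain:
    """Hypothetical chain: new chain root = H(previous chain root + new block root)."""

    def __init__(self) -> None:
        self.root: Optional[bytes] = None
        self.block_roots: List[bytes] = []

    def add_block(self, block_root: bytes) -> int:
        """Append a frozen block's root hash and return its cadence number (1-based)."""
        if self.root is None:
            self.root = block_root  # assumption: the first block has nothing on its left
        else:
            self.root = h(self.root + block_root)
        self.block_roots.append(block_root)
        return len(self.block_roots)

    def recompute_root(self) -> Optional[bytes]:
        """Replay every block root from the start to rebuild the chain root."""
        root = None
        for block_root in self.block_roots:
            root = block_root if root is None else h(root + block_root)
        return root


if __name__ == "__main__":
    chain = Chain()
    print(chain.add_block(h(b"merkle root of cadence 1")))
    print(chain.add_block(h(b"merkle root of cadence 2")))
    print(chain.recompute_root() == chain.root)
```

Because each chain root folds in the previous one, altering any earlier block changes every later root, so a recomputed root will no longer match the published one.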

Our demo also includes a Chain service that we can use to simulate cadences. A typical usage scenario might go something like the following:

  1. Add some entries to a Merkle tree (http://chain.cfapps.io/merkleTree/load/100)
  2. Add another entry and take note of its key (http://chain.cfapps.io/merkleTree/add/foo)
  3. Load this block into our chain (“cadence1”) (http://chain.cfapps.io/chain/addBlock)
  4. Our Merkle tree is now empty again, add some more entries (http://chain.cfapps.io/merkleTree/load/50)
  5. Load these new entries as a block onto our chain (“cadence2”) (http://chain.cfapps.io/chain/addBlock)
  6. Verify the entry we saved the key for in step 2 (http://chain.cfapps.io/chain/verify?key=theKey&entry=foo)
  7. View the chain (http://chain.cfapps.io/chain)

Adding blocks onto our chain turns out to be a trivial operation because of the “fractal” nature of these tree structures: we simply glue them together with a new root node. Chains are in fact just Merkle trees, and they can be validated by traversing from their leaf nodes up to the root. The entire chain can be verified either expensively via brute force verification of each leaf, or cleverly via partitioning the chain (proof left to the reader).

Mystery Demystified?

There are some important differences between our simple demo and a full-blown public ledger blockchain. “Real” blockchains add additional capabilities such as:

  • Authentication: who is entering what, plus support for additional data attributes (timestamps, IP addresses, other identifying information)
  • Secure transmission of entries
  • Massive Scalability: support for zillions of entries, data partitioning for efficient processing and storage
  • Massive Redundancy: many copies of the blockchain (thousands in some cases) scattered across the globe
  • Strict, formal Synchronization of cadences between these copies
  • Consensus mechanisms for resolving cases where instances do not agree with each other

Evolving Uses For Blockchain

We are currently working with vendors to provide blockchain as a platform capability. Thinking beyond classic, public ledger use cases, blockchain technologies are being deployed as special-purpose “private ledger” components across industries. Examples of these use cases include:

  • Verification of system configuration: If we can separate configuration from code (one of the principles of 12-Factor Apps), we can track our configurations in blockchains and monitor for any unexpected or unauthorized modification.
  • Verification of log files: Logs can be used as a system-of-record, but logs might be manipulated to cover someone’s tracks. If we hash log entries, they can be verified offline for auditing purposes.
  • Non-Repudiation of Transactions: If the contents of a transaction can be hashed, it can be added to a blockchain. This lets us independently verify the transaction at a later time, including the content and timestamp for when it entered the system.
  • Certification of system components: Source code, shared libraries, operating systems, etc. can all be hashed. If builds can be automated via CI pipelines, hashes can be collected along the way. It should then be possible to trace an entire system back from source repository through the various build processes using these hashes. Then, these could be used to verify components for certified builds, and also allow teams to watch for any unexplained changes.
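As one illustration of the log-file case above, the sketch below hashes each log line when it is written and later audits the log against those recorded hashes. The function names and the sample log lines are invented for this example. In practice the recorded hashes would need to live somewhere the log's author cannot alter, such as a blockchain.

```python
import hashlib
from typing import List


def hash_lines(lines: List[str]) -> List[str]:
    """Return the SHA-256 hex digest of each log line."""
    return [hashlib.sha256(line.encode("utf-8")).hexdigest() for line in lines]


def audit(lines: List[str], recorded_hashes: List[str]) -> List[int]:
    """Return 0-based positions where the log no longer matches the recorded hashes."""
    current = hash_lines(lines)
    mismatches = [
        i for i, (now, then) in enumerate(zip(current, recorded_hashes)) if now != then
    ]
    # Lines added or removed at the end also count as discrepancies.
    shorter, longer = sorted((len(current), len(recorded_hashes)))
    mismatches.extend(range(shorter, longer))
    return mismatches


if __name__ == "__main__":
    log = ["user alice logged in", "user alice deleted record 7", "user alice logged out"]
    recorded = hash_lines(log)  # captured when the entries were written
    log[1] = "user alice viewed record 7"  # someone edits the log afterwards
    print(audit(log, recorded))
```

The audit reports the position of the edited line without needing a trusted copy of the original log text.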

Within Pivotal Cloud Foundry, we could use blockchain to:

  • Digitally sign trusted buildpacks, and use these signatures to verify buildpack authenticity at runtime
  • Digitally sign applications from approved build pipelines, and ensure that only verified apps can be pushed to Cloud Foundry
  • Digitally sign bosh manifests and Operations Manager Tiles so that system state can be controlled and verified

This is, perhaps, less glamorous than tracking cryptocurrencies, but useful nonetheless!

Additional technical information is available on various provider sites.

Please let us know your thoughts in the comments below. Which of the various providers would be most useful to you, and why? We are looking for feedback from the Pivotal Cloud Foundry community!

 
