Data Science How To: Massively Parallel, In-Database Image Processing: Part 2

August 19, 2014 Ailey Crow

featured-cell-vision-2Joint work performed by Ailey Crow, Sarah Aerni, Hulya Farinas, Regunathan Radhakrishnan and Gautam Muralidhar of Pivotal’s Data Science Labs.

In this post we will continue from the previous post that demonstrated in-database image processing which enables fast processing of large images or a large number of images. In part 1, we discussed image loading and filtering (smoothing) as part of a 6 step process: image loading, filtering (smoothing), thresholding, cleanup via morphological operations, object recognition using connected components, and counting. Now, we will cover steps three through six.

Thresholding

Now let’s select the pixels of interest by separating the foreground from the background—a process known as thresholding. Thresholding is easily implemented in-database. For example, a simple threshold can be used to select all pixels above or below the mean intensity value of the image, as demonstrated below. We will create an additional column, “isFG” as a binary flag indicating if the given pixel has been selected as foreground (i.e. the pixel of interest).

SELECT *, CASE WHEN (b_intensity<blue_ave)
THEN 1 ELSE 0 END isFG
FROM (
  SELECT * FROM testim_smooth
  JOIN
  (SELECT img_name, avg(b_intensity) blue_ave FROM testim_smooth group by img_name) thresh
  USING (img_name)) t
DISTRIBUTED BY (img_name);

img_name | row | col | b_intensity | isFG
————–+——-+—–+—————+————-
test.jpg | 331 | 188 | 150 | 1
test.jpg | 332 | 188 | 151 | 0
test.jpg | 331 | 189 | 152 | 0

Smoothed_Threshold_full

This approach clearly selects too many pixels as foreground. Alternatively, we can apply a more complex thresholding algorithm such as Otsu’s, which identifies the threshold that optimally minimizes the intra-class variance (implemented as maximizing the inter-class variance), first described here: Nobuyuki Otsu (1979). “A threshold selection method from gray-level histograms”. IEEE Trans. Sys., Man., Cyber. 9 (1): 62–66.

We apply Otsu’s method as follows. First, we consider every intensity value as a possible threshold by dividing all pixels into 2 classes: a) those with intensity value above the potential threshold and b) those below. We then calculate the average intensity for each class and number of pixels in each class (lines 8-11 below). Finally, we select the threshold that maximizes the following: countacountb(avga-avgb)2 where a and b refer to the 2 classes (above and below) for each potential threshold value (line 5 below).

CREATE TEMP TABLE otsu AS
SELECT *, row_number() OVER(order by otsucomp desc) r
FROM (
  SELECT img_name, thresh,
	  count_b*count_a*power(avg_b-avg_a,2) otsucomp
  FROM (
    SELECT img_name, b_intensity AS thresh,
      count(*) OVER(wdw_below) AS count_b,
      avg(b_intensity) OVER(wdw_below) AS avg_b,
      count(*) OVER(wdw_above) AS count_a,
      avg(b_intensity) OVER(wdw_above) AS avg_a
    FROM testim_smooth
    WINDOW wdw_below AS (order by b_intensity rows unbounded preceding),
      wdw_above AS (order by b_intensity rows between current row and unbounded following)
    )otsu
  )r
DISTRIBUTED BY (img_name);

Then, we apply the optimal threshold in the same fashion as the simple threshold above:

CREATE TABLE mask AS
SELECT *, CASE WHEN (b_intensity<thresh) THEN 1 ELSE 0 END isFG
FROM (
  SELECT * FROM testim_smooth
  JOIN
  (SELECT img_name, thresh FROM otsu where r=1)o
  USING (img_name)) t
DISTRIBUTED BY (img_name);

threshold_graph2

Morphological Operations

Next, let’s apply a morphological opening in-database to remove small pixel noise from our binary mask. A morphological opening is defined as an erosion (for each pixel, if any neighbors have value 0, assign value 0), followed by a dilation (for each pixel, if any neighbors have value 1, assign value 1). We’ll start with the erosion, once again leveraging window functions and our novel coordinate system to access neighboring pixels:

SELECT img_name, row, col,
  least(
    min(isFG) OVER( wdw_col ),
    min(isFG) OVER( wdw_row ),
    min(isFG) OVER( wdw_diag1 ),
    min(isFG) OVER( wdw_diag2 ) ) isFG
FROM mask
WINDOW wdw_col AS (partition by col order by row rows between 1 preceding and 1 following),
  wdw_row AS (partition by row order by col rows between 1 preceding and 1 following),
  wdw_diag1 AS (partition by (row-col) order by col rows between 1 preceding and 1 following),
  wdw_diag2 AS (partition by (row+col) order by col rows between 1 preceding and 1 following);

A morphological opening is then implemented as an erosion (lines 9-15 below) followed by a dilation (lines 2-8 below):

CREATE TABLE mask_op AS
SELECT img_name, row, col,
  greatest(
    max(isFG) OVER( wdw_col ),
    max(isFG) OVER( wdw_row ),
    max(isFG) OVER( wdw_diag1 ),
    max(isFG) OVER( wdw_diag2 ) ) isFG
FROM (
    SELECT img_name, row, col,
     least(
    	 min(isFG) OVER( wdw_col ),
    	 min(isFG) OVER( wdw_row ),
    	 min(isFG) OVER( wdw_diag1 ),
    	 min(isFG) OVER( wdw_diag2 ) ) isFG
    FROM mask
    WINDOW wdw_col AS (partition by col order by row rows between 1 preceding and 1 following),
     wdw_row AS (partition by row order by col rows between 1 preceding and 1 following),
     wdw_diag1 AS (partition by (row-col) order by col rows between 1 preceding and 1 following),
     wdw_diag2 AS (partition by (row+col) order by col rows between 1 preceding and 1 following)
    )t
WINDOW wdw_col AS (partition by col order by row rows between 1 preceding and 1 following),
  wdw_row AS (partition by row order by col rows between 1 preceding and 1 following),
  wdw_diag1 AS (partition by (row-col) order by col rows between 1 preceding and 1 following),
  wdw_diag2 AS (partition by (row+col) order by col rows between 1 preceding and 1 following)
Distributed by (img_name);

threshold_opened

More complex operations with larger kernels could be addressed using window functions over additional coordinate systems.

Connected Components (Object Recognition)

The next step is to organize the pixels of interest into objects, or groups of connected pixels. We will leverage graph theory by considering each pixel as a node or vertex and the edges or paths connecting them only exist between immediately adjacent or diagonal pixels. We can then identify objects (i.e. blob detection or region labeling in computer vision) by implementing a connected components algorithm. To do this, we first need to create a list of neighboring, or connected pixels. We’ll start by creating a unique id for each pixel:

CREATE TEMP TABLE mask_op_id AS
SELECT row_number() OVER(order by row, col) id, *
FROM mask_op;

Employing window functions again, we collect the pixel’s neighbors to the left and below (4 of 8 total neighbors) and the binary mask value of the neighbors for each pixel. Then, we select only the neighbor pairs for which both pixels are labeled as foreground (1) in the binary mask (line 27 below).

CREATE TEMP TABLE neighbors AS
SELECT u,v FROM (
   SELECT id u, isFG,
    lag(id) OVER( wdw_col) v,
    lag(isFG) OVER( wdw_col ) isFGNeigh
  FROM mask_op_id
  WINDOW wdw_col AS (partition by col order by row)
  UNION ALL
  SELECT id u, isFG,
    lag(id) OVER( wdw_row ) v,
    lag(isFG) OVER( wdw_row ) isFGNeigh
  FROM mask_op_id
  WINDOW wdw_row AS (partition by row order by col)
  UNION ALL
  SELECT id u, isFG,
    lag(id) OVER( wdw_diag1 ) v,
    lag(isFG) OVER( wdw_diag1 ) isFGNeigh
  FROM mask_op_id
  WINDOW wdw_diag1 AS (partition by (row-col) order by col)
  UNION ALL
  SELECT id u, isFG,
    lag(id) OVER( wdw_diag2 ) v,
    lag(isFG)OVER( wdw_diag2 ) isFGNeigh
  FROM mask_op_id
  WINDOW wdw_diag2 AS (partition by (row+col) order by col)
 ) t
 WHERE isFG=1 AND isFGNeigh=1
 distributed by (u);

Now, we are ready to run the connected components algorithm. This will identify all nuclei objects and the associated pixels. Please see this recent blog post for more details on the connected components algorithm.

Size Exclusion and Counting

Finally we are ready to count the number of nuclei. We will impose a criterion here that nuclei should be a certain size (i.e. be composed of more than 50 pixels, but less than 500 pixels), based on a priori information regarding the size or scale of the objects of interest. We can impose our size exclusion criterion and count the number of nuclei in a single query:

SELECT count(*) FROM (
  SELECT object, count(*) size_object
  FROM connected_components
  group by object ) t
WHERE size_object > 50 AND size_object < 500;

And we find 217 cells in this particular image!

cells_graph

Recap: The 6 Steps for Image Processing in Database

We have now covered 6 steps of image processing—image loading, filtering (smoothing), thresholding, cleanup via morphological operations, object recognition using connected components, and counting— all applicable in-database in a distributed fashion.

While our example used a single small image, the approach becomes particularly powerful when images are too large for in-memory operations or too numerous for sequential processing.

The tools and techniques described here are just a starting point for the diverse approaches and applications enabled by processing many large images distributed across the database.

Please feel free to ask questions or make comments.
We are happy to dialogue with you.

To Learn More:

  • Read Part I and other data science articles on our Twitter, LinkedIn, Facebook, Google Plus, and Youtube where you can learn about related topics like neurobiology, statistics, optimization, machine learning, cognitive vision, artificial intelligence, control robotics, signal processing, optics, security, and smart cameras.

About the Author

Biography

More Content by Ailey Crow
Previous
Pivotal's Cloud Platform Roadshow Brings Pivotal CF to Cities in the U.S. and Europe
Pivotal's Cloud Platform Roadshow Brings Pivotal CF to Cities in the U.S. and Europe

The Pivotal Cloud Platform Roadshow is now touring major cities in the United States and Europe, featuringP...

Next
Pair Design Rule #1: Wear One Hat at a Time
Pair Design Rule #1: Wear One Hat at a Time

Written by Pam Dineva and Kim Dowd. Imagine pairing with another designer for 6 months on a project. and l...

Enter curious. Exit smarter.

Learn More