A High-performing Mid-range NAS Server, Part 2: Performance Tuning for iSCSI

November 4, 2014 Brian Cunnie

This blog post describes how we tuned and benchmarked our FreeNAS fileserver for optimal iSCSI performance.

For most workloads (except ones that are extremely sequential-read intensive) we recommend using L2ARC, SLOG, and the experimental iSCSI kernel target.

Of particular interest is the experimental iSCSI driver, which increased our IOPS 334% and increased our sequential write performance to its maximum, 112MB/s (capped by the speed of our ethernet connection). On the downside, there was a 45% decrease in sequential read speed.

[Update 2014-11-06: we have added a third round of benchmarks.]

Using an L2ARC also improved performance (IOPS increased 46%, sequential write improved 14%, and sequential read decreased 4%).

We also experimented with three ZFS tuning variables (two sysctls and one loader tunable), but they were a mixed bag: they improved some metrics to the detriment of others.

Here is the summary of our results in a chart format:

Summary of Benchmark results. Note that Sequential Write and Read use the left axis (MB/s), and that IOPS is measured against the logarithmic right axis.

There is no optimal configuration; rather, FreeNAS can be configured to suit a particular workload:

  • to maximize sequential write performance, use the experimental kernel iSCSI target and an L2ARC
  • to maximize sequential read performance, use the default userland iSCSI target and no L2ARC
  • to maximize IOPS, use the experimental kernel iSCSI target and an L2ARC, enable the prefetching tunable, and increase the two L2ARC write sysctls (vfs.zfs.l2arc_write_max and vfs.zfs.l2arc_write_boost) tenfold.

0. Background

0.0 Hardware Configuration

We describe the hardware and software configuration in a previous post, A High-performing Mid-range NAS Server. Highlights:

  • FreeNAS 9.2.1.8
  • Intel 8-core Avoton C2750
  • 32GiB RAM
  • 7 x 4TB disks
  • RAIDZ2
  • 512GB SSD (unused)
  • 4 × 1GbE

0.1 Metrics and Tools

We use bonnie++ to measure disk performance. bonnie++ produces many performance metrics (e.g. “Sequential Output Rewrite Latency”); we focus on three of them:

  1. Sequential Write (“Sequential Output Block”)
  2. Sequential Read (“Sequential Input Block”)
  3. IOPS (“Random Seeks”)

We use an 80G file for our bonnie++ tests. We store the raw output of our benchmarks in a GitHub repo.

0.2 iSCSI Setup

Our FreeNAS server provides storage (data store) via iSCSI to VMs running on our ESXi server. This post does not cover setting up iSCSI and accessing it from ESXi; however, Steve Erdman has written such a blog post, “Connecting FreeNAS 9.2 iSCSI to ESXi 5.5 Hypervisor and performing VM Guest Backups”.

0.3 iSCSI and the Exclusion of Native Performance

Although we have measured the native performance of our NAS (i.e. we have run bonnie++ directly on the NAS, bypassing the limitation of our 1GbE interface), we don’t find those numbers terribly meaningful. We are interested in the real-world performance of VMs whose data store resides on the NAS and is mounted via iSCSI.

0.4 Untuned Numbers and Upper Bounds

We want to know what our upper bounds are; this will be important as we progress in our tuning: once we hit a theoretical maximum for a given metric, there’s no point in additional tuning for that metric.

The 1Gb ethernet interface places a hard limit on our sequential read and write performance: 111MB/s.

For comparison we have added the performance of our external USB hard drive (the performance numbers are from a VM whose data store resided on a USB hard drive). Note that the external USB hard drive is not limited by gigabit ethernet throughput, and thus is able to post a Sequential Read benchmark that exceeds the theoretical maximum.

Configuration               Sequential Write (MB/s)  Sequential Read (MB/s)  IOPS
Untuned                     59                       74                      99.8
Theoretical Maximum         111                      111                     n/a
External 4TB USB3 7200 RPM  33                       159                     121.8

The raw benchmark data is available here.

1. L2ARC and ZIL SLOG

L2ARC is ZFS’s secondary read cache (ARC, the primary cache, is RAM-based).

Using an L2ARC can make our IOPS “8.4x faster than with disks alone.”

ZIL (ZFS Intent Log) SLOG (Separate Intent Log) is a “… separate logging device that caches the synchronous parts of the ZIL before flushing them to slower disk”.

Typically an SSD drive is used as secondary cache; we use a Crucial MX100 512GB SSD.

We will implement L2ARC and SLOG and determine the improvement.

1.0 Determine Size of L2ARC (190GB)

L2ARC sizing depends on available RAM (the L2ARC exacts a price in RAM), available disk (we have a 512GB SSD), and average buffer size (the L2ARC requires 40 bytes of RAM for each buffer, and buffer sizes vary).
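A back-of-the-envelope sketch of that trade-off: the RAM consumed by L2ARC headers is roughly the number of cached buffers times the per-buffer cost. The script below assumes the 40-bytes-per-buffer figure above and a single average buffer size supplied by the caller (real buffer sizes vary with the workload); `l2arc_header_mib` is a hypothetical helper, not a FreeNAS command.

```sh
#!/bin/sh
# Hypothetical helper: estimate the RAM (MiB) consumed by L2ARC headers.
# Assumes 40 bytes of RAM per cached buffer and one average buffer size
# for the whole L2ARC; real buffer sizes vary with the workload.
l2arc_header_mib() {
    l2arc_bytes=$1     # size of the L2ARC partition, in bytes
    avg_buf_bytes=$2   # assumed average buffer size, in bytes
    awk -v size="$l2arc_bytes" -v buf="$avg_buf_bytes" 'BEGIN {
        buffers = size / buf                       # buffers that fit
        printf "%.0f\n", buffers * 40 / 1048576    # 40 bytes each, in MiB
    }'
}

# Example: a 1GiB L2ARC filled with 8KiB buffers
l2arc_header_mib 1073741824 8192   # prints 5
```

Because the per-buffer cost is fixed, halving the average buffer size doubles the RAM overhead, so an empirical check of swap usage (like the one we perform next) remains worthwhile.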

We first determine the amount of RAM we have available:

ssh root@nas.nono.com
 # determine the amount of RAM available
top -d 1 | head -6 | tail -3
Mem: 250M Active, 3334M Inact, 26G Wired, 236M Cache, 467M Buf, 929M Free
  ARC: 24G Total, 2073M MFU, 20G MRU, 120K Anon, 1303M Header, 574M Other
  Swap: 14G Total, 23M Used, 14G Free

We see we have 5GiB RAM at our disposal for our L2ARC (32GiB total – 1GiB Operating System – 26GiB “Wired” = 5GiB L2ARC).

We arrive at our L2ARC sizing experimentally: we note that when we use a 200GB L2ARC, we see that ~200MiB of swap is used. We prefer not to use swap at all, so we know that we want to reduce our L2ARC RAM footprint by 200MiB (i.e. instead of 5GiB RAM, we only want to use 4.8GiB). We find that a 190GB L2ARC meets that need.

For our configuration, we need 1GiB of RAM for every 38GB of L2ARC.
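The arithmetic in this section can be reproduced in a couple of lines: the RAM budget is total RAM minus the operating system’s share minus the “Wired” figure reported by top, and the L2ARC size is that budget times the GB-per-GiB ratio we observed. A minimal sketch; both helper names are hypothetical, and the 38GB-per-GiB ratio is specific to our configuration and workload.

```sh
#!/bin/sh
# Hypothetical helpers reproducing the L2ARC sizing arithmetic above.

# RAM (GiB) left for L2ARC headers: total - operating system - "Wired" (from top)
l2arc_ram_budget_gib() {
    awk -v total="$1" -v os="$2" -v wired="$3" \
        'BEGIN { printf "%g\n", total - os - wired }'
}

# L2ARC size (GB) for a RAM budget (GiB) and an observed ratio of
# GB of L2ARC per GiB of RAM (38 in our configuration)
l2arc_size_gb() {
    awk -v ram="$1" -v ratio="$2" 'BEGIN { printf "%g\n", ram * ratio }'
}

budget=$(l2arc_ram_budget_gib 32 1 26)   # 5
l2arc_size_gb "$budget" 38               # prints 190
```

Substitute your own top output and observed ratio; the ratio depends on the average buffer size, so it will not carry over unchanged to other workloads.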

1.1 Determine Size of SLOG (12GB)

We use this forum post to determine the size of our SLOG:

  • The SLOG “must be large enough to hold a minimum of two transaction groups”
  • A transaction group is sized by either RAM or time, i.e. “In FreeNAS, the default size is 1/8th your system’s memory” or 5 seconds.
  • Based on 32GiB RAM, our transaction group is 4GiB
  • We will triple that amount to 12GB and use that to size our SLOG (i.e. our SLOG will be able to store 3 transaction groups)
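The sizing rule above boils down to simple arithmetic. A minimal sketch, assuming a transaction group is capped at 1/8 of RAM (the FreeNAS default quoted above) and ignoring the 5-second limit; `slog_size_gib` is a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: SLOG size (GiB) needed to hold a number of
# transaction groups, assuming each group is capped at 1/8 of RAM.
slog_size_gib() {
    ram_gib=$1   # installed RAM, in GiB
    txgs=$2      # transaction groups the SLOG must hold
    # multiply before dividing so integer division truncates only once
    echo $(( ram_gib * txgs / 8 ))
}

slog_size_gib 32 2   # the forum post's minimum of two groups: prints 8
slog_size_gib 32 3   # the three groups we chose: prints 12
```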

We note that we most likely over-spec’ed our SLOG by a factor of 12: “I can’t imagine what sort of workload you would need to get your ZIL north of 1 GB of used space.”

1.2 Create L2ARC and SLOG Partitions

We use a combination of sysctl and diskinfo to identify our disks (the loop uses csh syntax):

foreach DISK ( `sysctl -b kern.disks` )
  diskinfo $DISK
end
da8 512 16008609792 31266816    0   0   1946    255 63
da7 512 4000787030016   7814037168  4096    0   486401  255 63
da6 512 4000787030016   7814037168  4096    0   486401  255 63
da5 512 4000787030016   7814037168  4096    0   486401  255 63
da4 512 512110190592    1000215216  4096    0   62260   255 63
da3 512 4000787030016   7814037168  4096    0   486401  255 63
da2 512 4000787030016   7814037168  4096    0   486401  255 63
da1 512 4000787030016   7814037168  4096    0   486401  255 63
da0 512 4000787030016   7814037168  4096    0   486401  255 63

We see that da4 is our 512GB SSD, da8 is our 16GB bootable USB stick, and the remaining seven disks are the 4TB Seagates that make up our RAIDZ2 pool.

We use gpart to initialize da4 with a GPT partition table. Then we create a 190GB partition for the L2ARC, aligned on 4kB boundaries (-a 4k):

gpart create -s GPT da4
  da4 created
gpart add -s 190G -t freebsd-zfs -a 4k da4
  da4p1 added

Next, we create a 12GB partition for the SLOG:

gpart add -s 12G -t freebsd-zfs -a 4k da4

1.3 Add L2ARC and SLOG to ZFS Pool

We add our new L2ARC (cache) and SLOG (log) partitions to our pool, tank, and check the result:

zpool add tank cache da4p1
zpool add tank log da4p2
zpool status

1.4 L2ARC Results

We perform 7 runs and take the median values for each metric (e.g. Sequential Write). The L2ARC provides us a 14% increase in write speed, a 4% decrease in read speed, and a 46% increase in IOPS.

Configuration        Sequential Write (MB/s)  Sequential Read (MB/s)  IOPS
Untuned              59                       74                      99.8
200G L2ARC           67                       71                      145.7
Theoretical Maximum  111                      111                     n/a

The raw benchmark data can be seen here.
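We summarize each configuration by the median of several runs; unlike the mean, the median isn’t pulled around by a single outlier run. The sketch below shows that reduction, assuming one metric value per line on standard input (e.g. copied by hand from bonnie++ output); the helper name and the sample numbers are hypothetical, not our raw data.

```sh
#!/bin/sh
# Hypothetical helper: print the median of numbers read from stdin, one per
# line; with an even count, print the mean of the two middle values.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1    # no input, no median
            if (NR % 2) { m = v[(NR + 1) / 2] } else { m = (v[NR / 2] + v[NR / 2 + 1]) / 2 }
            print m
        }'
}

# Seven made-up Sequential Write results (MB/s), not our raw data
printf '%s\n' 66 70 67 64 68 67 59 | median   # prints 67
```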

2. Experimental Kernel-based iSCSI

FreeNAS 9.2.1.6 and later (including the 9.2.1.8 we run) include an experimental kernel-based iSCSI target. We enable the target and reboot our machine.

2.0 Configure Experimental iSCSI Target

  • We browse to our FreeNAS server: https://nas.nono.com
  • log in
  • click the Services icon at the top
  • click the “wrench” icon
To modify the iSCSI services settings and enable the experimental kernel driver, click the wrench icon

  • check the Enable experimental target checkbox
Check the “Enable experimental target” checkbox to activate the kernel-based iSCSI target

  • click Save
  • we see the message “Enabling experimental target requires a reboot. Do you want to proceed?” and click Yes
  • our FreeNAS server reboots

2.1 Re-enable iSCSI

After the reboot, we notice that our iSCSI service has been disabled (a bug?). We re-enable it:

  • We browse to our FreeNAS server: https://nas.nono.com
  • log in
  • click the Services icon at the top
  • click the iSCSI slider so it turns on

2.2 Kernel iSCSI Results

We perform 9 runs and take the median values for each metric (e.g. Sequential Write). Compared to the L2ARC-only configuration, the experimental iSCSI target gives us a 67% increase in write speed (hitting the theoretical limit), a 45% decrease in read speed, and a 334% increase in IOPS.

The decrease in read speed is curious; we hope it’s a FreeBSD bug that has been addressed in 10.0.

Configuration                                   Sequential Write (MB/s)  Sequential Read (MB/s)  IOPS
Untuned                                         59                       74                      99.8
200G L2ARC                                      67                       71                      145.7
L2ARC + Experimental kernel-based iSCSI target  112                      39                      633.0
Theoretical Maximum                             111                      111                     n/a

The raw benchmark data is available here.
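The percentages in this section are relative changes between the medians of two configurations (here, relative to the 200G L2ARC row). A small sketch that reproduces them from the table; `pct_change` is a hypothetical helper, and rounding to whole percent is our assumption about how the figures were reported.

```sh
#!/bin/sh
# Hypothetical helper: relative change from OLD to NEW, rounded to a whole percent.
pct_change() {
    old=$1
    new=$2
    awk -v old="$old" -v new="$new" \
        'BEGIN { printf "%.0f\n", (new - old) / old * 100 }'
}

# Experimental kernel-based iSCSI target vs. the 200G L2ARC configuration
pct_change 67 112      # sequential write: prints 67
pct_change 71 39       # sequential read: prints -45
pct_change 145.7 633   # IOPS: prints 334
```

The same helper applies to every comparison in this post, e.g. `pct_change 99.8 145.7` for the L2ARC-only IOPS gain over the untuned configuration.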

3. L2ARC Tuning

We want to aggressively use the L2ARC. The FreeBSD ZFS Tuning Guide suggests focusing on 3 tunables:

  1. vfs.zfs.l2arc_write_boost
  2. vfs.zfs.l2arc_write_max
  3. vfs.zfs.l2arc_noprefetch

We ssh into our NAS to determine the current settings:

ssh root@nas.nono.com
sysctl -a | egrep -i "l2arc_write_max|l2arc_write_boost|l2arc_noprefetch"
  vfs.zfs.l2arc_noprefetch: 1
  vfs.zfs.l2arc_write_boost: 8388608
  vfs.zfs.l2arc_write_max: 8388608

3.0 l2arc_write_max and l2arc_write_boost

The FreeBSD ZFS Tuning Guide states, “Modern L2ARC devices (SSDs) can handle an order of magnitude higher than the default”. We decide to increase both values from the default of 8 MB/s to 201 MB/s (roughly 24 times the default):

  • on the left hand navbar, navigate to System → Sysctls → Add Sysctl
    • Variable: vfs.zfs.l2arc_write_max
    • Value: 201001001
    • Comment: 201 MB/s
    • click OK
  • click Add Sysctl
    • Variable: vfs.zfs.l2arc_write_boost
    • Value: 201001001
    • Comment: 201 MB/s
    • click OK
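Because these sysctls are given in bytes, it’s easy to lose track of how far a value is from the default. The sketch below expresses a value as a multiple of the 8388608-byte default read with sysctl above. It assumes the L2ARC feed interval (vfs.zfs.l2arc_feed_secs) is at its default of one second, so bytes per feed roughly equals bytes per second; `l2arc_multiplier` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: express an l2arc_write_max / l2arc_write_boost value
# as a multiple of the default (8388608 bytes, as reported by sysctl above).
l2arc_multiplier() {
    value=$1                  # proposed value, in bytes
    default=${2:-8388608}     # default value, in bytes
    awk -v v="$value" -v d="$default" 'BEGIN { printf "%.1f\n", v / d }'
}

l2arc_multiplier 201001001   # this section: prints 24.0
l2arc_multiplier 193560094   # Round 2: prints 23.1
l2arc_multiplier 81920000    # Round 3: prints 9.8
```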

3.1 l2arc_noprefetch

vfs.zfs.l2arc_noprefetch is interesting: setting it to 0 allows the L2ARC to cache streaming (prefetched) data. Unfortunately, it must be set before the ZFS pool is imported (i.e. it can’t be set in /etc/sysctl.conf; it must be set in /boot/loader.conf), which means we must set this variable as a tunable rather than as a sysctl:

  • on the left hand navbar, navigate to System → Tunables → Add Tunable
    • Variable: vfs.zfs.l2arc_noprefetch
    • Value: 0
    • Comment: disable no_prefetch (enable prefetch)
    • click OK

Reboot: click Reboot in the left-hand navbar of the web interface, then click Reboot again when prompted.

3.2 L2ARC Tuning Results (IOPS improves)

We run our tests and note the following results:

  • Sequential write performance drops 35%, from 112 MB/s to 73 MB/s
  • Sequential read performance increases 36%, from 39 MB/s to 53 MB/s
  • IOPS performance more than doubles (a 120% increase), from 633 to 1392

Configuration                                            Sequential Write (MB/s)  Sequential Read (MB/s)  IOPS
Untuned                                                  59                       74                      99.8
200G L2ARC                                               67                       71                      145.7
L2ARC + Experimental kernel-based iSCSI target           112                      39                      633.0
L2ARC + Experimental kernel-based iSCSI target + tuning  73                       53                      1392

The raw benchmark data can be seen here. The results are a mixed bag: we like the improved read and IOPS performance, but we’re dismayed by the drop in write performance, which is our most important metric (our workload is write-intensive).

4. L2ARC Tuning Round 2 (worse all-around)

We disable L2ARC caching of prefetched data, reverting the tunable to its default:

  • vfs.zfs.l2arc_noprefetch=1

This requires us to reboot our NAS to take effect.

We also take a more rigorous approach to setting the two remaining variables: we measure the write throughput of the SSD by copying a large file to a raw partition, and find that the SSD can sustain 193 MB/s:

ssh root@nas.nono.com
 # add a 20G partition for benchmark testing
gpart add -s 20G -t freebsd-zfs -a 4k da4
  da4p3 added
 # let's copy an 11G file to benchmark the SSD raw write speed
dd if=/mnt/tank/big/iso/ML_2012-08-27_18-51.i386.hfs.dmg of=/dev/da4p3 bs=1024k
  dd: /dev/da4p3: Invalid argument
  11460+1 records in
  11460+0 records out
  12016680960 bytes transferred in 62.082430 secs (193560094 bytes/sec)
 # the "Invalid argument" error most likely comes from the file's final, partial
 # block, which isn't a multiple of the sector size; all 11460 full blocks were written
 # our SSD write throughput is 193 MB/s
 # remove the unneeded device
gpart delete -i 3 da4
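FreeBSD’s dd writes its summary line to standard error, so to compute throughput in a script, redirect it with 2>&1 and parse that line. A minimal sketch that matches the FreeBSD format shown above (GNU dd prints a different summary, so the pattern would need adjusting there); `dd_mb_per_sec` is a hypothetical helper, and MB here means 10^6 bytes:

```sh
#!/bin/sh
# Hypothetical helper: convert the summary line printed by FreeBSD's dd, e.g.
#   12016680960 bytes transferred in 62.082430 secs (193560094 bytes/sec)
# into throughput in MB/s (10^6 bytes per second). Other lines are ignored.
dd_mb_per_sec() {
    # $1 = bytes transferred, $5 = elapsed seconds
    awk '/bytes transferred in/ { printf "%.1f\n", $1 / $5 / 1000000 }'
}

# Typical use: dd ... 2>&1 | dd_mb_per_sec
echo '12016680960 bytes transferred in 62.082430 secs (193560094 bytes/sec)' |
    dd_mb_per_sec   # prints 193.6
```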

We set both sysctls to the measured throughput:

  • vfs.zfs.l2arc_write_max=193560094
  • vfs.zfs.l2arc_write_boost=193560094

Configuration                                                    Sequential Write (MB/s)  Sequential Read (MB/s)  IOPS
Untuned                                                          59                       74                      99.8
200G L2ARC                                                       67                       71                      145.7
L2ARC + Experimental kernel-based iSCSI target                   112                      39                      633.0
L2ARC + Experimental kernel-based iSCSI target + tuning          73                       53                      1392
L2ARC + Experimental kernel-based iSCSI target + tuning Round 2  65                       35                      1178

This has been a step backwards for us—every metric performed worse. We suspect that disabling pre-fetch was a mistake.

The raw data is available here.

5. L2ARC Tuning Round 3

We re-enable pre-fetch:

  • vfs.zfs.l2arc_noprefetch=0

We lower the two remaining sysctls from 193 MB/s to 81 MB/s. The FreeBSD ZFS Tuning Guide suggests an order-of-magnitude increase over the default of 8 MB/s, so this time we increase the values by roughly 10× (one order of magnitude) rather than by 23×:

  • vfs.zfs.l2arc_write_max=81920000
  • vfs.zfs.l2arc_write_boost=81920000

We run our benchmark again:

Configuration                                                    Sequential Write (MB/s)  Sequential Read (MB/s)  IOPS
Untuned                                                          59                       74                      99.8
200G L2ARC                                                       67                       71                      145.7
L2ARC + Experimental kernel-based iSCSI target                   112                      39                      633.0
L2ARC + Experimental kernel-based iSCSI target + tuning          73                       53                      1392
L2ARC + Experimental kernel-based iSCSI target + tuning Round 2  65                       35                      1178
L2ARC + Experimental kernel-based iSCSI target + tuning Round 3  97                       36                      1633

This configuration has achieved the highest IOPS score of any we’ve benchmarked, a sixteen-fold increase over the untuned configuration. It begins to approach the IOPS of an SSD (~8,600), though it is still only about one-fifth of that figure.

This configuration also posts the second-highest sequential write throughput, a quite respectable 97 MB/s.

The sequential read is disappointing; the only good thing to say is that it’s not the absolute worst (it is the second-worst). To reiterate, we hope that this is a kernel iSCSI bug that’s addressed in a future release of FreeBSD.

The raw data is available here.

6. Conclusion

6.0 No “Magic Bullet”

There’s no “magic bullet” to ZFS performance tuning that improves all metrics.

For most workloads (except ones that are extremely sequential-read intensive) we recommend using L2ARC, SLOG, and the experimental iSCSI kernel target.

6.1 Our Configuration

We chose the final configuration (best IOPS, second-best sequential write, second-worst sequential read) for our setup. Our workload is write-intensive and IOPS-intensive.

6.2 Test Shortcomings

  • Subtle inconsistencies: some tests were run with a 200GB L2ARC and no SLOG, and later tests were run with a 190GB L2ARC and a 12GB SLOG
  • Not a completely dedicated NAS: the NAS was not dedicated solely to running benchmarks; it was also acting as a Time Machine backup server during the majority of the testing. It is possible that some of the numbers would be slightly higher had it been completely dedicated
  • The size of the test file (80G) was very specific: it was meant to exceed the ARC but not exceed the L2ARC. We ran three tests with files smaller than the ARC (4GB, 8GB, and 16GB), and the results were (unsurprisingly) excellent
  • The test took almost 24 hours to run; this impeded our ability to run as many tests as we would have liked
  • We would have liked to have run some of the benchmarks a second time to eliminate the possibility that our testbed changed (e.g. intense benchmarking may have caused the SSD performance to diminish towards the end of the testing)
  • We would have liked to have been able to eliminate the limitation of the 1 gigabit ethernet link; it would be interesting to see the performance with a 10GbE link
  • The scope of the tests was very narrow (e.g. iSCSI-only, a very specific server hardware configuration). It would be overreaching to generalize these numbers to all ZFS fileservers or even to all protocols (e.g. AFP, CIFS).
