The Cog

That Little Bit of Knowledge That Makes Everything Work

How To Spin Down Hard Disks at Shutdown on LSI HBAs on Linux

Anyone who has a file server or just a lot of disks in a workstation knows that the best and most reliable way to connect them is with a proper server grade HBA or RAID card, usually with a chipset made by LSI, such as the famous SAS2008.

One thing you may have noticed is that when running Linux, the disks are not spun down when the system is shut down. Instead, the power is cut with the disks still spinning, causing an emergency retract cycle. This is extremely hard on the drive mechanics and will ultimately kill the drive after only several hundred cycles, as opposed to the hundreds of thousands of regular cycles the drives can handle.

You can solve this problem fairly easily. Each disk on the system has an entry at /sys/class/scsi_disk. In there you will find a property called manage_start_stop. Setting this to a value of 1 will enable spindown at shutdown. You will also notice that regular ATA disks already have this set to 1, but drives on your HBA are set to 0 by default.

root@tesla:~$ cat /sys/class/scsi_disk/1\:0\:0\:0/manage_start_stop
root@tesla:~$ echo 1 > /sys/class/scsi_disk/1\:0\:0\:0/manage_start_stop
root@tesla:~$ cat /sys/class/scsi_disk/1\:0\:0\:0/manage_start_stop

You can easily set this property on all drives by adding this little script to /etc/rc.local:

for i in /sys/class/scsi_disk/*/manage_start_stop; do echo 1 > "$i"; done
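If you would like some feedback on what was changed, the one-liner can be wrapped in a small function. This is only a sketch: the function name is made up for illustration, and the optional directory argument exists purely so the loop can be tried against a scratch tree before pointing it at the real sysfs path.

```shell
# Set manage_start_stop to 1 for every disk below a scsi_disk directory
# and print the value read back from each attribute.
# enable_spindown is a hypothetical name; the directory argument defaults
# to the real sysfs location described above.
enable_spindown() (
    root="${1:-/sys/class/scsi_disk}"
    for f in "$root"/*/manage_start_stop; do
        [ -w "$f" ] || continue      # skip entries that are missing or not writable
        echo 1 > "$f" && printf '%s: %s\n' "$f" "$(cat "$f")"
    done
)
```

Run as root with no arguments, it does the same job as the loop above and prints one line per disk it touched.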

The SoundPeats Q23: Just Your Average Bluetooth Headphones

SoundPeats sent me this pair of Bluetooth headphones to review. They retail for $40 CAD. My main review and teardown can be seen on my YouTube channel. This post serves only to act as a summary.

These headphones (or technically a headset, since they have an integrated microphone) don’t stand out from other products in the same price range; however, this is not necessarily a bad thing.

The build quality is quite good, and the feature set is not lacking anything critical. The most important specifications, battery life and range, live up to the manufacturer’s figures of 6 hours and 10m respectively.

That being said, the drivers have a sound signature similar to that of wired headphones in the $15-30 range and require a great deal of equalisation to render them usable. Driving them at high power does cause distortion above 6kHz. They have a massive peak at 3kHz which must be toned down.

The noise floor of the built-in DAC is quite high and is very audible. I do not recommend buying these if you intend to listen in a quiet environment as you will hear a constant hissing.

Taking them apart reveals a CSR 8635 all-in-one Bluetooth headset chipset. It is nice to see a name-brand chip, however after reading the datasheet and inspecting the PCB, I suspect that they have sacrificed sound quality in order to squeeze the most out of the tiny 80mAh cell. The chip should be capable of a 95dB SNR, however they appear to be using switch-mode regulators to power the chip’s 1.35V rails, including the analog supply, instead of the included LDOs. This drops the SNR well into the audible range (I suspect ~80dB, but cannot test). As this regulator requires firmware to enable, it is more than just a board level mod to change.

In addition, the chip’s 80MIPS DSP (which can implement a 5-band equaliser using CSR’s stock ROM) is also not being utilised. If it had been, the headset might not have required the user to have a software equaliser on their device in order to correct for the poor driver performance.

The device also lacks the 2 data wires to the USB microB connector which would have allowed for use as a USB sound card. The firmware instead does not even allow the device to be turned on when plugged in, a feature which the chip does support.

All-in-all, they are not bad, and on the surface they seem fine when compared to what is in the market. It was only when I opened them up that I noticed what could have been done, and that left me somewhat disappointed. They appear to have come so close on the design only to let me down by making what I can only assume are marketing decisions.

Ubuntu 16.04 Runs Automatic Updates Whether You Say So or Not in the Installer

After selecting “No automatic updates” in the Ubuntu 16.04 server installer, I expected that the system would not attempt to upgrade anything without my permission, just like every version before. Turns out that is not the case. I found out the hard way when my production database shut down to upgrade in the middle of the day.

You can turn it off totally by editing the following lines in /etc/apt/apt.conf.d/10periodic:

APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::Download-Upgradeable-Packages-Debdelta "0";
APT::Periodic::AutocleanInterval "0";

And editing this line in /etc/apt/apt.conf.d/20auto-upgrades:

APT::Periodic::Unattended-Upgrade "0";
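To double-check the result, it can help to list every APT::Periodic setting in the configuration directory and see which ones are still non-zero. The helper below is a rough sketch, not part of apt: the function name is invented here, and it only greps the files rather than asking apt how it resolved them.

```shell
# List every APT::Periodic line under an apt.conf.d directory and mark
# whether it is set to "0" (off) or to anything else (ON).
# check_periodic is a hypothetical helper; the directory argument defaults
# to the location used in this post.
check_periodic() (
    dir="${1:-/etc/apt/apt.conf.d}"
    grep -rhE '^[[:space:]]*APT::Periodic::' "$dir" 2>/dev/null |
    while IFS= read -r line; do
        case "$line" in
            *'"0";'*) printf 'off  %s\n' "$line" ;;
            *)        printf 'ON   %s\n' "$line" ;;
        esac
    done
)
```

Running apt-config dump and filtering for Periodic gives the values apt itself ends up with, which is the more authoritative view if several files set the same option.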

Install zfs-auto-snapshot on Ubuntu 16.04 LTS/Debian

zfs-auto-snapshot not being available for 16.04 was not a situation I thought I would run into, yet here we are, disappointed by Canonical once again.

In the meantime, I have ported the package from Trusty (14.04.5) to work on Xenial (16.04.1) and Debian Jessie.

You can download the Ubuntu 16.04 package here, and install it by running:
# dpkg -i zfs-auto-snapshot-trustyport.deb

You can download the Debian package here, and install it by running:
# dpkg -i zfs-auto-snapshot-debian.deb

(edit: I borked the permissions on the files in the Ubuntu .deb. You will need to chown them after install. The Debian package is correct.)

Romaco Radio EP4 – How to Get Started With Open Source

A bit of information as to how to get your foot in the door on an open source project.

Romaco Radio EP4 – How to Get Started With Open Source [26:15]

Romaco Radio EP3 – Why HD Audio is Useless

A little technical explanation about why 24bit/96kHz audio isn’t all it’s talked up to be.

Romaco Radio EP3 – Why HD Audio is Useless [34:02]

Romaco Radio EP2 – How I Got Started with Microcontrollers

A tale of the decisions I made in choosing what microcontroller architecture to learn first and how I got started with embedded programming.

Romaco Radio EP2 – How I Got Started with Microcontrollers [22:16]

A Tale of ZFS’ Success

ZFS is great. We hear it all the time, but you hope you will never need to utilize any of its advanced data-protecting features. I recently went through a series of events which made me truly grateful for ZFS. This is a real world example that would have gone horribly differently if I wasn’t using ZFS.

It’s Wednesday April 29th, 2015. I get home from work at about 6pm, fire up my workstation and tend to some emails. I look down and see the HDD LED on my file server, tesla, is solidly lit. All normal; today is a Wednesday and tesla started a scheduled ZFS scrub at 2am, which probably won’t complete until Thursday morning. I SSH in to check its progress and notice something new in the output of zpool status (serial numbers have been redacted in all the screenshots):

Screenshot from 2015-04-29 22:38:28-2

Some data was silently corrupted and repaired on one disk. To some this might not seem very odd, since it happens a lot on some systems, but in over 2 years with this server I had never seen it before. I immediately checked the SMART data on the disk with the corrupt data and found 36 uncorrectable sectors and 1091 sectors pending reallocation. I was a little concerned, but in the past I’ve had several disks in other machines get a few bad sectors without any further issues for the rest of their lives.

I monitored the disk over the next few days and watched as the stats got worse:

  • 04/30/2015, reallocated sector count: 592, pending reallocation count: 7640.
  • 05/02/2015, reallocated sector count: 592, pending reallocation count: 7744.
  • 05/04/2015, reallocated sector count: 592, pending reallocation count: 7816.
  • 05/05/2015, reallocated sector count: 600, pending reallocation count: 7848.
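For anyone wanting to track the same two counters, they can be pulled out of the attribute table that smartctl -A prints. This is a minimal sketch under a couple of assumptions: the function name is made up, the attribute names are the ones smartmontools commonly reports for ATA disks (some drives label them differently), and the raw value is taken from the tenth column of that table.

```shell
# Read `smartctl -A` output on stdin and print the raw values of the
# reallocated and pending sector counters.
# smart_sector_counts is a hypothetical helper name.
smart_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}
```

Typical use would be to pipe smartctl -A for the disk in question (for example /dev/sdX, a placeholder) into the function once a day and append the output to a log.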

My first thought was that this could be one of two cases. The first is that there is a bad portion of the disk which is slowly getting flushed out. The second is that there is a mechanical failure which is manifesting itself as read errors. I made a simple decision; if the disk has more serious issues during next week’s scrub, I will replace the disk. I had gone out earlier that week and bought a new disk because they were on sale, so I could replace the hot spare if need be.

Wednesday comes along and after work I come home to find this:

Screenshot from 2015-05-06 17:29:10r

Well, I guess my decision is made. Time to replace the drive now and investigate later. I replaced the drive with the hot spare and the resilver began.

Screenshot from 2015-05-06 18:26:06r


Screenshot from 2015-05-07 18:26:01r

After detaching the bad drive, the pool looks good again.

Screenshot from 2015-05-07 18:26:10r

The job isn’t over yet though. Time to replace the bad drive with a new one to act as a hot spare. Doing this requires shutting down the server because the case does not have hot-swap bays. After replacing the drive and logging in, I’m faced with a terrifying sight. I don’t have a screenshot of this one because I was a little in shock. 3 vdevs were unavailable, and the pool was degraded. The drives were on different controllers, so the only thing I could think of was that some cables had come loose in the drive replacement. So I promptly shut down the server and opened it up again. On one drive, the power connector was slightly askew, but the other 2 had no signs of issue, so I just unplugged and plugged them back in at both ends.

At this point I was unsure of what would happen when I rebooted the server. Would ZFS be able to reassemble the pool now that the vdevs were reattached? Would I have to resilver the entire pool again? The only way to know was to power it up.

Screenshot from 2015-05-10 15:05:36r

Now that’s impressive. ZFS resilvered only the changed data and the pool is back online again.

This series of events isn’t that uncommon, but the outcome would have likely been very different if I was not using ZFS. First of all, without ZFS I likely would not have been able to detect the silent corruption on the disk when I did. I would have eventually noticed as there were errors in the kernel log and SMART logs, but that could have been days or weeks later, after more damage had been done. Even if I knew there were errors, other implementations of RAID wouldn’t have been able to correct them. My best bet would have been to remove the disk entirely and rebuild the array from parity and hope that the parity was valid. If upon replacing the bad drive the cables had come loose as they did in my adventure, the array would most likely be destroyed. Most RAID implementations will want to resync a disk completely upon it disappearing for any period of time. The loss of 3 drives would mean that almost any RAID implementation would not have enough data to rebuild.

Experimental Differences in Audio Compression Formats

I think everyone knows that lossy audio compression formats such as MP3 and AAC sacrifice audio quality for a smaller file size. Some people like myself can distinguish between lossless and lossy copies of the same song just by ear, but most cannot, or simply do not know what to listen for.

You can easily take a diff of 2 text files and see the differences, so why can’t you do that with audio?

For my experiment, I began with Jillette Johnson’s song “Torpedo” stored as lossless 16bit/44.1kHz CD audio in the FLAC format. For copyright reasons, I won’t provide a download link to the file, but the Internet is a thing, so you can find it if you want.
I began by transcoding this original into several different formats with varying parameters:

  1. 320kbps CBR MP3 – 10.9MB
  2. 256kbps CBR MP3 – 9.1MB
  3. 256kbps VBR MP3 – 8.4MB
  4. 128kbps CBR MP3 – 5.6MB
  5. 256kbps CBR AAC – 5.4MB
  6. 500kbps VBR OGG – 15.5MB
  7. 256kbps VBR OGG – 9.6MB

CBR stands for constant bitrate, and VBR is variable bitrate.

I took one channel from each converted file and the original FLAC file and aligned them to the sample. I then took the difference of the 2 waveforms. This resulting difference is the error induced by the compression algorithm. I saved 3 data points from each comparison: the waveform as an image, the frequency spectrum versus amplitude plot, and a rendered audio file of the error.

The waveforms plot the relative amplitude at each moment in time from 0 to 1. The frequency spectrums show the average amplitude in decibels (dB) of each frequency from 0Hz to 22,050Hz (the maximum of CD audio) over the entire signal. 0dB is the highest amplitude that can be stored in the audio file. All amplitudes are taken relative to this maximum and are expressed in negative decibels, so the more negative the amplitude, the quieter it is.

I have chosen to render the error signals as uncompressed WAV files in order to accurately represent them in their entirety and still allow them to be played in a web browser (most don’t have FLAC support yet). I have not amplified any of the error signals to keep everything as accurate as possible. If you intend to listen along, try seeking the track to the times listed along the top of the waveform images to hear how different parts of the song sound, like the chorus at 1:40. Also note that I’m not really interested in the exact error versus filesize figures, so feel free to figure them out on your own.
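Written out, the comparison described above looks roughly like this. The notation is introduced here only for illustration: x[n] is a sample of the original, x̂[n] is the matching sample of the decoded lossy file, a is any amplitude and a_max is the largest amplitude the file can store.

```latex
e[n] = x[n] - \hat{x}[n]
\qquad\qquad
L_{\mathrm{dBFS}} = 20 \log_{10} \frac{|a|}{a_{\max}} \le 0
```

The first expression is the error signal that gets plotted and rendered; the second is why every level on the spectrum plots is zero or negative.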

Just a disclaimer; I am not an audio engineer and these tests are in no way scientifically accurate. This is just a home-grown experiment.

Let’s take a look at the 320kbps CBR MP3 first. You can click any image to enlarge it in another tab.

320kbps CBR MP3 Error

320 CBR MP3 Spectrum

Listen to the 320kbps CBR MP3 Error

As you can see (and hear), the error is clearly there. If you know the song, you can probably even make out lyrics. The first thing to note is that the highest amplitude error is in the high frequencies, specifically above 16kHz. This is not surprising, because this is how MP3 was designed. MP3 takes into account how we perceive sound. Frequencies in the range of the human voice, for example, are most important, and high frequency content, which for some people is not audible at all, is least important. The highest error amplitude peaks at -53dB with the average around -61dB.

In order to make some comparisons, let’s take a look at 256kbps CBR MP3.


256kbps CBR MP3 Error

256kbps CBR MP3 Spectrum

Listen to the 256kbps CBR MP3 Error

Notice that the highest peak is at -52dB, not much higher than the 320kbps CBR file, however the other frequencies are significantly higher, by about 3 to 4 dB. Remember that dB is a logarithmic scale, so an increase of 3dB is roughly double the power (about 1.4 times the amplitude); it takes an increase of 10dB to reach 10 times the power.
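For reference, these are the standard decibel relations, with ΔL as the level difference in dB, P as power and A as amplitude:

```latex
\frac{P_2}{P_1} = 10^{\Delta L / 10}, \qquad \frac{A_2}{A_1} = 10^{\Delta L / 20}

\Delta L = 3\,\mathrm{dB}: \quad \frac{P_2}{P_1} = 10^{0.3} \approx 2.0, \quad \frac{A_2}{A_1} = 10^{0.15} \approx 1.41

\Delta L = 10\,\mathrm{dB}: \quad \frac{P_2}{P_1} = 10, \quad \frac{A_2}{A_1} = 10^{0.5} \approx 3.16
```

So a 3 to 4dB rise in the error floor means the error carries roughly 2 to 2.5 times the power.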

I have heard that VBR is “better” than CBR, so I decided to test that as well.

256kbps VBR MP3 Error

256kbps VBR MP3 Spectrum

Listen to the 256kbps VBR MP3 Error

Going from 256kbps CBR to VBR, the spectrum is very similar with the exception of the high frequencies. The peak that was very prominent at 320kbps CBR, and slightly less at 256kbps CBR, is now actually lower in amplitude than the lower frequencies. At a glance it looks like the error has actually decreased, but look at the scale. The overall error has actually increased by 1 to 2dB. Now given that the file size is 700kB lower, it’s debatable whether VBR is “better”.

Just to round out this part of the experiment, I tested 128kbps CBR MP3 as well.

128kbps CBR MP3 Error

128kbps CBR MP3 Spectrum

Listen to the 128kbps CBR MP3 Error

The average error is a whopping 10dB higher than the 256kbps CBR, and follows an almost linear drop as the frequency increases. This is consistent with the gradual drop off of average amplitude in the original file, so statistically that is expected. At this point the song is becoming scarily audible; making out lyrics is not difficult at all. Just remember that what you hear in the error signals is what is missing from the transcoded file.

Who says that MP3 is the only format used for lossy compression? I thought I’d start off with the lesser known OGG Vorbis audio codec. The following is the result of transcoding at 256kbps VBR.

256kbps VBR OGG Error

256kbps VBR OGG Spectrum

Listen to the 256kbps VBR OGG Error

This is quite different than what we saw with the MP3 spectrum, and it sounds very different too. The OGG codec has higher error at low frequencies, which is easier to hear than see, however the overall average error is less than that of MP3 over the rest of the spectrum, by about 2dB. Being 1.2MB larger in size than the 256kbps VBR MP3, this should be expected.

Unlike MP3, OGG is not limited to 320kbps. I ran the test at a bitrate of 500kbps VBR.

500kbps VBR OGG Error

500kbps VBR OGG Spectrum

Listen to the 500kbps VBR OGG Error

The filesize is 5.9MB larger than the 256kbps VBR OGG file, but the error is a massive 15dB lower on average. Surprisingly, the error at the low end is still worse than that of 320kbps CBR MP3. Also note that just because the spectrum plot bottoms out around 17kHz does not mean that there is no error there; it is just at -82dB, which is near the edge of the zoomed in area.

The last file type I compared is 256kbps CBR AAC, the same format used by Apple for most iTunes downloads.

256kbps CBR AAC Error

256kbps CBR AAC Spectrum

Listen to the 256kbps CBR AAC Error

I was kind of horrified at how poor the output was. The error is only about 2dB less than the 128kbps CBR MP3 file. The only saving grace for AAC is that the resulting filesize was 5.4MB, 200kB less than the 128kbps CBR MP3. The error also sounds very different than the other formats.

A common scenario is the conversion of one lossy format to another. In my library, all files are either FLAC or MP3, so when purchasing music off iTunes for example, it would make sense for me to transcode the files to 320kbps CBR MP3.

256kbps CBR AAC to 320kbps CBR MP3 Error Difference

256kbps CBR AAC to 320kbps CBR MP3 Difference Spectrum

Listen to the 256kbps CBR AAC to 320kbps CBR MP3 Error

The error doesn’t seem that bad, however, remember that this is the error from the second transcode. This error compounds on the error from the original transcode. The following is a diff taken from the original FLAC to the new MP3.

256kbps CBR AAC to 320kbps CBR MP3 Error Difference from FLAC

256kbps CBR AAC to 320kbps CBR MP3 Error Difference from FLAC

Listen to the 256kbps CBR AAC to 320kbps CBR MP3 versus FLAC Error

The total error is almost no different from the original AAC. This is not surprising, as the relative amplitude of the new error in the AAC to MP3 conversion is several orders of magnitude lower than the original error in the AAC file.
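A rough way to see why the second transcode barely moves the total: if the two error signals are assumed to be uncorrelated (an assumption made here for illustration, not something measured in this experiment), their powers simply add. Taking a second error that sits 20dB below the first as a purely illustrative figure:

```latex
P_{\mathrm{total}} = P_1 + P_2, \qquad
\Delta L = 10 \log_{10}\!\left(1 + \frac{P_2}{P_1}\right)

\frac{P_2}{P_1} = 10^{-20/10} = 0.01
\;\Rightarrow\;
\Delta L = 10 \log_{10}(1.01) \approx 0.04\,\mathrm{dB}
```

A change that small would not be visible on spectrum plots like the ones in this post.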

In a future post I’ll explain more about errors in audio reproduction more mathematically, including quantization error and bit depth, sample rate, and noise floors. Until then, I hope this was interesting and provided a little more insight into audio compression quality.

Fix NFS Stalls on Linux with Fast Networking (Like InfiniBand)

Update 4/8/2017: This issue is actually caused by an underlying memory allocation deadlock bug in the ib_mthca kernel driver. The only true fix is to buy new hardware which does not use that driver and instead uses mlx4_ib. The following is left up for reference.

For the last 6 months, I’ve had the most frustrating issue with my 20Gbps InfiniBand home network. Copying files from my file server would ultimately cause NFS to lock up for an indeterminate period of time. During that time, which could be anywhere from 3 seconds to 3 hours, there is no traffic on the IB link whatsoever. I spent several weekends chasing red herrings – everything from hardware issues like bad cables and overheating HCAs, to needless kernel driver changes and firmware upgrades. From my testing I determined some interesting properties of the failure mode:

  • The failure was not limited to just NFS, it also happened with Samba (SMB/CIFS) and would even affect iperf.
  • When the connection was stalled, the IB link still worked. I could SSH over it just fine.
  • The issue only affected the outgoing direction of iperf from my workstation. It could receive fine, just not send.
  • When NFS stalls, it takes 200% CPU. When anything else stalls, it uses no extra CPU.
  • The last packet sent from the client is always an ACK. TCP keepalive packets are sent every 60 seconds while the connection hangs. The transfer(s) are otherwise normal, with the only exception being the huge delay in between packets. (I’m using TCP rather than RDMA due to a kernel bug affecting 3.15 and earlier.)

I was convinced that this was a kernel bug, and my suspicions were confirmed, but not in the way I’d expected. It turns out that the issue is caused by the Linux kernel swapper daemon, kswapd. I noticed that every time NFS would lock up, the CPU would spike to 200%. One of those processes was a kworker, the other was kswapd.

Wait a second. I don’t have any swap.

This immediately struck me as odd. What would the kernel be trying to swap? I was only using 50% of 16GiB of RAM, so there shouldn’t be any issues regarding OOM conditions. A quick Google search returned some interesting info. Apparently this has been a known issue for a while, and according to some, kswapd is in need of a rewrite for modern systems.

From what I understand, during the file transfer(s), some data is cached in RAM as usual. As the transfers progress, the RAM slowly fills to capacity with cached data. I’m also running ZFS, so that can’t help with the caching, as its ARC will try to use 7/8th of the available memory for itself. At some point, kswapd scans through all the memory looking for what to evict to disk. This process takes 100% CPU and blocks NFS while it does… something?

This leaves the question as to what we can do. The short answer is that there isn’t a proper fix at the moment of writing this, but there are a few workarounds.

The first is to set vm.swappiness = 0 to reduce the likelihood that kswapd will want to ruin your day. This change will not solve the issue on its own, but it does seem (at least in my case) to reduce the severity. You can set this parameter on the fly by running sudo sysctl -w vm.swappiness=0, and you can make it permanent by adding vm.swappiness=0 to /etc/sysctl.conf.

The real workaround is to clear the cache completely. This frees up a massive amount of memory and immediately removes anything for kswapd to do. You can clear the cache by running as root: echo 3 > /proc/sys/vm/drop_caches. My solution was to create a bash script which runs that command once every 10 minutes. In my case that is enough to prevent kswapd from ever doing anything and to keep my NFS transfers running. I just start the script and keep it running while I’m doing intensive transfers. I kill it when I don’t need it, as there will be an increase in I/O in general; it is purging the cache, after all.
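The script itself is not reproduced in this post, so what follows is only a minimal sketch of such a loop. The function name and its arguments are invented for illustration, the sync call is an addition (dirty pages cannot be dropped until they are written out), and the rounds and target arguments exist only so the loop can be dry-run against a scratch file.

```shell
# Write "3" to the drop_caches control file every INTERVAL seconds.
# drop_caches_loop is a hypothetical name. INTERVAL defaults to 600
# (ten minutes); ROUNDS=0 means keep going until the process is killed.
drop_caches_loop() (
    interval="${1:-600}"
    rounds="${2:-0}"
    target="${3:-/proc/sys/vm/drop_caches}"
    n=0
    while :; do
        sync                           # flush dirty pages so more cache can be dropped
        echo 3 > "$target" || exit 1   # give up if the write fails (e.g. not root)
        n=$((n + 1))
        [ "$rounds" -gt 0 ] && [ "$n" -ge "$rounds" ] && break
        sleep "$interval"
    done
)
```

Called with no arguments as root, it behaves like the script described above and can be stopped with Ctrl-C once the transfers are done.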

I hope that this issue gets addressed soon in the mainline kernel.