Here at Gillware, we use many LSI MegaRAID controllers, some of the more popular controllers out there. Although LSI Corporation was acquired by Avago Technologies in 2014, everyone still refers to them as LSI controllers, so we will too. Semantics aside, in this post we want to discuss a common performance killer you might encounter in your RAID array: scheduled consistency checks.
We discovered this issue after experiencing some I/O bottlenecks on a server running a SAS LSI 3108-based controller from Supermicro. The LSI 3108s you get with Supermicro servers are equivalent to the LSI MegaRAID 9300 series controllers, such as the 9361-4i and 9361-8i. Our configuration had a total of 36 8TB HGST helium-filled 12Gb/s SAS hard drives split into three 12-drive RAID-6 virtual disks (MegaRAID virtual disks, not VMware/Hyper-V), for a total of 240TB of usable capacity. On a weekly basis, the controller was dutifully consistency checking all 240TB of our data. This check would take about 24 hours, so roughly 14% of the time (24 of every 168 hours) we were operating with reduced I/O capacity. This was completely unnecessary, as the server is in a high-quality data center and was not experiencing any issues.
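The arithmetic behind those two figures can be checked with a quick sketch. The function names are ours, chosen for illustration, and the only assumptions are that RAID-6 gives up two drives' worth of space per virtual disk to parity and that the check really does take about 24 hours.

```python
# Figures taken from the configuration described above.
DRIVE_TB = 8
DRIVES_PER_VD = 12
VIRTUAL_DISKS = 3
RAID6_PARITY_DRIVES = 2  # RAID-6 spends two drives' worth of space on parity


def usable_tb(drive_tb, drives_per_vd, virtual_disks,
              parity_drives=RAID6_PARITY_DRIVES):
    """Usable capacity in TB across identical virtual disks."""
    return drive_tb * (drives_per_vd - parity_drives) * virtual_disks


def check_share(check_hours, interval_hours):
    """Fraction of wall-clock time during which a check is running."""
    return check_hours / interval_hours


capacity = usable_tb(DRIVE_TB, DRIVES_PER_VD, VIRTUAL_DISKS)
weekly_share = check_share(24, 168)  # ~24-hour check, once every 168 hours

print(f"Usable capacity: {capacity} TB")
print(f"Share of the week spent checking: {weekly_share:.1%}")
```

With our numbers this comes out to 240 TB usable and a little over 14% of every week spent running the check.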
To give more background on the problem at hand, we will discuss consistency checks a bit. By default, an LSI MegaRAID controller will run a background consistency check on all of its RAID arrays once per week. Stating the obvious and without getting into too much boring detail, consistency checks comb through your data to ensure it’s consistent. This means the controller checks your virtual drives for any parity or block errors. If it finds an error or a bad block, the affected block can be rewritten with the correct data.
When running the consistency check, a certain percentage of the controller’s capacity is devoted to performing the check, the default being 30% (you can change this value). There are also two modes in which a consistency check can run: concurrent or sequential. Concurrent is the default setting and checks every virtual drive at the same time, or concurrently. Sequential checks each virtual drive individually and in order, so there is theoretically less of a strain on your array.
A consistency check is a great tool in theory and can be useful in situations where you think something might be wrong with your RAID, but the default time interval for consistency checks is every 168 hours, or once a week. If you’re running a large array with a whole bunch of disks and blocks to check, it can take anywhere from 8-24 hours. That’s up to an entire day of sluggish performance sacrificed for the sake of a check that likely isn’t even necessary, exactly like what was happening to our RAID.
Unless you’ve had an abnormal event like a power loss or have some other reason to think something is wrong with your RAID, there’s really no reason to run consistency checks that often. Once every few months (8 to 12 weeks) or even a few times a year should be adequate, and if you feel something is wrong, just go ahead and schedule one for a Saturday when your users will not complain about slow performance. The following pictures provide more specific information on how to check and change some of these values using StorCLI.
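Before getting to the commands, here is a rough sketch of what a longer interval buys you. It assumes a check that takes about 24 hours, as ours did; the helper names are purely illustrative, and your own check duration will differ with array size and load.

```python
HOURS_PER_WEEK = 168
CHECK_HOURS = 24  # assumed duration, taken from our 240TB example


def weeks_to_hours(weeks):
    """Convert an interval in weeks to hours."""
    return weeks * HOURS_PER_WEEK


def degraded_share(check_hours, interval_weeks):
    """Fraction of time spent checking at a given interval."""
    return check_hours / weeks_to_hours(interval_weeks)


for weeks in (1, 8, 12):
    hours = weeks_to_hours(weeks)
    share = degraded_share(CHECK_HOURS, weeks)
    print(f"every {weeks:>2} weeks = {hours:>4} hours between checks, "
          f"checking {share:.1%} of the time")
```

Under that assumption, moving from a weekly check to one every 8 weeks cuts the time spent with reduced I/O from about 14% to under 2%.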
Here you can see the command to run StorCLI and then show information on consistency checks: /c0 show cc. In the information that follows, we can see that the current consistency check mode is Concurrent, that the execution delay (the interval between runs) is 168 hours, or once per week, and when the next check will run, in this case at 12:00 am on February 26th.
To completely turn off automatic consistency checks, the command is /c0 set cc=off. The information that follows should indicate CC Mode is “OFF”.
In the picture above, we display how to set consistency checks to run sequentially, run once every 1344 hours (8 weeks), and when to start the next consistency check and therefore begin the schedule. The command is /c0 set cc=seq delay=1344 starttime=2016/02/27 04. Note that the delay must be set in hours, and the final value of “04” sets the check to run at 4:00 am on the 27th. The run hour must be a value less than 24 (00 through 23). We only chose the 27th because it was a Saturday. Additionally, to have the check run concurrently, the command is the same, except replace “cc=seq” with “cc=conc”.
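If you would rather not work out the hours and the date by hand, the arguments can be assembled with a small helper. This is only a sketch: next_saturday and cc_schedule_args are hypothetical names of our own, the script just builds the string shown above and does not run StorCLI, and the 00 to 23 hour check reflects our reading of the start time format.

```python
from datetime import date, timedelta

HOURS_PER_WEEK = 168
SATURDAY = 5  # date.weekday(): Monday is 0, Saturday is 5


def next_saturday(today):
    """Return the first Saturday strictly after `today`."""
    days_ahead = (SATURDAY - today.weekday()) % 7 or 7
    return today + timedelta(days=days_ahead)


def cc_schedule_args(mode, interval_weeks, start_day, start_hour,
                     controller="/c0"):
    """Build the 'set cc' arguments in the form used in this post."""
    if mode not in ("seq", "conc"):
        raise ValueError("mode must be 'seq' or 'conc'")
    if not 0 <= start_hour <= 23:
        raise ValueError("start_hour must be between 0 and 23")
    delay = interval_weeks * HOURS_PER_WEEK  # the delay is given in hours
    return (f"{controller} set cc={mode} delay={delay} "
            f"starttime={start_day:%Y/%m/%d} {start_hour:02d}")


# Monday, February 22nd, 2016 stands in for "today" in this example.
start = next_saturday(date(2016, 2, 22))
print(cc_schedule_args("seq", 8, start, 4))
```

For a sequential check every 8 weeks starting at 4:00 am on the following Saturday, this reproduces the arguments we used: /c0 set cc=seq delay=1344 starttime=2016/02/27 04.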
To display the capacity your check is set to run at using StorCLI, the command is /c0 show ccrate. The value shown in this picture is the default rate, or 30%.
Finally, to set the capacity rate that the consistency check will use, the command is /c0 set ccrate=15. You can change the number to whatever percentage you would like the check to use; in this instance we used 15% so the check isn’t using quite so much capacity as the default of 30%.