OS-5564: new driver for Smart Array storage controllers

Details

Issue Type:Improvement
Priority:4 - Normal
Status:Resolved
Created at:2016-08-02T00:35:26.000Z
Updated at:2016-09-15T17:44:32.000Z

People

Created by:Joshua M. Clulow [X]
Reported by:Joshua M. Clulow [X]
Assigned to:Joshua M. Clulow [X]

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2016-09-15T09:21:44.000Z)

Fix Versions

2016-09-15 Yurufun (Release Date: 2016-09-15)

Comments

Comment by Joshua M. Clulow [X]
Created at 2016-08-26T04:19:55.000Z
Updated at 2016-08-26T04:20:02.000Z

Out of the gate, we'll be binding the new smrt driver to these controllers:

GEN 6 CONTROLLERS (SHIPPED IN G7 BOXES?):
        Smart Array P212 Controller     0x103c  0x3241
        Smart Array P410 Controller     0x103c  0x3243
        Smart Array P410i Controller    0x103c  0x3245
        Smart Array P411 Controller     0x103c  0x3247
        Smart Array P812 Controller     0x103c  0x3249
        Smart Array P712m Controller    0x103c  0x324a
        Smart Array P711m Controller    0x103c  0x324b

GEN 8 CONTROLLERS:
        Smart Array P222 Controller     0x103c  0x3350
        Smart Array P420 Controller     0x103c  0x3351
        Smart Array P421 Controller     0x103c  0x3352
        Smart Array P822 Controller     0x103c  0x3353
        Smart Array P420i Controller    0x103c  0x3354
        Smart Array P220i Controller    0x103c  0x3355
        Smart Array P721m Controller    0x103c  0x3356

GEN 8+ CONTROLLERS:
        Smart Array P430i Controller    0x103c  0x1920
        Smart Array P830i Controller    0x103c  0x1921
        Smart Array P430 Controller     0x103c  0x1922
        Smart Array P431 Controller     0x103c  0x1923
        Smart Array P830 Controller     0x103c  0x1924
        Smart Array P731m Controller    0x103c  0x1926
        Smart Array P230i Controller    0x103c  0x1928

GEN 9 CONTROLLERS:
        Smart Array P244br Controller   0x103c  0x21bd
        Smart Array P741m Controller    0x103c  0x21be
        Smart Array H240ar Controller   0x103c  0x21bf
        Smart Array P440ar Controller   0x103c  0x21c0
        Smart Array P840ar Controller   0x103c  0x21c1
        Smart Array P440 Controller     0x103c  0x21c2
        Smart Array P441 Controller     0x103c  0x21c3
        Smart Array P841 Controller     0x103c  0x21c5
        Smart Array H244br Controller   0x103c  0x21c6
        Smart Array H240 Controller     0x103c  0x21c7
        Smart Array H241 Controller     0x103c  0x21c8
        Smart Array P246br Controller   0x103c  0x21ca
        Smart Array P840 Controller     0x103c  0x21cb
        Smart Array P542t Controller    0x103c  0x21cc
        Smart Array P240tr Controller   0x103c  0x21cd
        Smart Array H240nr Controller   0x103c  0x21ce

These are all SAS controllers that I am led to believe will behave similarly. We have a P410i in the HP DL580 in the lab, and the HP DL360p has a P420i.
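The comment doesn't say whether the second hex column is the PCI device ID or the subsystem ID, so the sketch below treats each row as an opaque (vendor, id) pair. It shows one way a list like this could be expressed as a C match table with a lookup. The type names, `smrt_lookup()`, `smrt_gen_name()` and the generation enum are hypothetical and are not taken from the actual smrt driver. Only a few rows from each generation are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generation labels matching the headings in the list. */
typedef enum {
	SMRT_GEN6,
	SMRT_GEN8,
	SMRT_GEN8PLUS,
	SMRT_GEN9
} smrt_gen_t;

/* One row of the list: vendor ID, second ID column, generation, name. */
typedef struct {
	uint16_t sc_vendor;
	uint16_t sc_id;
	smrt_gen_t sc_gen;
	const char *sc_name;
} smrt_ctlr_t;

/* A few rows per generation; the remaining rows follow the same pattern. */
static const smrt_ctlr_t smrt_ctlrs[] = {
	{ 0x103c, 0x3243, SMRT_GEN6, "Smart Array P410 Controller" },
	{ 0x103c, 0x3245, SMRT_GEN6, "Smart Array P410i Controller" },
	{ 0x103c, 0x3351, SMRT_GEN8, "Smart Array P420 Controller" },
	{ 0x103c, 0x3354, SMRT_GEN8, "Smart Array P420i Controller" },
	{ 0x103c, 0x1920, SMRT_GEN8PLUS, "Smart Array P430i Controller" },
	{ 0x103c, 0x1922, SMRT_GEN8PLUS, "Smart Array P430 Controller" },
	{ 0x103c, 0x21c0, SMRT_GEN9, "Smart Array P440ar Controller" },
	{ 0x103c, 0x21c2, SMRT_GEN9, "Smart Array P440 Controller" },
	{ 0x103c, 0x21c7, SMRT_GEN9, "Smart Array H240 Controller" },
};

/* Return the matching row, or NULL if the pair is not in the table. */
static const smrt_ctlr_t *
smrt_lookup(uint16_t vendor, uint16_t id)
{
	size_t n = sizeof (smrt_ctlrs) / sizeof (smrt_ctlrs[0]);

	for (size_t i = 0; i < n; i++) {
		if (smrt_ctlrs[i].sc_vendor == vendor &&
		    smrt_ctlrs[i].sc_id == id)
			return (&smrt_ctlrs[i]);
	}
	return (NULL);
}

/* Human-readable label for a generation, e.g. for attach messages. */
static const char *
smrt_gen_name(smrt_gen_t gen)
{
	switch (gen) {
	case SMRT_GEN6:
		return ("Gen 6");
	case SMRT_GEN8:
		return ("Gen 8");
	case SMRT_GEN8PLUS:
		return ("Gen 8+");
	case SMRT_GEN9:
		return ("Gen 9");
	}
	assert(!"unknown controller generation");
	return ("unknown");
}
```

A linear scan is adequate for a table of a few dozen entries that would only be consulted rarely.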


Comment by Joshua M. Clulow [X]
Created at 2016-09-14T00:16:42.000Z

Testing

Builds

I have done a full (release) platform build. I have also done a DEBUG build of just the smrt module, for the extra warnings it enables, and made sure the source is lint-clean.

Hardware

I have been running a battery of tests that generate a lot of ZFS I/O on both an HP DL360p G8 and an HP DL580 G7 in the SF lab. I have also been routinely scrubbing the pool without any problems.

Failure Testing

It's somewhat hard to induce the controller to fail in the ways I'd like in order to exercise all of the error handling. Because it's magical hardware RAID, it tries very hard to shield us from errors of any kind, so most or all of the SCSI commands we send to it eventually succeed. Pulling a single disk from a RAID-5 set did not appear to interrupt I/O to the logical volume. Removing a second disk appears to make the controller stop responding to commands of any kind, including pings, while it continues to update its heartbeat register. Once this happens, we eventually panic after unsuccessfully trying to reset the controller.

I don't think the failure handling is any worse than in the old driver.
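The double-disk-pull case above, where the heartbeat register keeps advancing while every command goes unanswered, shows why a heartbeat alone can't prove the controller is healthy. The following is a minimal sketch, assuming a periodic check that also knows how long the oldest outstanding ping has been waiting. All names, the ping timeout and the reset limit are hypothetical; this illustrates the idea and is not the smrt driver's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the periodic check decides to do next. */
typedef enum {
	WD_OK,		/* controller looks healthy */
	WD_RESET,	/* attempt a controller reset */
	WD_PANIC	/* resets have not helped; give up */
} wd_action_t;

typedef struct {
	uint32_t wd_last_heartbeat;	/* heartbeat seen at previous check */
	unsigned wd_resets;		/* resets since last healthy check */
	unsigned wd_max_resets;		/* hypothetical limit before panic */
} wd_state_t;

/*
 * Hypothetical periodic check.  "heartbeat" is the current heartbeat
 * register value; "ping_age" is how long (in seconds) the oldest
 * outstanding ping has been waiting, or 0 if none is outstanding.
 */
static wd_action_t
wd_check(wd_state_t *wd, uint32_t heartbeat, unsigned ping_age,
    unsigned ping_timeout)
{
	assert(wd != NULL);

	bool heartbeat_moved = (heartbeat != wd->wd_last_heartbeat);
	bool ping_stuck = (ping_age > ping_timeout);

	wd->wd_last_heartbeat = heartbeat;

	if (heartbeat_moved && !ping_stuck) {
		wd->wd_resets = 0;
		return (WD_OK);
	}

	/*
	 * Either the heartbeat has stopped, or (as after the second disk
	 * pull) it is still advancing while commands, including pings, go
	 * unanswered.  Both cases call for a reset until the limit is hit.
	 */
	if (wd->wd_resets >= wd->wd_max_resets)
		return (WD_PANIC);
	wd->wd_resets++;
	return (WD_RESET);
}
```

Requiring both a moving heartbeat and a timely ping response is what lets this sketch notice the "alive but deaf" state instead of reporting the controller as healthy.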

Crash Dumps

I have tested crash dumps a couple of times, most recently in order to run ::findleaks (no leaks detected!). Dumping certainly works better than it did with the old driver.


Comment by Bot Bot [X]
Created at 2016-09-15T09:20:36.000Z

illumos-joyent commit c86756a (branch master, by Joshua M. Clulow)

OS-5564 new driver for Smart Array storage controllers
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Approved by: Dave Pacheco <dap@joyent.com>


Comment by Bot Bot [X]
Created at 2016-09-15T09:20:39.000Z

smartos-live commit 4a7c365 (branch master, by Joshua M. Clulow)

OS-5564 new driver for Smart Array storage controllers
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Dave Pacheco <dap@joyent.com>


Comment by Bot Bot [X]
Created at 2016-09-15T17:33:25.000Z

illumos-joyent commit 8512364 (branch master, by Joshua M. Clulow)

OS-5564 new driver for Smart Array storage controllers (fix Makefile)


Comment by Bot Bot [X]
Created at 2016-09-15T17:43:13.000Z

illumos-joyent commit 41c2980 (branch master, by Joshua M. Clulow)

OS-5564 new driver for Smart Array storage controllers (missing intel Makefile)


Comment by Joshua M. Clulow [X]
Created at 2016-09-15T17:44:32.000Z

The build broke because I didn't clean out the proto/ area before doing the full build during testing, so a stale copy of the driver was still in there. In addition, the incomplete set of Makefile changes was missed during review.