South Pole Logbook



Aug 17, 2009

sps-supercore RAID problems


By: Tex
Time: 20090817 2240Z
Music: None

The HP web GUI is not accessible as non-root and seems to have a different root
password. The command line shows:
=> ctrl all show config detail

Smart Array 6i in Slot 0
   Bus Interface: PCI
   Slot: 0
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Chassis Slot:
   Hardware Revision: Rev B
   Firmware Version: 2.68
   Rebuild Priority: Low
   Expand Priority: Low
   Surface Scan Delay: 15 sec
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 100% Read / 0% Write
   Total Cache Size: 64 MB
   Battery Pack Count: 0
   SATA NCQ Supported: False

   Array: A
      Interface Type: Parallel SCSI
      Unused Space: 0 MB
      Status: OK

      Logical Drive: 1
         Size: 271.3 GB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Stripe Size: 64 KB
         Status: Ready for Rebuild
         Array Accelerator: Enabled
         Parity Initialization Status: Initialization Completed
         Unique Identifier: 600508B1001FFFFFA003EA0AF5B90008
         Disk Name: /dev/cciss/c0d0
         Mount Points: /boot 250 MB, / 267.7 GB
         Logical Drive Label: A003EA0AF5B9

      physicaldrive 2:0
         SCSI Bus: 2
         SCSI ID: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: Parallel SCSI
         Transfer Mode: Ultra 320 Wide
         Size: 72.8 GB
         Transfer Speed: 320 MB/Sec
         Rotational Speed: 10000
         Firmware Revision: HPB7
         Serial Number: 3HZ90T7A00007506P4QV
         Model: COMPAQ  BD07285A25
      physicaldrive 2:1
         SCSI Bus: 2
         SCSI ID: 1
         Status: OK
         Drive Type: Data Drive
         Interface Type: Parallel SCSI
         Transfer Mode: Ultra 320 Wide
         Size: 72.8 GB
         Transfer Speed: 320 MB/Sec
         Rotational Speed: 10000
         Firmware Revision: HPB1
         Serial Number: DAL1P670DEUR0627
         Model: COMPAQ  BD07289BB8
      physicaldrive 2:2
         SCSI Bus: 2
         SCSI ID: 2
         Status: OK
         Drive Type: Data Drive
         Interface Type: Parallel SCSI
         Transfer Mode: Ultra 320 Wide
         Size: 72.8 GB
         Transfer Speed: 320 MB/Sec
         Rotational Speed: 10000
         Firmware Revision: HPB6
         Serial Number: B4R54VTM
         Model: COMPAQ  BD072863B2
      physicaldrive 2:3
         SCSI Bus: 2
         SCSI ID: 3
         Status: OK
         Drive Type: Data Drive
         Interface Type: Parallel SCSI
         Transfer Mode: Ultra 320 Wide
         Size: 72.8 GB
         Transfer Speed: 320 MB/Sec
         Rotational Speed: 10000
         Firmware Revision: HPB1
         Serial Number: DAL1P670DENP0627
         Model: COMPAQ  BD07289BB8
      physicaldrive 2:4
         SCSI Bus: 2
         SCSI ID: 4
         Status: Predictive Failure
         Drive Type: Data Drive
         Interface Type: Parallel SCSI
         Transfer Mode: Ultra 320 Wide
         Size: 72.8 GB
         Transfer Speed: 320 MB/Sec
         Rotational Speed: 10000
         Firmware Revision: HPBC
         Serial Number: J2072B0K
         Model: COMPAQ  BD0728A4B4
      physicaldrive 2:5
         SCSI Bus: 2
         SCSI ID: 5
         Status: OK
         Drive Type: Spare Drive
         Interface Type: Parallel SCSI
         Transfer Mode: Ultra 320 Wide
         Size: 72.8 GB
         Transfer Speed: 320 MB/Sec
         Rotational Speed: 10000
         Firmware Revision: HPB2
         Serial Number: AAL1P5205CT10507
         Model: COMPAQ  BD0728856A

=>

So drive 2:4 is in predictive failure. Drive 2:1 was replaced earlier. Since
2:5 is the hot spare, why didn't the drive 1 failure activate sparing? This
looks like the usual problem we have with these arrays: one drive is marked bad
or in predictive failure, and on replacement the rebuild fails, usually with no
information. On reboot, sometimes another drive gets marked as bad or in
predictive failure, which on a RAID 5 array means data loss. It is unclear why
a predictive failure is treated as a real failure.
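
To avoid reading the whole dump by eye next time, here is a rough sketch of a filter that picks out every array, logical drive, or physical drive whose Status line is not OK. It assumes the output looks like the paste above (a section header line followed by an indented "Status:" line). The function name find_problem_units and the trimmed SAMPLE text are made up for illustration, and how the text gets captured from the controller is left out.

```python
import re

# Trimmed excerpt in the same layout as the controller output pasted above.
SAMPLE = """\
Smart Array 6i in Slot 0
   Controller Status: OK
   Cache Status: OK

   Array: A
      Status: OK

      Logical Drive: 1
         Fault Tolerance: RAID 5
         Status: Ready for Rebuild
         Logical Drive Label: A003EA0AF5B9

      physicaldrive 2:3
         Status: OK
         Drive Type: Data Drive
      physicaldrive 2:4
         Status: Predictive Failure
         Drive Type: Data Drive
      physicaldrive 2:5
         Status: OK
         Drive Type: Spare Drive
"""

# Lines that open a new section (hypothetical pattern, based only on the paste).
SECTION_RE = re.compile(r"^(Array: \S+|Logical Drive: \S+|physicaldrive \S+)$")


def find_problem_units(text):
    """Return (section, status) pairs for every section whose Status is not OK."""
    problems = []
    current = None  # section header we are currently inside, if any
    for line in text.splitlines():
        stripped = line.strip()
        if SECTION_RE.match(stripped):
            current = stripped
            continue
        # Only a bare "Status:" line counts; "Controller Status:" and
        # "Cache Status:" do not start with "Status:" and are skipped.
        if current is not None and stripped.startswith("Status:"):
            status = stripped.split(":", 1)[1].strip()
            if status != "OK":
                problems.append((current, status))
    return problems


if __name__ == "__main__":
    for section, status in find_problem_units(SAMPLE):
        print(f"{section}: {status}")
```

This only reports what the controller already says. It does not explain why the spare was never activated or why the rebuild stalls.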


Edgar Nielsen | 17 Aug 2009 17:45 GMT | Ice Cube/SPS | | permalink