Has this raid1 software array failed? (mdadm)

Asked 15 years, 7 months ago

Long version: I am running a Red Hat Enterprise Linux 5 (RHEL5) machine with software raid1 (mdadm).

A few days ago I went to back up some MySQL data and all of a sudden I could no longer log into the machine. I typed in a username and then it would just sit there. If I pressed control sequences they would appear on the screen, but it would never log in. It also did not respond to ctrl+alt+delete, so I did a hard power down.

I booted it back up and monitored the raid1 array via:

mdadm --detail /dev/md1

This array holds the root mount point.

It began resyncing the array. I am not sure whether this happened because of the crash or just because I did a hard power down. Either way, I let it finish:

[f@mysqldatanode ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Thu Apr 19 15:28:52 2007
     Raid Level : raid1
     Array Size : 479893568 (457.66 GiB 491.41 GB)
    Device Size : 479893568 (457.66 GiB 491.41 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Dec 25 10:03:50 2009
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : ab4849de:1f4f41c4:defd01e8:a4979ca6
         Events : 0.78

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
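As a rough way to read the output above, here is a minimal sketch that pulls out the fields that answer "has md marked the array as failed?". The field values are copied from the output in this post; the `field` helper and the variable names are just illustrative, not part of mdadm.

```shell
# Fields copied from the `mdadm --detail /dev/md1` output above.
detail='   Raid Devices : 2
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0'

# Hypothetical helper: print the value after "Label : " for a given label.
field() {
    printf '%s\n' "$detail" | awk -F' : ' -v key="$1" '
        { gsub(/^[ \t]+/, "", $1) }   # strip the leading padding from the label
        $1 == key { print $2 }'
}

state=$(field "State")
failed=$(field "Failed Devices")
active=$(field "Active Devices")
raid=$(field "Raid Devices")

# md considers the mirror intact when it is clean, nothing is marked
# failed, and every raid slot has an active member.
if [ "$state" = "clean" ] && [ "$failed" -eq 0 ] && [ "$active" -eq "$raid" ]; then
    echo "md reports an intact mirror"
else
    echo "md reports a degraded or failed array"
fi
```

This only reflects what md itself reports; it says nothing about the condition of the underlying disks.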

I looked through some logs (/var/log/messages*) and found several messages like the one below indicating hard-drive trouble:

Dec 21 11:39:47 localhost kernel: sd 0:0:1:0: SCSI error: return code = 0x08000002
Dec 21 11:39:47 localhost kernel: sdb: Current: sense key: Medium Error
Dec 21 11:39:47 localhost kernel:     Additional sense: Unrecovered read error
Dec 21 11:39:47 localhost kernel: Info fld=0x3348912
Dec 21 11:39:47 localhost kernel: end_request: I/O error, dev sdb, sector 53774610
Dec 21 11:39:47 localhost kernel: raid1:md1: read error corrected (8 sectors at 53565760 on sdb2)
Dec 21 11:39:48 localhost kernel: raid1: sdb2: redirecting sector 53565648 to another mirror
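One small cross-check when reading these messages: the hexadecimal "Info fld" value appears to be the same sector that the end_request line reports in decimal. A minimal sketch, using the two values from the excerpt above (the variable names are only illustrative):

```shell
# Values copied from the log excerpt above.
info_fld=0x3348912        # from the "Info fld=" line
failed_sector=53774610    # from the "end_request: I/O error" line

# printf converts the hexadecimal value to decimal.
decoded=$(printf '%d' "$info_fld")
echo "Info fld decoded: $decoded"

if [ "$decoded" -eq "$failed_sector" ]; then
    echo "Info fld matches the sector in the end_request line"
else
    echo "Info fld does not match the end_request sector"
fi
```

Note that this sector is counted from the start of the whole disk (sdb), while the raid1 messages count sectors within the partition (sdb2), so those numbers differ.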

I then tried to scan for bad blocks with badblocks, and the machine locked up again in the same way:

[f@mysqldatanode ~]# badblocks -s /dev/md1
Checking for bad blocks (read-only test):               0/      479893568

So how should I go about evaluating the health of the two drives? Since the array in question holds the root mount point, do I need to move the drives to another machine to analyze them?
