
Rebuild the Software RAID Array after a Drive Replacement

For 1&1 Linux Servers with software RAID

Learn how to rebuild a software RAID array on a Linux server after a failed hard drive replacement.

This article explains how to rebuild a software RAID array on a Linux system after a drive replacement. It does not apply to 1&1 Dedicated Servers that use a hardware RAID controller.

Please note:
In this example, the /dev/sdb drive has been replaced. Be careful when following this guide and make sure to specify the correct drive and partitions for your own scenario.
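If you are not certain which device name belongs to the replaced drive, the persistent names under /dev/disk/by-id (which include the drive model and serial number) can help you match the physical drive to its /dev/sdX name once you are logged in to the server. This is only an optional double-check of the target device:

[root@u12345678 ~]# ls -l /dev/disk/by-id/ | grep sdb   # serial-based names that point at /dev/sdb and its partitions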
Step 1
Connect to your server via SSH and log in as the root user.
Step 2
Check the status of the RAID array by using the cat /proc/mdstat command. This will output the multi-disk status. Below is an example of two functioning RAID arrays.
[root@u12345678 ~]# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid6] [raid5] [raid4]
md3 : active raid1 sda3[1] sdb3[2]
238324160 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
3911680 blocks [2/2] [UU]
unused devices: <none>
[root@u12345678 ~]#
Step 3
Since each array has a status of [2/2] and [UU], both of the two devices in each array are functional and up.

Below is an example where the failed drive (sdb) has been replaced with a new, blank drive. The status [2/1] and [U_] shows that only one of the two devices in each array is functional and up. The sda drive is still working, while the sdb drive has been replaced and needs to be added back into the array and rebuilt. The trailing (F) behind sdb3[2] and sdb1[2] shows that the old sdb devices are marked as failed.

It is also possible that one of the devices in each array may not be listed at all. In such a case, the next step can be skipped.

[root@u12345678 ~]# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid6] [raid5] [raid4]
md3 : active raid1 sdb3[2](F) sda3[0]
238324160 blocks [2/1] [U_]

md1 : active raid1 sdb1[2](F) sda1[0]
3911680 blocks [2/1] [U_]

unused devices: <none>
[root@u12345678 ~]#
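If the summary in /proc/mdstat is hard to read, mdadm can print a per-device breakdown of an array's state (active, faulty, removed). This is an optional check; /dev/md1 is simply the example array used throughout this guide:

[root@u12345678 ~]# mdadm --detail /dev/md1   # lists each member device and its current state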
Step 4
Remove the failed devices (those with a trailing F) from the arrays. If the devices are not marked as failed but have already been removed from the arrays, you can skip this step.
[root@u12345678 ~]# mdadm --manage /dev/md1 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
[root@u12345678 ~]# mdadm --manage /dev/md3 --remove /dev/sdb3
mdadm: hot removed /dev/sdb3
[root@u12345678 ~]#
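If an array member that you need to remove is still listed as active rather than failed, mdadm will refuse to hot-remove it. In that situation (which does not occur in the example above), you can mark the member as failed first and then remove it:

[root@u12345678 ~]# mdadm --manage /dev/md1 --fail /dev/sdb1     # mark the member as faulty
[root@u12345678 ~]# mdadm --manage /dev/md1 --remove /dev/sdb1   # then hot-remove it from the array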
Step 5
Run the fdisk -l command to list the partitions of all the drives.
[root@u12345678 ~]# fdisk -l

Disk /dev/sda: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 1 487 3911796 fd Linux raid autodetect
/dev/sda2 488 731 1959930 82 Linux swap / Solaris
/dev/sda3 732 30401 238324275 fd Linux raid autodetect

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 487 3911796 fd Linux raid autodetect
/dev/sdb2 488 731 1959930 82 Linux swap / Solaris
/dev/sdb3 732 30401 238324275 fd Linux raid autodetect

Disk /dev/md1: 4005 MB, 4005560320 bytes
2 heads, 4 sectors/track, 977920 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md3: 244.0 GB, 244043939840 bytes
2 heads, 4 sectors/track, 59581040 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md3 doesn't contain a valid partition table
[root@u12345678 ~]#

If the failed drive (in this example, sdb) has partitions listed, these partitions must be deleted. If, instead, your output shows no partitions for that drive and reports Disk /dev/sdb doesn't contain a valid partition table, you can skip the next step.
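If the full fdisk -l listing is long, you can also limit the output to the replaced drive only:

[root@u12345678 ~]# fdisk -l /dev/sdb   # list the partitions of the replaced drive only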
Step 6
To delete the partitions of the failed disk (in this example, sdb), run the fdisk /dev/sdb command. Make sure to specify the failed disk in this command. If at any time you believe you have made a mistake, press q and then ENTER to quit without saving changes.
[root@u15376217 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 30401.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):

Enter the p command and press ENTER to print the partition tables.
Command (m for help): p

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 487 3911796 fd Linux raid autodetect
/dev/sdb2 488 731 1959930 82 Linux swap / Solaris
/dev/sdb3 732 30401 238324275 fd Linux raid autodetect

Command (m for help):

Press d and ENTER to delete a partition and then enter 1 to delete the first partition.
Command (m for help): d
Partition number (1-4): 1

Command (m for help):

Follow the same process for the rest of the partitions.
Command (m for help): d
Partition number (1-4): 2

Command (m for help): d
Selected partition 3

Command (m for help):

Enter p to print the partition table again and ensure that all partitions have been removed.
Command (m for help): p

Disk /dev/sdb: 250.0 GB, 250059350016 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

Command (m for help):

Press w and ENTER to write and save the partition table.
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at the next reboot.
Error closing file
[root@u12345678 ~]#

Reboot the server so that the kernel re-reads the new partition table. Use shutdown -r now to reboot the server.
[root@u12345678 ~]# shutdown -r now

Broadcast message from root (pts/0) (Thu Aug 18 15:35:30 2011):

The system is going down for reboot NOW!
[root@u15376217 ~]#
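As a side note, on servers with a more recent util-linux (roughly version 2.28 or newer), the partition entries of the replaced disk can also be deleted non-interactively. This is only an alternative to the interactive fdisk session above, not a required part of this guide:

[root@u12345678 ~]# sfdisk --delete /dev/sdb   # removes every partition entry from /dev/sdb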
Step 7
Copy the partition structure of the good drive (sda) to the blank drive (sdb). The command below can wipe the good drive if used incorrectly, so make sure the first drive specified is the functional drive and the second is the blank drive.
[root@u12345678 ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 30401 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device Boot Start End #cyls #blocks Id System
/dev/sdb1 0+ 486 487- 3911796 fd Linux raid autodetect
/dev/sdb2 487 730 244 1959930 82 Linux swap / Solaris
/dev/sdb3 731 30400 29670 238324275 fd Linux raid autodetect
/dev/sdb4 0 - 0 0 0 Empty
New situation:
Units = sectors of 512 bytes, counting from 0

Device Boot Start End #sectors Id System
/dev/sdb1 63 7823654 7823592 fd Linux raid autodetect
/dev/sdb2 7823655 11743514 3919860 82 Linux swap / Solaris
/dev/sdb3 11743515 488392064 476648550 fd Linux raid autodetect
/dev/sdb4 0 - 0 0 Empty
Warning: no primary partition is marked bootable (active)
This does not matter for LILO, but the DOS MBR will not boot this disk.
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
[root@u12345678 ~]#
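To confirm that the partition layout was copied correctly, you can dump both partition tables and compare the start, size, and Id columns (the device names will of course differ):

[root@u12345678 ~]# sfdisk -d /dev/sda   # dump the partition table of the source drive
[root@u12345678 ~]# sfdisk -d /dev/sdb   # dump the partition table of the new drive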
Step 8
The partition structure of the new drive now matches that of the drive containing the data. Create the swap area by running the mkswap command followed by the partition that holds the swap.
[root@u12345678 ~]# mkswap /dev/sdb2
Setting up swapspace version 1, size = 2006962 kB
[root@u12345678 ~]#

Enable the swap by running the swapon command for the same partition.
[root@u12345678 ~]# swapon -p 1 /dev/sdb2
[root@u12345678 ~]#
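To confirm that the new swap partition is active, you can list the swap areas currently in use:

[root@u12345678 ~]# swapon -s          # summary of active swap devices
[root@u12345678 ~]# cat /proc/swaps    # equivalent view from the kernel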
Step 9
Add the partitions of the new drive to the correct arrays. Once the partitions have been added to the array, the data will be copied over to the new drive, rebuilding the array. To find out which partitions should be added to which array, use the cat /etc/mdadm.conf command.
[root@u12345678 ~]# cat /etc/mdadm.conf
DEVICE /dev/sda* /dev/sdb*
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md3 devices=/dev/sda3,/dev/sdb3
[root@u12345678 ~]#
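If /etc/mdadm.conf is missing or empty on your system (some distributions keep it at /etc/mdadm/mdadm.conf instead), mdadm can usually report the same array-to-device mapping directly:

[root@u12345678 ~]# mdadm --detail --scan --verbose   # prints ARRAY lines including the member devices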
Step 10
Since the sdb drive has been replaced, we need to add the sdb partitions to the correct arrays. The output from the last step states that /dev/sdb1 should be added to the /dev/md1 array, so we use the mdadm --manage /dev/md1 --add /dev/sdb1 command.
[root@u12345678 ~]# mdadm --manage /dev/md1 --add /dev/sdb1
mdadm: added /dev/sdb1
[root@u12345678 ~]#
Step 11
Check on the RAID status by using the cat /proc/mdstat command. Since the correct partition has been added back to the correct array, the data will begin copying over to the new drive.
[root@u12345678 ~]# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid6] [raid5] [raid4]
md3 : active raid1 sda3[0]
238324160 blocks [2/1] [U_]

md1 : active raid1 sdb1[2] sda1[0]
3911680 blocks [2/1] [U_]
[>....................] recovery = 0.7% (28416/3911680) finish=2.2min speed=28416K/sec

unused devices: <none>
[root@u12345678 ~]#
Step 12
Do the same for the second RAID array (md3) by adding the correct sdb partition to the array using the mdadm --manage /dev/md3 --add /dev/sdb3 command.
[root@u12345678 ~]# mdadm --manage /dev/md3 --add /dev/sdb3
mdadm: added /dev/sdb3
[root@u12345678 ~]#
Step 13
Check the RAID array status again using the cat /proc/mdstat command to see that the second RAID array resync is DELAYED and will begin as soon as the first array is finished rebuilding.
[root@u12345678 ~]# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid6] [raid5] [raid4]
md3 : active raid1 sdb3[2] sda3[0]
238324160 blocks [2/1] [U_] resync=DELAYED

md1 : active raid1 sdb1[2] sda1[0]
3911680 blocks [2/1] [U_]
[====>................] recovery = 22.3% (874432/3911680) finish=0.8min speed=62459K/sec

unused devices: <none>
[root@u12345678 ~]#
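The rebuild can take a long time on large drives. If the watch utility is available on your server, it is a convenient way to follow the progress without re-running the command by hand (press Ctrl+C to stop watching; the rebuild continues in the background):

[root@u12345678 ~]# watch cat /proc/mdstat   # refreshes the RAID status every 2 seconds by default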

GRUB Setup
Step 1
Set up the GRUB bootloader on the hard drive that was replaced to ensure that if the other drive fails in the future, the server can still boot properly from the new (replaced) drive. Check that the md1 array has finished rebuilding by using the cat /proc/mdstat command again.
[root@u12345678 ~]# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid6] [raid5] [raid4]
md3 : active raid1 sdb3[2] sda3[0]
238324160 blocks [2/1] [U_]
[=====>...............] recovery = 25.9% (1013125/3911680) finish=36.8min speed=62459K/sec

md1 : active raid1 sdb1[2] sda1[0]
3911680 blocks [2/2] [UU]

unused devices: <none>
[root@u12345678 ~]#
Step 2
Once the md1 array has finished rebuilding and you see the [2/2] [UU] status, you can run the grub command. If you were performing the previous steps in rescue mode, boot the server into normal mode through the 1&1 Recovery Tool before continuing.
[root@u12345678 ~]# grub
Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub>
Step 3
At the GRUB prompt, issue the following commands.
grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)
grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)

This sets up GRUB on the first hard drive (sda) and the second hard drive (sdb). GRUB uses its own designations: hd0 for sda and hd1 for sdb.
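The commands above apply to the legacy GRUB (version 0.97) shown in the example output. If your server uses GRUB 2 instead, the grub shell works differently; in that case the bootloader is typically reinstalled on both drives with grub-install (named grub2-install on some distributions):

[root@u12345678 ~]# grub-install /dev/sda   # reinstall the bootloader on the first drive
[root@u12345678 ~]# grub-install /dev/sdb   # and on the replaced drive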