2022-02-28

Dealing with RAID arrays

Dear Future Self,

We have come to another letter where we are going to better document something PastSelf thought it knew, but clearly didn't. In this case we are going to recover a RAID array after a reinstall. For reasons we won't get into, PastSelf had to reinstall the home server for the 2nd time this week. [Let us just say that PastSelf is no longer allowed to use sudo without supervision and move on.] During the reinstall, we could not get the /dev/sdb and /dev/sdc RAID array to be fully recognized, and we realized that we had also made the original partitions too small for what we needed. [Which is what started the whole problem: we tried to grow a partition but forgot that the external backup always becomes /dev/sda for some reason, so /dev/sdb was not the RAID drive but the / drive. Live and learn, live and learn.]
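
A note before we go any further: device letters clearly cannot be trusted on this box, so next time check which disk is which before touching anything. Something along these lines (plain util-linux tooling, nothing exotic) shows the model and serial of each drive plus the stable /dev/disk/by-id names that do not shuffle around between boots:

# lsblk -o NAME,SIZE,MODEL,SERIAL
# ls -l /dev/disk/by-id/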

Due to some stale signatures, we needed to clear the drives of their existing metadata. This was done by booting from a USB stick (which also becomes /dev/sda on this hardware... wtf?) and clearing each drive of its signatures.

# wipefs -a /dev/sdb
# wipefs -a /dev/sdc
# wipefs -a /dev/sdd
# cat /proc/mdstat 
Personalities : 
md127 : inactive sdc1[1](S)
      1464851456 blocks super 1.2
       
unused devices: <none>

  

The above failed because the kernel and the boot process had already tried to assemble them into a RAID array, /dev/md127, but were not able to sync them. I was also unable to run

# mdadm --stop /dev/md127

for some reason. At this point, PastSelf further broke his oath of primum non nocere by using dd on each of the disks:
# dd if=/dev/zero of=/dev/sdb bs=1024 count=1000000
# dd if=/dev/zero of=/dev/sdc bs=1024 count=1000000
# dd if=/dev/zero of=/dev/sdd bs=1024 count=1000000
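
In hindsight, a gentler option (had the array been stoppable first) would have been to remove just the md metadata from the member partitions rather than zeroing the first gigabyte of every disk; something like:

# mdadm --zero-superblock /dev/sdb1
# mdadm --zero-superblock /dev/sdc1
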
A reboot and going into rescue mode still showed that some signatures were there, which I realized was due to these disks being formatted with GPT and thus being much more capable of surviving stupidity: GPT keeps a backup header and partition table at the end of the disk, which zeroing the start never touched. However, mdadm --stop now worked, so I could use gdisk on the drives [see the note after the gdisk transcript for a quicker way to do this next time]. I then reinstalled a minimal AlmaLinux 8.5 onto the box and did a manual creation of the RAID array:
# gdisk /dev/sdc
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries.

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (34-3907029134, default = 2048) or {+-}size{KMGTP}: 
Last sector (2048-3907029134, default = 3907029134) or {+-}size{KMGTP}: 
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): fd00
Changed type of partition to 'Linux RAID'

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
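
The promised note for next time: gdisk's scriptable sibling sgdisk can wipe both the primary and the backup GPT structures in one non-interactive shot (assuming the gdisk package, which is where sgdisk lives on EL systems, is on the rescue media):

# sgdisk --zap-all /dev/sdb
# sgdisk --zap-all /dev/sdc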

  

At this point we were able to get the system ready to create the RAID array itself.

# mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1 --force
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: /dev/sdc1 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Dec 30 18:54:28 2021
mdadm: size set to 1953381440K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? y
mdadm: Fail to create md1 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
[root@xenadu ~]# cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 sdc1[1] sdb1[0]
      1953381440 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.7% (15491456/1953381440) finish=158.1min speed=204199K/sec
      bitmap: 15/15 pages [60KB], 65536KB chunk

unused devices: <none>
# mdadm --detail --scan
ARRAY /dev/md1 metadata=1.2 name=xenadu.int.smoogespace.com:1 UUID=c032f979:e8e4deda:a590ca5d:820a8548
# mdadm --detail --scan > /etc/mdadm.conf
# echo '/dev/md1 /srv xfs defaults 0 0' >> /etc/fstab
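
One step that is easy to forget between those last two lines: the new array still needs a filesystem on it before that fstab entry will mount. A minimal sketch, assuming we stick with xfs on /srv as written above:

# mkfs.xfs /dev/md1
# mkdir -p /srv
# mount /srv
# df -h /srv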

Now wait for the sync to be done, and then start the restore from backups... you know the ones that Past-PastSelf made just in case of this situation. Also Future-Self, could you please write up some ansible playbooks to do this from now on? Future-FutureSelf will appreciate it.

Yours Truly, PastSelf
