2022-02-28

Dealing with RAID arrays

Dear Future Self,

We have come to another letter in which we better document something PastSelf thought it knew, but clearly didn't. In this case we are going to recover a RAID array after a reinstall. For reasons we won't get into, PastSelf had to reinstall the home server for the 2nd time this week. [Let us just say that PastSelf is no longer allowed to use sudo without supervision and move on.] During the reinstall, we could not get the /dev/sdb and /dev/sdc RAID array to be fully recognized, and we realized that we had also made the original array too small for what we needed. [That is what started the whole problem: we tried to grow a partition but forgot that the external backup always becomes /dev/sda for some reason, so /dev/sdb was not the RAID drive but the / drive. Live and learn, live and learn.]

Due to some bad signatures we needed to clear the drives of their current data. This was done by booting from a USB stick (which also becomes /dev/sda in this hardware.... wtf?) and clearing each drive of its signatures. 

# wipefs -a /dev/sdb
# wipefs -a /dev/sdc
# wipefs -a /dev/sdd
# cat /proc/mdstat 
Personalities : 
md127 : inactive sdc1[1](S)
      1464851456 blocks super 1.2
       
unused devices: <none>

  

The above failed because the kernel had tried at boot to make the drives part of a RAID array, /dev/md127, but was not able to sync them. For some reason I was also unable to stop that array:

# mdadm --stop /dev/md127

At this point, PastSelf further broke his oath of primum non nocere by using dd on each of the disks.
# dd if=/dev/zero of=/dev/sdb bs=1024 count=1000000
# dd if=/dev/zero of=/dev/sdc bs=1024 count=1000000
# dd if=/dev/zero of=/dev/sdd bs=1024 count=1000000
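
Future Self, before reaching for dd next time, it may help to see exactly which arrays the kernel auto-assembled and left inactive. The following is only a rough sketch: the function name and the optional file argument are assumptions added here so the parser can be tried on saved output, and it prints stop commands rather than running them.

```shell
# List md arrays that /proc/mdstat reports as inactive.
# The optional argument (a file to read instead of /proc/mdstat) is an
# assumption added so this can be tested against saved output.
inactive_md_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print "/dev/" $1 }' "${1:-/proc/mdstat}"
}

# Show, but do not run, the stop command for each inactive array.
for md in $(inactive_md_arrays 2>/dev/null); do
    echo "would run: mdadm --stop $md"
done
```

On the mdstat output shown earlier this should report /dev/md127 only.
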
A reboot into rescue mode showed that some signatures were still there, which I realized was due to these disks being formatted with GPT, which is much more capable of surviving stupidity. However, mdadm --stop now worked, so I could use gdisk on the drives. I then reinstalled a minimal AlmaLinux 8.5 onto the box and created the RAID array manually, starting with a fresh partition table on each drive:
# gdisk /dev/sdc
GPT fdisk (gdisk) version 1.0.3

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries.

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (34-3907029134, default = 2048) or {+-}size{KMGTP}: 
Last sector (2048-3907029134, default = 3907029134) or {+-}size{KMGTP}: 
Current type is 'Linux filesystem'
Hex code or GUID (L to show codes, Enter = 8300): fd00
Changed type of partition to 'Linux RAID'

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
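
Looking back, the sector range gdisk printed (34-3907029134) hints at one likely reason the dd runs left signatures behind: GPT keeps a backup copy at the very end of the disk, and zeroing the first gigabyte never touches it. The sketch below only prints the dd command that would cover that tail. It assumes 512-byte logical sectors and the standard 128-entry GPT layout, and the function name is made up for this note.

```shell
# Print (do NOT run) a dd command that would zero the backup GPT at the end
# of a disk. Assumptions: 512-byte logical sectors and the standard layout of
# 32 sectors of backup partition entries followed by 1 backup header sector.
gpt_backup_wipe_cmd() {
    local dev=$1 total_sectors=$2
    local start=$(( total_sectors - 33 ))
    echo "dd if=/dev/zero of=$dev bs=512 seek=$start count=33"
}

# The total would normally come from: blockdev --getsz /dev/sdc
# Here it is derived from gdisk's last usable sector: 3907029134 + 34.
gpt_backup_wipe_cmd /dev/sdc 3907029168
```

The printed seek value lands on the first sector after gdisk's last usable sector. In practice sgdisk --zap-all is the purpose-built way to destroy both copies; the arithmetic is here only to show where the survivor lives.
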

  

At this point we were able to get the system ready for creating the RAID array.

# mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1 --force
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: /dev/sdc1 appears to be part of a raid array:
       level=raid1 devices=2 ctime=Thu Dec 30 18:54:28 2021
mdadm: size set to 1953381440K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? y
mdadm: Fail to create md1 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.
# cat /proc/mdstat
Personalities : [raid1] 
md1 : active raid1 sdc1[1] sdb1[0]
      1953381440 blocks super 1.2 [2/2] [UU]
      [>....................]  resync =  0.7% (15491456/1953381440) finish=158.1min speed=204199K/sec
      bitmap: 15/15 pages [60KB], 65536KB chunk

unused devices: <none>
# mdadm --detail --scan
ARRAY /dev/md1 metadata=1.2 name=xenadu.int.smoogespace.com:1 UUID=c032f979:e8e4deda:a590ca5d:820a8548
# mdadm --detail --scan > /etc/mdadm.conf
# echo '/dev/md1 /srv xfs defaults 0 0' >> /etc/fstab
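
Since a wrong device name in fstab is exactly the kind of thing PastSelf does, here is a small sanity check, sketched under assumptions: it reports any /dev/mdN that fstab mounts but mdadm.conf does not declare. The function name is hypothetical, and the two file arguments exist only so the check can be tried on copies instead of the live files.

```shell
# Report /dev/mdN devices that fstab mounts but mdadm.conf does not declare.
# Both arguments default to the real paths; passing other files is an
# assumption added here so the check can be run against copies.
fstab_md_missing() {
    local fstab=${1:-/etc/fstab} conf=${2:-/etc/mdadm.conf}
    awk '$1 ~ /^\/dev\/md[0-9]+$/ { print $1 }' "$fstab" | sort -u |
    while read -r dev; do
        # An ARRAY line names the device as its second field.
        grep -Eq "^ARRAY[[:space:]]+$dev([[:space:]]|\$)" "$conf" || echo "$dev"
    done
}
```

Run it as fstab_md_missing with no arguments after editing both files; empty output means every md device in fstab has a matching ARRAY line.
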

Now wait for the sync to be done, and then start the restore from backups... you know, the ones that Past-PastSelf made just in case of this situation. Also Future-Self, could you please write up some Ansible playbooks to do this from now on? Future-FutureSelf will appreciate it.
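
For the waiting part, a minimal sketch that pulls the resync percentage for one array out of /proc/mdstat. The function name and the optional file argument are assumptions added for this note; the parsing is based only on the mdstat layout shown in the transcript above.

```shell
# Print the resync/recovery percentage for one md array, or nothing if the
# array is idle. $1 is the array name as it appears in mdstat (e.g. md1);
# $2 is an optional file to read instead of /proc/mdstat (an assumption,
# added so the parser can be tried on saved output).
md_resync_percent() {
    local md=$1 file=${2:-/proc/mdstat}
    awk -v md="$md" '
        $1 == md && $2 == ":" { in_md = 1; next }   # start of our array
        in_md && NF == 0      { in_md = 0 }         # blank line ends it
        in_md && /(resync|recovery) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%/, "", $i); print $i; exit }
        }
    ' "$file"
}
```

If you would rather just block until it is finished, mdadm --wait /dev/md1 should do that without any parsing.
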

Yours Truly, PastSelf

Getting past EL-{8,9}'s limitations with toolbx

Dear Future Self,

One of the biggest issues with Enterprise Linux 8 (be it Rocky or Red Hat) is the lack of additional packages which you know are in Fedora. Trying to get them into EL-8 turns into a Sisyphean task: you push the boulder of multiple python/go/ruby/etc packages towards EL8, only to find that the RPM macros and other software have changed so much in 2 to 3 Fedora releases that you can't. Past self spent the weekend trying to get a simple Go package backported and found that he needed to touch at least 175 src.rpms to make this 'work'. That was just too much when the real goal was getting something else working.

Thankfully, EL8 ships with a tool which will allow you to get past most of these problems if you meet the following criteria:

  1. The package must not require any kernel feature not shipped in the EL-8 kernel.
  2. You have lots of disk space available to basically install a second OS. 
  3. You can deal with some of the limitations of containers.

The tool which does all this is Container Toolbx which uses podman to create an interactive shell using the runtime space of the OS you want.

$ sudo -i dnf install toolbox
Password:
$ toolbox create --distro fedora --release f35 f35
$ cat /etc/system-release
AlmaLinux release 8.5 (Arctic Sphynx)
$ ls
Ansible-smoogespace/  HUGO/  OLD/  Packages/  RPMS/  SSH-AGENT  Website-smoogespace/  go/  yadm-dotfiles/
$ toolbox enter f35
$ ls
Ansible-smoogespace/  HUGO/  OLD/  Packages/  RPMS/  SSH-AGENT  Website-smoogespace/  go/  yadm-dotfiles/
$ cat /etc/system-release
Fedora release 35 (Thirty Five)
$ sudo -i dnf update
< no password asked >
$ sudo -i dnf install {package I want}
$ {package_command}
  
As can be seen in the example above, toolbx basically puts the container in the user's home directory but with the userspace of Fedora 35. This gave me some newer commands, which let me compile a Go package that I couldn't build in EL-8 at the moment. Since Go binaries are normally statically linked, I can then use the result regularly in my EL-8 environment. [I was also able to get past some similar errors in emacs, where I had used some package calls from a newer emacs that compile to elc files which work with the EL-8 emacs.]
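
Because ls looks identical on both sides, it is easy to forget which userspace a shell is in. A tiny sketch for telling them apart: as far as PastSelf knows, toolbx drops a marker file at /run/.toolboxenv inside its containers, so treat that path as an assumption to verify. The function name and the root-prefix argument are made up here, the latter only so it can be tried against a fake tree.

```shell
# Say whether the current shell is on the host or inside a toolbx container.
# Assumption: toolbx containers carry the marker file /run/.toolboxenv.
# $1 is an optional root prefix, used only for testing against a fake tree.
toolbx_where() {
    local root=${1:-}
    if [ -f "$root/run/.toolboxenv" ]; then
        echo "toolbx"
    else
        echo "host"
    fi
}
```

For one-off commands there is also toolbox run --container f35 followed by the command, which avoids entering the container at all.
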

Important!

This is not a cure-all. You are basically downloading basic containers and then using overlays to do updates and other magic to make this work. While it is quite likely possible one could make various daemons (say openvpn) work this way, I also expect that the network hell that comes with containers would make it fragile. However when needing fedpkg or some similar command it is easier to use this than try and port all the other 'packages' that it relies on if you have only a couple of hours free.

Anyway, this is the 2nd time I have had to re-discover this in the last 2 years so I figured I had better write a note to future me in 6 months or a year who has to do this again.

Yours truly, Past Self

2022-02-08

How to Install CentOS Stream 9 Cloud Image

Dear Future Self,

You have probably started to install a CentOS Stream 9 cloud image, and completely forgot all the things you learned this time around. No worries, past-self is going to write these down for your usage. 

First off, download the image you want. On the day we are writing this, the latest image is http://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-9-20220207.0.x86_64.qcow2 but it will most likely be something much newer. They don't put a 'latest' in the directory, so open a browser, search for qcow2, and then instead of searching through 4000 entries from 2021-08-30, press the up-arrow and jump to the last entry on the web-page.
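
If the browser dance gets old, here is a rough sketch that picks the newest image name out of the directory listing. It assumes the file names keep the CentOS-Stream-GenericCloud-9-YYYYMMDD.N.x86_64.qcow2 pattern seen above and that GNU sort (for -V) is available; the function name is hypothetical.

```shell
# Read an HTML directory listing on stdin and print the newest
# GenericCloud qcow2 file name. Assumes the naming pattern
# CentOS-Stream-GenericCloud-9-YYYYMMDD.N.x86_64.qcow2 and GNU sort.
latest_cs9_image() {
    grep -oE 'CentOS-Stream-GenericCloud-9-[0-9]{8}\.[0-9]+\.x86_64\.qcow2' |
        sort -uV | tail -n 1
}

# Usage (not run here, needs network):
#   curl -s http://cloud.centos.org/centos/9-stream/x86_64/images/ | latest_cs9_image
```

If the naming scheme changes, the function prints nothing, which is the cue to go back to the browser.
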

Next, we need to use virt-install to get the image imported to where virt-manager will use it. Older versions of CentOS had a default user, but CentOS Stream 9 relies on cloud-init in order to set up the root user and password. This can be done via the virt-install command IF you have virt-install version 3 or later, so EL8 and Ubuntu 18 need a different command.

$ sudo virt-install --name guest-cs9 --memory 2048 \
  --vcpus 2 --disk ./CentOS-Stream-GenericCloud-9-20220207.0.x86_64.qcow2 \
  --import --os-type Linux --os-variant centos-stream9 \
  --network default --console pty,target_type=serial --graphics vnc \
  --cloud-init root-password-generate=on,disable=on,ssh-key=/home/ssmoogen/.ssh/id_ecdsa.pub
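
To decide up front which path applies on a given box, a minimal version check can help. This is a sketch: the function name is made up, and it assumes virt-install --version prints a plain dotted version such as 3.2.0.

```shell
# Succeed if the given virt-install version string is 3 or later, i.e. new
# enough for --cloud-init. $1 is a dotted version such as "3.2.0"; anything
# that does not start with a number is treated as "too old".
supports_cloud_init() {
    local major=${1%%.*}
    [ "$major" -ge 3 ] 2>/dev/null
}

# Usage:
#   if supports_cloud_init "$(virt-install --version)"; then
#       echo "use --cloud-init"
#   else
#       echo "use virt-customize instead"
#   fi
```
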
  

You can add more cloud init options by creating data-files for meta and user-data. Go to the cloud-init site for that.
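
As a starting point, a minimal user-data file can be generated like this. It is a sketch only: the function name, the hostname value, and the output path are illustrative choices, and only two common cloud-config keys are shown.

```shell
# Write a minimal cloud-init user-data file.
# $1 = output path, $2 = public key text (e.g. the contents of id_ecdsa.pub).
# The hostname below is just an illustrative value matching the guest name.
write_user_data() {
    local out=$1 pubkey=$2
    cat > "$out" <<EOF
#cloud-config
hostname: guest-cs9
ssh_authorized_keys:
  - $pubkey
EOF
}

# Usage:
#   write_user_data ./user-data "$(cat ~/.ssh/id_ecdsa.pub)"
```

The resulting file would then be handed to virt-install as an extra --cloud-init suboption, user-data=./user-data. Check the cloud-init documentation before adding more keys.
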

Alternative method (OK, the one most likely used).

In the case of trying to do this on EL8 or earlier Ubuntu editions, you will need to use the virt-customize command instead. First we have to make sure it is installed.

For Ubuntu:
  $ sudo apt install libguestfs-tools
  
For EL based distros:
  $ sudo dnf install guestfs-tools
  

The virt-customize command is meant to alter a non-running image. If you use it on a running one, you will probably have a very dead box afterwards. YOU HAVE BEEN WARNED.

 
$ virt-customize -v --uninstall cloud-init --selinux-relabel \
  -a CentOS-Stream-GenericCloud-9-20220207.0.x86_64.qcow2 \
  --ssh-inject root:file:/home/ssmoogen/.ssh/id_ecdsa.pub \
  --root-password random |& tee CSG.out
$ grep 'virt-customize.password' CSG.out # for the random set password.
  
Depending on the OS this may or may not need to be run with sudo. Check the CSG.out file for any extra errors. We sent both standard out and standard error there to make sure it all got captured. After this is done, you can import it as a working image with:
$ sudo virt-install --name guest-cs9 --memory 2048 \
  --vcpus 2 --disk ./CentOS-Stream-GenericCloud-9-20220207.0.x86_64.qcow2 \
  --import --os-type Linux --os-variant centos-stream9 \
  --network default --console pty,target_type=serial --graphics vnc
  

And with that, past self is done recording what is needed to be done.