Many moons ago, before Grub2 came about, I looked into booting from a RAID partition. It was possible, but quite difficult. Today it's possible, but only slightly difficult. I can't claim this blog post is 100% accurate, but these are my experiences, and if you got here via a search engine for a particular error, hopefully it will help guide you.
I started from a complete, working Gentoo installation on one hard drive, composed of the recommended four partitions - call it hard drive sda. I wanted to migrate this installation to a RAID-1 setup with two disks and four RAID partitions (md0, md1, md2, md3) in the same layout - and, of course, boot from it. The second hard drive, blank, is sdb.
There are several players in the boot process, and you're going to need to cater to every one of them:
- The GRUB Core Image
- The GRUB Configuration
- The Kernel Autodetection
- The Kernel Configuration
Converting The System to RAID-1
You're going to need to do a lot of configuration, then reboot, and hope it works. If it doesn't work, you can safely back out, but reconfiguring will be a little tricky.
Follow this excellent tutorial. The gist of it is:
- Replicate the partition structure of sda on sdb
- Make sure the partition types are 0xFD
- Create RAID arrays on each partition, with a missing disk, using the old superblock format, e.g. mdadm --create /dev/md4 --level=1 --metadata=0.90 --raid-devices=2 missing /dev/sdb4 (a consolidated sketch of all these steps follows the list)
- Create the correct filesystems (ext4 probably) on your /dev/md devices
- Confirm /proc/mdstat, then mount all your RAID arrays in /mnt or somewhere
- Copy your filesystems across using cp -axu / /mnt/md4 or such
- Update your /etc/fstab
- Don't reboot! You've got more work to do...
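For concreteness, here's a condensed sketch of the steps above as shell commands. The device names and mount point (/dev/md4, /mnt/md4) are from my layout - adjust to yours, and triple-check every device name before running anything destructive:

    # Replicate sda's partition table onto the blank sdb
    sfdisk -d /dev/sda | sfdisk /dev/sdb
    # Set each sdb partition's type to 0xFD (e.g. via fdisk's 't' command)

    # Create a degraded RAID-1 array on each partition, old 0.90 superblock
    mdadm --create /dev/md4 --level=1 --metadata=0.90 --raid-devices=2 missing /dev/sdb4
    # ...repeat for the other partitions/arrays

    # Filesystem, verify, mount, copy
    mkfs.ext4 /dev/md4
    cat /proc/mdstat
    mkdir -p /mnt/md4 && mount /dev/md4 /mnt/md4
    cp -axu / /mnt/md4

    # Then update /etc/fstab - the copy under /mnt/md4! - to point at the /dev/md devices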
Configuring your Kernel
Make sure you've got the appropriate options enabled in your kernel. To be honest, I'm not sure exactly which options you need, but it's a safe bet you'll need support for your filesystem of choice (ext4?), your disk controller, and RAID-1 compiled in. (That is, not as modules.)
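If you build with CONFIG_IKCONFIG_PROC, you can at least sanity-check the running kernel. This is my guess at the minimal set of options, not an authoritative list:

    # Alternatively, grep the .config in your kernel source tree
    zcat /proc/config.gz | grep -E 'CONFIG_BLK_DEV_MD|CONFIG_MD_RAID1|CONFIG_MD_AUTODETECT|CONFIG_EXT4_FS'
    # You want '=y' (built in) for all of these, not '=m'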
You will also need to cater to a couple of other whims of the kernel, which I mentioned above but did not explain. In more detail:
- The kernel will attempt to auto-assemble RAID arrays when the partition type is 0xFD (a.k.a. RAID Auto-Assemble). If the partition type is not 0xFD - you will have a bad time.
- The kernel is only capable of assembling a RAID array if the superblock is version 0.90, also known as 'really really old' (ref). Apparently, letting the kernel auto-assemble your RAID is deprecated (ref) - but I opted to let it do so anyway to avoid mucking with an initramfs. A quick check for both of these is shown after this list.
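A quick way to check both of these against an existing array (device names from my setup):

    # Superblock version - should report 0.90 for kernel auto-assembly
    mdadm --examine /dev/sdb4 | grep Version

    # Partition type - should show 'fd' / 'Linux raid autodetect'
    fdisk -l /dev/sdb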
GRUB
This part is going to be tricky. This post, which is not a tutorial, was the most helpful piece of information I found. Roughly speaking, what I did was:
Made sure (as much as I could) that the GRUB core image had the modules I needed. I believe this is handled by adding GRUB_PRELOAD_MODULES="lvm diskfilter mdraid1x" to the end of /etc/default/grub. You should also make sure package.use contains sys-boot/grub:2 device-mapper mount sdl.
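In other words, something like the following (the file name under package.use is arbitrary, and re-emerging grub picks up the USE changes):

    echo 'GRUB_PRELOAD_MODULES="lvm diskfilter mdraid1x"' >> /etc/default/grub
    echo 'sys-boot/grub:2 device-mapper mount sdl' >> /etc/portage/package.use/grub
    emerge --oneshot sys-boot/grub:2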
Ran (and read) grub2-mkconfig along with grub-probe --verbose, and confirmed that it was detecting the current hard drive (sda) as the boot drive, when I wanted it to be the /dev/md devices. I copied it to grub2-mkconfig-fiddle and edited the following variables to be constants rather than queries to grub-probe:
GRUB_DEVICE="/dev/md4" #root filesystem GRUB_DEVICE_BOOT="/dev/md2" #boot partition
Generated the config file, and confirmed all the UUIDs by looking at ls /dev/disk/by-uuid and lsblk -o "NAME,UUID". I kept these around in a text file as it's easy to get confused about what UUIDs go where.
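For example, to build that crib sheet:

    ls -l /dev/disk/by-uuid > ~/uuid-map.txt
    lsblk -o "NAME,UUID" >> ~/uuid-map.txt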
Finally ran grub2-install /dev/sdb. (I only ran this on sdb, see below for why).
An interesting item I found was that after you set up your RAID array, and it is by necessity degraded, you will get error messages (ref). These error messages are the same ones you will get if your grub core image is missing the modules it needs:

    grub2-install: warning: Couldn't find physical volume `(null)'. Some modules may be missing from core image

How can you tell if the error messages are because your core image is incorrect or because the array is degraded? Who knows.
Now at this point, I shut the PC down and unplugged the SATA cable that went to sda. You'll probably need to use smartctl -i to get the serial numbers of the drives so you know which is which. I didn't have to do this, but my logic was: if I borked things badly enough, I could just plug it back in and be back at square one. This was actually quite helpful, as I was able to switch my system between a working and non-working state. I also leaned heavily on a LiveCD while I worked through all the kinks. (While this blog post is presented linearly, my process was anything but.)
Cleanup
After you're successfully booting into your RAID, you'll need to shutdown, connect the second hard drive (sda), and boot. Cross your fingers your machine boots off sdb - if not, try changing the priority in BIOS. Once you're booting into your RAID partitions, add the sda[1-4] partitions to the appropriate RAIDs, change the partition types on sda to 0xFD, and remember to clean up the legacy grub installation on sda with grub2-install /dev/sda.
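Roughly, that cleanup looks like this (partition numbers again from my layout):

    # Add each old sda partition into its array; the arrays will resync
    mdadm --add /dev/md4 /dev/sda4
    # ...repeat for the other three arrays

    # Change sda's partition types to 0xFD (fdisk's 't' command), then
    # replace the legacy GRUB in sda's MBR
    grub2-install /dev/sda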
Finally - what's the point of RAID if it doesn't actually work when a hard drive dies? Shut down your computer and unplug sdb. Boot, and make sure it works. Shut down, re-add sdb, wait for the RAID recovery to finish, then repeat for sda.
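You can watch the resync between pulls - don't yank a cable mid-recovery:

    cat /proc/mdstat                       # shows a progress bar while recovering
    mdadm --detail /dev/md4 | grep State   # should say 'clean' when it's done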
Other
I ran into a couple of other problems. The first was that I didn't set the superblock format correctly the first time. I didn't look too hard, but I couldn't find a way to change the format in-place without wiping the array. So I wiped it and re-copied the data - not a big problem in my situation.
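The re-do amounted to something like this - destructive, obviously, and assuming the same degraded one-disk layout as above:

    mdadm --stop /dev/md4
    mdadm --zero-superblock /dev/sdb4
    mdadm --create /dev/md4 --level=1 --metadata=0.90 --raid-devices=2 missing /dev/sdb4
    mkfs.ext4 /dev/md4
    # ...then re-copy the data as before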
Another was that after I got into the kernel, it hung at the line "Switched to clocksource tsc". This was, as far as I could tell, not related to the RAID stuff. If you google this, you'll find a ton of people having this problem, including a lot on Ubuntu. This thread solved the problem for me - I needed to enable CONFIG_DEVTMPFS_MOUNT in the kernel.
After I got it booted, something (probably my constant LiveCD-ing) had renamed my RAID arrays to md124, md125, md127, and md4. This annoyed me. I found a promising article, but alas it doesn't help when the superblock is 0.90 or when you need to rename the root filesystem while it's mounted. The answer is actually in this answer under "Option #2" - you need to update the 'Preferred Minor' value.
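For a 0.90 superblock, that means stopping the array and reassembling it under the minor number you want, which rewrites the Preferred Minor field. For the root array this has to happen from a LiveCD (device names here are illustrative):

    mdadm --stop /dev/md127
    mdadm --assemble /dev/md4 --update=super-minor /dev/sdb4 /dev/sda4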
Finally, during my testing, I pulled the cable to sdb and tried booting off just sda. I got a kernel panic in native_smp_send_reschedule - it took me a bit to remember what caused this error, as I had definitely seen it before in my fiddling. This is the error you may get if your partition types are not set to 0xFD. After I fixed that, I was able to pull either cable and still boot.
After all that, I was (finally) good to go.