atp

Atp's external memory

btrfs recovery again

An update to the previous article on btrfs recovery.

Generally, recovery for btrfs looks like this:

Step 1 - Check your backups. Beyond the first few recovery steps it's likely that you're going to lose data, so decide now whether you want to just cut to the chase and restore from backups.

Step 2 - Try a normal mount and look at dmesg.

mount -t btrfs /dev/sdc /export/btrfs

If it didn't mount, do you have a missing disk? If so, mount with the degraded option:

mount -o degraded /dev/sdc /export/btrfs

btrfs sometimes doesn't deal well with a dead disk on boot.

If you see a message like;

BTRFS: couldn't mount because of unsupported optional features (40).

in your dmesg then you've accidentally booted into an older kernel, so reboot into the newer one. If you're running btrfs then you'll be on a recent kernel, assuming you've got a shred of sense.

If the file system has mounted at this point, move on to the scrub step below.

If you see lots of transid messages in the dmesg log, panic mildly - it's likely that the next step will fix them.

If the file system is still not mounted, check the obvious things: the mount point exists, the kernel version is the one you expect, and the device name you're using actually exists and is part of the btrfs volume.
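The obvious checks are easy to script. This is only a sketch, assuming a POSIX shell. The function name check_mount_prereqs is made up for illustration, and it only reports problems; it doesn't try to mount anything.

```shell
# Hypothetical helper (not a btrfs tool): report the obvious problems
# before blaming btrfs. Prints one line per problem found and returns
# non-zero if there was at least one.
check_mount_prereqs() {
    dev=$1
    mnt=$2
    rc=0
    if [ ! -b "$dev" ]; then
        echo "device missing or not a block device: $dev"
        rc=1
    fi
    if [ ! -d "$mnt" ]; then
        echo "mount point missing: $mnt"
        rc=1
    fi
    # The kernel version is printed for a human to judge; no threshold is assumed.
    echo "running kernel: $(uname -r)"
    return $rc
}

# Example: check_mount_prereqs /dev/sdc /export/btrfs
```

It says nothing about whether the device is actually part of the btrfs volume; that's what btrfs fi show is for.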

btrfs fi show should give you something like this.

$ btrfs fi show
Label: archive  uuid: da455304-d586-4213-97a1-beee2deac8bc
    Total devices 2 FS bytes used 424.80GiB
    devid    1 size 465.76GiB used 465.76GiB path /dev/sdb
    devid    2 size 465.76GiB used 465.76GiB path /dev/sdc

Btrfs v3.12

If that doesn't work, do a btrfs device scan:

$ btrfs device scan
Scanning for Btrfs filesystems

And try again. If the disks don't show, you're out of luck at this point. Check the hardware.
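To spot a missing disk without squinting, you can compare the "Total devices" figure in the btrfs fi show output with the number of devid lines listed. A rough sketch, assuming the output layout shown above; fi_show_check is a name I made up, and if several filesystems are listed it simply adds them all together.

```shell
# Hypothetical helper: read `btrfs fi show` output on stdin and compare the
# "Total devices N" figure with the number of devid lines actually listed.
# Counts are summed across every filesystem in the output.
fi_show_check() {
    awk '
        /Total devices/ { expected += $3 }
        $1 == "devid"   { found++ }
        END {
            if (expected == 0) {
                print "no btrfs filesystem found"
                exit 2
            }
            if (found + 0 < expected) {
                print "missing devices: " (expected - found)
                exit 1
            }
            print "all " expected " devices present"
        }'
}

# Example: btrfs fi show | fi_show_check
```

If it reports missing devices, that's the case for the degraded mount option mentioned earlier.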

Assuming you have the disks visible and all other things checked out, try a recovery mount.

$ mount -t btrfs -o recovery,nospace_cache /dev/sdc /export/btrfs
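One thing to watch on newer kernels: as far as I know the recovery option was later renamed usebackuproot. The sketch below picks the name from the kernel release. The 4.6 cut-off is my assumption from memory of the btrfs mount option documentation, not something from this article, and recovery_opt is a made-up name.

```shell
# Hypothetical helper: print the mount option that asks btrfs to fall back
# to an older tree root.
# Assumption (not from this article): kernels 4.6 and later call the option
# usebackuproot; older kernels, like the ones used here, call it recovery.
recovery_opt() {
    release=${1:-$(uname -r)}       # e.g. 5.15.0-91-generic
    major=${release%%.*}            # text before the first dot
    rest=${release#*.}              # text after the first dot
    minor=${rest%%[!0-9]*}          # leading digits of what is left
    if [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 6 ]; }; then
        echo usebackuproot
    else
        echo recovery
    fi
}

# Example:
# mount -t btrfs -o "$(recovery_opt),nospace_cache" /dev/sdc /export/btrfs
```

If mount rejects the option, check the mount option documentation for your kernel rather than trusting the cut-off above.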

If you still don't get the disks mounted, restore from backups if you have them, or dig elsewhere on the internet; there are plenty of other pages like this. In desperation, you may want to try the btrfsck --repair command described in the last step.

If you see 

BTRFS: failed to read log tree
BTRFS: open_ctree failed

then try zeroing the log. It's gone anyhow, so it's likely you've lost data:

# btrfs-zero-log /dev/vg_os/backups 
parent transid verify failed on 13180840902656 wanted 128823 found 128592
parent transid verify failed on 13180840902656 wanted 128823 found 128592
parent transid verify failed on 13180840902656 wanted 128823 found 128592
parent transid verify failed on 13180840902656 wanted 128823 found 128592
Ignoring transid failure

Then retry the recovery mount:

# mount -t btrfs -o recovery,nospace_cache /dev/vg_os/backups /mnt/backups
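The dmesg messages covered so far can be triaged with a few lines of shell. This is just a sketch of the decisions described above: the message strings are copied from this page, while the function name btrfs_dmesg_hints and the wording of the hints are mine.

```shell
# Hypothetical helper: read dmesg text on stdin and print one hint for each
# of the btrfs messages discussed on this page. Prints nothing if none match.
btrfs_dmesg_hints() {
    log=$(cat)
    case $log in
        *"unsupported optional features"*)
            echo "older kernel booted: reboot into the newer one" ;;
    esac
    case $log in
        *"failed to read log tree"*)
            echo "log tree unreadable: consider btrfs-zero-log and expect some data loss" ;;
    esac
    case $log in
        *"parent transid verify failed"*)
            echo "transid errors: get it mounted, then scrub" ;;
    esac
}

# Example: dmesg | btrfs_dmesg_hints
```

It only looks for strings, so treat the hints as a reminder of what to read up on, not as a diagnosis.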

Assuming you got the filesystem mounted by some combination of the above, carry on.

Step 3 - Run a scrub.

btrfs scrub start /export/btrfs

You can monitor it with:

btrfs scrub status /export/btrfs

This should go through and clean up checksum errors and correct anything correctable.

I've found this deals with the majority of 'soft' errors that look so scary in dmesg.
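A scrub on a few TiB takes hours, so it's handy to poll the status rather than keep typing the command. A minimal sketch, assuming the older status format shown further down this page, where an unfinished scrub reports "running for N seconds"; newer btrfs-progs may word this differently. scrub_running is a made-up name.

```shell
# Hypothetical helper: succeed while `btrfs scrub status` output (on stdin)
# still reports a running scrub.
# Assumption: the status text contains "running for N seconds" until the
# scrub is done, as in the sample output on this page.
scrub_running() {
    grep -q 'running for [0-9]* seconds'
}

# Example: check once a minute, then show the final summary.
# while btrfs scrub status /export/btrfs | scrub_running; do sleep 60; done
# btrfs scrub status /export/btrfs
```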

However, you may come across uncorrectable errors. They look like this:

scrub status for 66bb5f88-2c63-4d8a-83a4-e9b606571d1f
    scrub started at Sat Jan  3 11:19:13 2015, running for 17640 seconds
    total bytes scrubbed: 5.03TiB with 16 errors
    error details: csum=16
    corrected errors: 0, uncorrectable errors: 16, unverified errors: 0

with dmesg errors that may look like:

[12656.283626] BTRFS: bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 12, gen 0
[12656.284355] BTRFS: unable to fixup (regular) error at logical 9415517843456
on dev /dev/sdd
[12656.421257] BTRFS: checksum error at logical 9415517847552 on dev /dev/sdd,
sector 2587456608, root 5, inode 7383923, offset 4325376,
 length 4096, links 1 (path: mac_timemachine/
MacBook Pro.sparsebundle/bands/6262)
[12656.421273] BTRFS: bdev /dev/sdd errs: wr 0, rd 0, flush 0, corrupt 13, gen 0

(linebreaks added for readability)
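The checksum error lines name the damaged file, which tells you what to pull back from backup. Here's a sketch that lists each affected path once. It assumes the real, unwrapped dmesg format (one message per line, ending in "(path: ...)"), and csum_error_paths is a made-up name.

```shell
# Hypothetical helper: read dmesg text on stdin and list each file named in
# a btrfs checksum error, once.
# Assumption: each message is on a single line, unlike the wrapped sample
# above, and ends with "(path: <file>)".
csum_error_paths() {
    sed -n 's/.*BTRFS: checksum error at logical .*(path: \(.*\))$/\1/p' | sort -u
}

# Example: dmesg | csum_error_paths
```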

After one of my "normal" btrfs failures I tend to get a lot of corrected errors in the scrub status. The uncorrectable ones are the problematic ones.
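Since the uncorrectable count is the number that matters, it can be pulled out of the scrub status on its own. A small sketch, assuming the summary line format shown above; scrub_uncorrectable is a made-up name.

```shell
# Hypothetical helper: read `btrfs scrub status` output on stdin and print
# the number of uncorrectable errors. Prints nothing if the summary line
# ("corrected errors: N, uncorrectable errors: N, ...") is not present.
scrub_uncorrectable() {
    sed -n 's/.*uncorrectable errors: \([0-9][0-9]*\).*/\1/p'
}

# Example:
# n=$(btrfs scrub status /export/btrfs | scrub_uncorrectable)
# [ "${n:-0}" -gt 0 ] && echo "$n uncorrectable errors"
```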

If you get these, try a recovery mount as listed above and a second scrub.

Failing that

Step 4 - Run btrfsck --repair

At this point you've pretty much given up hope. It may work, or it may not. It worked for me once. But not all my files came back. I hope you have backups.

btrfsck --repair /dev/sdc

Before you try that command, read this page;

https://btrfs.wiki.kernel.org/index.php/Btrfsck

It lists all the things you should do first.

Here's what it looks like:

Fixed 0 roots.
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
enabling repair mode
Checking filesystem on /dev/sdc
UUID: 66bb5f88-2c63-4d8a-83a4-e9b606571d1f
cache and super generation don't match, space cache will be invalidated
found 1755560853731 bytes used err is 0
total csum bytes: 2955891292
total tree bytes: 13966262272
total fs tree bytes: 10553098240
total extent tree bytes: 266878976
btree space waste bytes: 1691246361
file data blocks allocated: 3055467118592
 referenced 3025937235968
Btrfs v3.17
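In that wall of numbers the first thing I look at is the "err is" figure on the "found ... bytes used" line. Here's a sketch that pulls it out. My assumption, not something this output states, is that 0 means the checker finished without recording an error. The line format is the v3.17 one shown above, and btrfsck_err is a made-up name.

```shell
# Hypothetical helper: read btrfsck output on stdin and print the number
# after "err is" on the "found N bytes used" line.
# Assumption: 0 means the checker did not record an error.
btrfsck_err() {
    sed -n 's/^found [0-9][0-9]* bytes used err is \(-\{0,1\}[0-9][0-9]*\)$/\1/p'
}

# Example, on output saved to a file (the file name is only an example):
# btrfsck_err < btrfsck.log
```

A zero here doesn't mean all your files came back, only that the checker had nothing left to complain about.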

This page may get updated as I encounter more. It probably should be promoted from a blog post to a page in and of itself. I stripped out the sarcasm of the previous article as I'm less annoyed now, so consider that one obsolete.

Written by atp

Monday 13 April 2015 at 1:38 pm

