qemu-img shipped with Fedora corrupts qcow2 disk images

Up to and including F28 there were no problems using qemu-img to compress disk images.

On both F29 and F30, using qemu-img to compress a qcow2 disk image results in severe virtual disk corruption, visible as virtual machines dying a slow death due to IO errors and ‘qemu-img check’ reporting lots of errors on the disk image.

How to repeat:

  1. take a qcow2 disk image that ‘qemu-img check’ reports has no issues (‘qemu-img check diskname.qcow2’ to ensure no errors)
  2. use qemu-img to create a re-compressed image of that virtual disk (‘qemu-img convert -c -f qcow2 -O qcow2 diskname.qcow2 diskname_compressed.qcow2 -p’)
  3. and check if the new qcow2 file is corrupt (‘qemu-img check diskname_compressed.qcow2’)
  4. result: the new qcow2 virtual disk is unusable
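
As a rough sketch, the steps above could be scripted as follows. The function name, its arguments and the QEMU_IMG override variable are illustrative only and are not part of qemu-img; only the ‘check’ and ‘convert’ invocations come from the steps above.

```shell
#!/bin/bash
# Allow a different qemu-img binary to be tested by overriding QEMU_IMG
# (an assumption made for this sketch, handy when comparing versions).
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# repro_compress SRC DST  (hypothetical helper name)
# Returns 0 if the compressed copy passes 'qemu-img check', 1 if it does not,
# 2 if the source image was not clean, 3 if the conversion itself failed.
repro_compress() {
    local src="$1" dst="$2"

    # Step 1: the source image must be clean before starting.
    "$QEMU_IMG" check "$src" || { echo "source image is not clean" >&2; return 2; }

    # Step 2: write a re-compressed copy (-c compress, -p show progress).
    "$QEMU_IMG" convert -c -p -f qcow2 -O qcow2 "$src" "$dst" || return 3

    # Step 3: check the new image. A non-zero exit status covers errors,
    # leaked clusters, and a check that could not be completed.
    if "$QEMU_IMG" check "$dst"; then
        echo "compressed image passed qemu-img check"
    else
        echo "compressed image FAILED qemu-img check" >&2
        return 1
    fi
}
```

Usage would be along the lines of ‘repro_compress diskname.qcow2 diskname_compressed.qcow2’, run against a copy of the image rather than the only good one.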

Unfortunately disk compression is essential; it is the reason people use qcow2 format disk images. One reason is the massive disk space saving: in my own use a 100GB virtual disk uses only 26GB of disk space after compression.

Another reason compression is needed is that qcow2 images grow; for example the 100GB image referred to above is only a 70GB disk image. qcow2 disks are ‘copy on write’ (that's the ‘cow’ part of the qcow2 name), which simply means that if you edit or replace a 1GB file an additional 1GB of space is used by the qcow2 disk image on the host filesystem. Over time this means the qcow2 virtual disk image can consume much more space on the host filesystem than the actual disk size allocated to the qcow2 disk, so a 70GB qcow2 disk can easily use 100GB of host filesystem space. Compression to reclaim space is important.
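
To see how far an image has grown, ‘qemu-img info’ prints both the virtual size the guest sees and the disk size actually allocated on the host. The small helper below is only a sketch; the function name and the QEMU_IMG override variable are made up for illustration.

```shell
#!/bin/bash
# Allow a different qemu-img binary to be used by overriding QEMU_IMG
# (an assumption made for this sketch).
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# image_sizes IMAGE  (hypothetical helper name)
# Prints only the "virtual size:" and "disk size:" lines of 'qemu-img info',
# i.e. what the guest sees versus what the image uses on the host.
image_sizes() {
    local img="$1"
    "$QEMU_IMG" info "$img" | grep -E '^(virtual size|disk size):'
}
```

Running ‘image_sizes diskname.qcow2’ before and after shrinking an image gives a quick before/after comparison.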

I found the options to try to repair a corrupt qcow2 disk at http://www.geekpills.com/virtulization/repair-qcow2-disk; after running the repair against the corrupt disk image, ‘qemu-img check’ reported no errors in the image. However, the virtual machine using the ‘repaired’ disk image reported IO errors within a few hours of starting, even though ‘qemu-img check’ could find no further errors in the virtual disk.
Yes, filesystem checks were run by the guest OS against the disk on OS boot before filesystems were mounted (using the kernel boot parameter I posted on earlier), plus on OS boot from the /forcefsck file. No errors were found in the filesystems, so it is the disk structure that is corrupted by using qemu-img to compress a qcow2 file.

Workaround

The only workaround I could find to reclaim space was to copy the image between formats without compression, as below:

  1. take a qcow2 disk image that ‘qemu-img check’ reports has no issues (‘qemu-img check diskname.qcow2’ to ensure no errors)
  2. convert the qcow2 file to a RAW file format (‘qemu-img convert -f qcow2 -O raw diskname.qcow2 diskname.raw -p’)
  3. convert the RAW file back to qcow2 format (‘qemu-img convert -f raw -O qcow2 diskname.raw disknamenew.qcow2 -p’)
  4. check the new qcow2 file to ensure there are no errors (‘qemu-img check disknamenew.qcow2’)
  5. and the file should have no errors; I have been running a VM with a qcow2 image shrunk this way for 4hrs without any IO errors (they normally occur within a few hours with a corrupt image)
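
The workaround steps above could be wrapped up roughly as below. This is a sketch only: the function name, argument order and the QEMU_IMG override variable are illustrative. Note that the intermediate RAW file is the full size of the virtual disk, so the host needs room for it.

```shell
#!/bin/bash
# Allow a different qemu-img binary to be used by overriding QEMU_IMG
# (an assumption made for this sketch).
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# reclaim_via_raw SRC_QCOW2 TMP_RAW DST_QCOW2  (hypothetical helper name)
# Returns 0 if the new image passes 'qemu-img check', 1 if it does not,
# 2 if the source image was not clean, 3 if a conversion failed.
reclaim_via_raw() {
    local src="$1" raw="$2" dst="$3"

    # Step 1: only start from an image that checks clean.
    "$QEMU_IMG" check "$src" || { echo "source image is not clean" >&2; return 2; }

    # Step 2: qcow2 to RAW, no compression.
    "$QEMU_IMG" convert -p -f qcow2 -O raw "$src" "$raw" || return 3

    # Step 3: RAW back to a fresh qcow2, again without -c.
    "$QEMU_IMG" convert -p -f raw -O qcow2 "$raw" "$dst" || return 3

    # Step 4: check the new qcow2 image.
    "$QEMU_IMG" check "$dst" || { echo "new image failed qemu-img check" >&2; return 1; }
    echo "new image passed qemu-img check"
}
```

For example ‘reclaim_via_raw diskname.qcow2 diskname.raw disknamenew.qcow2’; the RAW file can be deleted once the new image has proven itself in a running VM.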

The only issue with this approach of course is that there is no compression; you can only reclaim space used by the copy-on-write portion of the disk image, so this approach will only reclaim space if you have been working with some huge files. Below is what I achieved using this method; I shaved around 50GB off a disk image (compression saved around 75GB, but as noted from F29 onward compressed disks are corrupted and unusable).

[root@vmhost1 VM_disks]# ls -ltr *osprey*
-rw-r--r--. 1 qemu qemu 103274446848 Nov 17 14:53 osprey_good.qcow2
-rw-r--r--. 1 qemu qemu 107374182400 Nov 18 11:48 osprey.raw
-rw-r--r--. 1 qemu qemu  47915925504 Nov 18 19:43 osprey.qcow2                     
-rw-r--r--. 1 root root  26832449024 Nov 18 19:36 osprey_compressed.qcow2          # corrupted
[root@vmhost1 VM_disks]# 

Update 18 Nov 2019

The Fedora 31 repositories have been updated with a new version of the qemu-img package.

  Upgrading        : qemu-img-2:4.1.0-6.fc31.x86_64      # new version is even worse
  Cleanup          : qemu-img-2:4.1.0-5.fc31.x86_64      # definitely buggy
  Running scriptlet: qemu-img-2:4.1.0-5.fc31.x86_64   
  Verifying        : qemu-img-2:4.1.0-6.fc31.x86_64  
  Verifying        : qemu-img-2:4.1.0-5.fc31.x86_64  

Upgraded:
  qemu-img-2:4.1.0-6.fc31.x86_64                                                                                                            

The new version of qemu-img creates a compressed qcow2 file that ‘qemu-img check’ reports no errors on. However it is worse than the version it replaced: compressed disks output by the latest version start causing IO errors almost immediately, and after the first boot it is impossible to boot off the disk again. The major change in the new version seems to be that ‘qemu-img check’ now reports no errors even when the disk image is a mess.
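
Since a clean ‘qemu-img check’ evidently cannot be trusted on its own, one extra test that could be tried (it was not part of the testing described here, and whether it would catch this particular corruption is unknown) is ‘qemu-img compare’, which compares the guest-visible contents of two images and exits with 0 if they are identical and 1 if they differ. The function name and the QEMU_IMG override variable below are illustrative only.

```shell
#!/bin/bash
# Allow a different qemu-img binary to be used by overriding QEMU_IMG
# (an assumption made for this sketch).
QEMU_IMG="${QEMU_IMG:-qemu-img}"

# verify_same_content ORIGINAL COPY  (hypothetical helper name)
# Compares the guest-visible data of two qcow2 images and passes the
# 'qemu-img compare' exit status back to the caller.
verify_same_content() {
    local orig="$1" copy="$2" rc=0

    # -f is the format of the first image, -F the format of the second.
    "$QEMU_IMG" compare -f qcow2 -F qcow2 "$orig" "$copy" || rc=$?

    case "$rc" in
        0) echo "guest-visible contents are identical" ;;
        1) echo "guest-visible contents differ" ;;
        *) echo "compare could not be completed (exit status $rc)" ;;
    esac
    return "$rc"
}
```

For example ‘verify_same_content diskname.qcow2 diskname_compressed.qcow2’, run while neither image is in use by a VM.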

So it is still not possible to safely compress qcow2 disk images using qemu-img. My workaround is the only way I can manage my files for now.

I will investigate how this utility works under CentOS 7 when I can get a spare physical machine (physical, as creating a VM with a 300GB virtual disk just to try to compress a 100GB virtual disk is a waste of space).

About mark

At work, been working on Tandems for around 30yrs (programming + sysadmin), plus AIX and Solaris sysadmin also thrown in during the last 20yrs; also about 5yrs on MVS (mainly operations and automation but also smp/e work). At home I have been using linux for decades. Programming background is commercially in TAL/COBOL/SCOBOL/C(Tandem); 370 assembler(MVS); C, perl and shell scripting in *nix; and Microsoft Macro Assembler(windows).