Help! I was running apt-get, and now it keeps telling me, Unknown error 530!
If you're running Debian GNU/Linux on a VPS (such as virtuozzo) and you like living on the edge (i.e., you run unstable), then you may have been bitten by bug #454638 after the latest libc6 upgrade (glibc 2.7). That is to say, you were happily running apt-get upgrade and all of a sudden, dpkg aborted during the upgrade with Unknown error 530. Subsequently, commands like ls /usr also abort with Unknown error 530.
Why is this happening??
First, an explanation. This bug is caused by the modified Linux 2.6.9 kernel used for virtualization systems such as virtuozzo, which is incompatible with the new O_CLOEXEC bit introduced in later official versions of the kernel. Starting with 2.7, glibc uses this bit all over the library code, and guess what happens when libc sets this bit in the open syscall behind opendir, and the kernel misinterprets it to mean something else? That's right, you get error 530.
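If you're not sure which side of the 2.7 line your libc6 is on, here is a small sketch for checking. The function name glibc_at_least_2_7 is made up for illustration; getconf GNU_LIBC_VERSION is the standard glibc way to print a string such as "glibc 2.7", and I'm assuming it is available on your box.

```shell
# Hypothetical helper: succeed if a version string such as "glibc 2.7"
# names glibc 2.7 or later (the versions that use O_CLOEXEC).
glibc_at_least_2_7() {
    version=${1##* }            # drop a leading "glibc " if present
    major=${version%%.*}        # "2.6.1" -> "2"
    rest=${version#*.}          # "2.6.1" -> "6.1"
    minor=${rest%%[!0-9]*}      # "6.1"   -> "6"
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 7 ]; }
}
```

You would call it as: glibc_at_least_2_7 "$(getconf GNU_LIBC_VERSION)" && echo "this libc uses O_CLOEXEC"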
So does this mean my system is hosed?
First of all, don't panic!
And most of all, don't attempt to reboot the server! There is hope yet. Sergey Vasilyev has told me that apparently new SSH connections still work, so if you have closed your SSH session to the box, all hope is not lost yet.
As far as I know, the primary way this incompatibility shows up is that opendir() will die with Unknown error 530 when opening anything other than the current working directory. Don't worry, your filesystem is not hosed! It's just that programs can't read directories other than the current directory.
BUT, this means that programs that don't try to read a directory other than the current one should still work! This also means that as long as pathnames end in a file, rather than a directory, utilities like cp, mv, and friends should still work! Furthermore, your current shell was started before the upgrade, so it should still be running with the old version of libc6, which doesn't have this problem. So your shell is still able to reference other directories. In particular, you can still cd around your filesystem.
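Since ls on another directory is out of action, a poor man's replacement built only from shell built-ins can help you look around. This is just a sketch: list_dir is a name I made up, and it assumes your shell behaves as described above (cd works, and globbing in the current directory works).

```shell
# Hypothetical helper: list a directory using only shell built-ins
# (cd and globbing), without running ls. Runs in a subshell so the
# caller's working directory is left alone.
list_dir() (
    cd "$1" || exit 1
    for entry in * .[!.]*; do
        # An unmatched pattern stays literal, so skip names that do not exist.
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
)
```

For example, list_dir /var/cache/apt/archives would show whether an older libc6 .deb is still lying around.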
How do I rescue my system?
It's not that hard, once you realize that your filesystem is still intact, and that most programs will still run as long as they don't have to scan the contents of a directory other than the current working directory. Unfortunately, this limitation means dpkg, apt-get, and friends don't work anymore. So, our first task is to get the system back into a state where the packaging system is back up again.
So here's what you do:
Download the .deb for the previous version of libc6 that you were running prior to the upgrade. Or, if you're lucky, you may find a copy of an older, working version in /var/cache/apt/archives. This should be some 2.6 version, since the O_CLOEXEC stuff was introduced in 2.7. (If you can't find it, you might want my copy of the file: libc6_2.6.1-1+b1_i386.deb. This is only for i386, though, sorry.)
Go to a temporary directory like /tmp, and do:
ar x libc6_2.6.1-1+b1_i386.deb
This should extract the files data.tar.gz, control.tar.gz, and debian-binary. If you look at data.tar.gz, you'll see that it refers to root directory paths, and contains the library .so's. What you need to do is to replace your system's .so's with these older versions that don't have the O_CLOEXEC incompatibility.
So, cd / and then tar zxf tmp/data.tar.gz to extract the library .so's into your system. WARNING: this will overwrite existing library files, so you may want to tar ztf ... first and make backup copies of the files that will be overwritten, just in case this step doesn't work.
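The backup suggested in the warning could be done along these lines. Treat it as a sketch: backup_overwritten and its three arguments are my own invention, it relies on tar's -C and -T options, and I'm assuming tar can still read files that are named explicitly, as described above.

```shell
# Hypothetical helper: save every existing file under ROOT that
# extracting ARCHIVE there would overwrite into the tarball BACKUP.
backup_overwritten() {
    archive=$1 root=$2 backup=$3
    list=$(mktemp) || return 1
    # Keep only archive members that already exist as files or symlinks.
    tar ztf "$archive" | while IFS= read -r member; do
        if [ -f "$root/$member" ] || [ -L "$root/$member" ]; then
            printf '%s\n' "$member"
        fi
    done > "$list"
    # Archive the originals, with paths relative to ROOT.
    tar cf "$backup" -C "$root" -T "$list"
    status=$?
    rm -f "$list"
    return $status
}
```

Before extracting, you would run something like backup_overwritten /tmp/data.tar.gz / /tmp/libc6-backup.tar (use absolute paths). If things go wrong, extracting the backup from / puts the previous files back.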
Now dpkg and friends should be able to work again. But they might complain that your system is in a bad state ('cos you have overwritten files manually). So you should now tell dpkg to properly install the package you just manually "installed":
dpkg -i /tmp/libc6_2.6.1-1+b1_i386.deb
This should fix up the internal package database a bit. Next, apt-get may complain that you have broken dependencies. So you need to download the old versions of libc6-dev (if you have that installed) and locales, and use dpkg -i to install them. Use apt-get -f install to find any other broken dependencies that you may need to fix manually.
This should get your system back up and running. Now you want to make sure apt-get doesn't try to upgrade libc6 into an incompatible version again, so do this:
echo libc6 hold | dpkg --set-selections
This will put libc6 on hold, so that apt-get will not upgrade it unless you tell it to.
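To double-check that the hold actually took, you can look at what dpkg --get-selections reports for libc6. Here is a sketch of a small filter for that; is_on_hold is a made-up name, and it simply reads the usual "package state" lines on stdin.

```shell
# Hypothetical helper: read `dpkg --get-selections` output on stdin and
# succeed only if PACKAGE is marked "hold".
is_on_hold() {
    awk -v pkg="$1" '
        # Match "libc6" as well as an architecture-qualified "libc6:i386".
        ($1 == pkg || index($1, pkg ":") == 1) && $2 == "hold" { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

Usage would be: dpkg --get-selections libc6 | is_on_hold libc6 && echo "libc6 is on hold"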
OK, the system's back up, but now I can't upgrade anything.
Now, although your system is back up, you can no longer upgrade very many packages (without breaking your system again), because most of them now depend on the incompatible new version of libc6. One way to solve this is to download the sources of the new libc6, disable the use of the O_CLOEXEC bit, and build a custom libc6 package. You'll probably need to do this on your local PC, since you won't be able to install some of libc6's build dependencies on your server due to conflicting dependencies.
WARNING: This is quite complicated, and quite hackish. Follow these steps at your own risk. Note that you'll need LOTS of free RAM, and lots of disk space, in order to build libc6.
First, on a PC that doesn't have this incompatibility, install the following packages: build-essential, dpkg-dev. If you don't have gcc and friends installed yet, now would be a good time to install them. You'll also need the build dependencies of libc6:
apt-get build-dep libc6
Then make a subdirectory under /usr/src, say, /usr/src/libc6, cd into it, and run:
apt-get source libc6
This will download and unpack the libc6 sources in glibc-2.7.
Now, download the patch file that hacks the O_CLOEXEC bit away, cloexec-hack.patch, and save it in the directory that contains glibc-2.7.
Apply this patch:
cd glibc-2.7
patch -p0 < ../cloexec-hack.patch
Hopefully the patch applies cleanly. If not, you should probably not proceed any further (you might end up with an even more broken system if you try to install a broken libc6 package).
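If you'd rather find out whether the patch applies cleanly before it touches anything, GNU patch has a --dry-run option. A minimal sketch, assuming GNU patch and a made-up wrapper name apply_if_clean:

```shell
# Hypothetical helper: apply PATCH_FILE with -p0 only if a dry run
# succeeds, so a failing patch leaves the source tree untouched.
apply_if_clean() {
    patch_file=$1
    if patch -p0 --dry-run < "$patch_file" >/dev/null; then
        patch -p0 < "$patch_file"
    else
        echo "patch does not apply cleanly; leaving the tree untouched" >&2
        return 1
    fi
}
```

From inside glibc-2.7 you would run apply_if_clean ../cloexec-hack.patch instead of calling patch directly.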
Now, build libc6 by typing:
dpkg-buildpackage
This will take a (long) while to build. After it's all done, the parent directory (i.e., the parent of the glibc-2.7 directory) will have a bunch of .deb's, including the file for the libc6 package. That's the only file you need right now (and in fact, it's probably a bad idea to use the hacked libc6-dev since we mangled the meaning of O_CLOEXEC to get the library to stop using it).
So, install the hacked libc6 first, using dpkg -i. You may have to force it to do this by adding --force-depends, since it may complain that libc6-dev and a whole bunch of other stuff are not compatible with the new library. After you install the new library, make sure dpkg still runs, and that there are no error 530's happening anywhere. Then run apt-get -f install to upgrade the packages that need to be upgraded to work with the new libc6.
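A quick way to check for lingering error 530's is to list a few directories other than the current one, since that is exactly what was failing. A sketch, with check_dirs being a name I made up:

```shell
# Hypothetical helper: try to list each directory given as an argument
# and report which ones can be read. Returns non-zero if any fail.
check_dirs() {
    failed=0
    for dir in "$@"; do
        if ls "$dir" >/dev/null 2>&1; then
            echo "ok: $dir"
        else
            echo "FAILED: $dir"
            failed=1
        fi
    done
    return $failed
}
```

For example: check_dirs /usr /etc /var/lib/dpkg. Run it from a freshly started shell, so that the new library is the one being exercised.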
If all goes well at this point, you should be able to upgrade the rest of your system again.
Note that you'll have to repeat this process every time you want to upgrade libc6 again; do not ever install the official version of the package until the virtuozzo guys switch to a saner kernel, since that will just bring back the O_CLOEXEC woes all over again.
Hope this helps my fellow Linuxers!
Pre-built libc6 packages
To help people running Debian unstable on virtuozzo VPS's, I've decided to make available the custom libc6 packages I built using the above hack. This will help you upgrade the rest of your system without fearing nasty breakage or having to build libc6 yourself (it requires ~2GB of disk space to build on x86, which is not very nice if your VPS has limited disk space).
DISCLAIMER: these packages are NOT official Debian packages, and I do NOT guarantee that they will not cause more problems on your server. I'm providing these packages just for your convenience; I am not responsible for any kind of damages whatsoever that result from using them. You are responsible for backing up your system files before installing these packages. Use AT YOUR OWN RISK.
- libc6_2.7-5_i386.deb (built 2 Jan 2008, on my home PC, sources obtained from the Debian archive)
- libc6_2.7-8_i386.deb (built 27 Feb 2008, on my virtuozzo VPS server, sources obtained from the Debian archive)
Note: after installing this package, you may need to run apt-get -f install to install other packages that strictly depend on a specific version of libc6, such as libc6-dev.
End notes
- Somebody should seriously lobby the virtuozzo guys to clue up and stop using a broken incompatible kernel.
- The patch for hacking the O_CLOEXEC bit away can probably be improved. The current method of disabling it may have unintended consequences, especially if the library ever compares the value of O_CLOEXEC. But since this is just a quick hack to work around the incompatibility, maybe there's no need to improve it. The real solution is to get the virtuozzo guys to stop using a broken kernel.
- What exactly does O_CLOEXEC do, and why do we need it in the first place?!
Answer: it's just an additional flag available to the open syscall, which indicates that the opened file descriptor should be closed if the current process is replaced by an exec call. This flag was added to the official kernel after virtuozzo "forked" their version of the kernel. There is nothing inherently incompatible between it and virtuozzo; the trouble is that the virtuozzo kernel has added its own extension to the open flags, and it so happened that O_CLOEXEC was assigned the same bit as one of the virtuozzo kernel's extensions. Hence, glibc thinks that it's asking for the file descriptor to be closed upon exec, but the virtuozzo kernel thinks it's asking for something else (not legal in the contexts that glibc uses it in), and so returns an error. (That's where Unknown error 530 comes from.)
Credits
Thanks to Sergey Vasilyev for pointing out that new SSH connections to the box still work, and for the tip to look in /var/cache/apt/archives for older .deb files.