Friday, February 15, 2013

Collector's Cards: rpm2cpio

Once upon a time, at a job far, far away, we used to refer to bugs as "collector's cards".  Here's an example of why...

It all started innocently enough.  I wanted to crack open an RPM to inspect the contents without actually installing the RPM on a system.  The way I normally do this is with rpm2cpio:
rpm2cpio <RPM> | cpio -id
This takes the RPM payload -- which is in CPIO format -- and dumps it to standard output for the cpio utility to extract.  Then you can go spelunking through the extracted files for whatever you might be looking for.  (This is also a great rabbit to keep in your sysadmin hat for recovering from any number of system failure scenarios, BTW.)

This simple command normally works great.  That is, until I tried it on a CentOS 6 RPM on the CentOS 5 system that still manages our internal mirrored content.

When I tried it this time, I consistently got:
cpio: premature end of archive
It didn't matter whether I worked on the streamed output or on a file I had redirected the output into (I suspected a read error might have caused a failure that was silently swallowed by the pipe).  The rpm2cpio step seemed to run fine; it's just that what was supposed to be cpio content was not decipherable:

$ file output.cpio
output.cpio: data

Drawing on a bit of knowledge about how RPM handles its payload, I deduced that the archive was compressed with something rpm2cpio was not handling correctly.  I tried the usual suspects (gzip, bzip2, uncompress, zip) with no success.  The file command could not identify it either, but its header contained this:

$ od -a output.cpio | head -1
0000000 } 7 z X Z nul nul nl a { ff ! stx nul ! soh
Hmmm..  "7z" "XZ".  I've heard of the 7-Zip compression tool, I remember something about "xz" compression being more rsync friendly, and I remember talk of RPM using that compression format to make Fedora content more efficient to mirror.
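A side note on that od line: od -a shows only the low seven bits of each byte, so the leading "}" (0x7D) is really 0xFD with its high bit masked off.  Read as hex, the first six bytes are FD 37 7A 58 5A 00, which is the xz format's own stream header magic; the "7z" is part of xz's signature rather than a sign of a 7-Zip archive.  The sketch below checks for that signature.  has_xz_magic is a hypothetical helper name, and it assumes an od that supports the POSIX -A, -t and -N options.

```shell
# has_xz_magic FILE
# Hypothetical helper (not part of rpm, xz or file): succeeds when FILE
# begins with the 6-byte xz stream header magic FD 37 7A 58 5A 00.
has_xz_magic() {
    # Dump the first 6 bytes as hex, then drop od's spaces and newline
    # so the result can be compared as a single string.
    magic=$(od -A n -t x1 -N 6 "$1" | tr -d ' \n')
    [ "$magic" = "fd377a585a00" ]
}

# The same bytes viewed both ways (magic.bin is just a scratch file):
#   printf '\3757zXZ\000' > magic.bin
#   od -A n -t x1 magic.bin   # hex: fd 37 7a 58 5a 00
#   od -a magic.bin           # masked to 7 bits: } 7 z X Z nul
```

Comparing hex rather than od -a text avoids the high-bit masking that made the header look like "}7zXZ" in the first place.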

That got me on the right path, and sure enough, there is a bug (602423) in Red Hat's bugzilla on this very issue, along with a pointer to the unxz command that I had not had a need to use before:
$ cat output.cpio | unxz > output
$ file output
output: ASCII cpio archive (SVR4 with no CRC)
Ahh, there we are, finally the output I was after.

So there are multiple failures at play here:
  1. The file command does not know how to identify xz-compressed data.
  2. The rpm2cpio command only knows how to handle gzip- and bzip2-compressed content.
Both of these are understandable for newly developed code; the ecosystem needs time to catch up with new development work.  The file lag is even more understandable since it is a separate package altogether, and the database that it works from needs to be updated.
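Until rpm2cpio learns about xz, one workaround is to save its output to a file and pick the decompressor from the leading magic bytes.  This is a minimal sketch that assumes rpm2cpio passes an unrecognized payload through still compressed, as it did here, and that the xz, gzip and bzip2 commands are on the PATH.  payload_to_cpio is a hypothetical helper, not part of rpm.

```shell
# payload_to_cpio FILE
# Hypothetical helper (not part of rpm): FILE is the saved output of
# rpm2cpio.  Writes an uncompressed cpio stream to stdout, choosing the
# decompressor from the file's leading magic bytes.
payload_to_cpio() {
    # First 6 bytes as one lowercase hex string, e.g. "fd377a585a00".
    magic=$(od -A n -t x1 -N 6 "$1" | tr -d ' \n')
    case "$magic" in
        fd377a585a00*) xz -dc "$1" ;;     # xz: FD '7' 'z' 'X' 'Z' NUL
        1f8b*)         gzip -dc "$1" ;;   # gzip
        425a68*)       bzip2 -dc "$1" ;;  # bzip2: "BZh"
        303730373031|303730373032)
            cat "$1" ;;                   # already cpio: "070701" or "070702"
        *)
            echo "payload_to_cpio: unrecognized payload format in $1" >&2
            return 1
            ;;
    esac
}

# Usage:
#   rpm2cpio <RPM> > payload.bin
#   payload_to_cpio payload.bin | cpio -id
```

Working from a file, as with output.cpio above, lets the helper peek at the magic bytes without consuming the start of a pipe.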

The reason this is a "collector's card":  This issue was first reported in the middle of 2010.  It is now 2013, more than 2 1/2 years later.  Support for xz-compressed RPM payloads was added during the Fedora 12 release cycle, which served as the basis for RHEL 6.  You're honestly telling me that at no time in the past 2 1/2 years could Red Hat have released an updated RPM for RHEL 5 that understood xz-compressed payloads?

Here is my prediction of how this bug is going to play out:

This bug will be ignored until RHEL 5 reaches the end of one of its Production phases, after which no further updates of that kind will be shipped for the rest of the product's lifecycle.

If this is deemed an "enhancement" rather than a "bug fix", then that milestone already passed on Jan. 8, 2013.  I highly doubt this will be classified as an "Urgent Priority Bug Fix" worthy of an errata, so the window has likely already closed.

Why does this rub me the wrong way?  Mainly because this has become the modus operandi for how far too many RHEL bugs are "resolved":  Let them fester in bugzilla for a few years, until the time window for dealing with them has passed, and then close them as "too late to fix it now".

"But you can't really expect Red Hat to ship support for new features on old systems!" you say.  

This is an interesting point to address.  Red Hat did change RPM mid-release several years ago, during RHL 6 (no "E") when they updated from RPM 3 to RPM 4.  This created all kinds of challenges when building software during the second half of that product's lifetime.  You had to update any newly installed systems to the RPM 4 binaries before you could install any custom-built software from your own repositories.  I think even released errata had this issue -- you had to update to RPM 4 before you could fully update the system.  Nasty stuff!

This is not quite the same situation, as it is an update that accommodates a new payload compression format rather than a new RPM header structure.  But is it really that unreasonable to ask that RHEL N-1 be able to understand RHEL N's RPM package format?  Or that support tools like rpm2cpio be able to, if for no other reason than that it makes mirror management easier and keeps system recovery options open?
