Will Woods, Fedora Testing Guy (qa_rockstar) wrote,

depcheck: the why and how (part 1)

From the very beginning, one of the big goals of the AutoQA project was to set up an automated test that would keep broken updates out of the repos. People have been asking for something like this for years now, but nobody's managed to make it work. It turns out this is because it's really hard.

But after a year (maybe two years?) of work on AutoQA we finally have such a test. It's called depcheck and it's very nearly complete, and should be running on all newly-created package updates very, very soon.

There's a lot of interest in this subject among Fedora developers (and users!) and there have been a lot of discussions over the years. And there will probably be a lot of questions like: "Will it keep [some specific problem] from happening again?" But since it's a really complicated problem (did I mention how it's taken a couple of years?) it's not easy to explain how the test works - and what it can (and can't) do - without a good deal of background on the dependency checking process, and how it can go wrong. So let's start with:

A Rough Definition of the Problem

Normally, when you update your system, yum[1] downloads all the available updates - packages that are newer versions of the ones on your system - and tries to install them.

Sometimes a new update will appear in the update repos that - for some reason - cannot be installed. If you're using yum on the command line, you'll usually see a set of messages like this:

Setting up Update Process
Resolving Dependencies
--> Running transaction check
--> Processing Dependency: libedataserverui-1.2.so.10()(64bit) for package: gnome-panel-2.31.90-4.fc14.x86_64
---> Package evolution-data-server.x86_64 0:2.32.0-1.fc14 set to be updated
---> Package nautilus-sendto.x86_64 1:2.32.0-1.fc14 set to be updated
--> Finished Dependency Resolution
Error: Package: gnome-panel-2.31.90-4.fc14.x86_64 (@updates-testing)
           Requires: libedataserverui-1.2.so.10()(64bit)
           Removing: evolution-data-server-2.31.5-1.fc14.x86_64 (@fedora/$releasever)
           Updated By: evolution-data-server-2.32.0-1.fc14.x86_64 (updates-testing)
               Not found
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

What's happened here is that Fedora package maintainers have accidentally pushed out an update which has unresolved dependencies - that is, the RPM says that it requires a certain thing to function, but that thing is not available. And so - rather than installing a (possibly broken) update - yum gives up.

So the problem to solve is this: how can we check proposed updates - before they hit the update repos - to make sure that they don't have (or cause[2]) unresolved dependencies?

Aside: An Oversimplified Summary of How Dependencies Work

RPM packages contain a lot more than files. They also contain scripts (to help install/uninstall the package) and data about the package, including dependency info. This mainly takes the form of four types of headers: Provides, Requires, Conflicts, and Obsoletes.

Provides headers list all the things that the package provides - including files, library names, abstract capabilities (e.g. httpd - the package for the Apache webserver - has a "Provides: webserver" header) and the like.

Requires headers list all the things that the package requires - each of which must be matched by a Provides header (as described above) in some available package. The majority of these headers list the libraries that a given program requires to function properly - such as gnome-panel requiring libedataserverui-1.2.so.10 in the example above.

Conflicts headers list packages that conflict with this package - which means this package cannot be installed on a system that has the conflicting package already installed, and trying to install it there will cause an error.

Finally, Obsoletes headers list packages that this one obsoletes - that is, this package can safely replace the other package (which should then be removed).

Collectively, this data is sometimes called PRCO data. When yum is "downloading metadata", this is the data it's downloading - a list of what packages require what things, and what packages provide those things, and so on. And that's how yum figures out what it needs to complete the update and ensure that your system keeps working - and when the new Requires don't match up with existing Provides, that's when you see the dreaded unresolved dependencies.
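To make the PRCO idea a bit more concrete, here's a minimal sketch of the data as a Python structure, plus a function that lists the Requires nothing satisfies. This is a toy model, not yum's real classes: the Package type, the unresolved_requires helper and the "mywebapp" package are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Toy stand-in for the PRCO headers of one RPM (not yum's real classes)."""
    name: str
    provides: frozenset = frozenset()
    requires: frozenset = frozenset()
    conflicts: frozenset = frozenset()
    obsoletes: frozenset = frozenset()


def unresolved_requires(pkg, available):
    """Return the Requires of pkg that nothing in `available` Provides."""
    provided = set()
    for other in available:
        provided |= other.provides
    return sorted(pkg.requires - provided)


# httpd carries the abstract "webserver" capability mentioned above.
httpd = Package("httpd", provides=frozenset({"httpd", "webserver"}))
# "mywebapp" is a made-up package used only for illustration.
webapp = Package("mywebapp", provides=frozenset({"mywebapp"}),
                 requires=frozenset({"webserver"}))
gnome_panel = Package(
    "gnome-panel",
    provides=frozenset({"gnome-panel"}),
    requires=frozenset({"libedataserverui-1.2.so.10()(64bit)"}),
)

repo = [httpd, webapp, gnome_panel]
print(unresolved_requires(webapp, repo))       # []
print(unresolved_requires(gnome_panel, repo))  # ['libedataserverui-1.2.so.10()(64bit)']
```

Nothing in this little repo provides the library gnome-panel wants, so it shows up as an unresolved dependency, just like in the yum output earlier.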

An Overly Simple (And Therefore Useless) Proposed Solution, With A Discussion Of Its Shortcomings

"Well then let's check all the Requires in proposed updates and make sure there's a matching Provides in some package in the repos!"

Unfortunately, dependency resolution is a bit more complicated than this. First, just checking every package in every repo doesn't quite work - you need to only look at the newest version of each package. Second, you need to take Conflicts and Obsoletes headers into account - ignoring packages that have been obsoleted, for instance. Oh also: you need to watch out for multilib packages - which are a special kind of black magic that nobody seems to fully understand - and, well, it's all kind of complicated. If only there were already some existing code that handled this..
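Here's a rough sketch of the first shortcoming, using the packages from the yum output above. It's a toy under a few assumptions: versions are plain tuples rather than real RPM epoch/version/release comparisons, the function names are invented, and the soname provided by the newer evolution-data-server is a guess made purely for the example.

```python
def newest_only(packages):
    """Keep only the highest version of each package name."""
    newest = {}
    for pkg in packages:
        current = newest.get(pkg["name"])
        if current is None or pkg["version"] > current["version"]:
            newest[pkg["name"]] = pkg
    return list(newest.values())


def missing_requires(packages):
    """Map package name -> Requires that no package in the list Provides."""
    provided = {cap for pkg in packages for cap in pkg["provides"]}
    problems = {}
    for pkg in packages:
        missing = sorted(set(pkg["requires"]) - provided)
        if missing:
            problems[pkg["name"]] = missing
    return problems


OLD_LIB = "libedataserverui-1.2.so.10()(64bit)"
NEW_LIB = "libedataserverui-1.2.so.11()(64bit)"  # assumed soname, illustration only

repo = [
    {"name": "evolution-data-server", "version": (2, 31, 5),
     "provides": [OLD_LIB], "requires": []},
    {"name": "evolution-data-server", "version": (2, 32, 0),
     "provides": [NEW_LIB], "requires": []},
    {"name": "gnome-panel", "version": (2, 31, 90),
     "provides": [], "requires": [OLD_LIB]},
]

# Naive check over every version: the old package hides the problem.
print(missing_requires(repo))               # {}
# Check over only the newest versions: the breakage shows up.
print(missing_requires(newest_only(repo)))  # {'gnome-panel': ['libedataserverui-1.2.so.10()(64bit)']}
```

The naive check passes because the old evolution-data-server is still sitting in the list, even though it would be replaced on a real system. And this still says nothing about Conflicts, Obsoletes or multilib.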

..and there is! Yum itself does all this when it's installing updates. And if we want to be sure that yum will accept proposed updates, it makes sense to use the same code for the test as we do for the actual installation. So:

A Slightly More Concrete Proposed Solution To The Problem

"Let's trick yum into simulating the installation of each proposed update and use its algorithms to determine whether the updates are installable!"

This, as it turns out, is not that hard to do. Yum is designed in such a way that it can use the repo metadata as if it were actually the local RPM database - which nicely simulates having all available packages installed on the local system. We can then ask yum to run just the dependency solving step of the package update process and see if that turns out OK.

If that works, the update(s) we're testing must have consistent, solvable dependencies, and are safe to push to the repos. Otherwise we have problems, and the proposed update should be sent back to the maintainers for review and fixing.
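As a very rough illustration of the idea (and emphatically not depcheck's or yum's actual code), the sketch below pretends the repo contents are the installed system, applies a proposed update, and then re-checks every Requires. The function names and the packages "oldlib", "newlib" and "app" are invented for the example, and the only rules modelled are same-name replacement and Obsoletes.

```python
def simulate_update(installed, updates):
    """Return the package set after applying `updates` to `installed`.

    An update replaces any same-named package and removes anything it Obsoletes.
    """
    replaced = {u["name"] for u in updates}
    obsoleted = {name for u in updates for name in u.get("obsoletes", [])}
    removed = replaced | obsoleted
    kept = [p for p in installed if p["name"] not in removed]
    return kept + list(updates)


def depsolve(packages):
    """Return (package, missing capability) pairs; an empty list means OK."""
    provided = {cap for p in packages for cap in p.get("provides", [])}
    return [
        (p["name"], req)
        for p in packages
        for req in p.get("requires", [])
        if req not in provided
    ]


# Made-up packages: "app" needs a library that only "oldlib" provides.
repo = [
    {"name": "oldlib", "provides": ["liba.so.1"]},
    {"name": "app", "requires": ["liba.so.1"]},
]
# A proposed update that obsoletes "oldlib" but ships a different soname.
proposed = [{"name": "newlib", "provides": ["liba.so.2"], "obsoletes": ["oldlib"]}]

print(depsolve(repo))                             # []
print(depsolve(simulate_update(repo, proposed)))  # [('app', 'liba.so.1')]
```

The proposed update has no unresolved Requires of its own, but it still breaks "app" by obsoleting the package that "app" depends on. That's the "or cause" case from the problem definition.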

That's the general idea, anyway - and for simple cases, it works just fine! But there are over 10,000 packages in Fedora and some of them are.. less than simple. Sometimes there are interdependent updates - two (or more!) updates that require each other to function - and testing them individually would fail, but testing them together would work. Furthermore, what about the scary multilib black magic? How do we make sure we're handling that properly?
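The interdependent-updates problem is easy to reproduce in a toy model. The sketch below only shows the problem, not the solution depcheck ended up using; the check function and the "libbar" and "bar-tools" packages are made up for illustration.

```python
def check(repo, updates):
    """Apply `updates` over same-named packages in `repo`, then list
    every (package, missing capability) pair that is left unresolved."""
    names = {u["name"] for u in updates}
    after = [p for p in repo if p["name"] not in names] + list(updates)
    provided = {cap for p in after for cap in p["provides"]}
    return sorted(
        (p["name"], req)
        for p in after
        for req in p["requires"]
        if req not in provided
    )


# Current repo: bar-tools 1 uses the library shipped by libbar 1.
repo = [
    {"name": "libbar", "provides": ["libbar.so.1"], "requires": []},
    {"name": "bar-tools", "provides": [], "requires": ["libbar.so.1"]},
]
# Two proposed updates that only make sense together.
libbar2 = {"name": "libbar", "provides": ["libbar.so.2"], "requires": []}
tools2 = {"name": "bar-tools", "provides": [], "requires": ["libbar.so.2"]}

print(check(repo, [libbar2]))          # [('bar-tools', 'libbar.so.1')]
print(check(repo, [tools2]))           # [('bar-tools', 'libbar.so.2')]
print(check(repo, [libbar2, tools2]))  # []
```

Each update fails on its own, but the pair passes, so a test that only ever looks at one update at a time would reject both of them.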

I'll discuss these issues (and our solutions) further in Part 2.

[1] PackageKit and friends still use yum behind the scenes, so all this information still applies no matter what update system UI you use.
[2] A new update can cause problems with other packages by obsoleting/conflicting with things they require - more on this later.

