In part 1 I talked about the general idea of the depcheck test, and part 2 got into some of the messy details. If you'd like a more detailed look at how depcheck should operate - using some simplified examples of problems we've actually seen in Fedora - you should check out this document and the super-fancy inkscape drawings therein.
Now let's discuss a couple of things that depcheck (and AutoQA in general) doesn't do yet.
Handling (or Not Handling) File Conflicts
As mentioned previously, depcheck is not capable of catching file conflicts. It's outside the scope of the test, mostly because yum itself doesn't handle file conflicts. To check for file conflicts, yum actually just downloads all the packages to be updated and tells RPM to check them. RPM then reads the actual headers contained in the downloaded files and uses its complex, twisty algorithms (including the multilib magic described elsewhere) to decide whether it also thinks this update transaction is OK. This happens completely outside of yum - only RPM can correctly detect file conflicts.
So if we want to correctly catch file conflicts, we need to make RPM do the work. The obvious solution would be to trick RPM the same way we trick depcheck - that is, by making RPM think all the available packages in the repos are installed on the system, so it will check the new updates against all existing, available packages.
Unfortunately, it turns out to be significantly harder to lie to RPM about what's installed on the system. All the data that yum requires in order to simulate having a package installed is in the repo metadata, but the data RPM needs is only available from the packages themselves. So the inescapable conclusion is: right now, to do the job correctly and completely, a test to prevent file conflicts would need to examine all 20,000+ available packages every time it ran.
We could easily have a simpler test that just uses the information available in the yum repodata, and merely warns package maintainers about possible file conflicts. But turning this on too soon might turn out to do more harm than good: the last thing we want to do is overwhelm maintainers with false positives, and have them start ignoring messages from AutoQA. We want AutoQA to be trustworthy and reliable, and that means making sure it's doing things right, even if that takes a lot longer.
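To make the "simpler test" idea concrete, here's a minimal sketch of what a repodata-only check could look like. It is not the actual AutoQA code: the function name and the package/file data are made up for illustration, and it assumes we've already pulled a package-to-file-list mapping out of the repo metadata.

```python
from collections import defaultdict


def potential_conflicts(filelists):
    """Return {path: [packages]} for every path claimed by 2+ packages.

    filelists is a hypothetical dict of package name -> list of file paths,
    standing in for the file lists found in the repo metadata.
    """
    owners = defaultdict(set)
    for package, paths in filelists.items():
        for path in paths:
            owners[path].add(package)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Made-up packages, purely for illustration.
repo = {
    "foo-1.0": ["/usr/bin/foo", "/usr/share/doc/README"],
    "bar-2.1": ["/usr/bin/bar", "/usr/share/doc/README"],
    "baz-0.3": ["/usr/bin/baz"],
}
conflicts = potential_conflicts(repo)
```

Note that this only finds paths with more than one owner. RPM may still be perfectly happy with some of those (for example when both packages ship an identical file), which is exactly where the false positives come from.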
In the meantime, I'm pretty sure depcheck is correctly catching the problems it's designed to catch. It'll need some testing, but soon enough it will be working exactly how we want. Then the question becomes: how do we actually prevent things that are definitely broken from getting into the live repos?
Infrastructure Integration, or: Makin' It Do Stuff
A little bit of background: the depcheck test is part of the Fedora QA team's effort to automate the Package Update Acceptance Test Plan. This test plan outlines a set of (very basic) tests which we use to decide whether a new update is ready to be tested by the QA team. (Please note that passing the PUATP does not indicate that the update is ready for release - it just means the package is eligible for actual testing.)
So, OK, we have some tests - rpmguard, and others to come - and they either pass or fail. But what do we do with this information? Obviously we want to pass the test results back to the testers and the Release Engineering (rel-eng) team somehow - so the testers know which packages to ignore, and rel-eng knows which packages are actually acceptable for release. For the moment the simplest solution is to let the depcheck test provide karma in Bodhi - basically a +1 vote for packages that pass the test and no votes for packages that don't.
Once we're satisfied that depcheck is operating correctly, and we've got it providing proper karma in Bodhi when updates pass the test, we'll add a little code to Bodhi so it only shows depcheck-approved updates to rel-eng. They can still choose to push out updates that don't pass depcheck if necessary, but by default packages that fail depcheck will be ignored (and their maintainers notified of the failure). If the package later has its dependencies satisfied and passes depcheck, the maintainer may be notified that all is well and no action is necessary.
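The policy described above is simple enough to sketch in a few lines. This is only an illustration of the rules as stated here, not Bodhi's real code or API: the function names, the `overrides` parameter, and the update names are all hypothetical.

```python
def depcheck_karma(passed):
    """+1 for a pass, no vote (0) for a failure; never a negative vote."""
    return 1 if passed else 0


def visible_to_releng(updates, depcheck_results, overrides=frozenset()):
    """Updates shown by default: those that passed, plus manual overrides.

    depcheck_results maps update name -> True/False; an update with no
    result yet is treated as not approved.
    """
    return [u for u in updates
            if depcheck_results.get(u, False) or u in overrides]


# Made-up update names, purely for illustration.
updates = ["foo-1.0-1.fc13", "bar-2.1-3.fc13", "baz-0.3-2.fc13"]
results = {"foo-1.0-1.fc13": True, "bar-2.1-3.fc13": False}

default_view = visible_to_releng(updates, results)
forced_view = visible_to_releng(updates, results, overrides={"bar-2.1-3.fc13"})
```

Here `default_view` contains only the update that passed, while `forced_view` also includes the failed update that rel-eng explicitly chose to push anyway.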
The Glorious Future of QA Infrastructure (pt. 1: Busy Bus)
If you've hung around anyone from the QA or rel-eng or Infrastructure teams for any amount of time, you've probably heard us getting all worked up about The Fedora Messagebus. But for good reason! It's a good idea! And not hard to understand:
The Fedora Messagebus is a service that gets notifications when Things Happen in the Fedora infrastructure, and relays them to anyone who might be listening. For example, we could send out messages when a new build completes, or a new update request is filed, or a new bug is filed, or a test completes, or whatever. These messages will contain some information about the event - package names, bug IDs, test status, etc. (This will also allow you to go to the source to get further information about the event, if you like.) The messagebus will be set up such that anyone who wants to listen for messages can listen for whatever types of messages they are interested in - so we could (for example) have a build-watcher applet that lives in your system tray and notifies you when your builds finish. Or whenever there's a new kernel build. Or whatever!
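If the publish/listen idea sounds abstract, here's a tiny in-memory sketch of it. To be clear, this is not the real messagebus (which would be a network service): the `MessageBus` class, the topic names, and the message fields are all invented for the example.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process bus: listeners subscribe to a topic, publishers send to it."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, **info):
        # Only listeners registered for this exact topic get the message.
        for callback in self._subscribers.get(topic, []):
            callback(topic, info)


bus = MessageBus()
seen = []

# A "build-watcher applet" that only cares about finished builds.
bus.subscribe("build.complete", lambda topic, info: seen.append(info["package"]))

bus.publish("build.complete", package="kernel", build_id=1234)
bus.publish("bug.filed", bug_id=5678)  # nobody is listening for this one
```

After this runs, `seen` holds just the kernel build; the bug message went out, but the applet never had to look at it.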
How does this help QA? Well, it simplifies quite a few things. For example, AutoQA currently runs a bunch of watcher scripts every few minutes, which poll for new builds in Koji, new updates in Bodhi, changes to the repos, new installer images, and so on. Replacing all these cron-based scripts with a single daemon that listens on the bus and kicks off tests when testable events happen will reduce complexity quite a bit. Second (as mentioned above) we can send messages containing test results when tests finish. This would be simpler (and more secure) than making the test itself log in to Bodhi to provide karma when it completes - Bodhi can just listen for messages about new test results, and mark updates as having passed depcheck when it sees the right message.
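The core of such a daemon could be little more than a lookup table from event type to tests. The sketch below assumes hypothetical event names and an assumed mapping of tests to events; `run_test` stands in for whatever actually schedules a test.

```python
# Assumed mapping of (hypothetical) event types to the tests they trigger.
TESTS_FOR_EVENT = {
    "koji.build.complete": ["rpmlint", "rpmguard"],
    "bodhi.update.request": ["depcheck"],
}


def handle_event(event_type, payload, run_test):
    """Start every test registered for this event; return the names started."""
    started = []
    for test in TESTS_FOR_EVENT.get(event_type, []):
        run_test(test, payload)
        started.append(test)
    return started


launched = []


def fake_runner(test, payload):
    # Stand-in for real test scheduling: just record what would be run.
    launched.append((test, payload["update"]))


started = handle_event("bodhi.update.request", {"update": "martini-2.3"}, fake_runner)
ignored = handle_event("bugzilla.bug.new", {"update": "n/a"}, fake_runner)
```

Events with no registered tests are simply dropped, so the daemon does nothing at all until something testable happens, instead of polling every few minutes.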
But wait, it gets (arguably) more interesting.
The Glorious Future of QA Infrastructure (pt. 2: ResultsDB)
We've also been working on something we call ResultsDB - a centralized, web-accessible database of all the results of all the tests. Right now the test results are all sent by email, to the autoqa-results mailing list. But email is just text, and it's kind of a pain to search, or to slice up in interesting views ("show me all the test results for glibc in Fedora 13", for example).
I said "web-accessible", but we're not going to try to create the One True Centralized Generic Test Result Browser. Every existing Centralized Generic Test Result Browser is ugly and hard to navigate and never seems to be able to show you the really important pieces of info you're looking for - mostly because Every Test Ever is a lot of data, and a Generic Test Result Browser doesn't know the specifics of the test(s) you're interested in. So instead, ResultsDB is just going to hold the data, and for actually checking out test results we plan to have simple, special-purpose frontends to provide specialized views of certain test results.
One example is the israwhidebroken.com prototype. This was a simple, specialized web frontend that shows only the results of a small number of tests (the ones that made up the Rawhide Acceptance Test Suite), split up in a specific way (one page per Rawhide tree, split into a table with rows for each sub-test and columns for each supported system arch).
This is a model we'd like to continue following: start with a test plan (like the Rawhide Acceptance Test Plan), automate as much of it as possible, and have those automated tests report results (which each correspond to one test case in the test plan) to ResultsDB. Once that's working, design a nice web frontend to show you the results of the tests in a way that makes sense to you. Make it pull data from ResultsDB to fill in the boxes, and now you've got your own specialized web frontend that shows you exactly the data you want to see. Excellent!
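As a rough sketch of the "fill in the boxes" step: given a flat list of results, a frontend like the israwhidebroken.com prototype needs to pivot them into one table per tree, with a row per sub-test and a column per arch. The record layout, tree name, and test names below are assumptions for illustration, not the real ResultsDB schema.

```python
def results_table(results, tree):
    """Pivot flat result records into {test: {arch: status}} for one tree."""
    table = {}
    for r in results:
        if r["tree"] == tree:
            table.setdefault(r["test"], {})[r["arch"]] = r["status"]
    return table


def render(table, arches):
    """One text line per test, one cell per arch; '-' marks a missing result."""
    lines = []
    for test in sorted(table):
        cells = [table[test].get(arch, "-") for arch in arches]
        lines.append(test + ": " + " ".join(cells))
    return "\n".join(lines)


# Made-up records, purely for illustration.
results = [
    {"tree": "rawhide-A", "test": "repo_sanity", "arch": "i386", "status": "PASSED"},
    {"tree": "rawhide-A", "test": "repo_sanity", "arch": "x86_64", "status": "FAILED"},
    {"tree": "rawhide-A", "test": "installer_boots", "arch": "i386", "status": "PASSED"},
    {"tree": "rawhide-B", "test": "repo_sanity", "arch": "i386", "status": "PASSED"},
]
table = results_table(results, "rawhide-A")
page = render(table, ["i386", "x86_64"])
```

The point is that the frontend knows which tests and arches matter for its test plan, so it can lay them out sensibly; the database just hands over the records.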
But How Will This Help With depcheck And The PUATP?
Right! As mentioned previously, there's actually a whole Package Update Acceptance Test Plan, with other test cases and other tests involved - depcheck alone isn't the sole deciding factor on whether a new update is broken or not. We want to run a whole bunch of tests, like using rpmguard to check whether a previously-executable program has suddenly become non-executable, or using rpmlint to make sure there's a valid URL in the package. Once an update passes all the tests, we should let Bodhi know that the update is OK. But the tests all run independently - sometimes simultaneously - and they don't know what other tests have run. So how do we decide when the whole test plan is complete?
This is another planned capability for ResultsDB - modeling test plans. In fact, we've set up a way to store test plan metadata in the wiki page, so ResultsDB can read the Test Plan page and know exactly which tests comprise that plan. So when all the tests in the PUATP finish, ResultsDB can send out a message on the bus to indicate "package martini-2.3 passed PUATP" - and Bodhi can pick up that message and unlock martini-2.3 for all its eager, thirsty users.
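Deciding whether a plan is complete boils down to comparing the list of tests in the plan against the results seen so far. Here's a minimal sketch; the function, the status strings, and the exact list of tests in the plan are assumptions for the example, not the real ResultsDB data model.

```python
def plan_status(plan_tests, results):
    """Return 'PASSED', 'FAILED', or 'INCOMPLETE' for one package.

    plan_tests: names of the tests that make up the plan.
    results: dict of test name -> 'PASSED' or 'FAILED' for tests that have
    finished; tests still running are simply absent.
    """
    if any(results.get(t) == "FAILED" for t in plan_tests):
        return "FAILED"
    if all(results.get(t) == "PASSED" for t in plan_tests):
        return "PASSED"
    return "INCOMPLETE"


# Assumed plan contents, for illustration only.
PUATP = ["depcheck", "rpmguard", "rpmlint"]

partial = plan_status(PUATP, {"depcheck": "PASSED"})
done = plan_status(PUATP, {"depcheck": "PASSED", "rpmguard": "PASSED", "rpmlint": "PASSED"})
broken = plan_status(PUATP, {"depcheck": "FAILED", "rpmguard": "PASSED"})
```

Only the `done` case would trigger a "passed PUATP" message; `partial` just means we keep waiting, and `broken` can be reported as soon as any single test fails.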
But anyone who has used rpmlint before might be wondering: how will anyone ever get their package to pass the PUATP when rpmlint is so picky?
The Wonders of Whitelists and Waivers
This is another planned use for ResultsDB - storing whitelists and waivers. Sometimes there will be test failures that are expected, that we just want to ignore. Some packages might be idiosyncratic and the Packaging Committee might want to grant them exceptions to the normal rules. Rather than changing the test to handle every possible exception - or making the maintainers jump through weird hoops to make their package pass checks that don't apply or don't make sense - we'd like to have one central place to store exceptions to the policies we've set.
If (in the glorious future) we're already using AutoQA to check packages against these policies, and storing the results of those tests in ResultsDB, it makes sense to store the exceptions in the same place. Then when we get a 'failed' result, we can check for a matching exception before we send out a 'failed' message and reject a new update. So we've got a place in the ResultsDB data model to store exceptions, and then the Packaging Committee (FPC) or the Engineering Steering Committee (FESCo) can use that to maintain a whitelist of packages which can skip (or ignore) certain tests.
There have also been quite a few problematic updates where an unexpected change slipped past the maintainer unnoticed, and package maintainers have thus (repeatedly!) asked for automated tests to review their packages for these kinds of things before they go out to the public. Automating the PUATP will handle a lot of that. But: we definitely don't want to require maintainers to get approval from some committee every time something weird happens - like an executable disappearing from a package. (That might have been an intentional change, after all.) We still want to catch suspicious changes - we just want the maintainer to review and approve them before they go out to the repos. So there's another use for exceptions: waivers.
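Here's one way the exception lookup could work, as a sketch only. It assumes whitelist entries are keyed by package name (a standing exception granted by a committee) and waivers by a specific update (a one-off approval by the maintainer); the function name, status strings, and sample data are all hypothetical.

```python
def effective_result(name, nvr, test, result, whitelist, waivers):
    """Apply exceptions to a raw test result before anyone is notified.

    whitelist: set of (package name, test) pairs exempted by a committee.
    waivers: set of (name-version-release, test) pairs approved by the maintainer.
    """
    if result != "FAILED":
        return result
    if (name, test) in whitelist:
        return "WHITELISTED"
    if (nvr, test) in waivers:
        return "WAIVED"
    return "FAILED"


# Made-up exceptions, purely for illustration.
whitelist = {("oddball", "rpmlint")}
waivers = {("martini-2.3-1.fc13", "rpmguard")}

r1 = effective_result("oddball", "oddball-1.0-1.fc13", "rpmlint", "FAILED", whitelist, waivers)
r2 = effective_result("martini", "martini-2.3-1.fc13", "rpmguard", "FAILED", whitelist, waivers)
r3 = effective_result("martini", "martini-2.4-1.fc13", "rpmguard", "FAILED", whitelist, waivers)
```

In this sketch the waiver for martini-2.3 does not carry over to martini-2.4, so the next suspicious change still gets flagged for review.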
So don't worry: we plan to have a working interface for reviewing and waiving test failures before we ever start using rpmlint to enforce any kind of policy that affects package maintainers.
The Even-More Glorious Future of QA
A lot of the work we've discussed here is designed to solve specific problems that already exist in Fedora, using detailed (and complex) test plans developed by the QA team and others. But what about letting individual maintainers add their own tests?
This has actually been one of our goals from Day 1. We want to make it easy for packagers and maintainers to have tests run for every build/update of their packages, or to add tests for other things. We're working right now to get the test infrastructure (AutoQA, ResultsDB, the messagebus, and everything else) working properly before we have packagers and maintainers depending on it. The test structure and API are being solidified and documented as we go. We still need to decide where packagers will check in their tests, and how we'll make sure people don't put malicious code in tests (or how we'll handle unintentionally misbehaving tests).
We also want to enable functional testing of packages - including GUI testing and network-based testing. The tests I've been discussing don't require installing the packages or running any of the code therein - we just inspect the package itself for correctness. Actual functional testing - installing the package and running the code - requires the ability to easily create (or find) a clean test system, install the package, run some test code, and then review the results. Obviously this is something people will need to do if they want to run tests on their packages after building them. And this isn't hard to do with all the fancy virtualization technology we have in Fedora - we just need to write the code to make it all work.
These things (and more) will be discussed and designed and developed (in much greater detail) in the coming days and weeks in Fedora QA - if you have some ideas and want to help out (or you have any questions) join the #fedora-qa IRC channel or the Fedora tester mailing list and ask!
1 This is why some update transactions can fail even after yum runs its dependency check, declares the update OK, and downloads all the packages.
2 Actually, this test already exists. See the conflicts test, which is built around a tool called potential_conflict.py. Note how it's pretty up-front about only catching potential conflicts.
3 Yeah, "PUATP" is a crappy acronym, but we haven't found a better name yet.
4 Although maybe not - it seems really silly to send someone an email to tell them they don't need to do anything. Informed opinions on this matter are welcomed.
5 In fact, AMQP and the qpid bindings allow you to listen only for messages that match specific properties - so Bodhi could listen only for depcheck test results that match one of the -pending updates - it doesn't have to listen to all the messages and filter them out itself. Neat!
6 Some AutoQA tests will test multiple test cases, and thus report multiple test results. Yes, that can be a little confusing.
7 See the instructions here: http://fedoraproject.org/wiki/QA