depcheck: the why and the how (part 3)
[Nov. 4th, 2010, 10:45 am]
Will Woods, Fedora Testing Guy
In part 1 I talked about the general idea of the depcheck test, and part 2 got into some of the messy details. If you'd like a more detailed look at how depcheck should operate - using some simplified examples of problems we've actually seen in Fedora - you should check out this document and the super-fancy inkscape drawings therein.
Now let's discuss a couple of things that depcheck (and AutoQA in general) doesn't do yet.
Handling (or Not Handling) File Conflicts
As mentioned previously, depcheck is not capable of catching file conflicts. It's outside the scope of the test, mostly because yum itself doesn't handle file conflicts. To check for file conflicts, yum actually just downloads all the packages to be updated and tells RPM to check them. RPM then reads the actual headers contained in the downloaded files and uses its complex, twisty algorithms (including the multilib magic described elsewhere) to decide whether it also thinks this update transaction is OK. This happens completely outside of yum - only RPM can correctly detect file conflicts.
So if we want to correctly catch file conflicts, we need to make RPM do the work. The obvious solution would be to trick RPM the same way depcheck tricks yum - that is, by making RPM think all the available packages in the repos are installed on the system, so it will check the new updates against all existing, available packages.
Unfortunately, it turns out to be significantly harder to lie to RPM about what's installed on the system. All the data that yum requires in order to simulate having a package installed is in the repo metadata, but the data RPM needs is only available from the packages themselves. So the inescapable conclusion is: right now, to do the job correctly and completely, a test to prevent file conflicts would need to examine all 20,000+ available packages every time it ran.
We could easily have a simpler test that just uses the information available in the yum repodata, and merely warns package maintainers about possible file conflicts. But turning this on too soon might turn out to do more harm than good: the last thing we want to do is overwhelm maintainers with false positives, and have them start ignoring messages from AutoQA. We want AutoQA to be trustworthy and reliable, and that means making sure it's doing things right, even if that takes a lot longer.
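To show why a repodata-only check can only ever report *possible* conflicts, here is a minimal Python sketch. It assumes each package's file list has already been pulled out of the repo metadata into a plain dict (the dict layout and the paths are illustrative, not a real repodata API). It flags every path owned by more than one package, and it happily flags the sqlite multilib pair described in part 2, an overlap RPM actually allows. That is exactly the kind of false positive that could train maintainers to ignore AutoQA.

```python
from collections import defaultdict


def potential_conflicts(filelists):
    """Return {path: sorted owners} for every path claimed by 2+ packages.

    filelists maps a package name ("name.arch") to the paths it installs.
    This only finds *potential* conflicts: RPM may still allow the overlap
    (e.g. multilib packages shipping the same file), which can't be
    decided from file lists alone.
    """
    owners = defaultdict(set)
    for pkg, paths in filelists.items():
        for path in paths:
            owners[path].add(pkg)
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Illustrative file lists only.
filelists = {
    "sqlite.x86_64": ["/usr/bin/sqlite3", "/usr/lib64/libsqlite3.so.0"],
    "sqlite.i686": ["/usr/bin/sqlite3", "/usr/lib/libsqlite3.so.0"],
    "foo.noarch": ["/usr/share/foo/README"],
}

for path, pkgs in sorted(potential_conflicts(filelists).items()):
    print(f"possible conflict on {path}: {', '.join(pkgs)}")
```

The one "conflict" this prints is a legitimate multilib overlap, so warning on it would be noise.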
In the meantime, I'm pretty sure depcheck is correctly catching the problems it's designed to catch. It'll need some testing but soon enough it will be working exactly how we want. Then the question becomes: how do we actually prevent things that are definitely broken from getting into the live repos?
Infrastructure Integration, or: Makin' It Do Stuff
A little bit of background: the depcheck test is part of the Fedora QA team's effort to automate the Package Update Acceptance Test Plan. This test plan outlines a set of (very basic) tests which we use to decide whether a new update is ready to be tested by the QA team. (Please note that passing the PUATP does not indicate that the update is ready for release - it just means the package is eligible for actual testing.)
So, OK, we have some tests - depcheck, rpmguard, and others to come - and they either pass or fail. But what do we do with this information? Obviously we want to pass the test results back to the testers and the Release Engineering (rel-eng) team somehow - so the testers know which packages to ignore, and rel-eng knows which packages are actually acceptable for release. For the moment the simplest solution is to let the depcheck test provide karma in Bodhi - basically a +1 vote for packages that pass the test and no votes for packages that don't.
Once we're satisfied that depcheck is operating correctly, and we've got it providing proper karma in Bodhi when updates pass the test, we'll add a little code to Bodhi so it only shows depcheck-approved updates to rel-eng. They can still choose to push out updates that don't pass depcheck if necessary, but by default packages that fail depcheck will be ignored (and their maintainers notified of the failure). If the package later has its dependencies satisfied and passes depcheck, the maintainer may be notified that all is well and no action is necessary.
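The karma rule and the default rel-eng view described above are simple enough to sketch in a few lines of Python. Everything here is hypothetical: the `Update` record, `depcheck_karma`, and `updates_for_releng` are illustrative names, not Bodhi's actual code or data model, and "olive-1.0" is a made-up failing update.

```python
from dataclasses import dataclass


@dataclass
class Update:
    name: str              # hypothetical update identifier, e.g. "martini-2.3"
    depcheck_passed: bool


def depcheck_karma(update):
    """+1 karma for a depcheck pass; no vote (0) for a failure."""
    return 1 if update.depcheck_passed else 0


def updates_for_releng(updates, show_all=False):
    """By default, show rel-eng only depcheck-approved updates.

    show_all=True models rel-eng deliberately choosing to look at (and
    possibly push) updates that failed depcheck.
    """
    if show_all:
        return list(updates)
    return [u for u in updates if u.depcheck_passed]


pending = [Update("martini-2.3", True), Update("olive-1.0", False)]
print([u.name for u in updates_for_releng(pending)])
print([depcheck_karma(u) for u in pending])
```

Keeping the override as an explicit flag mirrors the policy in the text: failing updates are hidden by default, not forbidden.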
The Glorious Future of QA Infrastructure (pt. 1: Busy Bus)
If you've hung around anyone from the QA or rel-eng or Infrastructure teams for any amount of time, you've probably heard us getting all worked up about The Fedora Messagebus. But for good reason! It's a good idea! And not hard to understand:
The Fedora Messagebus is a service that gets notifications when Things Happen in the Fedora infrastructure, and relays them to anyone who might be listening. For example, we could send out messages when a new build completes, or a new update request is filed, or a new bug is filed, or a test completes, or whatever. These messages will contain some information about the event - package names, bug IDs, test status, etc. (This will also allow you to go to the source to get further information about the event, if you like.) The messagebus will be set up such that anyone who wants to listen for messages can listen for whatever types of messages they are interested in - so we could (for example) have a build-watcher applet that lives in your system tray and notifies you when your builds finish. Or whenever there's a new kernel build. Or whatever!
How does this help QA? Well, it simplifies quite a few things. First, AutoQA currently runs a bunch of watcher scripts every few minutes, which poll for new builds in Koji, new updates in Bodhi, changes to the repos, new installer images, and so on. Replacing all these cron-based scripts with a single daemon that listens on the bus and kicks off tests when testable events happen will reduce complexity quite a bit. Second (as mentioned above), we can send messages containing test results when tests finish. This would be simpler (and more secure) than making the test itself log in to Bodhi to provide karma when it completes - Bodhi can just listen for messages about new test results, and mark updates as having passed depcheck when it sees the right message.
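The publish/subscribe pattern behind the messagebus can be sketched as a toy, in-process Python class. This is not the Fedora Messagebus, AMQP, or the qpid API; the topic names ("build.complete", "test.complete") and message properties are hypothetical. It shows the two uses from the text: an AutoQA daemon reacting to new builds instead of polling, and Bodhi subscribing only to passing depcheck results so it never has to filter everything itself.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process bus (NOT the real Fedora Messagebus/AMQP API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> [(match, callback)]

    def subscribe(self, topic, callback, **match):
        """Call callback(msg) for messages on topic whose properties match."""
        self._subscribers[topic].append((match, callback))

    def publish(self, topic, **props):
        for match, callback in self._subscribers[topic]:
            if all(props.get(key) == value for key, value in match.items()):
                callback(props)


bus = MessageBus()
scheduled = []  # builds the (hypothetical) AutoQA daemon has queued for testing
karma = {}      # update -> karma the (hypothetical) Bodhi listener recorded

# AutoQA reacts to build events instead of polling Koji from cron.
bus.subscribe("build.complete", lambda msg: scheduled.append(msg["nvr"]))


def bodhi_on_result(msg):
    # Mark the update as having passed depcheck.
    karma[msg["update"]] = karma.get(msg["update"], 0) + 1


# Bodhi only hears passing depcheck results; the bus does the filtering.
bus.subscribe("test.complete", bodhi_on_result, test="depcheck", result="PASSED")

bus.publish("build.complete", nvr="martini-2.3-1.fc14")
bus.publish("test.complete", test="depcheck", update="martini-2.3", result="PASSED")
bus.publish("test.complete", test="depcheck", update="olive-1.0", result="FAILED")
print(scheduled, karma)
```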
But wait, it gets (arguably) more interesting.
The Glorious Future of QA Infrastructure (pt. 2: ResultsDB)
We've also been working on something we call ResultsDB - a centralized, web-accessible database of all the results of all the tests. Right now the test results are all sent by email, to the autoqa-results mailing list. But email is just text, and it's kind of a pain to search, or to slice up in interesting views ("show me all the test results for glibc in Fedora 13", for example).
I said "web-accessible", but we're not going to try to create the One True Centralized Generic Test Result Browser. Every existing Centralized Generic Test Result Browser is ugly and hard to navigate and never seems to be able to show you the really important pieces of info you're looking for - mostly because Every Test Ever is a lot of data, and a Generic Test Result Browser doesn't know the specifics of the test(s) you're interested in. So instead, ResultsDB is just going to hold the data, and for actually checking out test results we plan to have simple, special-purpose frontends to provide specialized views of certain test results.
One example is the israwhidebroken.com prototype. This was a simple, specialized web frontend that showed only the results of a small number of tests (the ones that made up the Rawhide Acceptance Test Suite), split up in a specific way (one page per Rawhide tree, split into a table with rows for each sub-test and columns for each supported system arch).
This is a model we'd like to continue following: start with a test plan (like the Rawhide Acceptance Test Plan), automate as much of it as possible, and have those automated tests report results (which each correspond to one test case in the test plan) to ResultsDB. Once that's working, design a nice web frontend to show you the results of the tests in a way that makes sense to you. Make it pull data from ResultsDB to fill in the boxes, and now you've got your own specialized web frontend that shows you exactly the data you want to see. Excellent!
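Here is a small Python sketch of the "ResultsDB holds the data, a special-purpose frontend slices it" split, using the standard library's sqlite3. The one-table schema, the tree name, and the test case names are hypothetical (this is not the real ResultsDB data model). The `tree_table` function builds an israwhidebroken-style view: one tree, a row per test case, a column per arch.

```python
import sqlite3

# Hypothetical one-table schema - not the real ResultsDB data model.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (testcase TEXT, item TEXT, arch TEXT, outcome TEXT)")
db.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    ("repoclosure", "rawhide-20101104", "i386", "PASSED"),
    ("repoclosure", "rawhide-20101104", "x86_64", "PASSED"),
    ("install", "rawhide-20101104", "i386", "FAILED"),
    ("install", "rawhide-20101104", "x86_64", "PASSED"),
])


def tree_table(db, tree, arches):
    """One page per tree: a row per test case, a column per arch."""
    rows = {}
    query = "SELECT testcase, arch, outcome FROM results WHERE item = ?"
    for testcase, arch, outcome in db.execute(query, (tree,)):
        rows.setdefault(testcase, {})[arch] = outcome
    # "-" marks an arch with no result yet.
    return {tc: [cells.get(a, "-") for a in arches] for tc, cells in sorted(rows.items())}


arches = ["i386", "x86_64"]
print("test".ljust(12) + "".join(a.ljust(8) for a in arches))
for testcase, cells in tree_table(db, "rawhide-20101104", arches).items():
    print(testcase.ljust(12) + "".join(c.ljust(8) for c in cells))
```

A different frontend would just run a different query against the same table; the storage layer doesn't need to know about any particular view.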
But How Will This Help With depcheck And The PUATP?
Right! As mentioned previously, there's actually a whole Package Update Acceptance Test Plan, with other test cases and other tests involved - depcheck isn't the sole deciding factor on whether a new update is broken or not. We want to run a whole bunch of tests, like using rpmguard to check whether a previously-executable program has suddenly become non-executable, or using rpmlint to make sure there's a valid URL in the package. Once an update passes all the tests, we should let Bodhi know that the update is OK. But the tests all run independently - sometimes simultaneously - and they don't know what other tests have run. So how do we decide when the whole test plan is complete?
This is another planned capability for ResultsDB - modeling test plans. In fact, we've set up a way to store test plan metadata in the test plan's wiki page, so ResultsDB can read the Test Plan page and know exactly which tests comprise that plan. So when all the tests in the PUATP have finished and passed, ResultsDB can send out a message on the bus to indicate "package martini-2.3 passed PUATP" - and Bodhi can pick up that message and unlock martini-2.3 for all its eager, thirsty users.
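The "is the whole plan done yet?" question can be answered without any test knowing about the others. A minimal Python sketch, assuming the plan is just a set of test names read from the wiki metadata (the three-test `PUATP` set below is a hypothetical stand-in for the real plan) and that results can arrive in any order:

```python
# Hypothetical plan metadata, as ResultsDB might read it from the wiki page.
PUATP = {"depcheck", "rpmguard", "rpmlint"}


def plan_status(plan, results):
    """Return 'incomplete', 'passed', or 'failed' for one update.

    results maps test name -> 'PASSED'/'FAILED' for the tests that have
    finished so far, in whatever order they happened to finish.
    """
    if not plan.issubset(results):
        return "incomplete"
    if all(results[test] == "PASSED" for test in plan):
        return "passed"
    return "failed"


results = {}
for test, outcome in [("rpmlint", "PASSED"), ("depcheck", "PASSED"), ("rpmguard", "PASSED")]:
    results[test] = outcome
    status = plan_status(PUATP, results)
    if status != "incomplete":
        # This is where ResultsDB would publish its message on the bus.
        print(f"martini-2.3 {status} PUATP")
```

Re-evaluating the plan on every incoming result is what lets independently-scheduled tests add up to a single verdict.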
But anyone who has used rpmlint before might be wondering: how will anyone ever get their package to pass the PUATP when rpmlint is so picky?
The Wonders of Whitelists and Waivers
This is another planned use for ResultsDB - storing whitelists and waivers. Sometimes there will be test failures that are expected, that we just want to ignore. Some packages might be idiosyncratic and the Packaging Committee might want to grant them exceptions to the normal rules. Rather than changing the test to handle every possible exception - or making the maintainers jump through weird hoops to make their package pass checks that don't apply or don't make sense - we'd like to have one central place to store exceptions to the policies we've set.
If (in the glorious future) we're already using AutoQA to check packages against these policies, and storing the results of those tests in ResultsDB, it makes sense to store the exceptions in the same place. Then when we get a 'failed' result, we can check for a matching exception before we send out a 'failed' message and reject a new update. So we've got a place in the ResultsDB data model to store exceptions, and then the Packaging Committee (FPC) or the Engineering Steering Committee (FESCo) can use that to maintain a whitelist of packages which can skip (or ignore) certain tests.
There have also been quite a few problematic updates where an unexpected change slipped past the maintainer unnoticed, and package maintainers have thus (repeatedly!) asked for automated tests to review their packages for these kinds of things before they go out to the public. Automating the PUATP will handle a lot of that. But: we definitely don't want to require maintainers to get approval from some committee every time something weird happens - like an executable disappearing from a package. (That might have been an intentional change, after all.) We still want to catch suspicious changes - we just want the maintainer to review and approve them before they go out to the repos. So there's another use for exceptions: waivers.
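The two kinds of exception differ mainly in scope: a committee whitelist entry covers a package for a given test, while a maintainer waiver covers one specific build. Here is a Python sketch of the "check for a matching exception before sending 'failed'" step. The storage format (sets of tuples), the function name, and the example packages are all hypothetical, not the ResultsDB data model.

```python
# Hypothetical exception records - the real ResultsDB model may differ.
WHITELIST = {("rpmlint", "odd-package")}   # committee-granted: any build of the package
WAIVERS = {("rpmguard", "martini-2.3")}    # maintainer-approved: this one build only


def final_outcome(test, package, build, outcome):
    """Turn a raw FAILED into WAIVED when a matching exception exists."""
    if outcome != "FAILED":
        return outcome
    if (test, package) in WHITELIST:
        return "WAIVED"  # policy exception granted by FPC/FESCo
    if (test, build) in WAIVERS:
        return "WAIVED"  # maintainer reviewed and signed off on this change
    return "FAILED"


print(final_outcome("rpmguard", "martini", "martini-2.3", "FAILED"))
print(final_outcome("rpmguard", "martini", "martini-2.4", "FAILED"))
```

Tying a waiver to a single build means the next unexpected change in a later build still gets caught and reviewed.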
So don't worry: we plan to have a working interface for reviewing and waiving test failures before we ever start using rpmlint to enforce any kind of policy that affects package maintainers.
The Even-More Glorious Future of QA
A lot of the work we've discussed here is designed to solve specific problems that already exist in Fedora, using detailed (and complex) test plans developed by the QA team and others. But what about letting individual maintainers add their own tests?
This has actually been one of our goals from Day 1. We want to make it easy for packagers and maintainers to have tests run for every build/update of their packages, or to add tests for other things. We're working right now to get the test infrastructure (AutoQA, ResultsDB, the messagebus, and everything else) working properly before we have packagers and maintainers depending on it. The test structure and API are being solidified and documented as we go. We still need to decide where packagers will check in their tests, and how we'll make sure people don't put malicious code in tests (or how we'll handle unintentionally misbehaving tests).
We also want to enable functional testing of packages - including GUI testing and network-based testing. The tests I've been discussing don't require installing the packages or running any of the code therein - we just inspect the package itself for correctness. Actual functional testing - installing the package and running the code - requires the ability to easily create (or find) a clean test system, install the package, run some test code, and then review the results. Obviously this is something people will need to do if they want to run tests on their packages after building them. And this isn't hard to do with all the fancy virtualization technology we have in Fedora - we just need to write the code to make it all work.
These things (and more) will be discussed and designed and developed (in much greater detail) in the coming days and weeks in Fedora QA - if you have some ideas and want to help out (or you have any questions) join the #fedora-qa IRC channel or the Fedora tester mailing list and ask!
1 This is why some update transactions can fail even after yum runs its dependency check, declares the update OK, and downloads all the packages.
2 Actually, this test already exists. See the conflicts test, which is built around a tool called potential_conflict.py. Note how it's pretty up-front about only catching potential conflicts.
3 Yeah, "PUATP" is a crappy acronym, but we haven't found a better name yet.
4 Although maybe not - it seems really silly to send someone an email to tell them they don't need to do anything. Informed opinions on this matter are welcomed.
5 In fact, AMQP and the qpid bindings allow you to listen only for messages that match specific properties - so Bodhi could listen only for depcheck test results that match one of the -pending updates - it doesn't have to listen to all the messages and filter them out itself. Neat!
6 Some AutoQA tests will test multiple test cases, and thus report multiple test results. Yes, that can be a little confusing.
7 See the instructions here: http://fedoraproject.org/wiki/QA
depcheck: the why and the how (part 2)
[Oct. 11th, 2010, 10:18 am]
In part 1 I discussed the general idea of the depcheck test: use yum to simulate installing proposed updates, to be sure that they don't have any unresolved dependencies that would cause yum to reject them (and thus cause everyone to be unable to update their systems and be unhappy with Fedora and the world in general).
In this part we're going to look at two of the trickier parts of the problem - interdependent updates and multilib.
Interdependent Updates: no package is an island
This, by itself, is a pretty simple concept to understand: some packages require a certain version of another package. For example, empathy and evolution both require a certain matching version of evolution-data-server (e-d-s for short) to operate properly. So if we update e-d-s we also have to rebuild and update empathy and evolution.
So - that's all fine, but what happens if we test the new empathy update before the new evolution-data-server has been tested and released?
If we test empathy by itself, depcheck will reject it because we haven't released the new e-d-s yet. And then checking e-d-s by itself would fail, because we rejected the new empathy package that works with it - switching to the new e-d-s would cause the existing empathy to break.
Obviously this is no good - these two updates are perfectly legitimate so long as they're tested together, but tested independently they both get rejected. And it's not an uncommon problem, really - there are actually 8 other packages on my system which require e-d-s, and dozens (probably hundreds) of other examples exist. So we have to handle this sensibly.
The solution isn't terribly complicated: rather than testing every new update individually, we put new updates into a holding area, test them all as a batch, and then the packages in the batch that are judged to be safe are allowed to move out of the holding area. So interdependent packages will sit in the holding area until all the required pieces are there - and then they all move along together. Easy!
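Here is a toy Python model of the batch idea, using the empathy/e-d-s example. Packages are plain dicts of Provides/Requires, the capability names (libedataserver-OLD/-NEW) are hypothetical, and the rejection rule is a simplification chosen for illustration. The real depcheck hands this job to yum, which also deals with versions, Obsoletes, and multilib. Updates keep getting rejected until what's left in the batch is self-consistent; testing either update alone fails, while testing both together passes.

```python
def overlay(stable, accepted):
    """Package set users would see: stable packages, with accepted updates
    replacing the same-named packages."""
    pkgs = dict(stable)
    pkgs.update(accepted)
    return pkgs


def unresolved(pkgs):
    """Map package name -> set of Requires that nothing in pkgs Provides."""
    provided = set()
    for info in pkgs.values():
        provided |= info["provides"]
    return {name: info["requires"] - provided
            for name, info in pkgs.items() if info["requires"] - provided}


def check_batch(stable, batch):
    """Return the names of updates that can safely leave the holding area."""
    accepted = dict(batch)
    while True:
        broken = unresolved(overlay(stable, accepted))
        before = len(accepted)
        for name, missing in broken.items():
            if name in accepted:
                # The update itself needs something nobody provides.
                del accepted[name]
            else:
                # An existing package broke: reject updates that replaced
                # a package which provided what it needs.
                for upd in list(accepted):
                    if upd in stable and stable[upd]["provides"] & missing:
                        del accepted[upd]
        if len(accepted) == before:
            # Nothing more to reject; any remaining breakage predates the batch.
            return set(accepted)


stable = {
    "evolution-data-server": {"provides": {"libedataserver-OLD"}, "requires": set()},
    "empathy": {"provides": set(), "requires": {"libedataserver-OLD"}},
}
new_eds = {"provides": {"libedataserver-NEW"}, "requires": set()}
new_empathy = {"provides": set(), "requires": {"libedataserver-NEW"}}

print(check_batch(stable, {"empathy": new_empathy}))
print(check_batch(stable, {"evolution-data-server": new_eds}))
print(check_batch(stable, {"evolution-data-server": new_eds, "empathy": new_empathy}))
```

The first two calls return empty sets (each update is rejected on its own); the third accepts both.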
This can be confusing, though. For instance: it's true that we run depcheck for every new proposed update - but remember that we aren't only testing the new update. We're testing the new update along with every previously-proposed update that hasn't passed depcheck yet. This means that a package that fails depcheck will be retested with every new update until it passes (or gets manually removed or replaced).
Because of this quirk, we need to design depcheck to notify the maintainer if their package fails its initial test, but not send mail for every failure - after the first time, failed updates can just sit quietly in the holding area until they finally have their dependencies satisfied and pass the test. At that point, the maintainer should get a followup notification to let them know that the update is OK. We might also want to notify maintainers if their packages get stuck in the holding area for a long time, but we haven't decided if (or when) this would be useful or necessary.
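One way to get "mail on the first failure and on the eventual pass, but not on every retest" is to notify only when the result changes state. A minimal Python sketch with a hypothetical `notification` helper; note that treating a previously-passing update that starts failing (say, after a revoked dependency) as a fresh failure is an assumption, since the text doesn't specify that case.

```python
def notification(previous, current):
    """Decide whether to email the maintainer after a depcheck run.

    previous is the last result for this update (None on the first run);
    current is 'PASSED' or 'FAILED'. Returns a message, or None to stay quiet.
    Assumption: a PASSED -> FAILED change also counts as a new failure.
    """
    if current == "FAILED" and previous != "FAILED":
        return "depcheck failed"
    if current == "PASSED" and previous == "FAILED":
        return "depcheck now passes"
    return None  # repeated failures and repeated passes stay quiet


history = ["FAILED", "FAILED", "FAILED", "PASSED", "PASSED"]
previous = None
sent = []
for result in history:
    message = notification(previous, result)
    if message:
        sent.append(message)
    previous = result
print(sent)
```

Five retests produce just two emails: one for the initial failure and one when the dependencies finally line up.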
It's Actually Even More Complicated Than That
There are actually more subtle complications here. First, you need to know that all Fedora updates are pushed into the live repos by hand - by someone from Fedora Release Engineering (aka rel-eng). So there's going to be a delay - perhaps a few hours - between depcheck approving a package for release and the actual release of the package.
So: updates that have passed depcheck won't actually get moved out of the holding area until someone from rel-eng comes along and pushes them out. But that's fine - we want to include approved (but not-yet-pushed) updates in the depcheck test. We need them there, in fact, because we need to test subsequent updates as if the approved ones are already part of the public package repos (because they will be, just as soon as someone from rel-eng hits the button).
But: if someone revokes or replaces an update, this could cause other previously-approved updates to lose their approval. For example, let's say evolution-data-server turns out to actually have some horrible security bug and needs to be fixed and rebuilt before it gets released into the wild. This would cause our previously-approved empathy update to fail depcheck! So clearly we need to retest all the proposed updates - including approved ones - whenever new updates land. And rel-eng should only consider the currently-approved updates when they're pushing out new updates.
Multilib is the term for the magic hack that allows you to run 32-bit code on your 64-bit system. It's also the reason i686 packages show up on x86_64 systems (which annoys a lot of x86_64 users, but hey, at least you can use the Flash plugin!). Multilib support allows you to do some strange things - like have two versions of the same package installed (e.g. my system has sqlite.x86_64 and sqlite.i686). They can even both install the same file under certain circumstances (e.g. both sqlite packages install /usr/bin/sqlite3 - and this is totally allowed on multilib systems).
You might think this would cause some strange complications with (already complicated) dependency checking - and you'd be absolutely right. Luckily, though, yum already handles all of this for us - provided we give it the right things to check.
i686 packages are placed into the x86_64 repo by a program called mash. Its job is to take a set of builds and decide which ones are multilib - that is, which ones are required for proper functioning of 32-bit binaries on 64-bit systems. When new updates are pushed out, mash is the thing that runs behind the scenes and actually picks the required RPMs and writes out all the metadata.
This means that if we want depcheck's results to be accurate, we need to feed it the same RPMs and metadata that normal users would see once we pushed the updates. So depcheck needs to run mash on the proposed updates, and use the resulting set of RPMs and metadata for its testing. Otherwise we'll completely miss any weird problems arising from incorrect handling of multilib packages.
Since mash was designed to take a Koji tag as its input, having the -pending tags for proposed updates allows us to use mash just like the normal push would, and therefore we can be sure we're testing the right set of packages. Which means all our multilib problems are solved forever! ..right?
Unsolved Problems and Future Work
Sadly, no. The fact that we're correctly checking multilib dependencies doesn't necessarily mean we'll catch all yum problems involving multilib. For example: problems keep arising when a package (e.g. nss-softokn) accidentally stops being multilib - so then you get an update that upgrades nss-softokn.x86_64 but not nss-softokn.i686. Yum considers this type of update legitimate, and these dependencies to be properly resolved. But subsequent updates that want to use nss-softokn will be confused by the fact that there are two different versions of nss-softokn installed, and then yum will fail.
Another example is file conflicts. Normally multiple packages aren't allowed to install the same file - but, as mentioned above, multilib packages can (under certain circumstances) install multiple copies of the same file. But depcheck doesn't check this - mostly because yum (by itself) does not check for file conflicts. It does use the RPM libraries to check for file conflicts, but this is completely separate from yum's dependency checking code. And strictly speaking, the purpose of the depcheck test is to check dependencies, and this is.. something else.
So: there are problems that depcheck will not solve - not because of bugs in depcheck, but because they're outside of the reach of its design. But it's important to understand what those problems are and - more importantly - to plan for future AutoQA tests that will be able to catch these problems. And we also need to think about how to use the test results to enforce policy - that is, how to make the Fedora infrastructure reject obviously broken updates. Or how to flag seemingly-broken updates for review, and require signoff before releasing them. We'll talk about all that in part 3.
1 Technically not every change to evolution-data-server requires us to rebuild evolution, but let's just ignore that for now.
2 Like if the maintainer replaces it with a newer version (hopefully one with fixed dependencies!), or if the Fedora Release Engineering team decides to remove it.
3 This message will include the error output so the maintainer knows what other package(s) are causing the problem, and therefore which maintainer to talk to if they want to get the problem resolved.
4 The holding area is actually a set of Koji tags that end in -pending - package maintainers may have seen some email involving these tags. Well, that's what they're for.
5 Yes, this means approved packages will actually keep getting retested even after they get approved. This is another place where we need to avoid notifying maintainers over and over.
6 Note that there's also a small delay between when the update set changes and when the test completes - and so it could be possible for rel-eng to be looking at obsolete test results. We're still trying to figure out the best way to make sure rel-eng is only dealing with up-to-date info.
7 Or 64-bit binaries on your mostly-32-bit system, in the case of
depcheck: the why and how (part 1)
[Oct. 7th, 2010, 02:45 pm]
From the very beginning, one of the big goals of the AutoQA project was to set up an automated test that would keep broken updates out of the repos. People have been asking for something like this for years now, but nobody's managed to actually make it work. It turns out this is because it's actually really hard.
But after a year (maybe two years?) of work on AutoQA we finally have such a test. It's called depcheck and it's very nearly complete, and should be running on all newly-created package updates very, very soon.
There's a lot of interest in this subject among Fedora developers (and users!) and there have been a lot of discussions over the years. And there will probably be a lot of questions like: "Will it keep [some specific problem] from happening again?" But since it's a really complicated problem (did I mention how it's taken a couple of years?) it's not easy to explain how the test works - and what it can (and can't) do - without a good deal of background on the dependency checking process, and how it can go wrong. So let's start with:
A Rough Definition of the Problem
Normally, when you update your system, yum downloads all the available updates - packages that are newer versions of the ones on your system - and tries to install them.
Sometimes a new update will appear in the update repos that - for some reason - cannot be installed. Usually there will be a set of messages like this, if you're using yum on the commandline:
Setting up Update Process
--> Running transaction check
--> Processing Dependency: libedataserverui-1.2.so.10()(64bit) for package: gnome-panel-2.31.90-4.fc14.x86_64
---> Package evolution-data-server.x86_64 0:2.32.0-1.fc14 set to be updated
---> Package nautilus-sendto.x86_64 1:2.32.0-1.fc14 set to be updated
--> Finished Dependency Resolution
Error: Package: gnome-panel-2.31.90-4.fc14.x86_64 (@updates-testing)
Removing: evolution-data-server-2.31.5-1.fc14.x86_64 (@fedora/$releasever)
Updated By: evolution-data-server-2.32.0-1.fc14.x86_64 (updates-testing)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
What's happened here is that Fedora package maintainers have accidentally pushed out an update which has unresolved dependencies - that is, the RPM says that it requires some certain thing to function, but that other thing is not available. And so - rather than installing a (possibly broken) update - yum gives up.
So the problem to solve is this: how can we check proposed updates - before they hit the update repos - to make sure that they don't have (or cause) unresolved dependencies?
Aside: An Oversimplified Summary of How Dependencies Work
RPM packages contain a lot more than files. They also contain scripts (to help install/uninstall the package) and data about the package, including dependency info. This mainly takes the form of four types of headers:
Provides headers list all the things that the package provides - including files, library names, abstract capabilities (e.g. httpd - the package for the Apache webserver - has a "Provides: webserver" header) and the like.
Requires headers list all the things that the package requires - which must match the Provides headers as described above. The majority of these headers list the libraries that a given program requires to function properly - such as libedataserverui-1.2.so.10 in the example above.
Conflicts headers list packages that conflict with this package - which means this package cannot be installed on a system that has the conflicting package already installed, and trying to install it there will cause an error.
Obsoletes headers list packages that this one obsoletes - that is, this package can safely replace the other package (which should then be removed).
Collectively, this data is sometimes called PRCO data. When yum is "downloading metadata", this is the data it's downloading - a list of what packages require what things, and what packages provide those things, and so on. And that's how yum figures out what it needs to complete the update and ensure that your system keeps working - and when the new Requires don't match up with existing Provides, that's when you see the dreaded unresolved dependencies.
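The core Requires/Provides matching can be shown in a few lines of Python. This is a deliberately oversimplified sketch: the PRCO data is a hand-made dict (the `libedataserverui-1.2.so.11()(64bit)` provide for the new e-d-s is a hypothetical stand-in), and it ignores versioned requirements, Conflicts, Obsoletes, and multilib. It reproduces the shape of the gnome-panel failure from the yum output above.

```python
# A tiny, hypothetical slice of PRCO data: just Provides and Requires.
packages = {
    "httpd": {"provides": {"httpd", "webserver"}, "requires": {"libc.so.6"}},
    "glibc": {"provides": {"glibc", "libc.so.6"}, "requires": set()},
    "gnome-panel": {"provides": {"gnome-panel"},
                    "requires": {"libedataserverui-1.2.so.10()(64bit)", "libc.so.6"}},
    # The updated e-d-s no longer provides the .so.10 library gnome-panel needs.
    "evolution-data-server": {"provides": {"libedataserverui-1.2.so.11()(64bit)"},
                              "requires": {"libc.so.6"}},
}


def unresolved_requires(packages):
    """Map package name -> sorted Requires that no package Provides."""
    provided = set()
    for info in packages.values():
        provided |= info["provides"]
    return {name: sorted(info["requires"] - provided)
            for name, info in packages.items()
            if info["requires"] - provided}


print(unresolved_requires(packages))
```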
An Overly Simple (And Therefore Useless) Proposed Solution, With A Discussion Of Its Shortcomings
"Well then let's check all the Requires in proposed updates and make sure there's a matching Provides in some package in the repos!"
Unfortunately, dependency resolution is a bit more complicated than this. First, just checking every package in every repo doesn't quite work - you need to look only at the newest version of each package. Second, you need to take Obsoletes headers into account - ignoring packages that have been obsoleted, for instance. Oh, also: you need to watch out for multilib packages - which is a special kind of black magic that nobody seems to fully understand - and, well, it's all kind of complicated. If only there was already some existing code that handled this..
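The first two refinements (newest version only, then drop obsoleted packages) decide *which* packages the Requires/Provides matching should even look at. Here's a Python sketch under big simplifying assumptions: versions are dotted integers compared as tuples (real RPM compares epoch:version-release with rpmvercmp), Obsoletes are unversioned, and the oldlib/newlib packages are hypothetical. Multilib is ignored entirely, which is exactly why reusing yum's code is the better plan.

```python
def version_key(version):
    # Simplification: "2.32.0" -> (2, 32, 0). Real RPM uses rpmvercmp on
    # epoch:version-release, which also handles letters and other oddities.
    return tuple(int(part) for part in version.split("."))


def candidate_set(available):
    """Keep only the newest version of each name, then drop obsoleted names.

    available is a list of dicts with 'name', 'version', and 'obsoletes'.
    Returns {name: version}.
    """
    newest = {}
    for pkg in available:
        current = newest.get(pkg["name"])
        if current is None or version_key(pkg["version"]) > version_key(current["version"]):
            newest[pkg["name"]] = pkg
    obsoleted = set()
    for pkg in newest.values():
        obsoleted |= set(pkg["obsoletes"])
    return {name: pkg["version"] for name, pkg in newest.items() if name not in obsoleted}


available = [
    {"name": "evolution-data-server", "version": "2.31.5", "obsoletes": []},
    {"name": "evolution-data-server", "version": "2.32.0", "obsoletes": []},
    {"name": "oldlib", "version": "1.0", "obsoletes": []},
    {"name": "newlib", "version": "2.0", "obsoletes": ["oldlib"]},
]
print(candidate_set(available))
```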
..and there is!
Yum itself does all this when it's installing updates. And if we want to be sure that yum will accept proposed updates, it makes sense to use the same code for the test as we do for the actual installation. So:
A Slightly More Concrete Proposed Solution To The Problem
"Let's trick yum into simulating the installation of each proposed update and use its algorithms to determine whether the updates are installable!"
This, as it turns out, is not that hard to do. Yum is designed in such a way that it can use the repo metadata as if it were actually the local RPM database - which nicely simulates having all available packages installed on the local system. We can then ask yum to run just the dependency solving step of the package update process and see if that turns out OK.
If that works, the update(s) we're testing must have consistent, solvable dependencies, and are safe to push to the repos. Otherwise we have problems, and the proposed update should be sent back to the maintainers for review and fixing.
That's the general idea, anyway - and for simple cases, it works just fine! But there are over 10,000 packages in Fedora and some of them are.. less than simple. Sometimes there are interdependent updates - two (or more!) updates that require each other to function - and testing them individually would fail, but testing them together would work. Furthermore, what about the scary multilib black magic? How do we make sure we're handling that properly?
I'll discuss these issues (and our solutions) further in Part 2.
1 PackageKit and friends still use yum behind the scenes, so all this information still applies no matter what update system UI you use.
2 A new update can cause problems with other packages by obsoleting/conflicting with things they require - more on this later.
A helpful git config snippet
[Jun. 4th, 2010, 05:15 pm]
So, if you're like me, you like to poke through the source of things from time to time but you always forget what the proper URL is for, say, GNOME git. Or Fedora Hosted. Or whatever.
Well, good news, friend! Stuff this into your ~/.gitconfig and make your life a bit easier:
insteadOf = "fh:"
insteadOf = "fh-ssh:"
insteadOf = "gnome:"
insteadOf = "gnome-ssh:"
insteadOf = "fdo:"
insteadOf = "fdo-ssh:"
Now you can do stuff like git clone fdo:plymouth or git clone fh-ssh:autoqa.git and it should Just Work*. Neat!
Now, if I was really clever, I'd find somewhere to ship this in the default Fedora install, or at least as part of the developer tools. Maybe someone else out there is really clever?
*Unless your local username is different from the remote username, in which case ssh might not work - but you can fix that by changing the url to ssh://username@... or putting the following in
|In which I admit to making a stupid mistake in Python
||[Feb. 24th, 2010|01:44 pm]
Will Woods, Fedora Testing Guy
So yesterday we found a bug in RATS (Rawhide Acceptance Test Suite - the scripts we use for testing Rawhide and other Fedora candidate trees). The test found no problems with the Fedora 13 release candidate tree we'd just made, but when we tried to install it, the installer died because the kernel had missing dependencies. Huh? The tests are supposed to check that!
We checked the logs, and the test script hadn't even checked the kernel package. But 'kernel' was definitely in the list of packages that were supposed to be checked - and it was still in the list after we got through the loop that checked all the packages. But it never got tested. What gives?
The cause turned out to be an embarrassing mistake on my part. Consider the following Python snippet:
meats = ('bacon', 'pork', 'beef')
items = ['one', 'two', 'pork', 'three', 'four']
for n in items:
    if n in meats:
        items.remove(n)  # removes from the same list we're looping over
    else:
        print(n)
What would you expect to see as output? Probably 'one, two, three, four', right? Instead, you'll get:
one
two
four
Shortening a list while you're iterating over it turns out to be a bad idea. You can append to the list just fine, and the loop will happily iterate over the new items you added once it gets to the end of the list. But if you remove an item, bad things happen. Here's what happens:
When the loop is processing 'pork', it's processing the third item in the list. When you remove 'pork', the list gets shifted up, so now 'three' becomes the third item in the list. Then we hit the end of the loop and move to the fourth item in the list - without ever processing 'three'.
So it turns out RATS was removing the package before 'kernel' in the package list - which was the right thing to do - but that caused us to accidentally skip 'kernel', leading to the false positive result from the test.
Long story short: Never remove items from a list while you're looping over it.
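So what should you do instead? Here are two standard Python idioms (not necessarily the exact change that went into RATS): build a new list containing only the items you want to keep, or loop over a copy of the list so removals don't shift the positions the loop is walking through.

```python
meats = ('bacon', 'pork', 'beef')
items = ['one', 'two', 'pork', 'three', 'four']

# Option 1: build a new list containing only the items to keep.
kept = [n for n in items if n not in meats]
print(kept)  # ['one', 'two', 'three', 'four']

# Option 2: iterate over a copy (items[:]) so removing from the
# original list doesn't disturb the loop.
for n in items[:]:
    if n in meats:
        items.remove(n)
    else:
        print(n)  # one, two, three, four (each on its own line)
print(items)  # ['one', 'two', 'three', 'four']
```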
||[Feb. 2nd, 2010|02:15 pm]
Will Woods, Fedora Testing Guy
As discussed at FUDCon, I've been working on an automated test to perform dependency checks for new package builds/updates, so we stop having broken
I've been working on it for a couple of weeks now, and spent a while writing code to manually examine all the PRCO (Provides, Requires, Conflicts, Obsoletes) data for new packages and compare it to the previous version of that package. It's been a helpful exercise for straightening out in my mind how dependency resolution works, and what kinds of changes we need to worry about. For example, it's basically harmless for a new package to Provide something new, but a new Requires entry will cause problems if there isn't a matching Provide somewhere in the repos. That kind of thing.
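Here's a minimal Python sketch of that hand-rolled check, assuming we already have the old and new Requires as sets of strings plus a set of everything the repo Provides. All the data and the new_unmet_requires() name are hypothetical, and real PRCO entries carry versions and flags that this ignores. It only looks at added Requires, since adding a Provides is basically harmless.

```python
def new_unmet_requires(old_requires, new_requires, repo_provides):
    """Requires added in the new build that no package in the repo provides."""
    added = set(new_requires) - set(old_requires)
    return sorted(req for req in added if req not in repo_provides)


# Hypothetical data for one package's old and new builds.
repo_provides = {"libc.so.6", "libfoo.so.1", "python"}
old = {"libc.so.6", "libfoo.so.1"}
new = {"libc.so.6", "libfoo.so.1", "python", "libbar.so.2"}

print(new_unmet_requires(old, new, repo_provides))  # ['libbar.so.2']
```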
But now I've realized that I'm really just rewriting the depsolving algorithms already in yum, and trying to ensure that my version of the algorithm is complete and correct would be really complicated and painful. So with some help from skvidal and geppetto (thanks, guys!) I've managed to rewrite it as an extension of the existing yum objects - and it seems to be working (yay!). Typical runs take 15-20 seconds, which makes it feasible to run this test for every new build and update (double yay!).
I still need to write some proper test cases to ensure everything is working as expected, but hopefully I'll have some good news on that front in the next week or two.
..On the other hand, my wife and I are heading to New Orleans this weekend. We were just going for the various Mardi Gras parades but, uh, now there's this whole Super Bowl thing going on? And it's the first time the Saints have ever been in the Super Bowl and basically the entire town appears to be going completely bonkers. Schools are cancelling classes, trials have been delayed, thousands of men in dresses are parading through the streets, dogs & cats living together, mass hysteria, &c.
So if you don't hear from me for a while, well.. I'm sure I'll be drawn back to the incredibly exciting world of RPM dependency checking in due time.
|Gosh, just the FUDConniest time
||[Dec. 15th, 2009|04:22 pm]
Will Woods, Fedora Testing Guy
Oh man, FUDCon. It's hard to sort out your thoughts after five solid 16-hour days talking about Fedora stuff. It's like two full work weeks - with brilliant people I don't normally get time to talk to - crammed into one long weekend. There are parts that don't really need much further discussion (mostly involving drinkin' and cussin') but here are some standout bits:
AutoQA BarCamp Talk
On Friday Bill Peck and I gave a combined talk about our automated test efforts. My half was about the new AutoQA system (based on autotest) which we've developed to address some of the Big Problems in Fedora. (Slides are here). Bill's part was about Beaker, which is a system developed inside Red Hat to address some of the testing needs of RHEL. It's interesting to see where the two things overlap and the places where they diverge, just based on the differing needs of the two projects.
For example: AutoQA was designed to tackle things like Rawhide failing to build, the installer failing to start properly, packages with broken dependencies hitting the public repos, and so on. These problems are generally independent of the hardware you're running on, so AutoQA currently runs all its tests on whatever test system is available - or inside a VM.
Beaker, on the other hand, has some robust system inventory and provisioning features, because it's very important for Red Hat to be able to (for example) test new kernels on all the various hardware supported and sold by their partners.
Despite the differences we've been talking a lot about ways the two projects could work together to make them both stronger. Yay Open Source!
I had some hallway-style conversations with Seth Vidal and Jesse Keating which confirmed the idea that we can do a dependency check for new packages (or package updates) without having to do a full repoclosure run every time. I'm still working on code for this but with a little work we should be able to automate the check to happen after every single package build and proposed update, and prevent broken deps from ever getting into the repos again.
This will require a couple of new things in AutoQA - a post-bodhi-update hook, for example - but Luke Macken assures me this is pretty easy to do with the currently-available RSS feeds of info from Bodhi.
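Here's a rough idea of how such a hook could work, as a Python sketch assuming a generic RSS 2.0 feed. Bodhi's actual feed layout, the post_bodhi_update() function and the sample update names are all hypothetical: poll the feed, pick out items we haven't seen before, and fire the hook once for each.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents; a real hook would fetch this periodically.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>updates</title>
  <item><title>foo-1.0-2.fc12</title><link>https://example.org/foo</link></item>
  <item><title>bar-3.1-1.fc12</title><link>https://example.org/bar</link></item>
</channel></rss>"""


def new_items(feed_xml, seen):
    """Return titles of feed items not already in `seen` (in feed order)
    and record them in `seen` so the next poll skips them."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title")
        if title and title not in seen:
            seen.add(title)
            fresh.append(title)
    return fresh


def post_bodhi_update(title):
    # Hypothetical hook body: this is where dependency-check tests
    # for the new update would be scheduled.
    print("would launch tests for", title)


seen = set()
for title in new_items(SAMPLE_FEED, seen):
    post_bodhi_update(title)
print(new_items(SAMPLE_FEED, seen))  # [] on the second poll: nothing new
```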
automating LiveCD testing
So Adam Miller (maxamillion on IRC, XFCE spin maintainer) wants a way to automatically test the Live images he's producing for XFCE. Which is awesome, because we all want a way to automatically test Live images. We talked a bit about ways to script GUI stuff (hello, Dogtail), how to boot Live images in a VM, and how to cram tests into Live images without completely rebuilding them.
It would be nice if Live images used a boot parameter like 'updates=XXX' and we could pass a filesystem image (or cpio archive) full of files that would get dumped onto the Live system before rc.sysinit. Then we could (for example) drop tests and a test-launcher in place at bootup. Until that happens, we might be able to fake it by using livecd-iso-to-disk and a loopback disk image instead of a real USB key. But if it comes down to it, we can probably tell the Live image to put a login console on the serial port, and log in through that.
We'll see where that all goes soon enough, I hope.
debuginfofs is a project I keep picking up and messing around with - in short, a network-mountable filesystem with debuginfo data, so you don't have to install debuginfo packages anymore. The (old) Feature page is here.
Peter Jones and I restarted work on the debuginfofs client/server and talked through some of the implementation details. And apparently the GDB guys are going to rework the debuginfo packages so different package versions don't conflict with each other. Nice!
There were some other Big Ideas discussed at FUDCon - big changes to the way we do updates, completely revamped bug reporting, even the basic purpose of the Fedora Project - but I think those will get a lot more thought and discussion in the coming weeks.
Oh, two other things: The artichoke knows things, and you should all install the hot-dog boot animation. Because the mustard indicates progress.
|FUDCon, you win again
||[Dec. 9th, 2009|02:37 pm]
Will Woods, Fedora Testing Guy
Proper FUDCon post coming after I've had more time to arrange my thoughts and recover.
Until then, check this out! While I was gone, my wife (as part of her Plush-a-Day challenge) made two plush robots in honor of FUDCon:
"This robot is made from the FUDcon Boston 2009 shirt that I got - in fact, I plan to make 2-3 robots from the material! I was reminded about this shirt since FUDCon is going on right now up in Toronto!
FUDbot here features:
- Sweet logo design
- Saucy winking interface
- Convenient pocket program (see other photo for pocket utilization techniques)
His full name is Leonidas Stentz, but he prefers "FUDbot" so I'm fine with that."
"ConBot is made from some more of the Boston 2009 FUDCon tshirt that I had. This part includes the tag cloud, which I really like the visual imagery of.
Unfortunately, ConBot is a bit crazed - look at those eyes! Maybe there are too many words on his belly?"
|"Linux is about choice" - yes, but not how you mean it
||[Sep. 23rd, 2009|11:18 am]
Will Woods, Fedora Testing Guy
I agree with Richard - the phrase "Linux is about choice" sets my teeth on edge. I think it's because every time I hear it, it's being waved about angrily, like a club, by someone who really means "Linux is about me getting what I want".
But here's the thing: in a way, Linux (and all Open Source) really is all about choice. Because one of the essential guarantees of Open Source is the freedom to fork, and the freedom to fork gives you the power to make any choice you could possibly want to make. And this fact makes all other arguments about choice irrelevant.
It also frees the developers of all Open Source projects from any responsibility to consider these sorts of demands - if you want it so bad, fork your own project.
Oh, you can't/don't want to fork? Well, remember that every time you add support for an optional backend or a checkbox for optional behavior, you've just added at least two moving parts and increased possible failure rates by a factor of six. Put another way: Every choice presented to the user means less testing and more bugs for all users. So you better have a really damn good justification - this means code or data, not anecdotes and slogans - before you even bring up the idea of adding a new choice. And "Linux is about choice!" is definitely not a valid justification. At all. Ever.
I think Ajax's seminal post from January 2008 may have said it best:
There is a legitimate discussion to be had about where and how we draw
the line for feature inclusion [...] But the chain of logic from "Linux is about
choice" to "ship everything and let the user choose how they want their
sound to not work" starts with fallacy and ends with disaster.
||[Aug. 14th, 2009|04:57 pm]
Will Woods, Fedora Testing Guy
Ladies and gentlemen, if I could direct your attention to the autoqa-results mailing list for a moment? What you see here are the first stuttering steps of Fedora's autoqa project.
The basic design of the system is pretty simple: We wait for certain things to happen (the repos are updated, new Rawhide boot/installer images are built, a new update is created in Bodhi, a package is built in Koji, etc.) and launch appropriate tests. The test results get sent somewhere people can see them. And we examine those test results to figure out what's broken, and how to fix it.
Obviously reality is much more complicated than that, but you get the idea. As of now we have two 'hooks' (events that can trigger tests) implemented and four tests written.
The first hook - post-repo-update - triggers conflicts check for broken deps and file conflicts in the repo, and rats_sanity is a sanity test of the repo metadata and the integrity of the Critical Path Packages.
The second - and more exciting to me - is post-tree-compose, which runs when a new install tree is composed (usually the nightly Rawhide build). It currently runs one test: rats_install. This sanity-checks the boot images and then uses them to boot a virtual guest system. It watches the console output and log files generated by the installer, and tries to do a very simple install.
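Here's a toy Python model of that hook-to-test mapping. It's just a sketch: the hook and test names come from this post, but run_test(), fire(), the result strings and the sample tree name are made up, and this isn't AutoQA's actual code.

```python
from typing import Callable, Dict, List

# Hook name -> tests it triggers (only the tests named in this post).
HOOKS: Dict[str, List[str]] = {
    "post-repo-update": ["conflicts", "rats_sanity"],
    "post-tree-compose": ["rats_install"],
}


def run_test(name: str, event_data: dict) -> str:
    """Hypothetical stand-in for launching a real test job."""
    return f"{name}: PASS ({event_data.get('tree', 'unknown')})"


def fire(hook: str, event_data: dict,
         runner: Callable[[str, dict], str] = run_test) -> List[str]:
    """Launch every test registered for `hook` and collect the results."""
    return [runner(test, event_data) for test in HOOKS.get(hook, [])]


for line in fire("post-tree-compose", {"tree": "rawhide-20090814"}):
    print(line)  # rats_install: PASS (rawhide-20090814)
```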
These tests run automatically when the repos get updated - including Fedora 10 and 11 updates and updates-testing.
Everything is under heavy development but we're working on documentation for writing new tests and implementing new hooks as we go along. We also have plans to set up a public instance of the test system in the Fedora Infrastructure so people can examine the full logs of failed jobs and make neat reports of test statistics and pull useful data out of test results and all that good stuff.
So - what kind of hooks / tests would you write for this system? Anyone have existing tests they'd like to run automatically? Responses in comments are welcome - or fedora-test-list if you'd like to discuss with a wider audience.