2017-06-22

Problems with EPEL and Fedora mirroring: Many Root Cause Analysis

For the last 24 hours there was a problem with the EPEL and Fedora mirrors: people getting updates would see various errors like:

Updateinfo file is not valid XML:

The problem was caused by a bug in the compose, which wrote the XML file out as SQLite instead of XML. It was fixed within a couple of hours on the Fedora side, but it has taken a lot longer for the fix to propagate further downstream.
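A minimal sketch of the kind of sanity check that catches this failure mode. It assumes the metadata file has already been downloaded and decompressed. `check_updateinfo` is a hypothetical helper for illustration, not Fedora's actual compose test. It relies on the fact that every SQLite database file begins with the 16-byte header `SQLite format 3` followed by a NUL byte.

```python
import xml.etree.ElementTree as ET

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def check_updateinfo(data: bytes) -> str:
    """Classify an (already decompressed) updateinfo payload.

    Hypothetical helper: returns "sqlite" if the bytes are really an
    SQLite database, "xml" if they parse as XML, else "invalid-xml".
    """
    if data.startswith(SQLITE_MAGIC):
        return "sqlite"
    try:
        ET.fromstring(data)  # raises ParseError on anything that is not well-formed XML
    except ET.ParseError:
        return "invalid-xml"
    return "xml"


if __name__ == "__main__":
    print(check_updateinfo(b'<?xml version="1.0"?><updates/>'))
    print(check_updateinfo(SQLITE_MAGIC + b"\x10\x00"))
```

Checking the SQLite header first gives a more specific diagnosis than a generic "not valid XML" message, which is what clients actually reported.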

  • Some of the Fedora mirror containers were not updating correctly. We use a Docker container on each proxy to keep the data fresh. About 4 of the 14 proxies reported that they were updating but did not seem to actually do so. These servers were our main IPv6 servers, so people getting updates from them were more affected than other users.
  • Some mirrors only update once or twice a day (or even less often). This means that your favourite mirror may keep serving the bad data for 12 to 48 hours.
  • Some client plugins like to pin to the quickest mirror to keep downloads fast. While we may tell you that 20 mirrors are up to date, the plugin will use whichever one it got data from fastest in the past. This means you can end up going to a 'broken' mirror for a lot longer.
  • Some yum/dnf systems seem to have options set that keep the bad XML file until it 'ages' out. This means that even though an updated XML file is available, some systems still complain because their local cache already holds the bad copy.
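To see why these effects stack up, here is a toy model, not a description of how yum, dnf or any plugin is actually implemented. The mirror names, latencies and hour values are hypothetical. `metadata_expire` is a real yum/dnf setting, but the 48-hour value used below is only illustrative. The model shows that a speed-based pin ignores freshness, and that a client's worst-case exposure is roughly the mirror's sync interval plus its own cache lifetime.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str             # hypothetical mirror name
    latency_ms: float     # how fast this mirror was for the client in the past
    has_good_data: bool   # whether it has synced the fixed metadata yet


def pick_mirror(mirrors: list[Mirror]) -> Mirror:
    """Pin to the historically fastest mirror; freshness is never consulted."""
    return min(mirrors, key=lambda m: m.latency_ms)


def worst_case_stale_hours(mirror_sync_interval_h: float, metadata_expire_h: float) -> float:
    """Rough upper bound on how long a client can see bad metadata after the upstream fix.

    The pinned mirror may not sync for a full interval, and a client that
    fetched the bad file just before that sync keeps it until its cache expires.
    """
    return mirror_sync_interval_h + metadata_expire_h


if __name__ == "__main__":
    mirrors = [
        Mirror("fast-but-stale", 12.0, has_good_data=False),
        Mirror("slow-but-fresh", 80.0, has_good_data=True),
    ]
    chosen = pick_mirror(mirrors)
    print(chosen.name, chosen.has_good_data)
    # A mirror syncing once a day plus a hypothetical 48-hour metadata_expire.
    print(worst_case_stale_hours(24, 48))
```

With these illustrative numbers a client could keep hitting the bad file for up to about three days, which is why the cache-clearing steps below help.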
The fixes on the Fedora side are to add better tests so that this does not happen again. On the client side, the current fix is to run either one of the following:

  • yum clean all
  • yum clean metadata
Thank you all for your patience on this problem.
