2024-05-29

Where did 5 Million EPEL-7 systems come from starting in March?

ADDENDUM (2024-05-30T01:08+00:00): Multiple Amazon engineers reached out after I posted this and there is work on identifying what is causing this issue. Thank you to all the people who are burning the midnight oil on this.

ORIGINAL ARTICLE:

So starting earlier this year, the Fedora mirror manager mailing list started getting reports about heavier usage from various mirrors. Looking at the traffic reported, it seemed to be a large number of EL-7 systems starting to check in a lot more than in the past. At first I thought it was because various domains were beginning to move their operating systems from EL-7 to a newer release using one of the transition tools like Red Hat LEAPP or Alma ELevate. However, the surge didn't seem to die down, and in fact the Fedora Project mirrors have had regular problems with load due to this surge. 

A couple of weeks ago, I finally had time to look at some graphs I had set up when I was in Fedora Infrastructure and saw this:

Cumulative EPEL releases since 2019

 

Basically the load is an additional 5 million systems starting to query both the Fedora webproxies for mirror data, and then mirrors around the world to get further information. Going through the logs, there seems to be a 'gradual' shift of additional servers starting to ask for content when they had not before. In looking at the logs, it is hard to see what the systems asking for this data are. EL-7 uses yum which doesn't report any user data beyond:

urlgrabber/3.10 yum/3.4.

That could mean the system is Red Hat Enterprise Linux 7, CentOS Linux 7, or even Amazon Linux 2 (which is sort of based on CentOS 7, but with various changes that using EPEL is probably not advised).

Because there wasn't a countme or any identifiers in the yum code, the older data-analysis program does a 'if I see an ip address 1 time a day, I count it once.. if I see it 1000 times, I count it once.' This had a problem of undercounting for various cloud and other networks behind a NAT router.. so normally maybe only 1 ip address would show up in a class C (/24) network space. What seemed to change is where we might only count one ip address in various networks, we were now seeing every ip address showing up in a Class C network. 

Doing some backtracking of the ip addresses to ASN numbers, I was able to show that the 'top 10' ASNs changed dramatically in March

January 27, 2024
Total  ASN 
1347016 16509_AMAZON-02,
219728 14618_AMAZON-AES,
53500 396982_GOOGLE-CLOUD-PLATFORM,
11205 8560_IONOS-AS
10403 8987_AMAZON
8463 32244_LIQUIDWEB,
8019 54641_IMH-IAD,
7965 8075_MICROSOFT-CORP-MSN-AS-BLOCK,
7889 398101_GO-DADDY-COM-LLC,
7234 394303_BIGSCOOTS,

February 27, 2024
1871463 16509_AMAZON-02,
219545 14618_AMAZON-AES,
51511 396982_GOOGLE-CLOUD-PLATFORM,
11021 8560_IONOS-AS
9016 8987_AMAZON
8208 32244_LIQUIDWEB,
7885 54641_IMH-IAD,
7768 8075_MICROSOFT-CORP-MSN-AS-BLOCK,
7618 398101_GO-DADDY-COM-LLC,
7383 394303_BIGSCOOTS,

March 27, 2024
2604768 16509_AMAZON-02,
276737 14618_AMAZON-AES,
34674 396982_GOOGLE-CLOUD-PLATFORM,
10211 8560_IONOS-AS
9560 135629_WESTCLOUDDATA
8134 8987_AMAZON
7952 54641_IMH-IAD,
7677 32244_LIQUIDWEB,
7445 394303_BIGSCOOTS,
7250 398101_GO-DADDY-COM-LLC,

April 27, 2024
4247068 16509_AMAZON-02,
1807803 14618_AMAZON-AES,
65274 8987_AMAZON
51668 135629_WESTCLOUDDATA
41190 55960_BJ-GUANGHUAN-AP
9799 396982_GOOGLE-CLOUD-PLATFORM,
7662 54641_IMH-IAD,
7561 394303_BIGSCOOTS,
6613 32244_LIQUIDWEB,
6425 8560_IONOS-AS

May 27, 2024
4186230 16509_AMAZON-02,
1775898 14618_AMAZON-AES,
62698 8987_AMAZON
50895 135629_WESTCLOUDDATA
38521 55960_BJ-GUANGHUAN-AP
9059 396982_GOOGLE-CLOUD-PLATFORM,
7613 394303_BIGSCOOTS,
7531 54641_IMH-IAD,
6307 398101_GO-DADDY-COM-LLC,
6222 32244_LIQUIDWEB,

I am not sure what changed in Amazon in March, but it has had a tremendous impact on parts of Fedora Infrastructure and the volunteer mirror systems which use it.

2024-05-14

CentOS Linux 7: End of Life 2024-06-30

CentOS 7 EOL

If this looks a lot like the CentOS Stream 8 EOL content.. well it is because they aren't too different. It is just that instead of doing this once every 4 years, we get to do this twice in one year.
 
The last CentOS Linux release, CentOS Linux 7, will reach its end of life on June 30th, 2024. When that happens, the CentOS Infrastructure will follow the standard practice it has been doing since the early days of CentOS Linux:
  1. Move all content to https://vault.centos.org/
  2. Stop the mirroring software from responding to EL-7 queries.  
  3. Additional software for EL-7 may also be removed from other locations.

The first change usually causes any mirror doing an rsync with delete options to remove the contents from their mirror. The second change will cause the yum commands to break with errors. The vault system is also a single server with few active mirrors of its contents. As such, it is likely to be overloaded and very slow to respond as additional requests are made of it. 

 At the same time, the EPEL software on https://dl.fedoraproject.org/pub/epel/7 will be moved to /pub/archive/epel/7 and the mirrormanager for that will be updated appropriately.

In total, if you are using CentOS Linux 7 in either production or a CI/CD system, you can expect a lot of errors on July 1st or shortly afterwords.

What you must do!

There are several steps you can do to get ahead of the tsunami of problems:
  1. You can convert to Red Hat Enterprise Linux 7 and see about getting a Extended LifeCycle Support contract with the system. This is a stop gap measure for you to move toward a newer release of a Linux operating system.
  2. You can move your system to a replacement Enterprise Linux distribution. The Alma Project, Rocky Linux, Oracle and Red Hat all offer tools which can transition an EL7 system to a version of the operating system they will support for several years.
  3. If you are not able to move your systems in the next 45 days, you should look at mirroring the CentOS Linux 7 operating system to a more local location and move your update configs to use your mirror. With the large size of systems which could potentially try to use the vault system, I would not expect this to be very useful. As you will probably need to reinstall, add software or do continuous CI/CD in other areas.. you should keep a copy of the operating system local to your networks.

 References:

2024-05-13

CentOS Stream 8 END OF LIFE : 2024-05-31

CentOS Stream 8 EOL

CentOS Stream 8 will reach its end of life on May 31st, 2024. When that happens, the CentOS Infrastructure will follow the standard practice it has been doing since the early days of CentOS Linux:
  1. Move all content to https://vault.centos.org/
  2. Stop the mirroring software from responding to EL-8 queries. 

The first change usually causes any mirror doing an rsync with delete options to remove the contents from their mirror. The second change will cause the dnf or yum commands to break with errors. The vault system is also a single server with few active mirrors of its contents. As such, it is likely to be overloaded and very slow to respond as additional requests are made of it.

In total, if you are using CentOS Stream 8 in either production or a CI/CD system, you can expect a lot of errors on June 1st or shortly afterwords. 

What you can do!

There are several steps you can do to get ahead of a possible tsunami of problems:
  1. You can look to moving to a newer release of CentOS Stream before the end of the month. This usually will require deployment of new images or installs versus straight updates. 
  2. You can see if any of the 'move my Enterprise Linux' tools have added support for moving from CentOS Stream 8 to their EL8.10. For releases before 8.10, this was very hard because CentOS Stream 8 was usually the next release, but 8.10 is at a point where Alma, Oracle, Red Hat Enterprise Linux, or Rocky are at the same revisions or newer.
  3. You can start mirroring the CentOS Stream 8 content into your infrastructure and point any CI/CD or other systems to that mirror which will allow you to continue to function.

Mirroring CentOS Stream

Of the three options, I recommend the third. However in working out what is needed to mirror CentOS Stream, I realized I needed newer documentation and it would probably be a long post in itself. For a shorter version for self-starters, I recommend the documentation on the CentOS wiki,  https://wiki.centos.org/HowTos(2f)CreateLocalMirror.html  While the information was written for CentOS Linux 6 which was end-of-lifed in 2020, it covers most of the instructions needed. The parts which may need updating is the amount of disk space required for CentOS Stream which seems to be about 280 GB for everything and maybe around 120GB for any one architecture. 

 References:

 

Recent EPEL dnf problems with some EL8 systems

Last week there were several reports about various systems having problems with EPEL-8 repositories. The problems started shortly after various systems in Fedora had been updated from Fedora Linux 38 to Fedora 40, but the problems were not happening to all EL-8 systems. The problems were root-caused to the following:

  1. Fedora 40 had createrepo-1.0 installed which defaults to using zstd compression for various repositories.
  2. The EL8 systems which were working had some version of libsolv-0.7.20 installed which links against libzstd and so works.
  3. The EL8 systems which did not work either had versions of libsolv before 0.7.20 (and were any EL before 8.4) OR they had versions of libsolv-0.7.22

The newer version of libsolv was traced down to coming from repositories of either Red Hat Satellite or a related tool pulp, and was a rebuild of the EL9 package. However it wasn't a complete rebuild as the EL9 version has libzstd support, but the EL8 rebuild did not. 

A band-aid fix was made by Fedora Release Engineering to have EPEL repositories use the xz compression method for various files in /repodata/ versus libzstd. This is a slower compression method, but is possible to be used with all of the versions of libsolv reported in the various bugs and issue trackers.

This is a band-aid fix because users will run into this with any other repositories using the newer createrepo. A fuller fix will require affected systems to do one of the following:

  1. If the system is either Red Hat Enterprise Linux, Rocky Linux, Alma Linux, or Oracle Enterprise Linux and not running EL-8.9 or later, they need to upgrade to that OR point to an older version of a repository in https://dl.fedoraproject.org/pub/archive/epel/
  2. If the system has versions of libsolv-0.7.22 without libzstd support, they should contact the repository to see if a rebuild with libzstd support can be made. 
  3. If the system is CentOS Stream 8, then you should be making plans to upgrade to a different operating system as that version will be EOL on May 31st 2024.

External References