2024-05-29

Where did 5 Million EPEL-7 systems come from starting in March?

ADDENDUM (2024-05-30T01:08+00:00): Multiple Amazon engineers reached out after I posted this, and work is under way to identify what is causing this issue. Thank you to all the people who are burning the midnight oil on this.

ORIGINAL ARTICLE:

Starting earlier this year, the Fedora mirror manager mailing list began getting reports of heavier usage from various mirrors. Looking at the reported traffic, it seemed to be a large number of EL-7 systems checking in far more often than in the past. At first I thought it was because various domains were beginning to move their operating systems from EL-7 to a newer release using one of the transition tools such as Red Hat LEAPP or Alma ELevate. However, the surge didn't die down, and in fact the Fedora Project mirrors have had regular load problems because of it.

A couple of weeks ago, I finally had time to look at some graphs I had set up when I was in Fedora Infrastructure and saw this:

[Figure: Cumulative EPEL releases since 2019]

Basically, the load comes from an additional 5 million systems querying first the Fedora web proxies for mirror data and then mirrors around the world for further content. Going through the logs, there seems to be a 'gradual' shift, with additional servers starting to ask for content when they had not before. The logs make it hard to tell what these systems actually are, because EL-7 uses yum, which reports nothing more identifying than this user agent:

urlgrabber/3.10 yum/3.4.

That could mean the system is Red Hat Enterprise Linux 7, CentOS Linux 7, or even Amazon Linux 2 (which is loosely based on CentOS 7, but with enough changes that using EPEL on it is probably not advised).

Because yum has no countme feature or any other identifier, the older data-analysis program uses a simple rule: 'if I see an IP address once a day, I count it once; if I see it 1000 times, I still count it once.' This undercounts clouds and other networks behind a NAT router, where normally maybe only one IP address would show up in a given class C (/24) network. What seemed to change is that in networks where we might previously have counted a single IP address, we were now seeing every IP address in the class C network show up.
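The once-per-day rule and why it hides NAT'd networks can be sketched in a few lines of Python. This is a minimal sketch, not the actual analysis program: it assumes the logs have already been reduced to hypothetical (day, ip) pairs, and the function names and the documentation-range addresses (198.51.100.0/24) are illustrative only.

```python
import ipaddress
from collections import defaultdict


def daily_unique_counts(log_entries):
    """Count each IP address at most once per day.

    log_entries: iterable of hypothetical (day, ip) pairs parsed from access logs.
    """
    seen = defaultdict(set)  # day -> distinct IPs seen that day
    for day, ip in log_entries:
        seen[day].add(ip)
    return {day: len(ips) for day, ips in seen.items()}


def ips_per_slash24(ips):
    """Group distinct IPv4 addresses by the class C (/24) network they sit in."""
    buckets = defaultdict(set)
    for ip in ips:
        # strict=False masks off the host bits, e.g. 198.51.100.7 -> 198.51.100.0/24
        net = ipaddress.ip_network(f"{ip}/24", strict=False)
        buckets[str(net)].add(ip)
    return {net: len(members) for net, members in buckets.items()}


if __name__ == "__main__":
    # One NAT address polling 1000 times a day is still counted once.
    before = [("2024-01-27", "198.51.100.7")] * 1000
    # Every address in the same /24 checking in on its own is counted separately.
    after = [("2024-04-27", f"198.51.100.{i}") for i in range(1, 255)]
    print(daily_unique_counts(before))  # {'2024-01-27': 1}
    print(daily_unique_counts(after))   # {'2024-04-27': 254}
    print(ips_per_slash24(ip for _, ip in after))  # {'198.51.100.0/24': 254}
```

The same request volume can therefore show up as a count of 1 or a count of 254, depending only on whether it arrives from one shared address or from every address in the /24.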

By mapping the IP addresses back to their autonomous system numbers (ASNs), I was able to show that the 'top 10' ASNs changed dramatically in March:

January 27, 2024
Total    ASN
1347016  16509_AMAZON-02
219728   14618_AMAZON-AES
53500    396982_GOOGLE-CLOUD-PLATFORM
11205    8560_IONOS-AS
10403    8987_AMAZON
8463     32244_LIQUIDWEB
8019     54641_IMH-IAD
7965     8075_MICROSOFT-CORP-MSN-AS-BLOCK
7889     398101_GO-DADDY-COM-LLC
7234     394303_BIGSCOOTS

February 27, 2024
Total    ASN
1871463  16509_AMAZON-02
219545   14618_AMAZON-AES
51511    396982_GOOGLE-CLOUD-PLATFORM
11021    8560_IONOS-AS
9016     8987_AMAZON
8208     32244_LIQUIDWEB
7885     54641_IMH-IAD
7768     8075_MICROSOFT-CORP-MSN-AS-BLOCK
7618     398101_GO-DADDY-COM-LLC
7383     394303_BIGSCOOTS

March 27, 2024
Total    ASN
2604768  16509_AMAZON-02
276737   14618_AMAZON-AES
34674    396982_GOOGLE-CLOUD-PLATFORM
10211    8560_IONOS-AS
9560     135629_WESTCLOUDDATA
8134     8987_AMAZON
7952     54641_IMH-IAD
7677     32244_LIQUIDWEB
7445     394303_BIGSCOOTS
7250     398101_GO-DADDY-COM-LLC

April 27, 2024
Total    ASN
4247068  16509_AMAZON-02
1807803  14618_AMAZON-AES
65274    8987_AMAZON
51668    135629_WESTCLOUDDATA
41190    55960_BJ-GUANGHUAN-AP
9799     396982_GOOGLE-CLOUD-PLATFORM
7662     54641_IMH-IAD
7561     394303_BIGSCOOTS
6613     32244_LIQUIDWEB
6425     8560_IONOS-AS

May 27, 2024
Total    ASN
4186230  16509_AMAZON-02
1775898  14618_AMAZON-AES
62698    8987_AMAZON
50895    135629_WESTCLOUDDATA
38521    55960_BJ-GUANGHUAN-AP
9059     396982_GOOGLE-CLOUD-PLATFORM
7613     394303_BIGSCOOTS
7531     54641_IMH-IAD
6307     398101_GO-DADDY-COM-LLC
6222     32244_LIQUIDWEB
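Mapping IP addresses back to ASNs is essentially a longest-prefix match against a table of routed prefixes, followed by a tally per ASN. The following is a minimal sketch of that idea, not the script behind these tables: the prefix table is hypothetical and uses documentation ranges (RFC 5737 addresses, RFC 5398 ASNs), whereas real data would come from a routing-table dump or an IP-to-ASN database.

```python
import ipaddress
from collections import Counter

# HYPOTHETICAL prefix-to-ASN table built from documentation ranges only.
PREFIX_TO_ASN = {
    ipaddress.ip_network("198.51.100.0/24"): "64500_EXAMPLE-A",
    ipaddress.ip_network("198.51.100.128/25"): "64501_EXAMPLE-B",  # more specific
    ipaddress.ip_network("203.0.113.0/24"): "64502_EXAMPLE-C",
}


def lookup_asn(ip):
    """Return the ASN of the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_TO_ASN if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the largest prefixlen is the most specific route.
    best = max(matches, key=lambda net: net.prefixlen)
    return PREFIX_TO_ASN[best]


def top_asns(unique_ips, n=10):
    """Tally already-deduplicated IPs by ASN and return the n largest."""
    counts = Counter(lookup_asn(ip) for ip in unique_ips)
    counts.pop(None, None)  # drop addresses with no matching prefix
    return counts.most_common(n)


if __name__ == "__main__":
    ips = ["198.51.100.1", "198.51.100.2", "198.51.100.200", "203.0.113.9", "192.0.2.1"]
    for asn, total in top_asns(ips):
        print(total, asn)
```

The linear scan over prefixes keeps the example short; against a full routing table with hundreds of thousands of prefixes, a radix trie or a prebuilt IP-to-ASN database would be the more practical choice.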

I am not sure what changed at Amazon in March, but it has had a tremendous impact on parts of Fedora Infrastructure and on the volunteer mirror systems that rely on it.
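To put a rough number on the Amazon change, the three ASNs whose names contain AMAZON can be summed per snapshot date from the "Total" column of the tables above. This is plain arithmetic on the published figures; the dictionary layout and the `amazon_total` helper are illustrative only.

```python
# "Total" values for the three ASNs named AMAZON, copied from the tables above.
# Keys are ASN numbers: 16509 (AMAZON-02), 14618 (AMAZON-AES), 8987 (AMAZON).
AMAZON_COUNTS = {
    "2024-01-27": {16509: 1347016, 14618: 219728, 8987: 10403},
    "2024-02-27": {16509: 1871463, 14618: 219545, 8987: 9016},
    "2024-03-27": {16509: 2604768, 14618: 276737, 8987: 8134},
    "2024-04-27": {16509: 4247068, 14618: 1807803, 8987: 65274},
    "2024-05-27": {16509: 4186230, 14618: 1775898, 8987: 62698},
}


def amazon_total(day):
    """Sum the three Amazon ASN counts for one snapshot date."""
    return sum(AMAZON_COUNTS[day].values())


if __name__ == "__main__":
    for day in AMAZON_COUNTS:
        print(f"{day}: {amazon_total(day):,}")
    increase = amazon_total("2024-04-27") - amazon_total("2024-01-27")
    print(f"January to April increase: {increase:,}")  # 4,542,998
```

Summed this way, the Amazon ASNs go from 1,577,147 in January to 6,120,145 in April, an increase of about 4.5 million, with the largest single jump (about 3.2 million) between the March and April snapshots.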
