2017-02-14

Trying to get an idea about what packages are used

Background

One of the questions I get asked a lot is "You provide various statistics for Fedora, can you show which packages are installed the most?"

To head off a lot of future requests, the answer is no, I can't. We do not have any sort of popcorn database that shows which packages are popular. When a user asks the OS to install a package, no "Hey, I am asking for Bob if I can install libfoobar" message gets sent to the Fedora servers. Instead, yum, dnf, PackageKit, or Salt requests the repo data, works out what is needed to satisfy the request, and then asks for whatever packages it needs to get.

From this data I can glean some idea of the most installed packages, but I feel it is way past "Lies", "Damn Lies", and "Statistics" and into regions like "Political Promises" or "Half Life 3 confirmed". Looking over an entire month of requests, sorting the data, and ranking the requests, I find that a bunch of packages show up a lot while the rest fall off in a long tail. What makes this data dirty is that if 200 people ask for wordpress, 150 for mediawiki, and 90 for nagios, the various PHP trunk packages that all three want will show up with a higher number than any of them. I can't simply tell whether a person wanted that PHP package by itself or wanted wordpress. [I could possibly try to work out a transaction of requested packages and figure out what nodes and leaves there might be, but I found that the tools don't always request everything they want from download.fedoraproject.org, possibly because they already 'know' where something is.]
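The sorting and ranking step described above could be sketched roughly as follows. This is only an illustration, not the script actually used: it assumes Apache-style access log lines and RPM filenames of the form name-version-release.arch.rpm, and the names `REQUEST_RE`, `package_key`, and `rank_packages` are made up for the example.

```python
import re
from collections import Counter

# Assumption: Apache-style log lines containing "GET <path> HTTP/x.y".
# The real log format on the download servers is not given in this post.
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+"')


def package_key(path):
    """Return 'name-majorversion' for an RPM path, or None for anything else."""
    filename = path.rsplit("/", 1)[-1]
    if not filename.endswith(".rpm"):
        return None
    # Split from the right because the package name may itself contain hyphens.
    parts = filename[: -len(".rpm")].rsplit("-", 2)
    if len(parts) != 3:
        return None
    name, version, _release_arch = parts
    major = version.split(".", 1)[0]
    return f"{name}-{major}"


def rank_packages(lines, top=20):
    """Count RPM requests per package key and return the most requested."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        key = package_key(match.group("path"))
        if key:
            counts[key] += 1
    return counts.most_common(top)
```

A count like this says nothing about why a package was requested, so dependencies pulled in by several popular packages still float to the top.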

In any case, here are the packages most requested from the download website in January.

EPEL-7

  1. epel-release-7-9
  2. python2-pip-8
  3. python2-boto-2
  4. openvpn-2
  5. php-tcpdf-6
  6. php-tcpdf-dejavu-sans-fonts-6
  7. pdc-updater-0
  8. duplicity-0
  9. nagios-plugins-2 *lots of plugins show up here*
  10. ansible-2
  11. libopendkim-2
  12. opendkim-2
  13. cowsay-3
  14. python2-wikitcms-2
  15. pkcs11-helper-1
  16. fedmsg-0
  17. htop-2
  18. munin *lots of munin packages here*
  19. awscli-1
  20. hdf5-1

EPEL-6

  1. nagios-plugins-2 *lots of other nagios removed*
  2. libmcrypt-2
  3. nodejs-0 *lots of other nodejs removed*
  4. python2-boto-2
  5. GeoIP-1 *other GeoIP removed*
  6. geoipupdate-2
  7. nrpe-2
  8. libnet-1
  9. denyhosts-2
  10. eventlog-0
  11. syslog-ng-3
  12. epel-release-6-8
  13. php-pear-Auth-SASL-1
  14. php-pear-Net-SMTP-1
  15. php-pear-Net-Socket-1
  16. perl-Net-IDN-Encode-2
  17. perl-Net-Whois-Raw-2
  18. perl-Regexp-IPv6-0
  19. pwhois-2
  20. v8

EPEL-6 is our most popular distribution. Over the month of January, the request ratio was about 12 EPEL-6 : 7 EPEL-7 : 1.5 Fedora 25 : 1 EPEL-5.

EPEL-5

  1. R-core-3 *lots of other R packages removed*
  2. globus-gssapi-gsi-devel-12 *lots of other globus removed*
  3. nordugrid-arc-5
  4. xrootd-client-libs-4 *lots of other xrootd removed*
  5. pcp-libs-devel-3
  6. nordugrid-arc-devel-5
  7. libopendkim-2
  8. libopendmarc-1
  9. pcp-libs-3
  10. nordugrid-arc-plugins-globus-5
  11. libopendkim-devel-2
  12. libopendmarc-1
  13. ebtree-6
  14. myproxy-libs-6
  15. mosh-1
  16. lua-cyrussasl-1
  17. drupal7
  18. rear-2
  19. clustershell-1
  20. rsnapshot-1

I found it interesting that R was getting pulled in by a lot of computers on EPEL-5. This OS is almost at end of life, but it looks like systems are still getting provisioned with it.

Fedora 25

  1. java-1
  2. vim-minimal-8
  3. kernel-core-4
  4. libX11-1
  5. perl-libs-5
  6. perl-5
  7. perl-IO-1
  8. perl-macros-5
  9. perl-Errno-1
  10. nss-3
  11. gdk-pixbuf2-2
  12. gtk3-3
  13. audit-libs-2
  14. nss-softokn-freebl-3
  15. libX11-common-1
  16. gdk-pixbuf2-modules-2
  17. libnl3-3
  18. gnutls-3
  19. pcre-8
  20. gtk-update-icon-cache-3

As the Fedora 25 list shows, there is another problem with using this data to gauge package popularity: an update to a package that is installed on a lot of boxes also shows up as a large number of requests.

Conclusions

I really don't think any 'real' conclusions can come out of this other than people really want vim on their Fedora 25 desktops (emacs was way down the list). 😑 I also want to say that we should get an opt-in popcorn for Fedora :).

[Edited: I forgot this part]

The list of agents used to pull down packages for EPEL and Fedora was rather interesting. I combined all the yum versions together, as the many different versions polluted the numbers, but here are the top agents:


  1. yum
  2. Salt
  3. dnf
  4. Artifactory
  5. python-requests
  6. Debian Apt-Cacher-NG
  7. PackageKit-hawkey
  8. Axel 2.4 (Linux)
  9. Wget
  10. libdnf
  11. curl
  12. urlgrabber

The Salt requests seem to come from a large number of Amazon systems that install either epel-release-6 (80% of the time) or epel-release-7 (20% of the time). Nothing else seemed to be 'pulled' from download.fedoraproject.org by them, so it is probably just a config artifact on bootup.
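
Combining the different yum versions into one bucket before ranking agents might look something like the sketch below. It is a guess at the approach rather than the actual script: it assumes User-Agent strings made of space-separated product/version tokens, and `agent_family` and `rank_agents` are names invented for the example.

```python
from collections import Counter


def agent_family(user_agent):
    """Collapse a User-Agent string to a coarse family name."""
    tokens = user_agent.split()
    if not tokens:
        return "unknown"
    # Fold every yum version into a single "yum" bucket.
    for token in tokens:
        if token.lower().startswith("yum/"):
            return "yum"
    # Otherwise keep the first product token and drop its version.
    return tokens[0].split("/", 1)[0]


def rank_agents(user_agents):
    """Count requests per agent family, most common first."""
    return Counter(agent_family(ua) for ua in user_agents).most_common()
```

Agents whose names span several words would need their own rules, since this sketch only keeps the first token.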
