2018-11-30

NOTICE: nagios-4.4.2 is heading towards updates

There are two CVEs against nagios which require an update to the latest upstream version from Nagios.com.




This is a major upgrade from 4.3 to 4.4 and will require extra testing (the karma threshold has been raised to +4 from the usual +3). The update also seems to fix a memory leak which had been seen in the 4.2 and 4.3 versions.

If you use nagios in Fedora or EPEL, please test and give karma to the builds:


2018-10-16

NOTICE: Major problem with nrpe-3.2.1-6 in EPEL-7

During the summer, I worked on updating nrpe to a newer version and made changes to the systemd startup to match the one provided upstream. Part of this was adding a PIDFile= directive so that systemd could send signals to and monitor the correct nrpe daemon, as there had been bugs where systemctl was unable to restart the daemon.

I tested nrpe-3.2.1-6 on my systems and had no problems, and then put it in epel-testing for a couple of months waiting for some testing. This is where I made a mistake: I forgot about it, and I also did not thoroughly test nrpe updates from very old versions of nrpe. My update tests had been from more recent versions, which had a configuration line of


pid_file = /var/run/nrpe/nrpe.pid

which made sure that my tests worked fine. The daemon started up, ran without problems, and created the pid file in the correct place. However, if you had a configuration management system with an older template for the file, or had touched your /etc/nagios/nrpe.cfg previously, you have problems: yum update will fail to restart nrpe and other errors will occur.

One fix would be to update the config file to the newer version in the 3.2.x series, but that is not going to work for a lot of people.

I have worked with Andrea Veri on a functional change which allows systemctl to work properly without needing the pid_file. This is done by removing the PIDFile= directive and making the startup a simple rather than a forking daemon. I have built nrpe-3.2.1-8 and it should show up in epel-testing in the next day or so.
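
For the curious, the difference is roughly the following in the systemd unit (a sketch, not the exact shipped unit file; it assumes nrpe 3.x's -f foreground flag):

    # Forking setup: systemd needs an accurate PIDFile to track the daemon
    [Service]
    Type=forking
    PIDFile=/var/run/nrpe/nrpe.pid
    ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -d

    # Simple setup: nrpe stays in the foreground, so systemd tracks the
    # process directly and a stale pid_file in nrpe.cfg cannot break restarts
    [Service]
    Type=simple
    ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f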

Please, please test this and see if it works. If it really works (aka it is still running 24 hours after an update), please add karma to it at

https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-7f7330f37a 

Thank you.

2018-07-25

NOTICE: EPEL/Fedora updates to nagios/nagios-plugins/nrpe

I have pushed out multiple updates to various nagios packages. They will be arriving in the updates repositories for the various releases (f28, epel-7, epel-6) and in rawhide in the next 24 hours or so.


  1. nagios: I tried making an updated package for 4.4.1, but our current spec file and patches need a lot more work than I currently have time for. Instead I made minor changes to 4.3.4 to deal with some permission errors and the like.
  2. nrpe: mainly small fixes as well, but it closes out some persistent Bugzilla reports. 
  3. nagios-plugins: a fairly major update even though the version number has not changed. Many little fixes had landed in the upstream git tree's maint branch that needed to be groomed together, so I have updated to that and marked the version so you can see which git commit it is. I have also fixed an FTBFS in rawhide and made the OpenSSL linkage more granular so it is not added to every plugin.

Bodhi Links

These will be updated as I get them.

Nagios:

Nagios-plugins:

NRPE:

2018-07-18

When your software is used way after you EOL it.

One of my first jobs was working on a satellite project called ALEXIS at Los Alamos National Laboratory, which had been part of a Congressional plan to explore making space missions faster and cheaper. This meant the project was a mishmash of whatever computer systems were available at the time. Satellite tracking was planned on, I think, a Macintosh SE; the main uploads and capture were a combination of off-the-shelf hardware and a Sparc 10. Other analysis was done on spare Digital and SGI Irix systems. It was here I really learned a lot about system administration, as each of those systems had its own 'quirks' and ways of doing things.

I worked on this for about a year as a Graduate Research Assistant, and learned a lot about how many projects in science and industrial controls get 'frozen' in place far longer than anyone writing the software expects. This is because at a certain point the device becomes cheaper to keep running than to replace or even update. So when I was watching this USGS video this morning,



I wasn't surprised to see old DEC computers with CRT screens intermixed with newer computers. Landsat 7 was launched in 1999, when DEC no longer existed, but it was designed in the early 1990's. The software for running specific hardware on the system was probably written on whatever system was current then (I am guessing an Alpha, but I am not sure). As long as that satellite is running, there will be some sort of team with a giant box of spare parts working to make sure the hardware and software keep running.

Satellites may seem an extreme case, but the same goes for any large scientific study and many things in the aerospace industry. You can still find in-flight TV systems on major airlines that will reboot themselves to some Red Hat Linux 7 logo.. an OS that was EOL over a decade ago. There are similar items in industrial controllers for making textiles, plastics, and other goods: the devices are large and expensive to replace, so they will run whatever software was in them for decades. They will also require the software which interfaces with them to be 'locked' in place, which can have a pile-on effect where you find that you need some new computer system to be able to run something written in Python 1.5.

I expect that a LOT of systems are currently written to work only with Python 2.7 and will be wanting software for it until the late 2030's. The problem is that very few of them have plans or the ability to pay for that maintenance support. While it is very late in the game, I would say that if you are relying on Python for such a project, you need to start planning your 2020 and future budgets to take into account paying some group to support those libraries somehow.

2018-05-16

Blue Sky Discussion: EPEL-next or EPIC

EPIC Planning Document

History / Background

Since 2007, Fedora Extra Packages for Enterprise Linux (EPEL) has been rebuilding Fedora Project Linux packages for Red Hat Enterprise Linux and its clones. Originally the goal was to compile packages that RHEL did not ship but which were useful in running Fedora Infrastructure and other sites. Packages would be forked from the nearest Fedora release (Fedora 3 for EPEL-4, Fedora 6 for EPEL-5) with little updating or moving of packages, in order to give lifetimes similar to the EL packages. Emphasis was placed on back-porting fixes versus upgrading, and on not making large feature changes which would cause confusion. If a package could no longer be supported, it would be removed from the repository to eliminate security concerns. At the time RHEL lifetimes were thought to be only 5-6 years, so back-porting did not look like a large problem.

As RHEL and its clones became more popular, Red Hat began to extend the lifetime of the Enterprise Linux releases from 6 years to 10 years of "active" support. This made trying to back-port fixes harder, and many packages in EPEL would be "aged" out and removed. This in turn caused problems for consumers who had tied kickstarts and other scripts to having access to those packages. Attempts to fix this by pushing for release upgrade policies have run into resistance from packagers who already find focusing on the main Fedora releases a full-time job and only build EPEL packages as one-offs. Other attempts to update policies have run into the need for major updates and changes to build tools and scripting, but no time to do so. Finally, because EPEL has not changed majorly in 10 years, conversations about changing it fall into "well, EPEL has always done it like this" from consumers, packagers, and engineering alike.

In order to get around many of these resistance points with changing EPEL, I suggest that we frame the problems around a new project called Extra Packages for Inter Communities (EPIC). The goal of this project would be to build packages from Fedora Project Linux releases for the various Enterprise Linuxes, whether Red Hat Enterprise Linux, CentOS, Scientific Linux, or Oracle Enterprise Linux.

Problems and Proposals

Composer Limitations:

Problem:
Currently EPEL uses the Fedora build system to compose a release of packages every couple of days. Because each day creates a new compose, the only channels are the various architectures plus a testing channel where future packages can be tested. Updates are not in a separate repository because EPEL does not track releases.
EPEL packagers currently have to support a package for the 10 year lifetime of a RHEL release. If they have to update a release, all older versions are no longer available. If they no longer want to support a package, it is completely removed. While this sounds like it increases security for consumers, note that Fedora does not remove old packages from older releases.
Proposed Solution
EPIC will match the Enterprise Linux major/minor numbers for releases. This means that a set of packages will be built for, say, EL5 sub-release 11 (aka 5.11). Those packages would populate a release, updates, and updates-testing directory for each supported architecture. This will allow a set of packages to be composed when the sub-release occurs and then stay until the release is ended.
/pub/epic/releases/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/epic/updates/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/epic/updates/testing/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/epic/development/5/CR/

Once a minor release is done, the old tree will be hard linked to an appropriate archive directory.

/pub/archives/epic/releases/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/archives/epic/updates/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/
/pub/archives/epic/updates/testing/5/5.11/{x86_64,source,i386,aarch64,arm,ppc64}/

A new tree will be built and placed in the appropriate sub-directories. Hard links for the latest release will point to the new tree, and after some time the old tree will be removed from the active directory tree.

Channel Limitations:

Problem
EPEL is built against a subset of the channels that Red Hat Enterprise Linux offers customers, namely Server, High Availability, Optional, and some of Extras. Effort is made to ensure that EPEL does not replace anything in those channels with newer packages. However, this does not extend to packages which are in the Workstation, Desktop, and similar channels, which can cause problems where EPEL’s packages replace something in those channels.
Proposed Solution
EPIC will be built against the latest released CentOS minor release using the channels which are enabled by default in CentOS-Base.repo. These packages are built from source code that Red Hat delivers via a git mechanism to the CentOS project, which rebuilds them for mass consumption. Packages will not be allowed to replace/update base packages according to the standard RPM Name-Epoch-Version-Release (NEVR) mechanism. This will allow EPIC to actually service more clients.
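
As a quick illustration of how NEVR ordering decides whether one package can replace another, rpmdev-vercmp from the rpmdevtools package compares two Epoch:Version-Release strings the same way rpm does (the version strings below are hypothetical):

    $ rpmdev-vercmp 0:1.4-2.el7 0:1.4-3.el7
    0:1.4-2.el7 < 0:1.4-3.el7
    $ rpmdev-vercmp 1:1.0-1.el7 0:9.9-9.el7
    1:1.0-1.el7 > 0:9.9-9.el7

The second comparison shows why Epoch is checked first: a higher Epoch outranks any Version-Release difference.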

Build System Limitations

Problem
EPEL is built against Red Hat Enterprise Linux. Because these packages are not meant for general consumption, the Fedora build system does not import them but instead builds against them in a hidden build-root. This causes multiple problems:
  • If EPEL has a package with the same name, it supersedes the RHEL one even if the RHEL NEVR is newer. This means packages may get built against old versions, and constant pruning needs to be done.
  • If the EPEL package has a newer NEVR, it will replace the RHEL one which may not be what the consumer intended. This may break other software requirements.
  • Because parts of the build are hidden, the package build may not be as auditable as some consumers would like.
Proposed Solution
EPIC will import the CentOS release it is building against into the build system. With this, the build is not hidden from view. It also makes it easier to put in rules that an EPIC package will never replace or remove a core build package. Audits of how a build was done can be clearly shown.

Greater Frequency Rebasing

Problem
Red Hat Enterprise Linux has been split between competing customer needs. Customers wish to have some packages stay steady for 10 years with only some updates, but they have also found that they need rapidly updated software. In order to bridge this, recent RHEL releases have rebased many software packages during a minor release. This has caused problems because EPEL packages were built against older software ABIs which no longer work with the latest RHEL. This requires the EPEL software to be rebased and rebuilt regularly. Conversely, because of how the Fedora build system sees Red Hat Enterprise Linux packages, it only knows about the latest packages. In the 2-4 weeks between the various community rebuilds getting their minor release packages built, EPEL packages may be built against APIs which are not yet available to users of those rebuilds.

Proposed Solution
The main EPIC releases will be built against specific CentOS releases rather than the Continual Release (CR) channel. When the next RHEL minor is announced, EPIC releng will create a new git branch from the current minor version (aka 5.10 → 5.11). Packagers can then make major version updates or other needed changes. When CentOS CR is populated with the new rpms, CR will be turned on in koji and packages will be built in the new tree using those packages. After 2 weeks, the EPIC minor release will be frozen and any new packages or fixes will occur in the updates tree.

Guidelines

Packaging

EL-4

This release is no longer supported by CentOS and will not be supported by EPIC.

EL-5

This release is no longer supported by CentOS and will not be supported by EPIC.

EL-6

This release is supported until Nov 30 2020 (2020-11-30). The base packaging rules for any package would be those used by the Fedora Project during its 12 and 13 releases. Where possible, EPIC will make macros to keep packaging more in line with current packaging rules.

EL-7

This release is supported until Jun 30 2024 (2024-06-30). The base packaging rules for any package would be those used by the Fedora Project during its 18 and 19 releases. Because EL7 has seen major updates in certain core software, it may be possible to follow newer packaging rules from later releases.

EL-next

Red Hat has not publicly announced what its next release will be, when it will be released, or what its lifetime is. When that occurs, it will be clearer which Fedora release packaging will be based off of.

GIT structure

Currently EPEL uses only one branch for every major RHEL release. In order to better match how current RHEL releases contain major differences between minor releases, EPIC will have a branch for every major.minor release. This will allow people who need older versions to snapshot and build their own software off of them. There are several naming patterns which need to be researched:

/<package_name>/epic/6/10/
/<package_name>/epic/6/11/
/<package_name>/epic/7/6/
/<package_name>/epic/7/7/

/<package_name>/epic-6/6.10/
/<package_name>/epic-6/6.11/
/<package_name>/epic-7/7.6/
/<package_name>/epic-7/7.7/

/<package_name>/epic-6.10/
/<package_name>/epic-6.11/
/<package_name>/epic-7.6/
/<package_name>/epic-7.7/
Git module patterns will need to match what upstream delivers for any future EL.

Continuous Integration (CI) Gating

EPIC-6

The EL-6 life-cycle is reaching its final sub-releases, with more focus and growth in EL-7 and the future. Because of this, gating will be turned off for EPIC-6. Testing of packages can be done at the packager's discretion but is not required.

EPIC-7

The EL-7 life-cycle is midstream, with 1-2 more minor releases expected to have major API changes. Due to this, it makes sense to research whether gating can be put in place for the next minor release. If the time and energy to retrofit the tools to the older EL are available, then it can be turned on.

EPIC-next

Because gating is built into current Fedora releases, there should be no problem with turning it on for a future release. Packages which do not pass testing will be blocked just as they will be in Fedora 29+ releases.

Modules

EPIC-6

Because EL-6’s tooling is locked at this point, it does not make sense to investigate modules.

EPIC-7

Currently EL-7 does not support Fedora modules and would require updates to yum, rpm and other tools in order to do so. If these show up in some form in a future minor release, then trees for modules can be created and builds done.

EPIC-next

The tooling for modules can match how Fedora approaches it. This means that rules for module inclusion will be similar to package inclusion. EPIC-next modules must not replace/conflict with CentOS modules. They may use their own name-space to offer newer versions than what is offered and those modules may be removed in the next minor release if CentOS offers them then.

Build/Update Policy

Major Release

In the past, Red Hat has released a public beta before it finalizes its next major version. When possible, the rebuilders have come out with their versions of this release in order to learn what gotchas they will have when the .0 release occurs. Once the packages for the beta are built, EPIC will make a public call for packages to be released into it. Because packagers may not want to support a beta, or they know that there will be other problems, these packages will NOT be auto-branched from Fedora.

Minor Release

The current method CentOS uses to build a minor release is to begin rebuilding packages, patching problems, and then, when ready, putting those packages in their /cr/ directory. These are then tested by people while updates are built and the ISOs for the final minor release are produced. The steps for EPIC release engineering will be the following:
  1. Branch all current packages from X.Y to X.Y+1
  2. Make any Bugzilla updates needed
  3. Rebuild all branched packages against CR
  4. File FTBFS bugs against any packages which fail to build.
  5. Packagers will announce major updates to mailing list
  6. Packagers will build updates against CR.
  7. 2 weeks in, releng will cull any packages which are still FTBFS
  8. 2 weeks in, releng will compose and lock the X.Y+1 release
  9. Symlinks will point to the new minor release.
  10. 4 weeks in, releng will finish archiving off the X.Y release

Between Releases

Updates and new packages between releases will be pushed to the appropriate /updates/X.Y/ tree. Packagers will be encouraged to make only minor, non-API-breaking updates during this time. Major changes are possible but need to follow this workflow:
  1. Announce to the EPEL list that a change is required and why
  2. Open a ticket to EPIC steering committee on this change
  3. EPIC steering committee approves/disapproves change
  4. If approved, the change happens, but the packages go into the updates tree
  5. If not approved, it can be done in the next minor release.

Build System

Build in Fedora

Currently EPEL is built in Fedora using the Fedora build system, which integrates koji, bodhi, greenwave, and other tools together. This could still be used with EPIC.

Build in CentOS

EPIC could be built in the CentOS BuildSystem (CBS) which also uses koji and has some integration to the CentOS Jenkins CI system.

Build in Cloud

Instead of using existing infrastructure, EPIC would be built with newly stood-up builders in Amazon or similar cloud environments. The reasoning behind this would be to see if other build systems can transition there eventually.

Definitions

Blue Sky Project
A project with a different name to help eliminate preconceptions with the existing project.
Customer
A person who pays for a service either in money, time or goods.
Consumer
Sometimes called a user. A person who consumes the service without putting work into it.
EPEL
Extra Packages for Enterprise Linux. A product name which was to be replaced years ago, but no one came up with a better one.
EPIC
Extra Packages Inter Community.
RHEL
Red Hat Enterprise Linux

Last updated 2018-05-16 19:10:17 EDT. This document was imported from an adoc.

2018-05-11

EPEL Outage Report 2018-05-11

Problem Description:

On 2018-05-11 04:00 UTC, reports started coming into the CentOS IRC channels about EPEL being corrupted and causing breakages. These were then reported to #fedora-admin and #epel-devel. The problem would show up as something like:

 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

The problem was examined and turned out to be an NFS problem on the backend systems, which caused the createrepo_c run that creates the repositories to produce a corrupted SQLite file. A program which was supposed to catch this did not work, for reasons still being investigated, and the corrupted SQLite file was mirrored out.

Admins began filling up the #epel and #centos channels asking why their systems were broken. I would like to thank avij, tmz, and others who worked on answering as many of the people as possible. I would also like to thank Kevin Fenzi for figuring out the problem, regenerating the builds, and unstopping the NFS blockage.

Solution:

Because of the way mirroring works, this problem may affect clients for hours after the fix has been made on the server. There are three things a client can do:
  1. If you have a dedicated mirror, have the mirror update itself with the upstream mirrors.
  2. On client systems you may need to do a yum clean all in order to remove the bad SQLite data in case yum thinks its cache is still good.
  3. You can skip EPEL on updates with:
    
    yum --disablerepo=epel update

Notes:

This will be filled out later as more information and future steps are taken.
  1. MirrorManager did not have anything to do with this. Its job is to check that mirrors match the master site, and in this case the master site was borked, so it happily told people to go to mirrors which matched that.
  2. The problem showed up at 04:00 UTC because most servers are set up using GMT/UTC as their clock. At 04:00 cron.daily starts up, and many sites use a daily yum update, which broke and mailed them.

2018-05-09

Looking for old game source Conquer (FOUND)

Early this morning (or late last night..) while trying to rescue some computers which decided to die during reboot, I got hit by a memory of computer labs in the late 1980's when I first went to college. While many of us would play Nethack and hang out on MUDs, the big draw was a turn-based game called Conquer. The premise was that you were the King of a country in some sort of fantasy world. Your job was to grow your country, be it vampires, orcs, elves, or humans, and destroy all competition. I believe it was based off the classic Empire games, but I am not sure. I expect it was not 'Free or open source', and I know it was full of really bad coding, as the main point of the game for the CS people was to find a new overflow to make your country win.

Years later I met someone who had helped write a very similar game called Dominion. That game has been kept up and is under a GPL license, which is probably why it is still findable.

And while waiting for ansible to rebuild various virtual machines which had existed on the now-kaput servers, I went diving for the source code. My 2 am searches didn't come up with any copies of the Conquer code, but I expect that is because various search engines assume I want clones of Command and Conquer rather than "Conquer". Looking for fantasy Empire-like games brings up tons of clones of Age of Empires. Even looking for Dominion brings me to many clones of the Dominion board game versus the actual source code. I did find that someone has made updated versions of Trade Wars and Taipan!, which made me happy, as those were ones I had played a lot in high school in the mid 1980's. I was even able to find some code for Xtank, which was another diversion on the poor Sun Sparc SLC systems which did not have enough RAM or CPU to do the game justice.

I expect that the game source code is probably sitting somewhere easily findable or that the game was called Conquest or something similar and I am not remembering correctly after 30 years. I also expect that the code has no usable lessons in it.. it just seemed important at 4 am and 6 am this morning when I couldn't get back to sleep. Hopefully a blog post will put that little worry to bed. It was like "Who wrote the game?", "Where did it go?", "Why did I always lose?" Ok the last one was easy.. I am not good at strategy and I was playing the wrong game (aka I was trying to play by the inside game rules versus the social "hey look at what we should join and do" and the "oh wow did you see what this does if you send 4 ^k to the game?")

One thing I do remember from these games was that there was no idea of client and server in them. Everything was written into one application (which was where most of the security problems came up). These days, the game would probably be written as a webserver application which would send HTML5 to the clients, which the players would manipulate to send back 'moves'. These would then be checked by the server to make sure they were legitimate and confirmed when the turn ran. Conflicts like army A moving into army B's space would then get dealt with at the turn cycle, and the next turn would begin.

[Quickly found by Alan Cox (thanks Alan) at https://ftp.gnome.org/mirror/archive/ftp.sunet.se/pub/usenet/ftp.uu.net/comp.sources.games/. It was originally called Conquest and then renamed Conquer. The game was written by Ed Barlow. Adding Ed Barlow to the search now gives the search engines enough to find other versions. Looking at the source code license https://github.com/quixadhal/conquer/blob/master/header.h, this is not open source in any way. There was a discussion on Debian Legal about the license being changed to GPL ?!? but without a formal release from the original author.. I am leery of saying it was done.]

2018-05-07

Computers and honesty

In today's Quote Investigator, they investigate a quote from Isaac Asimov which shows up from time to time.

Part of the inhumanity of the computer is that once it is competently programmed and working smoothly—it is completely honest.

I remembered this quote from my time in computer science (CS) courses in the late 1980's as something that non-CS people would bring up, and CS people would laugh and laugh about.

Isaac Asimov was a 20th century author who wrote about almost everything at one point or another. While best known as a science fiction writer, he wrote many popular science books, which made up most of the literature I read from the local South Carolina library when I was in elementary school. Some of his most famous science fiction was about robots who were programmed to work according to three laws, with the stories revolving around how the laws broke down in some way or another against what people expected them to do.

Modern computers are incredibly complex systems, and the first thing you learn in any complex-systems analysis is that they are never going to run smoothly enough for complete honesty to happen. The system may think it is being honest, but at some point, somewhere, 1+1=1 happened (or 1+1=0 or 1+1=3). In fact, a large amount of electrical engineering in chip design, BIOS writing, and other low-level sorcery is cleaning that up. Maybe the chip redoes the calculation a couple of times, maybe there are just low bits you never use that absorb the electrical signal loss, or some other trick of the trade. However, at some point those incantations will fail and a little bit of Maxwell's demon leaks out somewhere.

Even when you have a smoothly enough working system, the fact that the programmer is never competent enough is a completely different problem. We all have our off days, and we all fail to see all the ways a piece of code might get used where it does mostly what you want.. but not all. [Or we do see it, pop open the cheap Scotch, and try desperately to forget.]

Even setting aside the complex-system problems, there are many times when we find either our programming or our computer will prove both Asimov and Charles Babbage wrong:

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

I expect that the programmers for Alexa, Cortana, Siri, and Gooda (Google Voice needs a name and I am horrible with names) are having to deal with this daily. A person may ask a question which literally means one thing but has a different contextual common meaning. Giving the literal answer would not be lying, but the person asking feels the computer did. Giving the contextually correct answer has the computer lying, but the person getting the 'honest' answer they expected. [And somewhere in England, they have hooked up Babbage's spinning casket to an electric motor to produce free electricity.]

In the end, I wonder if all this means we need to re-evaluate the 'humanity' of modern computers (or at least the definition of 'humanity' as posited by Asimov nearly 40 years ago 😉).

From the quote investigator page:

  1. 1981, Change! Seventy-One Glimpses of the Future by Isaac Asimov, Chapter 6: Who Needs Money?, Start Page 15, Quote Page 17, Houghton Mifflin Company, Boston, Massachusetts. (Verified with scans) 

2018-04-29

Cygwin: FAST_CWD problem

Cygwin is a useful set of tools which makes working on Windows systems closer to working on a UNIX/Linux system. These tools are often bundled with various other software, which can run into problems if the bundled copy is not updated to newer versions. What is normally seen is that a person will try to compile a program and get:

find_fast_cwd: WARNING: Couldn't compute FAST_CWD pointer. Please report
this problem to the public mailing list cygwin@cygwin.com

If you are seeing this error, you have a very, very old version of Cygwin and should contact the software vendor you got the software from. They need to rebase their version of Cygwin to a more current one in order to get both the security updates and the other fixes needed for Cygwin to work with your version of Windows.
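
If you are not sure which Cygwin DLL a bundled tool is actually shipping, you can check from inside its shell (a sketch; the version number shown is only illustrative):

    # uname reports the version of the cygwin1.dll in use
    $ uname -r
    2.11.0(0.329/5/3)
    # cygcheck lists the DLLs a given program loads, cygwin1.dll included
    $ cygcheck /usr/bin/find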

This error has been showing up a lot on the Cygwin mailing lists from software associated with:
  • Some particular Eclipse plugin
  • Some circuit diagram software that wasn't named.
Please see the Cygwin FAQ entry for more information.

2018-04-23

Fedora Infrastructure Meeting Change to Thursdays 1400 UTC

For several years, the Fedora Infrastructure meeting has been held every Thursday at 1800 UTC. This would be lunchtime to mid-morning for the U.S. members, early evening for our European members, and late night for people in India. [I think it is a different day in China and Japan.] In order to see if attendance was problematic because of the time, the Fedora Infrastructure leader Kevin Fenzi recently asked for a new meeting time. The results came back, and the meetings will be moved to 1400 UTC on Thursdays. To see what that time is in your time zone, you can use the date command:


[smooge@smoogen-laptop ~]$ date -d "Apr 26 14:00:00 UTC 2018"
Thu Apr 26 10:00:00 EDT 2018

Fedora Infrastructure tries to set its meetings against UTC versus any local daylight savings times, since many regions do not have them or start/end them at different times.

2018-04-20

Fedora Infrastructure Hackathon (day 1-5)

From 2018-04-09 to 2018-04-13, most of the Fedora Infrastructure team was in Fredericksburg, Virginia, working face to face on various issues. I already covered my trip on the 08th to Fredericksburg, so this is a followup blog to cover what happened. Each day had a pretty predictable cycle, starting with waking up around 06:30 and getting a shower and breakfast downstairs. The hotel was near Quantico, which is used by various government agencies for training, so I got to see a lot of people suiting up every morning. Around 07:30, various coworkers from different time zones would start stumbling in.. some because it was way too late in the day to be getting up, and others because it was way too early. Everyone would get a cup or two of coffee in them, and Paul would show up to herd us towards the cars. [Sometimes it took two or three attempts, as someone would straggle away to try and get another 40 winks.] Then we would drive over to the University of Mary Washington extension campus.

I want to give an enormous shout-out to the staff there: people checked in on us every day to see if we had any problems and worked around our weird schedules. They also helped get our firewall items fixed; the campus is fairly locked down for guests, but they made it so our area had an exception for the week so that ssh would work.

Once we got situated in the room, we would work through the problems we were trying to tackle that day. Monday was documentation, Tuesday was reassigning tasks, Wednesday was working through AWX rollouts, and Thursday was trying to get bodhi working with openshift. Friday we headed home via our different methods. [I took a train, though not this one.. this was the CSX shipping train which came through before ours.]

Most of the work I did was on tasks to get people enabled and working. I helped get Dusty and Sinny into a group which can log into various atomic staging systems to see what the logs and builds are doing. I worked with Paul Frields on writing service level expectations, which I will be putting into more detail in next week's blogs. I talked with Brian Stinson and Jim Perrin about CentOS/EPEL build tools and plans.


Finally, I worked with Matthew Miller on statistics needs and will be looking to work with the CoreOS people some day on how to update how we collect data. As with any face-to-face meeting, it was mostly about getting personal feedback on what is working and what isn't. I have a better idea of things needed in the future for the Fedora Apprentice group (my blogs for two weeks from now), Service Level Expectations, and EPEL (3 to 4 weeks from now).

2018-04-10

Fedora Infrastructure Hackathon (day 0)

The Fedora Infrastructure Hackathon is currently going on in Fredericksburg, Virginia, outside of Washington, D.C. My first day was spent getting to the site from North Carolina, which I did via Amtrak. I have not been on a US train in many decades and was not sure what it would be like. First off, getting onto the train was incredibly easy: I bought a ticket, arrived, and got on the train with the other passengers. The conductors and other staff were friendly and helpful in getting me a seat. The crew were also very helpful to an older gentleman in a wheelchair, making sure he got food and drinks.

Next, the train seating was comfortable and I had plenty of leg room. The leg room for each seat was equal to business class seating on most planes. The person in front of me could lean back quite a bit and not interfere with my long legs. While the chair cushion slid out a bit, it was much more comfortable than the first class seat I had on a major airline recently. The ride was fairly comfortable; there was a general back and forth motion, and 'turbulence' when the train had to move off to a side track, but it was in general a lot smoother than driving I-85/I-95.

The views were very good and it was nice to not have to worry about "stayin' alive on I-95". The train had a food area which served Dunkin' Donuts coffee, various premade sandwiches, and other foodstuffs. The best part was that the other people travelling to the Hackathon could all sit together and work on things for a while. This was useful due to the two downsides: the travel time was a bit longer than expected, and the wifi was incredibly laggy. The travel time was due to having to sit on side tracks 2 or 3 times while a CSX train went past. CSX owns the rails, and Amtrak uses them at a lower priority than shipping traffic, so every now and then the Amtrak has to sit on a siding while the spice flows. The wifi lag was mostly due to a lot of people using it and the limited bandwidth available through the uplink. I am guessing it was a satellite link with a cell backup, so that when something blocked either one, you had drops. This was ok for writing local documents, but people working on web mail would switch to music at various times.

All said, I enjoyed the trip. It cost about as much as if I had driven a rental, and I didn't have to deal with the headaches of driving. I also did not get any motion sickness, which I do when other people are driving or I am on a bus. The people working the trip were happy and looked like they enjoyed their jobs. Having seen more than enough flight attendants in the last two years who look like they would rather eat glass than work another flight.. it was not what I was expecting. The crew also enforced courtesy rules, so that when people started talking too loudly on the phone, they were asked to move to another section. When people tried putting their luggage in empty handicap seats, it was removed and the people were reprimanded that this was not acceptable.

2018-04-05

Explaining myself with xkcd

April is Autism awareness month, and I thought I would start off with a couple of blogs about what it can be like to have even mild autism. I find xkcd to be a good way to illustrate many different points in conversation, technology, and life. It is where a picture and some words say more than a long essay can.

While I have been told that not being good at conversations is a human condition, every day seems to combine the following cartoons together. I either need a checklist to remember what things I need to 'fulfill' in a social conversation, or I end up not knowing whether the conversation has ended or not.



These are funny to me because I know I end up like this daily, but they are also not funny because it is frustrating to me and everyone around me. It can seem to them a lot like:

I also realize I am very lucky. I can carry my post-it notes in my head most of the time, and only need to be reminded how to do things every now and then. Other people have it where it isn't 'funny' and every day is a struggle to keep the world together.

There are other parts of autism which are harder to describe. The inability to shut out sounds and scents is harder to explain. Some days it is an easy task; other days it is exactly like:

On days like that I can't sit even in a library without it sounding like a cacophony of voices. The brain tries to parse every conversation, which can make a work meeting much harder because the brain is trying to make each word it hears part of a coherent conversation. This means the manager talking in the room and the guy outside on the phone to his girlfriend get intermingled at times. You wonder why the manager is asking if you have a negligee or some other weird connotation. I end up having to cup my ears to focus on one conversation at a time, or just write down prime numbers on a sheet of paper until the brain stops muddling up.

I know this isn't how it is for every person with autism.. each one of us has it slightly differently. I have been incredibly lucky in how my autism has manifested, and I just want to help people understand what it might be like.

2018-04-04

EPEL Statistics: NOW WITH EVEN MORE GRAPHS

So a friend of mine said that I needed to look at the graph data a bit more closely. I decided to look at a 7-week average (49 days) and a 29-week average (203 days). What I found interesting was how noisy the data still was at 49 days. Here we see a comparison between different EPEL-6 curves using 7, 49, and 203 day moving averages:

Looking at the curves, while EPEL-6 is still "growing", it seems to have plateaued in early to mid 2017. A curve by itself is useless, so here it is in comparison to EPEL-7, where the curve for EPEL-7 seems to increase when EPEL-6 leveled out.

From this, I expect that EPEL-7 will cross over EPEL-6 in mid to late 2018, though with a 203 day graph that would be hard to see. Finally, here is a stacked graph of the releases from EPEL-5 onward using 203 day averages.

Stacking this way shows that when RHEL-5 went EOL in 2017, there is an inflection in EPEL-7 growth.

Note: Originally I was going to compare the powers of 7: 7, 49, 343, but I found the 343-day average so smooth that it wasn't clear when a change was occurring. I backed it down to 203 to get some fluctuations.. and then realized that this is close to the standard 200 day moving average that financial organizations use. However, I am not sure they are the same, because financial data is usually in 5 day weeks while I am looking at 7 day weeks.
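
For anyone who wants to reproduce the curves, a trailing moving average is easy to generate from raw daily counts. A minimal sketch, assuming a hypothetical checkins.csv of date,count lines (swap the 7s for 49 or 203 for the wider windows):

    # keep a ring buffer of the last 7 daily counts; print date,average
    awk -F, '{ sum += $2 - buf[NR%7]; buf[NR%7] = $2;
               if (NR >= 7) printf "%s,%.1f\n", $1, sum/7 }' checkins.csv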

2018-04-03

EPEL: Security Profiles in EL7 can cause problems with outside repositories

Currently, if you are installing CentOS 7 or Red Hat Enterprise Linux 7 and use a security profile, you will have problems with 3rd party repositories. The errors can seem rather obtuse, and the situation usually gets diagnosed as "EPEL is down" or some similar problem. The output will look something like:


epel/x86_64/metalink                                                                                                                                  |  17 kB  00:00:00     
https://mirrors.rit.edu/fedora/epel/7/x86_64/repodata/repomd.xml.asc: [Errno 14] HTTPS Error 404 - Not Found
Trying other mirror.
To address this issue please refer to the below wiki article 

https://wiki.centos.org/yum-errors

If above article doesn't help to resolve this issue please use https://bugs.centos.org/.

http://mirror.nodesdirect.com/epel/7/x86_64/repodata/repomd.xml.asc: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
http://mirror.us.leaseweb.net/epel/7/x86_64/repodata/repomd.xml.asc: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
...
The key part to look at is the request for repomd.xml.asc. That is yum asking for the GPG-signed repository XML metadata. The Fedora Project does not currently sign its repodata for various reasons. This means that yum will not see the EPEL repository as valid and will refuse to show packages.

There are two fixes that are currently available:

  1. reposync the repository and sign your local copy with keys that you have accepted (see the sketch after this list). This is what most sites which require a security profile are going to need to do. It means that there is a process, control, and signoff which meets that site's security plan.
  2. Turn off the checking of repository signatures for the EPEL repository.
    
    [epel]
    name=Extra Packages for Enterprise Linux 7 - $basearch
    #baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
    metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
    failovermethod=priority
    enabled=1
    gpgcheck=1
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
    repo_gpgcheck=0
    
    WARNING: Doing this on many systems without getting an allowed exception will cause audit problems. This is the primary reason that EPEL does not come with this automatically. It MUST be a conscious decision of the installation systems administration to turn it off.
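
For option 1, the flow looks roughly like the following (a sketch with hypothetical paths; the exact reposync flags differ between the yum-utils and dnf versions):

    # mirror the repository locally
    reposync --repoid=epel --download_path=/srv/repos/
    # regenerate the repository metadata
    createrepo_c /srv/repos/epel/
    # detach-sign the metadata with a key your security plan accepts;
    # this creates the repomd.xml.asc that yum was asking for above
    gpg --armor --detach-sign /srv/repos/epel/repodata/repomd.xml

Clients would then point at this mirror with repo_gpgcheck=1 and your own public key in gpgkey.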
And yes, a third option would be for the metadata to be signed upstream. I am not an authority on why that is not currently done, and I do not like armchair-quarterbacking the people who have to deal with the build system.

2018-04-02

Fedora Infra PSA: I have been marked spamchecked! What do I do?

Background

About 1-2 times a week we get a new user who is told during account creation something like the following:


Your account status have just been set to spamcheck_denied by an admin

or an account's moderator.

If this is not expected, please contact admin@fedoraproject.org and
let them know.


- The Fedora Account System

I realized I haven't written about this since 2016, so it was time to update some of the data on it. We still see a number of accounts created daily which seem meant only to create spam on the wiki. While sometimes the account user will pick an obvious name like techsupport or freeantivirus, the majority will have account names that look 'normal'. The spamcheck tool looks at other items, such as how the potential user's IP, email address, and other details 'stack up' compared to accounts with similar items in the past. Sadly this means that we will have some amount of 'false positives', even though we try to push towards more 'false negatives'.

What do you do?

If you find yourself getting a 'false positive', please open an email to admin@fedoraproject.org with the following information:

  1. The account name you tried to open. Account admins need this to look in the Fedora Account System to check on the account. 
  2. The email address you used to open the account. Multiple times I have gotten an email from foo@gmail.com when the account was opened with foobar@yahoomail.com or some other domain. In that case we normally do not activate the account until we get an email from the address that was listed. 
  3. If possible, the IP address you had when you registered the account. This can help us figure out if some other problem is causing issues. We have had to blacklist some IPs just from the sheer amount of 'spam', and sometimes we forget to remove those blocks.
Normally if you do this, you should get a response back in 24-48 hours from someone who is on the admin mailing list. We may need to get one or two more bits of data and then will turn on the account in many cases. [There was one person who was honest enough to say that they just wanted the account to put up HP printer support pages. We said no thank you.]

Do I need a Fedora Account?

If you want to do long-term work with the Fedora Project, I would get a user account. If you only need to answer a question or help someone out on a mailing list, you do not need an account to do so.

In many cases, you do not need a Fedora account in order to work with or on Fedora Project items. If you are trying to answer questions on ask.fedoraproject.org, you can log in using multiple other authenticators (Google, Facebook, Yahoo, and OpenID). If you want to fix something on the wiki, however, you will still need to get sponsored into another group. This is because even with the anti-spam measures knocking out 99% of bad accounts, 1% of thousands of accounts is still a lot. If you really want to fix something but don't want to wait to get into a group.. send the changes you want to an appropriate mailing list.

2018-03-27

EPEL statistics: EL-7 and EL-6


The above is a 7 day moving average graph of check-ins to MirrorManager looking for EPEL updates. In general it shows that EPEL has been growing steadily since its inception, with some interesting peaks and dips.

In March of 2017, you see a drop-off of EL-5 when CentOS end-of-lifed it from their mirror manager. I don't know if those clients are still just looking for updates and failing, or if clusters all over were shut off. You will notice that the EL-7 growth becomes much steeper than it was before, so I am assuming it is people jumping from 5 to 7. In January of 2018, you see another blip. This shows up in the normal data when the Spectre/Meltdown updates came out. It looks like a lot of hosts may only update every now and then, but they ALL updated that week.

The big news is that EL-6 is still growing by a lot more than anyone would expect. By this time in its lifetime, EL-5 was beginning to drop off, but it would seem that people are still deploying EL-6 in large numbers. That has made it a lot harder for EL-7 to 'catch up', but it looks like it will finally do so in the next month or so. I realize that in today's 'move fast and break things' world this seems crazy slow.. you can't code Rust on EL-6 or 7 without some serious hacks.. but this is actually a fast-paced change in the 'ops' world. [I had to support Solaris 2.5 (released in 1995) in 2007. The system it was running on had another 5 years before it was to be end-of-lifed. To replace it would have required recertifying a large number of other systems and processes.]

2018-03-26

Things to be prepared when asking for support

It is fairly common to see someone drop into an IRC channel, ask a question, and then quit 10-20 minutes later if they don't get an answer. [It isn't just IRC; I have seen this on other 'group' chat interfaces.] They may randomly jump from channel to channel trying to get anyone to answer their question. It is frustrating for the questioner, and for anyone who comes online, writes up an answer, and finds the person has left for some other channel right before they hit Enter.

Here are some useful guidelines if you are looking for help:

  1. The older the software, the longer it is going to take to get support. Most online communities go through a standard lifecycle. There is the first year or so where it seems like everyone is online all the time, and there are plenty of people to answer questions. However as time goes on there is a fall off as many people find other interesting things. It can look like the community is comatose, but there are usually people still around but with other jobs and problems taking up most of their time. [In other words, you will be able to find someone quicker to help on a nodejs problem than a Red Hat Linux 7.2 problem.]
  2. The more complicated the software, the more complicated the medium needed to support it. Communication tools like IRC and twitter are good for getting quick answers, but fall over quickly trying to go over a complicated configuration file. Especially when other people are trying to have conversations at the same time 😅. For really complicated problems, email is not going to be enough as more and more people need to answer things.
  3. FLOSS communities help those who help themselves. This means that a person needs to be willing to look up things first, read manuals, and have explored the entire system problem first. When someone has done this and can answer things like: "How much memory does the system have?", "Which version of the software is it?", "What extra components are there?" the problem can be solved quickly. Otherwise it can be 10 emails slowly having the person do all the exploration they should have done first.
  4. You can get online help quick, cheap, or easy.. pick one. If someone tells you that you can get two of the three in software, they are usually selling you something. If they say you can get all three, it goes from selling to a con job. There is usually a quick solution which is neither easy nor cheap. There is usually an easy solution which is definitely not cheap or quick in the long run. And finally there is the cheap solution which will be neither quick nor easy.
  5. Be prepared for contrary answers. As much as they call it "computer science", most software development and operations is part cargo-cult and part performance art. Some people will say you have to dance naked in front of the database server for the accounts receivable to work, and others will say that a chicken sacrificed is all that is required. We all know that the few times we didn't do it, bad stuff happened.. but we can't be sure which one gets us a paycheck or not. [This of course an exaggeration. Payroll completes because the plane gods flew over and no amount of dancing or chicken sacrifices will fix that.] 
  6. Be honest. Nothing makes a support session go bad faster than when someone feels the other side lied. If you have made configuration changes, be open about it. If you don't remember what changes you made or that someone else made, just say so. The mistake many of us make is to say "Nothing", which is nearly always wrong. 

I think that list of metaphors got away from me somewhere around 3.. In any case, if you need to get help in most FLOSS online communities be patient, be prepared, and be kind. [This also goes for the person giving help.]

2018-03-23

1.5 Year Warning: Python2 will be End of Lifed

Upstream support for Python 2.7 ends on January 1, 2020 (2020-01-01), and the Fedora Project is working out what to do about it. Since Fedora 28's roughly 1.5 years of support runs to about 2019-11, it is the last release whose lifetime ends before that date, which is why the current Python maintainers are looking to orphan python2 now. They have made a list of the packages that would be affected and have started a discussion on the Fedora development lists, but people who only see notes of this from blogs or LWN posts may not have seen it yet.

Here is my current status on things:
  1. The epylog package will most likely become a dead.package in rawhide. It would need a complete rewrite to work with Python 3, it has not been touched by upstream, and I do not have time to do the port.
  2. I don't own nagios-plugins-fts or nagios-plugins-lgcdm, but if they ended up on my plate I would do the same thing.
  3. python2 is still supported in RHEL-7 and will be until 2024. epylog and any python scripts inside the nagios packages I own will continue to be supported in EL7.
If your project currently requires python2, you need to either help your upstream move to python3, find a company or service that will maintain Python 2.7 in future years, or be prepared to spend years in the wilderness. From the many years and projects where I supported Fortran IV, perl4, Java 1.2, and ADA83 code in the early 2000's... I expect that this is where many people are going to be. [I will try to cover how to handle that in a future blog.]
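
If you want to gauge how exposed a given machine is, the RPM python(abi) capability makes for a quick first pass. A sketch (repoquery is in yum-utils on EL7; on Fedora, dnf repoquery does the same):

    # list installed packages that require the python2 ABI
    rpm -q --whatrequires 'python(abi) = 2.7'
    # or ask the enabled repositories the same question
    repoquery --whatrequires 'python(abi) = 2.7'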

2018-03-22

Dealing with network hackers in 1995

Going back to early 1995, I was working for Los Alamos National Labs as a contractor systems administrator. I didn't have a security clearance, so I could not work 'behind the fence' as they said. Instead, I worked with a large number of similarly uncleared post-docs, graduate students, and college interns in a strip mall converted into offices. The offices ran from nearly one end of the strip mall to the other, with a large selection of Unix, PC, and Mac systems spread through the building, connected together with 10base2 (or thin-wire) Ethernet. To make things even more fun, most of the systems were disk-less SunOS Sparc ELC/SLC and IPC systems booting off a Sparc 10 which had 64 MB of RAM and, I think, two 2 GB disk drives.

The first problem I had to deal with was that most of the systems would crash at different times during the day. I got a Digital networking book my Dad had given me and learned about common networking problems, as this was not something I had dealt with before. I found that the local network was connected to a T1 which ran back to the main campus about 2 miles away. The T1 went to a hub which had 7 thin-wire lines running out of it. That seemed fine until I traced the thin-wire out. I was worried there were bad connectors (there were) or kinks in the line (there were), but the real problem was that out of the 7 thin-wire lines only 3 were used. Most of the systems were on one line, 2 (my desktop and the Sparc 10) were on another, and the NeXT and SGIs were on the third. The other lines were just lying under the carpets, unused. I met with my new boss Dale and showed him what I had found. I learned a lot from Dale. He got me a copy of the Unix System Administrators Handbook and told me to start reading it, beginning with networks.

So I spent the next week or so learning how to properly crimp and connect thin-wire. I also learned about testing signals and ohm resistance as I found connectors which didn't work very well. I moved all the Windows 3.11 and Macintosh (System 6?) systems over to one set of cables and then spread the disk-less stations over the other lines in order to try to diagnose the crashes. At first, I didn't have a crash for 2-3 days and I thought everything was licked. And then the next Monday, things started crashing again.

I started investigating, but then we started getting reports of a similar problem in a different building. I found a server with no disk space because of a file in /usr/tmp which seemed to be random data. I believe this was late January 1995, and I had been putting in 60-80 hour weeks trying to get caught up with things. [I was hourly and this was allowed overtime, so I was paying off my college debts pretty quickly.] I hadn't watched TV or read USENET in weeks and had no idea that there was a massive computer manhunt going on at the time. Now, when I had worked at the university, I would probably have deleted the file and moved on, but because I was supporting scientists I didn't want to delete some research. I contacted Dale, and he helped me work out that the file was being written to by the rshd or telnetd daemon. He had me check the Ethernet port, and sure enough it was in promiscuous mode. Now, I had been slow up until this point, but I realized this was not a good thing.

Now back in 1995, nothing on these networks was encrypted. You used telnet or rsh to log into systems. You had fingerd running on every box because you used it to see who was logged in. Most networks were hubbed, so you could just listen on one system and hear everything in your /24. Dale started contacting the people in security and I started looking at what other boxes were bad. It turned out several had this, but one of the ones in my area which had crashed still had the source code for the sniffer daemon on it. I looked at it and found that it linked to the equivalent of tcpdump and filtered for anything typed after Password:, so it caught telnetd, rshd, ftpd and also the su command.

It then kept whatever came afterwards until a NULL, and xor'd that data into the file it opened. I think the xor was against a libc random() stream of data using a starting string as the seed. [The string was then freed, I guess to make sure gdb couldn't find it.] To decrypt the data you just ran the same command with, I think, a -u and the same password. You could set it up to log even more data at compile time, so I was trying that out to see if there was anything I could learn about what had been captured on different systems.

At this point I had gotten it running on the system and was looking at what it was capturing when along came the attacker. They logged into the system as a normal user and ran a program I had missed ... which was setuid. They then checked what they had captured and logged out. In doing so, I had captured the password they were using. I had also found that they had been running finger against a bunch of systems looking for a particular login. It turned out that a prominent researcher had collaborated with a professor at Los Alamos, and the systems with their accounts were being targeted. I turned this all over to my boss, who asked me if any of that meant anything to me. It didn't. He told me to go read some particular Usenet groups and that we would have to reinstall everything. [I believe he had been trying to get everything installed, updated and secured for years, but it had been a low priority in his management chain.]

For a while it was thought that the attacker may have been one particular person, but these days I doubt it very much. There were a lot of people trying to be 'helpful' at that time by doing various things, and I expect it was one of them. It was an eye-opening experience, and I wanted to capture it here for the amount of naivete we had back then:

  1. No firewall because it was meant to be an educational section where students and professors should just be able to telnet from their University to whatever computer they needed.
  2. Most of the passwords were stored in a centralized space where anyone could look at the hashes. The Sun yellow pages system was useful in mass deployments but had its limits. All of the systems stored their password hashes in /etc/passwd, so if you got onto the system at all you could see the hash.
  3. Networks were generally hubbed, so anyone could listen to anyone else locally. This was considered less of a problem when only Unix systems were around, because there was a 'separation' of root and user so you could 'control' who could look at the network. The growing number of Macs and PCs, which allowed anyone to listen, made this assumption hard to keep.
  4. Network communication was mostly in the clear. This was partly because encryption was expensive on CPUs, but also because encryption was export controlled, so no one wanted to use it in case they got in trouble for it 'leaking'.
  5. Most systems came out of the box with many network services turned on which never needed to be. Just as these days your Facebook or LinkedIn account starts off public to the universe, your computer would share whether you were logged in, where you had logged in from, what your .plan might have in it, and a dozen other things.
  6. Security problems tend to travel along lines of social trust. The attackers were following the trail of various people who had worked with one researcher, using each set of systems to jump to the next ones. One of the computers was hacked via a talk session where someone asked someone else if they could help them. I think the victim thought they were helping their adviser with a temporary .rhosts entry, and it escalated from there.
  7. While Howard Tayler would put it better years later, I found that in security, failure is always going to happen. What is important is how you react to it. I made sure that I could always back up and restore systems en masse if needed. I also made sure that I had a plan B and C for when the next security problem occurred.
It would be great if that naivete were unique to that time frame, but we constantly reinvent ways to think we are completely safe.. only to find we covered ourselves in steak sauce and lay down in the hungry lions' pit.

Addendum: I am recalling things from over 20 years ago and like a fish tale.. some things which make me look better will have grown in that time, and other things which would make me look like an idiot will have faded away.. 

2018-03-21

How to install EPEL Packages

What is EPEL?

EPEL stands for Extra Packages for Enterprise Linux. As I have stated before, the Extra Packages are rebuilds of packages from various Fedora Project Linux releases with an aim to keep EPEL packages slower moving than what is in Fedora. The Enterprise Linux in this statement is aimed at the Red Hat Enterprise Linux and the many rebuilds of it (CentOS, Scientific Linux, Oracle Linux and sometimes Amazon Linux). [As opposed to the enterprise offerings from SuSE or Canonical.] 

How to set up EPEL?

In order to enable the EPEL repositories, you need to do the following things. 
  1. Determine which version of an Enterprise Linux you are using by looking in the /etc/system-release file. On my CentOS-6 system it looks like  
    
    [root@el-6 ~]# cat /etc/system-release
    CentOS release 6.9 (Final)
    
    and on my CentOS 7 system it currently looks like
    
    [root@el-7 ~]# cat /etc/system-release
    CentOS Linux release 7.4.1708 (Core) 
    
  2. If you are running CentOS or Scientific Linux you can now simply install and enable EPEL with a yum command:
    
    [root@el-7 ~]# yum --enablerepo=extras install epel-release
    [root@el-6 ~]# yum --enablerepo=extras install epel-release
    
    This installs the release files and GPG keys via a package which has been signed by your distribution, so you have a chain of trust.
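    You can then confirm the repository is active with something like the following (output will vary by mirror):
    
    [root@el-7 ~]# yum repolist epel
    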
  3. If you are running Red Hat Enterprise Linux, you will need to do some extra steps. We will use EL-7 but they would be similar for EL-6.
    1. Read the page at https://getfedora.org/keys/faq/
    2. Download the RPM and GPG key, confirm they are valid, and then install the package.
      
      # wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
      # wget https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-7
      # rpm --import RPM-GPG-KEY-EPEL-7
      # rpm -K epel-release-latest-7.noarch.rpm
      epel-release-latest-7.noarch.rpm: rsa sha1 (md5) pgp md5 OK
      # yum install epel-release-latest-7.noarch.rpm
      
  4. If you are more of a curl | su - type person then you can just install directly from the internet using the rpm command. I don't recommend this but it gets asked a lot.
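    If you do go that route, it looks something like this (a sketch only; it installs straight from the URL used in the previous step, so you are trusting the network):
    
    [root@el-7 ~]# rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    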
  5. You can now list packages using yum list, which will show many more packages. These can be installed with the normal install methods. If you did not already import the GPG keys, you will be prompted to trust these keys.
    
    [root@el-7 ~]# yum install pax-utils
    Loaded plugins: auto-update-debuginfo, fastestmirror
    Loading mirror speeds from cached hostfile
     * base: mirror.yellowfiber.net
     * epel: archive.linux.duke.edu
     * epel-testing: packages.oit.ncsu.edu
     * extras: mirror.yellowfiber.net
     * updates: mirror.yellowfiber.net
    Resolving Dependencies
    --> Running transaction check
    ---> Package pax-utils.x86_64 0:1.2.3-1.el7 will be installed
    --> Finished Dependency Resolution
    
    Dependencies Resolved
    
    ===========================================================================
     Package          Arch         Version          Repository             Size
    ===========================================================================
    Installing:
     pax-utils        x86_64       1.2.3-1.el7      epel-testing           96 k
    
    Transaction Summary
    ===========================================================================
    Install  1 Package
    
    Total download size: 96 k
    Installed size: 249 k
    Is this ok [y/d/N]: y
    Downloading packages:
    pax-utils-1.2.3-1.el7.x86_64.rpm                        |  96 kB  00:00:00
    Running transaction check
    Running transaction test
    Transaction test succeeded
    Running transaction
      Installing : pax-utils-1.2.3-1.el7.x86_64                           1/1
      Verifying  : pax-utils-1.2.3-1.el7.x86_64                           1/1 
    
    Installed:
      pax-utils.x86_64 0:1.2.3-1.el7
    
    Complete!
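    
    If you want to see just what EPEL provides, something like the following works (the repo id is normally just epel):
    
    [root@el-7 ~]# yum --disablerepo="*" --enablerepo="epel" list available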
    
    
  6. Sometimes you may need to install a package which has not made it to stable yet. Please see my earlier post on that.
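    The short version is to enable the testing repository for just the one transaction, for example (packagename is a placeholder):
    
    [root@el-7 ~]# yum --enablerepo=epel-testing install packagename
    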
  7. I have been told by various people on IRC that they were told by someone else that the current version of Amazon Linux is based off EL-6, so they want to use the EL-6 EPEL for it. We have also had lots of reports of people finding things not working at times. I would recommend that any users of Amazon Linux use what Amazon recommends first.

Side Note:

I have been told multiple times that the EPEL logo looks like a horse's tail swatting flies. I actually think it looks more like a full sail being blown, as our original want was to have a trading ship or even a container ship (this was years before containers), but none of the attempts looked good. This one was provided by the art team as a combination of a backwards E and a P/L.

2018-03-20

/usr/bin/whoami

I have not done a general biography in a long time, so I figured I should put one out as a courtesy for people reading these blogs and the emails I send out on various lists:

Who Am I?

My name is Stephen Smoogen, and I have been using computers for a very long time. According to a bug in a PHP website I was on years ago, I am over 400 years old, which would mean I was born on Roanoke Island with Virginia Dare. I think I slept a lot in the next 360 years as, according to my sister, my parents found me in a 'DO NOT RETURN TO SENDER' box outside their door. How she knew that when she is younger than me, I do not know.. but I have learned not to question her.

My first computer was a Heathkit Microcomputer Learning System ET-3400 that my Dad got at a swap meet when he was in the Navy in the 1970's. I had worked with my Dad on some other systems he fixed for coworkers, but that was mostly being bored while watching an oscilloscope and moving probes around boards every now and then. When I wanted to get a computer in the early 1980's, he said I had to show that I could actually program it, since an Apple ][ would have been a major investment for the family. I spent the summer learning binary and hexadecimal and doing the simple machine code that the book had in it. I also programmed a neighbour's Apple ][+ with every game I could from the public library's copy of Creative Computing's 101 BASIC Computer Games. My mom and dad saved up for an Apple and we got an Apple ][e in 1983, which I then used through high school. The first thing I learned about the Apple ][e was how different it was from the ][+. The older systems came with complete circuit diagrams and chip layouts. That had been the reason my dad wanted to get an Apple: he knew he could fix it if a chip went bad. The ][e did not come with that, and boy was Dad furious. "You don't buy a car with the engine welded shut. Don't buy a computer you can't work on." It seemed silly to me at the time, but it would become a founding principle for what I do.

During those years, I went with my dad and his coworkers to various computer clubs where I learned how to play hack on a MicroVax running I think Ultrix or BSD. While I was interested in computers, I had decided I was going to university to get a degree in Astrophysics.. and the computers were just a hobby. Stubborn person that I am, I finally got the degree though I kept finding computers to be more enjoyable. I played nethack and learned more about Unix on a Vax 11/750 running BSD 4.1 and became a system administrator of a Prime 300 running a remote telescope project. I moved over to an early version of LynxOS on i386 and helped port various utilities like sendmail over to it for a short time.

After college I still tried to work in Astrophysics by being a satellite operator for an X-ray observation system at Los Alamos. However, I soon ended up administering various systems to get them ready for an audit, and that turned into a full-time job working on a vast set of systems. I got married, and we moved to Illinois where my wife worked on a graduate degree and I worked for a startup called Spyglass. I went to work for them because they had done scientific visualization which Los Alamos used.. but by the time I got there, the company had pivoted to being a browser company with Enhanced Mosaic.

For the next 2 years I learned what it is like to be a small startup trying to grow against Silicon Valley and Seattle. I got to administer even more Unix versions than I had before, and also to see how Microsoft was going to take over the desktop. That was because Enhanced Mosaic was at the core of Internet Explorer. At the end of the two years, Spyglass had not been bought by Microsoft, and instead laid off the browser people to try and pivot once again as an embedded browser company at a different location. The company was about 15 years too early for that: the smart devices its plans treated as the near future didn't start arriving until 2015 or so.

Without a job, I took a chance to work for another startup in North Carolina called Red Hat. At a Linux conference, I had heard Bob Young give a talk about how you wouldn't buy a car with a welded bonnet, and it brought back my dad's grumpiness with Apple from decades before. I realized that my work in closed source software had been one of continual grumpiness because I was welding shut the parts that other people needed open.

Because of that quote, I worked at Red Hat for the next four years, learning a lot about openness, startups and tech support. I found that the most important classes I had taken in college were psychology and not computer science. I also learned that being a "smart mouthed know it all" doesn't work when there are people who are much smarter and know a lot more. I think by the time I burned out on 4 years of 80 hour weeks, I was a wiser person than when I came.

I went to work elsewhere for the next 8 years, but came back to Red Hat in 2009, and have worked in the Fedora Project as a system administrator since then. I have seen 15 Fedora Linux releases go out the door, and have come to really love working on the slowest part of Fedora, EPEL. I have also finally used some of the astrophysics degree, as the thermodynamics and statistics have been useful with the graphs that various Fedora Project leaders have used to show how each release, and the community, has continually changed.

2018-03-19

Explaining disk speeds with straws

One of the most common user complaints in Enterprise systems is 'why can't I have more disk space?' The idea is that they look at the costs of disks on Amazon or Newegg and see that they could get an 8 TB hard disk for $260.00, but the storage administrator says it will cost $26,000.00 for the same amount.

Years ago, I once even had someone buy me a disk and have it delivered to my desk to 'fix' the storage problem. They thought they were being funny, so I thanked them for the paperweight. I then handed it back to them and tried to explain why 1 drive was not going to help... I found that the developer's eyes glazed over as I talked about RPM speeds of drives, cache sizes, the number of commands an ATA read/write uses versus SCSI, etc. All of them are important, but not terms useful for a person who just wants to never delete an email.

The best analogy I have is that you have a couple of 2 litre bottles of Coca-Cola (fill in Pepsi, Fanta or Mr Pibb as needed) and a cocktail straw. You can only fill one Coke bottle from the other through that straw. Sure, the bottle is big enough, but it takes a long time to move the soda from one to the other. That is what 1 SATA disk drive is like.

The next step is to add more disks and make a RAID array. Now you can get a bunch of empty Coke bottles and empty out that one full bottle through multiple cocktail straws. Things are moving faster, but it still takes a long time, and you really can't use each of the large bottles as much as you would like because emptying them out via a cocktail straw is still pretty slow.

The next solution up is regular drinking straws with cans. The straws are bigger, but the cans are smaller.. you can fill the cans up or empty them without as much time waiting in a queue. However, you need a lot more of them to equal the original bottle you are emptying. This is the SAS solution, where the disks are smaller and faster, with much better throughput because of that. It is a tradeoff in that 15k drives use older technologies, so they store less data. They also have larger caches and smarter on-drive software to make the straw bigger.

Finally there is the newest solution, which would be a garden hose connected through a balloon to a coffee cup. This is the SAS SSD solution. The garden hose allows a large amount of data to go up and down the pipe, the balloon is how much you can cache in case reads or writes arrive too fast somewhere, and the coffee cup is because it is expensive and there isn't a lot of space. You need a LOT of coffee cups compared to soda cans or 2 litre bottles.

Most enterprise storage is some mixture of all of these to match the use case need.

  • SATA RAID is useful for backups. You are going to sequentially read/write large amounts of data to some other place. The straws don't need to be big per drive, and you don't worry as much about how long it takes. The cost per TB is of course the smallest.
  • SAS RAID is useful for mixed-user shared storage. The reads and writes here need larger straws because programs have different IO patterns. The cost per TB is usually an order of magnitude or two greater, depending on other factors like how much redundancy you wanted, etc.
  • SSD RAID is useful for fast shared storage. It is still more expensive than SAS RAID.
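
If you want to put rough numbers on the straws yourself, timing a large sequential write and then a read with dd is a quick sketch. The /storage path is a placeholder for wherever the array is mounted, and oflag=direct/iflag=direct bypass the page cache so you measure the drive rather than RAM:

    # dd if=/dev/zero of=/storage/testfile bs=1M count=1024 oflag=direct
    # dd if=/storage/testfile of=/dev/null bs=1M iflag=direct

dd reports the MB/s it managed at the end of each run, and comparing a single SATA drive against a SAS or SSD array this way makes the bottles and straws very concrete. Random IO needs a fancier tool like fio, but that is a post for another day.
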
And now to break the analogy completely. 

Software defined storage would be where you are using the cocktail straws with Coke bottles, but you have spread them around the building. Each time Coke gets put in one, a hose spreads that Coke around so each block of systems is equivalent. In this case the costs per system have gone down, but there needs to be a larger investment in the networking technology tying the servers together. [A 1 gbit backbone network is like a cocktail straw between systems, a 10 gbit backbone is like a regular straw, and the 40G/100G ones are the hoses.]
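
To find out which straw your backbone actually is, running iperf3 between two of the storage nodes is the usual quick check (node1 and node2 are placeholder hostnames; the first command runs on the server side, the second from the other node):

    [root@node1 ~]# iperf3 -s
    [root@node2 ~]# iperf3 -c node1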


Now my question is .. has anyone done this in real life? It seems crazy enough that someone has done a video.. but my google-fu is not working tonight.

2018-03-16

Ramblings about long ago and far away

My first job outside of college in 1994 was working at Los Alamos National Labs as a Graduate Research Assistant. It was supposed to be a post where I would use my bachelor's degree in Physics for a year until I became a graduate student somewhere. The truth was that I was burnt out on university and had little urge to go back. I instead used my time to learn much more about Unix system administration. It turned out the group I worked in had a mixture of SGI Irix, Sun Sparcstation, HP, Convex, and I believe AIX systems. The systems had been run by graduate students for their professors and needed some central management. While I didn't work for the team that was doing that work, I spent more and more time working with them to get that in place. After a year, it was clear I was not going back to Physics, and my old job was ending. So the team I worked with gave me a reference to another place at the Lab where I began work.

This network had even more Unix systems, as they had NeXT cubes, old Sun boxes, Apollo, and some others I am sure to have forgotten. All of them needed a lot of love and care, as they had been built for various PhDs and postdocs for various needs and then forgotten. My favorite box was one where the owner required that nearly every file be set 777. I had multiple emails which echoed every complaint people have come up with about SELinux in the last decade. If there was some problem on the system, it was because a permission was set.. and until you showed it still didn't work at 777, you could not look at it being something else. [The owner was also unbelievably brilliant in other ways.. but hated arbitrary permission models.]

In any case, I got a lot of useful experience on all kinds of Unix systems, user needs, and user personalities. I also got to use Softlanding Linux System (SLS) on a 486 with 4 MB of RAM running the Linux kernel 0.99.4? and learn all kinds of things about PC hardware versus 'Real Computers'. The 486 was really an overclocked 386 with some added instructions: originally a Cyrix DX33 that had been relabelled with industrial whiteout as a 40 MHz part. It sort of worked at 40 MHz but was reliable only at 20 MHz. Such were the issues with getting deals from computer magazines.. sure, the one the guy in the next apartment got worked great.. mine was a dud.

I had originally run MCC Interim Linux (from the Manchester Computing Centre) in college, but when I moved it was easier to find a box of floppies with SLS, so I had installed that on the 486. I would then download software source code from the internet and rebuild it for my own use, using all the extra flags I could find in GCC to make my 20 MHz system seem faster. I instead learned that most of the options didn't do anything on i386 Linux at the time, and most of my reports about it were probably met by eye-rolls from the people at Cygnus. My supposed goal was to try and set up a MUD so I could code up a text-based virtual reality. Or to get a war game called Conquer working on Linux. Or maybe get xTrek working on my system. [I think I was mostly trying to become a game developer by just building stuff versus actually coding stuff. I cave-man debugged a lot of things using stuff I had learned in FORTRAN, but it wasn't actually making new things.]

For years, I looked back on that time and thought it was a complete waste as I should have been 'coding' something. However, I have come to realize I learned a lot about the nitty-gritty of hardware limitations. A 9600 baud modem is not going to keep up with people on Ethernet playing xTrek. Moving to a 56k modem later isn't going to keep up with a 56k partial T1. The numbers are the same but they are counting different things. A 5400 RPM IDE hard drive is never going to be as good as a 5400 RPM SCSI disk even if it is larger. 8 MB on a Sparc was enough for a MUD, but on a PC it ran into problems because the CPU and MMU were not as fast or as 'large'.

All of this became useful years later when I worked at Red Hat between 1997 and 2001. The customers at that time were people who had been using 'real Unix' hardware and were at times upset about how Linux didn't act the same way. In most cases it was the limitations of the hardware they had bought to put a system together, and by being able to debug that and recommend replacements, things improved. Being able to compare how a Convex used disks, or SGI did graphics, against the limitations of the old ISA and related buses helped show that you could redesign a problem to meet the hardware. [In many cases, it was cheaper to use N PC systems to replicate the behaviour of 1 Unix box, but the problem needed to be broken up in a way that worked on N systems versus 1 box.]

So what does this have to do with Linux today? Well mostly reminders to me to be less cranky with people who are 
  1. Having fun breaking things on their computers. People who want to tear apart their OS and rebuild it into something else are going to run into lots of hurdles. Don't tell them it is a stupid thing to do. The people at Cygnus may have rolled their eyes, but they never told me to stop trying. Just read the documentation and see that it says 'undefined behavior' in a lot of places.
  2. Working with tiny computers to do stuff that you would do on a bigger computer these days. It is very easy to think that because it is 'easier' and currently more maintainable to do a calculation on 1 large Linux box.. you are wasting time using dozens of Raspberry Pis to do the same thing. But that is what the mainframers thought of the minicomputers, the minicomputer people thought of the Unix workstations, and the Unix people thought of Linux on a PC.
  3. Seeming to spin around, not knowing what they are doing. I spent a decade doing that.. and while I could have been more focused.. I would have missed a lot of things that happened otherwise. Sometimes you need to do that to actually understand who you are.