2018-03-19

Explaining disk speeds with straws

One of the most common user complaints in enterprise systems is 'Why can't I have more disk space?' Users look at the cost of disks on Amazon or Newegg and see that they could get an 8 TB hard disk for $260.00, but the storage administrator says it will cost $26,000.00 for the same amount.
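
To make the gap concrete, here is a small sketch of the arithmetic behind that complaint. It uses only the prices and capacity quoted above; the function and variable names are made up for illustration.

```python
# Prices and capacity are the figures quoted in the post.
CAPACITY_TB = 8
RETAIL_PRICE = 260.00          # consumer 8 TB disk
ENTERPRISE_PRICE = 26_000.00   # storage administrator's quote


def cost_per_tb(price, capacity_tb):
    """Return the price of a single terabyte."""
    return price / capacity_tb


retail = cost_per_tb(RETAIL_PRICE, CAPACITY_TB)
enterprise = cost_per_tb(ENTERPRISE_PRICE, CAPACITY_TB)
ratio = enterprise / retail

print(f"retail:     ${retail:,.2f} per TB")
print(f"enterprise: ${enterprise:,.2f} per TB")
print(f"ratio:      {ratio:.0f}x")
```

That is two orders of magnitude per TB, which is why the question keeps coming up.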

Years ago, someone even bought me a disk and had it delivered to my desk to 'fix' the storage problem. They thought they were being funny, so I thanked them for the paperweight, handed it back, and tried to explain why one drive was not going to help... I found that the developer's eyes glazed over as I talked about drive RPM speeds, cache sizes, the number of commands an ATA read/write uses versus SCSI, etc. All of these are important, but they are not useful terms for a person who just wants to never delete an email.

The best analogy I have is that you have a couple of 2 litre bottles of Coca-Cola (substitute Pepsi, Fanta or Mr Pibb as needed) and a cocktail straw. You can only fill one Coke bottle from another through that straw. Sure, the bottle is big enough, but it takes a long time to move the soda from one to the other. That is what 1 SATA disk drive is like.

The next step is to add more disks and make a RAID array. Now you can get a bunch of empty Coke bottles and empty out that one array through multiple cocktail straws. Things move faster, but it still takes a long time, and you really can't use each of the large bottles as much as you would like because emptying them through a cocktail straw is pretty slow.
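
As a rough sketch of why more straws help but don't fix everything, here is an idealized calculation of how long it takes to stream a full disk's worth of data through one or more drives in parallel. The per-disk rate is a made-up illustrative number, not a measurement, and the model ignores parity overhead, controller limits, and seek time.

```python
def drain_seconds(size_bytes, per_disk_bytes_per_s, disks=1):
    """Idealized time to stream size_bytes across `disks` drives in parallel."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return size_bytes / (per_disk_bytes_per_s * disks)


SIZE = 8 * 10**12            # 8 TB, the disk size mentioned in the post
ASSUMED_RATE = 150 * 10**6   # ASSUMPTION: 150 MB/s per disk, illustration only

for n in (1, 4, 8):
    hours = drain_seconds(SIZE, ASSUMED_RATE, n) / 3600
    print(f"{n} disk(s): {hours:.1f} hours")
```

Even with perfect scaling, each extra straw only divides the wait; it never makes an individual straw any wider.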

The next size of solution is regular drinking straws with cans. The straws are bigger, but the cans are smaller. You can fill the cans or empty them without waiting as long in a queue. However, you need a lot more of them to equal the original bottle you are emptying. This is the SAS solution, where the disks are smaller and faster and have much better throughput because of that. It is a tradeoff: 15k RPM drives use older technologies and so store less data. They also have larger caches and smarter OSes on the drive to make the straw bigger.

Finally, there is the newest solution, which would be a garden hose connected to a balloon connected to a coffee cup. This is the SAS SSD solution. The garden hose allows a large amount of data to go up and down the pipe; the balloon is how much you can cache in case writes or reads come in too fast somewhere. The coffee cup is because it is expensive and there isn't a lot of space. You need a LOT of coffee cups compared to soda cans or 2 litre bottles.

Most enterprise storage is some mixture of all of these to match the use case need.

  • SATA RAID is useful for backups. You are going to sequentially read/write large amounts of data to some other place. The straws don't need to be big per drive, and you don't worry about how much is backed up. The cost per TB is of course the smallest.
  • SAS RAID is useful for mixed-user shared storage. The reads and writes need larger straws because programs have different IO patterns. The cost per TB is usually an order of magnitude or two greater, depending on other factors like how much redundancy you want.
  • SSD RAID is useful for fast shared storage. It is still more expensive than SAS RAID.

And now to break the analogy completely.

Software-defined storage would be where you are using the cocktail straws with Coke bottles but have spread them around the building. Each time Coke gets put in one, a hose spreads that Coke around so each block of systems is equivalent. In this case the costs per system have gone down, but there needs to be a larger investment in the networking technology tying the servers together. [A 1 Gbit backbone network is like a cocktail straw between systems, a 10 Gbit backbone is like a regular straw, and 40G/100G are the hoses.]
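
To put rough numbers on those network straws, here is a minimal sketch of the best-case time to copy one 8 TB disk's worth of data between systems at each backbone speed. It assumes the link runs at full line rate with no protocol overhead or contention, which real networks won't quite reach; the function name is made up for illustration.

```python
def transfer_hours(size_bytes, link_gbit_per_s):
    """Best-case hours to move size_bytes over a link at full line rate."""
    bits = size_bytes * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / 3600


SIZE = 8 * 10**12  # 8 TB, the disk size used earlier in the post

for gbit in (1, 10, 40, 100):
    print(f"{gbit:>3} Gbit: {transfer_hours(SIZE, gbit):.2f} hours")
```

At 1 Gbit the copy takes most of a day even in this idealized case, which is why the cheap disks only pay off once the network between them has been upgraded.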


Now my question is: has anyone done this in real life? It seems crazy enough that someone must have made a video, but my google-fu is not working tonight.
