My first job in technology was as a Unix systems administrator. This was back in ye-olden-days when Linux was relatively new. A lot of my job was to migrate systems off of Sun and Digital hardware and onto commodity x86 hardware running Linux.
At the time, this was a straightforward case of getting more bang for our hardware buck.
But while we enjoyed getting more and better hardware within the constraints of our group's budget, we didn't at first understand that in moving to Linux (and, even more so, in adopting virtual machines) we could fundamentally change the way we thought about our servers.
When we deployed a server back in the dark ages, we thought a lot about it. We considered what kind of hardware should go in the system, what kind of network link it should have, what operating system and version, the applications to provision, and on and on and on.
There used to be a well-established set of conventions for giving servers individual names. Folks would name their servers after Greek gods, planets, or characters in a favorite novel. The email server might be Athena, the database server Zeus, and so on.
The point is that each of our servers was unique, with a unique software stack and unique capabilities, and each had a unique name to remember it by. As system administrators, our task was to set these servers up and then watch over them carefully so that they would remain available to our users whenever they needed them.
And they needed to be carefully watched, because each of them had unique data that was important to our users. While we did keep backups of that data, actually restoring a system from those backups was, even in the best of cases, a lengthy process. Moreover, since each server used a unique hardware configuration, there was no absolute guarantee that we would be able to restore that backed up data on to an appropriate new server if we needed to.
Because of this, when a system went down unexpectedly, our users got upset even when the recovery went as well as it could, and they had every right to be. And when our users got upset, it seriously interfered with our ongoing quest to find funny things on the Internet.
So, we monitored these servers that we named closely. We wanted to be sure they were happy, healthy and provided with everything that a server might need.
In short, we treated these servers like pets.
At the time, treating servers like pets was simply The Way Things Are Done. But it was also at least a little bit crazy. It was a legacy from the time when buying a server meant buying a mainframe, or at least a very expensive system from a vendor like Sun or Digital. If you spent your entire hardware budget for the quarter, or maybe even the year, on a given system, then of course you treated it with kid gloves.
But once Linux came along, there was much less reason to do that. Now you could provision a system for a few thousand dollars. And because all of these Linux systems used commodity hardware, you could standardize the hardware you used. Furthermore, once virtual machines came onto the scene, you didn't need any new hardware at all in order to provision a new server. Progress!
Once enough people in technology wrapped their heads around this new paradigm, the race was on to turn servers into standardized parts that could be spun up when needed, aggregated into resource pools, and put out of their misery painlessly if something went wrong.
With servers, this shift from treating them like pets to treating them like cattle had to happen in order for applications to scale. When I got started in technology, a website ran on a single web server. Now, websites run across hundreds or even thousands of servers.
For the Internet of Things, we have to treat our connected devices like cattle instead of like pets right from the start. There's no other way to go if we're going to run applications across tens of thousands or millions of connected devices distributed across the world. To run applications at IoT scale, we're going to need to treat our devices like free range cattle.
It's worth noting briefly here what we mean by an IoT application. We don't mean a piece of software that runs on a connected device. There's nothing wrong with that, but there's nothing special about it either. The IoT is about running software across many devices and the cloud.
The best IoT applications will run at IoT scale. Think about an application that monitors the operations of a factory or a set of factories. It might monitor inventory levels, manufacturing line output, any maintenance that is required, etc. Then think, where does this application run? It doesn't run on a server, because the server isn't connected to any of the factory systems, at least not directly. It doesn't run on any particular connected device either. Rather, it runs across the set of connected devices and servers that it needs.
Treating our devices like free range cattle in order to run at IoT scale means satisfying the following requirements:
- Devices need to be able to securely associate with our application from anywhere in the world (at the manufacturer, when the end user takes it out of the box, or anywhere in between)
- Devices need to be able to securely receive software updates anywhere in the world
- Applications need to maintain a secure and dynamic list of participating devices and cloud resources along with their versions and associated software
- The application needs to run no matter how many device failures (permanent or intermittent) there might be
- Replacement devices need to take the place of offline devices seamlessly
The first three requirements are challenging enough, but the fourth and fifth requirements are where things can really get tricky.
Ultimately, treating connected devices like cattle and not like pets isn't about the amount of special handling that each device requires (though you certainly want to minimize that). Rather, it's about making sure that failure is always an option for any of these devices. And, that if any of them fail, the application carries on without a hitch.
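To make the "failure is always an option" idea concrete, here is a minimal, purely illustrative sketch of a device registry that treats going offline as routine. All of the names here (`DeviceRegistry`, `heartbeat`, the role strings) are hypothetical, not a real API, and a production system would also need authentication, persistence, and secure transport: devices check in periodically, anything silent past a timeout is presumed offline, and a replacement device simply takes over the role.

```python
import time

# Seconds of silence before a device is presumed offline.
# (Illustrative value; real deployments tune this per network.)
HEARTBEAT_TIMEOUT = 30.0

class DeviceRegistry:
    """Hypothetical registry mapping application roles to whichever
    device currently fills them. Failure is a non-event: a role whose
    device goes silent is simply offline until a replacement checks in."""

    def __init__(self):
        # role -> {"device_id": ..., "last_seen": ..., "version": ...}
        self.roles = {}

    def heartbeat(self, role, device_id, version, now=None):
        """Record a check-in. A new device_id takes over the role seamlessly."""
        now = time.time() if now is None else now
        self.roles[role] = {
            "device_id": device_id,
            "last_seen": now,
            "version": version,
        }

    def online(self, role, now=None):
        """A role is online only if its device has reported recently."""
        now = time.time() if now is None else now
        entry = self.roles.get(role)
        return entry is not None and (now - entry["last_seen"]) < HEARTBEAT_TIMEOUT

    def device_for(self, role):
        """Which physical device currently holds this role, if any."""
        entry = self.roles.get(role)
        return entry["device_id"] if entry else None
```

The point of the sketch is the shape of the design, not the details: the application cares about roles ("the sensor on line 3"), not about any particular device, so when `dev-A` catches fire and `dev-B` ships from the factory, the replacement just heartbeats under the same role and the application carries on without a hitch.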
Those of you who are old enough will remember the scene in Jurassic Park when, in order to try and defend themselves against the dinosaurs run amok, our heroes decide to reboot all of the park's systems. At that moment, Samuel L. Jackson got to utter the immortal line: "Hold onto your butts."
Because nobody knew what would happen when they rebooted the system. That's a quintessential sign of systems as pets rather than systems as cattle. At IoT scale, devices will turn on or off, get dropped, lose Internet access, catch on fire (buildings do burn down from time to time) and otherwise go off the reservation as free range cattle tend to do occasionally. And when they do, your application infrastructure needs to treat it as an absolute non-event, and keep right on keeping on.
At Apiotics, we've spent a lot of time thinking about how to corral your herd of IoT devices. Check out how it works at https://portal.apiotics.com/. Keep your focus on building amazing IoT apps, and let us take care of keeping the livestock in line.