Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.
Before joining Blameless, Kurt was a Sr. Staff SRE at LinkedIn, implementing SLOs (reliability metrics) at scale across the board for thousands of independently deployable services. Kurt is a member of the USENIX Board of Directors and part of the steering committee for the world-wide SREcon conferences.
In our seventh episode, Kurt chats with Tony Hansmann, Former Global CTO at Pivotal Software, Inc., about the joys and pains of being a consultant, how teams view digital transformation, how Tony is working towards killing ops, and more.
Kurt Andersen: Hello. I'm Kurt Andersen and welcome back to Resilience in Action. Today we're talking with Tony Hansmann, who started his computer career in operations consulting. He then spent some time with Cloud Foundry doing extreme programming to build a platform-as-a-service solution. He then went on to build and run the operations team for Pivotal Web Services. He's now found his way back to consulting. So, by way of introduction, could you tell our listeners a bit about the joys and pains of being a consultant?
Tony Hansmann: In operations consulting, you get called in when there's a problem. I got to my, say, third contract in the Bay Area and I realized, “Well, this is a tire fire, too.” I don't know why I wasn't smart enough to figure it out, but you only got hired when there was a real problem. I went to a company that only made money by shipping a report, and they had not been able to run that report for like five weeks. They were supposed to ship it weekly.
I will tell you, one of the secrets of an operations consultant is to go through the crontabs, because almost always people misunderstand how cron works. They think their application will always finish within one minute, so they schedule it with five stars across (every minute), and then the overlapping runs build up. You can just go clean the process table and solve that problem.
I learned all of the things not to do. Between being a W-2 employee and a consultant, I probably worked at 25+ places, and I saw everything. I worked in UNIX and Linux, so I saw all the things not to do with those tools.
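The cron pitfall described above (a job scheduled with `* * * * *` that sometimes takes longer than a minute, so copies pile up in the process table) is commonly avoided by having each run take a lock first. The following is a minimal Python sketch of that idea, assuming a Linux or similar Unix host; the helper name `run_exclusively` and the lock path are hypothetical illustrations, not something mentioned in the conversation.

```python
import fcntl
import os


def run_exclusively(lock_path, job):
    """Run job() only if no other run holds the lock; return True if it ran."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            # Non-blocking exclusive lock: fails immediately if an earlier
            # invocation is still running and holding the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # skip this run instead of stacking up behind it
        job()
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

A cron entry would then call a script that wraps its real work in `run_exclusively("/tmp/report.lock", build_report)` (both names hypothetical), so a slow run causes the next minute's invocation to exit quietly rather than accumulate.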
Kurt Andersen: So, you started out doing the fix-the-report-that-hasn't-run-in-five-weeks stuff. And then it seems like you went to Pivotal and started coaching teams for operational transformation, digital transformation, presumably for some of the problems you resolved? Or were you still encountering the same kinds of things?
Tony Hansmann: Well, boy, you'll get me on a topic there. Cloud Foundry, as a piece of technology, was (in the underlying orchestration system called BOSH) the answer to every operational prayer I had ever had. I think every operations person is familiar with that: a problem comes up at 4:30 PM. You're like, "Oh, that's pretty easy," and you go to fix it. And then, all of a sudden it's 7:00 PM and you're like, "Well, this is not as easy as I thought." And then it's 9:00 PM and you're like, "I think I've already tried what I am just about to try." Well, your mind gets lost in that circle of troubleshooting.
Cloud Foundry fundamentally changes the operations parameters of everything. The plain example is that Pivotal is the home of Spring.io. At this time, it got something like six million downloads a month. Not a complex site, but an extremely highly-trafficked site. And, in 2014, when I was running Pivotal Web Services, they switched onto AWS from whatever their hosting site was. They sent a note. We had to do a couple DNS changes. And then, in 2016, when I moved to the field, I was sitting in the room with the community manager of Spring. He's like, "Oh yeah, Spring gets eight million downloads a month," and blah-blah-blah. From 2014, when they moved on, after I had left operations, they never called me.
Kurt Andersen: Nice.
Tony Hansmann: Sites run on Cloud Foundry are so rock-solid reliable that you get essentially three nines out of the box. And you can tune that up even more if you'd like. That's part of the reason why I'm out here focusing on digital transformation, which is that you can take or leave Cloud Foundry as a technology. I'm romantically involved with it, I love it. But the technology stacks these days can deliver three nines to you out of the box with very, very low backend work. We're still trying to answer the question about how it changed, but developers can deliver a lot more features, and operators can run orders of magnitude more effectively.
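For readers unfamiliar with the "nines" shorthand used above, three nines means 99.9% availability. The short Python sketch below converts a number of nines into a yearly downtime budget; the function name and the choice of a 365-day year are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year, assumed for illustration


def allowed_downtime_hours(nines):
    """Hours of downtime per year permitted by an availability of N nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 0.001
    return HOURS_PER_YEAR * unavailability


for n in (2, 3, 4):
    print(f"{n} nines: {allowed_downtime_hours(n):.2f} hours/year")
```

Under these assumptions, three nines works out to a budget of roughly 8.76 hours of downtime per year.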
Kurt Andersen: Yeah, the table stakes have changed, essentially. Your entry point is very different.
Tony Hansmann: Yeah, and thank goodness, because old-school operations were a mess, and filled with toil. I just say I spent a couple decades at the bottom of the salt mine, and my goal is that no one else has to spend decades at the salt mine. So, let's learn the basic techniques for new stuff.
Kurt Andersen: Digital transformation is a lovely buzz phrase. Everybody wants to do it, right? So, how do you view what it is that everyone thinks they want to do? What is it they really want to do, and what do they think they're getting themselves into?
Tony Hansmann: So, the type of digital transformation where you go and hire a consultancy to produce you a three million dollar report is not what I'm talking about. I really believe in an employee-led digital transformation. I want to hear senior executive teams say, "Well, there's a rumor out there that you could push way more features and have way less backend problems. Is that true?" When you start looking into it, it becomes very hard to deny.
Now, you could say, "For my business, we're FDA regulated, and we can only get one feature through every 24 months." Great, don't invest in that. But if you're at all in a competitive landscape, you need to actually look at what's on the ground. The way I describe it right now is if you're a senior executive, and imagine that it's 1910 and you went and toured Henry Ford's plant, would you be like, "Oh, good for these guys," and go back to your own manufacturing that you're doing all by hand and just be like, "This is okay"?
We're way past that point of having extremely effective systems. If you don't believe the hyperscalers are so radically better at this than you, you just don't understand what's going on. What people want in digital transformation is to lean up their production process for software. They're going to take paper processes and move them to digital processes. The first pass looks like that, and I think that's where people just get stuck. They get a report, they get a book, and they never put it in action. My version of transformation is, who are the five people at your company who really care about fast production cycles for software?
You get them together in a room, and you do a few brown bags with them, and you teach them a few very basic things. You teach them the lean product management approach, a la Eric Ries. Then you give them the book XP Explained, which is the plain blueprint for going really fast in software. Really fast, and safely.
I think what people really want is a much better production and support process. That's what they think they're going to get when they transform. They get distracted by the expensive report, and then they look in the face of it and they're like, "Well, who's going to do all this work? And how much am I going to have to pay consultants to actually effect this?" I think you're lost at that point. I think you need to look internally.
Kurt Andersen: I like the ground-up approach. Have you read Jonathan Smart's book, Sooner Safer Happier?
Tony Hansmann: I have not read the book, but I've seen him speak a few times, and I've had a chance to talk to him once, and I love his approach. His ways of working with balanced teams, that is the thing.
Kurt Andersen: His principle of invite over inflict sounds a lot like your approach of finding the people who care and working from there.
Tony Hansmann: Pivotal was a singular workplace. I still don't believe there's anything like it on the planet at the scale that it ran. But I sat in the audience at a DevOps conference, in 2017 when Jonathan Smart spoke, and it just resonated like crazy. And it's exactly the same thing: put everyone who's going to have some stake in the end on the team at the beginning. But we'll get ahead of ourselves with that if we can't explain how to even bootstrap one team. And so, transformation is like, pick five people who care, give them executive support and some runway, and pick a friendly team. Find a friendly team who also wants to try some of these new methods for six straight weeks where you're coaching them, and then ask them at the end, "Was that worth it? What would you recommend to an adjacent team?"
That's the transformation cycle. The books are all out there, the software stacks are all out there. There is not a lot to discover. There is a lot to work out in your own culture, for sure. But there's just not a lot to discover about this process anymore.
Kurt Andersen: I'm working on a talk along the lines of knowing the way is not the same as walking the way, the famous line from The Matrix. And that's what you just described.
Tony Hansmann: It's like a musical instrument. You can read all the books about piano you want, and you can listen to people play piano, but it has no effect on your ability to play piano. Sitting at the piano and working it out, that's what has the effect. And for whatever reason, most organizations don't have a model for creating capacity in their own space. They're like, "No, we're running it." And when we talked before, you spoke about the crisis model of transformation. Anytime you're running at 100%, you're in a crisis. And many organizations are running at 110 plus percent for years.
I wish we could say it's not sustainable, because people will pay you for decade over decade to press the red button day in and day out, whether it makes sense or not. That's actually one of the major problems in transformation: the economics have not crossed at that point. Computers are still ridiculously effective at making money, let's not kid ourselves.
The problem is, the hyperscalers have changed where the curve is now. You can really effectively run your own data centers that are radically more effective than your current data centers. I'm not conjecturing here.
At Pivotal, they ran a consultancy like this for 25 years before they even got Cloud Foundry as a product. They had worked out how to deliver software at incredible velocity. High quality, exactly what the customer wants, because they're a consultancy. They wrote that piece of software and they handed it off. And when Cloud Foundry came into Pivotal, we just used their models for product, and it was unbelievable. It was a strict nine to six shop. You didn't even take a computer home unless you were in the production path. The model was truly stunning. And then, when you start living and managing it, it's simply unbelievable.
That's the genesis of my desire to teach everyone this model, because it works, everyone likes it better. It's the only thing in business I've ever seen that's win-win-win. Better for the company, better for you, better for the product.
Kurt Andersen: You talk about your passion for teaching this model, but one of the other things I've found on your LinkedIn profile is you have “killing ops” listed as a long term objective. What do you mean by that?
Tony Hansmann: As an operations consultant, you see the same mistakes over and over again. So, some time in 2008 I realized everything in operations is done as a bespoke process. Everyone does something a little bit different. In 2010, you couldn't even get most places to agree to PXE boot their farm. They're still hand-loading their farm, and it's just completely crazy. What I figured out is that accounting doesn't run any company. Accounting is a function in a company. It has a well-established set of practices, guidelines, and benchmarks. And in that case there are tax rules around it.
But when I made that long term goal of killing ops as a category, it was specifically that. Ops needs to become significantly like accounting. It needs to have very understandable curves, it needs to have a set of practices that make sense, where you can say, "Look, you can vary practice, but if you need to supply DNS, don't choose one of the 50 options you could choose. Choose one of these top three options, and an external vendor is the right thing for that."
When I made that commitment, it was essentially that operations is important, it's meaningful, it's software supported, and there are people who do it. It should not run the organization, though, because frequently operations becomes a psychological bog, because operations people are generally not included in planning on the long cycle, and so on.
Then, you've got the absolutely perverse thing of, if operations’ schedule is forced to slip by, then the development schedule nominally slips, and then operations is brought in when you can see the finish line, and that's when they learn what they're going to have to run. It tends to be a disaster. Defensive systems and bogs all around you.
Kurt Andersen: And it builds a silo mentality, too. Or it's the manifestation of that.
Tony Hansmann: It becomes a psychological bog. People are like, "Well, we want to do this," and operations is like, "No." Then people are like, "Well, what do you mean no?"
Kurt Andersen: So, how well do you think DevOps has advanced the practice toward getting rid of those bogs?
Tony Hansmann: That's a very hard question to answer, because I am one of those people who is a bit suspicious of DevOps. When I look at it, I see it as a few of the XP practices, and they're not well integrated. I am not very interested in the relationship between dev and ops. I am extraordinarily interested in the relationship between the product organization and everything else. And the reason I'm interested in product-
Dev and ops could be a really well-oiled machine. They could be working well together. But, if product is not involved in that well-oiled machine, then you're still going to have a lot of friction. I want to start from the customer's point of view. In the lean sense, I love a customer persona, and I love delivering and iterating with a customer persona. And so, I would like development and operations to line up along the same customer persona and value line.
The DevOps tools are tools, but I don't think they address the Henry Ford production assembly line problem.
Kurt Andersen: So, are you a fan of value stream mapping? And that kind of approach?
Tony Hansmann: Yeah, now we're starting to get to where we need to go, which is, there is a business, the business has a purpose, there are tools to reach that business purpose, and some tools look like this, and some tools look like that. I was for sure an operations person who built psychological bogs. I just made sure that no one could move my cheese, basically. And so, I would be sometimes aggressively ambivalent to business purpose. I mean, that's not just unique to me, that is the old-bastard-operator-from-hell archetype.
I ran like that, because, honestly, that seemed to be the only way to survive in operations for a long time. And then I showed up at Pivotal, and there was a completely different customer-focused, product-driven model, where it turns out that operations and dev don't have any natural antagonism at all, as long as what we're focused on is clean delivery for the customer, and as long as we're willing to talk about things.
My critique of the DevOps movement would be that, if you're working at a small or medium-sized digital-only company (that is, you only make your money by offering an API or something), then DevOps has good models, and the models tend to fit together pretty well with the product model. It just has a natural affinity. At a Fortune 1000 that is a real manufacturing company, or another goods kind of thing, or a financial services company, DevOps cannot carry the load of that kind of problem. I believe that kind of problem has to be solved much, much sooner in the product life cycle.
Transformation, to me, means what was impossible, or incredibly expensive, becomes possible, and sometimes just goes away and is replaced by nothing. We're actually after using the lean process model of validated hypothesis and validated learning to cut toil out of our process. The thing that drives me absolutely mad is the lack of understanding about what computers are for.
Kurt Andersen: That sounds unusual, but tell me, what do you see people thinking computers are for?
Tony Hansmann: What I see them thinking they're for is repeating some work, or doing some mundane thing that a human won't do. Most people believe that. But they don't live like they believe it. So, they don't say, "I showed up at work Monday, and there's a bunch of stuff that's toil, and oh, I guess I'll just do that toil."
No, no. You have this physical thing that will pattern the universe four billion times a second, and it will store the output from that. It is literally a machine. It is a machine instantiated in electronics, but still a machine. It just makes a pattern and runs a photon, runs an electron across. Maybe you've experienced this. Have you ever used Expect to solve a problem?
Kurt Andersen: Yes.
Tony Hansmann: So, how many people do you know that actually knew that Expect was a tool you could reach for when you were dealing with a terminal-based process? You're like, "We need to automate this process," and they're like, "Oh, it's terminal based. We can't automate it." I don't know what you're talking about, you can totally automate terminal-based processes. It's never pretty, but a machine will absolutely do it. And it tends to be brittle, but a machine will absolutely take care of that. When I say people don't understand what computers are for, that's what I mean.
They don't take whatever problem's in front of them and work like mad to reduce that to a Turing-complete problem that they can just codify and turn their back on. To me, for people who are very deep in the operational space, this is what you see at the hyperscalers. The folks that built the hyperscalers had no illusion about what computers are for. That's what happens in a lot of organizations: the organization accepts that there is some level of computer-based problem that can't be solved by computers. Or the proper program. My stock phrase for that is computers will do exactly what you tell them to do. If they're not doing what you want them to do, you just haven't told them the right thing.
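To make the point about automating terminal-based processes concrete, here is a minimal sketch of the Expect idea (wait for a prompt, then type the reply) written in Python with only the standard library rather than the Expect tool itself. It assumes a Unix host with pseudo-terminal support; the function names `expect` and `drive` and the tiny prompting child program are hypothetical stand-ins for a real interactive tool.

```python
import os
import pty
import select
import sys


def expect(fd, pattern, timeout=5.0):
    """Read from the terminal until `pattern` (bytes) appears; return what was read.

    The timeout applies to each individual wait for more output.
    """
    seen = b""
    while pattern not in seen:
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError(f"timed out waiting for {pattern!r}")
        try:
            chunk = os.read(fd, 1024)
        except OSError:  # Linux reports an I/O error once the child side closes
            chunk = b""
        if not chunk:
            raise EOFError(f"program exited before printing {pattern!r}")
        seen += chunk
    return seen


def drive(argv, dialogue):
    """Run argv on a pseudo-terminal; for each (prompt, reply) wait, then answer."""
    pid, fd = pty.fork()
    if pid == 0:  # child process: become the terminal program
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    transcript = b""
    try:
        for prompt, reply in dialogue:
            transcript += expect(fd, prompt)
            if reply is not None:
                os.write(fd, reply)
    finally:
        os.close(fd)
        os.waitpid(pid, 0)
    return transcript


if __name__ == "__main__":
    child = [sys.executable, "-c", "name = input('Name: '); print('Hello, ' + name)"]
    out = drive(child, [(b"Name: ", b"Ada\n"), (b"Hello, Ada", None)])
    print(out.decode(errors="replace"))
```

As the conversation notes, this style of automation is brittle: it depends on the exact prompt text, so any change in the program's wording breaks the script. It does show that a terminal prompt is no barrier to automation.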
Kurt Andersen: This makes for an interesting segue into platforms as a service. One of the common complaints is, “Hey, we have this platform, or we have this commercial, off-the-shelf software, and it just doesn't do what we need it to do.” How do you address that when you're dealing with folks?
Tony Hansmann: We're getting to the point where I want to start asking questions like “Okay, we're talking about a commercial, off-the-shelf piece of software. What's the metadata about it? What's its uptime, what's its transaction cost, what's all of that stuff?” We have to start there, because we say, "This piece of software doesn't do what we want to do."
If it's toilsome, then the first thing I want to do is say, "I do not want to go and write my own platform as a service." I want to say, "What is the cost of the fact that that is toilsome?" The deal is, you should be looking at this toilsome piece of software and asking, is it bespoke? Where is it on the line from bespoke? We have to write it internally, it's special for us, all the way to it's an absolute utility.
I don't know how many APIs Amazon drops a month, but it feels like they drop half a dozen or a dozen APIs a month. That right there should tell you someone is good at dropping APIs. So, first of all, if you have an off-the-shelf piece of software you should enter into a cycle with your vendor, and be really specific about what you want, and be very flexible in working with them to drive what you want. If you can't enter into that cycle because it's an old piece of software, then you're going to see people put fortifications and structures around it.
But at some point, the organization that needs a new machine is already paying for it. So, when you have that toilsome piece of software, you're already paying for the fact that you don't have what you need. And now you need the actual dollars associated with that. What'll happen in a lot of places is they'll do that dollar analysis. Then they'll say, "We can solve this problem by writing our own thing." But they do not do the dollar analysis on writing their own thing.
Kurt Andersen: Please expand upon that, because that seems to be one of the areas you're passionate about as well.
Tony Hansmann: I'm a fan of Rick Clark, and Rick Clark is known in the open source industry. He led Ubuntu server, he's done like a 500 petabyte install of Ceph. He is the real deal, and he says it just like this. If you take two pieces of open source software and you put them together, you make a new product, and you now have to support that product. My experience is, when you go into places that have written their own platform, it's about three years old, because they had to replace it three years ago. Internal platforms have somewhere like a three to five year lifecycle.
Now, if you're at LinkedIn, or somewhere big where that platform is the only thing, then there's going to be more intention and focus on it. If you're at a Fortune 1000 that does something else, what happens is some smart person comes up with “I think we can write a platform.” And they implement part of it, it's wildly successful on rollout, then it becomes hard because people are now using it, and the demands start coming. And eight months later, that person is really tired of the support load of this platform, and they exit because they now have “I wrote a platform,” on their resume.
They leave the organization after about a year and a half with that platform in the middle, and then anyone else who's smart enough to go with that person goes with them. And they leave the organization with that platform. It immediately stops getting support, and security support goes through the floor. And then that thing grinds down, there's a rebellion in operation, there's a rebellion in development against using it, and then they start going shadow IT. Like, “Hey, Amazon can take care of all this stuff.”
This is an ugly, repeatable, totally predictable cycle. And in 2021, I'm encouraging organizations who are on that treadmill to seriously look in the mirror. It's 2021, the publicly available stacks have been around to take care of many of these problems for 5+ years. And it's time to start really looking at them. There's a maturity requirement from the business point of view. Everyone loves the example of whoever Facebook bought for like five billion dollars, which had 12 people.
Kurt Andersen: WhatsApp?
Tony Hansmann: WhatsApp, yeah. Any one of these ultra high-dollar acquisitions that was put together from mostly standard parts, and glued together with just what they needed, and then sold off as a proven concept, for billions and billions of dollars. If you are a senior executive at a Fortune 1000, you must be paying attention to this, and you must ask yourself honestly, "What is the impact of the Ford Motor Company going to a fully automated line against me?"
And we go all the way back to the top. One executive, who will bet their badge on moving this direction, can find five or six people in the organization to train. Don't go hire a consultancy to issue a report, just train a few people, pick a path to production, do the value stream mapping on that path to production, find friendly teams in that path to production, and go work with them on better processes (which typically means lean product management and getting to CI). I will say this, if you're not willing to do continuous integration at the 95th percentile in your organization, entropy will eventually chew you apart.
So what do you do if you want to transform? Train a few people in lean and have an executive who's willing to bet their badge on it, because if you don't have air cover it will go nowhere. I call it the grass plugs model. You just go plant a grass plug, one team. When you're done with their six-week cycle, you have a lunch-and-learn, and you invite all the teams, but specifically the adjacent teams and their path to production, and you say, "Here's what we thought we were going to do, it's what we read in books and heard on podcasts, and stuff like that. Here's what we actually learned when we did it, and here's what we'd recommend to you if you're going to adopt in our organization."
That's the grass plugs model. And then that senior executive, and the five or six people who are driving it through the organization go and help the next team that raises their hand. There's this great saying: gratitude turns to hunger. You eliminate 30% of the toil from one team, you get them from 100% down to 70%, and then you're like, "So, with this new 30% capacity we've got, we're going to use this anti-toil model everywhere." And they go crazy eliminating their own toil. And it dies when they lack executive and product leadership. I say, operations people are made of scars.
Kurt Andersen: Sad but true.
Tony Hansmann: I find that operations people are incredibly serious about uptime. They build fortresses against their uptime numbers. Now, I believe that you can get better uptime numbers without a fortress. But, you go to a serious operations person and you say, "I believe we can get you this stuff back," and they're like, "We can't do that because this VP won't allow it, the product people never listen when we say this, and forget about the developers doing anything like continuous integration." That's the world of transformation in 2021.
Kurt Andersen: It kind of sounds like the same as what you were probably encountering in 2008.
Tony Hansmann: And that's why I'm so passionate about it: in 2008 there wasn't a good answer. There really wasn't. But in 2021 there are radically better answers.
Kurt Andersen: That's fair, but people still have those psychological blocks: I'm not going to do this, or not do that.
Tony Hansmann: That's why a senior leader is so critical. You say, “Okay, I talked to an honestly sincere ops person who really wants to solve problems and keep things going, and they will not move because the organization doesn't have a model for supporting them.” And so, back to you as a senior leader betting your badge, are you going to figure out a way to support them? Or are you just going to allow what's essentially a gridlocked position to dominate the timeline of your delivery? It makes no sense.
Kurt Andersen: That sounds so much like a book I'm reading right now, Turn the Ship Around by David Marquet.
Tony Hansmann: I started that one. I do love the I-intend-to model. I intend to do this, I intend to do that. I think if you just take that part of the book and start with it, train the folks you work with to say, "I intend to solve this problem this way."
Kurt Andersen: And the leader-leader. The idea that you don't empower the followers, you want to build leaders throughout the organization.
Tony Hansmann: The other big lifetime commitment I have is Collins'-style leadership. Always be creating leaders, always, always, always. This transformation model actually matches what we're seeing in the culture, too. I believe people would really like a lot more control of their time, and control of their workflow, and things like that. And here in 2021, I feel like we have reached a point where the edge is smarter than the core.
Kurt Andersen: It's sort of one of these things similar to the structure of scientific revolutions where you have to wait for the old guard to die off, or retire, or something. And then people who have seen a better way can come in and become the senior executives to bet their badges.
Tony Hansmann: We worked with a company called Intrado. It was called West Corporation at that time. And their senior executive sat down with me and described what they had to do to be the person who bet their badge on all of this. And it's on YouTube. It's certainly not polished, but you can go and listen to senior executives in the CTO operations development software architecture roles describe what they had to do, and how they had to flex in order to get a deep digital transformation.
Kurt Andersen: There are some good stories from the DevOps Enterprise Summit as well. Gene has brought people in with similar ideas. It has been my experience that most organizational changes take longer than the enthusiasts would like. And the grass plugs model that you described is like you're growing something. It's not an instant, add water, poof, you've got it. So, what's a reasonable timeline, and what does done look like?
Tony Hansmann: A reasonable timeline is, as a senior executive, you're probably going to need to commit to five real years. Five strong years, where you are solving problems to unblock teams who are doing this work. But it's not that you don't see any results until five years. You're seeing results all along the way and they're compounding. The thing most people miss is that if you're the kind of person who has to grit your teeth for this sort of thing, grit your teeth for a year, but do it strong. You get a few teams being able to eliminate their own toil, and in a value stream map you want to eliminate dead time.
As you see dead time eliminated on your value stream map, everyone gets very excited. If you keep your nose to the grindstone for a year, and you unblock teams, and you target working through six to 10 teams in that year, then the thing will auto-catalyze. If you give the right air cover to staff members, they love working in a very structured way that does not have them go back and visit problems that came up from six months ago.
Success is economically defined by your organization. Again, if you're an FDA-based thing and you can only ship one thing every two years, well, your economics are going to look like one thing.
Kurt Andersen: Well, that's success. But I mean, from the point of view of that executive, what would you consider done?
Tony Hansmann: If you have a manufacturing organization, take a junket to Detroit, Michigan, and go on the Ford Rouge Plant Tour, where that is chapter-and-verse lean. There's literally info-radiators everywhere, the same info across the whole line. So, done looks like you have a fully automated production process, such that you could say, "We used to push this many features a year, and we used to get this level of feedback from it, and now we push ..." Again, the economics are how do we explore our customers’ problem space most effectively, and in a way that benefits them the most? If you're doing it faster than your customer can take it up, then you're going too fast. That's it.
Pivotal had that problem. We pushed out a release of a large enterprise software PaaS every quarter. And most customers had no interest in taking a quarterly release. They wanted a yearly release, or even longer. So, don't go faster than your customers, but as long as your customers are leading you, you're still on the trail.
I'll throw in, when you have got continuous integration and continuous delivery at the 95th percentile, that is, 95% of your features are going only through an automatic, promoted, tested chain, then you can put a real stake in the ground and say, "That's pretty darn good." And you can figure out whether the other 5% makes a difference to your business or not, because it probably won't if you don't need to automate it.
There's only three books you need. You need the Forsgren Accelerate book, you need one of the Eric Ries lean product management books, and you need the Kent Beck, Extreme Programming Explained book, which contains all the practices that you'll eventually want to get to. Start with focus on training a small cadre of lean product managers who can solve problems experimentally. I like the Martin Fowler book on CI because it completely covers continuous integration. And so, if you're going to do two things, those are the two things to start with. And then your toil elimination will go through the ceiling because you have automated methods, and you have methods for exploring.
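The "95th percentile" yardstick mentioned above (95% of features reaching production only through an automated, promoted, tested chain) can be tracked as a simple ratio. The Python sketch below shows one way to compute it; the function names and the representation of each shipped feature as a boolean are assumptions for illustration, not a method described in the interview.

```python
def automated_share(releases):
    """Fraction of shipped features that went only through the automated chain.

    `releases` is a list of booleans: True if the feature reached production
    with no manual step, False otherwise.
    """
    if not releases:
        return 0.0
    return sum(releases) / len(releases)


def meets_target(releases, target=0.95):
    """True when the automated share is at or above the target fraction."""
    return automated_share(releases) >= target
```

For example, a team that shipped 20 features with one manual hotfix would pass `[True] * 19 + [False]` and land exactly on the 95% line; two manual releases out of 20 would fall short.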
Kurt Andersen: Awesome. Well, thank you for joining us, Tony. It was a pleasure to have you. Good luck on your continued quest to kill ops and transform the world.