This book is a novel about IT, DevOps, and helping your business win. It tells a story that anyone in IT will recognize, using the plot to highlight problems such as deadlines, bottlenecks, interdepartmental communication, and security. It lays out the struggle with the growing complexity of IT and shows how DevOps concepts can address it.
Have you ever read a book with an intriguing storyline that covers the problems faced by an engineer struggling to keep up with a fast-growing organization? If not, this is the book to pick up. It explains different DevOps concepts and how to apply them.
We highly recommend this book to anyone who wants to understand not just DevOps concepts but also the scenarios where they apply. These range from a big project deployment to the simple process of documenting changes, and even to managing resources in a team. Apart from the story, there is a notable summary of important concepts at the end of the book. Below are short gists of the first few chapters to give you an idea of the plot and the writing. Happy reading!
Chapter 1:
Our protagonist, Bill Palmer, is called to meet the CEO and worries that he may be getting fired. Instead, he learns that the CIO and the Chief of IT Operations (Luke and Damon) have been fired. After CEO Steve convinces Bill of the hard times the company is going through, Bill accepts the role of VP of IT Operations. He immediately faces a new problem: paychecks for company employees are stuck!
Chapter 2:
Bill meets Dick and Ann, who explain the payroll problem to him: the salary column for hourly employees contains only zeroes. One fallback is to pay everyone based on the previous cycle, but that would overpay or underpay some employees and invite the union to step in. Bill suggests working on that plan B while he investigates further. When he reaches the IT department, Wes and Patty tell him about the SAN (Storage Area Network) failure, which everyone is trying to fix since all servers are down. Bill tries to relate the payroll failure to the SAN failure.
Chapter 3:
The SAN failure was caused by a firmware upgrade. The upgrade had been pending for a few years, taking a toll on system performance. When the SAN was rebooted, its self-tests started failing because they had been written for previous versions. Here we see a classic example of what happens without a ticketing system or a change authorization system!
The payroll issue turns out to be the effect of a change introduced to the system without following processes or notifying other teams. John, the Head of Security, deployed a tokenization application that affected timekeeping and caused the payroll failure.
Chapter 4:
The project management meeting discusses important aspects of the Phoenix Project, which is delayed by 2 years even after $20 million has been spent. There are disagreements about deploying the project, which is still unstable and lacks good documentation. We also see the lack of a change management process in the team.
Chapter 5:
Bill struggles through an audit meeting that reveals numerous deficiencies found during the testing phase. With many employees busy with the SAN failure, fixing all the errors in 3 weeks is nearly impossible. A discussion highlights Brent as the smartest and most dependable engineer, but such dependency on one person can also be a single point of failure.
Chapter 6:
The team's change management tool is inefficient, leading to either clashes or delayed changes. Hence the team starts to implement a sort of physical Kanban board on the whiteboard itself, using paper cards. There is also a good discussion of ‘what is a change’; in short, they conclude that a change is any activity that impacts the delivery of service.
Chapter 7:
Erik, an important board member, explains the bottleneck concept to Bill by taking him to one of the company's many manufacturing plants. The chapter also tells us what the work of IT operations is: ensuring an uninterrupted flow of planned work and minimizing the impact of unplanned work. Erik urges Bill to understand the Three Ways of DevOps and to figure out the four types of work.
Chapter 8:
More than 400 changes are submitted for the week, which is too many to handle! Hence they are classified into 3 groups:
1. Fragile/high-risk changes - changes that can lead to a system crash. Such changes need authorization and proper scheduling.
2. Standard/everyday changes - changes that are done frequently and pose little or no threat. Such changes can be pre-approved but must be scheduled.
3. Medium-risk changes - for such changes, we can depend on the team’s manager to have an idea and give authorization.
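The three groups above can be sketched as a small triage function. This is only an illustration, not something taken from the book: the `ChangeRequest` fields (`can_crash_system`, `done_frequently`) and the wording of the approval rules are assumptions chosen to mirror the descriptions in the list.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeCategory(Enum):
    FRAGILE = "fragile/high risk"
    STANDARD = "standard/everyday"
    MEDIUM = "medium risk"


@dataclass
class ChangeRequest:
    title: str
    can_crash_system: bool  # hypothetical flag: could this change take a system down?
    done_frequently: bool   # hypothetical flag: is this a routine, well-practiced change?


def classify(change: ChangeRequest) -> ChangeCategory:
    """Sort a change into one of the three groups described in Chapter 8."""
    if change.can_crash_system:
        return ChangeCategory.FRAGILE
    if change.done_frequently:
        return ChangeCategory.STANDARD
    return ChangeCategory.MEDIUM


# Approval handling per group, paraphrasing the list above.
APPROVAL_RULES = {
    ChangeCategory.FRAGILE: "needs authorization and proper scheduling",
    ChangeCategory.STANDARD: "pre-approved, but must be scheduled",
    ChangeCategory.MEDIUM: "authorized by the team's manager",
}


def approval_rule(change: ChangeRequest) -> str:
    """Return the approval handling that applies to a change."""
    return APPROVAL_RULES[classify(change)]


if __name__ == "__main__":
    request = ChangeRequest("SAN firmware upgrade", can_crash_system=True, done_frequently=False)
    print(classify(request).value, "->", approval_rule(request))
```

Checking the crash risk first means a risky change is never treated as routine just because it happens often.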
Chapter 9:
DevOps work is often likened to 'fire-fighting'. Systems are unpredictable and can fail for various reasons, and the DevOps team always has to be ready to handle failures, just like firefighters. But you can't expect company employees to solve problems on the spot, as and when they occur, without preparation. For this reason, it is essential to hold training/practice incident calls. The chapter discusses change management further to explain the 'collision' of changes and its impact. Bill understands that 'change' itself is a type of work.
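To make the idea of a 'collision' concrete, here is a minimal sketch. It assumes that two changes collide when they touch the same system during overlapping time windows; that definition, the `ScheduledChange` fields, and the example names are illustrative assumptions rather than anything taken from the book.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class ScheduledChange:
    name: str
    system: str      # hypothetical: the system the change touches
    start: datetime  # start of the change window
    end: datetime    # end of the change window


def collides(a: ScheduledChange, b: ScheduledChange) -> bool:
    """Assumed rule: same system and overlapping time windows."""
    return a.system == b.system and a.start < b.end and b.start < a.end


def find_collisions(changes: list[ScheduledChange]) -> list[tuple[str, str]]:
    """Return the names of every pair of changes that collide."""
    return [(a.name, b.name) for a, b in combinations(changes, 2) if collides(a, b)]


if __name__ == "__main__":
    schedule = [
        ScheduledChange("Firmware upgrade", "SAN", datetime(2024, 1, 5, 22), datetime(2024, 1, 6, 2)),
        ScheduledChange("Disk expansion", "SAN", datetime(2024, 1, 6, 1), datetime(2024, 1, 6, 3)),
        ScheduledChange("Database patch", "Payroll DB", datetime(2024, 1, 6, 1), datetime(2024, 1, 6, 2)),
    ]
    for first, second in find_collisions(schedule):
        print(f"Collision: {first} <-> {second}")
```

A check like this is roughly what a shared change board gives a team at a glance: changes aimed at the same system at the same time are visible before they are executed.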