Game-day testing is often underrated. Many people think it’s something companies do when they are too lazy to write automated tests, or when the development process is so broken that throwing random scenarios at the product is the only way to gain confidence in it. But those who neglect it are wrong.
The fact is, you’ll never be 100% certain what your system or processes will do under various scenarios, so game-day testing is important. It’s something we do regularly at the New Relic European Development Center in Barcelona.
What you need to write up a game-day testing scenario:
💠 What the test is and why you’re running it (generally just a paragraph).
💠 Details on where the game-day testing will take place. You can list things like servers or application URIs here.
💠 The process. Details such as how frequently game-day testing should take place and the steps needed to execute a game-day scenario. Provide a link to your “mock” incident game-day doc. Also, call out if this is a good ad-hoc scenario (that is, a scenario that can be run outside of the context of a normal game-day test).
💠 The important stuff. Here I call out the exceptional. If there are, say, three things that are “musts” for the test to run, or that are critical to the business, document what they are and how you can game-day them.
💠 The unexpected. This is my favorite section. This is where I call out how to create unpredictable events in my system. These can be automated (like Netflix’s Chaos Monkey) or manual, such as “turn off web jobs.”
💠 What to do with issues found. This is a great section to focus on some process. Here you can detail processes and actions to take when the system reports unexpected results. Generally, you might detail your bug reporting workflow or hotfix process. This is an important section because you want to call out the fact that the game-day process should create artifacts that you can act on.
💠 Holes and future changes. Consider this your informal backlog. I put it with the game-day because I want to call out “Hey, we’re not covered here!” or “I really think we can improve here.” This way, pain points are always top of mind.
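One way to keep these write-ups consistent is to treat each one as a small structured record. The sketch below is only an illustration: the `GameDayScenario` class, its field names, and the `missing_sections` helper are hypothetical names chosen to mirror the sections above, not part of any existing tool.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class GameDayScenario:
    """One game-day write-up; each field mirrors a section listed above.

    All names here are hypothetical and only illustrate the structure.
    """
    summary: str = ""                                            # what the test is and why you run it
    targets: List[str] = field(default_factory=list)             # servers, application URIs
    process: str = ""                                            # frequency, steps, link to the mock incident doc
    must_haves: List[str] = field(default_factory=list)          # the business-critical "musts"
    unexpected_events: List[str] = field(default_factory=list)   # e.g. "turn off web jobs"
    issue_workflow: str = ""                                     # bug reporting or hotfix process
    backlog: List[str] = field(default_factory=list)             # holes and future changes
    ad_hoc: bool = False                                         # can it run outside a normal game day?


def missing_sections(scenario: GameDayScenario) -> List[str]:
    """Return the names of the sections that are still empty.

    The ad_hoc flag is skipped because False is a valid answer for it.
    """
    return [
        f.name
        for f in fields(scenario)
        if f.name != "ad_hoc" and not getattr(scenario, f.name)
    ]


if __name__ == "__main__":
    draft = GameDayScenario(
        summary="Check that the site stays usable when web jobs are off.",
        targets=["https://staging.example.com"],  # placeholder URI
    )
    print("Still to write:", missing_sections(draft))
```

Running a check like this before the game day makes it obvious which sections of the write-up still need attention.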
Things to keep in mind:
💠 Game-day testing is best done in a group, although it can be done individually.
💠 Ask people outside of your immediate team to help you create different scenarios and to help you break your system.
💠 Doing things manually is a healthy way to better understand the current state of your system. Choose a few of the scenarios and do them in random order instead of just running through a script, one by one.
💠 Write down your hypothesis on what you think will happen when you execute a scenario. Investigate when the results don’t match your expectations.
💠 Make sure you proactively communicate to all parties who might get alerts and notifications about this system that you are running tests. You don’t want to concern people unnecessarily!
💠 Document the bugs you encounter during your test so you can work on them later.
💠 All artifacts in your game-day “system” (documents, scripts, users, and systems) should be organic and updated often.
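Two of the points above, running a few scenarios in random order and writing down a hypothesis first, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `Trial`, `run_game_day`, and the idea of comparing a hypothesis string against an observed string are all hypothetical simplifications. In a real game day the "run" step is often a person following manual steps and the comparison is a conversation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trial:
    """A scenario paired with the hypothesis written down before running it."""
    name: str
    hypothesis: str            # what you expect to observe
    run: Callable[[], str]     # executes the scenario and returns what was observed


def run_game_day(trials: List[Trial], sample_size: int,
                 seed: Optional[int] = None) -> List[dict]:
    """Run a random selection of trials in random order and flag surprises."""
    rng = random.Random(seed)
    # random.sample picks without replacement and returns items in random order.
    chosen = rng.sample(trials, k=min(sample_size, len(trials)))
    results = []
    for trial in chosen:
        observed = trial.run()
        results.append({
            "name": trial.name,
            "hypothesis": trial.hypothesis,
            "observed": observed,
            # A mismatch is a prompt to investigate, not automatically a bug.
            "investigate": observed != trial.hypothesis,
        })
    return results


if __name__ == "__main__":
    # The lambdas stand in for real scenario steps (hypothetical outcomes).
    trials = [
        Trial("turn off web jobs", "queue backs up, site stays up",
              lambda: "queue backs up, site stays up"),
        Trial("restart the database", "brief errors, then recovery",
              lambda: "errors until manual restart"),
    ]
    for result in run_game_day(trials, sample_size=2):
        if result["investigate"]:
            print("Investigate:", result["name"], "->", result["observed"])
```

Keeping the recorded results gives you the artifacts to act on later, such as bug reports for every trial marked for investigation.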
Focus on the learning
What could be more fun than breaking your software? Breaking your processes, of course! The most important aspect of game-day testing is that you actually do it – when you run these tests, your app might survive or it might not. Game-day testing is meant to prevent future misery. More importantly, as a software developer you should always make sure to take the hits before your users do. The crucial part is that you learn from the exercise regardless of the outcome.