CodingBlocks

The DevOps Handbook – Anticipating Problems

Aug 17 '20

We’re using telemetry to fill in the gaps and anticipate problems as we discuss The DevOps Handbook, while Michael is still weird about LinkedIn, Joe knows who’s your favorite JZ, and Allen might have gone on vacation.

You can find these show notes at https://www.codingblocks.net/episode139, in case you’re reading these within your podcast player.

Survey Says

What's your favorite mobile device?

An iPad. The Tab-father of tablets.
An Android-based tablet. Great hardware specs, without the hassle of long-term support.
A Kindle. It was on sale.
Chromebook FTW. Not quite as portable as a tablet, nor as useful as a laptop ...
2-in-1 Laptop. A giant, bulky tablet that can run Docker.


Joe’s Super Secret Survey

Go or Rust?

Go
Rust


News

Thank you to everyone who left us a new review:
- iTunes: AbhiZambre, Traz3r
- Stitcher: AndyIsTaken
Most important things to do for new developer job seekers?

I Got 99 Problems and DevOps ain’t One

Find and Fill Any Gaps

Once we have telemetry in place, we can identify any gaps in our metrics, especially in the following levels of our application:

Business level – These are metrics on business items, such as sales transactions, signups, etc.
Application level – This includes metrics such as timing metrics, errors, etc.
Infrastructure level – Metrics at this level cover things like databases, operating systems, networking, storage, CPU, etc.
Client software level – These metrics include data like errors, crashes, timings, etc.
Deployment pipeline level – This level includes metrics for data points like test suite status, deployment lead times, frequencies, etc.
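
One way to spot gaps across these levels is to tag each emitted metric with its level and check which levels have nothing reporting. The following is only a minimal sketch: the metric names and the `find_gaps` helper are hypothetical, and it uses a plain in-memory list rather than any particular telemetry library.

```python
from collections import defaultdict

# The five levels listed above.
LEVELS = ["business", "application", "infrastructure", "client", "deployment_pipeline"]


def find_gaps(metrics):
    """Return the levels that have no metrics, given (level, name) pairs."""
    by_level = defaultdict(list)
    for level, name in metrics:
        by_level[level].append(name)
    return [level for level in LEVELS if not by_level[level]]


# Hypothetical inventory of what is currently instrumented.
emitted = [
    ("business", "signups.count"),
    ("application", "checkout.request_time_ms"),
    ("infrastructure", "db.cpu_percent"),
]

print(find_gaps(emitted))  # ['client', 'deployment_pipeline']
```

Here the client software and deployment pipeline levels come back as gaps, which is the cue to add telemetry there.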

Application and Business Metrics

Gather telemetry not just for the technical bits, but also for organizational goals, such as new users, login events, session lengths, active users, and abandoned carts.
Every business metric should be actionable. Metrics that aren’t actionable are “vanity metrics”.
By radiating these metrics, you give feature teams fast feedback on what’s working and what isn’t within their business unit.
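
As a rough illustration of an actionable business metric, the sketch below counts business events and derives an abandoned-cart rate, a number a feature team can act on. The event names and the `BusinessMetrics` class are hypothetical; a real system would send these events to whatever telemetry backend it already uses.

```python
from collections import Counter


class BusinessMetrics:
    """Tiny in-memory counter for business events (illustrative only)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, event):
        self.counts[event] += 1

    def abandoned_cart_rate(self):
        """Fraction of created carts that never reached checkout."""
        created = self.counts["cart_created"]
        if created == 0:
            return 0.0
        completed = self.counts["checkout_completed"]
        return (created - completed) / created


metrics = BusinessMetrics()
for event in ["cart_created"] * 4 + ["checkout_completed"]:
    metrics.record(event)

print(metrics.abandoned_cart_rate())  # 0.75
```

A raw count of carts created would lean toward a vanity metric, while the rate points at something a team can try to improve.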

Infrastructure Metrics

You need enough telemetry to identify which part of the infrastructure is having problems.
Graphing telemetry across infrastructure and application allows you to detect when things are going wrong.
Using business metrics along with infrastructure metrics allows development and operations teams to work quickly to resolve problems.
You need the same telemetry in pre-production environments so you can catch problems before they make it to production.
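
A simple way to check that pre-production carries the same telemetry as production is to compare the metric names each environment reports. This is only a sketch under stated assumptions: the metric names and the `missing_in_preprod` helper are hypothetical, and how you list the metrics from each environment depends on your monitoring tool.

```python
def missing_in_preprod(production_metrics, preprod_metrics):
    """Return metric names reported in production but absent in pre-production."""
    return sorted(set(production_metrics) - set(preprod_metrics))


# Hypothetical metric names gathered from each environment.
production = ["db.cpu_percent", "api.error_count", "disk.free_bytes"]
staging = ["db.cpu_percent", "api.error_count"]

print(missing_in_preprod(production, staging))  # ['disk.free_bytes']
```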

Overlaying Other Relevant Information onto Our Metrics

In addition to graphing business and infrastructure telemetry, you also want to graph your deployments so you can quickly tell whether a release caused a deviation from normal.
- There may even be a “settling period” after a deployment where things spike (good or bad) and then return to normal. This is useful for checking whether deployments are behaving as expected.
The same goes for maintenance. Graphing when maintenance occurs helps you correlate infrastructure and application issues with the time the maintenance was performed.
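
The idea of overlaying deployments on a metric can also be sketched in code. The example below compares a metric's values before each deployment with its values after a settling period and flags deployments followed by a sustained deviation. Everything here is an assumption made for illustration: the `deployments_with_deviation` function, its parameters, the three-standard-deviation threshold, and the sample data are not from the book.

```python
from statistics import mean, stdev


def deployments_with_deviation(series, deploy_times, baseline=5, settle=2,
                               window=3, threshold=3.0):
    """Flag deployments followed by a sustained deviation from the pre-deploy baseline.

    series: list of (minute, value) samples, one per minute.
    deploy_times: minutes at which deployments happened.
    baseline: number of samples before the deploy treated as "normal".
    settle: number of samples from the deploy onward to ignore (settling period).
    window: number of samples after the settling period to evaluate.
    threshold: how many standard deviations count as a deviation.
    """
    values = dict(series)
    flagged = []
    for t in deploy_times:
        before = [values[m] for m in range(t - baseline, t) if m in values]
        after = [values[m] for m in range(t + settle, t + settle + window) if m in values]
        if len(before) < 2 or not after:
            continue  # not enough data to judge this deployment
        mu, sigma = mean(before), stdev(before)
        if abs(mean(after) - mu) > threshold * max(sigma, 1e-9):
            flagged.append(t)
    return flagged


# Hypothetical error rate per minute, with deployments at minutes 5 and 12.
error_rate = [1.0, 1.2, 0.9, 1.1, 1.0,   # minutes 0-4: normal
              4.0, 3.5,                  # minutes 5-6: spike right after first deploy
              1.0, 1.1, 0.9, 1.0, 1.1,   # minutes 7-11: settled back to normal
              4.0, 4.2, 4.1, 3.9, 4.0]   # minutes 12-16: second deploy, stays high
series = list(enumerate(error_rate))

print(deployments_with_deviation(series, [5, 12]))  # [12]
```

The first deployment spikes and then settles, so it is not flagged; the second one stays elevated past the settling period and is.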

Resources We Like

The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations (Amazon)
The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win (Amazon)
The Unicorn Project: A Novel about Developers, Digital Disruption, and Thriving in the Age of Data (Amazon)
The ONE Metric More Important Than Sales & Subscribers (YouTube)
2020 Developer Survey – Most Loved, Dreaded, and Wanted Languages (Stack Overflow)
Instrument your Python applications with Datadog and OpenTelemetry (Datadog)
Why does speed matter? (web.dev)
Dash goes virtual! Join us on Tuesday, August 11 (Datadog)

Tip of the Week

Google Career Certificates (grow.google)
- Google Offers 100,000 Scholarships – Here’s How To Get One (Forbes)
- Grow with Google (grow.google)
Hearth Bound (HearthBoundPodcast.com, Twitter)
Tsunami (GitHub) is a general purpose network security scanner with an extensible plugin system for detecting high severity vulnerabilities with high confidence.
- Plugins for Tsunami Security Scanner (GitHub)
