DEV Community

Site Reliability Engineering

SRE2AUX: How Flight Controllers were the first SREs

Reactions 2 Comments
20 min read
Deep Dive into Docker Internals - Union Filesystem

Reactions 15 Comments
10 min read
Communication Tool Down? Here are 3 Ways to Handle it

Reactions 2 Comments
5 min read
Lessons from Slack, GCP and Snowflake outages

Reactions 4 Comments
3 min read
How They SRE

Reactions 7 Comments 1
1 min read
Getting Started as an SRE? Here are 3 Things You Need to Know.

Reactions 2 Comments
5 min read
How to Backup your Applications Data to S3 with Walrus

Reactions 4 Comments
2 min read
Introduce Chaos Platform 2.0 for Azure

Reactions 7 Comments
2 min read
What Is Nix and Why You Should Use It

Reactions 3 Comments
7 min read
How do you wrap your head around observability?

Reactions 46 Comments 13
1 min read
Top Reliability and Scaling Practices from Experts at Citrix, Greenlight Financial, and Incognia

Reactions 2 Comments
14 min read
Reliability as an Inseparable Part of Software Engineering

Reactions 3 Comments
5 min read
Istio - Your next K8s must-have tool

Reactions 5 Comments
2 min read
The Key Differences between SLI, SLO, and SLA in SRE

Reactions 6 Comments
9 min read
Splunk - Calculate duration between two events

Reactions 4 Comments
1 min read
What is the right AWS Kubernetes distribution for you?

Reactions 4 Comments
5 min read
The True Cost of Building your Own Incident Management System (IMS)

Reactions 2 Comments
5 min read
GCP DevOps Certification - Day Ten

Reactions 4 Comments
3 min read
Quick Survey: IT on-call experience in an "Always-On" world

Reactions 5 Comments 2
1 min read
Azure Front Door: An Overview

Reactions 3 Comments
3 min read
Managing health checks at scale

Reactions 5 Comments
5 min read
"I'm Just Doing my Job," An SRE Myth

Reactions 3 Comments
5 min read
Running the AWS CLI Across Multiple Accounts the Easy Way

Reactions 6 Comments
3 min read
What is a microservice catalog?

Reactions 2 Comments 1
5 min read
Top Observability tools for DevOps Engineers and SREs

Reactions 12 Comments
7 min read
Kubernetes gone bust. Now what?

Reactions 6 Comments
4 min read
Localizer: An adventure in creating a reverse tunnel/tunnel manager for Kubernetes

Reactions 5 Comments
8 min read
Argo CD

Reactions 6 Comments
2 min read
AWS Project: Deploying a Static Website to AWS

Reactions 4 Comments
1 min read
The Engineer's Guide to Preparing for Black Friday 2020

Reactions 2 Comments
8 min read
Choosing SLOs that users need, not the ones you want to provide

Reactions 6 Comments
6 min read
Blameless Book Club: Implementing Service Level Objectives, Part 1

Reactions 6 Comments
7 min read
Debugging incidents in Google's Distributed Systems

Reactions 1 Comments
2 min read
Resilience Engineering and Life

Reactions 3 Comments
4 min read
Testing ML incident detection using a cloud native microservices app

Reactions 11 Comments
10 min read
Operational Readiness Review Template

Reactions 6 Comments
7 min read
Google Down worldwide | Why is Google Down? Let's break it down

Reactions 15 Comments
4 min read
SREview Issue #7 November 2020

Reactions 4 Comments
2 min read
Making Instrumentation Extensible

Reactions 5 Comments
7 min read
SREview Issue #8 December 2020

Reactions 4 Comments
2 min read
Challenges with Implementing SLOs

Reactions 3 Comments
11 min read
How to SRE without an SRE on your team

Reactions 3 Comments
10 min read
Honeycomb SLO Now Generally Available: Success, Defined.

Reactions 6 Comments
7 min read
Working Toward Service Level Objectives (SLOs), Part 1

Reactions 6 Comments
5 min read
Creating Chaos and a Giveaway ⚒

Reactions 19 Comments 6
2 min read
Top Open Source projects for SREs and DevOps

Reactions 5 Comments
7 min read
The Operational Excellence Collection

Reactions 4 Comments
1 min read
Here are the Top Predictions for SRE in 2021

Reactions 4 Comments
6 min read
Yury Niño Roa Shares her Insights on Chaos Engineering and SRE

Reactions 2 Comments
7 min read
How an SRE became an Application Security Engineer (and you can too)

Reactions 5 Comments
8 min read
My Top 5 Books for DevOps/SRE

Reactions 3 Comments
4 min read
How small changes to your SLOs can be SMART for your business - A narrative case study

Reactions 5 Comments
11 min read
LitmusChaos: A Reflection On The Past Six Months

Reactions 20 Comments
15 min read
3 Ways SRE Can Boost your Business Value

Reactions 3 Comments
6 min read
Essence of Terraform

Reactions 32 Comments 1
3 min read
Building on observability

Reactions 4 Comments
2 min read
SREview Issue #6 October 2020

Reactions 4 Comments
2 min read
The Future of Ops Careers — Honeycomb

Reactions 6 Comments
8 min read
The Resilient Architecture Collection

Reactions 14 Comments
2 min read
DevOps 2021: Paving your way into SRE

Reactions 13 Comments
6 min read