Site Reliability Engineering (SRE) Fundamentals Training

Course Outline

Introduction

How SRE marries traditional IT and software development.
The need for automation and observability
The roles of software engineers vs. system administrators.
Site Reliability Engineers vs DevOps engineers.
Overview of an IT System

System architecture, on-premise and in the cloud.
Overview of SRE Principles and Practices

Infrastructure as Code.
The role of containerization and orchestration (Docker, Kubernetes, etc.)
Continuous Integration, Continuous Deployment and Continuous Delivery.
Observability.
Evaluating an IT System

Taking stock of the team and organizational resources.
Mapping out the systems and processes.
Estimating the potential impact of SRE.
The role of the software engineering team.
The role of the operations team.
The role of management.
Maintaining the Reliability of a System

Describing and measuring the desired reliability of a service.
Understanding Service Level Objectives (SLOs)
Understanding Service Level Indicators (SLIs) and Service Level Agreements (SLAs).
Working with Error Budgets.
Developing an SLO.
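
As a rough illustration of how SLIs, SLOs, and error budgets relate, the arithmetic can be sketched in Python. This is a minimal sketch, not course material: the class name `SloWindow`, its fields, and the sample figures are hypothetical, and it assumes a request-based SLI (successful requests divided by total requests) over a single measurement window.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Hypothetical container for one measurement window of a service."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served during the window
    failed_requests: int  # requests that counted against the SLI

    def __post_init__(self) -> None:
        # A 100% target leaves no error budget, so this sketch rejects it.
        if not 0 < self.slo_target < 1:
            raise ValueError("slo_target must be strictly between 0 and 1")

    def sli(self) -> float:
        """Measured success ratio (the SLI) for the window."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    def allowed_failures(self) -> float:
        """Error budget expressed as a number of failed requests."""
        return (1 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        allowed = self.allowed_failures()
        if allowed == 0:
            # No traffic yet, so nothing has been spent.
            return 1.0
        return 1 - self.failed_requests / allowed


# Illustrative numbers only: 1,000,000 requests, 250 failures, 99.9% target.
window = SloWindow(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(f"SLI: {window.sli():.5f}")
print(f"Allowed failures: {window.allowed_failures():.0f}")
print(f"Budget remaining: {window.budget_remaining():.0%}")
```

With these sample figures the budget is roughly 1,000 failed requests, of which about a quarter has been spent. A team might use the remaining fraction to decide whether to keep shipping changes or to prioritize reliability work.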
Optimizing System Administration

Setting up a development environment
Evaluating SRE tools
Prioritizing tasks for automation.
Writing software.
Deploying "Infrastructure as Code"

Testing and iterating code
Making a system anti-fragile
Learning from failure
Monitoring a System

Observing system performance.
SRE tools and techniques.
The Future of SRE

Summary and Conclusion
