Course: Network Admin to Site Reliability Engineer – Part 4: Site Reliability Engineer
Duration: 15 hours
Language: English (US)
Access duration: 180 days
Details
This comprehensive course delves into various aspects of Site Reliability Engineering (SRE), covering best practices for onboarding new team members, essential technical skills, and handling operational overload effectively. You'll explore methods for managing operational loads, utilizing support ticketing systems, and setting service-level objectives. Additionally, you'll learn about 'toil' and its adverse effects on team productivity, along with strategies for identifying and eliminating it.
Moreover, you'll gain insights into emergency planning strategies, knowledge sharing, and writing effective postmortems. The course highlights the importance of service level objectives and the attributes of successful teams, encouraging a team-first approach and effective communication techniques. You'll hone your collaboration and communication skills by learning to manage meetings effectively, practice pair programming, and use collaboration tools. Furthermore, you'll study project management metrics, software testing methodologies, fault analysis methods, and API monitoring strategies. You'll also explore SRE engagement models, Production Readiness Review processes, and scaling SRE to larger environments. Through case studies, you'll gain practical insights into applying SRE engagement models in real-life scenarios, equipping you with valuable skills to excel in the field of Site Reliability Engineering.
Result
After completing this course, you will be ready to scale the SRE team, handle operational loads, communicate and collaborate effectively, manage software reliability metrics, and manage the SRE engagement model as a Site Reliability Engineer.
Prerequisites
No formal prerequisites. However, familiarity with Site Reliability Engineering, networking, and DevOps is recommended.
It is also recommended to first complete Parts 1, 2, and 3 of the learning path "Network Admin to Site Reliability Engineer":
- Part 1: Network Admin
- Part 2: DevOps Engineer
- Part 3: Chaos Engineer
Target audience
System Administrator, Network Administrator
Content
Network Admin to Site Reliability Engineer – Part 4: Site Reliability Engineer
SRE Team Management: Scaling the Team
When adding a new site reliability engineer (SRE) to your team, it's important that the new member not only has the required skills but also receives the proper training. This allows the new SRE to fit into the team and get up to speed as quickly as possible. In this course, you'll learn about the best practices for onboarding a new SRE team member, including methods and tools that can be used during the onboarding process. Next, you'll explore the technical skills that an SRE requires, including the ability to reverse engineer an application to determine the root cause of a problem. Finally, you'll examine the skills and knowledge an SRE requires when on-call, including those needed to provide support and manage support issues.
SRE Team Management: Managing Operational Loads
To ensure and maintain a system's functional state, site reliability engineers (SREs) must learn how to identify, calculate, and manage a system's operational load, which generally falls into three categories: ongoing operational activities, tickets, and pages. In this course, you'll explore these categories in detail. You'll start by outlining methods for managing operational loads at the team level, including the use of support ticketing systems and service level objectives. Next, you'll investigate toil: the manual, repetitive, automatable operational work tied to running a production service that provides no enduring value and tends to grow linearly as the service grows. You'll outline steps for identifying, calculating, and eliminating toil and examine the adverse effects toil can have on a team. Additionally, you'll outline how to work with interrupts and distinguish between the key metrics used to manage them. Lastly, you'll identify the human factors to consider when dealing with interrupts, including efficiency, distractibility, and respect.
SRE Team Management: Operational Overload
Site reliability engineers (SREs) are responsible for many administrative tasks, often splitting their time between reactive ops work and project work. When a team becomes overloaded, an SRE may be temporarily transferred into that team to help mitigate the overload and prevent it from recurring. In this course, you will learn how to deal with operational overload. You'll start by examining ops mode, an approach used to ensure services are properly maintained and optimized. You'll discover factors that contribute to team morale and stress. In addition, you will outline emergency planning strategies and best practices, and learn how to categorize emergencies and prepare detailed emergency plans. Next, you'll explore how knowledge sharing relates to emergency preparedness, the keys to writing successful postmortems, the importance of service level objectives, and why an appropriate level of detail is required to properly explain your findings. Lastly, you'll discover the key factors and attributes of successful teams. You'll examine a team-first approach and differentiate between questioning techniques such as open/closed, funnel, probing, and leading questions.
Core Skills for Site Reliability Engineers: SRE Collaboration & Communication
Collaboration is key to getting the most out of your team and ensuring your clients receive the service they need. In this course, you'll learn to collaborate and communicate effectively as an SRE. You'll learn how to run traditional and virtual meetings for maximum effectiveness and productivity, whether with customers, internal or external team members, or distributed teams. You'll examine how to plan, run, and review meetings using best practices and sufficient preparation, tailoring these methods to the participants and the end goal. You'll delve into the unique characteristics of different meeting types, such as those for problem-solving or innovation, and explore the advantages and challenges of pair programming for SREs. Finally, you'll investigate some helpful collaboration and communication tools.
SRE Metric Management: Software Reliability Metrics
To improve the chances of creating, monitoring, and maintaining a successful software development project, site reliability engineers and all other team members must know which metrics to measure. They also need a working knowledge of both automated and manual testing methods. In this course, you'll learn how to select and manage SRE metrics and how various testing methods work. You'll begin by learning which metrics need to be measured for project management, software development, and APIs, examining CI/CD, cloud API, and software project metrics in detail, to name a few. Next, you'll compare manual and automated testing methods and the goals of each. Lastly, you'll investigate automated testing frameworks and platforms, test cases and types, and best practices and pitfalls to consider.
SRE Metric Management: Software Reliability Monitoring and Reporting
Once SRE metrics have been identified, site reliability engineers (SREs) must know how to perform fault analysis on a system, classify defects, and monitor and report data. In this course, you'll explore the tools and best practices for carrying out these procedures. You'll begin by identifying various fault analysis methods and tools. You'll then classify software defects and bugs with a focus on severity and priority. Next, you'll investigate strategies and tools for monitoring APIs. You'll then examine in detail several tools for collecting, analyzing, and reporting metric data using a customizable dashboard, including the components of the ELK Stack: Elasticsearch, Logstash, and Kibana. Finally, you'll explore the Beats data collection tools and practical use cases for Elasticsearch notifications.
SRE Engagement: Production Readiness Review
Production Readiness Review (PRR), the standard first step of SRE engagement, uses a series of phases to identify a service's reliability needs. The concept of "early engagement" is then used to evolve this Simple PRR model. In this course, you'll investigate SRE engagement, early engagement, and Production Readiness Review. You'll start by delving into each phase of the Simple PRR model, namely engagement, analysis, refactoring, training, onboarding, and continuous improvement. Next, you'll learn how early engagement can be used to evolve the Simple PRR model. You'll then examine how SRE platforms and frameworks can provide structural solutions. Finally, you'll learn how to use the SRE engagement model to manage software projects, comparing it to the traditional System Development Life Cycle (SDLC) model.
SRE Engagement: The SRE Engagement Model
The SRE engagement model and SRE service lifecycle have noteworthy similarities to and differences from the traditional software development life cycle. In this course, you'll explore these similarities and differences and investigate the SRE engagement model's components and how to work with it in various circumstances. You'll learn the steps for setting up and building SRE service relationships and establishing a roadmap for sprints and communication. You'll examine how to measure the impact of SRE engagement, set ground rules for SRE teams, and sustain effective relationships with other SREs and developers. Next, you'll study the steps for scaling SRE to larger environments and for ending an engagement. Lastly, you'll review case studies showing how others have applied the SRE engagement model in real life.
Final Exam: Site Reliability Engineer
Final Exam: Site Reliability Engineer will test your knowledge and application of the topics presented throughout the Site Reliability Engineer track of the Skillsoft Aspire Network Admin to Site Reliability Engineer Journey.
Course options
We offer several optional training products to enhance your learning experience. If you plan to use this training course to prepare for an official exam, we highly recommend these optional products to ensure an optimal learning experience. For some courses, only a practice exam and/or a practice lab is available.
Optional practice exam (trial exam)
To supplement this training course, you may add a special practice exam. This practice exam comprises a number of trial exams that closely resemble the real exam in both form and content. This is the ultimate way to test whether you are ready for the exam.
Optional practice lab
To supplement this training course, you may add a special practice lab. You perform the tasks on real hardware and/or software applicable to your lab. The labs are fully hosted in our cloud; all you need to use them is a web browser. In the LiveLab environment, you will find exercises you can start immediately. The lab environment consists of complete networks containing, for example, clients and servers. This is the ultimate way to gain extensive hands-on experience.
Why Icttrainingen.nl
Save up to 80% on training with our learning concept
Start learning whenever you want and set your own pace
Exchange ideas with fellow students and establish yourself as an authority in your field
Receive the official certificate of participation from Icttrainingen.nl after successfully completing your course
Gain insight into detailed progress information for yourself or your employees
Acquire knowledge through interactive e-learning and extensive practical assignments by certified instructors
Order process
Once we have processed your order and payment, we will give you access to your courses.
Creating a business account
If you are ordering on behalf of your company, we recommend creating a business account with us. You can choose this option during registration. You can then enter your company details and add a reference and a separate billing address.
Payment options
We offer several payment options. Whichever option you choose, you will receive an invoice after placing your order. If your employer is paying, choose payment by invoice.
Creating students
If you have created a business account, you can create students/employees under your account. So if you purchase multiple courses, you can create students and then assign the courses to your colleagues. Students receive an email with login details when their account is created and when they are assigned a course.
Progress information
With a business account, you are automatically the administrator of your organization and can create managers in addition to students. Administrators and managers can also view the progress of all students within your organization.
What is included?
Certificate of participation: Yes
Monitor progress: Yes
Award-winning e-learning: Yes
Mobile ready: Yes
Sharing knowledge: Unlimited access to our IT professionals community
Study advice: Our consultants are here to advise you about your study career and options
Study materials: Certified teachers with in-depth knowledge of the subject
Service: World's best service
Platform
After ordering your course, you will get access to our innovative learning platform. Here you will find all the courses you have purchased (or are following), can create students if needed, and get access to detailed progress information.
FAQ
Didn't find what you were looking for? View all questions or contact us.