SREcon Americas 2018, March 27, 2018 to March 29, 2018, Santa Clara, USA

Title Speakers Summary Topic Types
If You Don’t Know Where You’re Going, It Doesn’t Matter How Fast You Get There Nicole Forsgren, Jez Humble The best-performing organizations have the highest quality, throughput, and reliability while also delivering value. They ...
Security and SRE: Natural Force Multipliers Cory Scott A thorough understanding of how modern security principles impact SRE operations and the services they ...
What It Really Means to Be an Effective Engineer Edmond Lau For two years, I embarked on a quest to answer: What mindsets and frameworks do ...
SparkPost: The Day the DNS Died Jeremy Blosser More than 25% of the world's non-spam email is sent using SparkPost's technology, and our ...
Stable and Accurate Health-Checking of Horizontally-Scaled Services Lorenzo Saino This talk explains how Fastly built a distributed health-checking system capable of driving stable traffic ...
Beyond Burnout: Mental Health and Neurodiversity in Engineering James Meickle In 2018, most of us understand what burnout is and why it's an occupational hazard ...
Bootstrapping an SRE Team: Effecting Culture Change and Leveraging Diverse Skill Sets Aaron Wieczorek The U.S. Digital Service at VA’s team (DSVA), working on its most visible applications, Vets.gov ...
Don’t Ever Change! Are Immutable Deployments Really Simpler, Faster, and Safer? Rob Hirschfeld In the cloud and container era, we’ve moved from managing systems over time to the ...
Lessons Learned from Our Main Database Migrations at Facebook Yoshinori Matsunobu At Facebook, we created a new MySQL storage engine called MyRocks. Our objective was to ...
Leveraging Multiple Regions to Improve Site Reliability: Lessons Learned from Jet.com Andrew Duch Running your systems across multiple regions allows you to tolerate a unique set of failure ...
Building Successful SRE in Large Enterprises—One Year Later Dave Rensin At SRECon2017 I talked about the formation of a special group of Google SREs who ...
Working with Third Parties Shouldn't Suck Jonathan Mercereau As an SRE, you're not just responsible for building automation. Sometimes the work we do ...
When to NOT Set SLOs: Lots of Strangers Are Running My Software! Marie Cosgrove-Davies Are you robbing your customers of their ability to think hard about their users and ...
Lessons Learned from Five Years of Multi-Cloud at PagerDuty Arup Chakrabarti PagerDuty has been running a multi-cloud infrastructure over the past 5 years. In that time, ...
Help Protect Your Data Centers with Safety Constraints Christina Schulman, Etienne Perot Running a multi-tenant, multi-datacenter compute infrastructure requires automating machine management across their respective lifecycles. We ...
Real World SLOs and SLIs: A Deep Dive Matthew Flaming, Elisa Binette If you've read almost anything about SRE best practices, you've probably come across the idea ...
How SREs Found More than $100 Million Using Failed Customer Interactions Wes Hummel This talk will go into PayPal SRE's journey of using data around customer failures (outward-looking) ...
Learning at Scale Is Hard! Outage Pattern Analysis and Dirty Data Tanner Lund An important part of site reliability is identifying and eliminating the causes of outages. Good ...
How Not to Go Boom: Lessons for SREs from Oil Refineries Emil Stolarsky Bad software doesn’t explode. You can describe it as exploding when it throws an exception, ...
Track That Clone: Near-Realtime Data Audit for Distributed Data Replication Janardh Bantupalli N/A
Auto-Cascading Security Updates Through Docker Andrey Falko N/A
How to Insulate Your Team from "Shoulder Taps" Danny Gershman N/A
TechOps-How Stride Built a Culture of Reliability from Day One David Giesberg N/A
Auto Remediation in Diagnosing Network for SRE team Sean Jiang N/A
Embedded SRE to Improve Time to Market at Scale Hemant Kapoor N/A
Turning It off and on Again—"Look Mom, No Hands" Craig Knott N/A
Who Are Your Alerts For? Ren Lee N/A
Statistics for Dummies Fred Moyer N/A
Package Masonry Francisco Ruiz N/A
Delivering Technical Presentations the SRE Way Peter Sahlstrom N/A
Diversity: It’s Not about How or Who, but Why We Hire John Schnipkoweit N/A
Why Netflix Built Titus with Reliability in Mind Andrew Spyker N/A
An Unexpected Open Source Win Amy Tobey N/A
SRE and Unicorn—Match Made in Heaven Ritesh Vajariya N/A
Signal vs. Noise: How to Identify Projects That Will Survive Chris Robertson N/A
Containerization War Stories Ruth Grace Wong, Rodrigo Menezes Just like many other mid-sized companies, Pinterest runs tens of thousands of machines and hundreds ...
Resolving Outages Faster with Better Debugging Strategies Liz Fong-Jones, Adam Mckaig Engineers spend a lot of time building dashboards to improve monitoring but still spend a ...
Monitoring DNS with Open-Source Solutions Felipe Bustos NIC Chile is the DNS administrator of the ccTLD .cl, managing over 500,000 domain names ...
Antics, Drift, and Chaos Lorin Hochstein Large systems evolve from successful, smaller ones, an observation predicted by the branch of study ...
Security as a Service Wojciech Wojtyniak In the game of security, defenders have to be lucky every single time, but just ...
Breaking in a New Job as an SRE Amy Tobey In theory, most companies have onboarding processes to assimilate you and prepare you for life ...
"Capacity Prediction" instead of "Capacity Planning": How Uber Uses ML to Accurately Forecast Resource Utilization Rick Boone At Uber, the majority of our services are in the critical path of customer-facing features ...
Distributed Tracing, Lessons Learned Gina Maini Your engineering job might look something like this: 1. Understand dependencies. 2. Keep the site up. 3. Understand ...
Junior Engineers Are Features, Not Bugs Kate Taggart There are many benefits to hiring junior engineers, but when it comes to teams responsible ...
Approaching the Unacceptable Workload Boundary Baron Schwartz We've all stared in frustration at a system that degraded into nonresponsiveness, to the point ...
Building Shopify's PaaS on Kubernetes Karan Thukral Shopify has grown from less than 20 production services in 2011 to more than 400 ...
Know Thy Enemy: How to Prioritize and Communicate Risks Matt Brown Every SRE team attempting to manage, mitigate, or eliminate the risks facing their system will ...
Automatic Metric Screening for Service Diagnosis Yu Chen When a service is experiencing an incident, the oncall engineers need to quickly identify the ...
Whispers in Chaos: Searching for Weak Signals in Incidents J. Paul Reed The complexity of the socio-technical systems we engineer, operate, and exist within is staggering. Despite ...
Architecting a Technical Post Mortem Will Gallego SREs are frequently tasked with being front and center in intense, highly demanding situations in ...
Your System Has Recovered from an Incident, but Have Your Developers? Jaime Woo Mistakes are inevitable, and happen to the best of us. Our industry adopts a blame-free ...
The History of Fire Escapes Tanya Reilly When a datacenter goes offline, a server gets overloaded, or a binary hits a crashing ...
Leaping from Mainframes to AWS: Technology Time Travel in the Government Andy Punteney The year is 1999, judging from the technology on your servers. Your mission is to ...
Operational Excellence in April Fools’ Pranks: Being Funny Is Serious Work! Thomas Limoncelli One of the most “high stakes” launches you can do is the April Fools’ prank ...