SRE
May 2021 - September 2025
Wireless Car
Detroit, MI
- Resolved system and data concerns from automotive clients and their customers, tracking network and system state issues through complex asynchronous systems while maintaining communication throughout the incident lifecycle.
- Corrected customer vehicle telematics issues using internal software, database and system-level scripts, and APIs.
- Served on-call as the sole US-based representative on a globally distributed SRE team, handling critical incident triage, investigation, and communication with the necessary DevOps and management teams.
- Led critical incident Root Cause Analyses (RCAs) involving development, specialist, and management team representatives to examine system and company processes and to design and enact improvements.
- Collaborated on implementing a system to drive critical incident communication and lifecycle management, then developed, documented, and presented its best practices for all company product teams.
- Designed modifications to company processes for critical incidents and their associated RCAs, carrying the proposals through the testing, documentation, and implementation phases and presenting the changes to the many teams affected.
- Led an initiative to restructure the extensive team internal wiki, removing old documentation and creating a clear and logical page hierarchy that could be reliably traversed and referenced in high-stress situations.
- Developed infrastructure and application monitoring and alerting practices and associated runbooks alongside DevOps teams to improve observability, failure response time, and time to recovery.