Platform Engineering – Sr. Staff Site Reliability Engineer


  • Bangalore, India
Job Posted: Aug 16th, 2023

Job Description

We are seeking a Sr. Staff Site Reliability Engineer (Infrastructure & Site Reliability Engineering) with extensive experience in Kubernetes, GitOps, AWS, and Azure to lead our Site Reliability Engineering (SRE) team. The successful candidate will have a deep understanding of SRE principles and a track record of implementing high-quality SRE practices (SLAs, SLOs, proactive alert management, incident response and review, postmortems, etc.).

In this role, you will work with our SRE and cross-functional engineering teams to lead, develop, and operate our development and production infrastructure and operations.


  • Collaborate with software engineering teams to define infrastructure requirements, drive best practices in monitoring, incident response, and automation, ensuring seamless integration and optimal performance of applications and systems
  • Lead and mentor a team of SREs, providing technical guidance and support to ensure the ongoing reliability and performance of our systems
  • Play a key role in driving the automation, tools, and observability initiatives, assuming ownership of designing and implementing scalable and efficient solutions
  • Lead the response to production incidents, conduct comprehensive learning reviews, drive continuous improvement initiatives, and participate actively in an on-call rotation, fostering a culture of learning, resilience, and ongoing enhancement within our systems
  • Establish and drive operations performance through SLOs
  • Provide project management, sprint planning, and road-mapping support to the SRE team
  • Demonstrate proficiency in technical skills, exhibit an expert-level understanding of relevant technologies and tools, and use this knowledge to mentor and support team members, helping them improve their skills and succeed in their roles

Qualifications and Experience:-

  • 10+ years of experience designing, building & maintaining SaaS environments
  • 5+ years of experience designing, building & maintaining AWS/Azure infrastructure with Terraform
  • 3+ years of experience building and running Kubernetes clusters
  • Experience with observability (monitoring – logging, tracing, metrics)
  • Experience with GitOps CI/CD processes
  • Experience coding in Python, Go (Golang) & Bash
  • Experience with security operations – security policies, infrastructure, key management, and setup of encryption at rest and in transit
  • Experience in mentoring and fostering the professional development of team members, promoting a culture of continuous learning and collaboration


  • Strong customer orientation
  • Excellent interpersonal and organizational skills
  • Attention to detail and focus on quality
  • Strong communication skills to effectively liaise with both technical and non-technical staff
  • Ability to act decisively and work well under pressure
  • Must be a collaborative problem solver
  • Strong bias for ownership and action



Company Overview:-

SolarWinds is a leading provider of powerful and affordable IT management software. Our products give organizations worldwide—regardless of type, size, or complexity—the power to monitor and manage their IT services, infrastructures, and applications; whether on-premises, in the cloud, or via hybrid models. We continuously engage with technology professionals—IT service and operations professionals, DevOps professionals, and managed services providers (MSPs)—to understand the challenges they face in maintaining high-performing and highly available IT infrastructures and applications. The insights we gain from them, in places like our THWACK® community, allow us to solve well-understood IT management challenges in the ways technology professionals want them solved. Our focus on the user and commitment to excellence in end-to-end hybrid IT management has established SolarWinds as a worldwide leader in solutions for network and IT service management, application performance, and managed services.