Ensure the availability and performance of a customer-facing platform by provisioning and maintaining Windows/Linux/Kubernetes infrastructure, implementing IaC (Terraform/ARM/CloudFormation), automating workflows, monitoring with observability tools, managing backups and DR, analyzing logs and metrics, and promoting security and ITIL best practices across DevOps, DBA, and development teams.
Job Summary
As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our customer-facing platform. You will work closely with DevOps, DBA, and Development teams to provision and maintain infrastructure, deploy and monitor our applications, and automate workflows. Your contributions will have a direct impact on customer satisfaction and overall user experience.
Responsibilities and Deliverables
- Manage, monitor, and maintain highly available systems (Windows and Linux).
- Analyze metrics and trends to ensure performance and rapid scalability.
- Address routine service requests while identifying ways to automate and simplify.
- Create infrastructure as code using Terraform, ARM templates, or CloudFormation.
- Maintain data backups and disaster recovery plans.
- Adhere to security best practices through all stages of the software development lifecycle.
- Follow and champion ITIL best practices and standards.
Organizational Alignment
- Reports to the Senior SRE Manager.
- This role involves close collaboration with DevOps, DBA, and security teams.
Technical Proficiencies
- Hands-on experience with AWS is a must-have.
- Proficiency in analyzing application, IIS, system, and security logs, as well as CloudTrail events.
- Experience with CI/CD tools such as Jenkins and GitHub Actions.
- Experience maintaining and administering Windows, Linux, and Kubernetes.
- Experience in automation using scripting languages such as PowerShell, Bash, or Python.
- Good understanding of networking concepts (VPC, subnet, private link, peering).
- Familiarity with configuration management using Ansible, Azure Automation, or similar.
- Familiarity with observability tools such as New Relic, AppDynamics, or Datadog.
Experience
- 3+ years of experience in an SRE or system administration role.
- Demonstrated ability to build and support high-availability Windows/Linux servers.
- 2+ years of experience working with cloud technologies, including AWS and Azure.
- Comfortable using Scrum, Kanban, or Lean methodologies.
Education
- Bachelor’s Degree or College Diploma in Computer Science, Information Systems, or equivalent experience.