Proximity Works

DevOps / SRE Engineer

Posted 5 Days Ago
In-Office
Navi Mumbai, Thane, Maharashtra
Senior level
The role is responsible for managing production operations, ensuring the reliability and performance of an AI-first platform, and leading incident response. It also involves collaborating with engineering teams to optimize workloads and maintain CI/CD systems.
The summary above was generated by AI

We are looking for a DevOps / Site Reliability Engineer (L5) to own and scale the production reliability of a large-scale, AI-first platform. You will be responsible for running mission-critical workloads on cloud infrastructure, hardening Kubernetes-based systems, and ensuring high availability, performance, and cost efficiency across platform and AI services.

This role is deeply hands-on and ownership-driven. You will be trusted to run day-2 production systems end-to-end, lead incident response, and continuously raise the reliability bar for AI and data-intensive workloads.

At Proximity, you won’t just keep systems running — you’ll shape how reliability, observability, and operational excellence are built into the platform from the ground up.

Responsibilities
  • Own day-2 production operations of a large-scale, AI-first platform running on cloud infrastructure
  • Run, scale, and harden Kubernetes-based workloads integrated with a broad set of managed cloud services across data, messaging, AI, networking, and security
  • Define, implement, and operate SLIs, SLOs, and error budgets across core platform and AI services
  • Build and own observability end-to-end, including:
    • APM
    • Infrastructure monitoring
    • Logs, alerts, and operational dashboards
  • Improve and maintain CI/CD pipelines and Terraform-driven infrastructure automation
  • Operate and integrate AI platform services for LLM deployments and model lifecycle management
  • Lead incident response, conduct blameless postmortems, and drive systemic reliability improvements
  • Optimize cost, performance, and autoscaling for AI, ML, and data-intensive workloads
  • Partner closely with backend, data, and ML engineers to ensure production readiness and operational best practices
What Matters (Non-Negotiable Alignment)

Infra owners, not operators.
This role is for engineers who design, build, and own infrastructure, not those limited to ticket-based operations.

  • Built and operated production-grade cloud infrastructure end-to-end
  • Strong Kubernetes experience in real, high-traffic production environments
  • AWS experience is mandatory, with GCP as a strong plus
  • Experience operating AI / ML workloads in production
    • Including GPU-based systems
  • Strong ownership of CI/CD systems and Infrastructure as Code
  • End-to-end observability ownership
    • Monitoring, logging, alerting, dashboards
  • Comfortable making infrastructure decisions under ambiguity
  • Proven ability to collaborate deeply with ML and backend teams to take systems from design → production → scale

Requirements
  • 6+ years of hands-on experience in DevOps, SRE, or Platform Engineering roles.
  • Strong, production-grade experience with cloud platforms:
    • AWS required.
    • GCP strongly preferred, especially Kubernetes and managed services.
  • Proven expertise running Kubernetes at scale in live production environments.
  • Deep hands-on experience with New Relic in complex, distributed systems.
  • Experience operating AI/ML or LLM-driven platforms in production environments.
  • Solid background in Terraform, CI/CD systems, cloud networking, and security fundamentals.
  • Strong understanding of reliability engineering principles, including capacity planning, failure modes, and resilience patterns.
  • Comfortable owning production systems end-to-end with minimal supervision.
  • Strong communication skills and the ability to operate calmly and effectively during incidents.
  • Experience building internal platform tooling for developer productivity.
Desired Skills
  • Experience managing multi-cloud environments or cross-cloud integrations.
  • Familiarity with cost optimization strategies for large-scale Kubernetes and AI workloads.
  • Exposure to service meshes, advanced traffic management, or zero-trust security models.

Benefits
  • Best-in-class compensation: We hire only the best, and we pay accordingly.
  • Proximity Talks: Learn from senior engineers, platform leaders, and industry experts.
  • Work on real-world AI systems: Operate and scale production AI platforms used at meaningful scale.
  • Continuous learning: Grow alongside a high-caliber team that values operational excellence and engineering rigor.
About us

We are Proximity — a global team of coders, designers, product managers, geeks, and experts. We solve complex problems and build cutting-edge technology at scale.

Our team of Proxonauts is growing quickly, which means your impact on the company’s success will be significant. You’ll work with experienced leaders who have built and led high-performing tech and platform teams.

Here’s a quick guide to getting to know us better:

  • Watch our CEO, Hardik Jagda, tell you all about Proximity.
  • Read about Proximity’s values and meet some of our Proxonauts.
  • Explore our website, blog, and design wing — Studio Proximity.

Get behind the scenes with us on Instagram — follow @ProxWrks and @H.Jagda.

Top Skills

AWS
CI/CD
GCP
Kubernetes
New Relic
Terraform
