We’re not just building better tech. We’re rewriting how data moves and what the world can do with it. With Confluent, data doesn’t sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them.
It takes a certain kind of person to join this team. Those who ask hard questions, give honest feedback, and show up for each other. No egos, no solo acts. Just smart, curious humans pushing toward something bigger, together.
One Confluent. One Team. One Data Streaming Platform.
About the Metrics Platform Team
The Confluent Metrics Platform team's mission is to provide a best-in-class observability foundation that enables customers to monitor, analyze, and optimize their real-time data streaming infrastructure at cloud scale. Our charter is to deliver Realtime Metrics and Insights through the Confluent Cloud Metrics API, empowering businesses to make data-driven decisions about their streaming workloads.
We are a critical component of Confluent's observability systems, serving as the primary interface through which customers understand the health, performance, and behavior of their Kafka clusters, connectors, ksqlDB applications, and Schema Registry deployments. Our technology powers monitoring dashboards, alerting systems, and capacity planning tools for thousands of customers running mission-critical streaming applications.
About the Role
As a Senior Manager, Engineering for the Metrics Platform team, you will build, lead, and grow a high-performing engineering organization responsible for one of Confluent's most critical services. This role demands a unique blend of deep technical expertise and strong leadership: you must both drive the strategic vision for a large-scale, real-time analytics platform and execute flawlessly on operational excellence.
Your immediate focus will be on:
Scaling for Growth: Leading the technical strategy to scale our metrics infrastructure to handle 10x data volume over the next 2 years
API Evolution: Driving the roadmap for new metrics datasets, query capabilities, and integration patterns
Operational Excellence: Ensuring 99.99%+ availability, sub-second query performance, and seamless incident response
Cross-Team Collaboration: Partnering with multiple teams across Telemetry, Cloud Infrastructure, and Product to deliver end-to-end observability solutions
Define and execute the multi-year technical roadmap for the Metrics Platform, including data infrastructure cluster evolution, data retention strategies, and query optimization
Build, mentor, and grow a world-class engineering team
Partner with Product Management to define and prioritize the Metrics API roadmap based on customer needs and business impact
Align with Confluent's broader observability strategy across Cloud and Platform offerings
Establish metrics and KPIs to measure system performance, system reliability, and customer satisfaction
14+ years of overall experience in software development and engineering
4+ years of engineering management experience, leading productive, high-performing teams
Experience operating large-scale distributed systems in production environments (preferably cloud-native)
Leadership & Management Skills
Demonstrated ability to hire and retain top engineering talent, provide impactful coaching, and drive high-performance results.
Proven track record of shipping features consistently and meeting aggressive deadlines with a high degree of urgency.
Exceptional prioritization skills with the ability to balance short-term execution with a long-term strategic vision for technical evolution.
Exceptional communication and collaboration skills, with a focus on building a positive, inclusive team culture aligned with organizational goals.
Technical Expertise
Solid fundamentals in distributed systems design, replication protocols, and high-availability production operations.
Deep familiarity with Kafka or similar high-scale event streaming platforms (Pulsar, Flink, etc.) in cloud environments.
Experience operating complex architectures across large public clouds (AWS, GCP, Azure) or private cloud-native infrastructures.
Strong engineering background with a hands-on approach to technology and a passion for architectural deep-dives.
Direct experience with Apache Druid in production at scale
Familiarity with Prometheus, OpenMetrics, or OpenTelemetry ecosystems
Experience in SaaS or platform engineering organizations
Belonging isn’t a perk here. It’s the baseline. We work across time zones and backgrounds, knowing the best ideas come from different perspectives. And we make space for everyone to lead, grow, and challenge what’s possible.
We’re proud to be an equal opportunity workplace. Employment decisions are based on job-related criteria, without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by law.
Privacy Statement
Confluent has been acquired by IBM, is now an IBM subsidiary, and will be integrated into the IBM organization. By proceeding with this application, you understand that Confluent will share your personal information with other IBM affiliates involved in your recruitment process, wherever these are located. More information on how IBM protects your personal information, including the safeguards in case of cross-border data transfer, is available here.

