PhonePe: Site Reliability Engineer – Big Data

March 12, 2026 by Abhay Singh

Category	Details
Company	PhonePe
Job Title	Site Reliability Engineer – Big Data
Location	Bangalore
Experience	7–11 years
Role Focus	Manage and maintain distributed big data ecosystems; ensure reliability, scalability, and security of large-scale production infrastructure
Key Responsibilities	Manage Linux/Unix environments and on-call incident response; design and implement automation for provisioning, scaling, upgrading, and patching clusters; troubleshoot production issues and perform root cause analysis; optimize system performance, resource usage, and workflows; collaborate with teams on system design and integration; enforce security and SRE best practices; develop operational automation scripts and tools; monitor system health using ELK, Grafana, Prometheus, and OpenTelemetry
Technical Skills	Linux (IP, iptables, IPsec); scripting: Perl, Golang, Python; Hadoop stack: HDFS, HBase, Airflow, YARN, Ranger, Kafka, Pinot; configuration/deployment: Puppet, Salt, Chef, Ansible; DevOps tools: Docker, Git, SaltStack, Ansible; observability/monitoring: ELK, Grafana, Prometheus, OpenTelemetry; networking; cloud infrastructure (AWS, GCP, or Azure experience is good to have)
Scope / Scale	Large-scale big data production clusters, supporting critical business services for millions of users and 330+ million transactions/day
Soft Skills	Strong collaboration, communication, problem-solving, and independent decision-making
Benefits	Medical, critical illness, accidental, life insurance; wellness programs; parental support; mobility benefits; retirement benefits (PF, NPS, Gratuity); higher education assistance, car lease, salary advance
Key Differentiator	Focus on infrastructure reliability and automation at scale; distinct from HR, Payroll, or PR roles at PhonePe
