| Field | Details |
|---|---|
| Company | PhonePe |
| Job Title | Site Reliability Engineer – Azure |
| Location | Bangalore |
| Experience | 4–8 years |
| Role Focus | Manage, scale, and ensure high availability of PhonePe’s core Azure-based cloud infrastructure; implement automation, monitoring, and networking solutions |
| Key Responsibilities | – Configure and maintain Azure VMs, storage, Cosmos DB, and Azure Data Explorer (ADX)<br>– Manage complex networking: Azure Firewall, route tables, virtual network (VNet) gateways, ExpressRoute, BGP<br>– Automate business-as-usual (BAU) tasks with Terraform, SaltStack, Ansible, and scripting<br>– Manage databases (MySQL, Aerospike), including replication and backups<br>– Implement monitoring (Prometheus, VictoriaMetrics, Riemann) and logging (Loki) with Grafana dashboards<br>– Ensure security and compliance with SOC and InfoSec requirements<br>– Handle incident management, disaster recovery (DR) planning, and capacity and performance management |
| Technical Skills | – Cloud: Microsoft Azure core services<br>– OS: Ubuntu/Linux administration<br>– Scripting/programming: Python, Go, Java, Bash<br>– Monitoring/observability: Prometheus, VictoriaMetrics, Riemann, Grafana, Loki<br>– IaC and configuration management: Terraform, SaltStack, Ansible<br>– Databases: MySQL, Aerospike, InfluxDB, Elasticsearch<br>– Core infrastructure: Nginx, HAProxy, RabbitMQ (RMQ), Docker<br>– Networking: DNS, IPsec, BGP, ExpressRoute |
| Scope / Scale | Large-scale, mission-critical infrastructure supporting 600+ million users and 330+ million transactions/day |
| Soft Skills & SRE Practices | Ownership, accountability, communication, mentoring (for senior roles), SLO/SLI management, toil reduction, cost optimization |
| Benefits | Same as other PhonePe full-time roles: insurance, wellness, parental support, mobility, retirement, education, car lease, salary advance |
| Key Differentiator | Focused on cloud infrastructure reliability, automation, networking, and database availability in a high-volume Azure environment |