Career Profile
Senior Systems/Platform/SRE engineer specializing in large-scale distributed systems, high-performance infrastructure, cloud-native platforms, and production reliability. Experienced in optimizing complex systems for latency, throughput, availability, and cost. Strong in Linux internals, networking, Kubernetes, system design, and debugging at scale. Skilled at building resilient, high-throughput architectures powering mission-critical workloads.
Experience
- Architectural Leadership: Currently leading transformation of core service platform from Mesos/Marathon to Kubernetes, defining patterns for deployment, networking, runtime configuration, observability, and multi-environment topology.
- Multi-Cloud Infra Ownership: Responsible for cloud footprint across AWS, Azure, and Krutrim; directed cloud migration strategies.
- Platform Reliability: Drove SLO-first culture across engineering; introduced standard runbooks, unified alerting, and postmortem processes across teams.
- Networking & Traffic Engineering: Re-architected DNS layers with dnsmasq, Unbound, and caching tiers, reducing Route 53 queries and improving intra-cluster latency.
- Distributed Cache Platform: Designed HA Redis clusters (100+ clusters) on Kubernetes with automated failover, monitoring, and seamless migrations.
- Cost Engineering: Reduced infra cost by 25%+ via architectural consolidation, compute rightsizing, Graviton adoption, refined autoscaling, and observability optimization.
- Security Ownership: Ensured PCI compliance; worked with auditors, implemented controls, audit readiness, and secure operational guidelines.
- Infrastructure Modernization: Led multiple POCs to replace legacy stacks with modern, efficient tooling to cater to the increasing scale of infrastructure environments.
- High-Severity Incident Leadership: Led outage response, coordinated multi-team debugging, drove RCAs, and implemented systemic fixes.
- Deployed several microservice-based applications to Kubernetes clusters (AKS, GKE & GCP).
- Used Helm charts to manage deployments to the Kubernetes cluster.
- Set up GitLab CI jobs for CI/CD to the Kubernetes cluster.
- Explored new technologies (DevOps tools) to help automate the development process.
- Set up an on-prem Kubernetes cluster using Rancher.
- Worked closely with developers to understand the development methodology and helped them with all infra-related issues.
- Automated deployments using Capistrano integrated with GitLab.
- Worked on a big data project using Hadoop on Azure (HDInsight) and Ruby on Rails.
- Built environments from scratch and worked with different teams to successfully deploy the code to production.
- Implemented backup/retention policy using Bash scripts.
- Automated daily reports from the database using Bash scripts.
- Monitored cloud server health via the Azure portal (AppInsights) and HDInsight using Ambari.
- Handled technical issues via the ticketing system and chats, organizing conference calls with different stakeholders for timely resolution of issues.
- Enabled monitoring of servers using Zabbix.
- Hands-on experience with Linux servers: configuring SSH, MySQL, networking, system administration, and telecom SS7 technologies.
- Installed and configured a new Ubuntu server with all packages needed for development, such as RVM rubies, Nginx, and Kannel.
- Assisted the Engineering Lead – Operations with technology transformations for the Operations division.
- Identified bugs and worked closely with the development team to fix them.
- Familiarity with AWS (EC2, RDS, S3).
- Automation using Ansible playbooks.
- Configured Nagios plugin (Icinga).
- CI/CD using GitLab Runners.
- Network troubleshooting (including iptables, firewalls).
- Complete knowledge of in-house applications for SMS promotions and USSD.
- Automated reports from MySQL databases using Bash scripts.
- Expert in technical troubleshooting of SOHO networks and Windows issues.
- Supported the nesting team in resolving complex issues.
- Diagnosed, troubleshot, and resolved a wide range of connectivity issues.
- Maintained a high level of quality, technical expertise, soft skills, and phone etiquette.
- Helped customers identify and resolve issues pertaining to dial-up configuration, web hosting, and domain registration.
- Adhered to a high standard of service ethics.
