September 2025 - Present
DevOps and Cloud Infrastructure Consultant for an AI-based Fraud Detection and AML Monitoring Startup
- Accelerated Provisioning: Leveraged Terraform and Pulumi to automate regional and BYOC (Bring Your Own Cloud) environment deployments, reducing turnaround time for new infrastructure from 1 week to 1 day.
- Cost Optimization: Saved $4.8k/month by tuning Kubernetes requests and limits to eliminate over-provisioning of GKE nodes.
- Zero-Trust Security: Secured back-office portals by implementing Cloudflare Zero Trust.
- Rapid Recovery: Reduced rollback time from 30 minutes to 5 minutes by introducing optimized pipelines.
- Multi-Cloud PaaS Architecture: Led the re-architecture of the GCP-based SaaS into a PaaS to support enterprise self-deployment across on-prem, AWS, GCP, and Azure.
- Data Warehousing and Analytics: Led architecture planning for multi-tenancy and the deployment of an end-to-end analytics platform built on StarRocks, Debezium, and KubeRay on GKE.
- SOC 2 Compliance: Led infrastructure and development action items for SOC 2 Type II certification.
April 2025 - September 2025
Site Reliability Engineer for a Mid-sized Fintech (Payments Platform similar to Paytm)
- Production & Infra Support: Held on-call and infrastructure ownership across 5 AWS environments for 45+ services during a monolith-to-microservices migration.
- Alerting: Reduced PagerDuty alerts from 100/day to 4/day via signal and threshold tuning.
- Reliability: Implemented Datadog Synthetics to protect critical SMS/OTP authentication flows.
- Automation: Cut pre-deploy checks from 1 hour to 10 seconds using a Python CLI and GitHub Actions.
- 12-Factor Architecture: Refactored Ruby services running Shoryuken (on AWS SQS) and Sidekiq workers; isolated workers into separate containers for fault isolation and faster recovery.
- IaC: Decoupled Terraform from CI/CD and introduced a custom PR review workflow for infrastructure changes.
- Deployments: Rolled out Kubernetes canary releases for 27 high-traffic services.
- Incident Management: Defined SLIs and SLOs to improve MTTD and MTTR.
- Self-Healing: Automated Aurora Postgres lock remediation, eliminating manual intervention.

