Site Reliability Engineer @ one2n.io

 Author
Hi, I’m Harsh, an engineer who loves infrastructure. For now, I work with anything that falls between DevOps, SRE, Cloud and Infrastructure. I don’t know what to do next. I’m eternally curious.

September 2025 - Present

DevOps and Cloud Infrastructure Consultant for an AI-based Fraud Detection and AML Monitoring Startup

  1. Accelerated Provisioning: Leveraged Terraform and Pulumi to automate regional and BYOC (Bring Your Own Cloud) environment deployments, reducing turnaround time (TAT) for new infrastructure from 1 week to 1 day.
  2. Cost Optimization: Saved $4.8k/mo by tuning Kubernetes requests and limits to eliminate over-provisioning of GKE nodes.
  3. Zero-Trust Security: Secured back-office portals by implementing Cloudflare Zero Trust.
  4. Rapid Recovery: Reduced rollback time from 30 to 5 minutes by introducing optimized pipelines.
  5. Multi-Cloud PaaS Architecture: Led the re-architecture of the GCP-based SaaS into a PaaS to support enterprise self-deployment across on-prem, AWS, GCP and Azure.
  6. Data Warehousing and Analytics: Led architecture planning for multi-tenancy and the deployment of an end-to-end analytics platform built on the StarRocks data warehouse, Debezium and KubeRay on GKE.
  7. SOC2 Compliance: Led infrastructure and development action items for SOC2 Type 2 certification.

April 2025 - September 2025

Site Reliability Engineer for a Mid-sized Fintech (Payments Platform similar to Paytm)

  1. Production & Infra Support: On-call and infrastructure ownership across 5 AWS environments for 45+ services during monolith-to-microservices migration.
  2. Alerting: Reduced PagerDuty alerts from 100/day to 4/day via signal and threshold tuning.
  3. Reliability: Implemented Datadog Synthetics to protect critical SMS/OTP authentication flows.
  4. Automation: Cut pre-deploy checks from 1 hour to 10 seconds using a Python CLI and GitHub Actions.
  5. 12-Factor Architecture: Refactored Ruby services with Shoryuken and Sidekiq on AWS SQS; isolated workers into separate containers for fault isolation and faster recovery.
  6. IaC: Decoupled Terraform from CI/CD and introduced a custom PR review workflow for infrastructure changes.
  7. Deployments: Rolled out Kubernetes canary releases for 27 high-traffic services.
  8. Incident Management: Designed SLI/SLOs to improve MTTD and MTTR.
  9. Self-Healing: Automated Aurora Postgres lock remediation, eliminating manual intervention.