Site Reliability Engineer @ one2n.io

 Author
Hi, I’m Harsh, an engineer who loves infrastructure. For now, I work with anything that falls between DevOps, SRE, cloud, and infrastructure. I don’t know what to do next. I’m eternally curious.

September 2025 - Present

DevOps and Cloud Infrastructure Consultant for an AI-based Fraud Detection and AML Monitoring Startup

  1. Collaborating on performance optimization for workflows running on Netflix Conductor deployed on GKE.
  2. Automated software delivery for multi-tenant architectures using Terraform, reducing new environment setup time from 1 week to 1 day.
  3. Fixed CI/CD pipelines, reducing rollback times from 3 minutes to 30 seconds for GKE and GCE workloads.
  4. Saved $4,000 per month by tuning K8s resource requests and limits to prevent node overprovisioning.
  5. Implemented Zero Trust architecture with RBAC using Cloudflare Zero Trust.
  6. Migrated publicly exposed back-office portals to private access through Cloudflare Zero Trust.
  7. Led the organization through a successful SOC 2 Type II compliance process.
  8. Currently working with enterprise customers to deploy our SaaS product into their own AWS/on-prem environments, including network architecture design, secure IAM setup, and delivering Helm/Terraform-based installation and upgrade flows.

April 2025 - September 2025

Site Reliability Engineer for a Mid-sized Fintech (Payments Platform similar to Paytm)

  1. Owned end-to-end reliability for 45+ services across 6 teams and 5 AWS environments; led migration from monolith to microservices for core payment functions.
  2. Provided production support for services built in Ruby on Rails, Spring Boot, and Go.
  3. Contributed to the design and rollout of an SLI/SLO framework for critical services and ETL pipelines.
  4. Implemented canary deployments for 27 services, improving release safety and reducing incident risk.
  5. Collaborated on decoupling Terraform IaC from CI/CD, supporting migration from AWS CodePipeline and reducing technical debt.
  6. Built Python CLI tools and GitHub Actions for configuration drift detection, cutting pre-deployment checks from 60 minutes to 10 seconds and enabling self-service for deployment owners.
  7. Reduced alert fatigue by cutting alert volume from 100 alerts per day to 4.
  8. Created incident runbooks and Datadog synthetic monitors for OTP and SMS reliability.
  9. Automated DB lock resolution in Aurora PostgreSQL using cron jobs.
  10. Developed custom Terraform PR automation with GitHub Actions.