After 1.5 years of running Postgres, here are the 8 operational mistakes that cause outages. Each one describes a failure mode so you can spot the landmine before you hit it. Give it a read…
After 13 years as an Ubuntu user, I’ve finally decided to replace Ubuntu on all my machines (my VMs, my personal laptops, and my secondary work laptop) with Debian and Bluefin. This comes after years of frustration and Canonical’s pure neglect of basic “Ubuntu Ideals” on the desktop front
This article discusses best practices for versioned schema changes, safe column modifications, and managing replica lag across database clusters. Includes practical patterns for transactions, data batching, and rollback strategies to maintain high availability in production.
Learn how to port-forward from EC2 instances in private subnets without opening SSH ports or setting up bastion hosts. This AWS SSM Session Manager guide provides a secure, cost-effective, and simple solution for EC2 access and port forwarding.
This article highlights the gap between idealized YouTube tutorials and real-world DevOps/SRE work, emphasizing fundamentals, debugging, and system reliability over superficial projects.
Explore how Retrieval Augmented Generation (RAG) can enhance Large Language Models (LLMs) like ChatGPT by leveraging external knowledge sources. Understand the fundamentals of LLMs, the limitations of standalone models, and the key components of a RAG-based application.
A comprehensive overview of MLOps for software engineers, covering key concepts, challenges, and best practices. Learn about knowledge silos, the MLOps lifecycle, automation, versioning, experiment tracking, and more.
This article focuses on using Kubernetes CronJobs to pull host metrics from every node in a cluster and save them as log files. A few days ago, I was asked to solve this problem as part of an interview process for an SRE position. It is a nice way to brush up on your monitoring skills, so I decided to share it with all readers.
Recently, I was working on a custom observability suite where I needed to monitor long-running and blocking queries on an HA Postgres cluster deployed on Kubernetes with Patroni.