
I build and operate reliable cloud-native systems with automation, observability, and infrastructure as code. 3+ years in 24×7 production at TCS, plus hands-on systems and reporting work at UNT Libraries.
I'm a cloud and DevOps-focused engineer with 3+ years of hands-on experience supporting and improving production-grade systems in 24×7 environments. My background combines Linux operations, Python automation, cloud infrastructure, and reliability engineering practices.
I have worked extensively with monitoring, alert triage, incident response, and root-cause analysis in distributed systems. I focus on building observable, scalable, and resilient infrastructure using AWS services, Terraform, Kubernetes, and CI/CD pipelines.
In addition to operations, I build practical engineering projects involving serverless architectures, Kubernetes observability, and secure cloud networking to deepen my platform and automation expertise.
Open to Junior Cloud, DevOps, Linux, and Platform Engineering opportunities.
Problem: Needed scalable processing for high-volume activity data.
Solution: Built an event-driven AWS Lambda pipeline that writes to RDS, DynamoDB, and S3, with logging and monitoring.
Result: Delivered a fault-tolerant ingestion system handling burst traffic reliably.
Problem: Services lacked visibility and required manual scaling.
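Solution: Set up Prometheus and Grafana for metrics and dashboards, configured the Kubernetes Horizontal Pod Autoscaler (HPA), and validated the setup with load and failure testing.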
Result: Achieved automated scaling and improved system reliability.
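To illustrate the scaling behaviour, here is a minimal Python sketch of the replica calculation documented for the Kubernetes HPA (desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance). It is not the project's configuration: the function name, the replica bounds, and the metric values are hypothetical, chosen only for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the documented HPA replica calculation.

    All parameter defaults here are illustrative assumptions,
    not values taken from the project.
    """
    ratio = current_metric / target_metric
    # The HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical example: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

Under these assumed numbers the calculation asks for more pods when average utilisation is above the target and fewer when it is well below, which is the behaviour a load test would exercise.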
Problem: Required secure cloud network segmentation and controlled access.
Solution: Designed a VPC with public and private subnets, NAT, routing controls, and layered security.
Result: Established isolated, secure architecture validated through testing.
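As a rough illustration of the public/private split, the sketch below carves a VPC address range into non-overlapping subnets using Python's standard `ipaddress` module. The CIDR block, the /24 subnet size, the subnet counts, and the `plan_subnets` helper are all assumptions for this example; the project's actual addressing plan is not described here.

```python
import ipaddress


def plan_subnets(vpc_cidr, new_prefix=24, public_count=2, private_count=2):
    """Split a VPC CIDR into public and private subnet blocks.

    Hypothetical helper: sizes and counts are illustrative assumptions.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the given size.
    blocks = vpc.subnets(new_prefix=new_prefix)
    plan = {"public": [], "private": []}
    for _ in range(public_count):
        plan["public"].append(str(next(blocks)))
    for _ in range(private_count):
        plan["private"].append(str(next(blocks)))
    return plan


if __name__ == "__main__":
    # Assumed example range; not the project's real addressing.
    for tier, cidrs in plan_subnets("10.0.0.0/16").items():
        print(tier, cidrs)
```

In a layout like this, public subnets would typically route outbound traffic through an internet gateway, while private subnets would send it through NAT and accept no inbound connections from the internet.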
Problem: Manual planning was slow and inconsistent.
Solution: Used Amazon Bedrock to generate structured project plans from prompts.
Result: Reduced planning time and improved documentation consistency.
"Reduced manual processing effort by ~40% through automation, documentation, and standardized workflows."
Standardized reporting and data workflows for academic services with reliable, repeatable processes.
"Improved incident response efficiency and system reliability by standardizing monitoring, automation, and operational procedures."
Supported large-scale production systems focused on reliability and automation.