Job Information
UnitedHealth Group Site Reliability Engineer - AWS, Terraform, Python in Bangalore, India
Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
We are seeking a Site Reliability Engineer to join our team and help maintain and improve the reliability, scalability, and security of our cloud infrastructure. This is an excellent opportunity for someone early in their SRE career who wants to work with modern cloud technologies and gain hands-on experience managing production infrastructure at scale.
Primary Responsibilities:
Cloud Infrastructure Maintenance
You will be responsible for regular maintenance and upgrades of our cloud components, including but not limited to:
Core Infrastructure:
RDS Aurora (PostgreSQL) upgrades and maintenance
EKS (Kubernetes) cluster upgrades and node pool management
EC2 instance patching and maintenance
Data Infrastructure:
Redshift Serverless cluster maintenance
AWS Glue jobs and crawlers management
Kinesis streams and Firehose delivery streams monitoring
Application Infrastructure:
Lambda function runtime upgrades and optimization
API Gateway configuration and maintenance
Application Load Balancer (ALB) management
CloudFront distributions and WAF rule updates
Storage & Data Services:
S3 bucket lifecycle policies and versioning
DynamoDB table maintenance and scaling
AWS Backup job monitoring and validation
EBS volume management and snapshots
Networking:
VPC peering connections maintenance
Network ACL and Security Group updates
VPN and Direct Connect monitoring
Private endpoints and PrivateLink management
Security Services:
AWS Secrets Manager rotation and access
KMS key management and rotation
GuardDuty findings remediation
Cognito user pool management
Observability & Logging:
CloudWatch log groups and metric alarms
Elasticsearch (OpenSearch) cluster maintenance
Fluent Bit logging pipeline management
AWS Config rule compliance
Messaging & Streaming:
SQS queue monitoring and dead-letter queue management
SNS topic subscription management
EventBridge rule maintenance
Security & Compliance Management
You will help ensure our infrastructure meets enterprise security standards:
Secure Code Vulnerabilities: Review and remediate security findings from code scanning tools
Cloud Configuration Vulnerabilities: Address misconfigurations identified by AWS Config, Security Hub, and other compliance tools
Cloud Vulnerabilities: Patch EC2 instances, update container images, and respond to AWS security bulletins
Compliance Reporting: Maintain documentation and evidence for security audits
Employee Access Management
You will manage access requests to various systems. While proficiency is not required (we have comprehensive runbooks), familiarity is a plus:
Salesforce - CRM platform access management
NICE CXone - Contact center platform
Skedulo - Workforce management system
Athena EHR - Electronic health records system
Twilio - Communications platform
AWS Resources - IAM roles, policies, and resource access
Infrastructure Support & Troubleshooting
You will work with development teams to:
Investigate and resolve infrastructure-related incidents
Troubleshoot application deployment issues
Optimize infrastructure costs and performance
Recommend appropriate infrastructure solutions for new application requirements
Create and maintain runbooks and documentation
Participate in on-call rotation for production support
Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies in regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so.
Required Qualifications:
1+ years of experience with cloud platforms (AWS preferred)
Basic understanding of infrastructure as code concepts
Familiarity with version control systems (Git)
Proven solid troubleshooting and problem-solving skills
Proven excellent communication skills and ability to work collaboratively
Proven eagerness to learn and adapt to new technologies
Tech Stack experience or knowledge
AWS - Our primary cloud platform
Terraform - Infrastructure as Code for managing all cloud resources
PostgreSQL (RDS) - Our primary database platform
Preferred Qualifications:
AWS certifications (Solutions Architect Associate, SysOps Administrator, etc.)
Experience with Terraform or other IaC tools (CloudFormation, Pulumi, etc.)
Experience with CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, etc.)
Scripting experience in Python, Bash, or PowerShell
Database administration experience with PostgreSQL or other relational databases
Experience with incident management and on-call support
Tech Stack experience or knowledge
C# - Primary language used by application teams
Python - Additional language used by application teams
Node.js - Additional language used by application teams
Familiarity with containerization (Docker) and orchestration (Kubernetes/EKS)
Knowledge of monitoring and observability tools (CloudWatch, Datadog, Prometheus, Grafana)
Understanding of networking concepts (VPC, subnets, routing, load balancing)
What You'll Learn
This role offers excellent growth opportunities to develop expertise in:
Large-scale AWS infrastructure management
Infrastructure as Code best practices with Terraform
Kubernetes/EKS operations and cluster management
Security compliance and vulnerability management
Incident response and postmortem processes
Cross-functional collaboration with development teams
SRE principles including SLIs, SLOs, and error budgets
Work Environment
Collaborative team environment with experienced SREs
Modern tools and technologies
Opportunities for professional development and certification
Participation in on-call rotation (with mentorship and runbook support)
At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone, of every race, gender, sexuality, age, location and income, deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health, which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.