Site Reliability Engineer in GUADALAJARA, Mexico

Introduction

At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You’ll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.

Your role and responsibilities

We’re seeking a Junior Site Reliability Engineer to support the availability, performance, and day‑to‑day operations of our services and platforms. The engineer in this role will apply SRE best practices (automation, observability, Kubernetes, CI/CD) while developing technical depth under the guidance of senior engineers. Responsibilities include system maintenance, tooling improvements, participation in on‑call rotations, and contributions to the reliability and scalability of services.

Key Responsibilities

Operations & Reliability

Participate in an on‑call rotation with mentorship and established runbooks
Perform operational tasks: log reviews, rollouts, restarts, configuration updates, certificate renewals
Maintain and update runbooks, dashboards, diagrams, and documentation

Monitoring & Observability

Build or update dashboards and alerts using Prometheus, Grafana, and Loki
Tune alerts to reduce noise and improve signal quality
Apply the golden signals and the RED/USE methods under guidance

Automation & Tooling

Develop automation scripts with Python, Bash, or Go to eliminate repetitive tasks
Contribute to CI/CD pipelines (linting, gates, templates)

Cloud & Platform

Support deployment and operation of workloads on Docker, Kubernetes, and OpenShift
Contribute to infrastructure changes using Terraform and Ansible with review
Assist with basic cloud provisioning tasks

Networking & Security

Apply foundational networking concepts (TCP/IP, DNS, routing, HTTP, TLS) in troubleshooting
Follow least‑privilege and proper secrets‑management practices

Collaboration & Process

Participate in Agile ceremonies (standups, planning, retros)
Contribute to blameless post‑incident reviews
Collaborate with cross‑functional teams and use standard Git workflows

Required technical and professional expertise

Less than a year of experience in SRE/DevOps/Platform Engineering or related fields
Strong Linux fundamentals: CLI, processes, permissions, logs, troubleshooting
Proficiency in at least one scripting language (Python, Bash, or Go)
Experience with Git and GitHub workflows
Familiarity with Docker and Kubernetes basics
Understanding of CI/CD fundamentals
Basic networking knowledge
Advanced English proficiency is a must

Preferred technical and professional experience

OpenShift experience
Hands‑on exposure to Terraform and Ansible
Experience with Prometheus, Grafana, Loki, Thanos, or OpenTelemetry
Cloud platform fundamentals (IBM Cloud, AWS, Azure, or GCP)
Optional experience with JavaScript or TypeScript

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
