
Job Information

The Hartford Staff Engineer, Reliability in Hyderabad, India

IND - Staff Engineer, Reliability - GCC070

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.

The Cloud Services Team is searching for a Reliability Engineer. The candidate must have hands-on experience operating and engineering services on Google Cloud Platform (GCP), including data, compute, and observability services. The team is accountable for the operations, engineering, and governance of 200+ cloud technologies across a multi-cloud environment. The role involves helping mature operational practices for GCP workloads as part of our multi-cloud strategy. This is an excellent opportunity for someone interested in a mix of strategy and hands-on work. The ideal candidate should feel comfortable working with teammates at all levels of the organization, including leadership.

Key Responsibilities

  • Assist in the development, maintenance, and operations of IT services spanning 200+ infrastructure services in our Cloud transformation landscape.

  • Develop solutions and drive adoption of enterprise solutions such as Cyber Protection, Disaster Recovery, and Security enhancements across line-of-business teams.

  • Drive improvement, through automation, of software delivered as a service, from an efficiency and simplicity perspective.

  • Provide clear operational documents and construction/support specifications to the IT user base.

  • Provide insight into operational metrics across the entire Cloud environment.

  • Consult with customers on new requirements, design questions, and functionality configurations for environments on and off premises.

  • Deliver the tooling and capabilities needed to enable the cloud compliance, metrics and reporting, and cost management roadmap and strategy.

  • Participate in incident resolution and change implementation as necessary. This may occasionally include support during non-standard hours.

  • Operate and improve reliability for production workloads running on Google Cloud Platform (GCP), focusing on availability, scalability, and operational readiness rather than application development.

  • Own day‑to‑day operational concerns for core GCP services including Compute Engine, GKE, Cloud Run, BigQuery, Cloud Storage, and supporting platform services.

  • Provide operational support for BigQuery platforms including job performance troubleshooting, capacity planning, quota management, dataset permissions, and cost optimization (slot usage, reservations, and quotas).

  • Support Vertex AI platforms from an operations and reliability standpoint, including environment readiness, access controls, monitoring, pipeline execution health, and incident response (not model development).

  • Build and maintain observability standards using Cloud Monitoring, Cloud Logging, Error Reporting, and custom SLI/SLO dashboards for GCP workloads.

  • Implement alerting strategies aligned to error budgets and production reliability goals; reduce alert noise and prevent toil.

  • Execute incident response, triage, and post‑incident analysis for GCP services, contributing to PIRs and corrective actions.

  • Develop and maintain runbooks, operational playbooks, and escalation workflows for GCP services.

  • Drive automation-first operations, including self‑healing patterns using Cloud Functions, Cloud Run jobs, Scheduler, and event‑driven remediation.

  • Enforce and operate GCP security and governance controls, including IAM, service accounts, Org Policies, VPC Service Controls, KMS, Secret Manager, and networking guardrails.

  • Partner with engineering and data teams to review designs for operability, resiliency, and supportability, ensuring workloads meet production readiness standards before launch.

Required Skills & Experience

  • Expert understanding of how applications should be engineered following best practices for fault tolerance, separation of duties, observability, and operator friendliness.

  • Self-motivated and results-oriented, with the ability to work both in a team environment and independently.

  • Strong hands-on experience with BigQuery, including performance tuning, cost management, and governance.

  • Experience with Vertex AI, including pipelines, model deployment, model monitoring, and integration with BigQuery.

  • Deep knowledge of Cloud IAM, service accounts, Workload Identity Federation, and principle-of-least-privilege controls.

  • Experience with GKE operations (clusters, node pools, autoscaling, workload identity, Istio/Anthos optional).

  • Understanding of Cloud Storage, Pub/Sub, Dataflow, Dataproc, and Cloud Composer for data/ML workflows.

  • Experience building CI/CD pipelines targeting GCP using Cloud Build, Artifact Registry, and Terraform.

  • Ability to troubleshoot GCP networking: VPCs, firewall rules, private service access, interconnects/VPN.

Nice to Have

  • Intermediate knowledge of Terraform and CloudFormation.

  • Intermediate Microsoft Office skills.

  • Hands-on experience with advanced GCP services such as Vertex AI, BigQuery, Dataflow, Pub/Sub, Cloud Run, and GKE.

  • Experience creating org-level policies, security baselines, and automation patterns for GCP environments.

What We Offer

  • Collaborative work environment with global teams.

  • Competitive compensation and comprehensive benefits.

  • Continuous learning and growth opportunities in geospatial and risk analytics technologies.
