IBM Senior Software Engineer - Presto C++ Core in San Jose, California

Introduction

At IBM Software, we transform client challenges into solutions, building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You’ll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM’s product and technology landscape. Here, you’ll have the tools and opportunities to advance your career while creating software that changes the world.

Your role and responsibilities

We are looking for a Senior Software Engineer to own critical work across fault tolerance, workload management, and resource management in Presto. This role involves significant work in both Java and C++. Strong proficiency in at least one is required, and you will be expected to develop depth in the other. You will design systems that span the coordinator and worker layers, ship production-quality code in both languages, and help define how Presto manages failures, enforces resource boundaries, and scales elastically, including across heterogeneous clusters with mixed hardware capabilities.

You will partner closely with team leadership on architectural direction, contribute to design documents and RFCs, and engage with the PrestoDB open-source community. This is a hands-on building role with real influence on the technical direction of one of the most widely deployed open-source query engines in the world.

  • Fault Tolerance & Query Resilience: Design and implement transparent query retry and intermediate result spooling, enabling queries to recover gracefully from worker failures. Build failure detection and task rerouting logic that can reassign work to healthy nodes or alternative resource groups, including across heterogeneous hardware configurations.

  • Resource Management: Own the resource enforcement layer on worker nodes — memory limits, CPU quotas, spill-to-disk triggers — and the protocol for reporting utilization metrics back to the coordinator. Design how the coordinator reacts to worker-reported resource pressure (backpressure, throttling, preemption, auto-scaling).

  • Scheduling & Heterogeneous Workers: Collaborate on scheduling support for mixed worker pools, enabling the coordinator to route work based on node capabilities, hardware tags, and current load. This includes extending resource group selection logic and admission control.

  • Observability & Benchmarking: Build instrumentation across both the coordinator and worker layers to surface per-query resource utilization, scheduling latency, spill rates, and retry metrics. Develop benchmarking infrastructure to validate workload management behavior under contention and failure injection.

  • Cross-Layer Engineering: Work across the Java coordinator and C++ worker boundary — protocol changes, serialization, type fidelity — ensuring that workload management decisions made in Java are enforced correctly in the native execution layer.

Required technical and professional expertise

  • Systems Programming: 5+ years of deep expertise in Java or C++ (or both), with a strong understanding of memory management, concurrency, and performance optimization in distributed environments. Candidates are expected to develop fluency in both languages.

  • Query Engine or Data Infrastructure: Experience building or contributing to a production-grade query engine, database kernel, or large-scale data processing system (e.g., Presto, Spark, Flink, Velox, DuckDB, ClickHouse, Snowflake, or similar).

  • Performance Engineering: Experience profiling and optimizing systems across CPU, memory, and I/O in distributed environments. Comfortable with tools like perf, VTune, async-profiler, JFR, or equivalent.

  • Communication: Ability to contribute to design documents and architectural discussions, and to work effectively across a distributed engineering team.

Preferred technical and professional experience

  • Database Scheduler Knowledge: Familiarity with query stage scheduling methodologies (e.g., Phased Execution, Spark-style, Interactive).

  • Presto/Trino Familiarity: Knowledge of Presto internals including resource groups, stage scheduling, connector architecture, or plugin models.

  • Cross-Language Systems: Experience spanning JVM and native execution environments — JNI bridges, sidecar architectures, or polyglot distributed systems.

  • Lakehouse & Table Formats: Background in Iceberg, Delta Lake, or Hudi, particularly how fault tolerance interacts with snapshot isolation and consistent reads during retries.

  • Observability: Experience building instrumentation, tracing, or metrics infrastructure for distributed systems.

  • Open Source: Contribution experience in database systems or distributed infrastructure projects.

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.
