Job Information
IBM Senior Software Engineer - Velox Operators for GPU in San Jose, California
Introduction
At IBM Software, we transform client challenges into solutions, building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You’ll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM’s product and technology landscape. Here, you’ll have the tools and opportunities to advance your career while creating software that changes the world.
Your role and responsibilities
We are looking for an expert C++ Engineer to join the core team responsible for the next-generation Presto engine (Prestissimo/Velox). In this role, you will work closely with Presto C++ Tech Leads to bridge the gap between Velox’s vectorized execution and GPU acceleration. You will be responsible for implementing and optimizing critical database operators and functions to run efficiently on GPU hardware.
Design and implement vectorized operators (Joins, Aggregations, Filter, Project) in Velox C++ that can seamlessly offload computation to GPUs.
Optimize memory bandwidth usage and data transfer protocols between Host (CPU) and Device (GPU) to minimize latency for interactive queries.
Ensure all GPU-accelerated functions maintain strict compatibility with the upstream Velox library and Presto’s function signatures.
Work with the architectural team to define the standard for "heterogeneous execution" (mixing CPU and GPU processing within a single query plan).
Collaborate with the open-source community to upstream Velox improvements.
Debug complex performance bottlenecks in a distributed query engine environment.
Required technical and professional expertise
5+ years of experience in systems programming using modern C++.
Solid understanding of database operators and vectorized execution models.
Knowledge of GPU programming (CUDA, RAPIDS, etc.).
Deep understanding of columnar data formats (Arrow, Parquet) and SIMD/Vectorized processing.
Familiarity with performance optimization.
Experience writing low-latency, high-throughput systems code.
Ability to debug complex crashes or race conditions in a multi-threaded C++ environment.
Ability to contribute to design documents and architectural discussions, and to work effectively across a distributed engineering team.
Preferred technical and professional experience
Experience with Velox / Presto / Trino.
Experience with distributed systems.
Experience with CUDA or ROCm programming.
Contributor to open-source database engines (Velox, ClickHouse, DuckDB, Apache Arrow).
IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.