IBM AI Performance Analyst in Bangalore, India

Introduction

At IBM Infrastructure & Technology, we design and operate the systems that keep the world running. From high-resiliency mainframes and hybrid cloud platforms to networking, automation, and site reliability, our teams ensure the performance, security, and scalability that clients and industries depend on every day. Working in Infrastructure & Technology means tackling complex challenges with curiosity and collaboration. You’ll work with diverse technologies and colleagues worldwide to deliver resilient, future-ready solutions that power innovation. With continuous learning, career growth, and a supportive culture, IBM provides the opportunities to build expertise and shape the infrastructure that drives progress.

The IBM Z Software Performance team is responsible for designing, executing, and analyzing stress workloads and benchmarks on IBM Z and LinuxONE systems to ensure these platforms meet stringent customer expectations for reliability, scalability, and performance.

The team develops and maintains a suite of tools and automation scripts to support:

  • Performance environment setup and configuration

  • Performance measurements and data capture

  • Data storage in centralized repositories

  • Presentation and visualization of performance metrics

  • Analysis and comparison of captured data across multiple scenarios

Your role and responsibilities

As an AI Performance Analyst, you will:

  • Design and implement benchmarks and stress workloads for AIU IO Cards, ensuring they remain current and relevant.

  • Set up benchmarks and stress workloads, including configuring the underlying AIU IO Card for various performance scenarios.

  • Automate performance measurements and streamline data collection processes for benchmarks and stress workloads.

  • Develop and enhance data collection and analysis tools to improve efficiency and accuracy.

  • Execute performance benchmarks and stress workloads to validate system performance.

  • Analyze performance measurements and collected data to identify bottlenecks and resolve performance issues.

  • Collaborate with development teams across the stack (IBM Z Hardware, IBM Research, IBM AIU application stack, Middleware/Applications) to guide and support configuration-related performance optimization efforts.

Required technical and professional expertise

  • Overall Experience: 5-10 years in performance measurement, analysis, and system testing.

  • Education: Bachelor’s degree in Computer Science or Information Science.

Technical Skills

AI/ML Knowledge:

  • Basic understanding of ML/AI model architecture, training, and inference.

  • 3+ years of experience with PyTorch, TensorFlow, and vLLM.

Development & Automation:

  • Proficiency in source code version control systems (e.g., Git).

  • Strong scripting and test automation skills.

System & Containerization:

  • Basic Linux administration skills.

  • Hands-on experience with Docker and Podman containers.

  • Solid understanding of operating system fundamentals and computer architecture concepts.

  • Basic experience in performance analysis of applications and systems, and familiarity with performance tools.

  • Hands-on experience in functional and performance testing of multi-tiered applications.

Additional Skills:

  • Exposure to Agile methodologies and the ability to apply Agile concepts effectively.

  • Strong presentation skills and the ability to communicate technical concepts clearly.

  • Collaborative team player with excellent interpersonal and communication skills.

Programming and Scripting Languages:

Python, C/C++, Bash, Ansible, Java

Preferred technical and professional experience

  • Master’s degree in Information Technology, Computer Science, or Computer Engineering.

AI/ML & Model Serving:

  • Know-how in Transformer model design and modification (architecture tuning, fine-tuning, optimization).

  • Hands-on experience with TensorFlow and with model inference serving using TensorFlow Serving, NVIDIA Triton Inference Server, and vLLM.

Systems Performance & Observability:

  • Proficiency in performance profiling and tracing (e.g., Linux perf, flame graphs, instrumentation).

  • Advanced Linux administration skills (networking, storage, process management, kernel parameters, automation).

Hardware & Accelerators:

  • Experience in hardware design and debugging (board bring-up, driver interactions, performance counters).

  • Working knowledge of AI accelerator architectures such as GPU, TPU, and AMX (capabilities, memory hierarchies, scheduling/tiling considerations).

Programming Languages:

  • Proficiency in CUDA (kernel development, memory optimization, streams/concurrency) and Java (services, tooling, SDK integration).

IBM is committed to creating a diverse environment and is proud to be an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, caste, genetics, pregnancy, disability, neurodivergence, age, veteran status, or other characteristics. IBM is also committed to compliance with all fair employment practices regarding citizenship and immigration status.