Job Information

Meta Hardware Systems Engineer, NPI in Menlo Park, California

Summary:

Meta is seeking a highly skilled and experienced Systems/Hardware Engineer to join our Release to Production (RTP) team. The RTP team is responsible for the end-to-end hardware lifecycle of all Meta servers, including prototyping, pre-production hands-on system validation, hardware debugging, and stress testing. As a Systems/Hardware Engineer, you will work closely with HW/SW co-design teams, hardware designers, networking teams, system manufacturers, component vendors, capacity engineering, production engineering, production services, and data center operations teams to enable new systems that will be deployed in our production data centers.

Hardware Systems Engineer, NPI Responsibilities:

  1. Interface with external vendors and internal teams to understand system architecture and develop Hardware Fault Management for various server products

  2. Drive new platform enablement, hardware validation, tooling specification and integration, customer workload testing, and experiment creation to detect and diagnose hardware/firmware/software health issues

  3. Proactively create experiments and tooling to detect and diagnose hardware/firmware/software health issues

  4. Leverage understanding of RAS (reliability, availability, serviceability) to improve error reporting and error handling mechanisms for better operation quality and cost/efficiency

  5. Develop visibility through data visualization and implement systemic solutions to hardware health issues

  6. Troubleshoot, diagnose, and root cause system failures, isolating components/failure scenarios while working with internal & external stakeholders

  7. Lead bring-up, validation, and deployment of cutting-edge hardware systems in lab and datacenter environments

  8. Design and implement robust system-level test plans, including functional, stress, and performance tests

  9. Enhance hardware reliability by creating data visualizations and implementing systemic solutions to address recurring health issues

Minimum Qualifications:

  1. Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

  2. 6+ years of work experience in one or more domains such as: ASIC development, compute, AI-ML hardware/software, storage, memory, network, server interconnect technologies, or similar

  3. Knowledge of architecture and components on one of the following products: server/PC/Laptop

  4. Development or debug experience in one or more areas: hardware fault management, error reporting, error handling on hardware products

  5. Experience with Python, C/C++ and/or similar languages, within a Linux environment, for server system management, automation, version control, CI/CD, or similar

  6. Demonstrated problem-solving skills, with a track record of troubleshooting and resolving complex technical issues

  7. Demonstrated communication and collaboration skills, with a track record of working effectively with cross-functional teams

  8. Experience working in a matrix organization

Preferred Qualifications:

  1. 7+ years of experience with a subset of one of the following domains: Compute Systems, Storage Systems, Accelerated Compute Systems/HPC, Kernel/Firmware Development and/or test, Post Silicon Bringup

  2. Experience with x86 or ARM-based CPUs and their subsystems (e.g. memory, inter-chiplet communications, RAS/DFT, performance management, power management)

  3. Working/functional knowledge of common bus protocols such as I2C, SPI, USB, LP/DDR, and/or PCIe

  4. Hands-on experience troubleshooting problems at the system level, spanning multiple components as well as hardware/firmware/software boundaries. Hands-on experience managing/debugging Linux servers

  5. Understanding of the hardware development process and how it pertains to test strategy. Experience authoring test plans for complex chipsets for functional, stress and performance testing

  6. Familiarity with debugging tools for systems-on-chip (SoCs), e.g., JTAG, GDB, DSTREAM, Trace32

  7. Experience integrating lab tools into automated workflows for large-scale deployments. Proficiency in continuous integration/continuous delivery tools

  8. 2+ years of experience scripting automation in Python or equivalent

Public Compensation:

$144,000/year to $204,000/year + bonus + equity + benefits

Industry: Internet

Equal Opportunity:

Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.

Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@meta.com.
