Job Information
XCEL ENGINEERING INC Senior HPC Linux Systems Engineer in Oak Ridge, Tennessee
COMPANY OVERVIEW
XCEL Engineering, Inc. is an award-winning small business that provides trusted information technology, engineering, consulting and project management solutions and services to federal agencies and organizations. Originally founded in 1971 by professional engineers at the University of Tennessee, XCEL was acquired in 2003 by U.S. Army and Navy veterans and in 2023 became a MartinFed company.
XCEL Engineering is a part of IT Lab Partners (ITLP), which was created to support a leading research facility in the East Tennessee region in recruiting the best and the brightest technical talent. Consider joining our impressive team today!
JOB OVERVIEW
XCEL Engineering is seeking a qualified applicant for a Senior HPC Linux Systems Engineer position at the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL). The NCCS, which hosts several of the world's most powerful computer systems, is seeking a highly qualified individual to play a key role in improving the security, performance, and reliability of its computing environments. This includes supporting Frontier, one of the fastest supercomputers in the world, along with numerous commodity clusters and specialized programs and partnerships. Frontier is one of the scientific research community's most powerful computational instruments for exploring solutions to some of today's most challenging problems.
ESSENTIAL FUNCTIONS
- Install, integrate, and administer HPC Linux clusters and high-speed networks
- Diagnose system operational problems quickly and effectively
- Coordinate with vendors to resolve hardware and software problems
- Recommend, plan, and coordinate hardware and software changes with customer participation using change management processes
- Port and write system management tools
- Document system administration procedures for routine and complex tasks
- Participate in a 24-hour, 7-day on-call support rotation and off-hours maintenance windows
- Handle system implementation and integration into the NCCS environment, as well as systems performance
- Lead system deployment, integration and troubleshooting of a large-scale computer
- Participate in discussions of relevant systems topics with the internal and external community of peers, contributing experiences and solutions
- Mentor junior-level staff as they join the
- Deliver ORNL's mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service.
BASIC QUALIFICATIONS
- Bachelor's Degree in a scientific or technical field
- 8+ years of Linux systems experience is required
- An equivalent combination of education and experience will be considered
DESIRED QUALIFICATIONS
- Experience managing Linux operating systems in a large-scale system environment
- Solid understanding of networked computing environment concepts
- Experience with Linux cluster administration
- Ability to develop and maintain programs and scripts that aid in the operation and automation of administrative tasks using various shell and scripting languages (bash, Python, Go)
- Experience with Lustre and GPFS file systems
- Experience with batch schedulers (particularly SLURM)
- Experience deploying and maintaining automated configuration management software such as Puppet
Strong interpersonal and commu