r/HPC | reddlx

Keeping POSIX IDs in sync with AD

We're close to launching a new University shared cluster with attached research storage. The VM that handles access to the research storage (and, if they wish, a user's cluster directories) is connected to our AD via winbind so that we can mount the shares via CIFS on the Windows-managed devices.

The issue is keeping the POSIX IDs that winbind generates in sync with the ones SSSD derives from standard LDAP lookups (against the same DCs) on the rest of the cluster. We've had some success so far by telling SSSD to keep IDs within a fixed range and setting `ldap_idmap_autorid_compat`, but we've found that when a user changes their password, SSSD hands them a completely different user ID until we clear SSSD's cache (or possibly wait for it to resync itself, which isn't ideal).
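For anyone comparing the two mappers, here is a minimal Python sketch of the range-based arithmetic that, as far as I understand it, both winbind's idmap_autorid and SSSD's ID mapping follow: ID = range_min + slice * range_size + RID. The SID and range values below are made-up placeholders, not our real config, and the sketch assumes the domain fits into a single slice.

```python
def sid_to_posix_id(sid: str, range_min: int, range_size: int, slice_index: int = 0) -> int:
    """Map a SID to a POSIX ID with simple range-based arithmetic.

    Simplified assumption: the domain occupies exactly one slice, so RIDs
    that do not fit into range_size are rejected instead of spilling over.
    """
    rid = int(sid.rsplit("-", 1)[1])  # the RID is the last SID component
    if rid >= range_size:
        raise ValueError(f"RID {rid} does not fit in a slice of {range_size}")
    return range_min + slice_index * range_size + rid


# Hypothetical values for illustration only.
EXAMPLE_SID = "S-1-5-21-1111111111-2222222222-3333333333-1104"
RANGE_MIN = 1_000_000
RANGE_SIZE = 1_000_000

# Same range settings and same slice on both sides: the IDs agree.
winbind_id = sid_to_posix_id(EXAMPLE_SID, RANGE_MIN, RANGE_SIZE, slice_index=0)
sssd_id = sid_to_posix_id(EXAMPLE_SID, RANGE_MIN, RANGE_SIZE, slice_index=0)

# If one side ever assigns the domain a different slice, the IDs diverge.
shifted_id = sid_to_posix_id(EXAMPLE_SID, RANGE_MIN, RANGE_SIZE, slice_index=1)
```

If that model is right, the IDs only match while both sides agree on the range minimum, the range size, and the slice assigned to the domain, so those are the values worth comparing before and after clearing the cache.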

Since the cluster is in its own containerized network with very little, if any, access to the rest of the University network, joining the rest of the system to our AD is a non-starter. We're thinking of setting up a Keycloak VM that ties into our AD so that POSIX IDs are handled entirely by Keycloak and there are no conflicts. Is it worth setting up, though?

reddit.com

u/Funny744 — 19 hours ago

▲ 20 r/HPC

New Grad Looking for Advice on Breaking into HPC and ML Systems

Hi r/hpc,

I'm a 2026 CS grad with experience in Systems, ML Systems, HPC and adjacent fields. I'm struggling to land a job in this field right out of college and would be grateful for any guidance on how to move my career forward, or for any referrals.

About my experience:

- Built Umbra, an API-level CUDA profiler that intercepts GPU kernel dispatch via LD_PRELOAD on libcuda.so/libcudart.so, requiring no source code modification. Discovered that torch.compile dispatches through cuGetExportTable, an undocumented NVIDIA internal API invisible to standard profilers.
- Built Mako, an OpenMP scheduling daemon for HPC workloads, dynamically optimizing thread-to-core affinities and CPU frequency scaling at runtime on Intel Haswell/Xeon NUMA systems. Achieved 8% speedup and 21% energy reduction on ECP benchmarks with ~2% overhead.
- Built RVNE, a RISC-V Neuromorphic Extension ISA implemented in Verilog, modeling spiking neural network operations at the RTL level.
- Research internship at TCS Research building a CUDA device simulator (stubbing ~70 CUDA runtime/driver APIs to run PyTorch/Triton workloads on CPU without modification).

Resume: https://drive.google.com/file/d/1hfBnvL5Wef6lr4ecjc7kkoKk9qADKQ__/view?usp=sharing

Any guidance, feedback, or referrals would be genuinely appreciated. I'm eligible to work in both the USA and India without visa sponsorship. Thanks for reading.

u/Outrageous_Insect532 — 6 days ago

▲ 4 r/HPC

Internship and Experience Advice

I'm a third-year CS undergraduate from India interested in HPC, GPU computing, and parallel programming.

I've spent the past year learning CUDA, OpenMP, MPI, and distributed computing, working on research projects, and taking part in HPC events. Despite this, I've had almost no success securing HPC internships or research positions, even for GPU computing, either in India or abroad.

Is this a common experience for undergraduates? What should I focus on to improve my chances of breaking into HPC?

u/Embarrassed_Maybe213 — 7 days ago

▲ 33 r/HPC+1 crossposts

lazyslurm - a terminal UI like lazygit/lazydocker for SLURM jobs / HPC, built in Rust

Hey everyone! I built a little TUI tool for monitoring SLURM jobs on HPC. I found it useful for my master's thesis and thought I might share it here. It's kind of similar to the very popular lazygit and lazydocker, which I enjoy using.

Please let me know if you have any feedback. Contributions and constructive criticism are welcome.

The GitHub is here and you can install it with `cargo install lazyslurm`

Have a great day!

u/tjhill — 8 days ago

▲ 13 r/HPC

Guidance related to HPC jobs

Can someone please help me find a new job?

While the pay is great in my current org, we mostly deal with the network stack and I'm not really enjoying it that much, so I'm looking for a switch.

About me:

HPC algorithm engineer with 4 years of work experience.

Primarily worked on accelerators like GPUs, but I'm open to exploring TPUs or other accelerators too.

I have multiple research papers in top venues across the globe. I'm currently part of the team behind one of the world's fastest supercomputers.

If someone can help, I'm open to sharing my first month's salary, and I can sign papers if needed.

u/mystrioab — 8 days ago

▲ 19 r/HPC

Startups that work with GPU and CUDA programming and/or compilers

Hi, I am a software engineer with 4 YOE. I have good knowledge of OS internals, COA, multithreading, network programming, embedded systems, C++, and Python, and have worked on both the systems side and the application side.

I want to build my career around GPU and/or compiler engineering and am currently exploring both. Beyond theory, I firmly believe you learn more by working on real projects and doing real firefighting. Are there any startups in India that work on this stack? Are there any such founders on this sub? If so, could you give me a chance? Please let me know.

Thanks

u/Odd_Departure_1159 — 9 days ago

▲ 60 r/HPC

7 Chinese companies are already shipping H100/H200-class AI chips, most IPO'd in the last 6 months. I mapped all of them

I run Chinese open models on a 4×3090 rig every day. The more I watched these models get tuned for domestic hardware, the more I wanted to know what that hardware actually is, so I mapped it. At least 7 Chinese companies are already shipping AI accelerators, and most of them IPO'd in the last 6 months.

China's own framing is "3 dragons, 4 snakes." The dragons are Big Tech that also builds full-stack GPUs. Huawei alone shipped 812K AI cards last year, 49% of China's domestic supply, with their own HBM and their own fabs. The Ascend 950 reportedly targets H200-class.

The "snakes" are the pure-plays that just IPO'd, and this is the part that surprised me: several were founded by the former chief GPU architects of NVIDIA and AMD. MetaX is basically AMD's old global GPU leadership rebuilt in Shenzhen, revenue up about 3,800x in three years. Alibaba is shipping a server with 16×96GB = 1.5TB of VRAM in one box, enough to hold a frontier model in BF16 fully on-prem.
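A quick back-of-the-envelope check of that last number, as a sketch: BF16 stores each parameter in 2 bytes, so the weights-only capacity is total bytes divided by 2. This assumes decimal gigabytes and ignores KV cache, activations, and runtime overhead, so the real headroom is smaller.

```python
BYTES_PER_BF16_PARAM = 2  # BF16 is a 16-bit (2-byte) format


def max_bf16_params(num_cards: int, gb_per_card: int) -> int:
    """Upper bound on parameter count if all memory held only BF16 weights."""
    total_bytes = num_cards * gb_per_card * 10**9  # decimal GB assumed
    return total_bytes // BYTES_PER_BF16_PARAM


total_gb = 16 * 96
params_billion = max_bf16_params(16, 96) // 10**9
print(f"{total_gb} GB total, room for about {params_billion}B BF16 parameters (weights only)")
```

So the 1.5TB figure is 1536 GB, which bounds a weights-only BF16 model at roughly 768B parameters.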

Meanwhile production moved from TSMC to SMIC, and NVIDIA's China share fell from about 95% to 55% in two years. The metal and the open models are converging.

Full breakdown with all 7 vendors and sources:

https://x.com/superalesha/article/2069415447779246440

u/awfulalexey — 13 days ago

▲ 40 r/HPC+3 crossposts

YaFF, a zero-copy wire format for Protobuf schemas, Apache 2.0, C++

Hey everyone. Our team recently open-sourced YaFF (Yet Another Flat Format), a C++ serialization library that provides a zero-copy wire format for the Protobuf ecosystem.

Why we built it:

In read-heavy, high-load paths, Protobuf parsing and deserialization can become a significant CPU cost. Developers often look at FlatBuffers for zero-copy reads, but adopting it in an existing Protobuf-heavy codebase means dealing with separate schema/API layers and extra conversion logic.

What YaFF does:

YaFF keeps .proto files as the contract and source of truth, but changes the physical representation of your data to something close to native C++ structs. You get zero-copy performance without abandoning the Protobuf ecosystem.

Some technical details:

- written in C++ and designed for server-side runtimes
- supports mmap-compatible layouts for large local indexes and faster startup
- mitigates accessor-chain overhead related to alias analysis with immutable buffers and gnu::pure annotations
- easy to integrate via CMake or Conan

The project is still early (C++ only for now, other languages are on the roadmap). We're open to issues, pull requests, and any feedback from the community.

Source code and Quick Start: https://github.com/yandex/yaff

Happy to answer any technical questions in the comments!

u/aegismuzuz — 13 days ago

▲ 12 r/HPC

Looking for Nvidia Floorplan Analyses

I am currently doing a research project that compares the performance of Nvidia HPC-class GPUs, and I have found that referencing the die-area investment of these GPUs would be useful for this analysis. The floorplan analyses I have found so far for GV100, GA100 and GH100 only include speculative summaries of die-area investment, so if anyone knows of any credible resources for this information, I would be very appreciative.

u/Inevitable-Sky-7238 — 12 days ago

▲ 4 r/HPC

25M Undergrad EE planning to study HPC with a Master's degree in Japan? Distributed LLM training

Hello, I am from Belarus. How is the landscape for HPC and distributed training in Japan? Does anybody know if it is even possible to find a job in this field later on?

u/West_Photograph_3163 — 13 days ago