u/inperbio

What's your go-to trick for speeding up PowerShell loops on large datasets?

Been processing some pretty chunky datasets lately and kept running into the usual pain points. The biggest wins I've found are swapping out repeated Where-Object scans inside loops for hashtable lookups, and being strict about filtering and selecting only what I need before anything enters the loop at all. Also switched from building arrays with += to using System.Collections.Generic.List[T] from the start, which made a noticeable difference. It also beats ArrayList and is generally the better call now if you want typed, efficient accumulation.

Still occasionally go back and forth on foreach vs ForEach-Object, but honestly less so these days. For stuff already in memory, foreach wins pretty consistently because ForEach-Object carries pipeline overhead that adds up fast on large collections. I keep ForEach-Object for when I'm actually streaming from a file or pipeline and don't want everything loaded into memory at once.

Haven't gone deep on ForEach-Object -Parallel yet. Worth noting it's PowerShell 7+ only, so it's not an option if you're still on 5.1. The overhead hasn't been worth it for most of what I do, but curious if others have found a sweet spot for when it actually pays off. My gut says it only really shines when tasks are genuinely independent and the per-item work is heavy enough to justify the runspace spin-up cost.

One thing I've started doing more is running Measure-Command before assuming the loop itself is the bottleneck. Turns out a lot of the time the real culprit is I/O, remoting latency, or logging, not the iteration logic at all.

What's the change that made the biggest difference for you?
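
For anyone who wants to see the hashtable and List[T] swap side by side, here's a rough sketch rather than a proper benchmark. The data and every name in it ($users, $orders, Join-Slow, Join-Fast) are made up for illustration, and the timings will vary a lot by machine and PowerShell version, so treat the numbers as a sanity check only.

```powershell
# Hypothetical sample data: 500 users and 500 orders, invented for this example.
$users = foreach ($i in 1..500) {
    [pscustomobject]@{ Id = $i; Name = "user$i" }
}
$orders = foreach ($i in 1..500) {
    [pscustomobject]@{ OrderId = $i; UserId = (($i * 7) % 500) + 1 }
}

# Slow pattern: Where-Object rescans every user for each order,
# and += allocates a new array on every iteration.
function Join-Slow {
    param($Orders, $Users)
    $result = @()
    foreach ($o in $Orders) {
        $u = $Users | Where-Object { $_.Id -eq $o.UserId }
        $result += "$($o.OrderId):$($u.Name)"
    }
    return $result
}

# Faster pattern: build a hashtable index once, then do keyed lookups
# and accumulate into a typed generic list.
function Join-Fast {
    param($Orders, $Users)
    $byId = @{}
    foreach ($u in $Users) { $byId[$u.Id] = $u }

    $result = [System.Collections.Generic.List[string]]::new()
    foreach ($o in $Orders) {
        $result.Add("$($o.OrderId):$($byId[$o.UserId].Name)")
    }
    # PowerShell enumerates the list on output, so the caller receives an array.
    return $result
}

# Both versions should produce the same rows.
$slowResult = Join-Slow -Orders $orders -Users $users
$fastResult = Join-Fast -Orders $orders -Users $users

# Measure before assuming which part is the bottleneck.
$slowTime = Measure-Command { Join-Slow -Orders $orders -Users $users }
$fastTime = Measure-Command { Join-Fast -Orders $orders -Users $users }

'Where-Object and +=  : {0:N1} ms' -f $slowTime.TotalMilliseconds
'Hashtable and List[T]: {0:N1} ms' -f $fastTime.TotalMilliseconds
```

The same Measure-Command wrapper works around a file read, a remoting call, or a logging step, which is a quick way to check whether the loop is actually the slow part.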

reddit.com
u/inperbio — 3 days ago