Channel: Powershell – Tome's Land of IT

ForEach-Parallel

I just came back from the PowerShell Deep Dive at TEC 2012.  A great experience, by the way.  I highly recommend it to everyone.  Extremely smart and passionate people who could talk about PowerShell for days along with direct access to the PowerShell product team!

During this summit, workflows were a topic of conversation.  If you have looked at workflows, there is one feature that generally catches the eye (I know it caught mine the first time I saw it): ForEach-Parallel, written as foreach -parallel inside a workflow.  Unfortunately, when you dig into what it's doing, you learn that it is not a solution for multithreading in PowerShell.  Nope, it's extremely slow.  If you're like me, parallel processing is key to getting some enterprise-class scripts to run faster.  You may have played with jobs before, but each job carries enough startup overhead to slow things down.  Running scripts side by side works, but it requires you to engineer the scripts so that they can be called that way.  So what is the best way to run something like a loop of data across four threads?  The answer is runspaces and runspace pooling.

function ForEach-Parallel {
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [System.Management.Automation.ScriptBlock]$ScriptBlock,
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        [PSObject]$InputObject,
        [Parameter(Mandatory=$false)]
        [int]$MaxThreads = 5
    )
    BEGIN {
        # Create a pool of 1 to $MaxThreads runspaces; extra work queues until a runspace is free.
        $iss = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
        $pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads, $iss, $Host)
        $pool.Open()
        $threads = @()
        # Prepend param($_) so the argument passed to each invocation shows up as $_ in the script.
        $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $ScriptBlock.ToString())
    }
    PROCESS {
        # Queue one asynchronous invocation per pipeline item.
        $powershell = [PowerShell]::Create().AddScript($ScriptBlock).AddArgument($InputObject)
        $powershell.RunspacePool = $pool
        $threads += @{
            instance = $powershell
            handle = $powershell.BeginInvoke()
        }
    }
    END {
        # Poll until every invocation has completed, emitting results as they finish.
        $notdone = $true
        while ($notdone) {
            $notdone = $false
            for ($i = 0; $i -lt $threads.Count; $i++) {
                $thread = $threads[$i]
                if ($thread) {
                    if ($thread.handle.IsCompleted) {
                        $thread.instance.EndInvoke($thread.handle)
                        $thread.instance.Dispose()
                        $threads[$i] = $null
                    }
                    else {
                        $notdone = $true
                    }
                }
            }
            # Pause briefly between passes so the polling loop does not spin a CPU core.
            if ($notdone) { Start-Sleep -Milliseconds 50 }
        }
        # Release the runspaces once all work is done.
        $pool.Close()
        $pool.Dispose()
    }
}
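
The one line in that function that tends to raise eyebrows is the param($_) wrapper, so here is a minimal sketch of that trick on its own.  The $userBlock script block and the value 42 are made up purely for illustration; the point is only that prepending param($_) lets the first argument arrive as $_ inside the script.

```powershell
# Hypothetical script block that refers to $_ the way a pipeline block would.
$userBlock = { "item: $_" }

# Prepend a param($_) line so the first positional argument is bound to $_.
$wrapped = [scriptblock]::Create("param(`$_)`r`n" + $userBlock.ToString())

# Run it in a fresh PowerShell instance, passing 42 as that argument.
$ps = [powershell]::Create()
$result = $ps.AddScript($wrapped.ToString()).AddArgument(42).Invoke()
$ps.Dispose()

# Show what the wrapped script produced.
$result
```

Without the wrapper, $_ would be empty inside the runspace, because nothing is being piped into the script there.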

With that function, you can do things like this:

(0..50) | ForEach-Parallel -MaxThreads 4 {
    $_
    Start-Sleep -Seconds 3
}

You’ll notice that the above runs four items at a time.  Because every item sleeps for the same three seconds, the results tend to come back in input order, so it can look like the data is being processed serially, but it’s really running in parallel.  A better example is something like this, which simulates some items taking longer than others:

(0..50) | ForEach-Parallel -MaxThreads 4 {
    $_
    # Sleeps 0 to 4 seconds; -Maximum is exclusive for integers.
    Start-Sleep -Seconds (Get-Random -Minimum 0 -Maximum 5)
}

Mind you, parallel processing doesn’t always make things faster.  For example, if the combined CPU consumption of your threads is more than your box can handle, you may be adding latency because the threads have to compete for CPU time.  Another example: if the work you are performing in your loop is not long-running, the overhead of starting up multiple runspaces could make your script slower.  Just use your head and play with it.  In the right place at the right time, this is an absolute lifesaver.
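
If you want to see that trade-off on your own box, here is a rough sketch that times a serial loop against a runspace pool for two workloads.  Measure-Pooled, $slow, and $tiny are hypothetical names invented for this illustration, and the numbers will vary by machine, so treat it as a starting point for experimenting rather than a benchmark.

```powershell
# Hypothetical helper: run $Work once per item on a runspace pool, return elapsed seconds.
function Measure-Pooled {
    param([scriptblock]$Work, [int[]]$Items, [int]$MaxThreads = 4)
    $watch = [System.Diagnostics.Stopwatch]::StartNew()
    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads)
    $pool.Open()
    # Queue one asynchronous invocation per item.
    $jobs = foreach ($item in $Items) {
        $ps = [powershell]::Create().AddScript($Work.ToString()).AddArgument($item)
        $ps.RunspacePool = $pool
        @{ Instance = $ps; Handle = $ps.BeginInvoke() }
    }
    # EndInvoke blocks until that item has finished; results are discarded here.
    foreach ($job in $jobs) {
        $null = $job.Instance.EndInvoke($job.Handle)
        $job.Instance.Dispose()
    }
    $pool.Close()
    $pool.Dispose()
    $watch.Elapsed.TotalSeconds
}

$items = 1..8
$slow = { param($n) Start-Sleep -Milliseconds 500 }   # stands in for long-running work
$tiny = { param($n) $n * 2 }                          # stands in for trivial work

$serialSlow = (Measure-Command { foreach ($i in $items) { & $slow $i } }).TotalSeconds
$pooledSlow = Measure-Pooled -Work $slow -Items $items
$serialTiny = (Measure-Command { foreach ($i in $items) { & $tiny $i } }).TotalSeconds
$pooledTiny = Measure-Pooled -Work $tiny -Items $items

"Slow work: serial {0:N2}s, pooled {1:N2}s" -f $serialSlow, $pooledSlow
"Tiny work: serial {0:N2}s, pooled {1:N2}s" -f $serialTiny, $pooledTiny
```

The slow workload is where the pool should pull ahead, since four sleeps overlap at a time; compare the two "tiny" timings to see what the pool setup costs when there is almost nothing to do.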

Note: I learned this technique from Dr. Tobias Weltner, but for some reason I can’t find the link to the video where he discussed it.


