Skip to content
Menu
The Lonely Administrator
  • PowerShell Tips & Tricks
  • Books & Training
  • Essential PowerShell Learning Resources
  • Privacy Policy
  • About Me
The Lonely Administrator

Scraping Sysinternals

Posted on June 16, 2014June 21, 2015

download-thumbRecently I was conversing with someone about my PowerShell code that downloads tools from the live Sysinternals site. If you search the Internet, you'll find plenty of ways to achieve the same goal. But we were running into a problem where PowerShell was failing to get information from the site. From my testing and research I'm guessing there was a timing issue when the site is too busy. So I started playing around with some alternatives.

Manage and Report Active Directory, Exchange and Microsoft 365 with
ManageEngine ADManager Plus - Download Free Trial

Exclusive offer on ADManager Plus for US and UK regions. Claim now!

I knew that I could easily get the html content from http://live.sysinternals.com through a few different commands. I chose to use Invoke-RestMethod for the sake of simplicity.

[string]$html = Invoke-RestMethod "http://live.sysinternals.com"

Because $html is one long string with predictable patterns, I realized I could use my script to convert text to objects using named regular expression patterns. In my test script, I dot source this script.

#need this function
. C:\scripts\convertfrom-text.ps1

Now for the fun part. I had to build a regular expression pattern. Eventually I arrived at this:

$pattern = "(?<date>\w+,\s\w+\s\d{1,2},\s\d{4}\s+\d{1,2}:\d{2}\s+\w{2})\s+(?<size>\d+)\s+<A HREF=""/(?<name>\w+(\.\w+){1,2})"

If you have used the Sysinternals site, you'll know there is also a Tools "subfolder" that appears to be essentially the same as the top level site. My pattern is for the top level site. Armed with this pattern, it wasn't difficult to create an array of objects for each tool.

$data = $html | ConvertFrom-Text $pattern | 
Select @{Name="LastModified";Expression={$_.Date -as [DateTime]}},
@{Name="Size";Expression={$_.Size -as [int]}},Name,
@{Name="Path";Expression={ "http://live.sysinternals.com/$($_.name)"}}

sysinternals

Next, I check my local directory and get the most recent file.

#get most recently modified local file
$localfiles = dir G:\Sysinternals
$lastLocal = ( $localfiles | sort LastWriteTime | select -last 1).LastWriteTime

Then I can test for files that don't exist locally or are newer on the site.

#find files that have a newer time stamp or files that exist remotely but not locally
$needed = $data | where { ($_.LastModified -gt $lastlocal) -OR ( $localfiles.name -notcontains $_.name ) }

If there are needed files, then I create a System.Net.WebClient object and download the files.

if ($needed) {
    Write-Host "Getting $($needed.count) updated files" -ForegroundColor Cyan
    $needed | foreach -begin { $wc = New-Object System.Net.WebClient 
    } -process {
     $target = Join-Path -Path "G:\Sysinternals" -ChildPath $_.name
     $source = $_.Path
     Write-host "Updating $target from $source" -ForegroundColor Green
     $wc.DownloadFile($source,$target)
    }
}
else {
    Write-Host "No updated files detected" -ForegroundColor Cyan
}

The end result is that I can update my local Sysinternals folder very quickly and not worry about timing problems using the Webclient service.


Behind the PowerShell Pipeline

Share this:

  • Click to share on X (Opens in new window) X
  • Click to share on Facebook (Opens in new window) Facebook
  • Click to share on Mastodon (Opens in new window) Mastodon
  • Click to share on LinkedIn (Opens in new window) LinkedIn
  • Click to share on Pocket (Opens in new window) Pocket
  • Click to share on Reddit (Opens in new window) Reddit
  • Click to print (Opens in new window) Print
  • Click to email a link to a friend (Opens in new window) Email

Like this:

Like Loading...

Related

reports

Powered by Buttondown.

Join me on Mastodon

The PowerShell Practice Primer
Learn PowerShell in a Month of Lunches Fourth edition


Get More PowerShell Books

Other Online Content

github



PluralSightAuthor

Active Directory ADSI Automation Backup Books CIM CLI conferences console Friday Fun FridayFun Function functions Get-WMIObject GitHub hashtable HTML Hyper-V Iron Scripter ISE Measure-Object module modules MrRoboto new-object objects Out-Gridview Pipeline PowerShell PowerShell ISE Profile prompt Registry Regular Expressions remoting SAPIEN ScriptBlock Scripting Techmentor Training VBScript WMI WPF Write-Host xml

©2025 The Lonely Administrator | Powered by SuperbThemes!
%d