I just love using the web cmdlets that were introduced in PowerShell 3.0. One of the cmdlets I use the most is Invoke-RestMethod. This isn't because I'm dealing with sites that offer RESTful services, but rather because I like that the cmdlet does all the heavy lifting for me when I point it to an RSS feed. I thought I'd share some techniques for working with the results, incorporating regular expressions and replacements.
First, I need something to work with.
$feed = Invoke-RestMethod -Uri http://powershell.org/wp/feed/
Here's an example of what I get back:
title       : PowerShell Tip from the Head Coach of the 2014 Winter Scripting Games: Design for Performance and Efficiency!
link        : http://powershell.org/wp/2014/01/23/powershell-tip-from-the-head-coach-of-the-2014-winter-scripting-games-design-for-performance-and-efficiency/
comments    : {http://powershell.org/wp/2014/01/23/powershell-tip-from-the-head-coach-of-the-2014-winter-scripting-games-design-for-performance-and-efficiency/#comments, 0}
pubDate     : Thu, 23 Jan 2014 14:28:33 +0000
creator     : creator
category    : {category, category, category, category...}
guid        : guid
description : description
encoded     : encoded
commentRss  : http://powershell.org/wp/2014/01/23/powershell-tip-from-the-head-coach-of-the-2014-winter-scripting-games-design-for-performance-and-efficiency/feed/
The cmdlet, Invoke-RestMethod, has gone ahead and created XML elements. I didn't have to do anything. Next, I'd like to re-format the results and make it pretty. For example, the pubDate, while readable, won't sort as a true [datetime] because it is being represented as a string. And some properties, like Description, are buried a bit further.
PS C:\> $feed[1].description

#cdata-section
--------------
There are several concepts that come to mind when discussing the topic of designing your PowerShell commands for performance a...
I can start with something like this:
$feed | Select Title,Link,
@{Name="Description";Expression={$_.description.InnerText}},
@{Name="Published";Expression={$_.PubDate -as [datetime]}}
I've grabbed the description text and treated the pubDate as a [datetime] object.
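To see why the cast matters, here's a minimal sketch using two made-up pubDate strings (the sample values are my own, not taken from the feed). Sorted as plain strings they come out in alphabetical order by day name; cast with -as [datetime] first, they sort chronologically.

```powershell
# Two made-up RSS-style dates (hypothetical sample data, not from the feed)
$dates = 'Thu, 23 Jan 2014 14:28:33 +0000', 'Fri, 24 Jan 2014 09:00:00 +0000'

# Sorted as strings: "Fri..." lands ahead of "Thu..." purely by alphabet
$asStrings = $dates | Sort-Object

# Cast each string to [datetime] first, then sort: chronological order
$asDates = $dates | ForEach-Object { $_ -as [datetime] } | Sort-Object

$asStrings
$asDates
```

The string sort puts Friday the 24th first, while the [datetime] sort puts Thursday the 23rd first, which is what I'd actually want from a feed.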
One of the first things I see looking at this in more detail is that the description is full of HTML code.
title       : Scripting Games Winter 2014 – Team Discussion Tips
link        : http://powershell.org/wp/2014/01/06/scripting-games-winter-2014-team-discussion-tips/
Description : When you’re logged into the Games, you’ll notice that clicking on your team pulls up a “team discussion” box. That’s a shared discussion area for you and your team. However, if you click on one of the files you’ve uploaded, you’ll see the discussion turn into a “File Discussion.” We retain a separate thread for... Continue Reading »
Published   : 1/6/2014 10:42:52 AM
To make this easier to read I want to strip out the HTML tags and convert character codes like "&#8211;" into something I can understand. Here's where regular expressions come into play.
With a little online research I came up with a regular expression pattern to find HTML tags: <(.|\n)+?> which I can use like this:
PS C:\> [regex]$rgx = "<(.|\n)+?>"
PS C:\> $rgx.replace($feed[1].description.innertext,"")
There are several concepts that come to mind when discussing the topic of designing your PowerShell commands for performance and efficiency, but in my opinion one of the items at the top of the list is “Filtering Left” which is what I’ll be covering in this blog article. First, let’s start out by taking a... Continue Reading »
Related posts:
Winter Scripting Games 2014 Tip #1: Avoid the aliases
Winter Scripting Games 2014 Tip #2: Use #Requires to let PowerShell do the work for you
Winter Scripting Games 2014
The Replace() method took all matches and replaced them with "", effectively removing them from the text. Because I have a number of items to potentially replace, I defined a hashtable.
$decode=@{
'<(.|\n)+?>'= ""
'&#8217;' = "'"
'&#8220;' = '"'
'&#8221;' = '"'
'&#187;' = "..."
'&#8211;' = "--"
'&#160;' = " "
}
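Before wiring a table like this into Select-Object, it can help to try the idea on a single string. This is a minimal sketch, assuming a made-up sample string and a trimmed-down table I'm calling $sample (both are my own illustrations, not taken from the feed). Each key is treated as a regular expression pattern and each value as its replacement.

```powershell
# Made-up sample text with tags and numeric HTML entities (hypothetical)
$text = '<p>It&#8217;s a &#8220;team discussion&#8221; box &#8211; <a href="#">Continue Reading &#187;</a></p>'

# Trimmed-down lookup table: regex pattern = replacement text
$sample = @{
    '<(.|\n)+?>' = ''
    '&#8217;'    = "'"
    '&#8220;'    = '"'
    '&#8221;'    = '"'
    '&#8211;'    = '--'
    '&#187;'     = '...'
}

# Apply every pattern in turn; each pass returns a new string
foreach ($key in $sample.Keys) {
    $text = [regex]::Replace($text, $key, $sample[$key])
}

$text
# → It's a "team discussion" box -- Continue Reading ...
```

Hashtable keys come back in no particular order, which is fine here because none of the patterns overlap. If I only wanted the entities decoded rather than mapped to plain ASCII, [System.Net.WebUtility]::HtmlDecode() would be another option, although it returns the typographic characters themselves.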
Then in my Select-Object statement I can reformat the description text.
$feed | Select Title,
@{Name="Description";Expression={
$text = $_.Description.InnerText
#strip out html codes
foreach ($key in $decode.keys) {
[regex]$rgx=$key
$text = $rgx.Replace($text,$decode.Item($key)).Trim()
}
#use the cleaned up text
$text
}}
PowerShell goes through the hashtable keys, treats each one as a regular expression pattern, and replaces every match in the text with the corresponding hashtable value. The end result is that $text is now clean. Here is my final code:
$feed = Invoke-RestMethod -Uri http://powershell.org/wp/feed/

#hash table of HTML codes
#http://www.ascii.cl/htmlcodes.htm
$decode=@{
'<(.|\n)+?>'= ""
'&#8217;' = "'"
'&#8220;' = '"'
'&#8221;' = '"'
'&#187;' = "..."
'&#8211;' = "--"
'&#160;' = " "
}

<#
Redefining some properties so that they are in
proper case and look pretty
#>
$feed | Select @{Name="Title";Expression={$_.title}},
@{Name="Description";Expression={
$text = $_.Description.InnerText
#strip out html codes
foreach ($key in $decode.keys) {
[regex]$rgx=$key
$text = $rgx.Replace($text,$decode.Item($key)).Trim()
}
#use the cleaned up text
$text
}},
@{Name="Published";Expression={$_.PubDate -as [datetime]}},
@{Name="Link";Expression={$_.Link}},
@{Name="Category";Expression={$_.Category.innertext -join ","}}
The other addition I made was to get the text from the Category XML element and join the array of strings into a single line separated by commas. Here's the final result:
But wait, there's more! I have taken the output from Invoke-RestMethod and written new objects to the pipeline. One thing I could do is pipe the result to Out-GridView and use it as an object-picker.
c:\scripts\get-powershellorg.ps1 | out-gridview -Title "Select one or more stories" -PassThru | foreach { start $_.link }
I can send the results to Out-GridView. From there I can select one or more items, click OK and have each selected item open in my web browser. Or how about this:
c:\scripts\get-powershellorg.ps1 |
Select Title,Description,Published,
@{Name="Link";Expression={ "<a href=$($_.link)>$($_.link)</a>" }},
Category |
ConvertTo-HTML -Title "PowerShell.org" |
Out-String |
foreach { $_.Replace("&lt;","<").Replace("&gt;",">") } |
Out-File c:\work\psorg.htm -Encoding ascii
Here I have a couple of replacements going on. After running the script, I re-select the properties and customize the Link property to turn it into an HTML anchor link. I'm doing this so that when I convert to HTML I will get a table with a clickable link. Well, almost. You see, when ConvertTo-HTML gets the text for the Link property it encodes the < and > characters as the HTML entities &lt; and &gt;. That means I need to turn those back into < and > before saving the results to a file. Notice how I chain two method calls in the ForEach script block.
foreach { $_.Replace("&lt;","<").Replace("&gt;",">") }
The first Replace() method returns a new string with every &lt; turned back into <, and the second Replace() is called on that result to do the same for &gt;. The net result is two replacements with one command.
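Here's the same chained replacement as a minimal sketch on a single string, so you can see it without generating a whole report. The table cell below is a made-up fragment of the kind of encoded output I'm describing, not actual output from my script.

```powershell
# Made-up fragment of an HTML table cell with an encoded anchor (hypothetical)
$cell = '<td>&lt;a href=http://powershell.org&gt;PowerShell.org&lt;/a&gt;</td>'

# The first Replace() returns a new string; the second runs on that result
$fixed = $cell.Replace('&lt;', '<').Replace('&gt;', '>')

$fixed
# → <td><a href=http://powershell.org>PowerShell.org</a></td>
```

Because the string Replace() method returns a string, I can keep calling methods on the result without storing anything in between.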
I hope you had some fun with this and maybe learned a trick or two.