We often think about PowerShell v3 as being a management tool for the cloud. One new PowerShell v3 cmdlet that lends substance to this idea is Invoke-WebRequest. This is a handy cmdlet for retrieving data from a web resource, whether that is a public web site or something on your intranet. For today's fun I have a few lines of code I run to "scrape" information from http://manning.com. Since all of my recent books are through Manning, I like to keep track of the best sellers to see if any of my books make the list. Here's how.
First, I need to grab the web page.
PS C:\> $data = Invoke-Webrequest "http://manning.com"
There is a potential memory leak you can run into if you run Invoke-WebRequest in the ISE, so I recommend trying this in the console. The cmdlet returns a structured object which I'll let you explore on your own. The fun part is that the cmdlet creates a property called ParsedHtml. This property is the page structured in such a way that I can use DOM (document object model) methods like getElementsByTagName.
I looked at the source on manning.com and found the HTML code surrounding the best seller boxes. Knowing the tag information, I can use the DOM from the ParsedHtml property and retrieve the information I want. I know there are div tags with a className of bestsellHeader or bestSellbox.
PS C:\> $data.ParsedHtml.getElementsByTagName("div") | Where "classname" -match "^bestsell" | Select -ExpandProperty InnerText
PRINT BESTSELLERS
December 20, 2012
Learn Windows PowerShell 3 in a Month of Lunches, Second Edition
Hello World!
Spring in Action, Third Edition
The Quick Python Book, Second Edition
The Well-Grounded Java Developer
C# in Depth, Second Edition
Windows PowerShell in Action, Second Edition
jQuery in Action, Second Edition
Hadoop in Action
Hadoop in Practice
MEAP BESTSELLERS
December 20, 2012
F# Deep Dives
Node.js in Action
AOP in .NET
Secrets of the JavaScript Ninja
HTML5 for .NET Developers
The Responsive Web
Taming Text
Single Page Web Applications
Play for Scala
Scala in Action
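If you run this kind of filter more than once, it can be wrapped in a small function. The following is only a sketch: Get-TextByClassName is a name made up for illustration, and the sample objects are stand-ins for DOM elements so the filter can be tried without touching the network. Keep in mind that ParsedHtml relies on the Internet Explorer engine, so this approach assumes Windows PowerShell rather than a newer cross-platform release.

```powershell
# Hypothetical helper (not a built-in cmdlet): emit the innerText of every
# piped element whose className matches the given regular expression.
function Get-TextByClassName {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Element,

        [Parameter(Mandatory)]
        [string]$Pattern
    )
    process {
        # -match is case-insensitive by default, so '^bestsell' matches
        # both bestsellHeader and bestSellbox.
        if ($Element.className -match $Pattern) {
            $Element.innerText
        }
    }
}

# Stand-in objects that mimic the two DOM properties the filter reads.
$sample = @(
    [pscustomobject]@{ className = 'bestsellHeader'; innerText = 'PRINT BESTSELLERS' }
    [pscustomobject]@{ className = 'bestSellbox';    innerText = 'Hadoop in Action' }
    [pscustomobject]@{ className = 'footer';         innerText = 'Contact us' }
)

$titles = $sample | Get-TextByClassName -Pattern '^bestsell'
$titles
```

Against the live page, the same function should accept the real elements: $data.ParsedHtml.getElementsByTagName("div") | Get-TextByClassName -Pattern '^bestsell'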
And what do you know? Learn Windows PowerShell 3 in a Month of Lunches is the number 1 print bestseller. Thank you, by the way. This is a quick and dirty screen scrape, but it is just fine for my purposes. I have to admit I like using PowerShell to find out if my PowerShell books are best sellers.
I'd love to hear how you are using this new cmdlet.
How would you use this to parse a website which requires you to log in?
Look at help examples for Invoke-Webrequest. There is one that uses a Facebook logon. I have a LinkedIn problem I’d like to tackle with Invoke-Webrequest when I have some time.
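The general pattern in that help example is to request the login page with a session variable, fill in the form fields, post the form back, and then reuse the session for later requests. Here is a rough sketch of that flow. The function name, the default field names, and any URLs you pass in are assumptions for illustration; real sites name their fields differently and may add tokens or redirects this does not handle. It also assumes Windows PowerShell, where the response object exposes a Forms property.

```powershell
# Hypothetical helper: log in through the first form on a page, then fetch
# another page with the same cookie-carrying session.
function Get-AuthenticatedPage {
    param(
        [Parameter(Mandatory)][string]$LoginUri,
        [Parameter(Mandatory)][string]$TargetUri,
        [Parameter(Mandatory)][pscredential]$Credential,
        # Assumed field names; check the login page source for the real ones.
        [string]$UserField = 'username',
        [string]$PasswordField = 'password'
    )

    # -SessionVariable creates $session, which stores cookies between calls.
    $login = Invoke-WebRequest -Uri $LoginUri -SessionVariable session

    # Assumes the login form is the first form on the page.
    $form = $login.Forms[0]
    $form.Fields[$UserField] = $Credential.UserName
    $form.Fields[$PasswordField] = $Credential.GetNetworkCredential().Password

    # The form action is often relative, so resolve it against the login URI.
    $action = New-Object System.Uri -ArgumentList ([uri]$LoginUri), $form.Action

    # Post the completed form, discarding the response.
    $null = Invoke-WebRequest -Uri $action -WebSession $session -Method Post -Body $form.Fields

    # Request the protected page with the authenticated session.
    Invoke-WebRequest -Uri $TargetUri -WebSession $session
}
```

You would call it with something like Get-AuthenticatedPage -LoginUri $loginUrl -TargetUri $pageUrl -Credential (Get-Credential) and then work with ParsedHtml on the result as before.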
Thanks. I had a brain fart when I first ran Get-Help Invoke-Webrequest -Examples and saw that there was no help info. Forgot that I just installed a new OS and didn’t run Update-Help.
Had an issue where I needed to export a list of items from a site, but they did not have an export option and limited the display to 10 items. Managed to read the data from the table, find the URL from the ‘Next Page’ link, load the next page and read the data from the new table, etc. MUCH easier than clicking through over 20 pages of ‘Next Page’ and copy/pasting the data from the tables.
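A loop like that might look roughly like the sketch below. The function names are hypothetical, the link text 'Next Page' comes from the description above, and reading every table row as text is only a placeholder for whatever site-specific parsing you need. It assumes Windows PowerShell, where the response exposes Links and ParsedHtml.

```powershell
# Hypothetical helper: find the first link whose text contains $LinkText and
# return its absolute URI, or nothing when there is no such link.
function Get-NextPageUri {
    param(
        $Links,
        [Parameter(Mandatory)][uri]$BaseUri,
        [string]$LinkText = 'Next Page'
    )
    $next = $Links |
        Where-Object { $_.innerText -like "*$LinkText*" } |
        Select-Object -First 1
    if ($next) {
        # Resolve a relative href against the page it was found on.
        New-Object System.Uri -ArgumentList $BaseUri, $next.href
    }
}

# Hypothetical helper: walk the pages until no 'Next Page' link remains.
function Get-AllTableRows {
    param([Parameter(Mandatory)][uri]$StartUri)

    $uri = $StartUri
    while ($uri) {
        $page = Invoke-WebRequest -Uri $uri
        # Placeholder parsing: emit the text of every table row on the page.
        $page.ParsedHtml.getElementsByTagName('tr') |
            ForEach-Object { $_.innerText }
        # Becomes $null on the last page, which ends the loop.
        $uri = Get-NextPageUri -Links $page.Links -BaseUri $uri
    }
}
```

Keeping the link lookup in its own function means it can be checked against a few hand-built objects before pointing the loop at a real site.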
Dear Jeff,
I bought your new book, “Learn PowerShell Toolmaking in a Month of Lunches”, from Amazon.com. It arrived yesterday and I started working through it today. Hopefully in a month or so I will reach the finish line.
I have written several VBScript scripts for scraping website content, and after seeing the new features in PowerShell 3.0, like your example, I am very interested in learning more.
If possible, I would like to convert my VBScript web page scraping scripts to PowerShell 3.0.
Can you point me to additional PowerShell 3.0 books, examples, add-ons, or cmdlets that come with a full tutorial on scraping website content?
Or maybe Manning already has related books. So far I have bought this one from Manning online:
“Learn Windows PowerShell 3 in a Month of Lunches”. However, it does not explain in detail how to scrape web page content.
Thank you for reading my comments. I look forward to your advice on my question above.
Budhi M Suwardi
One of your book readers.
January 9, 2013
First, thanks for your support and enthusiasm. I doubt that there is anything in your VBScript that translates to PowerShell 3.0. I think you are better off starting from scratch. I think your best solution is to read the online help and examples for Invoke-Webrequest. Then start trying things on your own. When you get stuck, post in the forums at PowerShell.org. I’ll probably write about this cmdlet again so keep in touch with the blog.