The other day I received an email looking for guidance on using Invoke-Webrequest to pull data from a table on a web page. Specifically, he wanted to get the list of popular baby names from http://www.ssa.gov/OACT/babynames/index.html. I gave him some quick tips but figured this would also be another teaching opportunity. Using Invoke-Webrequest with PowerShell 3 is a fantastic way to incorporate Web-based data into your PowerShell session. However, I have yet to devise tools for working with web data that work in every case. I think what you'll find is that each site needs to be handled a bit differently. So here's how I worked through this problem and I think some of the techniques will work for other sites.
ManageEngine ADManager Plus - Download Free Trial
Exclusive offer on ADManager Plus for US and UK regions. Claim now!
The challenge I gave myself was to get data from a table and turn it into PowerShell objects. When you run Invoke-Webrequest from the PowerShell console (don't use the ISE), you'll get a property of ParsedHTML. With this property you can use the Microsoft DOM (document object model) to parse out data. I'll admit I'm still a neophyte when it comes to working with DOM, but I'm getting there. Anyway...in looking at the page source, which you can do in any browser, I learned that the table I wanted was the first one in the page. So let's start getting some PowerShell results.
$uri = "http://www.ssa.gov/OACT/babynames/index.html" #get the data $data = Invoke-WebRequest $uri #get the first table $table = $data.ParsedHtml.getElementsByTagName("table") | Select -first 1
I used the GetElementsByTagName method from the DOM to retrieve all tables and then selected the first one. The table object has some text properties that I could have tried to parse out, but I decided to take a different approach. The table contains rows, which I can get.
#get the rows $rows = $table.rows
The first item in the collection of rows will be the table header. This data will become my object properties.
#get table headers $headers = $rows.item(0).children | select -ExpandProperty InnerText
Now, $headers is an array of strings from each table column. Now the tricky part and there are probably several ways you could approach this. I need to go through the remaining table rows and match up each column with the header and eventually create a custom object. I decided to use a For enumeration to go through each row and then within each row enumerate again using the headers and add each entry into a hash table, which I can eventually turn into a custom object.
#count number of rows $NumOfRows = $rows | Measure-Object #enumerate the remaining rows (skipping the header row) and create a custom object for ($i=1;$i -lt $NumofRows.Count;$i++) { #define an empty hashtable $objHash=[ordered]@{} #get the child rows $rowdata = $rows.item($i).children | select -ExpandProperty InnerText for ($j=0;$j -lt $headers.count;$j++) { #add each row of data to the hash table using the corresponding #table header value $objHash.Add($headers[$j],$rowdata[$j]) } #for #turn the hashtable into a custom object [pscustomobject]$objHash } #for
Because my script is writing an object to the pipeline I can also do things such as sort and select.
I can't guarantee this code will handle every single table you come across, but it might get you started and you can always ask for help in the forums at PowerShell.org. If you are looking for more examples with Invoke-WebRequest you might also take a look at Get Beer List and Browsing Trainsignal Courses.
Enjoy!
2 thoughts on “Friday Fun: It’s PowerShell, Baby!”
Comments are closed.