As you continue to learn and embrace PowerShell, you will eventually meet regular expressions. Hopefully many of you already have some fundamental knowledge. If not, the first place to start is by reading the help topic, about_regular_expressions. In this article, I'm going to introduce you to an advanced regular expression topic: named captures. I'll admit that when I first learned this topic it made my head spin. But hopefully I can slow down the merry-go-round.
Let's begin with a string like this.
$t = "2019-06-21 17:12:31Z : 172.16.1.123 [ Begin process data ]"
This is something you might have in a log file. Since PowerShell's strength comes from its ability to work with objects, it might be easier to transform the log file into a collection of objects.
Start with Patterns
In order for this to work, and for you to use regular expressions, you have to know what your data will look like and it must be consistent and predictable. In this sample, I have a datetime string, an IPv4 address and then a message inside the brackets. The first step is to create a regular expression pattern that matches on these different elements. Let's begin with the datetime.
$t -match "\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z"
The pattern says, find something that starts with exactly 4 digits. The \d means a digit and the {4} means exactly 4 of them. Then there is a dash followed by exactly 2 digits (\d{2}), another dash and 2 more digits. That should give me the date. But there's a bit more. I need to match the space (\s) and then I have a pattern to match the time, which by now you should recognize. The end is a literal Z.
As you can see, this matches on exactly what I want. By the way, there is often more than one pattern you could write. I'm trying to stick with simple and clear patterns. Regular expressions can be cryptic enough!
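If you want to see the matched text rather than just True or False, PowerShell stores it in the automatic $Matches variable after a successful -match. Here is a minimal sketch using the same sample string. The variable names $datePattern and $dateText are only chosen for this example.

```powershell
$t = "2019-06-21 17:12:31Z : 172.16.1.123 [ Begin process data ]"

# $datePattern is a name made up for this example
$datePattern = '\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z'

if ($t -match $datePattern) {
    # Index 0 of $Matches holds the full text that matched the pattern
    $dateText = $Matches[0]
    $dateText
}
```

With the sample string, this should display 2019-06-21 17:12:31Z.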
Next, I need to match the IP address.
$t -match "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
The pattern says start with at least 1 digit and no more than 3 (\d{1,3}), followed by a literal period (\.). The period is a special regular expression character that matches any single character. The \ is the escape character, which tells the regular expression engine to look for a literal period instead. This pattern repeats for the remaining octets.
This pattern doesn't validate the address, just that it looks like an IP address. There are patterns that are more restrictive but I didn't want to add any more complexity. And in my situation, I know the address will be valid.
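To illustrate the difference between "looks like" and "is valid", here is a short sketch that compares the pattern with the TryParse() method of the .NET [ipaddress] class. This is not part of my log parsing approach, just one possible way to check a value. The sample addresses and variable names are made up for the example.

```powershell
# Sample values invented for illustration; the second one is not a valid IPv4 address
$candidates = '172.16.1.123', '999.999.999.999'
$ipPattern  = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

$results = foreach ($candidate in $candidates) {
    $parsed = $null
    [pscustomobject]@{
        Text        = $candidate
        # True whenever the text has the shape of four dotted numbers
        LooksLikeIP = $candidate -match $ipPattern
        # True only when .NET can parse the text as an address
        ParsesAsIP  = [ipaddress]::TryParse($candidate, [ref]$parsed)
    }
}

$results
```

Both values match the pattern, but only the first one should parse as an address.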
Last is the text between the square brackets.
$t -match "\[.*\]"
The bracket is a special regular expression character so I need to escape it. I'm then asking to match zero or more instances (*) of any character (.), up to the closing bracket.
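One thing to be aware of is that .* is greedy, meaning it grabs as much text as it can. That is fine for my sample because there is only one pair of brackets. The sketch below uses a hypothetical line with two bracketed sections to show the difference between the greedy .* and the lazy .*? form.

```powershell
# Hypothetical line with two bracketed sections, not taken from the sample log
$sample = "[ first ] and [ second ]"

# Greedy: runs from the first [ to the last ]
$null = $sample -match '\[.*\]'
$greedy = $Matches[0]

# Lazy: stops at the first ] it can
$null = $sample -match '\[.*?\]'
$lazy = $Matches[0]

$greedy
$lazy
```

The greedy match should return the whole string, while the lazy match should return only [ first ].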
Define the Names
Once you have matching patterns, you can define them with a name. The general layout looks like this:
(?<capture-name>Your-Pattern)
The parentheses are key. My datetime pattern can be defined as a named capture:
(?<date>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z)
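Named captures also work with the -match operator. As a quick sketch, after a successful match the name becomes a key in the $Matches hashtable alongside the usual 0 entry. The variable $captured is just a name for this example.

```powershell
$t = "2019-06-21 17:12:31Z : 172.16.1.123 [ Begin process data ]"

if ($t -match '(?<date>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z)') {
    # The capture name is a key in $Matches; $Matches.date works as well
    $captured = $Matches['date']
    $captured
}
```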
Here is a pattern that describes the entire line of text from beginning to end.
[regex]$rx = '(?<date>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z)\s:\s(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\[\s(?<status>.*)\]'
I've tweaked the last pattern to not include the brackets. There are other advanced regular expression techniques I could have used but I kept it simple. The important thing to note is that I am defining $rx as a special kind of object. This regex object allows me to do much more than simply using the -match operator. In this situation, I can use its Match() method to look for a match in the string.
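Here is a minimal sketch of looking for a match with the regex object. I'm assuming the Match() method and saving the result to $m, since that is the variable the later examples use.

```powershell
$t = "2019-06-21 17:12:31Z : 172.16.1.123 [ Begin process data ]"
[regex]$rx = '(?<date>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z)\s:\s(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\[\s(?<status>.*)\]'

# Match() returns a single Match object for the first match in the string
$m = $rx.Match($t)

# Success tells you whether the pattern matched at all
$m.Success

# Value is the full text that matched
$m.Value
```

If a string could contain the pattern more than once, the regex object also has a Matches() method that returns every match.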
This is good. But how do the names come into play?
Using Named Captures in PowerShell
In the match, you can see a Groups property. This is where the named captures can be found.
Using PowerShell it is easy to get only the named captures by skipping the first matched group.
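A sketch of what that can look like follows. The first item in Groups is the complete match, so skipping it leaves only the named captures. The $named variable is just for this example.

```powershell
$t = "2019-06-21 17:12:31Z : 172.16.1.123 [ Begin process data ]"
[regex]$rx = '(?<date>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z)\s:\s(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\[\s(?<status>.*)\]'
$m = $rx.Match($t)

# Groups[0] is the whole match, so skip it and keep the name and value of the rest
$named = $m.Groups | Select-Object -Skip 1 -Property Name, Value
$named
```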
That almost looks like an object! It doesn't take much more to create an actual object.
$m.groups | Select-Object -Skip 1 | foreach-object -begin { $h = @{} } -process { $h.Add($_.name,$_.value.trim()) } -end { [pscustomobject]$h }
Although, it would be nicer if the date was a datetime object instead of a string. Here's an alternative assuming you know the order of your named captures.
$o = [pscustomobject]@{
    Date      = $m.groups[1].value -as [datetime]
    IPAddress = $m.groups[2].value -as [ipaddress]
    Status    = $m.groups[3].value
}
As you can see this is an object.
With this code, I can write a script to process each line of the log file, creating a custom object. That makes it much easier to filter, sort or do whatever else I need to do with the data.
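As a rough sketch of what such a script might look like, the code below loops over a few lines and emits one object per matching line. The array of sample lines stands in for Get-Content on a real log file, and everything except the first line is made up for the example. I'm also referencing the groups by name instead of by position, which is one possible variation.

```powershell
[regex]$rx = '(?<date>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}Z)\s:\s(?<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\[\s(?<status>.*)\]'

# Stand-in for Get-Content; the second and third lines are invented
$lines = @(
    '2019-06-21 17:12:31Z : 172.16.1.123 [ Begin process data ]'
    '2019-06-21 17:12:45Z : 172.16.1.124 [ End process data ]'
    'this line does not follow the expected format'
)

$entries = foreach ($line in $lines) {
    $m = $rx.Match($line)
    # Lines that do not match the pattern are skipped
    if ($m.Success) {
        [pscustomobject]@{
            Date      = $m.Groups['date'].Value -as [datetime]
            IPAddress = $m.Groups['ip'].Value -as [ipaddress]
            Status    = $m.Groups['status'].Value.Trim()
        }
    }
}

# Now the data can be filtered and sorted like any other objects
$entries | Sort-Object -Property Date
```

Checking $m.Success before building the object means a malformed line is quietly ignored. Depending on your log, you might prefer to write a warning instead.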
ConvertFrom-Text
Or how about an easier way? Now that you know how to create a regular expression pattern using named captures, you can use the ConvertFrom-Text command in my PSScriptTools module which you can install from the PowerShell Gallery. With this command you can create code to turn any text output into PowerShell objects.
$c = "(?<Protocol>\w{3})\s+(?<LocalIP>(\d{1,3}\.){3}\d{1,3}):(?<LocalPort>\d+)\s+(?<ForeignIP>.*):(?<ForeignPort>\d+)\s+(?<State>\w+)?"
netstat -an | select -skip 4 | convertfrom-text $c | where-object {$_.LocalIP -ne '0.0.0.0'} | format-table -autosize
Once you have at least some basic regular expression skills, you'll find yourself using them often. And as with any new language, the more you use it the more fluent you will become.