I saw this tip today and wanted to leave a comment but couldn't see how. So I thought I'd post my comments here. This is actually a question I see often and there are better ways to write this kind of code.
The posted tip used an example where you wanted to find processes where the company name is defined. The approach suggested in the tip, and a technique I see often, goes something like this:
[cc lang="PowerShell"]
PS C:\> get-process | where {$_.Company -ne $Null} | Sort Company| Select Name,ID,Company
[/cc]
While it mostly works, this is a better PowerShell approach, in my opinion.
[cc lang="PowerShell"]
PS C:\> get-process | where {$_.Company} | Sort Company| Select Name,ID,Company
[/cc]
When I ran the first technique, I still got a blank company name. The tip offers a workaround for this situation:
[cc lang="PowerShell"]
PS C:\> get-process | where {$_.Company -ne $Null -AND $_.company -ne ''} | Sort Company| Select Name,ID,Company
[/cc]
This gives the same result as my suggested approach. My approach uses Where-Object to say: if the Company property has a value (it is neither $null nor an empty string), pass on the object. If you want to find processes without a company name, use the -Not operator.
[cc lang="PowerShell"]
PS C:\> get-process | where {-Not $_.Company}
[/cc]
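To make the difference between the filters concrete, here is a minimal sketch that uses hand-built objects instead of live processes. The $sample variable and its names are made up for illustration, and the [pscustomobject] syntax assumes PowerShell 3.0 or later; only the Company property comes from the examples above.

[cc lang="PowerShell"]
# Hypothetical stand-ins for process objects: a real name, an empty string, and $null
$sample = @(
    [pscustomobject]@{ Name = 'alpha'; Company = 'Contoso' }
    [pscustomobject]@{ Name = 'beta';  Company = '' }
    [pscustomobject]@{ Name = 'gamma'; Company = $null }
)

# Comparing against $null still lets the empty string through
$notNull = @($sample | where { $_.Company -ne $null })

# Letting Where-Object evaluate the property drops both $null and ''
$hasValue = @($sample | where { $_.Company })

# The inverse: objects with no usable company name
$noValue = @($sample | where { -not $_.Company })

"{0} pass -ne `$null, {1} pass the plain property test, {2} fail it" -f $notNull.Count, $hasValue.Count, $noValue.Count
[/cc]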
I use a similar technique to filter out blank lines in text files.
[cc lang="PowerShell"]
get-content computers.txt | where {$_} ...
[/cc]
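The same filter can be tried without a file. This sketch pipes a made-up array of lines (the server names are placeholders) through it; Get-Content returns blank lines as empty strings, and an empty string evaluates to false, so it is dropped.

[cc lang="PowerShell"]
# Made-up file content: two computer names with an empty line between them
$lines = 'server01', '', 'server02'

# An empty string is treated as $false, so only the two names survive
$names = @($lines | where { $_ })
$names
[/cc]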
While we're on the subject, a related filtering technique I often see involves boolean properties. You don't have to do this:
[cc lang="PowerShell"]
PS C:\> dir | where {$_.PsIsContainer -eq $True}
[/cc]
PsIsContainer is a boolean value, so let Where-Object simply evaluate it:
[cc lang="PowerShell"]
PS C:\> dir | where {$_.PsIsContainer}
[/cc]
As above, use -Not to get the inverse. Don't feel you need to explicitly evaluate properties in a Where-Object expression. I see this as a VBScript transition symptom that I hope you can break.
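For completeness, a short sketch of the inverse filter, using hand-built objects so it runs anywhere. The $items variable and its entries are hypothetical (and assume PowerShell 3.0 or later); real output from dir carries the same PsIsContainer property.

[cc lang="PowerShell"]
# Hypothetical stand-ins for directory listing entries
$items = @(
    [pscustomobject]@{ Name = 'Scripts';   PsIsContainer = $true }
    [pscustomobject]@{ Name = 'notes.txt'; PsIsContainer = $false }
)

# Containers (folders): let the boolean speak for itself
$folders = @($items | where { $_.PsIsContainer })

# Everything that is not a container (files)
$files = @($items | where { -not $_.PsIsContainer })

$folders.Name   # Scripts
$files.Name     # notes.txt
[/cc]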
I like this. It’s just one of several ways to accomplish more by “writing less PowerShell”.
Thanks
FWIW
This tests about 5x faster for removing blank lines from text:
(gc computers.txt) -match "\S"
That doesn’t surprise me, especially for a large file. There’s no pipelined expression here. Where-Object can’t really do its thing until Get-Content finishes. I like your idea, although it might be a little advanced for someone just starting out. And assuming most people are parsing a relatively small text file of computer names (typically where I see this behavior), the gain is likely irrelevant. Still, I ran some tests, and even the “bad” approach performs well.
PS C:\work> (measure-command {gc tempfiles.txt | where {$_ -ne $null}}).TotalMilliseconds
27.6494
PS C:\work> (measure-command {gc tempfiles.txt | where {$_}}).TotalMilliseconds
41.9788
PS C:\work> (measure-command {(gc tempfiles.txt) -match "\S"}).TotalMilliseconds
5.7939
However, look what happens when you read the entire file at once:
PS C:\work> (measure-command {(gc tempfiles.txt -readcount 0) -match "\S"}).TotalMilliseconds
20.9278
PS C:\work> (measure-command {gc tempfiles.txt -readcount 0| where {$_}}).TotalMilliseconds
3.3637
PS C:\work> (measure-command {gc tempfiles.txt -readcount 0 | where {$_ -ne $null}}).TotalMilliseconds
2.8742
The file I tested had 192 lines total, including about 10 blanks. All very interesting.
I replicated that test by stacking all three tests in the ISE, running them one after the other, and repeating that multiple times.
The first test of the first pass took considerably longer, but then dropped off dramatically on subsequent passes.
I think the first test of the first pass is preloading the disk read cache for the subsequent tests and skewing the results.
I didn’t take caching into account so I should retest. Because of the way the ISE handles scope, I wouldn’t trust it. Test in the shell.
Ran the same test in the shell (by putting the tests into a scriptblock and repeatedly invoking the scriptblock) and got the same result. The first test is much slower the first time the scriptblock is invoked, and then faster on subsequent invocations.
What results did you get for each scenario?
Haven’t figured out how to clear the read cache. The first time I tested, it took 9 seconds for the first pass of the first test; after that, I couldn’t reproduce it. Subsequent tests look like this:
[PS] C:\testfiles>$test = {
>> (measure-command {(gc test2.txt -readcount 0) -match "\S"}).TotalMilliseconds
>> (measure-command {gc test2.txt -readcount 0| where {$_}}).TotalMilliseconds
>> (measure-command {gc test2.txt -readcount 0 | where {$_ -ne $null}}).TotalMilliseconds
>> }
>>
[PS] C:\testfiles>
[PS] C:\testfiles>&$test
0.9636
1.0969
0.9473
[PS] C:\testfiles>&$test
0.923
1.5215
0.9554
[PS] C:\testfiles>&$test
0.9319
0.9591
0.9372
Sorry, that should have been “the first pass of the first test took 9 milliseconds”.
I created script blocks and used Invoke-Command, which should start a new runspace for each command.
PS C:\work> $a={invoke-command {(Measure-Command {get-content c:\work\tempfiles.txt -read 0 | where {$_}} ).TotalMilliseconds}}
PS C:\work> $b={invoke-command {(Measure-Command {get-content c:\work\tempfiles.txt -read 0 | where {$_ -ne $null -AND $_ -ne ' '}} ).TotalMilliseconds}}
PS C:\work> $c={invoke-command {(Measure-Command {(get-content c:\work\tempfiles.txt -read 0) -match "\S"} ).TotalMilliseconds}}
PS C:\work> &$a;&$b;&$c
1.5115
1.4538
1.2318
PS C:\work> &$a;&$b;&$c
1.3313
1.4474
1.2369
PS C:\work> &$a;&$b;&$c
1.9138
1.4198
1.1715
I waited 1-2 minutes between each command. If nothing else, I hope people pick up a few things on testing methodology.
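One way to reduce the effect of caching and one-off spikes is to repeat each measurement and look at the minimum and average rather than a single run. The following is only a sketch: the Measure-Repeated function name, the run count of 10, and the throwaway test file are all made up for illustration.

[cc lang="PowerShell"]
# Hypothetical helper: run a script block several times and summarize the timings
function Measure-Repeated {
    param(
        [scriptblock]$ScriptBlock,
        [int]$Count = 10
    )
    $times = 1..$Count | foreach { (Measure-Command $ScriptBlock).TotalMilliseconds }
    $times | Measure-Object -Minimum -Average -Maximum
}

# Build a small throwaway test file: 20 names with a blank line after every fifth one
$file = Join-Path ([IO.Path]::GetTempPath()) 'measure-demo.txt'
1..20 | foreach { "server$_"; if ($_ % 5 -eq 0) { '' } } | Set-Content $file

# Warm-up read so the first timed run does not pay for a cold cache
$null = Get-Content $file

$pipeline = Measure-Repeated { Get-Content $file | where { $_ } }
$regex    = Measure-Repeated { (Get-Content $file) -match '\S' }

'Pipeline min/avg ms: {0:N3} / {1:N3}' -f $pipeline.Minimum, $pipeline.Average
'Regex    min/avg ms: {0:N3} / {1:N3}' -f $regex.Minimum, $regex.Average

Remove-Item $file
[/cc]

The sketch pipes plain Get-Content into Where-Object on purpose. With -ReadCount 0 the whole file travels down the pipeline as a single array, so Where-Object tests that array once instead of testing each line, which is worth keeping in mind when comparing timings.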
On a slightly different tack, in the scenario of parsing a list of computer names from a file, the boolean tests will also return any line inadvertently included that has just whitespace (a space or tab). The “\S” regex will drop those along with the null lines.
Good point. Performance aside, an expression like this would do the trick and be easy to follow.
get-content computers.txt | where {$_ -match "\S"}
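A quick way to see the difference between the boolean test and the regex test is to run both over an in-memory list. The $lines array below is made-up sample data standing in for the contents of computers.txt.

[cc lang="PowerShell"]
# Made-up sample: two names, an empty line, and a line holding a single space
$lines = 'server01', '', ' ', 'server02'

# The boolean test drops the empty string but keeps the whitespace-only line
$boolean = @($lines | where { $_ })

# The regex test requires at least one non-whitespace character
$regex = @($lines | where { $_ -match '\S' })

"Boolean filter kept {0} lines; regex filter kept {1}" -f $boolean.Count, $regex.Count
[/cc]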
That is easier to follow. I wouldn’t have mentioned the whitespace if I hadn’t done the same thing myself (especially at the end of the file) and had it come back to bite me later.
It was interesting (to me at least). 🙂
Good points. However, even though Where {$_.boolVar -eq $true} is in fact longer and unnecessary, I do think it “reads” better for someone not versed in PowerShell who has to modify the code.
It’s purely a stylistic opinion, certainly not a functional one.
I had the same concern and really thought about what this means to a new PowerShell user. In the end I decided that this falls into the category of paradigm shift. While using the -eq operator is easier to read, that is only true for novice PowerShell users. Once you really get PowerShell, an expression like Where {$_.company} is just as easy to understand. But curiously, when it comes to performance, the operator is faster. Thanks for your feedback.
It’s faster because you’re specifying a single test: is it null. Without that, PowerShell tries to coerce the value to a bool, and it knows several ways of doing that 😉
Something to be careful about when leaving out “-ne $Null”: when filtering numeric values, zero evaluates to $False.
For example:
$a = @(0,3,$null,6)
'No Filter' # Results in 4
$n=0;$a| foreach {$n++} -End {$n}
'Null Filter' # Results in 3
$n=0;$a| where {$_ -ne $Null}| foreach {$n++} -End {$n}
'$_ Filter' # Results in 2
$n=0;$a| where {$_}| foreach {$n++} -End {$n}
It’s a very helpful technique – and it might be worth explaining that it works based on what PowerShell treats as FALSE and what as TRUE.
Any non-zero number, non-empty string, or array with more than one element is true.
Null, “”, 0, or an empty array is false. An array with exactly one element is evaluated by that element.
If you are dealing with numbers
where {$_}
or
where {$_.property}
will drop zeros, which is not always what you want. There are some funnies. The string “False” is non-empty and therefore true, and the array @($false, $false) is also true, proving that two wrongs can make a right.
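These rules are easy to check by casting values to [bool]. The sketch below relies only on PowerShell's built-in conversions; the $checks table and its labels are just for illustration, and the [ordered] syntax assumes PowerShell 3.0 or later.

[cc lang="PowerShell"]
# Each entry pairs a label with the result of PowerShell's own bool conversion
$checks = [ordered]@{
    '$null'             = [bool]$null
    'empty string'      = [bool]''
    'zero'              = [bool]0
    'empty array'       = [bool]@()
    'number 3'          = [bool]3
    'string "False"'    = [bool]'False'
    '@($false, $false)' = [bool]@($false, $false)
    '@($false)'         = [bool]@($false)   # a one-element array is judged by its element
}

# Print the table: the first four and the last entry are False, the rest True
$checks.GetEnumerator() | foreach { '{0,-20} {1}' -f $_.Key, $_.Value }
[/cc]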