Filtering Empty Values in PowerShell

Posted on August 3, 2011

I saw this tip today and wanted to leave a comment but couldn't see how, so I thought I'd post my comments here. This is a question I see often, and there are better ways to write this kind of code.

The posted tip used an example of finding processes where the company name is defined. The approach suggested in the tip, and a technique I see often, goes something like this:

[cc lang="PowerShell"]
PS C:\> get-process | where {$_.Company -ne $Null} | Sort Company| Select Name,ID,Company
[/cc]

That mostly works, but in my opinion this is a better PowerShell approach:

[cc lang="PowerShell"]
PS C:\> get-process | where {$_.Company} | Sort Company| Select Name,ID,Company
[/cc]

When I ran the first technique, I still got a blank company name. That process's Company value is an empty string rather than $null, and an empty string is not equal to $null. The tip offers a workaround for this situation:

[cc lang="PowerShell"]
PS C:\> get-process | where {$_.Company -ne $Null -AND $_.company -ne ''} | Sort Company| Select Name,ID,Company
[/cc]

This gives the same result as my suggested approach. My approach lets Where-Object evaluate the Company property as a Boolean. If the property has a value, meaning it is neither $null nor an empty string, the object is passed on. If you want to find processes without a company name, use the -Not operator.

[cc lang="PowerShell"]
PS C:\> get-process | where {-Not $_.Company}
[/cc]
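This minimal sketch makes the difference concrete. It uses three hypothetical stand-in objects instead of real processes; the names app1 to app3 and "Contoso" are made up. One object has a company name, one has an empty-string company, and one has $null. It assumes PowerShell 3.0 or later for the [pscustomobject] syntax and member enumeration.

[cc lang="PowerShell"]
# Hypothetical stand-ins for process objects (not real Get-Process output)
$items = @(
    [pscustomobject]@{ Name = 'app1'; Company = 'Contoso' }
    [pscustomobject]@{ Name = 'app2'; Company = '' }
    [pscustomobject]@{ Name = 'app3'; Company = $null }
)

# -ne $null rejects only $null, so the empty-string company still gets through
$notNull = $items | Where-Object { $_.Company -ne $null }

# Letting Where-Object treat the value as a Boolean rejects both $null and ''
$hasCompany = $items | Where-Object { $_.Company }

# -Not inverts the test: objects with no company name at all
$noCompany = $items | Where-Object { -not $_.Company }

$notNull.Name     # app1, app2
$hasCompany.Name  # app1
$noCompany.Name   # app2, app3
[/cc]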

I use a similar technique to filter out blank lines in text files.

[cc lang="PowerShell"]
get-content computers.txt | where {$_} ...
[/cc]
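Here is a minimal sketch of the blank-line filter. It writes a hypothetical sample file, computers-demo.txt in the temp folder, in place of a real computers.txt.

[cc lang="PowerShell"]
# Hypothetical sample file standing in for computers.txt, with two blank lines
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'computers-demo.txt'
Set-Content -Path $path -Value 'server01', '', 'server02', '', 'server03'

# Get-Content emits one string per line; '' evaluates to $false, so blank lines are dropped
$computers = Get-Content -Path $path | Where-Object { $_ }
$computers  # server01, server02, server03

# Clean up the sample file
Remove-Item -Path $path
[/cc]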

While we're on the subject, a related filtering technique I often see involves boolean properties. You don't have to do this:

[cc lang="PowerShell"]
PS C:\> dir | where {$_.PsIsContainer -eq $True}
[/cc]

PsIsContainer is a boolean value, so let Where-Object simply evaluate it:

[cc lang="PowerShell"]
PS C:\> dir | where {$_.PsIsContainer}
[/cc]

As above, use -Not to get the inverse. You don't need to explicitly compare properties in a Where-Object expression. I see this as a symptom of the transition from VBScript, and I hope you can break the habit.
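This minimal sketch shows the Boolean-property pattern on a throwaway folder that it creates under the temp directory. The folder and file names are hypothetical.

[cc lang="PowerShell"]
# Build a throwaway folder with one subfolder and one file (hypothetical names)
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('psiscontainer-demo-' + [guid]::NewGuid())
New-Item -Path $root -ItemType Directory | Out-Null
New-Item -Path (Join-Path $root 'subfolder') -ItemType Directory | Out-Null
New-Item -Path (Join-Path $root 'file.txt') -ItemType File | Out-Null

# PSIsContainer is already a Boolean, so no '-eq $true' is needed
$folders = Get-ChildItem -Path $root | Where-Object { $_.PSIsContainer }

# -not gives the inverse: everything that is not a container
$files = Get-ChildItem -Path $root | Where-Object { -not $_.PSIsContainer }

$folders.Name  # subfolder
$files.Name    # file.txt

# Remove the throwaway folder and its contents
Remove-Item -Path $root -Recurse
[/cc]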

20 thoughts on “Filtering Empty Values in PowerShell”

Mike Shepard says:

August 3, 2011 at 12:07 pm

I like this. It’s just one of several ways to accomplish more by “writing less PowerShell”.

Thanks
Rob Campbell says:

August 3, 2011 at 9:49 pm

FWIW

This tests about 5x faster for removing blank lines from text:
(gc computers.txt) -match "\S"
1. Jeffery Hicks says:
  
  August 4, 2011 at 7:36 am
  
  That doesn’t surprise me, especially for a large file. There’s no pipelined expression here. Where-Object can’t really do its thing until Get-Content finishes. I like your idea, although it might be a little advanced for someone just starting out. And assuming most people are parsing a relatively small text file of computer names (typically where I see this behavior), the gain is likely irrelevant. Still, I ran some tests, and even the “bad” approach performs well.
  
  PS C:\work> (measure-command {gc tempfiles.txt | where {$_ -ne $null}}).TotalMilliseconds
  27.6494
  PS C:\work> (measure-command {gc tempfiles.txt | where {$_}}).TotalMilliseconds
  41.9788
  PS C:\work> (measure-command {(gc tempfiles.txt) -match "\S"}).TotalMilliseconds
  5.7939
  
  However, look what happens when you read the entire file at once with -readcount 0:
  
  PS C:\work> (measure-command {(gc tempfiles.txt -readcount 0) -match "\S"}).TotalMilliseconds
  20.9278
  PS C:\work> (measure-command {gc tempfiles.txt -readcount 0| where {$_}}).TotalMilliseconds
  3.3637
  PS C:\work> (measure-command {gc tempfiles.txt -readcount 0 | where {$_ -ne $null}}).TotalMilliseconds
  2.8742
  
  The file I tested had 192 lines total, including about 10 blanks. All very interesting.
  1. Rob Campbell says:
    
    August 4, 2011 at 11:01 am
    
    I replicated that test by stacking all three tests in the ISE and running them one after the other, several times over.
    
    The first test of the first pass took considerably longer, but then dropped off dramatically on subsequent passes.
    
    I think the first test of the first pass is preloading the disk read cache for the subsequent tests and skewing the results.
  2. Jeffery Hicks says:
    
    August 4, 2011 at 11:39 am
    
    I didn’t take caching into account, so I should retest. Because of the way the ISE handles scope, I wouldn’t trust it. Test with the shell.
  3. Rob Campbell says:
    
    August 4, 2011 at 12:08 pm
    
    Ran the same test in the shell (by putting the tests into a scriptblock and repeatedly invoking the scriptblock) and got the same result. The first test is much slower the first time the scriptblock is invoked, and then faster on subsequent invocations.
  4. Jeffery Hicks says:
    
    August 4, 2011 at 12:10 pm
    
    What results did you get for each scenario?
  5. Rob Campbell says:
    
    August 4, 2011 at 12:19 pm
    
    Haven’t figured out how to clear the read cache (the first time I tested, it took 9 seconds for the first pass of the first test). After that, I couldn’t reproduce it. Subsequent tests look like this:
    
    [PS] C:\testfiles>$test = {
    >> (measure-command {(gc test2.txt -readcount 0) -match "\S"}).TotalMilliseconds
    >> (measure-command {gc test2.txt -readcount 0| where {$_}}).TotalMilliseconds
    >> (measure-command {gc test2.txt -readcount 0 | where {$_ -ne $null}}).TotalMilliseconds
    >> }
    >>
    [PS] C:\testfiles>
    [PS] C:\testfiles>&$test
    0.9636
    1.0969
    0.9473
    [PS] C:\testfiles>&$test
    0.923
    1.5215
    0.9554
    [PS] C:\testfiles>&$test
    0.9319
    0.9591
    0.9372
  6. Rob Campbell says:
    
    August 4, 2011 at 12:21 pm
    
    Sorry, that should have been “the first pass of the first test took 9 milliseconds”.
  7. Jeffery Hicks says:
    
    August 4, 2011 at 12:40 pm
    
    I created script blocks and used Invoke-Command, which should start a new runspace for each command.
    
    PS C:\work> $a={invoke-command {(Measure-Command {get-content c:\work\tempfiles.txt -read 0 | where {$_}} ).TotalMilliseconds}}
    PS C:\work> $b={invoke-command {(Measure-Command {get-content c:\work\tempfiles.txt -read 0 | where {$_ -ne $null -AND $_ -ne ' '}} ).TotalMilliseconds}}
    PS C:\work> $c={invoke-command {(Measure-Command {(get-content c:\work\tempfiles.txt -read 0) -match "\S"} ).TotalMilliseconds}}
    PS C:\work> &$a;&$b;&$c
    1.5115
    1.4538
    1.2318
    PS C:\work> &$a;&$b;&$c
    1.3313
    1.4474
    1.2369
    PS C:\work> &$a;&$b;&$c
    1.9138
    1.4198
    1.1715
    
    I waited 1-2 minutes between each command. If nothing else, I hope people pick up a few things on testing methodology.
  8. Rob Campbell says:
    
    August 4, 2011 at 2:17 pm
    
    On a slightly different tack, in the scenario of parsing a list of computer names from a file, the boolean tests will also return any line inadvertently included that has just whitespace (a space or tab). The “\S” regex will drop those along with the null lines.
  9. Jeffery Hicks says:
    
    August 4, 2011 at 2:40 pm
    
    Good point. Performance aside, an expression like this would do the trick and be easy to follow.
    
    get-content computers.txt | where {$_ -match "\S"}
  10. Rob Campbell says:
    
    August 4, 2011 at 2:45 pm
    
    That is easier to follow. I wouldn’t have mentioned the whitespace if I hadn’t done the same thing myself (especially at the end of the file) and had it come back to bite me later.
2. Rob Campbell says:
  
  August 4, 2011 at 1:01 pm
  
  It was interesting (to me at least). 🙂
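There is a subtlety when combining -readcount 0 with Where-Object. The Get-Content documentation says that a ReadCount of 0 sends all the content through the pipeline at one time. Where-Object therefore receives the whole file as a single array rather than one line at a time. A multi-element array is truthy, so the `-readcount 0 | where {...}` variants timed above pass the whole file through instead of filtering it line by line. Applying -match directly to the array does filter element by element.

The sketch below uses a hypothetical temp file with one empty line and one whitespace-only line. It shows these behaviors along with the whitespace point raised in the thread.

[cc lang="PowerShell"]
# Hypothetical sample file: one empty line and one whitespace-only line
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value 'server01', '', 'server02', '   ', 'server03'

# Default ReadCount (1): Where-Object sees one line at a time.
# The empty line is dropped, but the whitespace-only line is truthy and survives.
$perLine = Get-Content -Path $path | Where-Object { $_ }

# ReadCount 0: the whole file arrives as one array, which is truthy,
# so nothing is filtered out
$wholeFile = Get-Content -Path $path -ReadCount 0 | Where-Object { $_ }

# -match applied to an array returns only the matching elements;
# '\S' requires at least one non-whitespace character
$matched = (Get-Content -Path $path -ReadCount 0) -match '\S'

# Line-at-a-time version of the same regex test
$perLineRegex = Get-Content -Path $path | Where-Object { $_ -match '\S' }

$perLine.Count       # 4
$wholeFile.Count     # 5
$matched.Count       # 3
$perLineRegex.Count  # 3

# Clean up the sample file
Remove-Item -Path $path
[/cc]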
WIDBA says:

August 4, 2011 at 8:56 am

Good points. However, even though Where {$_.boolVar -eq $true} is in fact longer and unnecessary, I do think it “reads” better for someone who is not versed in PowerShell and has to modify the code.

It’s purely a stylistic opinion, certainly not a functional one.
1. Jeffery Hicks says:
  
  August 4, 2011 at 9:10 am
  
  I had the same concern and really thought about what this means to a new PowerShell user. In the end I decided that this falls into the category of paradigm shift. While the -eq operator is easier to read, that is only true for novice PowerShell users. Once you really get PowerShell, an expression like Where {$_.company} is just as easy to understand. Curiously, though, when it comes to performance the operator is faster. Thanks for your feedback.
  1. Joel "Jaykul" Bennett says:
    
    August 4, 2011 at 1:11 pm
    
    It’s faster because you’re specifying a single test: is it null. Without that, PowerShell tries to coerce the value to a bool, and it knows several ways of doing that 😉
Ryan Grant says:

August 4, 2011 at 11:49 am

Something to be careful about when leaving out “-ne $Null”: when filtering numeric values, zero evaluates to $False.

For example:

$a = @(0,3,$null,6)

'No Filter' # Results in '4'
$n=0;$a| foreach {$n++} -End {$n}

'Null Filter' # Results in '3'
$n=0;$a| where {$_ -ne $Null}| foreach {$n++} -End {$n}

'$_ Filter' # Results in '2'
$n=0;$a| where {$_}| foreach {$n++} -End {$n}
James O'Neill says:

August 29, 2011 at 8:29 am

It’s a very helpful technique – and it might be worth explaining that it works based on what PowerShell treats as FALSE and what as TRUE.
Any non-zero number, non-empty string, or non-empty array is true.
$null, "", 0, or an empty array is false.
If you are dealing with numbers, Where {$_} or Where {$_.property} will drop zeros, which is not always what you want.
There are some funnies. The string "False" is non-empty and therefore true, and the array @($false, $false) is also true, proving that two wrongs can make a right.
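You can check the rules in the comment above directly by casting values to [bool]. That cast approximates how Where-Object judges the result of its script block. This is a minimal sketch, and the labels are only for display. It assumes PowerShell 3.0 or later for [ordered]. One case not mentioned above: an array with exactly one element takes on the truthiness of that element, so @(0) is false even though the array is not empty.

[cc lang="PowerShell"]
# Sample values to cast to [bool]; labels are display-only
$samples = [ordered]@{
    '$null'             = $null             # False
    "''"                = ''                # False
    '0 (number)'        = 0                 # False
    '@()'               = @()               # False
    "'False'"           = 'False'           # True: non-empty string
    '@($false, $false)' = @($false, $false) # True: more than one element
    '@(0)'              = @(0)              # False: single element decides
    '42 (number)'       = 42                # True
}

# Record the Boolean conversion of each sample value
$truthiness = [ordered]@{}
foreach ($entry in $samples.GetEnumerator()) {
    $truthiness[$entry.Key] = [bool]($entry.Value)
}
$truthiness
[/cc]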