Last week on Twitter I saw a discussion about a git-related problem. The short version of the story is that the person was running out of disk space and didn't understand why. It turns out this person has several development projects using git. All of the change tracking and other related data is stored in a hidden .git folder, and the amount of data in this folder had gotten out of hand. This is not necessarily unexpected behavior. In fact, you can, and should, periodically run git gc to perform basic housekeeping. That part is easy. I have a number of git-managed projects, and what I was more interested in was determining how big the folder had become. Because git uses a hidden folder, it is pretty easy to forget about it. I wrote a PowerShell function so that I wouldn't.
The Basics
When creating any function, I always recommend starting with a command or set of commands that you can run in an interactive session that achieves the essence of your desired goal. In this case, it is pretty simple to get a directory listing of files and determine the total size.
Get-ChildItem -Path C:\scripts\PSScriptTools\.git -file -Recurse | Measure-Object -Property Length -sum
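If you want to try the expression without pointing it at a real repository, here is a minimal sketch that runs it against a throwaway folder. The temp folder, the file names, and the file sizes are made-up sample data, not anything from my projects.

```powershell
# Build a throwaway folder with a fake .git directory (hypothetical sample data)
$demo = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath "gitsize-demo-$(Get-Random)"
$gitDir = Join-Path -Path $demo -ChildPath '.git'
New-Item -Path $gitDir -ItemType Directory -Force | Out-Null

# Write two files of known size: 1000 bytes and 500 bytes
[System.IO.File]::WriteAllBytes((Join-Path -Path $gitDir -ChildPath 'a.bin'), [byte[]]::new(1000))
[System.IO.File]::WriteAllBytes((Join-Path -Path $gitDir -ChildPath 'b.bin'), [byte[]]::new(500))

# Same expression as above, pointed at the sample folder
$stat = Get-ChildItem -Path $gitDir -File -Recurse | Measure-Object -Property Length -Sum

$stat.Count   # number of files found
$stat.Sum     # total size in bytes

# Clean up the sample folder
Remove-Item -Path $demo -Recurse -Force
```

The Count and Sum properties of the Measure-Object result are the two values the rest of the function builds on.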
Now that I know the expression works, I can use it as the basis of my function. I already have full cmdlet and parameter names so that's a plus. Now to think about how I intend to use this.
I know the folder will always be called .git. I will most likely pass a parent directory path and if .git exists, then run the calculations. This means I can define a Path parameter and configure it to accept pipeline input.
[Parameter(Position = 0, ValueFromPipeline, ValueFromPipelineByPropertyName)]
[Alias("pspath")]
[ValidateScript( {Test-Path $_})]
[string]$Path = "."
Notice that I've also created an alias, pspath. This is because most of my repositories are under C:\Scripts and I want to be able to run dir c:\scripts -directory and pipe to my function. The directory objects that dir emits have a PSPath property, so the alias lets the parameter bind by property name. Through experience I have found that if I define the pspath alias, this works much more easily.
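To see the alias at work, here is a small sketch. Show-BoundPath is a hypothetical demo function and the temp folders are sample data; only the parameter definition mirrors what I showed above.

```powershell
# Hypothetical demo function; only the parameter block mirrors the article
function Show-BoundPath {
    [CmdletBinding()]
    param(
        [Parameter(Position = 0, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias("pspath")]
        [ValidateScript( { Test-Path $_ })]
        [string]$Path = "."
    )
    process {
        # Convert-Path turns the provider-qualified PSPath into a plain file system path
        Convert-Path -Path $Path
    }
}

# Throwaway parent folder with two child directories (sample data)
$root = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath "pspath-demo-$(Get-Random)"
foreach ($name in 'RepoA', 'RepoB') {
    New-Item -Path (Join-Path -Path $root -ChildPath $name) -ItemType Directory -Force | Out-Null
}

# Each directory object binds to -Path through its PSPath property
$bound = Get-ChildItem -Path $root -Directory | Show-BoundPath
$bound

# Clean up the sample folders
Remove-Item -Path $root -Recurse -Force
```

Each piped directory should come back as a full path, which is exactly what the function needs before it goes looking for .git.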
With the path in hand, I can construct a path to test.
$full = Convert-Path -Path $Path
$git = Join-Path -Path $full -ChildPath ".git"
I am converting the path so that I get a full file system path. I set a default value for $path of '.' and I want to be able to convert that to a full file system name. Notice also the use of Join-Path. There is no reason to use concatenation to build paths. It is easy to mess up and personally I see it as a sign of a beginner.
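As a rough sketch of how the constructed path might then be tested, assuming Test-Path is the check (my actual function may differ in the details):

```powershell
$Path = "."                                   # same default as the parameter
$full = Convert-Path -Path $Path              # resolves '.' to a full file system path
$git = Join-Path -Path $full -ChildPath ".git"

# Assumption: a simple container test decides whether to run the calculations
if (Test-Path -Path $git -PathType Container) {
    "Found a git folder at $git"
}
else {
    "No .git folder under $full"
}
```

Join-Path takes care of the separator, so there is no need to worry about whether $full ends in a slash.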
Let me point out that the real objective of this article is how I built this function and the techniques I used. The end result is useful but not my goal for this post. There are plenty of git and PowerShell related tools available.
Function Output
When you create a PowerShell function, it should only do one thing and only write one type of object to the pipeline. With this function I decided to create a simple object.
[PSCustomObject]@{
    Path = $full
    Files = $stat.count
    Size = $Size
} #customobject
I'm only expecting to run this locally so I didn't think I needed a Computername property. I suppose I could have added a DateTime property. That could come in handy if I were logging. I could have grabbed data from git like branches. But that starts veering into a different set of requirements. All I need of this function is an indication of how big the folder is so I know if I need to run git gc.
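Putting the pieces so far together, a minimal sketch might look like the following. Get-GitSizeSketch is a hypothetical name, and the structure is an assumption for illustration; it is not the code from the gist, and it leaves the size in bytes.

```powershell
# Hypothetical sketch; the function name and structure are assumptions,
# not the code from the gist
function Get-GitSizeSketch {
    [CmdletBinding()]
    param(
        [Parameter(Position = 0, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias("pspath")]
        [ValidateScript( { Test-Path $_ })]
        [string]$Path = "."
    )
    process {
        $full = Convert-Path -Path $Path
        $git = Join-Path -Path $full -ChildPath ".git"

        # Only report on folders that actually contain a .git directory
        if (Test-Path -Path $git -PathType Container) {
            $stat = Get-ChildItem -Path $git -File -Recurse |
                Measure-Object -Property Length -Sum

            # One object type per function, with the size left in bytes
            [PSCustomObject]@{
                Path  = $full
                Files = [int]$stat.Count
                Size  = [int64]$stat.Sum
            }
        }
    }
}

# Try it against a throwaway folder with a fake .git directory (sample data)
$repo = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath "gitsize-sketch-$(Get-Random)"
$gitDir = Join-Path -Path $repo -ChildPath '.git'
New-Item -Path $gitDir -ItemType Directory -Force | Out-Null
[System.IO.File]::WriteAllBytes((Join-Path -Path $gitDir -ChildPath 'pack.bin'), [byte[]]::new(2048))

$result = Get-GitSizeSketch -Path $repo
$result

# Clean up the sample folder
Remove-Item -Path $repo -Recurse -Force
```

A folder without a .git directory simply produces no output, which keeps the pipeline clean when checking a whole parent directory.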
Plan Ahead
That said, I added one additional feature to my function. Measure-Object provides a value in bytes, and I'm not very good at telling at a glance whether something is 100KB or 100MB. Usually, I would tell people NOT to include any sort of formatting or data manipulation in a function. I could just as easily pipe my output to Select-Object and create a custom property dividing the sum by 1KB or 1MB. However, I am constantly telling PowerShell toolmakers that you have to think about who is going to use your tool and how. In this case it is me, and I know that I will most likely want to see the value in something other than bytes. So I added a $As parameter.
[ValidateSet("kb", "mb", "gb")] [string]$As
Also note that I am using a parameter validation technique. I am limiting parameter choices to these 3 options. Even better, because I'm using ValidateSet, PowerShell will use these values with tab completion!
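Here is one way the conversion could be handled, shown as a standalone sketch. Convert-ByteSize and its Bytes parameter are hypothetical names for illustration, and the switch statement is an assumption about the approach rather than the code from my function. The second half shows the Select-Object alternative with a calculated property.

```powershell
# Hypothetical helper; the name Convert-ByteSize and the Bytes parameter are assumptions
function Convert-ByteSize {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [double]$Bytes,

        [ValidateSet("kb", "mb", "gb")]
        [string]$As
    )
    # switch compares strings case-insensitively by default; when -As is omitted,
    # $As is empty and the default branch returns the raw byte count
    switch ($As) {
        "kb" { $Bytes / 1KB }
        "mb" { $Bytes / 1MB }
        "gb" { $Bytes / 1GB }
        default { $Bytes }
    }
}

$asMB = Convert-ByteSize -Bytes 5MB -As mb
$asKB = Convert-ByteSize -Bytes 2048 -As KB
$raw = Convert-ByteSize -Bytes 512

# Alternative: leave the function output in bytes and add a calculated property afterwards
$sample = [PSCustomObject]@{ Path = 'C:\scripts\Demo'; Files = 10; Size = 5MB }   # sample data
$withMB = $sample | Select-Object -Property Path, Files, Size,
    @{ Name = 'SizeMB'; Expression = { [math]::Round($_.Size / 1MB, 2) } }
$withMB
```

The trade-off is convenience versus purity: a built-in -As saves typing at the prompt, while the calculated property keeps the function's output in one consistent unit.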
The Function
For now, the complete function is up on GitHub.
https://gist.github.com/jdhitsolutions/cbdc7118f24ba551a0bb325664415649
I have some other plans for this function. But for now I can use it to check a single folder.
Or I can check an entire directory.
dir c:\scripts -directory | get-gitsize | Out-GridView -Title "Git Report"
With a little extra effort I could build a simple control script.
dir c:\scripts -Directory | Get-GitSize |
where-object {$_.size -ge 25MB} |
Out-GridView -Title "Select repos to compress" -PassThru |
foreach-object {
    set-location $_.Path
    git gc --aggressive
}
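One detail worth noting about that filter: 25MB is a numeric literal that PowerShell expands to a byte count, so the comparison assumes Size is still in bytes, that is, Get-GitSize was run without -As. A quick sketch with made-up sample objects (the paths and sizes are hypothetical) shows the filtering step on its own, without the interactive grid or the git call:

```powershell
# Hypothetical sample output shaped like the function's objects, sizes in bytes
$report = @(
    [PSCustomObject]@{ Path = 'C:\scripts\RepoA'; Files = 120; Size = 30MB }
    [PSCustomObject]@{ Path = 'C:\scripts\RepoB'; Files = 45; Size = 2MB }
    [PSCustomObject]@{ Path = 'C:\scripts\RepoC'; Files = 300; Size = 25MB }
)

# 25MB expands to 26214400, so this compares bytes to bytes
$large = $report | Where-Object { $_.Size -ge 25MB }
$large.Path
```

With these sample values, RepoA and RepoC pass the filter and RepoB does not.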
You may have other ideas in mind on how to use this. If so, I hope you'll share. Enjoy.
Nice code. I think most of it is common work: getting values from files, piping, building functions and so on. But I’m happy to see more folks using PS custom objects for structured return values. Thank you for that. 😉
By the way, the solution of formatting the size with an extra “as” param is nice on one hand, but I prefer the solution of returning two values: one “size” and one “size in mb”, for example.
Thank you. I really debated about whether the -As parameter was the best design practice. But for a stand-alone function I decided to opt for convenience. There are better ways to accomplish this and I might write about them.