Tuesday, February 18, 2020

Azure Blob Storage - List and Download Files with PowerShell

Azure vs On-Premises

In the emerging world of Cloud solutions, many services and databases have already been moved to the Cloud, and one might think everything will be migrated to Azure overnight. However, a lot of applications and data sources still reside On-Premises, and it is unrealistic to expect all of them to move to Azure in the near future.

The close relationship between Azure and On-Premises will remain, especially in hybrid architectures. I also find it very handy to have files locally for data load troubleshooting and testing purposes. For example, if you need to download a single file without connecting to the Azure Portal or opening Azure Storage Explorer on your local computer, it is easier to have a script or function where you simply provide the name of the file(s) to be downloaded and the program does the job for you.

Problem

The challenge we are facing here is how to programmatically download files from Azure Blob Storage to On-Premises or a local machine. An Azure Storage path looks similar to any other storage device and follows the sequence: Azure Storage -> container -> folder -> subfolder -> file.
There are various ways to download the files in this type of environment, and the following languages are supported: .NET, Java, Node.js, Python, PHP, Ruby, and Go. The scripting options PowerShell and Azure CLI are available as well.

Solution

We already mentioned the languages supported for managing Blob Storage, but this demo focuses specifically on a PowerShell solution. PowerShell scripting offers ease of implementation and great flexibility, which makes it a good choice in this case.

Environment

Prerequisites:
  1. PowerShell, Visual Studio Code, or any other editor that supports PowerShell scripts
  2. Install Az PowerShell module (see below)
  3. Azure Subscription - check Visual Studio Essentials for one year of free Azure and other tools and services.
  4. Azure Storage Explorer
  5. Azure Blob Storage - provision under the same Azure subscription
  6. Container created in Azure Blob Storage
  7. Collect the Azure Storage information: account name, account key, container name (see the sketch after this list for retrieving the account key)
  8. Create a working folder locally and make note of it (default is c:\temp)
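
For item 7, the account key can be copied from the Azure Portal, or it can be pulled with PowerShell once you are signed in. The snippet below is a minimal sketch; the resource group and storage account names are placeholders to replace with your own.

#Sign in first
Connect-AzAccount

#Placeholders - replace with your resource group and storage account names
$resourceGroupName  = "<<RESOURCE GROUP NAME>>"
$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"

#List the account keys and keep the value of the first one
$keys = Get-AzStorageAccountKey -ResourceGroupName $resourceGroupName -Name $StorageAccountName
$StorageAccountKey = $keys[0].Value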

Install the PowerShell Az module:

#Install Azure module
Install-Module -Name Az -AllowClobber -Scope CurrentUser

#Import Azure module
Import-Module Az

Samples:

Note that the following solution uses the newer Az module rather than the older AzureRM module.
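
If you are not sure which of the two modules is installed on your machine, a quick check like this can help (a minimal sketch using standard PowerShellGet cmdlets):

#List any installed Az or AzureRM modules and their versions
Get-InstalledModule -Name Az, AzureRM -ErrorAction SilentlyContinue | Select-Object -Property Name, Version

#Alternatively, check which storage modules are available to the current session
Get-Module -ListAvailable -Name Az.Storage, AzureRM.Storage | Select-Object -Property Name, Version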


Quick Solution


A Simple Download/Upload

Before we dig deeper into the code, downloading a file is very easy and straightforward. A simple command will do the job:

Get-AzStorageBlobContent -Container "ContainerName" -Blob "MyFile.txt" -Destination "C:\test\"

Also, if we want to upload a file with a single line of code, this will help:

Set-AzStorageBlobContent -Container "ContainerName" -File ".\PlanningData.csv" -Blob "Planning2015.csv"

Both one-liners assume a storage context is available; either pass one explicitly with the -Context parameter or create it with New-AzStorageContext as shown in the sections below.

Download Single File

The solution requires setting the variable values for both your Azure and local environments:

  • $StorageAccountName - the name of Azure Storage account
  • $StorageAccountKey - Azure Storage account key
  • $containerName - name of the container within storage account
  • $fileNameSrcPath - folder path and file name within the container
  • $fileNameDestPath - folder path and file name on your local computer

    #Connect to Azure
    Connect-AzAccount

    #storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    #container name
    $containerName = "salesdata"
    
    
    #Initialize variables 
    $fileNameSrcPath = "2016/01/01.txt"
    
    #Default location, but if there is a better location feel free to change:
    $fileNameDestPath = "C:\Temp\sales_download\01.txt"
    
    
    $folderDestPath = $fileNameDestPath.Substring(0, $fileNameDestPath.LastIndexOf("\"))
    
    
    #Create destination folder if it doesn't exist
    If(!(test-path $folderDestPath))
    {
          New-Item -ItemType Directory -Force -Path $folderDestPath
    }
    
    #Get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    #Optional: list the blobs in the container (not required for the single-file download)
    $ListBlobs = Get-AzStorageBlob -Context $Ctx -Container $containerName
    
    
    #Download Blob
    Get-AzStorageBlobContent -Container $containerName -Blob $fileNameSrcPath -Destination $fileNameDestPath -Context $Ctx -Force
    


Download All Files From an Azure Storage Container

Prior to running the script, set the following variables depending on your local and Azure environments:

  • $StorageAccountName - Azure Storage Account name
  • $StorageAccountKey - Azure Storage Account key
  • $containerName - name of the container under the same storage account
  • $DestinationRootFolder - destination folder on your local machine

    #Connect to Azure
    Connect-AzAccount

    #storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    $containerName = "salesdata"
    
    #get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName
    
    #Destination folder - change if different
    $DestinationRootFolder = "C:\temp\sales_download\AllFiles\"
    
    #Create destination folder if it doesn't exist
    If(!(test-path $DestinationRootFolder))
    {
          New-Item -ItemType Directory -Force -Path $DestinationRootFolder
    }
    
    #Loop through the files in a container
    foreach($bl in $ListBlobs)
    {
           
        $BlobFullPath = $bl.Name
    
        Write-Host ""
        Write-Host ("File Full Path: " + $BlobFullPath)
        
        #Get blob folder path
        $SourceFolder = $BlobFullPath.Substring( 0, $BlobFullPath.LastIndexOf("/")+1)
        Write-Host ("Source Folder Path: " + $SourceFolder)
    
        #Build destination path based on blob path
        $DestinationFolder = ($DestinationRootFolder + $SourceFolder.Replace("/","\") ).Replace("\\","\")
        Write-Host ("Destination Folder Path: " + $DestinationFolder)
    
        #Create local folders
        New-Item -ItemType Directory -Force -Path $DestinationFolder
              
    
        Write-Host "Blob: " 
        $DestinationFilePath = $DestinationRootFolder + $BlobFullPath.Replace("/", "\")
        Write-Host ("Destination File Path: " + $DestinationFilePath)
    
        #Download file
        Get-AzStorageBlobContent -Container $containerName -Blob $BlobFullPath -Destination $DestinationFilePath -Context $Ctx -Force
    
    }
    
    Write-Host ("")
    Write-Host ("Download completed...")
    
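
As an optional sanity check after the run, you can compare the number of files that landed locally with the number of blobs that were listed. This is a small sketch and assumes the variables from the script above are still set in the session:

    #Count the downloaded files and compare with the blob count
    $localCount = (Get-ChildItem -Path $DestinationRootFolder -Recurse -File).Count
    Write-Host ("Blobs listed: " + $ListBlobs.Count + ", files downloaded: " + $localCount)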


Download All Files From a Selected Folder in Azure Storage Container

Prior to running the script, set the following variables depending on your local and Azure environments:

  • $StorageAccountName - Azure Storage Account name
  • $StorageAccountKey - Azure Storage Account key
  • $containerName - name of the container under the same storage account
  • $DestinationRootFolder - destination folder on your local machine
  • $srcBlobFolder - folder/subfolder with wildcard (*)

    #Connect to Azure
    Connect-AzAccount

    #storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    $containerName = "salesdata"

    $srcBlobFolder = "2016/01/*"

    #get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName -Blob $srcBlobFolder

    #Destination folder - change if different
    $DestinationRootFolder = "C:\temp\sales_download\AllFiles\"

    #Create destination folder if it doesn't exist
    If(!(test-path $DestinationRootFolder))
    {
          New-Item -ItemType Directory -Force -Path $DestinationRootFolder
    }

    #Loop through the files in a container
    foreach($bl in $ListBlobs)
    {
       
        $BlobFullPath = $bl.Name
    
        Write-Host ""
        Write-Host ("File Full Path: " + $BlobFullPath)
    
        #Get blob folder path
        $SourceFolder = $BlobFullPath.Substring( 0, $BlobFullPath.LastIndexOf("/")+1)
        Write-Host ("Source Folder Path: " + $SourceFolder)

        #Build destination path based on blob path
        $DestinationFolder = ($DestinationRootFolder + $SourceFolder.Replace("/","\") ).Replace("\\","\")
        Write-Host ("Destination Folder Path: " + $DestinationFolder)

        #Create local folders
        New-Item -ItemType Directory -Force -Path $DestinationFolder
          

        Write-Host "Blob: " 
        $DestinationFilePath = $DestinationRootFolder + $BlobFullPath.Replace("/", "\")
        Write-Host ("Destination File Path: " + $DestinationFilePath)

        #Download file
        Get-AzStorageBlobContent -Container $containerName -Blob $BlobFullPath -Destination $DestinationFilePath -Context $Ctx -Force

    }

    Write-Host ("")
    Write-Host ("Download completed....")


How to Get the List of Files

It is worth considering another aspect of the solution, in this case the different PowerShell options for getting the list of files in Blob storage. This helps to understand what has been implemented above, and it can be very useful to have these scripts on hand whenever there is a need to list the objects.

Again, there is great flexibility here when filtering objects or collections. For example, if you want to know which files were placed in the sales folder for last year, you could just specify the -Blob "LastYear/*.csv" parameter. As you may have noticed, I used a wildcard to get the list of csv files, because the main interest was probably a report of all of the files for last year.
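
For illustration, here is a minimal sketch of that kind of filter. It assumes $Ctx and $containerName are set as in the scripts above, and the "LastYear" folder name is just a placeholder:

#Filter the listing to csv files under a specific folder (placeholder folder name)
Get-AzStorageBlob -Context $Ctx -Container $containerName -Blob "LastYear/*.csv" |
    Sort-Object -Property Name |
    Select-Object -Property Name, Length, LastModified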


Get List Of All Files (Select-Object)

The easiest way to get the list of files from a blob container is a combination of Get-AzStorageBlob, which loads the collection of files into the variable $ListBlobs, and a pipeline-style cmdlet that lists them by the Name property.

#Connect to Azure
Connect-AzAccount

#storage account
$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
#storage key
$StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
#Container name - change if different
$containerName = "salesdata"

#get blob context
$Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName

#Blob count
Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")
#List blob files
$ListBlobs | Sort-Object -Property Name | Select-Object -Property Name
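
If you need the listing as a report rather than console output, the same pipeline can be redirected to a csv file. This is an optional sketch; the output path is a placeholder:

#Export the blob listing to a csv report (placeholder path)
$ListBlobs | Sort-Object -Property Name |
    Select-Object -Property Name, Length, LastModified |
    Export-Csv -Path "C:\Temp\blob_list.csv" -NoTypeInformation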


Get List Of Files in a Folder (Select-Object)

The following code uses a wildcard to filter the files in the container, and in this case we are interested in the January 2016 sales only. The result is the list of files in the 2016/01/ folder. The line to focus on is: $srcBlobFolder = "2016/01/*".

#Connect to Azure
Connect-AzAccount

#storage account
$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
#storage key
$StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
#Container name - change if different
$containerName = "salesdata"

#Blob folder/subfolder 
$srcBlobFolder = "2016/01/*"

#get blob context
$Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName -Blob $srcBlobFolder 

#Blob count
Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")
#List blob files
$ListBlobs | Sort-Object -Property Name | Select-Object -Property Name


Get List Of Files using Foreach Loop

In the previous two examples, pipeline commands made things easy; a foreach loop, on the other hand, adds more complexity to the code, but it provides more flexibility and control. The following example iterates through each folder and lists the files one by one. Also, the list of blobs retrieved with the Get-AzStorageBlob cmdlet contains the full file path, so extracting the file name is done with the Substring function. An alternative using Split-Path is shown after the code.

    #Connect to Azure
    Connect-AzAccount

    #Storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #Storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    #Container name - change if different
    $containerName = "salesdata"
    
    #get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName
    
    #Blob count
    Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")
    
    foreach($bl in $ListBlobs)
    {
    
        ##Write-Host "Blob: " 
        Write-Host ("File Full Path: " + $bl.Name)
    
        Write-Host ("Folder Path: " + $bl.Name.Substring( 0, $bl.Name.LastIndexOf("/")+1) )
        
        #Extract file name from full path
        Write-Host ("File Name: " + $bl.Name.Substring( $bl.Name.LastIndexOf("/") + 1, $bl.Name.Length - $bl.Name.LastIndexOf("/")-1 ) )
        Write-Host ""
    
    }
    
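
As an alternative to the Substring arithmetic, the built-in Split-Path cmdlet can pull apart the same blob path, and it accepts forward slashes as separators. A small sketch, assuming $bl is one of the blob items from the loop above:

    #Split a blob path such as "2016/01/01.txt" into its parts
    Write-Host ("Folder Path: " + (Split-Path $bl.Name -Parent))   #e.g. 2016\01
    Write-Host ("File Name: " + (Split-Path $bl.Name -Leaf))       #e.g. 01.txt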



PowerShell Examples

Below are some key examples of the PowerShell storage management cmdlets. For more examples and further consideration, explore the full list in the Az.Storage documentation.

Example 1: List all containers whose names start with "container"

Get-AzStorageContainer -Name container*

Example 2: List blobs in a container by name using a wildcard

Get-AzStorageBlob -Container "ContainerName" -Blob blob*


Example 3: Download Blob content by name

Get-AzStorageBlobContent -Container "ContainerName" -Blob "Blob" -Destination "C:\test\"

Example 4: Upload a file to a blob container

Set-AzStorageBlobContent -Container "ContosoUpload" -File ".\PlanningData" -Blob "Planning2015"



Example 5: Upload all files under the current folder and its subfolders to a blob container

Get-ChildItem -File -Recurse | Set-AzStorageBlobContent -Container "ContosoUploads"


Example 6: Download a file from an Azure file share

Get-AzStorageFileContent -ShareName "ContosoShare06" -Path "ContosoWorkingFolder/CurrentDataFile"


Example 7: Get an Azure file share

Get-AzStorageShare -Name "ContosoShare06"


Example 8: Create a storage context from an account name and key

New-AzStorageContext -StorageAccountName "ContosoGeneral" -StorageAccountKey "< Storage Key for ContosoGeneral ends with==>"


Example 9: Create an Azure file share

New-AzStorageShare -Name "ContosoShare06"


Example 10: Remove a blob from a container

Remove-AzStorageBlob -Container "ContainerName" -Blob "BlobName"


Solution Explanation

Let's review all of the steps used in the previous examples and walk through the individual PowerShell cmdlets.

1. Connect to Azure

Connecting to Azure is straightforward: run the command below and an Azure sign-in window will appear on your screen.

Connect-AzAccount

2. Connect to Azure Storage

Enter the previously captured information: storage account name, key, and container name.

$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
$StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
$containerName = "salesdata"

Create blob context - this step is required to authenticate against your Blob storage.

$Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey

Get the list of blobs in the container and store it in an object. From there you have access to all of the blob properties and can use them as needed (see the next section and the quick sketch below).

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName
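
Once $ListBlobs is populated, you can inspect the individual items. The quick sketch below shows a few of the properties exposed by Get-AzStorageBlob:

#Inspect the first blob in the collection
$ListBlobs[0].Name
$ListBlobs[0].Length
$ListBlobs[0].LastModified
$ListBlobs[0].BlobType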

3. Get The List of Blobs

Select the list of blob files using PowerShell pipelines. In this case I'm interested only in the Name property, but feel free to explore others.

$ListBlobs | Sort-Object -Property Name | Select-Object -Property Name

The blob count is additional information that can be useful, and it can be easily extracted from the object collected earlier.

Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")

4. Get the List of Files in a Folder

Based on the previous example, adding the -Blob $srcBlobFolder parameter filters the listing down to a specific folder, so only the blobs under that folder are returned and later downloaded.

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName -Blob $srcBlobFolder 


5. Create Local Folder

The following code snippet checks whether the folder exists and creates it if it doesn't.

$DestinationRootFolder = "C:\temp\sales_download\AllFiles\"

If(!(test-path $DestinationRootFolder))
{
      New-Item -ItemType Directory -Force -Path $DestinationRootFolder
}


6. Foreach Loop

Below is the final part, which loops over the files in the subfolders and downloads each of them. The code is explained in the comments.

#Loop through the files in a container
foreach($bl in $ListBlobs)
{
    #Get the full path name   
    $BlobFullPath = $bl.Name
    
    #Print out full path name
    Write-Host ""
    Write-Host ("File Full Path: " + $BlobFullPath)
    
    #Get blob folder path - without file name and extension
    $SourceFolder = $BlobFullPath.Substring( 0, $BlobFullPath.LastIndexOf("/")+1)
    Write-Host ("Source Folder Path: " + $SourceFolder)

    #Build destination path based on blob path - follows the same subfolder structure
    $DestinationFolder = ($DestinationRootFolder + $SourceFolder.Replace("/","\") ).Replace("\\","\")
    Write-Host ("Destination Folder Path: " + $DestinationFolder)

    #Create local folders - the Force parameter allows the folder path to be created even if the parent folders don't exist
    New-Item -ItemType Directory -Force -Path $DestinationFolder
          
    #Print out the destination file path
    Write-Host "Blob: " 
    $DestinationFilePath = $DestinationRootFolder + $BlobFullPath.Replace("/", "\")
    Write-Host ("Destination File Path: " + $DestinationFilePath)

    #Finally, download the file - use of previously collected variables
    Get-AzStorageBlobContent -Container $containerName -Blob $BlobFullPath -Destination $DestinationFilePath -Context $Ctx -Force

}



What's next

Here are a few ideas on how to use and enhance the examples explained above:

  • Upload the files - explore how to upload files to Blob using similar logic
  • Integrated Azure authentication without an interactive login. In the previous examples we used Connect-AzAccount, which always prompts for login information, but there are ways to connect to Azure Blob storage directly (see the sketch after this list). This is especially handy for automation, when no user interaction is available to authenticate the connection.
  • Automation - integrate the solution as a service (Azure Function, Azure Runbook, ETL)
  • AzCopy - a command line utility to copy data to and from Azure Blob storage
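
As a starting point for the second idea, here is a hedged sketch of two non-interactive options: signing in with a service principal, or skipping Connect-AzAccount entirely and building a storage context from a SAS token. The application id, tenant id, secret, and SAS token values are placeholders.

#Option 1: sign in with a service principal (all values below are placeholders)
$appId      = "<<APPLICATION (CLIENT) ID>>"
$tenantId   = "<<TENANT ID>>"
$secret     = ConvertTo-SecureString "<<CLIENT SECRET>>" -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential ($appId, $secret)
Connect-AzAccount -ServicePrincipal -Credential $credential -Tenant $tenantId

#Option 2: no Connect-AzAccount at all - a SAS token is enough to build a storage context
$Ctx = New-AzStorageContext -StorageAccountName "<<STORAGE ACCOUNT NAME>>" -SasToken "<<SAS TOKEN>>"
Get-AzStorageBlob -Context $Ctx -Container "salesdata" | Select-Object -Property Name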


Happy PowerShell coding!

