Tuesday, February 16, 2021

Azure Data Factory - Array Filter on an Array


ADF Functions

ADF offers an expression language for data transformations, control flows and other operations. It is an intuitive, well-documented language, but in many cases it takes some searching to find the best way to utilize its expressions, and it requires a developer's way of thinking to resolve certain problems. This time we will focus on arrays, or lists as some people like to call them, even though many programming languages draw a distinction between the two.

Problem

We will tackle the challenge of finding common elements between two lists. For example, a match between CarBrandList=["Ford", "Chevrolet", "Buick", "Volkswagen", "Lexus"] and FilteredCarBrandList=["Ford", "Volkswagen"] should result in ResultCarBrandList=["Ford", "Volkswagen"], because those are the two values appearing in both lists.

  • CarBrandList=["Ford", "Chevrolet", "Buick", "Volkswagen", "Lexus"]
  • FilteredCarBrandList=["Ford", "Volkswagen"]
  • ResultCarBrandList=["Ford", "Volkswagen"]

Quick Solution

Using either the "intersection" or the "contains" function will do the work, depending on the type of ADF activity used:

  • Expression in "Set Variable" activity: @intersection(pipeline().parameters.FileList, pipeline().parameters.FilterFileList)

    Pipeline configuration:

    • ADF activity: Set Variable
    • Pipeline parameters: FileList=["File1.txt","File2.txt","File3.txt"], FilterFileList=["File2.txt","File3.txt"]
    • Variable: var_ResultList
    • Expected result: var_ResultList=["File2.txt","File3.txt"]



  • Expression in "Filter" activity: @contains(pipeline().parameters.FilterFileList, item().name )

    Pipeline configuration:

    • ADF activities: Get Metadata, Filter
    • Pipeline parameters: FilterFileList=["File2.txt","File3.txt"]
    • Get Metadata result returned from Azure Blob Storage: ["File1.txt","File2.txt","File3.txt"]
    • Expected result: ["File2.txt","File3.txt"]




Environment

  • Azure subscription
  • Azure Data Factory
  • Test files - create empty files for the following demo: "File1.txt","File2.txt","File3.txt" (see the snippet after this list)
  • Azure Blob Storage - test files uploaded to Blob Storage
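
A quick way to create the three empty test files is a small PowerShell snippet; a minimal sketch, assuming it runs from your working folder:

#Create the empty demo files File1.txt, File2.txt and File3.txt in the current folder
1..3 | ForEach-Object { New-Item -ItemType File -Name ("File" + $_ + ".txt") -Force }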

Collection Functions

Before starting with examples, let's take a look at the available functions in ADF; "contains" and "intersection" look like the obvious choices.



"Contains" Function


A quick look into the MS documentation shows that "contains" is a logical function, which checks whether a value exists in a collection.
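
For illustration, here are a few examples of the documented behavior (note that for arrays the match is on whole elements, not substrings):

@contains('hello world', 'world') - returns true
@contains(createArray('File1.txt','File2.txt'), 'File2.txt') - returns true
@contains(createArray('File1.txt','File2.txt'), 'File2') - returns false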




"Intersection" Function

As per the documentation, the "intersection" function returns the common elements shared between multiple collections or, as mentioned, let's call them lists or arrays.
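
For example, the documented behavior with three collections, returning only the elements common to all of them:

@intersection(createArray(1, 2, 3), createArray(101, 2, 1, 10), createArray(6, 8, 1, 2)) - returns [1,2]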




Solution

  1. Array Filter - "Contains" Function – We are going to test two different scenarios here using the "Set Variable" activity:
    1. Filter array by a single string value - This first example filters the FileList parameter by a single value and returns it if it exists in FileList.

      @if( contains(pipeline().parameters.FileList, pipeline().parameters.FilterFile), pipeline().parameters.FilterFile, '')
    2. Filter array by another array – Filtering does not work this way because "contains" is not able to filter an array by another array. I found the documentation contradictory here because it states that an array can be used as a filter under the "value" property, but for some reason it doesn't work. The first thing we could think of is to use a ForEach activity, but that's a performance-expensive solution. Stay with me for the next example to resolve the issue.

      @if( contains(pipeline().parameters.FileList, pipeline().parameters.FilterList), pipeline().parameters.FilterList, array('User error: No match found'))

  2. Array Filter - "Intersection" Function – Intersection function looks like a more obvious choice when dealing with collections and it will return common values that are included in both arrays. "Set Variable" activity has bee used for demo purposes.

    @intersection(pipeline().parameters.FileList,pipeline().parameters.FilterFileList)

  3. Blob List - Array Filter – Let's use a real-life example of filtering a file list returned from Blob Storage, where we are trying to verify file existence in the blob storage. Imagine you have thousands of files sitting in your Azure Blob Storage and you want to know whether two specific files are there, and based on that decide what to do with them further up the stream in your pipeline. The solution is very simple: filter the file list returned from Blob Storage by our own list of files specified in an array.

    "Metadata" activity has been used to return list of files from Blob Storage and "Filter" activity to find common elements between our own file search list and Blob Storage file list.

    Let’s consider both "intersection" and "contains" functions for the following examples.

    1. Using "intersection" function:

      There is a little bit of work that needs to be done here, and it's around the "Filter" activity in ADF: we first have to retrieve the items using the "childItems" property returned from the Get Metadata activity, and then pass each value via "item().name" to be filtered by our parameter "pipeline().parameters.FileList". Another thing to notice is that "createArray(item().name)" has been used because the value returned in "item().name" is a string and we are trying to compare it with another array, so our value needs converting to an array. Also, "not(empty(...))" has been used to return a logical result instead of the collection returned by the "intersection" function.

      Filter activity setup:

      • Items: @activity('Metadata_BlobStorage').output.childitems
      • Condition: @not(empty( intersection( createArray(item().name), pipeline().parameters.FileList) ) )

      The solution works fine and returns the expected result, but it just seems like a lot of work around the "intersection" function, and using three different functions is always a threat to performance. Notice the "item().name" property, which means that the "Filter" activity works as an iterator selecting individual items from "childItems", so "item().name" will be a single string value accordingly.

    2. Using "contains" Function:

      With "contains" function things look much easier and value returned from "item().name" is being compared with "parameters.FileList" for existence. This looks like a simpler implementation and we will take it as our final solution.

      Filter activity setup:

      • Items: @activity('Metadata_BlobStorage').output.childitems
      • Condition: @contains(pipeline().parameters.FilterFileList, item().name)

Conclusion

Both, "contains" and "intersection" function look like great options and based on the demo from above, the following are recommendations:

  • Filter activity – use contains function.
  • Set Variable activity – use intersection function.

Azure Data Factory - Demo Setup

  • "PL_01-ArrayContains" Pipeline

    1. Parameters


    2. Variables


    3. Set Variable "ContainsString" activity


    4. Set Variable "ContainsString" activity - expression

      Expression: @if( contains(pipeline().parameters.FileList, pipeline().parameters.FilterFile), pipeline().parameters.FilterFile, '')



    5. Set Variable "ContainsString" activity - result


    6. Set Variable "ContainsArray"


    7. Set Variable "ContainsArray" activity - expression

      Expression: @if( contains(pipeline().parameters.FileList, pipeline().parameters.FilterList), pipeline().parameters.FilterList, array('User error: No match found'))



    8. Set Variable "ContainsArray" activity - result


  • "PL_02-ArrayIntersection" Pipeline

    1. Parameters


    2. Variables


    3. Set Variable "Intersection List" activity


    4. Set Variable "Intersection List" activity - Expression

      Expression: @intersection(pipeline().parameters.FileList,pipeline().parameters.FilterFileList)



    5. Set Variable "Intersection List" activity - result


  • "PL_03-ArrayFilterBlobList" Pipeline

    1. Parameters


    2. Blob Storage "DS_AZBS_Binary_txt" dataset - it requires linked service connection


    3. Get Metadata "Metadata_BlobStorage" activity


    4. Filter "FilterByArray" activity


    5. Filter "FilterByArray" activity - Items expression

      Expression: @activity('Metadata_BlobStorage').output.childitems



    6. Filter "FilterByArray" activity - Condition expression

      Expression: @contains(pipeline().parameters.FilterFileList, item().name)



    7. Blob Storage - files uploaded


    8. Filter "FilterByArray" activity - result


Tuesday, February 18, 2020

Azure Blob Storage - List and Download Files with PowerShell

Azure Blob Storage - List and Download Files

Azure vs On-Premises

In the emerging world of cloud solutions, many services and databases have been moved to the cloud, and one would think everything will be migrated to Azure overnight. However, there are still a lot of applications and data sources residing on-premises, and it would be hard to expect all of them to move to Azure in the near future.

The close relationship between Azure and on-premises will remain, especially in hybrid-type architectures. Also, I find it very handy to have files locally for data load troubleshooting and testing purposes. For example, if you need to download a single file without connecting to the Azure Portal online or going through Azure Storage Explorer on your local computer, it is easier to have some sort of code or function where you just provide the name of the file(s) to be downloaded and the program does the job for you.

Problem

The challenge we are facing here is how to programmatically download files from Azure Blob Storage to an on-premises or local machine. An Azure Storage path looks similar to any other storage device and follows the sequence: Azure Storage -> container -> folder -> subfolder -> file.
There are various ways to download the files in this type of environment, and the following languages are supported: .NET, Java, Node.js, Python, PHP, Ruby, and Go. The scripting languages PowerShell and Azure CLI are available too.

Solution

We already mentioned the languages supported for managing Blob Storage, but in this demo we will focus on a PowerShell solution specifically. PowerShell scripting definitely provides ease of implementation and great flexibility, and in this case it's a good choice.

Environment

Prerequisites:
  1. PowerShell, Visual Studio Code or any other editor that supports PowerShell scripts
  2. Install Az PowerShell module (see below)
  3. Azure Subscription - check Visual Studio Dev Essentials for one year of free Azure and other tools and services.
  4. Azure Storage Explorer
  5. Azure Blob Storage - provision under the same Azure subscription
  6. Container created in Azure Blob Storage
  7. Collect Azure Storage information: account name, account key, container name
  8. Create a working folder locally and make note of it (default is c:\temp)

Install PowerShell Az module:

#Install Azure module
Install-Module -Name Az -AllowClobber -Scope CurrentUser

#Import Azure module
Import-Module Az
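
Optionally, verify the module installed correctly and check its version:

#Confirm the Az module is installed
Get-InstalledModule -Name Az | Select-Object Name, Version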

Samples:

**Beware that the following solution uses the newer Az module rather than the older AzureRM module.


Quick Solution


A Simple Download/Upload

Before we dig deeper into the code, downloading a file is very easy and pretty straightforward. A simple command will do the job:

Get-AzStorageBlobContent -Container "ContainerName" -Blob "MyFile.txt" -Destination "C:\test\"

Also, if we want to upload a file with a single line of code, this can help:

Set-AzStorageBlobContent -Container "ContainerName" -File ".\PlanningData.csv" -Blob "Planning2015.csv"
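
Note that in a fresh session these one-liners typically need a storage context as well; a minimal sketch, with the account name and key as placeholders:

#Create a storage context and pass it explicitly
$Ctx = New-AzStorageContext -StorageAccountName "<<STORAGE ACCOUNT NAME>>" -StorageAccountKey "<<STORAGE ACCOUNT KEY>>"
Get-AzStorageBlobContent -Container "ContainerName" -Blob "MyFile.txt" -Destination "C:\test\" -Context $Ctx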

Download Single File

The solution requires setting the variable values for both, your Azure and local environments:

  • $StorageAccountName - the name of Azure Storage account
  • $StorageAccountKey - Azure Storage account key
  • $containerName - name of the container within storage account
  • $fileNameSrcPath - folder path and file name within the container
  • $fileNameDestPath - folder path and file name on your local computer

    #Connect to Azure
    Connect-AzAccount

    #storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    #container name
    $containerName = "salesdata"
    
    
    #Initialize variables 
    $fileNameSrcPath = "2016/01/01.txt"
    
    #Default location, but if there is a better location feel free to change:
    $fileNameDestPath = "C:\Temp\sales_download\01.txt"
    
    
    $folderDestPath = $fileNameDestPath.Substring(0, $fileNameDestPath.LastIndexOf("\"))
    
    
    #Create destination folder if it doesn't exist
    If(!(test-path $folderDestPath))
    {
          New-Item -ItemType Directory -Force -Path $folderDestPath
    }
    
    #Get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName
    
    
    #Download Blob
    Get-AzStorageBlobContent -Container $containerName -Blob $fileNameSrcPath -Destination $fileNameDestPath -Context $Ctx -Force
    


Download All Files From an Azure Storage Container

Prior to running the script set the following variables depending on your local and Azure environment:

  • $StorageAccountName - Azure Storage Account name
  • $StorageAccountKey - Azure Storage Account key
  • $containerName - name of the container under the same storage account
  • $DestinationRootFolder - destination folder on your local machine

    #Connect to Azure
    Connect-AzAccount

    #storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    $containerName = "salesdata"
    
    #get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName
    
    #Destination folder - change if different
    $DestinationRootFolder = "C:\temp\sales_download\AllFiles\"
    
    #Create destination folder if it doesn't exist
    If(!(test-path $DestinationRootFolder))
    {
          New-Item -ItemType Directory -Force -Path $DestinationRootFolder
    }
    
    #Loop through the files in a container
    foreach($bl in $ListBlobs)
    {
           
        $BlobFullPath = $bl.Name
    
        Write-Host ""
        Write-Host ("File Full Path: " + $BlobFullPath)
        
        #Get blob folder path
        $SourceFolder = $BlobFullPath.Substring( 0, $BlobFullPath.LastIndexOf("/")+1)
        Write-Host ("Source Folder Path: " + $SourceFolder)
    
        #Build destination path based on blob path
        $DestinationFolder = ($DestinationRootFolder + $SourceFolder.Replace("/","\") ).Replace("\\","\")
        Write-Host ("Destination Folder Path: " + $DestinationFolder)
    
        #Create local folders
        New-Item -ItemType Directory -Force -Path $DestinationFolder
              
    
        Write-Host "Blob: " 
        $DestinationFilePath = $DestinationRootFolder + $BlobFullPath.Replace("/", "\")
        Write-Host ("Destination File Path: " + $DestinationFilePath)
    
        #Download file
        Get-AzStorageBlobContent -Container $containerName -Blob $BlobFullPath -Destination $DestinationFilePath -Context $Ctx -Force
    
    }
    
    Write-Host ("")
    Write-Host ("Download completed...")
    


Download All Files From a Selected Folder in Azure Storage Container

Prior to running the script set the following variables depending on your local and Azure environment:

  • $StorageAccountName - Azure Storage Account name
  • $StorageAccountKey - Azure Storage Account key
  • $containerName - name of the container under the same storage account
  • $DestinationRootFolder - destination folder on your local machine
  • $srcBlobFolder - folder/subfolder with wildcard (*)

    #Connect to Azure
    Connect-AzAccount

    #storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    $containerName = "salesdata"

    $srcBlobFolder = "2016/01/*"

    #get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName -Blob $srcBlobFolder

    #Destination folder - change if different
    $DestinationRootFolder = "C:\temp\sales_download\AllFiles\"

    #Create destination folder if it doesn't exist
    If(!(test-path $DestinationRootFolder))
    {
          New-Item -ItemType Directory -Force -Path $DestinationRootFolder
    }

    #Loop through the files in a container
    foreach($bl in $ListBlobs)
    {
       
        $BlobFullPath = $bl.Name
    
        Write-Host ""
        Write-Host ("File Full Path: " + $BlobFullPath)
    
        #Get blob folder path
        $SourceFolder = $BlobFullPath.Substring( 0, $BlobFullPath.LastIndexOf("/")+1)
        Write-Host ("Source Folder Path: " + $SourceFolder)

        #Build destination path based on blob path
        $DestinationFolder = ($DestinationRootFolder + $SourceFolder.Replace("/","\") ).Replace("\\","\")
        Write-Host ("Destination Folder Path: " + $DestinationFolder)

        #Create local folders
        New-Item -ItemType Directory -Force -Path $DestinationFolder
          

        Write-Host "Blob: " 
        $DestinationFilePath = $DestinationRootFolder + $BlobFullPath.Replace("/", "\")
        Write-Host ("Destination File Path: " + $DestinationFilePath)

        #Download file
        Get-AzStorageBlobContent -Container $containerName -Blob $BlobFullPath -Destination $DestinationFilePath -Context $Ctx -Force

    }

    Write-Host ("")
    Write-Host ("Download completed....")


How to Get the List of Files

It is worth considering another aspect of the solution: the different PowerShell options for getting the list of files in Blob storage. This helps explain what's been implemented above, and it can be very useful to have these scripts on hand whenever there is a need for listing the objects.

Again, there is great flexibility here in filtering objects or collections. For example, if you want to know which files were placed in the sales folder for last year, you could just specify the -Blob "LastYear/*.csv" parameter. As you may have noticed, I used a wildcard to get the list of csv files, since the main interest was presumably a report of all of last year's files.
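
As a quick sketch of that idea, assuming the $Ctx and $containerName variables from the scripts below:

#List last year's csv files only
Get-AzStorageBlob -Context $Ctx -Container $containerName -Blob "LastYear/*.csv" | Select-Object -Property Name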


Get List Of All Files (Select-Object)

The easiest way to get the list of files from a blob container is a combination of Get-AzStorageBlob, to get the collection of files into the variable $ListBlobs, and then a pipeline-style cmdlet to get the list using the Name property.

#Connect to Azure
Connect-AzAccount

#storage account
$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
#storage key
$StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
#Container name - change if different
$containerName = "salesdata"

#get blob context
$Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName

#Blob count
Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")
#List blob files
$ListBlobs | Sort-Object -Property Name | Select-Object -Property Name


Get List Of Files in a Folder (Select-Object)

The following code uses a wildcard to filter the files in a blob container; in this case we are interested in January 2016 sales only. The result is the list of files in the 2016/01/ folder. The line to focus on is: $srcBlobFolder = "2016/01/*".

#Connect to Azure
Connect-AzAccount

#storage account
$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
#storage key
$StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
#Container name - change if different
$containerName = "salesdata"

#Blob folder/subfolder 
$srcBlobFolder = "2016/01/*"

#get blob context
$Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName -Blob $srcBlobFolder 

#Blob count
Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")
#List blob files
$ListBlobs | Sort-Object -Property Name | Select-Object -Property Name


Get List Of Files using Foreach Loop

In the previous two examples pipeline commands made it easy; on the other hand, a foreach loop brings more complexity to our code, but it provides more flexibility and control. The following example demonstrates an iterative loop through each folder, listing files one by one. Also, the list of blobs retrieved with the Get-AzStorageBlob cmdlet contains full file paths, so extracting the file name is performed using the Substring function.

    #Connect to Azure
    Connect-AzAccount

    #Storage account
    $StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
    #Storage key
    $StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
    #Container name - change if different
    $containerName = "salesdata"
    
    #get blob context
    $Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey
    
    $ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName
    
    #Blob count
    Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")
    
    foreach($bl in $ListBlobs)
    {
    
        ##Write-Host "Blob: " 
        Write-Host ("File Full Path: " + $bl.Name)
    
        Write-Host ("Folder Path: " + $bl.Name.Substring( 0, $bl.Name.LastIndexOf("/")+1) )
        
        #Extract file name from full path
        Write-Host ("File Name: " + $bl.Name.Substring( $bl.Name.LastIndexOf("/") + 1, $bl.Name.Length - $bl.Name.LastIndexOf("/")-1 ) )
        Write-Host ""
    
    }
    



PowerShell Examples

Below are some key examples of PowerShell storage management cmdlets; for more examples and further consideration, explore the full list at Az.Storage.

Example 1: List containers by name using wildcard

Get-AzStorageContainer -Name container*

Example 2: List blobs in a container by name using wildcard

Get-AzStorageBlob -Container "ContainerName" -Blob blob*


Example 3: Download Blob content by name

Get-AzStorageBlobContent -Container "ContainerName" -Blob "Blob" -Destination "C:\test\"

Example 4: Upload a file to a blob container

Set-AzStorageBlobContent -Container "ContosoUpload" -File ".\PlanningData" -Blob "Planning2015"


Example 5: Upload all files under the current folder to a blob container

Get-ChildItem -File -Recurse | Set-AzStorageBlobContent -Container "ContosoUploads"


Example 6: Download a file from an Azure file share

Get-AzStorageFileContent -ShareName "ContosoShare06" -Path "ContosoWorkingFolder/CurrentDataFile"


Example 7: Get an Azure file share

Get-AzStorageShare -Name "ContosoShare06"


Example 8: Create a storage context using an account name and key

New-AzStorageContext -StorageAccountName "ContosoGeneral" -StorageAccountKey "< Storage Key for ContosoGeneral ends with==>"


Example 9: Create an Azure file share

New-AzStorageShare -Name "ContosoShare06"


Example 10: Delete a blob from a container

Remove-AzStorageBlob -Container "ContainerName" -Blob "BlobName"


Solution Explanation

Let's review all of the steps used in the previous examples and have a quick walk through the individual PowerShell cmdlets.

1. Connect to Azure

Connecting to Azure is pretty straightforward: use the command below and an Azure sign-in window will appear on your screen.

Connect-AzAccount
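
If your account has access to multiple subscriptions, you may also need to select the one that holds your storage account; the subscription name below is a placeholder:

#Select the target subscription
Set-AzContext -Subscription "<<SUBSCRIPTION NAME OR ID>>"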

2. Connect to Azure Storage

Enter previously captured information: storage account name, key and container name.

$StorageAccountName = "<<STORAGE ACCOUNT NAME>>"
$StorageAccountKey = "<<STORAGE ACCOUNT KEY>>"
$containerName = "salesdata"

Create blob context - this step is required to authenticate against your Blob storage.

$Ctx = New-AzStorageContext $StorageAccountName -StorageAccountKey $StorageAccountKey

Get the list of blobs in your container, which will be placed in an object. From there you have access to all of their properties (see next section).

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName

3. Get The List of Blobs

Select the list of blob files using PowerShell pipelines. In this case I'm interested only in the Name property, but feel free to explore the others.

$ListBlobs | Sort-Object -Property Name | Select-Object -Property Name

Blob count is additional information that can be useful, and it can easily be extracted from the object collected earlier.

Write-Host ("Blob Count: " + $ListBlobs.Count + "`n")

4. Get the List of Files in a Folder

Based on the previous example, adding the -Blob $srcBlobFolder parameter filters the listing down to a specific folder, which means only the files from that folder will be downloaded.

$ListBlobs = Get-AzStorageBlob -context $Ctx -Container $containerName -Blob $srcBlobFolder 


5. Create Local Folder

The following code snippet checks if the folder exists and creates one if it doesn't.

$DestinationRootFolder = "C:\temp\sales_download\AllFiles\"

If(!(test-path $DestinationRootFolder))
{
      New-Item -ItemType Directory -Force -Path $DestinationRootFolder
}


6. Foreach Loop

Below is the final part, which loops over the files in subfolders and downloads each file. The code is described in the comments below.

#Loop through the files in a container
foreach($bl in $ListBlobs)
{
    #Get the full path name   
    $BlobFullPath = $bl.Name
    
    #Print out full path name
    Write-Host ""
    Write-Host ("File Full Path: " + $BlobFullPath)
    
    #Get blob folder path - without file name and extension
    $SourceFolder = $BlobFullPath.Substring( 0, $BlobFullPath.LastIndexOf("/")+1)
    Write-Host ("Source Folder Path: " + $SourceFolder)

    #Build destination path based on blob path - follows the same subfolder structure
    $DestinationFolder = ($DestinationRootFolder + $SourceFolder.Replace("/","\") ).Replace("\\","\")
    Write-Host ("Destination Folder Path: " + $DestinationFolder)

    #Create local folders - Force parameter will allow to create the folder path even the subfolders don't exist
    New-Item -ItemType Directory -Force -Path $DestinationFolder
          
    #Print out destination folder path
    Write-Host "Blob: " 
    $DestinationFilePath = $DestinationRootFolder + $BlobFullPath.Replace("/", "\")
    Write-Host ("Destination File Path: " + $DestinationFilePath)

    #Finally, download the file - use of previously collected variables
    Get-AzStorageBlobContent -Container $containerName -Blob $BlobFullPath -Destination $DestinationFilePath -Context $Ctx -Force

}



What's next

Here are just a few ideas on how to use and enhance the previously explained examples:

  • Upload the files - explore how to upload files to Blob using similar logic
  • Integrated Azure authentication without login required - in the previous examples we used Connect-AzAccount, which always requires login information, but there is a way to connect to Azure Blob directly. This is especially handy for automation, when no user interaction is needed to authenticate the connection (see the sketch after this list).
  • Automation - integrate the solution as a service (Azure Function, Azure Runbook, ETL)
  • AzCopy - command line utility to copy data from and to Azure Blob
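
For the non-interactive authentication idea above, here is a minimal sketch using a service principal; the application ID, client secret and tenant ID are placeholders you would create in Azure AD first:

#Non-interactive login with a service principal - all values below are placeholders
$secret = ConvertTo-SecureString "<<CLIENT SECRET>>" -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential ("<<APPLICATION ID>>", $secret)
Connect-AzAccount -ServicePrincipal -Credential $cred -Tenant "<<TENANT ID>>"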


Happy PowerShell coding!


Sunday, October 14, 2018

Azure Stream Analytics - Troubleshooting Issues


Stream Analytics monitoring and logs


There are several ways to monitor and track down load and all other sorts of issues that might occur in Azure Stream Analytics processing:
  1. Activity Log
  2. Diagnose and solve problems
  3. Metrics
  4. Alert Rules
  5. Diagnostic Logs
My favorite, and in most cases the most effective, way to troubleshoot event issues is definitely the Activity Log, which will be our focus here, along with the Metrics graphical tool and statistics.

Also, there are two other useful tools in the "Support + Troubleshooting" section of Stream Analytics jobs:
  1. Resource Health
  2. Job Diagram - check data driven debugging with Job Diagram




Other Resources


For more information on troubleshooting refer to: Troubleshooting resources on Microsoft Docs



Troubleshooting Time Management Issues


Because we are dealing with live streams and real-time loads, there are scenarios where events are not syncing in the desired way, and some of the events sent to Azure Stream Analytics may be delayed or dropped. This might be caused either by the application time of the events or by their early arrival between the client application and the Stream Analytics job.

For broader consideration, explore how to detect and resolve latency issues in the Microsoft documentation.

Activity Log - Troubleshooting Demo


Here is just a quick demo on how to monitor and track down event issues.
  1. Activity Log - Navigate to Stream Analytics job and click on "Activity Log" tab


  2. Explore filters and log history




  3. Activity Log filters - the entire error log might be overwhelming, so let's narrow down the results, to 24 hours in this case



  4. Error log - filter down "Event Severity" to errors only, and now we have a clearer picture of what is going on



  5. Review individual error logs - usually the summary section doesn't show enough information, or there is no error description at all, but the JSON normally reveals the right information



  6. JSON log output - in this case, there is an error of mismatched source and destination column types (System.Double vs System.Int32). This looks like a miss in the mapping between the input and output of the Stream Analytics job, and it can be resolved by fixing either of the data types. For this demonstration only a snippet of the JSON is shown below, but browsing the entire JSON document shows the detailed path to both inputs and outputs, as well as other information related to the resources used and the error message produced.



  7. More troubleshooting - after fixing the previous error there was still one more left, about a missing column in the destination. It seems that the destination column wasn't named properly, and renaming the column to match the name in the input resolves the issue.



Metrics


The Metrics section can be very useful in certain cases, like troubleshooting early/late arrivals, data conversion errors, input/output events, runtime errors, and so on.


Friday, September 14, 2018

DAX Date Calculations - Previous Period Combined Measure (Time Intelligence)

Intro

Whether the requirement is a combined time measure, a disconnected time dimension, or a similar date-related calculation, we can use time intelligence DAX functions and build our calculations in a relatively easy way.


Samples and Dev Environment

Environment:
  • SQL Server and SQL Server Analysis Services running under the same account
  • Visual Studio 2017
  • SQL Server 2016
Samples - Microsoft sample databases available on GitHub:
  • AdventureWorksDW2014
  • Tabular model

Final solution for this demo:

DAX Calculations


Calculation alternatives


To begin with DAX time measures: there are many time intelligence functions available in DAX, and calculations based on those functions can be built in different ways, which means different functions can produce different results.

In this case we will focus only on previous-period calculations, and to demonstrate DAX's diversity, here is an example of how to get the same results using different functions and calculation techniques:

Internet Total Sales PY:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    DATEADD ( 'Date'[Date], -1, YEAR )
)

Internet Total Sales PY2:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    PARALLELPERIOD ( 'Date'[Date], -1, YEAR )
)

Internet Total Sales PY3:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    PREVIOUSYEAR ( 'Date'[Date] )
)

In the given example, DATEADD, PARALLELPERIOD and PREVIOUSYEAR will return the same result and can be used interchangeably. However, in certain cases they may differ based on the dataset or on intermediate measures used (for example, with a partial month selected, DATEADD shifts only the selected dates while PARALLELPERIOD returns the entire parallel period), so the recommendation is to check the best option among these and test for the desired results.



Requirements


Business requirement: create one calculation that returns the previous period no matter which level of the date hierarchy is selected. For example, if a date is selected then return the previous day, if a month is selected then return the previous month, etc.

Problem: how to get an individual previous-period calculation from the Date table for each level and combine them into one calculation, due to the fact that there is no single DAX calculation available for this task.

Date hierarchy: Year-Semester-Quarter-Month-Date

Solution:
As mentioned previously, there are several ways to get the individual period calculations working. In this case we will use the DATEADD function, which works for most of our calculations.

Quick Solution


Previous day:
Internet Total Sales PD:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    DATEADD ( 'Date'[Date], -1, DAY )
)

Previous month:
Internet Total Sales PM:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    DATEADD ( 'Date'[Date], -1, MONTH )
)

Previous quarter:

Internet Total Sales PQ :=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    DATEADD ( 'Date'[Date], -1, QUARTER )
)

Calendar Semester ID - calculated column in the Date table
Calendar Semester ID = ('Date'[Calendar Year] *100) + 'Date'[Calendar Semester]
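For example, Calendar Year 2013 with Calendar Semester 2 yields (2013 * 100) + 2 = 201302, a numeric ID that sorts chronologically across years.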

Previous Semester ID - helper measure in the Date table; it works over the numeric semester ID, which combines year and semester into a single sortable value for the MAX function:
Previous Semester ID:=
CALCULATE (
    MAX ( 'Date'[Calendar Semester ID] ),
    DATEADD ( 'Date'[Date], -6, MONTH )
)

Internet Total Sales Semester:
Internet Total Sales Semester:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    FILTER (
        ALL ( 'Date' ),
        'Date'[Calendar Semester ID] = MAX('Date'[Calendar Semester ID])
 )
)


Internet Total Sales Previous Semester:
Internet Total Sales PS:=
CALCULATE (
    'Internet Sales'[Internet Total Sales Semester],
    PARALLELPERIOD ( 'Date'[Date], -6, MONTH )
)

Internet Total Sales Previous Year:
Internet Total Sales PY:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    DATEADD ( 'Date'[Date], -1, YEAR )
)

Calendar Level Selected - test measure created in the Date dimension to check which level of the hierarchy is selected:
Calendar Level Selected:=
SWITCH (
    TRUE (),
    HASONEVALUE ( 'Date'[Date] ), "Date selected",
    HASONEVALUE ( 'Date'[Month Calendar] ), "Month selected",
    HASONEVALUE ( 'Date'[Calendar Quarter] ), "Quarter selected",
    HASONEVALUE ( 'Date'[Calendar Semester] ), "Semester selected",
    HASONEVALUE ( 'Date'[Calendar Year] ), "Year selected",
    BLANK ()
)

Internet Total Sales Previous Period:
Internet Total Sales Previous Period:=
SWITCH (
    TRUE (),
    HASONEVALUE ( 'Date'[Date] ), [Internet Total Sales PD],
    HASONEVALUE ( 'Date'[Month Calendar] ), [Internet Total Sales PM],
    HASONEVALUE ( 'Date'[Calendar Quarter] ), [Internet Total Sales PQ],
    HASONEVALUE ( 'Date'[Calendar Semester] ), [Internet Total Sales PS],
    HASONEVALUE ( 'Date'[Calendar Year] ), [Internet Total Sales PY],
    BLANK ()
)



Previous Period Calculation - Explained


Previous month

The previous month calculation is relatively simple and follows the logic of the calculation below: use the CALCULATE function with a one-month offset back.

Internet Total Sales PM:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    DATEADD ( 'Date'[Date], -1, MONTH )
)

All other levels of the hierarchy except the previous semester can be calculated in the same way by changing the time frame parameter in the DATEADD function.



Previous Semester

There is more work around the semester because there is no built-in function available for semester calculations. In this case we need more calculations, and here is one way to get it done in DAX:
  1. Calendar Semester ID - calculated column in the Date dimension producing a numeric ID (year and semester combined), which will be used later in the MAX function and in comparisons, where a single sortable numeric value is needed.
  2. Previous Semester ID (Date table) - test measure that gets the previous semester by doing a 6-month offset into the past and taking the Semester ID for the given output of the previous filter.
  3. Internet Total Sales Semester - gets the sales for the semester ID in context. The FILTER function with ALL ( 'Date' ) takes off the filter selection on the currently selected semester and goes over all dates. In 'Date'[Calendar Semester ID] = MAX('Date'[Calendar Semester ID]), MAX is needed because without it the comparison would go over all repeated rows in the table, which DAX does not allow (a scalar is required); MAX brings back a single distinct value for comparison, and in this context I like to think of MAX as the current semester.
  4. Internet Total Sales PS - goes back in time 6 months and takes the [Internet Total Sales Semester] measure for that semester. In this case PARALLELPERIOD has been used instead of the FILTER we used before.

Calendar Semester ID = ('Date'[Calendar Year] *100) + 'Date'[Calendar Semester]

Previous Semester ID:=
CALCULATE (
    MAX ( 'Date'[Calendar Semester ID] ),
    DATEADD ( 'Date'[Date], -6, MONTH )
)

Internet Total Sales Semester:=
CALCULATE (
    'Internet Sales'[Internet Total Sales],
    FILTER (
        ALL ( 'Date' ),
        'Date'[Calendar Semester ID] = MAX('Date'[Calendar Semester ID])
 )
)


Internet Total Sales PS:=
CALCULATE (
    'Internet Sales'[Internet Total Sales Semester],
    PARALLELPERIOD ( 'Date'[Date], -6, MONTH )
)



Combined Previous Period Calculation

After all the hard work we've done on the individual calculations comes the challenge of putting them all together into one calculation.

Test Date Hierarchy Level Selected

The following example demonstrates how to recognize the date hierarchy level based on the filter selection. SWITCH, which has been used here, can be treated as multiple IF functions, and HASONEVALUE detects whether a certain period is selected, either from the filter selection or because it's present in the row context.


Calendar Level Selected:=
SWITCH (
    TRUE (),
    HASONEVALUE ( 'Date'[Date] ), "Date selected",
    HASONEVALUE ( 'Date'[Month Calendar] ), "Month selected",
    HASONEVALUE ( 'Date'[Calendar Quarter] ), "Quarter selected",
    HASONEVALUE ( 'Date'[Calendar Semester] ), "Semester selected",
    HASONEVALUE ( 'Date'[Calendar Year] ), "Year selected",
    BLANK ()
)



Previous Period Measure

Finally, based on all of the individual measures and the SWITCH-based level detection we've created earlier, we can easily get our combined previous-period calculation.

Internet Total Sales Previous Period:=
SWITCH (
    TRUE (),
    HASONEVALUE ( 'Date'[Date] ), [Internet Total Sales PD],
    HASONEVALUE ( 'Date'[Month Calendar] ), [Internet Total Sales PM],
    HASONEVALUE ( 'Date'[Calendar Quarter] ), [Internet Total Sales PQ],
    HASONEVALUE ( 'Date'[Calendar Semester] ), [Internet Total Sales PS],
    HASONEVALUE ( 'Date'[Calendar Year] ), [Internet Total Sales PY],
    BLANK ()
)