Saturday, November 19, 2016

WFA Datasource Template

If you have been working with WFA for a while, you will eventually need more advanced workflows that require a custom scheme and dictionaries.  And most likely you will want to import external data into those custom dictionaries.  To do that, you will need a custom datasource.
You can import data using SQL queries, assuming your data lives in an external database.  Or you can create a script-based datasource (Perl or PowerShell).  The first time you create such a datasource, you will lose a lot of time figuring out how it works: you need to create a CSV file for each dictionary, and each CSV file must have a very specific layout.  I've created my share of datasources (always PowerShell), and after a couple of them I'd had enough of building the same thing over and over.  So in the end I looked at how I could automate this and created some functions to generate the CSV files, with automatic field sorting, logging and more.  I wrapped it all up in a single template.
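For reference, the layout those CSV files must have is rigid: one file per dictionary, tab-delimited, no header row, no quotes around values, LF (not CRLF) line endings, UTF-8 encoding, and \N for a NULL id column (this is exactly what the ConvertTo-WfaCsv function in the template produces).  A single row for a hypothetical volume dictionary with columns id, name, aggregate_id and size_mb would look like this (the separators are literal tabs, and the aggregate_id value is a hashed foreign key):

```
\N	vol_data01	-1423572893	102400
```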

Download the code.  I will add a datasource tutorial soon.
#!/powershell
#
<# WFA Data Source
  
   DESCRIPTION:
        this is what it does : add your own description
    
   PREREQUISITES and NOTES:
        You must change the SchemeName below
        Interactive mode is for testing in a powershell cli prompt
        Interactive mode enables debug by default
        Interactive mode auto-loads the profile.ps1 of WFA
        Debug mode can be triggered by setting the datasource port number to '1'
        Go to the CUSTOM CODE part for more information
 
   VERSION HISTORY:
        1.0.0 : (mirko) initial code
        1.0.1 : (mirko) bugfix parsetable
        1.0.2 : (mirko) small aesthetic change
        1.0.3 : (mirko) bugfix when only 1 dictionary
        1.0.4 : (mirko) bugfix encoding utf8
        1.0.5 : (mirko) bugfix wrong Csv linebreaks (must be lf instead of crlf)
 
   AUTHOR:
        Mirko Van Colen (mirko@netapp.com)
  
#>
 
 
# ================================================================
# ==   Scheme name : you must edit this
# ================================================================
 
$schemeName = "YOUR_SCHEME"   # the scheme you want to 'datasource'
 
# ================================================================
# ==   Variables you may want to edit
# ================================================================
 
$interactive = $false               # set this if you are running locally in the shell
$sendmail = $false                  # mail the logfile & csv files
$debug=$false                       # show debug info / note interactive enables this by default

$hostname = "YOUR_HOSTNAME_VARIABLE_FOR_INTERACTIVE_MODE"
 
# IF sendmail is $true you should set the following mail variables, otherwise can be ignored
 
$mailTo   = "to@netapp.com"        # who you want to send acquire debug emails to
$mailFrom = "from@netapp.com"      # who the email will appear to come from
$mailSmtp = "mail.netapp.com"       # a reachable and working smtp mail server
  
# ================================================================
# ==   Get Environment variables (script location, host & port)
# ==   Note : set the datasource port to 1, to trigger debug mode
# ================================================================
 
if(!$interactive){
    Set-Variable -Name SCRIPT_PATH -Value (Split-Path (Resolve-Path $myInvocation.MyCommand.Path)) -Scope local
}else{
    # auto load wfa profile
    $profilecheck = Get-Item function: | ?{$_.Name -eq "Connect-WfaCluster"}
    if(-not $profilecheck){
        Write-Host "Loading WFA profile..." -ForegroundColor Yellow
        cd 'C:\Program Files\NetApp\WFA\PoSH\'
        . '.\profile.ps1'
    }
    Set-Variable -Name SCRIPT_PATH -Value 'C:\Program Files\NetApp\WFA\jboss\standalone\tmp\wfa' -Scope local
}
 
if(!$interactive){
    $port = Get-WfaRestParameter "port"
    $hostname = Get-WfaRestParameter "host"
}else{
    $port = 1    # port number 1 enables debugging
}
 
$dllLocation = $SCRIPT_PATH + "\..\..\..\..\Posh\Modules\DataOntap\log4net.dll"
$logfile = $SCRIPT_PATH + "\..\..\log.war\jboss\" + $schemeName + ".log"
if($port -eq 1){
       $debug=$true
}
 
# ================================================================
# ==   Prep logging - a log file is created by default, with your scheme name
# ================================================================
 
# Initialize log4net
[void][Reflection.Assembly]::LoadFrom($dllLocation)
$pattern="%d %w %-5p %c : %m%n"
[log4net.LogManager]::ResetConfiguration()
New-Item -Path $logFile -type file -ErrorAction SilentlyContinue
$Appender = new-object log4net.Appender.FileAppender
$Appender.File = $logFile
$Appender.Layout = new-object log4net.Layout.PatternLayout($pattern)
if($debug){
       $Appender.Threshold = [log4net.Core.Level]::Debug
}else{
       $Appender.Threshold = [log4net.Core.Level]::Info
}
$Appender.ActivateOptions()
[log4net.Config.BasicConfigurator]::Configure($Appender)
$logg = [log4net.LogManager]::GetLogger("[$schemeName]")
#
 
# ================================================================
# == 5 Public logging functions - Debug,Info,Warning,Error & Fatal
# == If you set interactive true, you'll get host-feedback
# ================================================================
 
# LOG INFO
function LogInfo($t){
   if($interactive){
      Write-Host $t -ForegroundColor Yellow
   }
   $logg.Info($t)
}
 
# LOG DEBUG
function LogDebug($t){
   if($interactive){
      Write-Host $t -ForegroundColor Cyan
   }
   $logg.Debug($t)
}
 
# LOG WARN
function LogWarn($t){
   if($interactive){
      Write-Warning $t
   }
   $logg.Warn($t)
}
 
# LOG ERROR
function LogError($t){
   if($interactive){
      Write-Host $t -ForegroundColor Red
   }
   $logg.Error($t)
}
 
# LOG FATAL (throws error & exit)
function LogFatal($t){
   if($interactive){
      Write-Host $t -ForegroundColor Magenta
   }
   $logg.Fatal($t)
   throw $t
}
 
# ================================================================
# ==   Private - WFA Datasource conversion helper functions
# ================================================================
 
# converts a psobject to wfa csv
function ConvertTo-WfaCsv($psobj,$csvpath){
    try {
        New-Item -Path $csvpath -type file -force | Out-Null
    } catch [System.Exception] {
        $msg = "Data Source: Could not create output file path: $($_.Exception)"
        LogFatal($msg)
    }
    if($psobj){
        $csv = $psobj | convertto-csv -NoTypeInformation -Delimiter "`t"
        $csv = $csv | %{$_ -replace '"'} | select -skip 1
        $csv = $csv | %{[System.IO.File]::AppendAllText((resolve-path $csvpath), "$_`n",[System.Text.Encoding]::UTF8)}
    }
} 

  
# booleanstring to int
function BoolToInt($b){
       if($b -match "true|yes"){
             return 1
       }else{
             return 0
       }
}
 
# check if the scheme is ok
function CheckScheme($schemeName){
    # lets connect to mysql and check the scheme first
    try{
        LogInfo("Looking up the scheme '$schemeName' in MySql")
        $rsTables = Invoke-MySqlQuery "SHOW tables FROM $schemeName"
    }catch{
        LogError("Could not find scheme $schemeName")
        LogError("Maybe reset the scheme $schemeName ?")
        LogFatal("Failed to verify the scheme $schemeName, Does it exist?")
    }
    [System.Collections.ArrayList]$tableListMySql = @()
  
    # are there any tables in the scheme ?
    if($rsTables[0] -gt 0){
        $rsTables | select -Skip 1 | %{$tableListMySql+=$_[0]}
        LogDebug("Found $($rsTables[0]) tables in MySql")
        $tableListMySql | %{$i=0}{$i++;LogDebug("($i) $_");}
    }else{
        LogFatal("No tables found in this scheme")
    }
    LogInfo("Scheme $schemeName is ok")
    return $tableListMySql
}
 
# ================================================================
# ==   Public - Master Conversion Function
# ================================================================
 
# check if a dictionary is matching your psobject, and reformat if needed for WFA
function CheckDictionary($Dictionary,$Table){
 
    LogInfo("Processing dictionary $Dictionary")
 
    $fieldList = @()
  
    # get the table fields
    LogDebug("Looking up the fields for table $schemeName.$Dictionary")
    $rsFields = Invoke-MySqlQuery "SHOW fields FROM $schemeName.$Dictionary"
  
    # are there any fields in the table ?
    if($rsFields[0] -gt 0){
       $fields = $rsFields |select -Skip 1
       $fieldList = $fields | %{$_.Field}
        LogDebug("Found $($rsFields[0]) fields")
        $fields | %{$i=0}{$i++;LogDebug("($i) $($_.Field) ($($_.Type))");}
    }else{
        LogFatal("No fields found in table $schemeName.$Dictionary")
    }
    $selectOrderedId = @()
    # make a correct ordered select
    foreach($f in $fields){
       if($f.Type -eq 'TinyInt(1)'){
             $exp = [scriptblock]::Create('BoolToInt($_."' + $f.Field + '")')
       }else{
               $exp = [scriptblock]::Create('$_."' + $f.Field + '"')
       }
        $selectOrderedId += @{name=$f.Field;e=$exp}
    }
    $selectOrderedNoId = @()
    # make a correct ordered select
    foreach($f in $fields){
        if($f.Field -eq "id"){
            $exp = [scriptblock]::Create("`"\N`"")
        }else{
             if($f.Type -eq 'TinyInt(1)'){
                    $exp = [scriptblock]::Create('BoolToInt($_."' + $f.Field + '")')
             }else{
                    $exp = [scriptblock]::Create('$_."' + $f.Field + '"')
             }
        }
        $selectOrderedNoId += @{name=$f.Field;e=$exp}
    }
    $outFile = ".\$Dictionary.csv"    # write results to intermediate CSV file
    $global:attachList += $outFile     # add CSV file to list of debug attachments
  
    # check fields
    if($Table){
        LogDebug("Comparing fields with MySql table fields")
        $tableFields = ($Table | gm -MemberType NoteProperty | select Name | %{$_.Name})
        if($tableFields -and $fieldList){
            $comp = Compare-Object -ReferenceObject $fieldList -DifferenceObject $tableFields
            if($comp){
                $logg.Warn("Possible mismatch found between fields (if missing id, we'll add NULL-values)")
                $comp | ?{($_.SideIndicator -eq '<=') -and ($_.InputObject -ne 'id')} | %{$i=0}{$i++;LogError("($i) Missing field '$($_.InputObject)' in $dictionary")}
                $comp | ?{($_.SideIndicator -eq '=>')} | %{$i=0}{$i++;$logg.Warn("($i) Extra field '$($_.InputObject)' in $dictionary")}
                if($comp | ?{($_.SideIndicator -eq '<=') -and ($_.InputObject -ne 'id')}){
                    LogFatal("Mismatch found between fields.  Check the log in the log-viewer for more info.")
                }
            }else{
                LogDebug("All fields match")
            }
        }else{
            LogFatal("There are no fields in table $Dictionary")
        }
  
        # we check if it has a primary key ("id")
        $haspk = $Table | gm -Name "id"
        if($haspk){
            $table = @($Table | select -Property $selectOrderedId)
            LogDebug("We found an id, starting to hash it")
            # if it does, we'll hash it
            for($i=0;$i -lt $table.Count;$i++){
                $table[$i].id = $table[$i].id.GetHashCode()
            }
        }else{
            # if not, we add \N
            LogDebug("No id found, we'll add one with \N")
            $table = @($Table | select -Property $selectOrderedNoId)
        }
  
        # we check if it has one or more foreign keys
        $hasfk = $table | gm -Name "*_id"
        if($hasfk){
            LogDebug("1 or more foreign keys found, starting to hash")
            # if it does, loop them and hash them
            $fkcount = 0
            foreach($fk in $hasfk){
                $fkcount++
                LogDebug("($fkcount) fk = $($fk.Name)")
                for($i=0;$i -lt $table.Count;$i++){
                    $table[$i]."$($fk.Name)" = ($table[$i]."$($fk.Name)").GetHashCode()
                }
            }
        }
    }else{
        LogWarn("This table is empty")
    }
    # convert to WFa readable csv
    LogDebug("Start converting the table to csv '$outFile'")
    ConvertTo-WfaCsv -psobj $table -csvpath $outFile
 
}
 
# ================================================================
# ==   Public - Helper functions
# ================================================================
 
# parse a string list to a psobject array
function parseList(
    [string[]]$inputstring,
    [string]$startregex,
    [string]$matchregex="^\s*(.*) : (.*)$",
    [switch]$cleanheader=$true
    )
{
 
    if(-not $inputstring){
       LogWarn("Input is empty (parseList)")
       return $null
    }
 
    # define variables
    $o = $null
    $key = $null
    $value = ""
    $list = @()
    $firststart = $false
 
    # loop the file
    foreach($l in ($inputstring -split "`r`n")){
        # reset key
       $key=$null
        
       # find start
       if($l -match $startregex){
        
           # set flag, start found
           $firststart=$true
           #Get-WFALogger -Info "start found"
           if($o){
               $o
           }
        
           $o = New-Object -TypeName PSObject
        
       }
        
       # start found ?
       if($firststart){
           # find overview data
           if($l -match $matchregex){
               $key = [string]$Matches[1]
               if($cleanheader){
                   $key = $key -replace "\W",""
               }
               $value = $Matches[2]
               $o | Add-Member $key $value
           }
       }
 
    }
 
    # save dirty cache
    if($o){
        $o
    }
}
 
# parse a string table to a psobject array
function parseTable([string[]]$inputstring,[switch]$cleanheader=$true){
 
   #write-verbose "start"
    if(-not $inputstring){
       LogWarn("Input is empty (parseTable)")
       return $null
    }
 
    # break in lines
    [string[]]$lines = $inputstring -split "`r`n" | ?{([string]$_).TrimEnd() -ne ""}
 
    # search the breakline
    $breakposition=0
    $lines | select -first 5 | %{$i=1}{if($_ -match "----"){$breakposition=$i};$i++}
 
    if(-not $breakposition){
       LogFatal("Cannot parse as table : " + $inputstring)
       #Write-Warning("cannot parse - no breakline found")
    }
 
    $breakline = $lines | where {$_ -ne ""} | select -skip ($breakposition-1) | select -first 1
    $header = @($lines | select -first ($breakposition-1))
    $data = @($lines | select -skip $breakposition)
 
    # find break positions
    $cols = ([regex]'(?<!-)+-').Matches($breakline)
    $splitregex = ($cols | select -skip 1 | %{$x=0}{"(.{$($_.Index-$x)})";$x=$_.Index}) -join ""
    # split the names
    $headercounter=1
    $namecounter=0
    $names = @()
    foreach($h in $header){
        $temp = @($h -split $splitregex | select -skip 1 | %{([string]$_).Trim()})
        if($headercounter -eq 1){
            $headercounter=0
            $namecounter = $temp.Length
            $names = $temp
        }else{
            0..($namecounter-1) | %{$names[$_]+=$temp[$_]}
        }
    }
 
    if($cleanheader){
      $names = $names | %{$_ -replace "\W",""}
    }
 
    # split the data
    for($j=0;$j -lt $data.count;$j++){
        $l = $data[$j]
        $info = @($l -split $splitregex | select -skip 1)
        if($info.count -eq 0){
            # overflow - we take the full line instead
            $info=@($l)
            $tempSplitregex = ($cols | select -skip 1 | %{$x=0}{if($_.Index -le $l.length){"(.{$($_.Index-$x)})";$x=$_.Index}}) -join ""
           #write-verbose "regex [$tempSplitregex]"
            if($tempSplitregex){
                $info = @($l -split $tempSplitregex | select -skip 1)
            }
        }
       #write-verbose($info -join "|")
        # make properties
        $hash = [ordered]@{}
        for ($i=0;($i -lt $names.count);$i++) {
            $overflow=$false
            $k=$i
            if($info.count -eq 1){
                $overflow=$true
                $value=$info[0]
               #write-verbose("overflow too small")
            }
            if($i -lt $info.count){
                $value = $info[$i]
                # we take a normal indexed value
            }else{
                # there was a partial split and we are over the index
                $value=""
                $overflow=$true
            }
            # value ?
           #write-verbose("==> [$value]")
            if($value){
                # as long as a value in spread over multiple indexes, stitch them
               #write-verbose("[$(([string]$value).ToString().Trim().Length)][$(([string]$value).Length)]")
 
                While((([string]$value).ToString().Trim().Length -eq ([string]$value).Length) -and (($k+1) -lt $info.count) -and -not ([string]$info[$k+1]).StartsWith(" ")){
                    $k++
                    $value+=$info[$k]
                    $overflow=$true
                   #write-verbose("overflow [$i][$k][$value]")
                }
                # add the value
                $hash.add($names[$i],$value.ToString().Trim())
            }else{
                # add a blank
                $hash.add($names[$i],"")
            }
            # if we have overflow, we need to pick the next row
            if($overflow -eq $true){
                # we are not done yet, take the next row
                $j++
                $l = $data[$j]
                $info = @($l -split $splitregex | select -skip 1)
                if($info.count -eq 0){
                    # overflow - we take the full line instead
                    $info=@($l)
                    $tempSplitregex = ($cols | select -skip 1 | %{$x=0}{if($_.Index -le $l.length){"(.{$($_.Index-$x)})";$x=$_.Index}}) -join ""
                   #write-verbose "regex [$tempSplitregex]"
                    if($tempSplitregex){
                        $info = @($l -split $tempSplitregex | select -skip 1)
                    }
                }
               #write-verbose($info -join "|")
            }
        }
        # add object
        $o = New-Object -Property $hash -TypeName PSObject -Verbose:$false
        $o
    }
 
}
 
# Run a dos cmd
function RunDosCmd($cmd,$errorRegex="^Error",$errorCheckLines=1,$fatalRegex="^Error",$fatalCheckLines=0){
    LogDebug("running $cmd")
    $output = (cmd /c "$cmd")
    $status = "success"
    [hashtable]$result=@{}
    # search for fatal
    $fatalsearch = $output | select -first $fatalCheckLines
    foreach($e in $fatalsearch){
        if($e -match $fatalRegex){
            LogError("Error : $cmd")
            LogFatal($output)
        }
    }
    # search for errors
    $errorsearch = $output | select -first $errorCheckLines
    foreach($e in $errorsearch){
        if($e -match $errorRegex){
            LogError("Error : $cmd")
            $status="error"
        }
    }
    # return
    $result.Add("status",$status)
    $result.Add("output",$output)
    return $result
}
  
# Ensure that dates are always returned in English
[System.Threading.Thread]::CurrentThread.CurrentCulture="en-US"
 
# Start & check scheme
LogInfo("===============================================================")
LogInfo("BEGINNING Acquisition (set portnumber to '1' for debug logging)")
LogInfo("===============================================================")
  
$global:attachList = @()
$global:attachList += ".\datasource.log"   # make the first debug-email attachement be the log-file itself
 
[System.Collections.ArrayList]$tableListMySql = @(CheckScheme -SchemeName $schemeName)
  
# ==============================================================================================
# ======================================== START CUSTOM CODE ===================================
#
# How to start ?
# --------------
# 1) Write your code here to collect the data needed for your datasource
#    Steps :
#       1) Create a psobject-array for each dictionary
#          - Fields : your psobject properties must 1-on-1 match the dictionary attributes
#          - Relations between dictionaries :
#              - add a property "id" in the master-psobject (=primary key)
#              - add a property "xxxxxxx_id" in the child-psobject (=foreign key)
#              - assign unique values to these id's and make sure they match 100%
#              - these unique values may be strings, most likely the 'name'-property
#              - these id's will be hashed for you in the end to numeric values
#              - Example : relation volume - aggregate
#                   volume would have an attribute 'aggregate' in the dictionary, of type "Aggregate". 
#                   To tell the datasource there is a relation we :
#                   - add "id" to the aggregate object (value must be unique = primary key)
#                   - add "aggregate_id" to the volume object (use the same unique value = foreign key)
#       2) Run for each psobject-array the function :
#
#          --> CheckDictionary <-- (2 parameters : the name of the dictionary and your ps object array)
#            This will check & reformat your ps object array to match the dictionary :
#            - advanced logging & field checking (fatal if fields are not present)
#            - auto field sorting to match the dictionary field order
#            - auto convert boolean strings (true/yes) to 1
#            - auto hashing of primary & foreign key values ("id" & "xxx_id")
#            - adding NULL values in case no "id" is present
#            - auto create WFA-compatible CSV files
#
# 2) Helper functions :
#
#    - RunDosCmd (in case you need to run a non powershell command, output is returned)
#      Optional Regex-based error & fatal detection
#      Result has 2 properties (Status & Output)
#      So get output with $result.output
#    - ParseList (a textual list parser to psobject array)
#      Example string :
#         id : 1
#         name : Netapp1
#         type : FAS8080
#      ParseList -inputstring $mystring  -startregex "^\s*id : .*$"
#        will result in a psobject array with properties "id","name" and "type"
#    - ParseTable (a textual table parser to psobject array)
#      Example string :
#         id     name    type
#         --     ----    ----
#         1      Netapp1 FAS8080
#         2      Netapp2 FAS8060
#      ParseTable -inputstring $mystring
#        will result in a psobject array with properties "id","name" and "type"
#
# 3) Logging functions :
#    - LogDebug
#    - LogInfo
#    - LogWarn
#    - LogError
#    - LogFatal (throws exception)
#
# 4) Notes & Tips
#    - The current directory must not change for WFA to find the CSV files, so avoid "cd" commands
#    - You can also use the native WFA helper functions such as Invoke-MySqlQuery
#   
#
# ==============================================================================================
  
 
# CREATE PSOBJECT ARRAYS FOR EACH DICTIONARY-TYPE IN THE SCHEME
# Something like :
# ... $Dictionary1 = @()
# ... gather info
# ... Loop
# ...... $new = "" | select prop1,prop2,prop3
# ...... $new.prop1 = "xxx"
# ...... $new.prop2 = "yyy"
# ...... $new.prop3 = "zzz"
# ...... $Dictionary1 += $new
# ... End Loop
 
# RUN DICTIONARY CHECKS FOR EACH DICTIONARY-TYPE IN THE SCHEME
# for each dictionary, run the CheckDictionary function
# ... CheckDictionary -dictionary "dictionary1" -table $Dictionary1
 
 
# ==============================================================================================
# ======================================== END CUSTOM CODE   ===================================
# ==============================================================================================
 
# close and send mail if needed
LogInfo("Acquisitions COMPLETE")
if ($sendmail)
{
    Copy-Item $logFile ".\datasource.log"
    $bodyText = "See attachments.`n`nNOTE: Scroll to the end of the attached datasource.log file to see the log of this execution of the $schemeName Data Source.`n`n"
    Send-MailMessage -to $mailTo -from $mailFrom -Subject "WFA $schemeName Data-Source: Log and CSV Attachments" -body $bodyText -attachments $global:attachList -smtpServer "$mailSmtp"
}
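As a rough illustration (not part of the template itself), here is how the CUSTOM CODE section might be filled in for a hypothetical scheme with a master "aggregate" dictionary and a child "volume" dictionary.  The dictionary names, properties and values are invented for the example; CheckDictionary and the id / xxx_id convention come from the template above:

```powershell
# --- hypothetical example of the CUSTOM CODE section ---

# master dictionary : aggregate (gets an "id" = primary key)
$Aggregates = @()
foreach($aggrName in @("aggr1","aggr2")){
    $new = "" | select id,name
    $new.id   = $aggrName        # unique string; CheckDictionary hashes it to a number
    $new.name = $aggrName
    $Aggregates += $new
}

# child dictionary : volume (gets an "aggregate_id" = foreign key)
$Volumes = @()
$new = "" | select name,aggregate_id,size_mb
$new.name         = "vol_data01"
$new.aggregate_id = "aggr1"      # must match an aggregate "id" value 100%
$new.size_mb      = 102400
$Volumes += $new

# let the template check, reorder, hash and convert each array to a CSV file
CheckDictionary -Dictionary "aggregate" -Table $Aggregates
CheckDictionary -Dictionary "volume"    -Table $Volumes
```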

1 comment :

  1. Note that this is an updated version (1.0.3), a minor bugfix: when your scheme contained only 1 dictionary, there was a failure (the typical PowerShell single-item array issue).
