Thursday, January 25, 2018

Advanced SQL queries in WFA Filters

[Article from Christian Bauernfeind]

In a workflow we built recently, we needed to know the combined size of all LUNs in a given volume.  The SQL to calculate that number is fairly straightforward:

SELECT volume.name,SUM (lun.size_mb)
FROM cm_storage.volume JOIN cm_storage.lun ON lun.volume_id = volume.id
GROUP BY lun.volume_id

However, how can we return this information.  And why would this even be a big deal ?

To answer this question in detail, you should first understand how WFA filters work.

What we want : return a custom value from a filter

So we have this SQL query :

SELECT volume.name,SUM (lun.size_mb)
FROM cm_storage.volume JOIN cm_storage.lun ON lun.volume_id = volume.id
GROUP BY lun.volume_id

Attempt 1 : use a filter as always 


How can you actually access the result of that query in WFA?  One idea we came up with was to package it in a Filter (that is where SQL code goes, after all).  As it turns out, a proper filter needs you to return a fixed set of keys (more on that later) and the result looks something like this:

SELECT
    lun.full_path AS 'full_path',
    volume.name AS 'volume.name',
    vserver.name AS 'volume.vserver.name',
    cluster.primary_address AS 'volume.vserver.cluster.primary_address',
    SUM(lun.size_mb) AS size_mb
FROM
    cm_storage.lun
JOIN
    cm_storage.volume
        ON lun.volume_id = volume.id
JOIN
    cm_storage.vserver
        ON volume.vserver_id = vserver.id
JOIN
    cm_storage.cluster
        ON vserver.cluster_id = cluster.id
WHERE
    volume.name = '${VolumeName}'
    AND vserver.name = '${VserverName}'
    AND (
        cluster.name = '${ClusterName}'
        OR cluster.primary_address = '${ClusterName}'
    )
GROUP BY
    lun.volume_id


Note that the filter returns the keys "full_path", "volume.name", "volume.vserver.name" and "volume.vserver.cluster.primary_address", which are all required for an object of dictionary type "LUN", plus one extra value, size_mb, which is an attribute in the dictionary marked as "Can be NULL".

Problem : WFA does a re-lookup

The filter works perfectly fine when tested and shows the correct aggregate size.  But used in a workflow, it returns an incorrect value.  In fact, it returns the size of the first LUN in the list (it also returns its name).

WTF, WFA?

Explanation : how WFA filters work

Any filter is required to return the set of natural keys for its dictionary type.  It may return any other attributes of that same dictionary type, but they are immediately discarded

Instead, the natural keys are used to look up the object again and return the current value of that attribute."

Attempt 2 : Trial & Error 

So what if we return a set of natural keys that do not refer to an actual object? After all, we are not interested in a particular LUN, just the number that represents their combined size.  So we can rewrite the first few lines of the query to

SELECT
'all_luns' AS 'full_path'
...

This works, and you can actually access the full_path attribute of your dictionary object and get the result "all_luns".  But now size_mb is empty.  Apparently, WFA still does the lookup and returns the value it found - NULL.  No backdoors here.

Attemp 3 : Abuse the object "Annotation"

This might have been the end of that idea, but one thing we just learned is that WFA hangs on to the natural keys returned from a filter and does not perform a lookup on them.  So what if we could package our desired value in a natural key?  Unfortunately, LUN does not have a natural key that lends itself to storing a number (although we probably could force the issue).  But there is a much easier way.  WFA already comes with a dictionary type perfect for key/value pairs - Annotation.

As it turns out, the Annotation dictionary type, which is designed (but not used, AFAIK) to carry volume and vserver annotations from ONTAP, has two keys, "name" and "value", which are conveniently both natural keys.  And WFA does not discard natural keys.

So we rewrite the filter from above as one that returns a cm_storage.annotation object:

SELECT
    'lun_size' AS name,
    SUM (lun.size_mb) AS value
FROM
    cm_storage.lun
JOIN
    cm_storage.volume
        ON lun.volume_id = volume.id
JOIN
    cm_storage.vserver
        ON volume.vserver_id = vserver.id
JOIN
    cm_storage.cluster
        ON vserver.cluster_id = cluster.id
WHERE
    volume.name = '${VolumeName}'
    AND vserver.name = '${VserverName}'
    AND (
        cluster.name = '${ClusterName}'
        OR cluster.primary_address = '${ClusterName}'
    )
GROUP BY
    lun.volume_id

Wrap up

Note that the term "cm_storage.annotation" does not appear anywhere in the query.  Still, the query needs to be configured as returning an object of that type, and pointed to an annotation object in a "Search or define" step.  Fed with a volume, vserver and cluster name, we can then access the value "annotation1.value" anywhere in our workflow.

Also note that this does not in any way rely on your annotation table to be populated, or even to exist.  We are merely using a container of dictionary type "Annotation" to store a key/value pair.  That pair may be populated with anything that SQL can generate.

No comments :

Post a Comment