Searches Using SUBSET

The SUBSET operator allows subset queries, and is typically used with multi-value (i.e. strlst) fields that are treated as sets. It is true if the left-side is a subset of the right-side, i.e. if there are no values on the left-side that are missing from the right-side. Duplicates count, i.e. they must match one-to-one from left side to right.

For example, suppose the table Users contains one row per user (UserName), and has a strlst column GroupMembership that lists all the groups that row's user is a member of. To find all users that are members of groups "Management", "Sales" and "Marketing", a SUBSET query can be used:

SELECT UserName
     FROM   Users
     WHERE  ('Management', 'Sales', 'Marketing')
          IS SUBSET OF GroupMembership;

(Syntactically, SUBSET is always used as part of the phrase IS SUBSET OF, as it is only valid in WHERE clauses.) The above query will return the users that are members of all three groups - including any users that may also be members of additional groups.

Note that SUBSET is not commutative, i.e. if the left- and right-sides are reversed, the meaning is changed (unlike e.g. INTERSECT). If A is a subset of B, then B is not necessarily a subset of A; B is a subset of A if and only if both sets contain the same values. E.g. this query:

SELECT UserName
     FROM   Users
     WHERE  GroupMembership
          IS SUBSET OF ('Management', 'Sales', 'Marketing');

while merely the reversed version of the earlier query, behaves differently: it would list the users whose are in zero or more of the Management, Sales or Marketing groups - and are not in any other groups.

In set logic the empty set is a subset of any set; thus if there are no values on the left-side, SUBSET is true no matter what the right-side value(s) are. Note that SUBSET interprets an empty varchar value as empty-set, not single-item empty-string set set (as IN does). See here for details, as well as additional behaviors that IN, SUBSET and INTERSECT share in common. SUBSET was added in Texis version 7.

Index Usage by SUBSET

A SUBSET query can often utilize a regular (B-tree) index to increase performance. Generally the index should be created with indexvalues set to splitstrlst (the default), as this enables individual values of strlsts to be accessed as needed. There are some limitations and caveats for SUBSET and indexes however:

  • Empty parameter, strlst column (either side): Queries with empty-set parameters (i.e. zero-item strlst, or empty varchar) and a strlst column cannot use an indexvalues=splitstrlst index, regardless of which side of SUBSET the parameter and column are on. An index with indexvalues=all can be used however. It may be created in addition to the normal indexvalues=splitstrlst index, and the Texis optimizer will choose the appropriate one at search time.

  • Empty strlst column left-side, non-empty parameter right-side: With a strlst column on the left-side, and a non-empty parameter on the right, empty rows will not be returned if an index is used - even though they properly match (as empty set is a subset of any set).

These caveats are due to limitations in indexvalues=strlst indexes; see here for more information.


Copyright © Thunderstone Software     Last updated: Apr 15 2024
Copyright © 2024 Thunderstone Software LLC. All rights reserved.