Server Functions

The SQL module and engine has a number of functions built into it which can operate on fields within a SQL statement.

The functions below may be called anywhere that a field or an expression normally occurs in a SQL statement. Arguments to a function can be either a single field name, another function or another expression.

General Functions

exec

Execute an external command. The syntax is

exec(commandline[, INPUT[, INPUT[, INPUT[, INPUT]]]]);

Allows execution of an external command. The first argument is the command to execute. Any subsequent arguments are written to the standard input of the process. The standard output of the command is read as the return from the function.

For example this could be used to extract text from a pdf.

UPDATE     DOCUMENTS
SET        TEXT = exec('pdftotext '+PDFFILE)
WHERE      TEXT = '';

mminfo

This function lets you obtain Metamorph info. You have the choice of either just getting the portions of the document which were the hits, or you can also get messages which describe each hit and subhits.

The SQL to use is as follows:

SELECT mminfo(query,data[,nhits,[0,msgs]]) from TABLE
       [where CONDITION];
query
Query should be a string containing a Metamorph query.
data
The text to search. May be literal data or a field from the table.
nhits
The maximum number of hits to return. If it is 0, which is the default, you will get all the hits.
msgs

An integer; controls what information is returned. A bit-wise OR of any combination of the following values:

  • 1 to get matches and offset/length information
  • 2 to suppress text from data which matches; printed by default
  • 4 to get a count of hits (up to nhits)
  • 8 to get the hit count in a numeric parseable format
  • 16 to get the offset/length in the original query of each search set

Set offset/length information (value 16) is of the form:

Set N offset/len in query: setoff setlen

Where N is the set number (starting with 1), setoff is the byte offset from the start of the query where set N is, and setlen is the length of the set.

Hit offset/length information is of the form:

300 <Data from Texis> offset length suboff sublen [suboff sublen]..
301 End of Metamorph hit

Where:

  • offset is the offset within the data of the overall hit context (sentence, paragraph, etc…)
  • length is the length of the overall hit context
  • suboff is the offset within the hit of a matching term
  • sublen is the length of the matching term
  • suboff and sublen will be repeated for as many terms as are required to satisfy the query.

Example:

select mminfo('power struggle @0 w/.',Body,0,0,1) inf from html
       where Title\Meta\Body like 'power struggle';

Would give a result similar to the following:

300 <Data from Texis> 62 5 0 5
power
301 End of Metamorph hit
300 <Data from Texis> 2042 5 0 5
power
301 End of Metamorph hit
300 <Data from Texis> 2331 5 0 5
POWER
301 End of Metamorph hit
300 <Data from Texis> 2892 8 0 8
STRUGGLE
301 End of Metamorph hit

convert

The convert function allows you to change the type of an expression. The syntax is

CONVERT(expression, 'type-name'[, 'mode'])

The type name should in general be in lower case.

This can be useful in a number of situations. Some cases where you might want to use convert are

  • The display format for a different format is more useful. For example you might want to convert a field of type COUNTER to a DATE field, so you can see when the record was inserted, for example:

    SELECT convert(id, 'date')
    FROM   LOG;
    
    CONVERT(id, 'date')
    2015-01-27 22:43:48
    
  • If you have an application which is expecting data in a particular type you can use convert to make sure you will receive the correct type.
  • The optional third argument given to convert() is a varcharToStrlstMode mode value. This third argument may only be supplied when converting to type strlst or varstrlst. It allows the separator character or mode to be conveniently specified locally to the conversion, instead of having to alter the global varcharToStrlstMode mode.
  • Note that convert()ing data from/to varbyte/varchar does not convert the data to/from hexadecimal by default in programs other than tsql; it is preserved as-is (though truncated at nul for varchar). See the bintohex() and hextobin() functions for hexadecimal conversion, and the hexifyBytes SQL property for controlling automatic hex conversion.

seq

Returns a sequence number. The number can be initialized to any value, and the increment can be defined for each call. The syntax is:

seq(increment [, init])

If init is given then the sequence number is initialized to that value, which will be the first value returned. It is then incremented by increment. If init is not specified then the current value will be retained. The initial value will be zero if init has not been specified.

Examples of typical use:

SELECT  NAME, seq(1)
FROM    SYSTABLES

The results are:

 NAME                seq(1)
SYSTABLES               0
SYSCOLUMNS              1
SYSINDEX                2
SYSUSERS                3
SYSPERMS                4
SYSTRIG                 5
SYSMETAINDEX            6
SELECT  seq(0, 100)
FROM    SYSDUMMY;

SELECT  NAME, seq(1)
FROM    SYSTABLES

The results are:

 seq(0, 100)
    100

 NAME                seq(1)
SYSTABLES             100
SYSCOLUMNS            101
SYSINDEX              102
SYSUSERS              103
SYSPERMS              104
SYSTRIG               105
SYSMETAINDEX          106

random

Returns a random int. The syntax is:

random(max [, seed])

If seed is given then the random number generator is seeded to that value. The random number generator will only be seeded once in each session, and will be randomly seeded on the first call if no seed is supplied. The seed parameter is ignored in the second and later calls to random in a process.

The returned number is always non-negative, and never larger than the limit of the C lib’s random number generator (typically either 32767 or 2147483647). If max is non-zero, then the returned number will also be less than max.

This function is typically used to either generate a random number for later use, or to generate a random ordering of result records by adding random to the ORDER BY clause.

Examples of typical use:

SELECT  NAME, random(100)
FROM    SYSTABLES

The results might be:

 NAME                random(100)
SYSTABLES               90
SYSCOLUMNS              16
SYSINDEX                94
SYSUSERS                96
SYSPERMS                 1
SYSTRIG                 84
SYSMETAINDEX            96
SELECT  ENAME
FROM    EMPLOYEE
ORDER BY random(0);

The results would be a list of employees in a random order.

bintohex

Converts a binary (varbyte) value into a hexadecimal string.

bintohex(varbyteData[, 'stream|pretty'])

A string (varchar) hexadecimal representation of the varbyteData parameter is returned. This can be useful to visually examine binary data that may contain non-printable or nul bytes. The optional second argument is one of the following flags:

  • stream: Use the default output mode: a continuous stream of hexadecimal bytes.
  • pretty: Return a “pretty” version of the data: print 16 byte per line, space-separate the hexadecimal bytes, and print an ASCII dump on the right side.

hextobin

Converts a hexadecimal stream to its binary representation.

hextobin(hexString[, 'stream|pretty'])

The hexadecimal varchar string hexString is converted to its binary representation, and the varbyte result returned. The optional second argument is one of the following flags:

  • stream: Only accept the stream format of bintohex(), i.e. a stream of hexadecimal bytes, the same format that bintohex(varbyteData, 'stream') returns. Whitespace is acceptable, but only between (not within) hexadecimal bytes. Case-insensitive. Non-conforming data will result in an error message and the function failing.
  • pretty: Accept either stream or pretty bintohex formatted data; if the latter, only the hexadecimal bytes are parsed (e.g. ASCII column is ignored). Parsing is more liberal, but may be confused if the data deviates significantly from either format.

identifylanguage

Tries to identify the predominant language of a given string. By returning a probability in addition to the identified language, this function can also serve as a test of whether the given string is really natural-language text, or perhaps binary/encoded data instead. Syntax:

identifylanguage(text[, language[, samplesize]])

The return value is a two-element strlst: a probability and a language code. The probability is a value from 0.000 to 1.000 that the text argument is composed in the language named by the returned language code. The language code is a two-letter ISO-639-1 code.

If an ISO-639-1 code is given for the optional language argument, the probability for that particular language is returned, instead of for the highest-probability language of the known/built-in languages (currently de, es, fr, ja, pl, tr, da, en, eu, it, ko, ru).

The optional third argument samplesize is the initial integer size in bytes of the text to sample when determining language; it defaults to 16384.

Note that since a strlst value is returned, the probability is returned as a strlst element, not a double value, and thus should be cast to double during comparisons.

The identifylanguage() function is experimental, and its behavior, syntax, name and/or existence are subject to change.

Example:

var Sql = require("rampart-sql");

var sql = new Sql.init("./testdb", true);

// ignore error if table doesn't exists
try {
   sql.exec("drop table idtext;");
} catch(e){}

// (re)create the table
sql.exec("create table idtext (text varchar(64));");


var texts = [
   `Now we are engaged in a great civil war, testing whether that nation, or
   any nation so conceived and so dedicated, can long endure.  We are met on
   a great battle-field of that war.  We have come to dedicate a portion of that
   field, as a final resting place for those who here gave their lives that
   that nation might live.  It is altogether fitting and proper that we should
   do this.`,

   `Maintenant, nous sommes engagés dans une grande guerre civile, testant
   si cette nation, ou toute autre nation ainsi conçue et si dévouée, peut
   durer longtemps.  Nous sommes rencontrés sur un grand champ de bataille
   de cette guerre.  Nous sommes venus pour consacrer une partie de ce
   champ, comme lieu de repos final pour ceux qui ici ont donné leur vie
   pour que cette nation puisse vivre.  Il est tout à fait approprié et
   approprié que nous fassions cela.`,

   `Ahora estamos inmersos en una gran guerra civil, probando si esa nación,
   o cualquier nación así concebida y dedicada, puede durar mucho tiempo.
   Nos encontramos en un gran campo de batalla de esa guerra.  Hemos venido
   a dedicar una porción de ese campo, como lugar de descanso final para
   quienes aquí dieron su vida para que viviera esa nación.  Es totalmente
   apropiado y apropiado que hagamos esto.`,

   `Jetzt sind wir in einen großen Bürgerkrieg verwickelt, in dem geprüft
   wird, ob diese Nation oder eine so konzipierte und engagierte Nation
   lange bestehen kann.  Wir treffen uns auf einem großen Schlachtfeld
   dieses Krieges.  Wir sind gekommen, um einen Teil dieses Feldes als
   letzte Ruhestätte für diejenigen zu widmen, die hier ihr Leben gegeben
   haben, damit diese Nation leben kann.  Es ist durchaus angemessen und
   richtig, dass wir dies tun.`
];

// insert text
for (var i=0;i<texts.length;i++) {
   sql.exec("insert into idtext values (?);", [texts[i]]);
}

//identify text
var res = sql.exec("select identifylanguage(text) from idtext;");

rampart.utils.printf("%3J\n", res.rows);
/* expected output:
[
   {
      "identifylanguage(text)": [
         "0.602603",
         "en"
      ]
   },
   {
      "identifylanguage(text)": [
         "0.612079",
         "fr"
      ]
   },
   {
      "identifylanguage(text)": [
         "0.626431",
         "es"
      ]
   },
   {
      "identifylanguage(text)": [
         "0.614251",
         "de"
      ]
   }
]
*/

lookup

By combining the lookup() function with a GROUP BY, a column may be grouped into bins or ranges – e.g. for price-range grouping – instead of distinct individual values. Syntax:

lookup(keys, ranges[, names])

The keys argument is one (or more, e.g. strlst) values to look up; each is searched for in the ranges argument, which is one (or more, e.g. strlst) ranges. All range(s) that the given key(s) match will be returned. If the names argument is given, the corresponding names value(s) are returned instead; this allows ranges to be renamed into human-readable values. If names is given, the number of its values must equal the number of ranges.

Each range is a pair of values (lower and upper bounds) separated by “..” (two periods). The range is optionally surrounded by square (bound included) or curly (bound excluded) brackets. E.g.:

[10..20}

denotes the range 10 to 20: including 10 (“[”) but not including (“}”) 20. Both an upper and lower bracket must be given if either is present (though they need not be the same type). The default if no brackets are given is to include the lower bound but exclude the upper bound; this makes consecutive ranges non-overlapping, if they have the same upper and lower bound and no brackets (e.g. “0..10,10..20”). Either bound may be omitted, in which case that bound is unlimited. Each range’s lower bound must not be greater than its upper bound, nor equal if either bound is exclusive.

If a ranges value is not varchar/char, or does not contain “..”, its entire value is taken as a single inclusive lower bound, and the exclusive upper bound will be the next ranges value’s lower bound (or unlimited if no next value). E.g. the varint lower-bound list:

0,10,20,30

is equivalent to the strlst range list:

[0..10},[10..20},[20..30},[30..]

By using the lookup() function in a GROUP BY, a column may be grouped into ranges. For example, given a table Products with the following SKUs and float prices:

SKU    Price
------------
1234   12.95
1235    5.99
1236   69.88
1237   39.99
1238   29.99
1239   25.00
1240   50.00
1241   -2.00
1242  499.95
1243   19.95
1244    9.99
1245  125.00

they may be grouped into price ranges (with most-products first) with this SQL:

SELECT   lookup(Price, convert('0..25,25..50,50..,', 'strlst', 'lastchar'),
     convert('Under $25,$25-49.99,$50 and up,', 'strlst', 'lastchar'))
       PriceRange, count(SKU) NumberOfProducts
FROM Products
GROUP BY lookup(Price, convert('0..25,25..50,50..,', 'strlst', 'lastchar'),
     convert('Under $25,$25-49.99,$50 and up,', 'strlst', 'lastchar'))
ORDER BY 2 DESC;

Full example in Rampart JavaScript:

var Sql=require("rampart-sql");

var sql=new Sql.init("./testdb");

// ignore error if table doesn't exists
try {
   sql.exec("drop table Products;");
} catch(e){}

// (re)create the table
sql.exec("create table Products (SKU int, Price double);");

var skus = [ 1234, 1235, 1236, 1237, 1238, 1239, 1240,
             1241, 1242, 1243, 1244, 1245 ];
var prices = [ 12.95, 5.99, 69.88, 39.99, 29.99, 25.00, 50.00,
               -2.00, 499.95, 19.95, 9.99, 125.00 ];

for (var i=0;i<skus.length; i++)
    sql.exec("insert into Products values (?,?)", [skus[i],prices[i]]);

var range=['0..25','25..50','50..'];
var rangenames=['Under $25','$25-$49','$50 and up'];
var res = sql.exec(
    "SELECT lookup( Price, convert(?,'strlst','json'), convert(?,'strlst','json') ) PriceRange,"+
    "count(SKU) NumberOfProducts FROM Products " +
    "GROUP BY lookup(Price, convert(?,'strlst','json'), convert(?,'strlst','json') )" +
    "ORDER BY 2 DESC",
    [range,rangenames,range,rangenames],
    {returnType:"array"}
);

rows=res.rows;
cols=res.columns;
for (i=0;i<rows.length;i++) {

        if (!i) {
            rampart.utils.printf("%-12s %16s\n", cols[0] , cols[1]);
            rampart.utils.printf("------------+----------------\n");
        }

        rampart.utils.printf("%-12s %16s\n", rows[i][0][0], rows[i][1]);

}
/* expected output :
PriceRange   NumberOfProducts
------------+----------------
$50 and up                  4
Under $25                   4
$25-$49                     3
                            1
*/

Note that:

  • In the tsql example, the trailing commas in the PriceRange values are used to converted to strlst values via the convert(.., .., 'lastchar') function. In the Rampart JavaScript version, the array of strings are converted into a strlst using convert(.., .., 'json') function. See convert () mode above for details.
  • The empty PriceRange for the fourth row: the -2 Price matched no ranges, and hence an empty PriceRange was returned for it.

See also: lookupCanonicalizeRanges, lookupParseRange

lookupCanonicalizeRanges

The lookupCanonicalizeRanges() function returns the canonical version(s) of its ranges argument, which is zero or more ranges of the syntaxes used in lookup() function above:

lookupCanonicalizeRanges(ranges, keyType)

The canonical version always includes both a lower and upper inclusive/exclusive bracket/brace, both lower and upper bounds (unless unlimited), the “..” range operator, and is independent of other ranges that may be in the sequence.

The keyType parameter is a varchar string denoting the SQL type of the key field that would be looked up in the given range(s). This ensures that comparisons are done correctly. E.g. for a strlst range list of “0,500,1000”, keyType should be “integer”, so that “500” is not compared alphabetically with “1000” and considered invalid (greater than).

This function can be used to verify the syntax of a range, or to transform it into a standard form for lookupParseRange().

For an implicit-upper-bound range, the upper bound is determined by the next range’s lower bound. Thus the full list of ranges (if multiple) should be given to lookupCanonicalizeRanges() – even if only one range needs to be canonicalized – so that each range gets its proper bounds.

See also: lookup, lookupParseRange

lookupParseRange

The lookupParseRange() function parses a single lookup() style range into its constituent parts, returning them as strings in one strlst value. This can be used by scripts to edit a range. Syntax:

lookupParseRange(range, parts)

The parts argument is zero or more of the following part tokens as strings:

  • lowerInclusivity: Returns the inclusive/exclusive operator for the lower bound, e.g. “{” or “”

If a requested part is not present, an empty string is returned for that part. The concatenation of the above listed parts, in the above order, should equal the given range. Non-string range arguments are not supported.

lookupParseRange('10..20', 'lowerInclusivity')

would return a single empty-string strlst, as there is no lower-bound inclusive/exclusive operator in the range “10..20”.

lookupParseRange('10..20', 'lowerBound')

would return a strlst with the single value “10”.

For an implicit-upper-bound range, the upper bound is determined by the next range’s lower bound. Since lookupParseRange() only takes one range, passing such a range to it may result in an incorrect (unlimited) upper bound. Thus the full list of ranges (if multiple) should always be given to lookupCanonicalizeRanges() first, and only then the desired canonicalized range passed to lookupParseRange().

See also: lookup, lookupCanonicalizeRanges

ifNull

Substitute another value for NULL values. Syntax:

ifNull(testVal, replaceVal)

If testVal is a SQL NULL value, then replaceVal (cast to the type of testVal) is returned; otherwise testVal is returned. This function can be used to ensure that NULL value(s) in a column are replaced with a non-NULL value, if a non-NULL value is required:

SELECT ifNull(myColumn, 'Unknown') FROM myTable;

isNull

Tests a value, and returns a long value of 1 if NULL, 0 if not. Syntax:

isNull(testVal)
SELECT isNull(myColumn) FROM myTable;

Note that Texis isNull behavior differs from some other SQL implementations; see also ifNull.

File functions

fromfile, fromfiletext

The fromfile and fromfiletext functions read a file. The syntax is

fromfile(filename[, offset[, length]])
fromfiletext(filename[, offset[, length]])

These functions take one required, and two optional arguments. The first argument is the filename. The second argument is an offset into the file, and the third argument is the length of data to read. If the second argument is omitted then the file will be read from the beginning. If the third argument is omitted then the file will be read to the end. The result is the contents of the file. This can be used to load data into a table. For example if you have an indirect field and you wish to see the contents of the file you can issue SQL similar to the following.

The difference between the two functions is the type of data that is returned. fromfile will return varbyte data, and fromfiletext will return varchar data. If you are using the functions to insert data into a field you should make sure that you use the appropriate function for the type of field you are inserting into.

SELECT  FILENAME, fromfiletext(FILENAME)
FROM    DOCUMENTS
WHERE   DOCID = 'JT09113' ;

The results are:

FILENAME            fromfiletext(FILENAME)
/docs/JT09113.txt   This is the text contained in the document
that has an id of JT09113.

toind

Create a Texis managed indirect file. The syntax is

toind(data)

This function takes the argument, stores it into a file, and returns the filename as an indirect type. This is most often used in combination with fromfile to create a Texis managed file. For example:

INSERT  INTO DOCUMENTS
VALUES('JT09114', toind(fromfile('srcfile')))

The database will now contain a pointer to a copy of srcfile, which will remain searchable even if the original is changed or removed. An important point to note is that any changes to srcfile will not be reflected in the database, unless the table row’s indirect column is modified (even to the save value, this just tells Texis to re-index it).

canonpath

Returns canonical version of a file path, i.e. fully-qualified and without symbolic links:

canonpath(path[, flags])

The optional flags is a set of bit flags: bit 0 set if error messages should be issued, bit 1 set if the return value should be empty instead of path on error.

pathcmp

File path comparison function; like C function strcmp() but for paths:

pathcmp(pathA, pathB)

Returns an integer indicating the sort order of pathA relative to pathB: 0 if pathA is the same as pathB, less than 0 if pathA is less than pathB, greater than 0 if pathA is greater than pathB.

Multiple consecutive directory separators are considered the same as one. A trailing directory separator (if not also a leading separator) is ignored. Directory separators sort lexically before any other character.

Note that the paths are only compared lexically: no attempt is made to resolve symbolic links, “..” path components, etc. Note also that no inference should be made about the magnitude of negative or positive return values: greater magnitude does not necessarily indicate greater lexical “separation”, nor should it be assumed that comparing the same two paths will always yield the same-magnitude value in future versions. Only the sign of the return value is significant.

basename

Returns the base filename of a given file path.

basename(path)

The basename is the contents of path after the last path separator. No filesystem checks are performed, as this is a text/parsing function; thus “.” and “..” are not significant.

dirname

Returns the directory part of a given file path.

dirname(path)

The directory is the contents of path before the last path separator (unless it is significant – e.g. for the root directory – in which case it is retained). No filesystem checks are performed, as this is a text/parsing function; thus “.” and “..” are not significant.

fileext

Returns the file extension of a given file path.

fileext(path)

The file extension starts with and includes a dot. The file extension is only considered present in the basename of the path, i.e. after the last path separator.

joinpath

Joins one or more file/directory path arguments into a merged path, inserting/removing a path separator between arguments as needed. Takes one to 5 path component arguments. E.g.:

joinpath('one', 'two/', '/three/four', 'five')

yields

one/two/three/four/five

Redundant path separators internal to an argument are not removed, nor are “.” and “ ..” path components removed.

joinpathabsolute

Like joinpath, except that a second or later argument that is an absolute path will overwrite the previously-merged path. E.g.:

joinpathabsolute('one', 'two', '/three/four', 'five')

yields

/three/four/five

Redundant path separators internal to an argument are not removed, nor are “.” and “..” path components removed.

String Functions

abstract

Generate an abstract of a given portion of text. The syntax is

abstract(text[, maxsize[, style[, query]]])

The abstract will be less than maxsize characters long, and will attempt to end at a word boundary. If maxsize is not specified (or is less than or equal to 0) then a default size of 230 characters is used.

The style argument is a string or integer, and allows a choice between several different ways of creating the abstract. Note that some of these styles require the query argument as well, which is a Metamorph query to look for:

  • dumb (0) - Start the abstract at the top of the document.
  • smart (1) - This style will look for the first meaningful chunk of text, skipping over any headers at the top of the text. This is the default if neither style nor query is given.
  • querysingle (2) - Center the abstract contiguously on the best occurence of query in the document.
  • querymultiple (3) - Like querysingle, but also break up the abstract into multiple sections (separated with “...”) if needed to help ensure all terms are visible. Also take care with URLs to try to show the start and end.
  • querybest - An alias for the best available query-based style; currently the same as querymultiple. Using querybest in a script ensures that if improved styles become available in future releases, the script will automatically “upgrade” to the best style.

If no query is given for the query\(...\) modes, they fall back to dumb mode. If a query is given with a non-query\(...\) mode (dumb/smart), the mode is promoted to querybest. The current locale and index expressions also have an effect on the abstract in the query\(...\) modes, so that it more closely reflects an index-obtained hit.

SELECT     abstract(STORY, 0, 1, 'power struggle')
FROM       ARTICLES
WHERE      ARTID = 'JT09115' ;

See also the Rampart JavaScript abstract() function.

text2mm

Generate LIKEP query. The syntax is

text2mm(text[, maxwords])

This function will take a text expression, and produce a list of words that can be given to LIKER or LIKEP to find similar documents. text2mm takes an optional second argument which specifies how many words should be returned. If this is not specified then 10 words are returned. Most commonly text2mm will be given the name of a field. If it is an indirect field you will need to call fromfile as shown below:

SELECT     text2mm(fromfile(FILENAME))
FROM       DOCUMENTS
WHERE      DOCID = 'JT09115' ;

You may also call it as texttomm() instead of text2mm() .

keywords

Generate list of keywords. The syntax is

keywords(text[, maxwords])

keywords is similar to text2mm but produces a list of phrases, with a linefeed separating them. The difference between text2mm and keywords is that keywords will maintain the phrases. The keywords function also takes an optional second argument which indicates how many words or phrases should be returned.

length

Returns the length in characters of a char or varchar expression, or number of strings/items in other types. The syntax is

length(value[, mode])

For example:

SELECT  NAME, length(NAME)
FROM    SYSTABLES

The results are:

 NAME                length(NAME)
SYSTABLES               9
SYSCOLUMNS             10
SYSINDEX                8
SYSUSERS                8
SYSPERMS                8
SYSTRIG                 7
SYSMETAINDEX           12

The optional mode argument is a stringCompareMode-style compare mode to use. If mode is not given, the current apicp stringCompareMode is used. Currently the only pertinent mode flag is “iso-8859-1”, which determines whether to interpret value as ISO-8859-1 or UTF-8. This can alter how many characters long the string appears to be, as UTF-8 characters are variable-byte-sized, whereas ISO-8859-1 characters are always mono-byte.

Note that if given a strlst type value, length() returns the number of string values in the list. For other types, it returns the number of values (e.g. for varint it returns the number of integer values).

lower

Returns the text expression with all letters in lower-case. The syntax is

lower(text[, mode])

For example:

SELECT  NAME, lower(NAME)
FROM    SYSTABLES

The results are:

 NAME                lower(NAME)
SYSTABLES            systables
SYSCOLUMNS           syscolumns
SYSINDEX             sysindex
SYSUSERS             sysusers
SYSPERMS             sysperms
SYSTRIG              systrig
SYSMETAINDEX         sysmetaindex

The optional mode argument is a string-folding mode. If mode is unspecified, the current apicp stringCompareMode setting – with “+lowercase” aded – is used.

upper

Returns the text expression with all letters in upper-case. The sytax is

upper(text[, mode])

For example:

SELECT  NAME, upper(NAME)
FROM    SYSTABLES

The results are:

 NAME                upper(NAME)
SYSTABLES            SYSTABLES
SYSCOLUMNS           SYSCOLUMNS
SYSINDEX             SYSINDEX
SYSUSERS             SYSUSERS
SYSPERMS             SYSPERMS
SYSTRIG              SYSTRIG
SYSMETAINDEX         SYSMETAINDEX

The optional mode argument is a string-folding mode. If mode is unspecified, the current apicp stringCompareMode setting – with “+uppercase” added – is used.

initcap

Capitalizes text. The syntax is

initcap(text[, mode])

Returns the text expression with the first letter of each word in title case (i.e. upper case), and all other letters in lower-case. For example:

SELECT  NAME, initcap(NAME)
FROM    SYSTABLES

The results are:

 NAME                initcap(NAME)
SYSTABLES            Systables
SYSCOLUMNS           Syscolumns
SYSINDEX             Sysindex
SYSUSERS             Sysusers
SYSPERMS             Sysperms
SYSTRIG              Systrig
SYSMETAINDEX         Sysmetaindex

The optional mode argument is a string-folding mode in the same format as stringCompareMode. If mode is unspecified, the current stringCompareMode setting – with “+titlecase” added – is used.

sandr

Search and replace text.

sandr(search, replace, text)

Returns the text expression with the search REX expression replaced with the replace expression. See the Rampart Sql.rex() and the Rampart Sql.sandr() function documentation for complete syntax of the search and replace expressions.

SELECT  NAME, sandr('>>=SYS=', 'SYSTEM TABLE ', NAME) DESCR
FROM    SYSTABLES

The results are:

 NAME                DESCR
SYSTABLES            SYSTEM TABLE TABLES
SYSCOLUMNS           SYSTEM TABLE COLUMNS
SYSINDEX             SYSTEM TABLE INDEX
SYSUSERS             SYSTEM TABLE USERS
SYSPERMS             SYSTEM TABLE PERMS
SYSTRIG              SYSTEM TABLE TRIG
SYSMETAINDEX         SYSTEM TABLE METAINDEX

separator

Returns the separator character from its strlst argument, as a varchar string:

separator(strlstValue)

This can be used in situations where the strlstValue argument may have a nul character as the separator, in which case simply converting strlstValue to varchar and looking at the last character would be incorrect.

stringcompare

Compares its string (varchar) arguments a and b, returning -1 if a is less than b, 0 if they are equal, or 1 if a is greater than b:

stringcompare(a, b[, mode])

The strings are compared using the optional mode argument. If mode is unspecified, the current apicp stringCompareMode setting is used.

stringformat

Returns its arguments formatted into a string (varchar), like the equivalent Sql.stringFormat() (based on the C function sprintf()):

stringformat(format[, arg[, arg[, arg[, arg]]]])

The format argument is a varchar string that describes how to print the following argument(s), if any.

Math functions

The following basic math functions are available in Texis: acos, asin, atan, atan2, ceil, cos, cosh, exp, fabs, floor, fmod, log, log10, pow, sin, sinh, sqrt, tan, tanh.

All of the above functions call the ANSI C math library function of the same name, and return a result of type double. pow, atan2 and fmod take two double arguments, the remainder take one double argument.

In addition, the following math-related functions are available:

  • isNaN(x) Returns 1 if x is a float or double NaN (Not a Number) value, 0 if not. This function should be used to test for NaN, rather than using the equality operator (e.g. x = 'NaN'), because the IEEE standard defines NaN == NaN to be false, not true as might be expected.

Date functions

The following date functions are available in Texis: dayname, month, monthname, dayofmonth, dayofweek, dayofyear, quarter, week, year, hour, minute, second.

All the functions take a date as an argument. dayname and monthname will return a string with the full day or month name based on the current locale, and the others return a number.

The dayofweek function returns 1 for Sunday. The quarter is based on months, so April 1st is the first day of quarter 2. Week 1 begins with the first Sunday of the year.

The monthseq, weekseq and dayseq functions will return the number of months, weeks and days since an arbitrary past date. These can be used when comparing dates to see how many months, weeks or days separate them.

Bit manipulation functions

These functions are used to manipulate integers as bit fields. This can be useful for efficient set operations (e.g. set membership, intersection, etc.). For example, categories could be mapped to sequential bit numbers, and a row’s category membership stored compactly as bits of an int or varint, instead of using a string list. Category membership can then be quickly determined with bitand on the integer.

In the following functions, bit field arguments a and b are int or varint (32 bits per integer, all platforms). Argument n is any integer type. Bits are numbered starting with 0 as the least-significant bit of the first integer. 31 is the most-significant bit of the first integer, 32 is the least-significant bit of the second integer (if a multi-value varint), etc.

  • bitand(a, b) Returns the bit-wise AND of a and b. If one argument is shorter than the other, it will be expanded with 0-value integers.
  • bitor(a, b) Returns the bit-wise OR of a and b. If one argument is shorter than the other, it will be expanded with 0-value integers.
  • bitxor(a, b) Returns the bit-wise XOR (exclusive OR) of a and b. If one argument is shorter than the other, it will be expanded with 0-value integers.
  • bitnot(a) Returns the bit-wise NOT of a.
  • bitsize(a) Returns the total number of bits in a, i.e. the highest bit number plus 1.
  • bitcount(a) Returns the number of bits in a that are set to 1.
  • bitmin(a) Returns the lowest bit number in a that is set to 1. If none are set to 1, returns -1.
  • bitmax(a) Returns the highest bit number in a that is set to 1. If none are set to 1, returns -1.
  • bitlist(a) Returns the list of bit numbers of a, in ascending order, that are set to 1, as a varint. Returns a single -1 if no bits are set to 1.
  • bitshiftleft(a, n) Returns a shifted n bits to the left, with 0s padded for bits on the right. If n is negative, shifts right instead.
  • bitshiftright(a, n) Returns a shifted n bits to the right, with 0s padded for bits on the left (i.e. an unsigned shift). If n is negative, shifts left instead.
  • bitrotateleft(a, n) Returns a rotated n bits to the left, with left (most-significant) bits wrapping around to the right. If n is negative, rotates right instead.
  • bitrotateright(a, n) Returns a rotated n bits to the right, with right (least-significant) bits wrapping around to the left. If n is negative, rotates left instead.
  • bitset(a, n) Returns a with bit number n set to 1. a will be padded with 0-value integers if needed to reach n (e.g. bitset(5, 40) will return a varint(2)).
  • bitclear(a, n) Returns a with bit number n set to 0. a will be padded with 0-value integers if needed to reach n (e.g. bitclear(5, 40) will return a varint(2)).
  • bitisset(a, n) Returns 1 if bit number n is set to 1 in a, 0 if not.

Internet/IPV4 address functions

The following functions manipulate IP network and/or host addresses; most take inet style argument(s). This is an IPv4 address string, optionally followed by a netmask.

For IPv4, the format is dotted-decimal, i.e. \(N\)[.\(N\)[.\([N\).\(N\)]]] where \(N\) is a decimal, octal or hexadecimal integer from 0 to 255. If \(x < 4\) values of \(N\) are given, the last \(N\) is taken as the last \(5-x\) bytes instead of 1 byte, with missing bytes padded to the right. E.g. 192.258 is valid and equivalent to 192.1.2.0: the last \(N\) is 2 bytes in size, and covers 5 - 2 = 3 needed bytes, including 1 zero pad to the right. Conversely, 192.168.4.1027 is not valid: the last \(N\) is too large.

An IPv4 address may optionally be followed by a netmask, either of the form /\(B\) or :\(IPv4\), where \(B\) is a decimal, octal or hexadecimal netmask integer from 0 to 32, and \(IPv4\) is a dotted-decimal IPv4 address of the same format described above. If an :\(IPv4\) netmask is given, only the largest contiguous set of most-significant 1 bits are used (because netmasks are contiguous). If no netmask is given, it will be calculated from standard IPv4 class A/B/C/D/E rules, but will be large enough to include all given bytes of the IP. E.g. 1.2.3.4 is Class A which has a netmask of 8, but the netmask will be extended to 32 to include all 4 given bytes.

  • inetabbrev(inet) Returns a possibly shorter-than-canonical representation of $inet, where trailing zero byte(s) of an IPv4 address may be omitted. All bytes of the network, and leading non-zero bytes of the host, will be included. E.g. returns 192.100.0/24. The /\(B\) netmask is included, except if the network is host-only (i.e.netmask is the full size of the IP address). Empty string is returned on error.
  • inetcanon(inet) Returns canonical representation of $inet. For IPv4, this is dotted-decimal with all 4 bytes. The /\(B\) netmask is included, except if the network is host-only (i.e. netmask is the full size of the IP address). Empty string is returned on error.
  • inetnetwork(inet) Returns string IP address with the network bits of inet, and the host bits set to 0. Empty string is returned on error.
  • inethost(inet) Returns string IP address with the host bits of inet, and the network bits set to 0. Empty string is returned on error.
  • inetbroadcast(inet) Returns string IP broadcast address for inet, i.e. with the network bits, and host bits set to 1. Empty string is returned on error.
  • inetnetmask(inet) Returns string IP netmask for inet, i.e. with the network bits set to 1, and host bits set to 0. Empty string is returned on error.
  • inetnetmasklen(inet) Returns integer netmask length of inet. -1 is returned on error.
  • inetcontains(inetA, inetB) Returns 1 if inetA contains inetB, i.e. every address in inetB occurs within the inetA network. 0 is returned if not, or -1 on error.
  • inetclass(inet) Returns class of inet, e.g. A, B, C, D, E or classless if a different netmask is used (or the address is IPv6). Empty string is returned on error.
  • inet2int(inet) Returns integer representation of IP network/host bits of $inet (i.e. without netmask); useful for compact storage of address as integer(s) instead of string. Returns -1 is returned on error (note that -1 may also be returned for an all-ones IP address, e.g. 255.255.255.255).
  • int2inet(i) Returns inet string for 1- or 4-value varint $i taken as an IP address. Since no netmask can be stored in the integer form of an IP address, the returned IP string will not have a netmask. Empty string is returned on error.

urlcanonicalize

Canonicalize a URL. Usage:

urlcanonicalize(url[, flags])

Returns a copy of url, canonicalized according to case-insensitive comma-separated flags, which are zero or more of:

  • lowerProtocol Lower-cases the protocol.
  • lowerHost Lower-cases the hostname.
  • removeTrailingDot Removes trailing dot(s) in hostname.
  • reverseHost Reverse the host/domains in the hostname. E.g. http://host.example.com/ becomes http://com.example.host/. This can be used to put the most-significant part of the hostname leftmost.
  • removeStandardPort Remove the port number if it is the standard port for the protocol.
  • decodeSafeBytes URL-decode safe bytes, where semantics are unlikely to change. E.g. “%41” becomes “A”, but “%2F” remains encoded, because it would decode to “/”.
  • upperEncoded Upper-case the hex characters of encoded bytes.
  • lowerPath Lower-case the (non-encoded) characters in the path. May be used for URLs known to point to case-insensitive filesystems, e.g. Windows.
  • addTrailingSlash Adds a trailing slash to the path, if no path is present.

Default flags are all but reverseHost, lowerPath. A flag may be prefixed with the operator + to append the flag to existing flags; - to remove the flag from existing flags; or = (default) to clear existing flags first and then set the flag. Operators remain in effect for subsequent flags until the next operator (if any) is used.

Geographical coordinate functions

The geographical coordinate functions allow for efficient processing of latitude / longitude operations. They allow for the conversion of a latitude/longitude pair into a single “geocode”, which is a single long value that contains both values. This can be used to easily compare it to other geocodes (for distance calculations) or for finding other geocodes that are within a certain distance.

azimuth2compass

azimuth2compass(double azimuth [, int resolution [, int verbosity]])

The azimuth2compass function converts a numerical azimuth value (degrees of rotation from 0 degrees north) and converts it into a compass heading, such as N or Southeast. The exact text returned is controlled by two optional parameters, resolution and verbosity.

Resolution determines how fine-grained the values returned are. There are 4 possible values:

  • 1 - Only the four cardinal directions are used (N, E, S, W)
  • 2 (default) - Inter-cardinal directions (N, NE, E, etc.)
  • 3 - In-between inter-cardinal directions (N, NNE, NE, ENE, E, etc.)
  • 4 - “by” values (N, NbE, NNE, NEbN, NE, NEbE, ENE, EbN, E, etc.)

Verbosity affects how verbose the resulting text is. There are two possible values:

  • 1 (default) - Use initials for direction values (N, NbE, NNE, etc.)
  • 2 - Use full text for direction values (North, North by east, North-northeast, etc.)

For an azimuth value of 105, here are some example results of azimuth2compass:

azimuth2compass(105): E
azimuth2compass(105, 3): ESE
azimuth2compass(105, 4): EbS
azimuth2compass(105, 1, 2): East
azimuth2compass(105, 3, 2): East-southeast
azimuth2compass(105, 4, 2): East by south

azimuthgeocode

azimuthgeocode(geocode1, geocode2 [, method])

The azimuthgeocode function calculates the directional heading going from one geocode to another. It returns a number between 0-360 where 0 is north, 90 east, etc., up to 360 being north again.

The third, optional method parameter can be used to specify which mathematical method is used to calculate the direction. There are two possible values:

  • greatcircle (default) - The “Great Circle” method is a highly accurate tool for calculating distances and directions on a sphere. It is used by default.
  • pythagorean - Calculations based on the Pythagorean method can also be used. They’re faster, but less accurate as the core formulas don’t take the curvature of the earth into consideration. Some internal adjustments are made, but the values are less accurate than the greatcircle method, especially over long distances and with paths that approach the poles.

azimuthlatlon

azimuthlatlon(lat1, lon1, lat2, lon2, [, method])

The azimuthlatlon function calculates the directional heading going from one latitude-longitude point to another. It operates identically to azimuthgeocode, except azimuthlatlon takes its parameters in a pair of latitude-longitude points instead of geocode values.

The third, optional method parameter can be used to specify which mathematical method is used to calculate the direction. There are two possible values:

  • greatcircle (default) - The “Great Circle” method is a highly accurate tool for calculating distances and directions on a sphere. It is used by default.
  • pythagorean - Calculations based on the Pythagorean method can also be used. They’re faster, but less accurate as the core formulas don’t take the curvature of the earth into consideration. Some internal adjustments are made, but the values are less accurate than the greatcircle method, especially over long distances and with paths that approach the poles.

dms2dec, dec2dms

dms2dec(dms)
dec2dms(dec)

The dms2dec and dec2dms functions are for changing back and forth between the “degrees minutes seconds” (DMS) format (west-positive) and “decimal degree” format for latitude and longitude coordinates. All SQL geographical functions expect decimal degree parameters.

DMS values are of the format \(DDDMMSS\). For example, 3515’ would be represented as 351500.

In decimal degrees, a degree is a whole digit, and minutes & seconds are represented as fractions of a degree. Therefore, 3515’ would be 35.25 in decimal degrees.

Note that the Texis DMS format has west-positive longitudes (unlike ISO 6709 DMS format), and decimal degrees have east-positive longitudes. It is up to the caller to flip the sign of longitudes where needed.

distgeocode

distgeocode(geocode1, geocode2 [, method] )

The distgeocode function calculates the distance, in miles, between two given geocodes. It uses the “Great Circle” method for calculation by default, which is very accurate. A faster, but less accurate, calculation can be done with the Pythagorean theorem. It is not designed for distances on a sphere, however, and becomes somewhat inaccurate at larger distances and on paths that approach the poles. To use the Pythagorean theorem, pass a third string parameter, “pythagorean”, to force that method. “greatcircle” can also be specified as a method.

For example:

  • New York (JFK) to Cleveland (CLE), the Pythagorean method is off by .8 miles (.1%)
  • New York (JFK) to Los Angeles (LAX), the Pythagorean method is off by 22.2 miles (.8%)
  • New York (JFK) to South Africa (PLZ), the Pythagorean method is off by 430 miles (5.2%)

See Also: distlatlon

distlatlon

distlatlon(lat1, lon1, lat2, lon2 [, method] )

The distlatlon function calculates the distance, in miles, between two points, represented in latitude/longitude pairs in decimal degree format.

Like distgeocode, it uses the “Great Circle” method by default, but can be overridden to use the faster, less accurate Pythagorean method if “pythagorean” is passed as the optional method parameter.

For example:

  • New York (JFK) to Cleveland (CLE), the Pythagorean method is off by .8 miles (.1%)
  • New York (JFK) to Los Angeles (LAX), the Pythagorean method is off by 22.2 miles (.8%)
  • New York (JFK) to South Africa (PLZ), the Pythagorean method is off by 430 miles (5.2%)

See Also: distgeocode

latlon2geocode, latlon2geocodearea

latlon2geocode(lat[, lon])
latlon2geocodearea(lat[, lon], radius)

The latlon2geocode function encodes a given latitude/longitude coordinate into one long return value. This encoded value – a “geocode” value – can be indexed and used with a special variant of Texis’ BETWEEN operator for bounded-area searches of a geographical region.

The latlon2geocodearea function generates a bounding area centered on the coordinate. It encodes a given latitude/longitude coordinate into a two- value varlong. The returned geocode value pair represents the southwest and northeast corners of a square box centered on the latitude/longitude coordinate, with sides of length two times radius (in decimal degrees). This bounding area can be used with the Texis BETWEEN operator for fast geographical searches.

The lat and lon parameters are doubles in the decimal degrees format. (To pass \(DDDMMSS\) “degrees minutes seconds” (DMS) format values, convert them first with dms2dec or parselatitude, parselongitude.). Negative numbers represent south latitudes and west longitudes, i.e. these functions are east-positive, and decimal format.

Valid values for latitude are -90 to 90 inclusive. Valid values for longitude are -360 to 360 inclusive. A longitude value less than -180 will have 360 added to it, and a longitude value greater than 180 will have 360 subtracted from it. This allows longitude values to continue to increase or decrease when crossing the International Dateline, and thus avoid a non-linear “step function”. Passing invalid lat or lon values to latlon2geocode will return -1.

The lon parameter is optional: both latitude and longitude (in that order) may be given in a single space- or comma-separated text (varchar) value for lat. Also, a N/S suffix (for latitude) or E/W suffix (for longitude) may be given; S or W will negate the value.

The latitude and/or longitude may also have just about any of the formats supported by parselatitude, parselongitude, provided they are disambiguated (e.g. separate parameters; or if one parameter, separated by a comma and/or fully specified with degrees/minutes/seconds).

-- Populate a table with latitude/longitude information:
create table geotest(city varchar(64), lat double, lon double, geocode long);
insert into geotest values('Cleveland, OH, USA', 41.4,  -81.5,  -1);
insert into geotest values('San Francisco, CA, USA',   37.78, -122.42,  -1);
insert into geotest values('Davis, Ca, USA',    38.55, -121.74, -1);
insert into geotest values('New York, NY, USA',  40.81, -73.96,  -1);

-- Prepare for geographic searches:
update geotest set geocode = latlon2geocode(lat, lon);
create index xgeotest_geocode on geotest(geocode);

-- Search for cities within a 3-degree-radius "circle" (box)
-- of Cleveland, nearest first:
select city, lat, lon, distlatlon(41.4, -81.5, lat, lon) MilesAway from geotest
   where geocode between (select latlon2geocodearea(41.4, -81.5, 3.0)) order by 4 asc;

The geocode values returned by latlon2geocode and latlon2geocodearea are platform-dependent in format and accuracy, and should not be copied across platforms. On platforms with 32-bit longs a geocode value is accurate to about 32 seconds (around half a mile, depending on latitude). -1 is returned for invalid input values.

See Also: geocode2lat, geocode2lon

geocode2lat, geocode2lon

geocode2lat(geocode)
geocode2lon(geocode)

The geocode2lat and geocode2lon functions decode a geocode into a latitude or longitude coordinate, respectively. The returned coordinate is in the decimal degrees format. An invalid geocode value (e.g. -1) will return NaN (Not a Number).

If you want \(DDDMMSS\) “degrees minutes seconds” (DMS) format, you can use dec2dms to convert it.

select city, geocode2lat(geocode), geocode2lon(geocode) from geotest;

As with latlon2geocode, the geocode value is platform-dependent in accuracy and format, so it should not be copied across platforms, and the returned coordinates from geocode2lat and geocode2lon may differ up to about half a minute from the original coordinates (due to the finite resolution of a long). An invalid geocode value (e.g. -1) will return NaN (Not a Number).

See Also: latlon2geocode

parselatitude, parselongitude

parselatitude(latitudeText)
parselongitude(longitudeText)

The parselatitude and parselongitude functions parse a text (varchar) latitude or longitude coordinate, respectively, and return its value in decimal degrees as a double. The coordinate should be in one of the following forms (optional parts in square brackets):

  • [\(H\)] \(nnn\) [\(U\)] [:] [\(H\)] [\(nnn\) [\(U\)] [:] [\(nnn\) [\(U\)]]] [\(H\)]
  • \(DDMM\)[\(.MMM\)…]
  • \(DDMMSS\)[\(.SSS\)…]

where the terms are:

  • \(nnn\) A number (integer or decimal) with optional plus/minus sign. Only the first number may be negative, in which case it is a south latitude or west longitude. Note that this is true even for \(DDDMMSS\) (DMS) longitudes – i.e. the ISO 6709 east-positive standard is followed, not the deprecated Texis west-positive standard.

  • \(U\) A unit (case-insensitive):

    • d
    • deg
    • deg.
    • degrees
    • ' (single quote) for minutes
    • m
    • min
    • min.
    • minutes
    • " (double quote) for seconds
    • s (iff d/m also used for degrees/minutes)
    • sec
    • sec.
    • seconds
    • Unicode degree-sign (U+00B0), in ISO-8559-1 or UTF-8

    If no unit is given, the first number is assumed to be degrees, the second minutes, the third seconds. Note that “s” may only be used for seconds if “d” and/or “m” was also used for an earlier degrees/minutes value; this is to help disambiguate “seconds” vs. “southern hemisphere”.

  • \(H\) A hemisphere (case-insensitive):

    • N
    • north
    • S
    • south
    • E
    • east
    • W
    • west

    A longitude hemisphere may not be given for a latitude, and vice-versa.

  • \(DD\) A two- or three-digit degree value, with optional sign. Note that longitudes are east-positive ala ISO 6709, not west-positive like the deprecated Texis standard.

  • \(MM\) A two-digit minutes value, with leading zero if needed to make two digits.

  • \(.MMM\)… A zero or more digit fractional minute value.

  • \(SS\) A two-digit seconds value, with leading zero if needed to make two digits.

  • \(.SSS\)… A zero or more digit fractional seconds value.

Whitespace is generally not required between terms in the first format. A hemisphere token may only occur once. Degrees/minutes/seconds numbers need not be in that order, if units are given after each number. If a 5-integer-digit \(DDDMM\)[\(.MMM\)…] format is given and the degree value is out of range (e.g. more than 90 degrees latitude), it is interpreted as a \(DMMSS\)[\(.SSS\)…] value instead. To force \(DDDMMSS\)[\(.SSS\)…] for small numbers, pad with leading zeros to 6 or 7 digits.

insert into geotest(lat, lon)
  values(parselatitude('54d 40m 10"'),
         parselongitude('W90 10.2'));

An invalid or unparseable latitude or longitude value will return NaN (Not a Number). Extra unparsed/unparsable text may be allowed (and ignored) after the coordinate in most instances. Out-of-range values (e.g. latitudes greater than 90 degrees) are accepted; it is up to the caller to bounds-check the result.

JSON functions

The JSON functions allow for the manipulation of varchar fields and literals as JSON objects.

JSON Path Syntax

The JSON Path syntax is standard Javascript object access, using $ to represent the entire document. If the document is an object the path must start $., and if an array $[.

JSON Field Syntax

In addition to using the JSON functions it is possible to access elements in a varchar field that holds JSON as if it was a field itself. This allows for creation of indexes, searching and sorting efficiently. Arrays can also be fetched as strlst to make use of those features, e.g. SELECT Json.$.name FROM tablename WHERE 'SQL' IN Json.$.skills[*];

isjson

isjson(JsonDocument)

The isjson function returns 1 if the document is valid JSON, 0 otherwise.

isjson('{ "type" : 1 }'): 1
isjson('{}'): 1
isjson('json this is not'): 0

json_format

json_format(JsonDocument, FormatOptions)

The json_format formats the JsonDocument according to FormatOptions. Multiple options can be provided either space or comma separated.

Valid FormatOptions are:

  • COMPACT - remove all unnecessary whitespace
  • INDENT(N) - print the JSON with each object or array member on a new line, indented by N spaces to show structure
  • SORT-KEYS - sort the keys in the object. By default the order is preserved
  • EMBED - omit the enclosing {} or [] is using the snippet in another object
  • ENSURE_ASCII - encode all Unicode characters outside the ASCII range
  • ENCODE_ANY - if not a valid JSON document then encode into a JSON literal, e.g. to encode a string.
  • ESCAPE_SLASH - escape forward slash / as \/

json_type

json_type(JsonDocument)

The json_type function returns the type of the JSON object or element. Valid responses are:

  • OBJECT
  • ARRAY
  • STRING
  • INTEGER
  • DOUBLE
  • NULL
  • BOOLEAN

Assuming a field Json containing:

{"items":
  [
   {"myNum":1, "myText": "Some text", "myBool": true},
   {"myNum":2.0, "myText": "Some more text", "myBool": false},
   null
  ]
}
json_type(Json): OBJECT
json_type(Json.$.items[0]): OBJECT
json_type(Json.$.items): ARRAY
json_type(Json.$.items[0].myNum): INTEGER
json_type(Json.$.items[1].myNum): DOUBLE
json_type(Json.$.items[0].myText): STRING
json_type(Json.$.items[0].myBool): BOOLEAN
json_type(Json.$.items[2]): NULL

json_value

json_value(JsonDocument, Path)

The json_value extracts the value identified by Path from JsonDocument. Path is a varchar in the JSON Path Syntax. This will return a scalar value. If Path refers to an array, object, or invalid path no value is returned.

Assuming the same Json field from the previous examples:

json_value(Json, '$'):
json_value(Json, '$.items[0]'):
json_value(Json, '$.items'):
json_value(Json, '$.items[0].myNum'): 1
json_value(Json, '$.items[1].myNum'): 2.0
json_value(Json, '$.items[0].myText'): Some Text
json_value(Json, '$.items[0].myBool'): true
json_value(Json, '$.items[2]'):

json_query

json_query(JsonDocument, Path)

The json_query extracts the object or array identified by Path from JsonDocument. Path is a varchar in the JSON Path Syntax. This will return either an object or an array value. If Path refers to a scalar no value is returned.

Assuming the same Json field from the previous examples:

json_query(Json, '$')
---------------------
{"items":[{"myNum":1,"myText":"Some text","myBool":true},{"myNum":2.0,"myText":"Some more text","myBool":false},null]}

json_query(Json, '$.items[0]')
------------------------------
{"myNum":1,"myText":"Some text","myBool":true}

json_query(Json, '$.items')``
---------------------------
[{"myNum":1,"myText":"Some text","myBool":true},{"myNum":2.0,"myText":"Some more text","myBool":false},null]

The following will return an empty string as they refer to scalars or non-existent keys.

json_query(Json, '$.items[0].myNum')
json_query(Json, '$.items[1].myNum')
json_query(Json, '$.items[0].myText')
json_query(Json, '$.items[0].myBool')
json_query(Json, '$.items[2]')

json_modify

json_modify(JsonDocument, Path, NewValue)

The json_modify function returns a modified version of JsonDocument with the key at Path replaced by NewValue.

If Path starts with append then the NewValue is appended to the array referenced by Path. It is an error it Path refers to anything other than an array.

json_modify('{}', '$.foo', 'Some "quote"')
------------------------------------------
{"foo":"Some \"quote\""}

json_modify('{ "foo" : { "bar": [40, 42] } }', 'append $.foo.bar', 99)
----------------------------------------------------------------------
{"foo":{"bar":[40,42,99]}}

json_modify('{ "foo" : { "bar": [40, 42] } }', '$.foo.bar', 99)
---------------------------------------------------------------
{"foo":{"bar":99}}

json_merge_patch

json_merge_patch(JsonDocument, Patch)

The json_merge_patch function provides a way to patch a target JSON document with another JSON document. The patch function conforms to RFC 7386.

Keys in JsonDocument are replaced if found in Patch. If the value in Patch is null then the key will be removed in the target document.

json_merge_patch('{"a":"b"}', '{"a":"c"}')
------------------------------------------
{"a":"c"}

json_merge_patch('{"a": [{"b":"c"}]}', '{"a": [1]}')
----------------------------------------------------
{"a":[1]}

json_merge_patch('[1,2]', '{"a":"b", "c":null}')
------------------------------------------------
{"a":"b"}

json_merge_preserve

json_merge_preserve(JsonDocument, Patch)

The json_merge_preserve function provides a way to patch a target JSON document with another JSON document while preserving the content that exists in the target document.

Keys in JsonDocument are merged if found in Patch. If the same key exists in both the target and patch file the result will be an array with the values from both target and patch.

If the value in Patch is null then the key will be removed in the target document.

json_merge_preserve('{"a":"b"}', '{"a":"c"}')
---------------------------------------------
{"a":["b","c"]}

json_merge_preserve('{"a": [{"b":"c"}]}', '{"a": [1]}')
-------------------------------------------------------
{"a":[{"b":"c"},1]}

json_merge_preserve('{"a": [{"b":"c"}]}', '{"a": 1}')
-----------------------------------------------------
{"a":[{"b":"c"},1]}

json_merge_preserve('{"a": [{"b":"c"}]}', '{"a": [1,2]}')
---------------------------------------------------------
{"a":[{"b":"c"},1,2]}

json_merge_preserve('{"a": [{"b":"c"}]}', '{"a": {"d":1,"e":2} }')
------------------------------------------------------------------
{"a":[{"b":"c"},{"d":1,"e":2}]}

json_merge_preserve('{"a": {"b":"c"}}', '{"a": {"d":1, "e":2} }')
-----------------------------------------------------------------
{"a":{"b":"c","d":1,"e":2}}

json_merge_preserve('[1,2]', '{"a":"b", "c":null}')
---------------------------------------------------
[1,2,{"a":"b","c":null}]

Full Example Using Json

JSON fields can be operated on with database functions and SQL statements in the same manner as normal fields. Here is the sql.exec() example using a JSON varchar field in the place of multiple columns:

var Sql = require("rampart-sql");

/* create database if it does not exist */
var sql = new Sql.init("./mytestdb",true);

/* check if table exists */
var res = sql.exec(
    "select * from SYSTABLES where NAME='employees'",
    {"returnType":"novars"} /* we only need the count */
);

if(res.rowCount) /* 1 if the table exists */
{
    /* drop table from previous test run of this script */
    res=sql.exec("drop table employees");
}

/* (re)create the table */
sql.exec(
        "create table employees (Classification varchar(8), " +
        "Name varchar(16), EmpData varchar(256) );",
        {"returnType":"novars"}
);

/* populate variables for insertion */
var emp1 = {
  cl:     "principal",
  name:   "Debbie Dreamer",
  empdata: {
    age:    63,
    title:  "Chief Executive Officer",
    start:  '1999-12-31',
    salary: 250000,
    bio:    "Born and raised in Manhattan, New York. U.C. Berkeley graduate. " +
              "Loves to skydive. Built Company from scratch. Still uses word-perfect."
  }
}

var emp2 = {
  cl:     "principal",
  name:   "Rusty Grump",
  empdata: {
    age:    58,
    title:  "Chief Financial Officer",
    start:  '1999-12-31', // Strings are converted to local time
    salary: 250000,
    bio:    "Born in Switzerland, raised in South Dakota. Columbia graduate. " +
              "Financed operation with inheritance. Has no sense of humor."
  }
}

var emp3 = {
  cl:     "salary",
  name:   "Georgia Geek",
  empdata: {
    age:    44,
    title:  "Lead Programmer",
    start:  '2001-3-15',
    salary: 100000,
    bio:    "Stanford graduate. Enjoys pizza and beer. Proficient in Perl, COBOL," +
            "FORTRAN and IBM System/360"
  }
}

var emp4 = {
  cl:     "salary",
  name:   "Sydney Slacker",
  empdata: {
    age:    44,
    title:  "Programmer",
    start:  new Date('2002-5-12T00:00:00.0-0800'), // Dates are UTC unless offset is given.
    salary: 100000,
    bio:    "DeVry University graduate. Enjoys a good nap. Proficient in Python, " +
            "Perl and JavaScript"
  }
}

var emp5 = {
  cl:     "hourly",
  name:   "Pat Particular",
  empdata: {
    age:    32,
    title:  "Systems Administrator",
    start:  new Date('2003-7-14'),
    salary: 80000,
    bio:    "Lincoln High School graduate. Self taught Linux and windows administration skills. Proficient in " +
          "Bash and GNU utilities. Capable of crashing or resurrecting machines with a single ping.",
  }
}

var emp6 = {
  cl:     "intern",
  name:   "Billie Barista",
  empdata: {
    age:    22,
    title:  "Intern",
    start:  new Date('2020-3-18'),
    salary: 0,
    bio:    "Harvard graduate, full ride scholarship, top of class.  Proficient in C, C++, " +
            "Rust, Haskell, Node, Python. Into skydiving. Makes a mean latte."
  }
}

var employees = [ emp1, emp2, emp3, emp4, emp5, emp6 ];

/* insert rows */
for (var i=0; i<employees.length; i++)
{
    // empdata:{} is automatically converted to JSON
    sql.exec(
        "insert into employees values(?cl,?name,?empdata)",
        employees[i]
    );
}



/* create text index */
sql.exec("create fulltext index employees_Bio_text on employees( EmpData.$.bio );");

/* perform some queries */
res=sql.exec("select Name, EmpData.$.age Age from employees");

rampart.utils.printf('%3J\n%s\n', res,sql.errMsg);
/* expected output:
   {
       "columns": [
           "Name",
           "Age"
       ],
       "rows": [
           {
               "Name": "Debbie Dreamer",
               "Age": 63
           },
           {
               "Name": "Rusty Grump",
               "Age": 58
           },
           {
               "Name": "Georgia Geek",
               "Age": 44
           },
           {
               "Name": "Sydney Slacker",
               "Age": 44
           },
           {
               "Name": "Pat Particular",
               "Age": 32
           },
           {
               "Name": "Billie Barista",
               "Age": 22
           }
       ],
       "rowCount": 6
   }
*/

res=sql.exec(
    "select Name, EmpData.$.age  Age from employees",
    {returnType:'array', maxRows:2, includeCounts:true}
);
rampart.utils.printf('%3J\n', res);
/* expected output:
   {
       "columns": [
           "Name",
           "Age"
       ],
       "rows": [
           [
               "Debbie Dreamer",
               63
           ],
           [
               "Rusty Grump",
               58
           ]
       ],
       "countInfo": {
           "indexCount": -1,
           "rowsMatchedMin": -1,
           "rowsMatchedMax": -2,
           "rowsReturnedMin": -1,
           "rowsReturnedMax": -2
       },
       "rowCount": 2
   }
             Note that countInfo values are all negative since no
             text search was performed.
*/
res=sql.exec(
    "select Name from employees where EmpData.$.bio likep 'proficient' and convert(EmpData.$.salary, 'float') > 50000",
     {includeCounts:true}
);
rampart.utils.printf('%3J\n', res);

/* expected output:
   {
       "columns": [
           "Name"
       ],
       "rows": [
           {
               "Name": "Georgia Geek"
           },
           {
               "Name": "Sydney Slacker"
           },
           {
               "Name": "Pat Particular"
           }
       ],
       "countInfo": {
           "indexCount": 4,
           "rowsMatchedMin": 0,
           "rowsMatchedMax": 4,
           "rowsReturnedMin": 0,
           "rowsReturnedMax": 4
       },
       "rowCount": 3
   }
   Note that indexCount is the count before "Salary > 50000" filter
*/

/* skydive => skydiving */
sql.set({
    minwordlen: 5,
    suffixproc: true
});

nrows=sql.exec(
    "select Name, EmpData.$.salary Salary from employees where EmpData.$.bio likep 'skydive' " +
      "order by convert(EmpData.$.salary, 'float')  desc",
    {returnType:"array", includeCounts:true},
    function (row, i, coln, cinfo) {
        if(!i) {
            console.log(
               "Total approximate number of matches in db: " +
               cinfo.indexCount
            );
            console.log("-", coln);
        }
        console.log(i+1,row);
    }
);
console.log("Total: " + nrows); // 2

/* expected output:
   Total approximate number of matches in db: 2
   - ["Name","Salary"]
   1 ["Debbie Dreamer",250000]
   2 ["Billie Barista",0]
   Total: 2
*/

Note that it is more efficient and less cumbersome to place values in dedicated columns. However, when the table may need to accomodate future fields, or where fields vary per row, using JSON fields can allow for greater flexibility.