Posts tagged with "postgres"

dqprintf

Part of my job involves performing data transformations using tables loaded via CSVs, which themselves were created from tab-delimited reports. Nasty.

Making something usable out of these involves a nice handful of PL/PGSQL functions. These functions involve arguments specifying table names and other values that are used to build dynamic queries that either perform INSERT ... SELECT FROM, or loop through result sets because the reports require some amount of contextual processing before performing an INSERT. Blech.

The two main functions that make this all possible are quote_ident and quote_literal. They ensure you're not just blindly concatenating illegal identifier names into your queries. Please don't just type in " || table_name || ". It hurts.

Unfortunately, your function will inevitably end up looking a little like this:

iQuery := $Q$
    SELECT
        $Q$ || quote_ident(some_column) || $Q$ || $Q$ ||
            quote_literal(some_value) || $Q$ || '-1'
    FROM $Q$ || quote_ident(some_table) || $Q$
    WHERE another_column = $Q$ || quote_literal(another_value) || $Q$
$Q$;

FOR mRow IN EXECUTE iQuery
LOOP
...

Gross, but this isn't a new problem, so here's a simple solution.

I adapted this function a bit to make dqprintf, a simple sprintf-like function to create dynamic queries:

CREATE OR REPLACE FUNCTION dqprintf(fmt text, VARIADIC args anyarray)
RETURNS text
LANGUAGE PLPGSQL AS $$
DECLARE
        argcnt int = 1;
        chrcnt int = 0;
        fmtlen int;
        CHR text;
        NEXT_CHR text;
        output text = '';
        curarg text;

BEGIN
        fmtlen = LENGTH(fmt);
        LOOP
                chrcnt = chrcnt + 1;

                -- ran out of format string? bail out
                IF chrcnt > fmtlen THEN
                        EXIT;
                END IF;

                -- grab our char
                CHR = substring(fmt, chrcnt, 1);
                NEXT_CHR = substring(fmt, chrcnt + 1, 1);

                -- %% means output a single %, and skip them
                IF CHR = '%' AND substring(fmt, chrcnt + 1, 1) = '%' THEN
                        output = output || '%';
                        chrcnt = chrcnt + 1;
                        continue;
            END IF;

                -- %i means we're going to quote an identifier
                -- %l means we're going to quote a literal
                -- %s is anything else. Not exactly a printf format.
                IF CHR = '%' THEN
                        curarg := COALESCE(args[argcnt]::text, '');

                        IF NEXT_CHR = 'i' THEN
                            curarg := quote_ident(curarg);
                        ELSIF NEXT_CHR = 'l' THEN
                            curarg := quote_literal(curarg);
                        ELSIF NEXT_CHR <> 's' THEN
                            -- improper format identifier
                            RAISE EXCEPTION 'Incorrect format identifier: %', NEXT_CHR;
                        END IF;

                        output = output || curarg;
                        argcnt = argcnt + 1;
                        chrcnt = chrcnt + 1;
                        CONTINUE;

                END IF;

                -- no special case? output the thing
                output = output || CHR;
        END LOOP;

        RETURN output;
END;
$$;

This will turn the above pipe character mess into something a bit more manageable:

iQuery := $Q$
    SELECT
        %i || %l || '-1'
    FROM %i
    WHERE another_column = %l
$Q$;
FOR mRow IN EXECUTE dqprintf(iQuery, some_column, some_value,
    some_table, another_value)
LOOP
...

A couple of notes. First, every argument needs to be cast to the same type, preferably text. If you're using this in a PL/PGSQL function, you most likely have this covered if these are typed arguments to the function already.

Secondly, error handling isn't terribly robust, so debugging it might be clumsy. You're already used to that, though, if you're writing PL/PGSQL functions.

On DML Logging

If your shop is like ours, your devs working with PostgreSQL want to log UPDATE, INSERT and DELETE operations by users, as a tool to analyze bugs in your software or to audit user activities. Trigger functions immediately jump to mind, and they are rightfully suited to the task, but bring some baggage of their own that may give you some pause if your users or developers are sensitive to the speed of their database queries.

First things first, PL/PGSQL is great. If you absolutely can't write your function in plain SQL, use PL/PGSQL if you can. PL/Python is fun, very powerful, and if you're like me, writing Python is like a Slip 'N Slide that never ends. But for trigger functions on what could be millions of queries, it may give you the gift of a significant performance penalty that will irritate you for around a year before it finally prompts you to complain about it in writing, on the Internet.

Unfortunately for PL/PGSQL, the trigger function's row object doesn't provide any sort of data you can use to find out exactly what fields are contained within the row object. You also can't dynamically refer to these columns in the row variable. You have to infer this from the table you created the trigger on, and write your function appropriately. So, that works...if you're logging a single table. Fast forward to writing trigger functions for 40 or more tables and you see the dilemma.

So, long story made short, PL/Python trigger functions provide rows as a Dict, and it's fairly obvious what to do with it. But then there's a performance hit. It's not even so much the OLTP queries that drag us down as the data loading. Like most people dealing with real data, you have to load quite a bit from time to time, and if you're nice, you won't grab an exclusive lock on the entire table and disable triggers just so you can bring in a few years of history.

You can see where this is going. Every row inserted brings you another single INSERT to your logging table. It doesn't matter what structure you've made for storing this log data, it's going to be painful.

You could include code in the trigger function to turn around and exit if it sees a specially named session-local temporary table. This takes off some of the hit from the INSERTs. But, you're still firing off the Python interpreter 500,000 times if you're creating 500,000 records.

That leaves us with some options.

The last one is crazy enough that it might work, but I haven't even tried to test it yet.