How to Transform SQL Queries to Crosstabs in PostgreSQL

How to Transform SQL Queries to Crosstabs in PostgreSQL

·

11 min read

A SELECT SQL query retrieves data from a database in a tabular form. The first row, or header row, has the column names, and all the other rows have the data that was retrieved.

Listing of salesmen monthly sales data

In a crosstab, the data is condensed and the names of one or more columns are rotated. You can add row and column totals to a crosstab. For example, a list of salesmen's monthly sales may include the months in the column headers. By rotating the data, it is easier to read and understand how the facts relate to one another.

Sample crosstab

Other names you might come across for crosstabs are matrix reports, pivot for SQL server databases, and pivot tables on spreadsheets like Microsoft Excel, Google Sheets, and LibreOffice.

The following topics are covered in this post:

This post uses queries from a database populated with data from the Northwind database for Postgres. The queries are run using DBeaver SQL client.

Crosstabs Using CASE Conditional Expression

The SQL CASE expression enables you to choose a value depending on a condition, much like an if-then-else conditional statement. The syntax for the CASE expression is as follows:

CASE WHEN condition THEN result
     [WHEN ...]
     [ELSE result]
END

If a condition evaluates to true, the result value will be chosen, otherwise; the result value contained in the optional ELSE expression will be chosen.

In the example below, the SQL CASE statement is used to make a crosstab of monthly total sales by employees for the first four months of 1997.

---- Crosstabs Using CASE Conditional Expression
SELECT  salesman,
      SUM(CASE order_month WHEN 1 THEN order_total ELSE 0 END) AS Jan,
      SUM(CASE order_month WHEN 2 THEN order_total ELSE 0 END) AS Feb,
      SUM(CASE order_month WHEN 3 THEN order_total ELSE 0 END) AS Mar,
      SUM(CASE order_month WHEN 4 THEN order_total ELSE 0 END) AS Apr,
      SUM(order_total) AS total
  FROM
      (
        SELECT
          e.last_name  || ' ' || e.first_name AS salesman,
      date_part('month', order_date) AS order_month,
      ROUND((quantity * unit_price)::numeric, 2) AS order_total
    FROM employees e
    JOIN orders o ON e.employee_id = o.employee_id
    JOIN order_details od ON o.order_id = od.order_id
    WHERE date_part('year', order_date) = 1997
        AND  date_part('month', order_date) BETWEEN 1 AND 4
      ) employee_sales
  GROUP by 1

If you run the query above, you will get a crosstab like the one below.

employees monthly total sales crosstab during the first four months of 1997

You can combine the previous query with a UNION ALL operator and the following SQL SELECT query to include the total for each month in the cross-tab.

  --- UNION ALL operator to include totals for each month
  UNION ALL
  SELECT  'Totals ',
      SUM(CASE order_month WHEN 1 THEN order_total ELSE 0 END) AS Jan,
      SUM(CASE order_month WHEN 2 THEN order_total ELSE 0 END) AS Feb,
      SUM(CASE order_month WHEN 3 THEN order_total ELSE 0 END) AS Mar,
      SUM(CASE order_month WHEN 4 THEN order_total ELSE 0 END) AS Apr,
      SUM(order_total) AS total
  FROM
      (
        SELECT
            date_part('month', order_date) AS order_month,
        ROUND(SUM(quantity * unit_price)::numeric, 2) AS order_total
    FROM
        orders o
    JOIN order_details od ON o.order_id = od.order_id
    WHERE date_part('year', order_date) = 1997
        AND  date_part('month', order_date) BETWEEN 1 AND 4
    GROUP BY 1
  ) employee_sales

employees monthly total sales crosstab during the first four months 1997 including month totals using SQL CASE expression

The SQL CASE expression is supported by most database systems.

Using Common Table Expressions (CTE) For A Crosstab

Common Table Expressions (CTE) in SQL let you create a temporary table in a SELECT SQL statement. It is then possible to reference this named temporary table in subsequent SELECT, INSERT, UPDATE, or DELETE SQL statements. The CTEs simplify the SQL queries by breaking them down into smaller, more manageable parts that are easier to read and understand. In PostgreSQL, CTEs are called WITH queries.

You can rewrite the previous CASE crosstab query using a CTE as follows:

--- CASE Conditional Expression crosstab using a CTE
WITH cte AS (
    SELECT  salesman,
        SUM(CASE order_month WHEN 1 THEN order_total ELSE 0 END) AS Jan,
        SUM(CASE order_month WHEN 2 THEN order_total ELSE 0 END) AS Feb,
        SUM(CASE order_month WHEN 3 THEN order_total ELSE 0 END) AS Mar,
        SUM(CASE order_month WHEN 4 THEN order_total ELSE 0 END) AS Apr,
        SUM(order_total) AS total
    FROM
        (
          SELECT e.last_name  || ' ' || e.first_name AS salesman,
            date_part('month', order_date) AS order_month,
            ROUND((quantity * unit_price)::numeric, 2) AS order_total
          FROM employees e
          JOIN orders o ON e.employee_id = o.employee_id
          JOIN order_details od ON o.order_id = od.order_id
          WHERE date_part('year', order_date) = 1997
              AND  date_part('month', order_date) BETWEEN 1 AND 4
        ) employee_sales
    GROUP by 1
)
SELECT * FROM cte
UNION ALL
SELECT  'Totals ',
    SUM(jan),
    SUM(feb),
    SUM(mar),
    SUM(apr),
    SUM(total) AS total
FROM cte

Crosstabs Using Aggregate FILTER Clause

From PostgreSQL 9.4, you can use the FILTER clause to perform aggregate functions on specific records. The FILTER clause is less wordy and has a cleaner syntax than the CASE statement.

The following SQL query uses the FILTER clause to generate a crosstab.

---- Crosstabs Using Aggregate FILTER Clause
WITH cte As (
        SELECT salesman ,
            COALESCE(SUM(order_total) FILTER ( WHERE order_month = 1), 0) AS Jan,
            COALESCE(SUM(order_total) FILTER ( WHERE order_month = 2), 0) AS Feb,
            COALESCE(SUM(order_total) FILTER ( WHERE order_month = 3), 0) AS Mar,
            COALESCE(SUM(order_total) FILTER ( WHERE order_month = 4), 0) AS Apr,
            COALESCE(SUM(order_total), 0) AS total
        FROM
            (
                SELECT  e.last_name  || ' ' || e.first_name AS salesman ,
                    date_part('month', order_date) AS order_month,
                    ROUND((quantity * unit_price)::numeric, 2) AS order_total
                FROM
                    employees e
                    JOIN orders o ON e.employee_id = o.employee_id
                    JOIN order_details od ON o.order_id = od.order_id
                WHERE date_part('year', order_date) = 1997
                AND  date_part('month', order_date) BETWEEN 1 AND 4
            ) employee_sales
        GROUP by 1
)
SELECT * FROM cte  
UNION ALL
SELECT  'Totals ',
    SUM(jan),
    SUM(feb),
    SUM(mar),
    SUM(apr),
    SUM(total) AS total
FROM cte

The SQL COALESCE function replaces any null values in the crosstab with zero (0) values.

employees monthly total sales crosstab during the first four months of 1997 including month totals using aggregate FILTER clause

Using PostgreSQL crosstab() Function

The crosstab() function is part of the optional tablefunc module.

You can run the SELECT COUNT(*) FROM pg_extension WHERE extname='tablefunc'; query to see if the tablefunc extension is installed on the database you are using. If the result of the query is 0, install and activate the tablefunc⁣ extension using the CREATE EXTENSION IF NOT EXIST tablefunc;⁣ SQL command. The tablefunc⁣ module can be installed by non-superusers who have the CREATE⁣ privilege.

The crosstab function has several options, crosstab(sql text), crosstabN(sql text), and crosstab(source_sql text, category_sql text). Examples in this post use the crosstab(source_sql text, category_sql text) function option.

The first parameter (source_sql text) in a crosstab(source_sql text, category_sql text) function is the source SQL SELECT query statement and must return at least three (3) columns of data to pivot or rotate. The first column (row_name) contains data values to be used as row identifiers in the final result; data in the second column (category) represents category values that will be rotated to column headers in the pivot table, and the third column (value) contains data to be assigned to each cell of the final crosstab. The second parameter text category_sql is a query returning a category list for the columns.

The crosstab function returns a set of records with unknown data types. Therefore, you must alias the returned columns with column names and types using the AS (col1 type, col2 type, ...) clause. Failing to alias the columns will cause a column definition list is required for functions returning "record" error.

The SQL query below shows how to use the crosstab(source_sql text, category_sql text) function. A dollar-quoted string constant ($$) has been used to maintain formatting consistency and remove the need to escape single quotes (') by doubling them.

-- Using PostgreSQL crosstab() Function
SELECT *
    FROM crosstab(
        $$
          SELECT e.last_name  || ' ' || e.first_name salesman,
              date_part('Mon',order_date) AS order_month,
              SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
          FROM employees e
          JOIN orders o ON e.employee_id = o.employee_id
          JOIN order_details od ON o.order_id = od.order_id
          WHERE date_part('year', order_date) = 1997
              AND  date_part('month', order_date) BETWEEN 1 AND 4
          GROUP BY 1, 2
          UNION ALL
          SELECT 'month_total', date_part('month', order_date) AS order_month,
            ROUND(SUM(quantity * unit_price)::numeric, 2) AS order_total
          FROM orders o
          JOIN order_details od ON o.order_id = od.order_id
          WHERE date_part('year', order_date) = 1997
              AND  date_part('month', order_date) BETWEEN 1 AND 4
          GROUP BY 1,2
        $$,
        $$
          SELECT DISTINCT date_part('month', order_date) AS order_month
          FROM orders
          WHERE date_part('year', order_date) = 1997
              AND  date_part('month', order_date) BETWEEN 1 AND 4
          ORDER BY 1
        $$
        ) AS ct (salesman text, Jan numeric, Feb numeric, Mar numeric, Apr numeric);

employees monthly total sales crosstab during the first four months of 1997 using the crosstab() function

Using A PostgreSQL crosstab() Function With More Than Three Columns

If your source_sql text SQL query returns more than three columns, the additional or extra columns must be placed between the row_name and category columns. In the previous crosstab SQL query, any extra or additional columns must be placed between the salesman and order_month columns.

PostgreSQL converts identifiers/column names to lowercase by default. To capitalize the crosstab column headers, enclose them within double quotes.

-- Using A PostgreSQL crosstab() Function With More Than Three Columns
SELECT employee AS "Salesman", title AS "Title", emp_total AS "Total", COALESCE(Jan, 0) AS "Jan", 
    COALESCE(Feb, 0) AS "Feb", mar AS  "Mar", COALESCE(Apr, 0) AS "Apr"
FROM crosstab(
    $$
        SELECT e.last_name  || ' ' || e.first_name as salesman, 
            e.title ,
            et.emp_total ,
            date_part('Mon',order_date) AS order_month,
           SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total 
        FROM employees e
        JOIN (
                SELECT e.employee_id ,
                    SUM(ROUND((quantity * unit_price)::numeric, 2)) AS emp_total
                FROM  employees e
                    JOIN orders o ON e.employee_id = o.employee_id
                    JOIN order_details od ON o.order_id = od.order_id
                WHERE date_part('year', order_date) = 1997
                    AND  date_part('month', order_date) BETWEEN 1 AND 4
                GROUP BY 1
            ) et on e.employee_id = et.employee_id
        JOIN orders o ON e.employee_id = o.employee_id
        JOIN order_details od ON o.order_id = od.order_id
        WHERE date_part('year', order_date) = 1997
            AND  date_part('month', order_date) BETWEEN 1 AND 4
        GROUP BY 1,2,3,4
        UNION ALL
        SELECT '' AS salesman,
            'Month Total' AS title,
            (
                SELECT SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total 
                FROM orders o
                JOIN order_details od ON o.order_id = od.order_id
                WHERE date_part('year', order_date) = 1997
                    AND  date_part('month', order_date) BETWEEN 1 AND 4
            ) AS col_total,
            date_part('Mon',order_date) AS order_month,
            SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
        FROM employees e
        JOIN orders o ON e.employee_id = o.employee_id
        JOIN order_details od ON o.order_id = od.order_id
        WHERE date_part('year', order_date) = 1997
            AND  date_part('month', order_date) BETWEEN 1 AND 4
        GROUP BY 1,2,3, 4
    $$,
    $$ 
      SELECT DISTINCT date_part('month', order_date) AS order_month
      FROM orders
      WHERE date_part('year', order_date) = 1997
          AND  date_part('month', order_date) BETWEEN 1 AND 4
      ORDER BY 1

    $$
    ) AS ct (employee text, title text, emp_total numeric, Jan numeric, Feb numeric, Mar numeric, Apr numeric);

employees monthly total sales crosstab during the first four months of 1997 using crosstab() function with more than 3 columns

Using ARRAY Data Type To Re-Arrange Extra Crosstab Columns

A crosstab with more than three columns has limitations in the order placement of the additional columns in the final crosstab. However, you can re-arrange the crosstab columns by inserting the extra columns into an arrays data type.

-- Using ARRAY Data Type To Re-Arrange Extra Crosstab Columns
    SELECT employee[1] AS "Salesman", employee[2] AS "Title", 
        COALESCE(Jan, 0) AS "Jan", COALESCE(Feb, 0) AS "Feb", mar AS  "Mar", COALESCE(Apr, 0) AS "Apr"
        , employee[3]::numeric AS "Total"
    FROM crosstab(
        $$
            SELECT ARRAY[e.last_name  || ' ' || e.first_name, e.title, et.emp_total::text] AS employee,
               date_part('Mon',order_date) AS order_month,
               SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total 
            FROM employees e
            JOIN (
                    SELECT e.employee_id ,
                        SUM(ROUND((quantity * unit_price)::numeric, 2)) AS emp_total
                    FROM  employees e
                        JOIN orders o ON e.employee_id = o.employee_id
                        JOIN order_details od ON o.order_id = od.order_id
                    WHERE date_part('year', order_date) = 1997
                        AND  date_part('month', order_date) BETWEEN 1 AND 4
                    GROUP BY 1
                ) et on e.employee_id = et.employee_id
            JOIN orders o ON e.employee_id = o.employee_id
            JOIN order_details od ON o.order_id = od.order_id
            WHERE date_part('year', order_date) = 1997
                AND  date_part('month', order_date) BETWEEN 1 AND 4
            GROUP BY 1,2
            UNION ALL
            SELECT ARRAY[
                    '', 'Month Total',
                    (
                        SELECT SUM(ROUND((quantity * unit_price)::numeric, 2))::text
                        FROM orders o
                        JOIN order_details od ON o.order_id = od.order_id
                        WHERE date_part('year', order_date) = 1997
                            AND  date_part('month', order_date) BETWEEN 1 AND 4
                    )
                 ] AS employee,
                date_part('Mon',order_date) AS order_month,
                SUM(ROUND((quantity * unit_price)::numeric, 2)) AS order_total
            FROM employees e
                JOIN orders o ON e.employee_id = o.employee_id
                JOIN order_details od ON o.order_id = od.order_id
            WHERE date_part('year', order_date) = 1997
                AND  date_part('month', order_date) BETWEEN 1 AND 4
            GROUP BY 1,2
        $$,
        $$ 
          SELECT DISTINCT date_part('month', order_date) AS order_month
          FROM orders
          WHERE date_part('year', order_date) = 1997
              AND  date_part('month', order_date) BETWEEN 1 AND 4
          ORDER BY 1  
        $$
        ) AS ct (employee text[], Jan numeric, Feb numeric, Mar numeric, Apr numeric);

employees monthly total sales crosstab during the first four months 1997 using crosstab() function with re-arranged columns

Conclusion

Crosstabs are a powerful way to summarize and analyze data from a database. By presenting data in a condensed and organized format, crosstabs make it easier to analyze relationships between different variables.

The CASE conditional expression, aggregate FILTER clause, and the PostgreSQL crosstab() function are some of the methods for creating crosstabs. If the crosstab() function uses more than three columns to summarize data, the ARRAY data type may be used to re-arrange the extra columns into the correct positions. Using crosstabs, PostgreSQL users can create reports and dashboards that help them make informed business decisions.

Resources

  1. CrossTab Queries in PostgreSQL using tablefunc contrib

  2. Static and dynamic pivots