The role of staging tables in database administration (part 3)

Introduction

In the previous article, we discussed the role of staging tables in the database environment, particularly their role in the ETL process. This article will delve into the practical application of staging tables on the Oracle database platform. Oracle Database version 23ai has introduced a new type of table known as the staging table (more details can be found here). We will demonstrate, in detail, the differences in performance (time) and generated UNDO when using this new table type versus the classic heap table during DML operations. This comparison is crucial for the ETL process or when handling large data volumes.

1. Creating the necessary tables.

-- source table
CREATE TABLE STG_SRC
(          "ID" NUMBER,
            "DATA1" VARCHAR2(100),
            "DATA2" VARCHAR2(100),
            "DATA3" VARCHAR2(100),
            "DATA4" VARCHAR2(100)
);
-- target heap table
CREATE TABLE STG_TGT_NORMAL
(          "ID" NUMBER,
            "DATA1" VARCHAR2(100),
            "DATA2" VARCHAR2(100),
            "DATA3" VARCHAR2(100),
            "DATA4" VARCHAR2(100)
);
-- target staging table
CREATE TABLE STG_TGT_STAGING
(          "ID" NUMBER,
            "DATA1" VARCHAR2(100),
            "DATA2" VARCHAR2(100),
            "DATA3" VARCHAR2(100),
            "DATA4" VARCHAR2(100)
);

2. Populating the source table with data

Now, we will populate the source table STG_SRC with some data, specifically 80 million rows, to work with a more relevant data sample.

-- 10 000 000
INSERT INTO STG_SRC
SELECT
   level,
  'data1_' || LEVEL,
  'data2_' || LEVEL,
  'data3_' || LEVEL,
  'data3_' || LEVEL
FROM
  DUAL CONNECT BY LEVEL <= 10000000;
COMMIT;
-- 20 000 000
INSERT INTO STG_SRC
SELECT * FROM STG_SRC;
COMMIT;
-- 40 000 000
INSERT INTO STG_SRC
SELECT * FROM STG_SRC;
COMMIT;
-- 80 000 000
INSERT INTO STG_SRC
SELECT * FROM STG_SRC;
COMMIT;

3. Restarting the database to clear the UNDO and buffer cache

shutdown immediate;
startup;

4. Performing ingestion into a normal heap table and calculate the relevant statistics of interest.

-- current undo
SELECT value
FROM v$mystat m, v$statname n
WHERE m.statistic# = n.statistic#
AND n.name = 'undo change vector size';
     VALUE
----------
       176
-- ingestion to heap table    
set timing on
INSERT INTO STG_TGT_NORMAL
SELECT * FROM STG_SRC;
80000000 rows created.
Elapsed: 00:00:36.25
SQL> commit;
-- generated UNDO
SELECT value
FROM v$mystat m, v$statname n
WHERE m.statistic# = n.statistic#
AND n.name = 'undo change vector size';
     VALUE
----------
    831448
SQL> select 831448-176 undo from dual;
      UNDO
----------
    831272

5. Performing the same process for the staging table.

shutdown immediate;
startup;
-- current UNDO
SELECT value
FROM v$mystat m, v$statname n
WHERE m.statistic# = n.statistic#
AND n.name = 'undo change vector size';
     VALUE
----------
       176
-- staging
SQL> set timing on
SQL> INSERT INTO STG_TGT_STAGING
  2  SELECT
  3     *
  4  FROM STG_SRC;
80000000 rows created.
Elapsed: 00:00:17.30
SQL> commit;
-- staging
SQL> SELECT value
  2  FROM v$mystat m, v$statname n
  3  WHERE m.statistic# = n.statistic#
  4  AND n.name = 'undo change vector size';
     VALUE
----------
    236100
Elapsed: 00:00:00.03
SQL> select 236100-176 undo from dual;
      UNDO
----------
    235924

Evaluation

From the sample of 80 million rows, we clearly observe that ingestion into the STAGING table takes 17.30 seconds and generates 235,924 UNDO, while ingestion into the HEAP table takes 36.25 seconds and generates 831,272 UNDO. It is, of course, premature and inaccurate to expect that all DML operations, specifically INSERTs, will be 50% faster, because the chosen example is relatively simple and the complexity of the SQL command matters – the operation speed can be impacted both by data ingestion and data transformation, not just the ingestion

However, we see that the STAGING table is internally optimized for data ingestion and generates less UNDO, which is the main reason for the shorter runtime of the entire operation. Therefore, if you are already using Oracle version 23ai, STAGING tables are a very good choice to achieve better performance with minimal changes, as the change only involves the DDL of the table itself, with nothing else needing to be altered.