Acme Sprockets

LinkedInTwitterRSS

Simple Update via Oracle Key Preserved Inline View

Sometimes the simple way really is the best...

Recently I looked at a custom SQL script running during a client's batch processing. They needed to update one column on approximately 300,000 rows out of 9,000,000 on a base table. The new values were coming from another staging table that had the same primary key as the base table. Best-case, the SQL was running for several minutes; worst case (like on the Saturday I took a quick look), it was over 30 minutes. The extra long run was due to bad stats, but the SQL was an overly complex update statement with subqueries, joins, update by rowid, etc -- too complex and definitely inefficient. Setting aside the question about why a staging table at all, I suggested using an update to an Oracle inline, key preserved, view.

By using this approach, what used to run in 5-30 minutes on a large AIX server and SAN became less than a minute on my much smaller two-core Linux test server running 11gR2 (with the database on a single SATA drive).

In Oracle, "key preserved" means the row from the base table will appear at most once in the output view on that table. In that case, one can update the view and the appropriately underlying table will be updated. If it fits, as it did in this case, one only needs to do a simple update against a simple in-line view. It is really easy, and really efficient.

What follows is an example:

First, create a base table with 10,000,000 rows. It has a compound primary key like the client example.

SQL> create table data_table (
  2    pk1 number not null,
  3    pk2 number not null,
  4    val varchar2(10),
  5    primary key (pk1,pk2)
  6  );

Table created.

Elapsed: 00:00:00.04
SQL> 
SQL> declare
  2    type tt_pk1 is table of data_table.pk1%type index by pls_integer;
  3    type tt_pk2 is table of data_table.pk2%type index by pls_integer;
  4    type tt_val is table of data_table.val%type index by pls_integer;
  5    pk1_tab tt_pk1;
  6    pk2_tab tt_pk2;
  7    val_tab tt_val;
  8  begin
  9    /* set up the pk2 and val tables -- won't change */
 10    for j in 1..10000 loop
 11  	 pk2_tab(j) := j;
 12  	 val_tab(j) :='A';
 13    end loop;
 14  
 15    /* for each pk1 value... */
 16    for i in 1..1000 loop
 17  
 18  	 for j in 1..10000 loop
 19  	   pk1_tab(j) := i;
 20  	 end loop;
 21  
 22  	 forall j in 1..10000
 23  	   insert into data_table(pk1,pk2,val) values(pk1_tab(j),pk2_tab(j),val_tab(j));
 24  
 25    end loop;
 26  
 27  end;
 28  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:08.42
SQL> 
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> 
SQL> exec dbms_stats.gather_table_stats(user,'data_table',cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.08

Next, create our "new values" table:

 
SQL> create table new_values (
  2    pk1 number not null,
  3    pk2 number not null,
  4    val varchar2(10),
  5    primary key (pk1,pk2)
  6  );

Table created.

Elapsed: 00:00:00.06
SQL> 
SQL> insert into new_values(pk1,pk2,val) select pk1,pk2,'B' from data_table sample(3);

300234 rows created.

Elapsed: 00:00:08.21
SQL> 
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> 
SQL> exec dbms_stats.gather_table_stats(user,'new_values',cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.30

A couple of simple counts will verify the starting data values:

 
SQL> col cnt format 999,999,999
SQL> 
SQL> select val,count(1) as cnt from data_table group by val order by 1;

VAL		    CNT
---------- ------------
A	     10,000,000

Elapsed: 00:00:07.32
SQL> select val,count(1) as cnt from new_values group by val order by 1;

VAL		    CNT
---------- ------------
B		300,234

Elapsed: 00:00:00.07

And here is the super-easy, super-efficient update:

SQL> set autotrace on
SQL> update (
  2    select d.val, n.val new_value
  3    from data_table d join new_values n on (n.pk1=d.pk1 and n.pk2=d.pk2)
  4  )
  5  set val=new_value;

300234 rows updated.

Elapsed: 00:00:06.36

Execution Plan
----------------------------------------------------------
Plan hash value: 3548633994

------------------------------------------------------------------------------------------
| Id  | Operation	    | Name	 | Rows  | Bytes |TempSpc| Cost (%CPU)| Time	 |
------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT    |		 |   300K|  5863K|	 | 17325   (4)| 00:03:28 |
|   1 |  UPDATE 	    | DATA_TABLE |	 |	 |	 |	      | 	 |
|*  2 |   HASH JOIN	    |		 |   300K|  5863K|  6456K| 17325   (4)| 00:03:28 |
|   3 |    TABLE ACCESS FULL| NEW_VALUES |   300K|  2931K|	 |   180   (6)| 00:00:03 |
|   4 |    TABLE ACCESS FULL| DATA_TABLE |    10M|    95M|	 |  6097   (6)| 00:01:14 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("N"."PK1"="D"."PK1" AND "N"."PK2"="D"."PK2")


Statistics
----------------------------------------------------------
	  0  recursive calls
     306833  db block gets
      21207  consistent gets
	 79  physical reads
   74661696  redo size
	683  bytes sent via SQL*Net to client
	707  bytes received via SQL*Net from client
	  3  SQL*Net roundtrips to/from client
	  1  sorts (memory)
	  0  sorts (disk)
     300234  rows processed

We can verify that the correct table was updated by doing our count again:

 
SQL> select val,count(1) as cnt from data_table group by val order by 1;

VAL		    CNT
---------- ------------
A	      9,699,766
B		300,234

Elapsed: 00:00:03.73
SQL> select val,count(1) as cnt from new_values group by val order by 1;

VAL		    CNT
---------- ------------
B		300,234

Elapsed: 00:00:00.06

As shown above, the update statement was easy to construct and ran very efficiently.

For more information, you might consider the following sources:

If you're    ready for a zombie apocalypse, then you're ready for any emergency.    emergency.cdc.gov