SQL problems requiring cursors
A cursor is a construct available in most implementations of SQL that allows the programmer to handle data in a row-by-row manner rather than as a group. Parallelizing row-by-row processing is much more complex than serial processing, which is another reason to make use of non-procedural SQL wherever possible. Database vendors typically handle parallel processing without requiring special handling by application developers.
Parallel processing can be orders of magnitude faster than serial processing.
Constraints
In this article, the following constraints apply:
- The term "cursor" includes all cursor-like behavior. For example, using a loop in a shell script that loops across single-row SQL queries or the output of multi-row SQL queries is cursor-like behavior and does not achieve the goal of true set-based processing within the database.
- All set-based SQL must be ANSI SQL. A number of vendors provide some extremely powerful proprietary extensions. The goal is to avoid such extensions in favor of ANSI SQL.
- The solution must be generalizable. In one or more examples below, specific values may be used for demonstration purposes, but any solution must scale to any number feasible within the power of the database software and machine resources.
Example: Insert rows based on a count in the table itself
The table below represents types of marbles. The four text columns represent four marble characteristics. Each characteristic has two values for a total of 16 types of marbles.
The "quantity" column represents how many marbles of that type we have. The task is to create a second table holding one row for each marble of that type. Thus, the target table would have the four text columns, and a total of 40 + 20 + 20 + 10 + ... + 10 + 5 = 270 rows.
Source table:
QUANTITY TEXTURE APPEARANCE SHAPE COLOR ---------- ---------- ---------- ---------- ----- 40 smooth shiny round blue 20 smooth shiny warped blue 20 smooth dull round blue 10 smooth dull warped blue 20 rough shiny round blue 10 rough shiny warped blue 10 rough dull round blue 5 rough dull warped blue 40 rough dull warped red 20 rough dull round red 20 rough shiny warped red 10 rough shiny round red 20 smooth dull warped red 10 smooth dull round red 10 smooth shiny warped red 5 smooth shiny round red
Table to generate:
TEXTURE APPEARANCE SHAPE COLOR ---------- ---------- ---------- ----- smooth shiny round blue -- 1 smooth shiny round blue -- 2 ... -- and so on smooth shiny round blue -- 40 smooth shiny warped blue -- 1 smooth shiny warped blue -- 2 ... -- and so on smooth shiny warped blue -- 20 ... -- and so on smooth shiny round red -- 1 smooth shiny round red -- 2 smooth shiny round red -- 3 smooth shiny round red -- 4 smooth shiny round red -- 5
Solution in cursor form
Generating the target table with a cursor is fairly simple.
declare
cursor c is select * from marbles_seed;
begin
for r in c loop
for i in 1..r.quantity loop
insert into marbles values (
r.texture,
r.appearance,
r.shape,
r.color_actual
);
end loop;
end loop;
end;
Solution in SQL
Solving the problem with SQL is a bit more code and requires a bit more creative thought than the nested loop approach of cursors.
Number table
The solution requires an intermediate table. The table has one column of type NUMBER that has the values 0 to whatever number of rows is needed. For this discussion, we'll limit it to one million rows. The code is as follows: Setup:
create table numbers_seed ( n number(1) );
create table numbers ( n number(7));
insert into numbers_seed values ( 0 );
insert into numbers_seed values ( 1 );
insert into numbers_seed values ( 2 );
insert into numbers_seed values ( 3 );
insert into numbers_seed values ( 4 );
insert into numbers_seed values ( 5 );
insert into numbers_seed values ( 6 );
insert into numbers_seed values ( 7 );
insert into numbers_seed values ( 8 );
insert into numbers_seed values ( 9 );
insert into numbers
select n6.n * 100000 +
n5.n * 10000 +
n4.n * 1000 +
n3.n * 100 +
n2.n * 10 +
n1.n * 1 n
from numbers_seed n1,
numbers_seed n2,
numbers_seed n3,
numbers_seed n4,
numbers_seed n5,
numbers_seed n6
The numbers table can be created in parallel.
Solution core
Assume the marble type table above is named marbles_seed and the target table is named marbles. The code that generates the needed 270 rows is:
insert into marbles
(m.texture, m.appearance, m.shape, m.color_actual)
select m.texture,
m.appearance,
m.shape,
m.color_actual
from marbles_seed m,
numbers n
where m.quantity > n.n
The database can process this insert in parallel without the programmer's involvement.