Package · cdo

Batch & chunking

insertGroup() and upsertGroup() turn a big array of rows into a few multi-row statements instead of one query per row. This page explains how those statements are built — the chunking, the per-row placeholder suffixing, and the one shape rule that keeps them correct.

Why chunk at all

Databases cap the number of bound parameters and the packet size of a single statement. A naive “one giant INSERT” of 100k rows would blow past both. Chunking splits the input into fixed-size batches (array_chunk) and emits one multi-row statement per batch, keeping every statement comfortably within limits.

Method Default chunkSize
insertGroup 1000
upsertGroup 500

An empty input array short-circuits to a no-op before any SQL is built.

Placeholder suffixing

Within one chunk, every row needs its own placeholders — you can’t bind :name twice with different values. So each row’s keys are suffixed with the row’s index in the chunk (_0, _1, …):

text
rows: [{name:'A', email:'a@x'}, {name:'B', email:'b@x'}]

columns: (name, email)
values:  (:name_0, :email_0), (:name_1, :email_1)
binds:   :name_0 => 'A', :email_0 => 'a@x',
       :name_1 => 'B', :email_1 => 'b@x'

The column list is taken from the row’s keys; the values clause repeats one suffixed tuple per row; all binds are merged into a single execute. upsertGroup builds the same value tuples, then appends the driver-specific conflict clause (see Driver detection).

The row-shape rule

The generated INSERT has one column list, built from the rows in the chunk. Every row’s suffixed placeholders are emitted into the VALUES. For the statement to line up, rows in a chunk should share the same columns.

Keep keys uniform within a chunk

If rows in the same chunk have different key sets, the single column list and the per-row placeholder tuples can disagree — producing a malformed statement or binding values into the wrong columns. Normalise your rows (same keys, same order) before a batch call. Per-row null values are still dropped individually, so a column that is null in one row but present in another is a shape mismatch — supply an explicit value instead of null when a column must appear.

Per-row NULL stripping

As with single insert(), a null value is removed from that row — the column falls to its database default. In a batch this is exactly the shape hazard above: dropping a key from one row changes its column set relative to its neighbours.

Chunk size trade-offs

  • Smaller chunks — less memory per statement, more round-trips, safer for very wide rows (many columns × many rows approaches the parameter cap faster).
  • Larger chunks — fewer round-trips, higher throughput, more memory and more bound parameters per statement.

Tune from the defaults if your rows are unusually wide or narrow. A rough guide: keep chunkSize × columnsPerRow well under your driver’s bound-parameter limit.

Atomicity

Each chunk is its own statement. insertGroup / upsertGroup do not open a transaction across chunks — if the fifth chunk fails, the first four are already committed. Wrap the whole call in beginTransaction() / commit() (inherited from PDO) when you need all-or-nothing semantics across the entire batch.