Batch & chunking
insertGroup() and upsertGroup() turn a big array of rows into a few multi-row
statements instead of one query per row. This page explains how those statements
are built — the chunking, the per-row placeholder suffixing, and the one shape
rule that keeps them correct.
Why chunk at all
Databases cap the number of bound parameters and the packet size of a single
statement. A naive “one giant INSERT” of 100k rows would blow past both. Chunking
splits the input into fixed-size batches (array_chunk) and emits one multi-row
statement per batch, keeping every statement comfortably within limits.
| Method | Default chunkSize |
|---|---|
insertGroup |
1000 |
upsertGroup |
500 |
An empty input array short-circuits to a no-op before any SQL is built.
Placeholder suffixing
Within one chunk, every row needs its own placeholders — you can’t bind :name
twice with different values. So each row’s keys are suffixed with the row’s index
in the chunk (_0, _1, …):
rows: [{name:'A', email:'a@x'}, {name:'B', email:'b@x'}]
columns: (name, email)
values: (:name_0, :email_0), (:name_1, :email_1)
binds: :name_0 => 'A', :email_0 => 'a@x',
:name_1 => 'B', :email_1 => 'b@x'The column list is taken from the row’s keys; the values clause repeats one
suffixed tuple per row; all binds are merged into a single execute. upsertGroup
builds the same value tuples, then appends the driver-specific conflict clause
(see Driver detection).
The row-shape rule
The generated INSERT has one column list, built from the rows in the chunk.
Every row’s suffixed placeholders are emitted into the VALUES. For the statement
to line up, rows in a chunk should share the same columns.
Keep keys uniform within a chunk
If rows in the same chunk have different key sets, the single column list and the
per-row placeholder tuples can disagree — producing a malformed statement or
binding values into the wrong columns. Normalise your rows (same keys, same order)
before a batch call. Per-row null values are still dropped individually, so a
column that is null in one row but present in another is a shape mismatch —
supply an explicit value instead of null when a column must appear.
Per-row NULL stripping
As with single insert(), a null value is removed from that row — the column
falls to its database default. In a batch this is exactly the shape hazard above:
dropping a key from one row changes its column set relative to its neighbours.
Chunk size trade-offs
- Smaller chunks — less memory per statement, more round-trips, safer for very wide rows (many columns × many rows approaches the parameter cap faster).
- Larger chunks — fewer round-trips, higher throughput, more memory and more bound parameters per statement.
Tune from the defaults if your rows are unusually wide or narrow. A rough guide:
keep chunkSize × columnsPerRow well under your driver’s bound-parameter limit.
Atomicity
Each chunk is its own statement. insertGroup / upsertGroup do not open a
transaction across chunks — if the fifth chunk fails, the first four are already
committed. Wrap the whole call in beginTransaction() / commit() (inherited from
PDO) when you need all-or-nothing semantics across the entire batch.
Related
- Inserting records — the task-level guide
- Upserts — batch upsert usage
- Driver detection — the conflict clause per driver
- Parameter binding — how each suffixed value is typed