How to create a merge table in BigQuery

wqwq
2 min readMar 24, 2024

In this post, I introduce the merge function in BigQuery.
The use cases where we utilize it are mainly to eliminate data duplication. It is a powerful tool that allows us to manage and maintain data integrity. By merging data efficiently, we can ensure our datasets are free from redundancies.

Merge Query

Basic Query

MERGE dataset.master M
USING dataset.temp T
ON
M.id = T.id
WHEN MATCHED THEN
UPDATE SET
M.hoge = T.hoge,
M.fuga = T.fuga
WHEN NOT MATCHED THEN
INSERT ROW

With partition tables

if the table we merge is partitioned, we can’t use 「INSERT ROW」statement. If you use it, you can see the following error message.

「if the target is ingestion-time partitioned table, this won’t work you have to specify the columns:」

MERGE dataset.master M
USING dataset.temp T
ON
M.id = T.id
WHEN MATCHED THEN
UPDATE SET
M.hoge = T.hoge,
M.fuga = T.fuga
WHEN NOT MATCHED THEN
INSERT(
hoge,
fuga
)
VALUES(
hoge,
fuga
)

With One to many tables

If the relationship between the master and temp tables is one-to-many, we can use the composite key.

MERGE dataset.master M
USING dataset.temp T
ON
M.id = T.id
AND
M.status = T.status
WHEN MATCHED THEN
UPDATE SET
M.hoge = T.hoge,
M.fuga = T.fuga
WHEN NOT MATCHED THEN
INSERT ROW

How to improve the MERGE efficiency

We have three options to improve the MEREGE efficiency.

subquery filter

MERGE dataset.master M
USING (SELECT * FROM dataset.temp WHERE _PARTITIONTIME = '2018-01-01') T
ON M.id = T.id
WHEN MATCHED THEN
DELETE

search_condition

MERGE dataset.master M
USING dataset.temp T
ON M.id = T.id
WHEN MATCHED AND M._PARTITIONTIME = '2018-01-01' THEN
UPDATE SET id = S.id
WHEN MATCHED AND M._PARTITIONTIME = '2018-01-02' THEN
UPDATE SET price = S.price + 10

merge_condition

MERGE dataset.master M
USING dataset.temp T
ON M.id = T.id AND
M._PARTITIONTIME = '2018-01-01' AND T._PARTITIONTIME = '2018-01-01'
WHEN MATCHED THEN
UPDATE SET M.status= T.status

Reference

--

--