I wrote down how to transfer the data from AWS to GCP before.
In the below article, I introduced 「AWS Aurora Export + BigQuery Data Transfer Service」method. In this article, I’d like to deep dive into more details.
Why Aurora Export
Certainly, it’s important to avoid impacting the production database. For this reason, we typically adopt a specific approach. There exist two strategies for exporting data to S3: one involves using a snapshot, and the other entails direct access to the cluster. In this article, I will focus on detailing the snapshot method. Below, you’ll find an explanation of the components involved.
Why do we need lambda?
Initially, I tried to use the event bridge scheduler to avoid script-based implementations. This approach also allowed for the direct invocation of the start export API in RDS.
But If we use context attributes in the event bridge scheduler, we can’t get time. Let’s say if I use <aws.scheduler.scheduled-time>, I can get 2022-03-22T18:59:43Z
. But when the BigQuery Data transfer service executes, it can’t get the same time. This is because the time of execution is different.
So we have to use lambda to align time formats.
How do we align time formats?
Let’s say I want to align like 「20240210」. In this case, I can implement Go in lambda like below.
...
// taskID 20240210
taskID := time.Now().Format("20060102")
...
In the BigQuery Data transfer service, the implementation in Terraform is below. The point is to use run_time
. run_time
represents the intended time of execution.
resource "google_bigquery_data_transfer_config" "table" {
display_name = "sync sample table"
location = var.gcp_region
data_source_id = "amazon_s3"
# UTC time
schedule = "every day 01:00"
destination_dataset_id = google_bigquery_dataset.aurora.dataset_id
params = {
destination_table_name_template = "sample_tables"
# 20240210
data_path = "s3://sample/{run_time|\"%Y%m%d\"}/sample_tables/1/*.gz.parquet"
access_key_id = var.aws_access_key
secret_access_key = var.aws_access_secret
file_format = "PARQUET"
write_disposition = "WRITE_TRUNCATE"
}
}