dataform-service is a unified Cloud Run service that provides two main endpoints for different operations.
This service can be triggered by a Pub/Sub message or by Cloud Scheduler, and it invokes a Dataform workflow based on the provided configuration.
Trigger types:
- event: immediately triggers a Dataform workflow using the tags provided in the configuration.
- poller: first runs a BigQuery polling query. If the query returns TRUE, the Dataform workflow is triggered using the tags provided in the configuration.
Supported triggers:
- crux_ready: polls for Chrome UX Report data availability and triggers processing when the conditions are met.
- crawl_complete: event-based trigger for when crawl data processing is complete.
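The event/poller distinction can be sketched as a small dispatch function. This is a minimal sketch, not the service's implementation: the `TRIGGERS` registry shape (`type`, `tags`, `pollQuery`), the placeholder query and tag values, and the injected `runPollQuery`/`invokeWorkflow` callbacks are all hypothetical stand-ins for the real BigQuery and Dataform calls.

```javascript
// Hypothetical trigger registry: the entry shape and the values below are
// placeholders for illustration, not the service's real configuration.
const TRIGGERS = {
  crux_ready: {
    type: "poller",
    pollQuery: "SELECT TRUE", // placeholder; the real CrUX availability query is not shown here
    tags: ["crux_ready"], // placeholder tag list
  },
  crawl_complete: {
    type: "event",
    tags: ["crawl_complete"], // placeholder tag list
  },
};

// Decide whether to invoke Dataform for a given trigger name.
// runPollQuery and invokeWorkflow are hypothetical injected callbacks, so the
// logic can be exercised without BigQuery or Dataform clients.
async function handleTrigger(name, { runPollQuery, invokeWorkflow }) {
  const trigger = TRIGGERS[name];
  if (!trigger) {
    throw new Error(`Unknown trigger: ${name}`);
  }
  if (trigger.type === "poller") {
    // Poller: continue only when the polling query returns TRUE.
    const ready = await runPollQuery(trigger.pollQuery);
    if (ready !== true) {
      return { invoked: false, reason: "poll condition not met" };
    }
  }
  // Event triggers, and pollers whose query returned TRUE, run the workflow.
  await invokeWorkflow(trigger.tags);
  return { invoked: true, tags: trigger.tags };
}
```

Injecting the two callbacks keeps the polling rule (invoke only when the query returns TRUE) testable without cloud credentials; a real handler would pass functions that call BigQuery and the Dataform API.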
Request body example:
{
  "message": {
    "name": "crux_ready"
  }
}
Request example for local development:
curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "message": {
      "name": "crux_ready"
    }
  }'

exportReport Cloud Run Function
This function triggers a job to export data to GCS or Firestore.
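Each entry in `calls` pairs a GCS or Firestore `destination`, an export `config`, and a source `query`. This README does not say how the GCS export is performed; one BigQuery-native option is the `EXPORT DATA` statement, and the sketch below assumes that approach to show how a call could map onto it. It handles only `gs://` destinations, appends a hypothetical `*.<format>` file pattern (EXPORT DATA needs a single `*` wildcard in the URI), and flattens the nested `calls` arrays because their grouping semantics are not documented here. `buildExportStatement` and `buildExportStatements` are hypothetical names.

```javascript
// Build a BigQuery EXPORT DATA statement for one export call.
// Assumption: EXPORT DATA is used here only as an illustration; the real
// exportReport function may export data in a different way.
function buildExportStatement({ destination, config = {}, query }) {
  if (!destination.startsWith("gs://")) {
    throw new Error(`Not a GCS destination: ${destination}`);
  }
  const format = config.format || "PARQUET";
  // EXPORT DATA requires one '*' wildcard in the URI; this sketch appends a
  // hypothetical file pattern under the destination prefix.
  const uri = `${destination.replace(/\/+$/, "")}/*.${format.toLowerCase()}`;
  const options = [`uri='${uri}'`, `format='${format}'`];
  if (config.compression) {
    options.push(`compression='${config.compression}'`);
  }
  return `EXPORT DATA OPTIONS(${options.join(", ")}) AS ${query}`;
}

// The request nests calls as an array of arrays; this sketch simply flattens
// them, since the meaning of the nesting is not documented here.
function buildExportStatements(body) {
  return body.calls.flat().map(buildExportStatement);
}
```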
Request body example:
{
  "calls": [[{
    "destination": "gs://httparchive-reports/tech-report-2024",
    "config": {
      "format": "PARQUET",
      "compression": "SNAPPY"
    },
    "query": "SELECT * FROM httparchive.reports.tech_report_categories WHERE _TABLE_SUFFIX = '2024_01_01'"
  }]]
}
Request example for local development:
curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "calls": [[{
      "destination": "gs://httparchive-reports/tech-report-2024",
      "config": {
        "format": "PARQUET",
        "compression": "SNAPPY"
      },
      "query": "SELECT * FROM httparchive.reports.tech_report_categories WHERE _TABLE_SUFFIX = '\''2024_01_01'\''"
    }]]
  }'

exportData Cloud Run Job
This job exports data to GCS or Firestore based on the provided configuration.
Input parameters:
- EXPORT_CONFIG: JSON string with the export configuration.
Example values:
{"dataform_trigger":"report_cwv_tech_complete","name":"technologies","type":"dict"}
{"dataform_trigger":"report_cwv_tech_complete","date":"2024-11-01","name":"page_weight","type":"report"}
{"dataform_trigger":"report_complete","name":"bytesTotal","type":"timeseries"}
{"dataform_trigger":"report_complete","date":"2024-11-01","name":"bytesTotal","type":"histogram"}
Issues in the pipeline are tracked with the following alerts:
Error notifications are sent to the #10x-infra Slack channel.
To test the function locally, run the following from the function directory:
npm run start_dev
Then, in a separate terminal, run one of the curl request examples with a test payload.
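As an alternative to curl, a small Node.js script can post the test trigger payload. This sketch assumes Node.js 18 or later (for the global `fetch`) and the `http://localhost:8080/` address used in the curl examples; the script file name and the `sendTrigger`/`buildTriggerBody` helpers are hypothetical.

```javascript
// Send a test trigger payload to a locally running service.
// Assumptions: Node.js 18+ (global fetch) and the local address from the
// curl examples.
const LOCAL_URL = "http://localhost:8080/";

// Same request body shape as the Dataform trigger curl example.
function buildTriggerBody(name) {
  return { message: { name } };
}

async function sendTrigger(name, url = LOCAL_URL) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTriggerBody(name)),
  });
  console.log(response.status, await response.text());
}

// Only send a request when a trigger name is passed on the command line.
const triggerName = process.argv[2];
if (triggerName) {
  sendTrigger(triggerName).catch((err) => {
    console.error("Request failed:", err.message);
    process.exitCode = 1;
  });
}
```

Run it as `node send-trigger.js crux_ready` while `npm run start_dev` is running; without an argument the script sends nothing.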
From the project root directory, run:
make tf_apply