93 changes: 93 additions & 0 deletions packages/aws_rds_otel/LICENSE.txt
@@ -0,0 +1,93 @@
Elastic License 2.0

URL: https://www.elastic.co/licensing/elastic-license

## Acceptance

By using the software, you agree to all of the terms and conditions below.

## Copyright License

The licensor grants you a non-exclusive, royalty-free, worldwide,
non-sublicensable, non-transferable license to use, copy, distribute, make
available, and prepare derivative works of the software, in each case subject to
the limitations and conditions below.

## Limitations

You may not provide the software to third parties as a hosted or managed
service, where the service provides users with access to any substantial set of
the features or functionality of the software.

You may not move, change, disable, or circumvent the license key functionality
in the software, and you may not remove or obscure any functionality in the
software that is protected by the license key.

You may not alter, remove, or obscure any licensing, copyright, or other notices
of the licensor in the software. Any use of the licensor’s trademarks is subject
to applicable law.

## Patents

The licensor grants you a license, under any patent claims the licensor can
license, or becomes able to license, to make, have made, use, sell, offer for
sale, import and have imported the software, in each case subject to the
limitations and conditions in this license. This license does not cover any
patent claims that you cause to be infringed by modifications or additions to
the software. If you or your company make any written claim that the software
infringes or contributes to infringement of any patent, your patent license for
the software granted under these terms ends immediately. If your company makes
such a claim, your patent license ends immediately for work on behalf of your
company.

## Notices

You must ensure that anyone who gets a copy of any part of the software from you
also gets a copy of these terms.

If you modify the software, you must include in any modified copies of the
software prominent notices stating that you have modified the software.

## No Other Rights

These terms do not imply any licenses other than those expressly granted in
these terms.

## Termination

If you use the software in violation of these terms, such use is not licensed,
and your licenses will automatically terminate. If the licensor provides you
with a notice of your violation, and you cease all violation of this license no
later than 30 days after you receive that notice, your licenses will be
reinstated retroactively. However, if you violate these terms after such
reinstatement, any additional violation of these terms will cause your licenses
to terminate automatically and permanently.

## No Liability

*As far as the law allows, the software comes as is, without any warranty or
condition, and the licensor will not be liable to you for any damages arising
out of these terms or the use or nature of the software, under any kind of
legal claim.*

## Definitions

The **licensor** is the entity offering these terms, and the **software** is the
software the licensor makes available under these terms, including any portion
of it.

**you** refers to the individual or entity agreeing to these terms.

**your company** is any legal entity, sole proprietorship, or other kind of
organization that you work for, plus all organizations that have control over,
are under the control of, or are under common control with that
organization. **control** means ownership of substantially all the assets of an
entity, or the power to direct its management and policies by vote, contract, or
otherwise. Control can be direct or indirect.

**your licenses** are all the licenses granted to you for the software under
these terms.

**use** means anything you do with the software requiring one of your licenses.

**trademark** means trademarks, service marks, and similar rights.
6 changes: 6 additions & 0 deletions packages/aws_rds_otel/changelog.yml
@@ -0,0 +1,6 @@
# newer versions go on top

- version: "0.1.0"
  changes:
    - description: Initial draft of the AWS RDS OpenTelemetry Assets package
      type: enhancement
      link: https://github.com/elastic/integrations/pull/1
@@ -0,0 +1,48 @@
{
"id": "aws_rds_otel-burst-balance-low",
"type": "alerting_rule_template",
"managed": true,
"attributes": {
"description": "Alerts when gp2 burst balance falls below a percentage floor. Depleted burst credits throttle IOPS and typically precede disk queue depth and latency spikes.",
"name": "[AWS RDS OTel] Burst balance low",
"ruleTypeId": ".es-query",
"tags": [
"observability",
"aws-rds",
"aws"
],
"schedule": {
"interval": "5m"
},
"alertDelay": {
"active": 3
},
"flapping": {
"lookBackWindow": 10,
"statusChangeThreshold": 4
},
"artifacts": {
"investigation_guide": {
"blob": "## Burst Balance Low\n\n### Triage\n1. This metric applies to gp2 storage volumes only; confirm storage type in the RDS console.\n2. Check `ReadIOPS`/`WriteIOPS` for sustained high I/O that depleted burst credits.\n3. Watch `DiskQueueDepth` and latency metrics for downstream impact.\n\n### Mitigation\n- Reduce I/O load temporarily to allow burst credits to recover.\n- Migrate to gp3 or provisioned IOPS storage for predictable performance.\n- Increase allocated storage on gp2 to raise baseline IOPS, which scales with volume size."
}
},
"params": {
"searchType": "esqlQuery",
"esqlQuery": {
"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.Namespace == \"AWS/RDS\"\n AND attributes.MetricName == \"BurstBalance\"\n AND attributes.stat == \"Average\"\n| STATS min_burst_balance = MIN(`metrics.amazonaws.com/AWS/RDS/BurstBalance`)\n BY attributes.DBInstanceIdentifier, resource.attributes.cloud.region\n// Percent of burst credits remaining — adjust floor (default: 20%)\n| WHERE min_burst_balance < 20.0\n| SORT min_burst_balance ASC"
},
"size": 0,
"threshold": [
0
],
"thresholdComparator": ">",
"timeField": "@timestamp",
"timeWindowSize": 15,
"timeWindowUnit": "m",
"groupBy": "row",
"termField": "attributes.DBInstanceIdentifier",
"termSize": 50,
"excludeHitsFromPreviousRun": true
}
}
}
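The query in this template groups samples per instance and region, takes the minimum `BurstBalance` in the window, and keeps only groups below the floor. The following is a minimal Python sketch of that logic for reasoning about the threshold offline. The sample rows and the shortened field names (`instance`, `region`, `burst_balance`) are hypothetical stand-ins, not real exported documents, and the function is not part of the package.

```python
from collections import defaultdict

# Hypothetical sample rows; the keys are shortened stand-ins for
# attributes.DBInstanceIdentifier, resource.attributes.cloud.region,
# and the BurstBalance metric value.
SAMPLES = [
    {"instance": "db-a", "region": "us-east-1", "burst_balance": 85.0},
    {"instance": "db-a", "region": "us-east-1", "burst_balance": 18.5},
    {"instance": "db-b", "region": "us-east-1", "burst_balance": 64.0},
    {"instance": "db-b", "region": "us-east-1", "burst_balance": 42.0},
]


def low_burst_balance(samples, floor=20.0):
    """Return (instance, region, min_balance) tuples below the floor, lowest first."""
    minimums = defaultdict(lambda: float("inf"))
    for row in samples:
        key = (row["instance"], row["region"])
        # Mirrors STATS MIN(...) BY instance, region.
        minimums[key] = min(minimums[key], row["burst_balance"])
    # Mirrors WHERE min_burst_balance < floor.
    breaching = [(i, r, m) for (i, r), m in minimums.items() if m < floor]
    # Mirrors SORT min_burst_balance ASC.
    return sorted(breaching, key=lambda item: item[2])


print(low_burst_balance(SAMPLES))
# [('db-a', 'us-east-1', 18.5)]
```

Because the rule uses the minimum rather than the average, a single low sample inside the 15-minute window is enough for a group to match.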
@@ -0,0 +1,48 @@
{
"id": "aws_rds_otel-checkpoint-lag-high",
"type": "alerting_rule_template",
"managed": true,
"attributes": {
"description": "Alerts when checkpoint lag exceeds a threshold. Uses the Maximum statistic for worst-case lag. Rising checkpoint lag indicates the instance cannot keep up with write/redo volume.",
"name": "[AWS RDS OTel] Checkpoint lag high",
"ruleTypeId": ".es-query",
"tags": [
"observability",
"aws-rds",
"aws"
],
"schedule": {
"interval": "5m"
},
"alertDelay": {
"active": 3
},
"flapping": {
"lookBackWindow": 10,
"statusChangeThreshold": 4
},
"artifacts": {
"investigation_guide": {
"blob": "## Checkpoint Lag High\n\n### Triage\n1. Check `WriteIOPS`/`WriteThroughput` for heavy write load.\n2. Inspect `DiskQueueDepth` and storage latency; I/O bottlenecks delay checkpoints.\n3. Review `CPUUtilization` on the instance.\n4. Check `TransactionLogsDiskUsage` (PostgreSQL WAL) or `BinLogDiskUsage` (MySQL/MariaDB binlog) for growing log volume.\n\n### Mitigation\n- Reduce write burst or batch writes.\n- Upgrade storage IOPS or instance class.\n- Tune checkpoint-related engine parameters with care and testing."
}
},
"params": {
"searchType": "esqlQuery",
"esqlQuery": {
"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.Namespace == \"AWS/RDS\"\n AND attributes.MetricName == \"CheckpointLag\"\n AND attributes.stat == \"Maximum\"\n| STATS max_checkpoint_lag = MAX(`metrics.amazonaws.com/AWS/RDS/CheckpointLag`)\n BY attributes.DBInstanceIdentifier, resource.attributes.cloud.region\n// Lag in seconds — adjust per engine and workload (default: 60 s)\n| WHERE max_checkpoint_lag > 60.0\n| SORT max_checkpoint_lag DESC"
},
"size": 0,
"threshold": [
0
],
"thresholdComparator": ">",
"timeField": "@timestamp",
"timeWindowSize": 15,
"timeWindowUnit": "m",
"groupBy": "row",
"termField": "attributes.DBInstanceIdentifier",
"termSize": 50,
"excludeHitsFromPreviousRun": true
}
}
}
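Each template pairs a 5-minute `schedule.interval` with `alertDelay.active: 3`. Assuming the delay counts consecutive rule runs that match the conditions (an assumption about the alerting framework, not something this diff states), an alert would first fire on the third consecutive matching run. The sketch below illustrates that reading; `first_alert_run` is a hypothetical helper, not a Kibana API.

```python
def first_alert_run(breaches, active_delay=3):
    """Return the 1-based run index at which an alert would first fire, or None.

    breaches: one boolean per scheduled rule run (True = query returned rows).
    Assumes the delay counts consecutive matching runs and resets on a miss.
    """
    streak = 0
    for run, breached in enumerate(breaches, start=1):
        streak = streak + 1 if breached else 0
        if streak >= active_delay:
            return run
    return None


INTERVAL_MINUTES = 5  # schedule.interval in the templates
runs = [True, True, False, True, True, True]
fired_at = first_alert_run(runs)
print(fired_at, fired_at * INTERVAL_MINUTES)
# 6 30
```

Under this reading, a sustained breach takes at least three runs (about 15 minutes at the default interval) to alert, and a single clean run resets the count.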
@@ -0,0 +1,48 @@
{
"id": "aws_rds_otel-cpu-utilization-high",
"type": "alerting_rule_template",
"managed": true,
"attributes": {
"description": "Alerts when average CPU utilization is sustained above a threshold. Latency rises sharply above ~80% CPU; correlate with SwapUsage for memory-related CPU pressure.",
"name": "[AWS RDS OTel] CPU utilization high",
"ruleTypeId": ".es-query",
"tags": [
"observability",
"aws-rds",
"aws"
],
"schedule": {
"interval": "5m"
},
"alertDelay": {
"active": 3
},
"flapping": {
"lookBackWindow": 10,
"statusChangeThreshold": 4
},
"artifacts": {
"investigation_guide": {
"blob": "## CPU Utilization High\n\n### Triage\n1. Check `SwapUsage` — swapping can inflate CPU and latency.\n2. Review `DatabaseConnections` for connection storms driving CPU.\n3. Inspect slow query logs or Performance Insights for expensive queries (out of scope for this rule).\n4. On T-family instances, verify CPU credit balance if credit metrics are collected.\n\n### Mitigation\n- Optimize top CPU-consuming queries (indexes, query rewrite).\n- Scale up instance class or add read replicas.\n- Reduce connection count and idle sessions."
}
},
"params": {
"searchType": "esqlQuery",
"esqlQuery": {
"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.Namespace == \"AWS/RDS\"\n AND attributes.MetricName == \"CPUUtilization\"\n AND attributes.stat == \"Average\"\n| STATS avg_cpu_utilization = AVG(`metrics.amazonaws.com/AWS/RDS/CPUUtilization`)\n BY attributes.DBInstanceIdentifier, resource.attributes.cloud.region\n// Percent CPU — adjust threshold (default: 80%)\n| WHERE avg_cpu_utilization > 80.0\n| SORT avg_cpu_utilization DESC"
},
"size": 0,
"threshold": [
0
],
"thresholdComparator": ">",
"timeField": "@timestamp",
"timeWindowSize": 15,
"timeWindowUnit": "m",
"groupBy": "row",
"termField": "attributes.DBInstanceIdentifier",
"termSize": 50,
"excludeHitsFromPreviousRun": true
}
}
}
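The templates in this change share the same shape: an `.es-query` rule type, an `esqlQuery` search type, and an escaped single-line ES|QL string. A small review aid, sketched below under the assumption that those shared fields are the ones worth checking, loads a template, reports missing fields, and prints the query with its line breaks restored. The embedded JSON is a trimmed copy of the template above, and `check_template` is a hypothetical helper, not part of any Elastic tooling.

```python
import json

# Trimmed copy of the CPU template; only the fields the check reads are kept.
TEMPLATE = r'''
{
  "id": "aws_rds_otel-cpu-utilization-high",
  "type": "alerting_rule_template",
  "attributes": {
    "ruleTypeId": ".es-query",
    "schedule": {"interval": "5m"},
    "params": {
      "searchType": "esqlQuery",
      "esqlQuery": {"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.MetricName == \"CPUUtilization\"\n| STATS avg_cpu_utilization = AVG(`metrics.amazonaws.com/AWS/RDS/CPUUtilization`)\n    BY attributes.DBInstanceIdentifier\n| WHERE avg_cpu_utilization > 80.0"},
      "timeWindowSize": 15,
      "timeWindowUnit": "m"
    }
  }
}
'''


def check_template(template: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    attrs = template.get("attributes", {})
    params = attrs.get("params", {})
    if attrs.get("ruleTypeId") != ".es-query":
        problems.append("unexpected ruleTypeId")
    if params.get("searchType") != "esqlQuery":
        problems.append("unexpected searchType")
    if not params.get("esqlQuery", {}).get("esql", "").strip():
        problems.append("missing ES|QL query")
    return problems


template = json.loads(TEMPLATE)
print(check_template(template))
# Printing the decoded string shows the query one pipe stage per line.
print(template["attributes"]["params"]["esqlQuery"]["esql"])
```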
@@ -0,0 +1,48 @@
{
"id": "aws_rds_otel-database-connections-high",
"type": "alerting_rule_template",
"managed": true,
"attributes": {
"description": "Alerts when peak database connections exceed a threshold. CloudWatch does not publish max_connections — set the threshold against your engine limit and normal baseline.",
"name": "[AWS RDS OTel] Database connections high",
"ruleTypeId": ".es-query",
"tags": [
"observability",
"aws-rds",
"aws"
],
"schedule": {
"interval": "5m"
},
"alertDelay": {
"active": 3
},
"flapping": {
"lookBackWindow": 10,
"statusChangeThreshold": 4
},
"artifacts": {
"investigation_guide": {
"blob": "## Database Connections High\n\n### Triage\n1. Compare peak connections to the engine `max_connections` setting (not in CloudWatch).\n2. Check for connection leaks — idle sessions accumulating over time.\n3. Review application deploys or traffic spikes that increased connection demand.\n4. Inspect `FreeableMemory` — each connection consumes memory.\n\n### Mitigation\n- Enable connection pooling (PgBouncer, RDS Proxy, application pool).\n- Fix connection leaks in application code.\n- Increase `max_connections` only if memory allows; prefer pooling first."
}
},
"params": {
"searchType": "esqlQuery",
"esqlQuery": {
"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.Namespace == \"AWS/RDS\"\n AND attributes.MetricName == \"DatabaseConnections\"\n AND attributes.stat == \"Maximum\"\n| STATS max_connections = MAX(`metrics.amazonaws.com/AWS/RDS/DatabaseConnections`)\n BY attributes.DBInstanceIdentifier, resource.attributes.cloud.region\n// Peak connection count — set relative to max_connections and baseline (default: 100)\n| WHERE max_connections > 100\n| SORT max_connections DESC"
},
"size": 0,
"threshold": [
0
],
"thresholdComparator": ">",
"timeField": "@timestamp",
"timeWindowSize": 15,
"timeWindowUnit": "m",
"groupBy": "row",
"termField": "attributes.DBInstanceIdentifier",
"termSize": 50,
"excludeHitsFromPreviousRun": true
}
}
}
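The description asks operators to set the connection threshold against the engine limit, since CloudWatch does not publish `max_connections`. One way to do that, sketched here as an illustration only, is to derive the threshold as a percentage of a known limit and substitute it into the query text. The limit of 500, the 80% figure, and both helper names are hypothetical; the query string is a trimmed copy of the one in the template above.

```python
import re

# Trimmed copy of the template query with its default threshold of 100.
DEFAULT_QUERY = (
    "FROM metrics-awscloudwatchreceiver.otel-*\n"
    '| WHERE attributes.MetricName == "DatabaseConnections"\n'
    "| STATS max_connections = MAX(`metrics.amazonaws.com/AWS/RDS/DatabaseConnections`)\n"
    "    BY attributes.DBInstanceIdentifier\n"
    "| WHERE max_connections > 100"
)


def threshold_from_limit(max_connections: int, percent: int = 80) -> int:
    """Return an alert threshold as an integer percentage of the engine limit."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return max_connections * percent // 100


def set_threshold(esql: str, threshold: int) -> str:
    """Replace the numeric threshold in the final WHERE clause of the query."""
    pattern = r"(\| WHERE max_connections > )\d+(?:\.\d+)?"
    updated, count = re.subn(pattern, lambda m: f"{m.group(1)}{threshold}", esql)
    if count != 1:
        raise ValueError("expected exactly one threshold clause")
    return updated


threshold = threshold_from_limit(500, percent=80)  # 500 is a made-up engine limit
print(set_threshold(DEFAULT_QUERY, threshold).splitlines()[-1])
# | WHERE max_connections > 400
```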
@@ -0,0 +1,48 @@
{
"id": "aws_rds_otel-disk-queue-depth-high",
"type": "alerting_rule_template",
"managed": true,
"attributes": {
"description": "Alerts when average disk queue depth is sustained above a threshold. High queue depth with plateauing IOPS is the canonical storage I/O saturation signature.",
"name": "[AWS RDS OTel] Disk queue depth high",
"ruleTypeId": ".es-query",
"tags": [
"observability",
"aws-rds",
"aws"
],
"schedule": {
"interval": "5m"
},
"alertDelay": {
"active": 3
},
"flapping": {
"lookBackWindow": 10,
"statusChangeThreshold": 4
},
"artifacts": {
"investigation_guide": {
"blob": "## Disk Queue Depth High\n\n### Triage\n1. Compare `ReadIOPS` and `WriteIOPS` to provisioned IOPS — plateauing at the ceiling confirms saturation.\n2. Check `ReadLatency`/`WriteLatency` for rising storage latency.\n3. On gp2, inspect `BurstBalance` for depleted burst credits.\n4. Review recent workload changes that increased I/O demand.\n\n### Mitigation\n- Increase provisioned IOPS or upgrade storage type.\n- Distribute read load to replicas.\n- Optimize queries to reduce I/O (indexes, smaller scans)."
}
},
"params": {
"searchType": "esqlQuery",
"esqlQuery": {
"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.Namespace == \"AWS/RDS\"\n AND attributes.MetricName == \"DiskQueueDepth\"\n AND attributes.stat == \"Average\"\n| STATS avg_disk_queue_depth = AVG(`metrics.amazonaws.com/AWS/RDS/DiskQueueDepth`)\n BY attributes.DBInstanceIdentifier, resource.attributes.cloud.region\n// Queued I/O count — adjust based on workload (default: 5)\n| WHERE avg_disk_queue_depth > 5.0\n| SORT avg_disk_queue_depth DESC"
},
"size": 0,
"threshold": [
0
],
"thresholdComparator": ">",
"timeField": "@timestamp",
"timeWindowSize": 15,
"timeWindowUnit": "m",
"groupBy": "row",
"termField": "attributes.DBInstanceIdentifier",
"termSize": 50,
"excludeHitsFromPreviousRun": true
}
}
}
@@ -0,0 +1,48 @@
{
"id": "aws_rds_otel-free-storage-low",
"type": "alerting_rule_template",
"managed": true,
"attributes": {
"description": "Alerts when free storage space on an RDS instance falls below an absolute byte floor. Storage exhaustion is an outage-class risk; total volume size is not published by CloudWatch so percentage thresholds cannot be derived from this source.",
"name": "[AWS RDS OTel] Free storage space low",
"ruleTypeId": ".es-query",
"tags": [
"observability",
"aws-rds",
"aws"
],
"schedule": {
"interval": "5m"
},
"alertDelay": {
"active": 3
},
"flapping": {
"lookBackWindow": 10,
"statusChangeThreshold": 4
},
"artifacts": {
"investigation_guide": {
"blob": "## Free Storage Space Low\n\n### Triage\n1. Confirm the instance and region from the alert context.\n2. Check storage autoscaling settings and current allocated storage in the AWS RDS console.\n3. Review recent growth trends — large imports, index builds, or log table bloat are common causes.\n4. Inspect `BinLogDiskUsage` if binlog/WAL volume may be consuming space.\n\n### Mitigation\n- Enable or increase storage autoscaling.\n- Archive or purge old data; shrink large tables or indexes.\n- Increase allocated storage before the volume fills completely.\n\n### Note\nAdjust the byte floor in the rule query to match your smallest instance volumes and growth rate."
}
},
"params": {
"searchType": "esqlQuery",
"esqlQuery": {
"esql": "FROM metrics-awscloudwatchreceiver.otel-*\n| WHERE attributes.Namespace == \"AWS/RDS\"\n AND attributes.MetricName == \"FreeStorageSpace\"\n AND attributes.stat == \"Average\"\n| STATS min_free_storage = MIN(`metrics.amazonaws.com/AWS/RDS/FreeStorageSpace`)\n BY attributes.DBInstanceIdentifier, resource.attributes.cloud.region\n// Absolute byte floor; adjust to your volume sizes (default: 10 GiB)\n| WHERE min_free_storage < 10737418240\n| SORT min_free_storage ASC"
},
"size": 0,
"threshold": [
0
],
"thresholdComparator": ">",
"timeField": "@timestamp",
"timeWindowSize": 15,
"timeWindowUnit": "m",
"groupBy": "row",
"termField": "attributes.DBInstanceIdentifier",
"termSize": 50,
"excludeHitsFromPreviousRun": true
}
}
}
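The byte floor in this query, 10737418240, is 10 × 1024³ bytes. When adjusting it for other volume sizes, a small conversion helper avoids hand-typed constants. This is a minimal sketch; `gib_to_bytes` is a hypothetical helper and the 25 GiB value is only an example.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    if gib < 0:
        raise ValueError("size must be non-negative")
    return int(gib * GIB)


print(gib_to_bytes(10))  # the template default
# 10737418240
print(gib_to_bytes(25))  # an example floor for larger volumes
# 26843545600
```

Note that a decimal 10 GB would be 10000000000 bytes, so the template default is slightly higher than a decimal reading of the comment suggests.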