
[OTLP/HTTP] Honor Retry-After header when retrying exports #4186

Open
DCchoudhury15 wants to merge 1 commit into open-telemetry:main from DCchoudhury15:fix/otlp-http-honor-retry-after


@DCchoudhury15

Contributor

Fixes #4172

Changes

The OTLP/HTTP retry logic wasn't respecting the Retry-After response header. If a server returned a 429 or 503 along with Retry-After, the exporter would ignore that value and always use its configured exponential backoff instead.

This PR updates the retry behavior to honor the Retry-After header as defined in RFC 7231 §7.1.3. Both supported formats are handled:

  • A delay in seconds (for example, Retry-After: 5)
  • An HTTP-date (for example, Retry-After: Wed, 21 Oct 2015 07:28:00 GMT)
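A minimal sketch of the intended priority (a Retry-After value first, configured exponential backoff otherwise), assuming the header has already been parsed into either a relative delay or an absolute date. ChooseRetryTime and BackoffPolicy are hypothetical names, not the exporter's API; jitter is omitted, and treating an HTTP-date in the past as "retry now" is an assumption of this sketch. It uses std::optional, so it needs C++17.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

using Clock = std::chrono::system_clock;

// Hypothetical backoff settings (names are illustrative only).
struct BackoffPolicy
{
  std::chrono::milliseconds initial_backoff;
  std::chrono::milliseconds max_backoff;
  double multiplier;
};

// Returns the time at which the next attempt should start. A parsed
// Retry-After value takes priority; otherwise exponential backoff is used.
Clock::time_point ChooseRetryTime(Clock::time_point now,
                                  std::optional<std::chrono::seconds> retry_after_delay,
                                  std::optional<Clock::time_point> retry_after_date,
                                  const BackoffPolicy &policy,
                                  int attempt)  // 1 for the first retry
{
  if (retry_after_delay)
  {
    // Retry-After: <delay-seconds> is relative to when the response arrived.
    return now + *retry_after_delay;
  }
  if (retry_after_date)
  {
    // Retry-After: <HTTP-date> is absolute; a date already in the past is
    // treated as "retry now" (an assumption of this sketch).
    return std::max(now, *retry_after_date);
  }
  // Fallback: initial_backoff * multiplier^(attempt - 1), capped at max_backoff.
  const double cap_ms = static_cast<double>(policy.max_backoff.count());
  double backoff_ms   = static_cast<double>(policy.initial_backoff.count());
  for (int i = 1; i < attempt && backoff_ms < cap_ms; ++i)
  {
    backoff_ms *= policy.multiplier;
  }
  const auto backoff = std::min(
      policy.max_backoff, std::chrono::milliseconds(static_cast<long long>(backoff_ms)));
  return now + backoff;
}
```

With initial_backoff = 1 s, multiplier = 1.5, and max_backoff = 5 s, the fallback path yields 1 s, 1.5 s, 2.25 s, and so on up to the 5 s cap, while any Retry-After value bypasses it entirely.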

One implementation detail worth noting is that NextRetryTime() is called from the doRetrySessions polling loop after ReleaseResponse() has already cleared the response headers. Because of that, the parsed Retry-After value is cached in retry_after_time_point_ before ReleaseResponse() is called, and NextRetryTime() uses the cached value instead of trying to parse the headers again.
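The ordering constraint can be illustrated with a deliberately simplified, hypothetical class. RetryingOperation, OnResponse, HasHeaders, and ParseDelaySeconds are invented for this sketch; only retry_after_time_point_, ReleaseResponse(), and NextRetryTime() are names taken from the description above, and their real signatures may differ. The sketch handles only the delay-seconds form and assumes header names are already lower-cased.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for an HTTP operation. It only
// demonstrates the ordering: parse Retry-After while headers still exist,
// cache the result, then release the response.
class RetryingOperation
{
public:
  using Clock = std::chrono::system_clock;

  void OnResponse(int status, std::map<std::string, std::string> headers, Clock::time_point now)
  {
    headers_         = std::move(headers);
    has_retry_after_ = false;
    if (status == 429 || status == 503)
    {
      const auto it = headers_.find("retry-after");  // assumes lower-cased names
      std::chrono::seconds delay{};
      if (it != headers_.end() && ParseDelaySeconds(it->second, delay))
      {
        retry_after_time_point_ = now + delay;  // cached before headers are cleared
        has_retry_after_        = true;
      }
    }
    ReleaseResponse();  // after this point the headers are gone
  }

  // Called later from a polling loop: only the cached value is available.
  Clock::time_point NextRetryTime(Clock::time_point backoff_time) const
  {
    return has_retry_after_ ? retry_after_time_point_ : backoff_time;
  }

  bool HasHeaders() const { return !headers_.empty(); }

private:
  void ReleaseResponse() { headers_.clear(); }

  // Accepts 1 to 9 ASCII digits; the length cap stands in for overflow checks.
  static bool ParseDelaySeconds(const std::string &value, std::chrono::seconds &delay)
  {
    if (value.empty() || value.size() > 9)
    {
      return false;
    }
    long long n = 0;
    for (char c : value)
    {
      if (c < '0' || c > '9')
      {
        return false;
      }
      n = n * 10 + (c - '0');
    }
    delay = std::chrono::seconds(n);
    return true;
  }

  std::map<std::string, std::string> headers_;
  Clock::time_point retry_after_time_point_{};
  bool has_retry_after_ = false;
};
```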

Unit tests are not included yet. I can add them directly to this PR if the maintainers would like. The plan would be to extend the existing test server fixture with a /retry-after/ endpoint that sets the header, and to add two test cases to ext/test/http/curl_http_test.cc following the pattern of the existing ExponentialBackoffRetry test: one for Retry-After: <delay-seconds> and one for Retry-After: <HTTP-date>. Just let me know and I will push them here.

Known limitation: The HTTP-date parser currently supports only the IMF-fixdate format. The two obsolete date formats (RFC 850 and asctime) are not supported. Since RFC 7231 specifies IMF-fixdate as the required format for senders, this should not be an issue in practice.
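Because IMF-fixdate has a fixed 29-character layout (for example, Sun, 06 Nov 1994 08:49:37 GMT), it can be parsed without std::get_time, whose handling of %a and %b depends on the stream's locale. The following is a sketch under that assumption; ParseImfFixdate, TwoDigits, and TokenIndex are hypothetical names, and the result is left as a broken-down UTC std::tm.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Parses two ASCII digits at s[pos] and s[pos + 1]; returns -1 on failure.
static int TwoDigits(const std::string &s, std::size_t pos)
{
  const char a = s[pos];
  const char b = s[pos + 1];
  if (a < '0' || a > '9' || b < '0' || b > '9')
  {
    return -1;
  }
  return (a - '0') * 10 + (b - '0');
}

// Returns the index of the 3-letter token at s[pos] within table, or -1.
static int TokenIndex(const std::string &s, std::size_t pos, const char *table, int count)
{
  for (int i = 0; i < count; ++i)
  {
    if (s.compare(pos, 3, table + 3 * i, 3) == 0)
    {
      return i;
    }
  }
  return -1;
}

// Hypothetical parser for IMF-fixdate, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
// Fills `out` with the broken-down UTC time. The day name is checked for
// syntax only, and the day of month is range-checked loosely (1-31).
bool ParseImfFixdate(const std::string &s, std::tm &out)
{
  if (s.size() != 29 || s[3] != ',' || s[4] != ' ' || s[7] != ' ' || s[11] != ' ' ||
      s[16] != ' ' || s[19] != ':' || s[22] != ':' || s[25] != ' ' ||
      s.compare(26, 3, "GMT") != 0)
  {
    return false;
  }
  const int wday = TokenIndex(s, 0, "SunMonTueWedThuFriSat", 7);
  const int mon  = TokenIndex(s, 8, "JanFebMarAprMayJunJulAugSepOctNovDec", 12);
  const int mday = TwoDigits(s, 5);
  const int y_hi = TwoDigits(s, 12);
  const int y_lo = TwoDigits(s, 14);
  const int hour = TwoDigits(s, 17);
  const int min  = TwoDigits(s, 20);
  const int sec  = TwoDigits(s, 23);
  if (wday < 0 || mon < 0 || mday < 1 || mday > 31 || y_hi < 0 || y_lo < 0 || hour < 0 ||
      hour > 23 || min < 0 || min > 59 || sec < 0 || sec > 60)  // 60 allows a leap second
  {
    return false;
  }
  out         = std::tm{};
  out.tm_year = y_hi * 100 + y_lo - 1900;
  out.tm_mon  = mon;
  out.tm_mday = mday;
  out.tm_hour = hour;
  out.tm_min  = min;
  out.tm_sec  = sec;
  out.tm_wday = wday;
  return true;
}
```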

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

…-telemetry#4172

Signed-off-by: DCchoudhury15 <divyanshuchoudhury3@gmail.com>
DCchoudhury15 requested a review from a team as a code owner on June 26, 2026 06:23
@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 14.28571% with 42 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.78%. Comparing base (65aae9c) to head (8cdef0a).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
ext/src/http/client/curl/http_operation_curl.cc | 14.29% | 42 Missing ⚠️


@@            Coverage Diff             @@
##             main    #4186      +/-   ##
==========================================
- Coverage   82.99%   82.78%   -0.20%     
==========================================
  Files         406      406              
  Lines       17260    17308      +48     
==========================================
+ Hits        14323    14327       +4     
- Misses       2937     2981      +44     
Files with missing lines | Coverage Δ
...lemetry/ext/http/client/curl/http_operation_curl.h | 90.91% <ø> (ø)
ext/src/http/client/curl/http_operation_curl.cc | 55.17% <14.29%> (-3.57%) ⬇️

... and 1 file with indirect coverage changes


Comment thread ext/src/http/client/curl/http_operation_curl.cc

bool ParseRetryAfterDelay(std::string value, std::chrono::seconds &delay)
{
value.erase(0, value.find_first_not_of(" \t\r\n"));

Member

We can use common::StringUtil::Trim to trim the string.


if (std::all_of(value.begin(), value.end(), [](unsigned char c) { return std::isdigit(c); }))
{
try

Member

Could you please parse the duration the same way GetTimeoutFromString does in sdk/src/common/env_variables.cc? The shared code can be moved into api/include/opentelemetry/common/timestamp.h.

Also, exceptions can be disabled with -fno-exceptions (or the /EH options on MSVC), so we should check OPENTELEMETRY_HAVE_EXCEPTIONS before relying on them.

Contributor Author

Good catch. The try/catch around std::stoull is very likely what's triggering the Bazel noexcept CI failure.

I'll replace it with a manual digit-by-digit parsing loop that performs explicit overflow checks, following the same approach used by GetTimeoutFromString in sdk/src/common/env_variables.cc. Where appropriate, I'll keep the exception-related handling guarded with OPENTELEMETRY_HAVE_EXCEPTIONS.

I'll also move the shared parsing logic into api/include/opentelemetry/common/timestamp.h as you suggested so it can be reused across different call sites.
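The digit-by-digit approach described above could look roughly like this. It is a sketch, not the code of GetTimeoutFromString; ParseDelaySeconds is a hypothetical name, the inline whitespace trim stands in for common::StringUtil::Trim, and rejecting (rather than clamping) out-of-range values is an assumption.

```cpp
#include <chrono>
#include <limits>
#include <string>

// Hypothetical helper: parses Retry-After delay-seconds (1*DIGIT) without
// std::stoull, so it works even when exceptions are disabled.
bool ParseDelaySeconds(const std::string &value, std::chrono::seconds &delay)
{
  // Skip surrounding spaces/tabs (the project would use common::StringUtil::Trim).
  const auto first = value.find_first_not_of(" \t");
  if (first == std::string::npos)
  {
    return false;
  }
  const auto last = value.find_last_not_of(" \t");

  using Rep     = std::chrono::seconds::rep;
  const Rep max = std::numeric_limits<Rep>::max();
  Rep result    = 0;
  for (auto i = first; i <= last; ++i)
  {
    const char c = value[i];
    if (c < '0' || c > '9')
    {
      return false;  // signs, decimals, and dates are not delay-seconds
    }
    const Rep digit = c - '0';
    // Explicit overflow check: result * 10 + digit must not exceed max.
    if (result > (max - digit) / 10)
    {
      return false;
    }
    result = result * 10 + digit;
  }
  delay = std::chrono::seconds(result);
  return true;
}
```

A very large but valid delay can still overflow once it is added to a nanosecond-resolution system_clock::time_point, so a caller would likely want to cap the parsed value before computing the retry time.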


bool ParseRetryAfterDate(std::string value, std::chrono::system_clock::time_point &date)
{
value.erase(0, value.find_first_not_of(" \t\r\n"));

Member

We can use common::StringUtil::Trim to trim the string.

std::tm tm = {};
std::istringstream ss(value);

ss >> std::get_time(&tm, "%a, %d %b %Y %H:%M:%S");

Member

According to https://datatracker.ietf.org/doc/html/rfc7231#section-7.1.1.1, HTTP-date seems to allow more formats and timezone settings than this handles.

Contributor Author

Thanks for catching this. I went back and checked §7.1.1.1 again, and you're right. The RFC defines three valid HTTP-date formats:

  • IMF-fixdate: Sun, 06 Nov 1994 08:49:37 GMT
  • RFC 850: Sunday, 06-Nov-94 08:49:37 GMT
  • asctime: Sun Nov  6 08:49:37 1994

Right now I'm only handling the first one, so I'll add support for the other two as well since the RFC requires recipients to accept all three.
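Once all three formats are accepted, the parser has to decide which grammar to apply. One possible approach, sketched here with a hypothetical DetectHttpDateFormat helper, is to dispatch on shape alone: IMF-fixdate has its comma right after a three-letter day name, RFC 850 has a comma after a full day name, and asctime has no comma and a fixed length of 24 characters. Each branch would still need a strict parser for its own layout.

```cpp
#include <string>

enum class HttpDateFormat
{
  kImfFixdate,
  kRfc850,
  kAsctime,
  kUnknown
};

// Hypothetical dispatcher: classifies by shape only and does not validate
// the individual fields.
HttpDateFormat DetectHttpDateFormat(const std::string &s)
{
  const auto comma = s.find(',');
  if (comma == 3)
  {
    return HttpDateFormat::kImfFixdate;  // "Sun, 06 Nov 1994 08:49:37 GMT"
  }
  if (comma != std::string::npos && comma > 3)
  {
    return HttpDateFormat::kRfc850;  // "Sunday, 06-Nov-94 08:49:37 GMT"
  }
  if (comma == std::string::npos && s.size() == 24)
  {
    return HttpDateFormat::kAsctime;  // "Sun Nov  6 08:49:37 1994"
  }
  return HttpDateFormat::kUnknown;
}
```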

I'll also implement the RFC 850 two-digit year handling. If the parsed year ends up looking more than 50 years in the future, I'll map it back to the most recent matching year in the past, as required by the spec.
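One reading of the two-digit-year rule is "choose the latest year with those last two digits that is at most 50 years after the current year". The hypothetical ExpandRfc850Year helper below applies that reading at year granularity, which only approximates the RFC's comparison of full timestamps.

```cpp
// Hypothetical helper: expands an RFC 850 two-digit year (0-99) to the latest
// year with those last two digits that is at most 50 years after current_year.
// Works at year granularity, approximating the RFC's timestamp comparison.
int ExpandRfc850Year(int two_digit_year, int current_year)
{
  int year = current_year - current_year % 100 + two_digit_year;  // same century
  if (year > current_year + 50)
  {
    year -= 100;  // "more than 50 years in the future": use the previous century
  }
  else if (year <= current_year - 50)
  {
    year += 100;  // a later year is still within the 50-year window
  }
  return year;
}
```

For example, in 2026 the value 94 maps to 1994, 30 maps to 2030, and 77 maps to 1977.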

For the timezone part, I don't think any extra handling is needed. All three formats are defined as UTC: the first two explicitly use GMT, and the asctime format is also specified to be interpreted as UTC. So treating the parsed std::tm as UTC with PortableTimegm should be the right thing to do here.
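For comparison, interpreting a broken-down time as UTC can also be done with pure arithmetic, independent of the local time zone. This sketch uses Howard Hinnant's well-known days_from_civil algorithm; it is not the project's PortableTimegm, TmUtcToTimePoint is a hypothetical name, and it assumes system_clock counts from the Unix epoch (guaranteed since C++20 and true of the major implementations before that).

```cpp
#include <chrono>
#include <ctime>

// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's
// days_from_civil algorithm). m is 1-12, d is 1-31.
long long DaysFromCivil(long long y, unsigned m, unsigned d)
{
  if (m <= 2)
  {
    --y;  // treat Jan/Feb as months 11/12 of the previous year
  }
  const long long era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe  = static_cast<unsigned>(y - era * 400);             // [0, 399]
  const unsigned doy  = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const unsigned doe  = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + static_cast<long long>(doe) - 719468;
}

// Interprets the fields of `utc` as UTC (like timegm) without consulting the
// local time zone. Assumes the fields are already range-checked.
std::chrono::system_clock::time_point TmUtcToTimePoint(const std::tm &utc)
{
  const long long days = DaysFromCivil(utc.tm_year + 1900LL, static_cast<unsigned>(utc.tm_mon + 1),
                                       static_cast<unsigned>(utc.tm_mday));
  const long long secs = days * 86400 + utc.tm_hour * 3600LL + utc.tm_min * 60LL + utc.tm_sec;
  return std::chrono::system_clock::time_point(std::chrono::seconds(secs));
}
```

For Sun, 06 Nov 1994 08:49:37 GMT this yields 784111777 seconds since the epoch.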

Member

You're right: all valid timezones in HTTP-date are UTC/GMT, so timezone handling isn't needed. Only the datetime format variations matter.

Comment thread ext/src/http/client/curl/http_operation_curl.cc

Development

Successfully merging this pull request may close these issues.

[OTLP/HTTP] Honor Retry-After header when retrying exports

2 participants