[OTLP/HTTP] Honor Retry-After header when retrying exports #4186
[OTLP/HTTP] Honor Retry-After header when retrying exports #4186DCchoudhury15 wants to merge 1 commit into
Conversation
…-telemetry#4172 Signed-off-by: DCchoudhury15 <divyanshuchoudhury3@gmail.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4186 +/- ##
==========================================
- Coverage 82.99% 82.78% -0.20%
==========================================
Files 406 406
Lines 17260 17308 +48
==========================================
+ Hits 14323 14327 +4
- Misses 2937 2981 +44
🚀 New features to boost your workflow:
|
|
|
||
| bool ParseRetryAfterDelay(std::string value, std::chrono::seconds &delay) | ||
| { | ||
| value.erase(0, value.find_first_not_of(" \t\r\n")); |
There was a problem hiding this comment.
We can use common::StringUtil::Trim to trim string.
|
|
||
| if (std::all_of(value.begin(), value.end(), [](unsigned char c) { return std::isdigit(c); })) | ||
| { | ||
| try |
There was a problem hiding this comment.
Could you please use the codes just like GetTimeoutFromString in sdk/src/common/env_variables.cc to parse duration? The common codes can be moved into api/include/opentelemetry/common/timestamp.h .
And exception can be disabled by -fno-exception or /EH . We should use OPENTELEMETRY_HAVE_EXCEPTIONS to check if exception is enabled.
There was a problem hiding this comment.
Good catch. The try/catch around std::stoull is very likely what's triggering the Bazel noexcept CI failure.
I'll replace it with a manual digit-by-digit parsing loop that performs explicit overflow checks, following the same approach used by GetTimeoutFromString in sdk/src/common/env_variables.cc. Where appropriate, I'll keep the exception-related handling guarded with OPENTELEMETRY_HAVE_EXCEPTIONS.
I'll also move the shared parsing logic into api/include/opentelemetry/common/timestamp.h as you suggested so it can be reused across different call sites.
|
|
||
| bool ParseRetryAfterDate(std::string value, std::chrono::system_clock::time_point &date) | ||
| { | ||
| value.erase(0, value.find_first_not_of(" \t\r\n")); |
There was a problem hiding this comment.
We can use common::StringUtil::Trim to trim string.
| std::tm tm = {}; | ||
| std::istringstream ss(value); | ||
|
|
||
| ss >> std::get_time(&tm, "%a, %d %b %Y %H:%M:%S"); |
There was a problem hiding this comment.
According to https://datatracker.ietf.org/doc/html/rfc7231#section-7.1.1.1 seems support much more formats and timezone setting.
There was a problem hiding this comment.
Thanks for catching this. I went back and checked §7.1.1.1 again, and you're right. The RFC defines three valid HTTP-date formats:
IMF-fixdate:Sun, 06 Nov 1994 08:49:37 GMTRFC 850:Sunday, 06-Nov-94 08:49:37 GMTasctime:Sun Nov 6 08:49:37 1994
Right now I'm only handling the first one, so I'll add support for the other two as well since the RFC requires recipients to accept all three.
I'll also implement the RFC 850 two-digit year handling. If the parsed year ends up looking more than 50 years in the future, I'll map it back to the most recent matching year in the past, as required by the spec.
For the timezone part, I don't think any extra handling is needed. All three formats are defined as UTC the first two explicitly use GMT, and the asctime format is also specified to be interpreted as UTC. So treating the parsed std::tm as UTC with PortableTimegm should be the right thing to do here.
There was a problem hiding this comment.
You're right — all valid timezones in HTTP-date are UTC/GMT, so timezone handling isn't needed. Only the datetime format variations matter.
Fixes #4172
Changes
The OTLP/HTTP retry logic wasn't respecting the
Retry-Afterresponse header. If a server returned a429or503along withRetry-After, the exporter would ignore that value and always use its configured exponential backoff instead.This PR updates the retry behavior to honor the
Retry-Afterheader as defined in RFC 7231 §7.1.3. Both supported formats are handled:Retry-After: 5)Retry-After: Wed, 21 Oct 2015 07:28:00 GMT)One implementation detail worth noting is that
NextRetryTime()is called from thedoRetrySessionspolling loop afterReleaseResponse()has already cleared the response headers. Because of that, the parsedRetry-Aftervalue is cached inretry_after_time_point_beforeReleaseResponse()is called, andNextRetryTime()uses the cached value instead of trying to parse the headers again.Unit tests are not included yet. I can add them directly to this PR
if the maintainers would like the plan would be to extend the existing
test server fixture with a
/retry-after/endpoint that sets the header,and add two test cases to
ext/test/http/curl_http_test.ccfollowingthe pattern of the existing
ExponentialBackoffRetrytest: one forRetry-After: <delay-seconds>and one forRetry-After: <HTTP-date>.Just let me know and I will push them here.
Known limitation: The HTTP-date parser currently supports only the IMF-fixdate format. The two obsolete date formats (RFC 850 and
asctime) are not supported. Since RFC 7231 specifies IMF-fixdate as the required format for senders, this should not be an issue in practice.CHANGELOG.mdupdated for non-trivial changes