Skip to content

WIP OCPNODE-4529: Migrate test case 44493 - configurable terminationGracePeriodSeconds for probes#31170

Open
BhargaviGudi wants to merge 1 commit into
openshift:mainfrom
BhargaviGudi:migrate-44493
Open

WIP OCPNODE-4529: Migrate test case 44493 - configurable terminationGracePeriodSeconds for probes#31170
BhargaviGudi wants to merge 1 commit into
openshift:mainfrom
BhargaviGudi:migrate-44493

Conversation

@BhargaviGudi
Copy link
Copy Markdown
Contributor

@BhargaviGudi BhargaviGudi commented May 13, 2026

Summary

Migrates test case OCP-44493 from openshift-tests-private to origin.

Validates that Kubernetes liveness and startup probes honor their probe-level terminationGracePeriodSeconds setting instead of defaulting to the pod-level value.

Polarion

https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-44493

Bug

OCPBUGS-44493

Test Coverage

  • ✅ Liveness probe with probe-level terminationGracePeriodSeconds (10s vs pod-level 60s)
  • ✅ Startup probe with probe-level terminationGracePeriodSeconds (10s vs pod-level 60s)
  • ✅ Liveness probe without probe-level setting (falls back to pod-level 60s)

Implementation

  • Created pod resources inline (no templates)
  • Proper Ginkgo patterns (DeferCleanup, ote.Informing(), context-aware polling)
  • Helper functions return errors, all assertions in main test
  • Pod security context with RunAsNonRoot, SeccompProfile, drop ALL capabilities

Testing

./openshift-tests run-test "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriodSeconds to liveness and startup probes [OCP-44493]"
  Will run 1 of 1 specs
  ------------------------------
  [sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriodSeconds to liveness and startup probes [OCP-44493] [Lifecycle:informing]
  github.com/openshift/origin/test/extended/node/node_e2e/node.go:175
    STEP: Creating a kubernetes client @ 05/18/26 14:28:43.827
  I0518 14:28:43.828233 2374947 discovery.go:214] Invalidating discovery information
  I0518 14:28:43.829539 2374947 framework.go:2330] [precondition-check] checking if cluster is MicroShift
  I0518 14:28:44.199428 2374947 framework.go:2353] IsMicroShiftCluster: microshift-version configmap not found, not MicroShift
  I0518 14:28:47.300957 2374947 client.go:293] configPath is now "/tmp/configfile986143567"
  I0518 14:28:47.300979 2374947 client.go:368] The user is now "e2e-test-node-nxzsh-user"
  I0518 14:28:47.300988 2374947 client.go:370] Creating project "e2e-test-node-nxzsh"
  I0518 14:28:47.681177 2374947 client.go:378] Waiting on permissions in project "e2e-test-node-nxzsh" ...
  I0518 14:28:48.911713 2374947 client.go:407] DeploymentConfig capability is enabled, adding 'deployer' SA to the list of default SAs
  I0518 14:28:50.776952 2374947 client.go:422] Waiting for ServiceAccount "default" to be provisioned...
  I0518 14:28:51.353685 2374947 client.go:422] Waiting for ServiceAccount "builder" to be provisioned...
  I0518 14:28:51.936558 2374947 client.go:422] Waiting for ServiceAccount "deployer" to be provisioned...
  I0518 14:28:52.710888 2374947 client.go:432] Waiting for RoleBinding "system:image-pullers" to be provisioned...
  I0518 14:28:52.990885 2374947 client.go:432] Waiting for RoleBinding "system:image-builders" to be provisioned...
  I0518 14:28:53.231469 2374947 client.go:432] Waiting for RoleBinding "system:deployers" to be provisioned...
  I0518 14:28:54.443121 2374947 client.go:469] Project "e2e-test-node-nxzsh" has been fully provisioned.
    STEP: Test liveness probe with probe-level terminationGracePeriodSeconds @ 05/18/26 14:28:54.443
  I0518 14:28:56.699644 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:29:06.736895 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:29:16.328239 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:29:26.807122 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:29:36.223155 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:29:46.059144 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:29:56.317470 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:30:06.745784 2374947 node.go:256] Killing event:   Normal   Killing         12s               kubelet            Container test failed liveness probe, will be restarted
  I0518 14:30:06.745821 2374947 node.go:257] Restart event:   Normal   Started         1s (x2 over 71s)  kubelet            Container started
  I0518 14:30:06.745834 2374947 node.go:286] Time difference: 11 seconds (expected: 10 ±10 seconds)
  I0518 14:30:06.745844 2374947 node.go:290] Termination grace period check passed
    STEP: Test startup probe with probe-level terminationGracePeriodSeconds @ 05/18/26 14:30:06.745
  I0518 14:30:09.077639 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:30:18.797777 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:30:28.275769 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:30:38.865499 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:30:49.343472 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:30:58.291943 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:31:09.210368 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:31:18.810915 2374947 node.go:256] Killing event:   Normal   Killing         10s               kubelet            Container teststartup failed startup probe, will be restarted
  I0518 14:31:18.810954 2374947 node.go:257] Restart event:   Normal   Started         0s (x2 over 71s)  kubelet            Container started
  I0518 14:31:18.810971 2374947 node.go:286] Time difference: 10 seconds (expected: 10 ±10 seconds)
  I0518 14:31:18.810981 2374947 node.go:290] Termination grace period check passed
    STEP: Test liveness probe without probe-level terminationGracePeriodSeconds (should use pod-level) @ 05/18/26 14:31:18.811
  I0518 14:31:21.294025 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:31:30.649065 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:31:40.827107 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:31:50.960355 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:32:01.470024 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:32:11.058322 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:32:20.552680 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:32:30.676954 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:32:41.165683 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:32:50.790776 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:33:06.134126 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:33:10.698858 2374947 node.go:252] Waiting for probe failure (killing) and container restart events
  I0518 14:33:21.206980 2374947 node.go:256] Killing event:   Normal   Killing         62s                kubelet            Container test failed liveness probe, will be restarted
  I0518 14:33:21.207015 2374947 node.go:257] Restart event:   Normal   Started         2s (x2 over 2m2s)  kubelet            Container started
  I0518 14:33:21.207025 2374947 node.go:286] Time difference: 60 seconds (expected: 60 ±10 seconds)
  I0518 14:33:21.207030 2374947 node.go:290] Termination grace period check passed
  I0518 14:33:22.119215 2374947 client.go:689] Deleted {user.openshift.io/v1, Resource=users  e2e-test-node-nxzsh-user}, err: <nil>
  I0518 14:33:22.527090 2374947 client.go:689] Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-node-nxzsh}, err: <nil>
  I0518 14:33:23.139164 2374947 client.go:689] Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~YF8Iu7MBdIMsAe7i3YgsABgR7Hfw2QVP9NB3mdyxY7U}, err: <nil>
    STEP: Destroying namespace "e2e-test-node-nxzsh" for this suite. @ 05/18/26 14:33:23.139
  • [279.543 seconds]
  ------------------------------

  Ran 1 of 1 Specs in 279.544 seconds
  SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped

Summary by CodeRabbit

  • Tests
    • Added end-to-end tests validating configurable terminationGracePeriodSeconds for liveness and startup probes, covering probe-level overrides, pod-level defaults, and timing verification of probe-triggered container restarts by observing pod events and asserting termination/restart intervals within expected windows.
  • Documentation
    • Updated test suite README to document the new probe termination test.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@BhargaviGudi BhargaviGudi changed the title [OCPNODE-3812]: Migrate testcase 44493 - configurable terminationGracePeriodSeconds for probes WIP [OCPNODE-3812]: Migrate testcase 44493 - configurable terminationGracePeriodSeconds for probes May 13, 2026
@openshift-ci openshift-ci Bot requested review from asahay19 and sairameshv May 13, 2026 14:25
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

A new Ginkgo e2e test was added to validate configurable TerminationGracePeriodSeconds for liveness and startup probes by parsing pod Events, measuring kill-to-restart timing, and checking probe-level vs pod-level behavior.

Changes

Probe Termination Grace Period E2E Test

Layer / File(s) Summary
Import updates and test scaffold
test/extended/node/node_e2e/node.go
Added context, strconv, and the ote Ginkgo extension import and reorganized the import block to support the new test.
Duration parser and verifyProbeTermination helper
test/extended/node/node_e2e/node.go
Introduced parseDurationToSeconds and verifyProbeTermination: polling logic that inspects oc describe pod Events, extracts matching “Killing … failed … probe … will be restarted” and subsequent “Started … Container started” lines, computes the seconds delta, and validates it against an expected window.
Liveness probe scenario (probe-level set)
test/extended/node/node_e2e/node.go
Adds a pod with liveness probe TerminationGracePeriodSeconds=10 (pod-level 60) and asserts observed restart timing via verifyProbeTermination.
Startup probe scenario (probe-level set)
test/extended/node/node_e2e/node.go
Adds a pod with startup probe TerminationGracePeriodSeconds=10 (pod-level 60) and asserts observed restart timing via verifyProbeTermination.
Liveness probe fallback to pod-level
test/extended/node/node_e2e/node.go, test/extended/node/README.md
Adds a pod with liveness probe lacking probe-level TerminationGracePeriodSeconds to validate fallback behavior to the pod-level value (60s); README lists the new test file.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

jira/valid-reference

Suggested reviewers

  • bertinatto
  • cpmeadors
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Ipv6 And Disconnected Network Test Compatibility ⚠️ Warning Test requires pulling container image from public quay.io registry, which fails in disconnected environments. Missing [Skipped:Disconnected] annotation. Add [Skipped:Disconnected] to test name. Change test annotation from: g.It("[OTP] add configurable... [OCP-44493]", ...) to: g.It("[OTP] add configurable... [OCP-44493] [Skipped:Disconnected]", ...)
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title directly references the migrated test case (OCP-44493) and clearly describes the primary change: adding configurable terminationGracePeriodSeconds for probes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Test names are stable with static strings, no dynamic content like pod names or timestamps in titles.
Test Structure And Quality ✅ Passed All quality checks pass: proper single responsibility, complete cleanup, appropriate timeouts, meaningful assertion messages, consistent with codebase patterns.
Microshift Test Compatibility ✅ Passed Test uses standard k8s APIs only and is wrapped in Describe block with BeforeEach MicroShift skip check. No unsupported OpenShift APIs or multi-node assumptions.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Test creates individual pods without multi-node requirements. No node affinity, topology constraints, or pod rescheduling. Compatible with SNO.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies test code only. Topology-aware scheduling check applies to deployment manifests, operators, and controllers—not tests.
Ote Binary Stdout Contract ✅ Passed New test is properly contained in g.It() blocks. All stdout writes are framework-intercepted. No process-level violations of OTE binary stdout contract detected.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2026
@BhargaviGudi BhargaviGudi changed the title WIP [OCPNODE-3812]: Migrate testcase 44493 - configurable terminationGracePeriodSeconds for probes WIP [OCPNODE-4529]: Migrate testcase 44493 - configurable terminationGracePeriodSeconds for probes May 13, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
test/extended/node/node_e2e/node.go (4)

282-432: ⚡ Quick win

Three near-identical pod specs — extract a builder helper.

The three pods (liveness-probe, startup-probe, liveness-probe-no-term) duplicate ~50 lines of spec each (same image, security context, command, ports). Only the probe kind, name, and probe-level TerminationGracePeriodSeconds differ. A small builder would shrink the test by ~100 lines and make the differences explicit.

♻️ Sketch
buildProbePod := func(name string, probe *corev1.Probe, kind string) *corev1.Pod {
    container := corev1.Container{
        Name:    "test",
        Image:   "quay.io/openshifttest/nginx-alpine@sha256:04f316442d48ba60e3ea0b5a67eb89b0b667abf1c198a3d0056ca748736336a0",
        Command: []string{"bash", "-c", "sleep 100000000"},
        Ports:   []corev1.ContainerPort{{ContainerPort: 8080}},
        SecurityContext: &corev1.SecurityContext{
            AllowPrivilegeEscalation: ptr.To(false),
            Capabilities:             &corev1.Capabilities{Drop: []corev1.Capability{"ALL"}},
        },
    }
    switch kind {
    case "liveness":
        container.LivenessProbe = probe
    case "startup":
        container.StartupProbe = probe
    }
    return &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
        Spec: corev1.PodSpec{
            TerminationGracePeriodSeconds: ptr.To[int64](60),
            SecurityContext: &corev1.PodSecurityContext{
                RunAsNonRoot:   ptr.To(true),
                SeccompProfile: &corev1.SeccompProfile{Type: corev1.SeccompProfileTypeRuntimeDefault},
            },
            Containers: []corev1.Container{container},
        },
    }
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/node/node_e2e/node.go` around lines 282 - 432, The three pod
specs (liveness-probe, startup-probe, liveness-probe-no-term) duplicate
container/image/security/command/ports setup — extract a builder like
buildProbePod(name string, probe *corev1.Probe, kind string) that constructs the
common corev1.Container (Image, Command, Ports, SecurityContext) and attaches
either LivenessProbe or StartupProbe based on kind, and returns the full
*corev1.Pod with the shared PodSpec (pod-level TerminationGracePeriodSeconds and
PodSecurityContext); replace the three inline pod literals with calls to
buildProbePod("liveness-probe", livenessProbe, "liveness"),
buildProbePod("startup-probe", startupProbe, "startup") and
buildProbePod("liveness-probe-no-term", nil, "liveness") and keep
verifyProbeTermination calls unchanged.

216-279: 🏗️ Heavy lift

Use the Events API directly instead of parsing humanized oc describe output.

The helper extracts timestamps by splitting describe output and indexing fields[2]. This is fragile:

  • The "Age" column position depends on the describe template and event aggregation. While aggregated events (e.g., 60s (x3 over 90s)) keep the last-seen value at fields[2], any format change breaks parsing silently.
  • The substring match "Started container" and multi-substring match for probe failures could match unrelated events if more containers are added.

Use the Events API directly: query oc.KubeClient().CoreV1().Events(namespace).List(...), filter by involvedObject and reason (Started, Killing), and consume event.FirstTimestamp / event.LastTimestamp directly without parsing humanized durations. This pattern is idiomatic across the test suite.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/node/node_e2e/node.go` around lines 216 - 279, The current
verifyProbeTermination function is brittle because it parses humanized `oc
describe` output and indexes fields[2]; replace that logic with a direct Events
API query: inside verifyProbeTermination (and remove dependence on
parseDurationToSeconds), call
oc.KubeClient().CoreV1().Events(namespace).List(...) (or use the client wrapper
available in the test helpers), filter events by event.InvolvedObject.Name ==
podName and by event.Reason == "Started" for container starts and event.Reason
== "Killing" (or the probe-failure reason used by your cluster) for
probe/termination events, then use the event.FirstTimestamp/LastTimestamp fields
to compute seconds difference and compare to expectedTerminationSec with the
same tolerance logic; log the selected event timestamps and keep the
polling/wait.PollUntilContextTimeout wrapper and return true when the time diff
is within range.

185-213: ⚡ Quick win

Replace custom duration parser with time.ParseDuration.

The Go standard library already parses the exact formats kubectl emits ("45s", "1m30s", "1h2m3s"). The current implementation silently returns 0, nil for unrecognized inputs (e.g., "5h" since it matches neither the "m" nor "s" branches), which would cause downstream timing arithmetic to be wrong without surfacing an error. Switching to time.ParseDuration removes the custom code path and the silent-zero failure mode.

♻️ Proposed refactor
-		// Helper function to parse duration string like "1m30s" or "45s" to seconds
-		parseDurationToSeconds := func(durationStr string) (int, error) {
-			var totalSeconds int
-			if strings.Contains(durationStr, "m") {
-				parts := strings.Split(durationStr, "m")
-				minutes, err := strconv.Atoi(parts[0])
-				if err != nil {
-					return 0, err
-				}
-				totalSeconds = minutes * 60
-				if len(parts) > 1 && strings.Contains(parts[1], "s") {
-					secStr := strings.TrimSuffix(parts[1], "s")
-					if secStr != "" {
-						seconds, err := strconv.Atoi(secStr)
-						if err != nil {
-							return 0, err
-						}
-						totalSeconds += seconds
-					}
-				}
-			} else if strings.Contains(durationStr, "s") {
-				secStr := strings.TrimSuffix(durationStr, "s")
-				seconds, err := strconv.Atoi(secStr)
-				if err != nil {
-					return 0, err
-				}
-				totalSeconds = seconds
-			}
-			return totalSeconds, nil
-		}
+		// Helper function to parse duration string like "1m30s" or "45s" to seconds
+		parseDurationToSeconds := func(durationStr string) (int, error) {
+			d, err := time.ParseDuration(durationStr)
+			if err != nil {
+				return 0, err
+			}
+			return int(d.Seconds()), nil
+		}

The strconv import can then be removed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/node/node_e2e/node.go` around lines 185 - 213, The custom
parser function parseDurationToSeconds should be replaced to use
time.ParseDuration: call time.ParseDuration(durationStr), return
int(duration.Seconds()) on success and propagate the error on failure so
unrecognized inputs (e.g., "5h") don't silently return 0; update the function
signature/returns accordingly, remove the now-unused strconv import, and ensure
callers still get an int number of seconds from the parsed duration.

288-291: 💤 Low value

Use a pointer helper instead of &[]T{v}[0].

The &[]int64{60}[0] / &[]bool{true}[0] pattern (used 11 times in this test) allocates a single-element slice solely to take its address. Prefer ptr.To from k8s.io/utils/ptr for clarity and consistency with the rest of the k8s ecosystem.

♻️ Example refactor
 import (
   ...
+  "k8s.io/utils/ptr"
   ...
 )
-				TerminationGracePeriodSeconds: &[]int64{60}[0],
+				TerminationGracePeriodSeconds: ptr.To[int64](60),
 				SecurityContext: &corev1.PodSecurityContext{
-					RunAsNonRoot: &[]bool{true}[0],
+					RunAsNonRoot: ptr.To(true),
 					...
 				},
 				...
-							AllowPrivilegeEscalation: &[]bool{false}[0],
+							AllowPrivilegeEscalation: ptr.To(false),
 							...
-							TerminationGracePeriodSeconds: &[]int64{10}[0],
+							TerminationGracePeriodSeconds: ptr.To[int64](10),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/node/node_e2e/node.go` around lines 288 - 291, Replace the
hacky &[]T{v}[0] pointer constructions with the ptr.To helper from
k8s.io/utils/ptr for clarity and consistency: e.g. change
TerminationGracePeriodSeconds: &[]int64{60}[0] to use ptr.To(int64(60)) and
RunAsNonRoot: &[]bool{true}[0] to ptr.To(true); update all other similar
occurrences (about 11 spots) in this file (look for
TerminationGracePeriodSeconds, RunAsNonRoot, SeccompProfile usages) and add the
import for "k8s.io/utils/ptr" if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Line 170: Update the test title string used in g.It to reference the correct
Kubernetes field name "terminationGracePeriodSeconds" (currently
"terminationGracePeriod") and ensure any Polarion title/identifier used in the
same test block is updated to match; locate the g.It invocation (the test named
"[OTP] add configurable terminationGracePeriod to liveness and startup probes
[OCP-44493]") and change the human-readable title and Polarion metadata to use
"terminationGracePeriodSeconds" so the test name matches the PR description and
API field.

---

Nitpick comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 282-432: The three pod specs (liveness-probe, startup-probe,
liveness-probe-no-term) duplicate container/image/security/command/ports setup —
extract a builder like buildProbePod(name string, probe *corev1.Probe, kind
string) that constructs the common corev1.Container (Image, Command, Ports,
SecurityContext) and attaches either LivenessProbe or StartupProbe based on
kind, and returns the full *corev1.Pod with the shared PodSpec (pod-level
TerminationGracePeriodSeconds and PodSecurityContext); replace the three inline
pod literals with calls to buildProbePod("liveness-probe", livenessProbe,
"liveness"), buildProbePod("startup-probe", startupProbe, "startup") and
buildProbePod("liveness-probe-no-term", nil, "liveness") and keep
verifyProbeTermination calls unchanged.
- Around line 216-279: The current verifyProbeTermination function is brittle
because it parses humanized `oc describe` output and indexes fields[2]; replace
that logic with a direct Events API query: inside verifyProbeTermination (and
remove dependence on parseDurationToSeconds), call
oc.KubeClient().CoreV1().Events(namespace).List(...) (or use the client wrapper
available in the test helpers), filter events by event.InvolvedObject.Name ==
podName and by event.Reason == "Started" for container starts and event.Reason
== "Killing" (or the probe-failure reason used by your cluster) for
probe/termination events, then use the event.FirstTimestamp/LastTimestamp fields
to compute seconds difference and compare to expectedTerminationSec with the
same tolerance logic; log the selected event timestamps and keep the
polling/wait.PollUntilContextTimeout wrapper and return true when the time diff
is within range.
- Around line 185-213: The custom parser function parseDurationToSeconds should
be replaced to use time.ParseDuration: call time.ParseDuration(durationStr),
return int(duration.Seconds()) on success and propagate the error on failure so
unrecognized inputs (e.g., "5h") don't silently return 0; update the function
signature/returns accordingly, remove the now-unused strconv import, and ensure
callers still get an int number of seconds from the parsed duration.
- Around line 288-291: Replace the hacky &[]T{v}[0] pointer constructions with
the ptr.To helper from k8s.io/utils/ptr for clarity and consistency: e.g. change
TerminationGracePeriodSeconds: &[]int64{60}[0] to use ptr.To(int64(60)) and
RunAsNonRoot: &[]bool{true}[0] to ptr.To(true); update all other similar
occurrences (about 11 spots) in this file (look for
TerminationGracePeriodSeconds, RunAsNonRoot, SeccompProfile usages) and add the
import for "k8s.io/utils/ptr" if not already present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3b8a78f5-42e4-4bc0-921b-438671a54f52

📥 Commits

Reviewing files that changed from the base of the PR and between 793d403 and b783cc7.

📒 Files selected for processing (1)
  • test/extended/node/node_e2e/node.go

Comment thread test/extended/node/node_e2e/node.go Outdated
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-trt
Copy link
Copy Markdown

openshift-trt Bot commented May 13, 2026

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: b783cc7

Job Name New Test Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-fips High - "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493] [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-gcp-ovn High - "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493] [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 High - "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493] [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-vsphere-ovn High - "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493] [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi High - "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493] [Suite:openshift/conformance/parallel]" is a new test that failed 1 time(s) against the current commit

New tests seen in this PR at sha: b783cc7

  • "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriod to liveness and startup probes [OCP-44493] [Suite:openshift/conformance/parallel]" [Total: 5, Pass: 0, Fail: 5, Flake: 0]

@BhargaviGudi BhargaviGudi changed the title WIP [OCPNODE-4529]: Migrate testcase 44493 - configurable terminationGracePeriodSeconds for probes WIP OCPNODE-4529: Migrate test case 44493 - configurable terminationGracePeriodSeconds for probes May 14, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 14, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 14, 2026

@BhargaviGudi: This pull request references OCPNODE-4529 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Migrates test case OCP-44493 from openshift-tests-private to origin.

Validates that Kubernetes liveness and startup probes honor their probe-level terminationGracePeriodSeconds setting instead of defaulting to the pod-level value.

Polarion

https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-44493

Bug

OCPBUGS-44493

Test Coverage

  • ✅ Liveness probe with probe-level terminationGracePeriodSeconds (10s vs pod-level 60s)
  • ✅ Startup probe with probe-level terminationGracePeriodSeconds (10s vs pod-level 60s)
  • ✅ Liveness probe without probe-level setting (falls back to pod-level 60s)

Implementation

  • Created pod resources inline (no templates)
  • Proper Ginkgo patterns (DeferCleanup, ote.Informing(), context-aware polling)
  • Helper functions return errors, all assertions in main test
  • Pod security context with RunAsNonRoot, SeccompProfile, drop ALL capabilities

Testing

```bash
./openshift-tests run-test "[sig-node] [Jira:Node/Kubelet] Kubelet, CRI-O, CPU manager [OTP] add configurable terminationGracePeriodSeconds to liveness and startup probes [OCP-44493]"
```

Summary by CodeRabbit

  • Tests
  • Added comprehensive e2e test suite validating configurable terminationGracePeriod for liveness and startup probes, including probe-level overrides, pod-level defaults, and graceful termination timing verification.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 16, 2026
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 18, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/extended/node/node_e2e/node.go (1)

183-211: ⚡ Quick win

Prefer time.ParseDuration over custom parsing.

The custom parser silently returns 0, nil for unrecognized formats (e.g., empty string, "123") and doesn't handle hours. time.ParseDuration handles all these cases correctly and is more maintainable.

♻️ Proposed fix
-		// Helper function to parse duration string like "1m30s" or "45s" to seconds
-		parseDurationToSeconds := func(durationStr string) (int, error) {
-			var totalSeconds int
-			if strings.Contains(durationStr, "m") {
-				parts := strings.Split(durationStr, "m")
-				minutes, err := strconv.Atoi(parts[0])
-				if err != nil {
-					return 0, err
-				}
-				totalSeconds = minutes * 60
-				if len(parts) > 1 && strings.Contains(parts[1], "s") {
-					secStr := strings.TrimSuffix(parts[1], "s")
-					if secStr != "" {
-						seconds, err := strconv.Atoi(secStr)
-						if err != nil {
-							return 0, err
-						}
-						totalSeconds += seconds
-					}
-				}
-			} else if strings.Contains(durationStr, "s") {
-				secStr := strings.TrimSuffix(durationStr, "s")
-				seconds, err := strconv.Atoi(secStr)
-				if err != nil {
-					return 0, err
-				}
-				totalSeconds = seconds
-			}
-			return totalSeconds, nil
-		}
+		// Helper function to parse duration string like "1m30s" or "45s" to seconds
+		parseDurationToSeconds := func(durationStr string) (int, error) {
+			d, err := time.ParseDuration(durationStr)
+			if err != nil {
+				return 0, err
+			}
+			return int(d.Seconds()), nil
+		}

This also allows removing the strconv import (line 7) if not used elsewhere.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/extended/node/node_e2e/node.go` around lines 183 - 211, Replace the
custom parseDurationToSeconds implementation with one that uses
time.ParseDuration: call time.ParseDuration(durationStr), return an error if
parsing fails, convert the resulting time.Duration to seconds
(int(d.Seconds())), and ensure edge cases like empty strings or bare numbers
bubble up as parse errors; update any callers expecting the same signature and
remove the now-unused strconv import if it is no longer referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/node/node_e2e/node.go`:
- Line 323: Change the container Command invocations that use "bash -c" to use
"sh -c" because the nginx-alpine image does not contain bash; update the Command
entries (e.g. the slice literal Command: []string{"bash", "-c", "sleep
100000000"}) to Command: []string{"sh", "-c", "sleep 100000000"} in all three
places referenced (the Command fields at the occurrences around the given diff
and the similar blocks at the other two occurrences mentioned).

---

Nitpick comments:
In `@test/extended/node/node_e2e/node.go`:
- Around line 183-211: Replace the custom parseDurationToSeconds implementation
with one that uses time.ParseDuration: call time.ParseDuration(durationStr),
return an error if parsing fails, convert the resulting time.Duration to seconds
(int(d.Seconds())), and ensure edge cases like empty strings or bare numbers
bubble up as parse errors; update any callers expecting the same signature and
remove the now-unused strconv import if it is no longer referenced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: f58cdb6c-4cd7-49ef-b354-0512c6543c25

📥 Commits

Reviewing files that changed from the base of the PR and between b783cc7 and b453e4b.

📒 Files selected for processing (1)
  • test/extended/node/node_e2e/node.go

Comment thread test/extended/node/node_e2e/node.go Outdated
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BhargaviGudi
Once this PR has been reviewed and has the lgtm label, please assign rphillips for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BhargaviGudi BhargaviGudi force-pushed the migrate-44493 branch 2 times, most recently from 94cd990 to 084fa7b Compare May 18, 2026 06:52
@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 18, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

…nd startup probes

Migrates test from openshift-tests-private to origin.

Test validates probe-level terminationGracePeriodSeconds for:
- Liveness probes with probe-level terminationGracePeriodSeconds (10s)
- Startup probes with probe-level terminationGracePeriodSeconds (10s)
- Liveness probes without probe-level (falls back to pod-level 60s)

The test creates pods with failing probes and verifies the time
difference between probe failure (Killing event) and container
restart (Started event) matches the expected termination grace
period within acceptable range.

Event matching logic parses 'oc describe pod' output for:
- Killing events with container name
- Started events after restart

Updates:
- Add test to test/extended/node/node_e2e/node.go
- Document test in test/extended/node/README.md

Relates: https://issues.redhat.com/browse/OCPBUGS-44493
Signed-off-by: Bhargavi Gudi <BhargaviGudi@users.noreply.github.com>
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 18, 2026

@BhargaviGudi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-microshift 2aca499 link true /test e2e-aws-ovn-microshift
ci/prow/e2e-aws-ovn-microshift-serial 2aca499 link true /test e2e-aws-ovn-microshift-serial
ci/prow/e2e-metal-ipi-ovn-ipv6 2aca499 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-vsphere-ovn-upi 2aca499 link true /test e2e-vsphere-ovn-upi
ci/prow/e2e-gcp-ovn 2aca499 link true /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants