feat(publish): run validation on create #2782

Open
fpotier wants to merge 2 commits into master from feat/validate-oncreate

fpotier commented Jun 15, 2026

Member

Description

Runs ajv validation during creation and update of PublishedData records, ignoring any validation error that may occur.

Motivation

We use the ajv validation process to auto-fill some properties; for instance, the full name of a creator is derived from the given and family names.

This caused some confusion for our users: the frontend uses metadata.creators[].name to display the list of creators, but the full names are not computed on creation or update.

In combination with #2749, it would also allow us to keep metadata accurate (e.g. recomputing the full size of the published data if the dataset list changes).
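
The enrichment described above can be sketched as follows. This is a minimal illustration, not the project's code: `validateAndEnrich` is a hypothetical stand-in for the AJV-based validator, the `givenName`/`familyName` field names and the "given family" name format are assumptions, and the wrapper simply never throws on validation errors.

```javascript
// Hypothetical stand-in for the AJV-based validator: fills derived fields
// in place and returns a list of error messages instead of throwing.
function validateAndEnrich(metadata) {
  const errors = [];
  for (const creator of metadata.creators ?? []) {
    // Assumed derivation rule: "<givenName> <familyName>".
    if (!creator.name && creator.givenName && creator.familyName) {
      creator.name = `${creator.givenName} ${creator.familyName}`;
    }
    if (!creator.name) {
      errors.push("creator is missing a name");
    }
  }
  return errors;
}

// Non-blocking wrapper: validation runs for its side effects on a copy of
// the metadata, and errors are returned for logging rather than thrown.
function enrichIgnoringErrors(record) {
  const metadata = JSON.parse(JSON.stringify(record.metadata ?? {}));
  const errors = validateAndEnrich(metadata);
  return { record: { ...record, metadata }, errors };
}

const result = enrichIgnoringErrors({
  title: "Example",
  metadata: { creators: [{ givenName: "Ada", familyName: "Lovelace" }, {}] },
});
console.log(result.record.metadata.creators[0].name, result.errors);
```

The second creator in the example has no usable name, so the stand-in validator reports an error, yet the enriched record is still returned and could be persisted.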

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

Summary by Sourcery

Bug Fixes:

  • Ensure published data creation requests are validated before persistence to prevent invalid records.

Summary by Sourcery

Validate PublishedData v4 payloads with the AJV-based validator during create, update, and resync operations to ensure metadata-driven fields are populated consistently.

New Features:

  • Trigger AJV validation when creating PublishedData v4 records so metadata-derived fields like publication year are set automatically.
  • Trigger AJV validation when updating and resyncing PublishedData v4 records to keep computed metadata fields in sync.

Enhancements:

  • Default the update DTO metadata field to an empty object to support AJV-driven metadata augmentation.

Tests:

  • Add integration tests verifying metadata.publicationYear is set on PublishedData v4 create, patch, and resync operations via AJV extensions.

fpotier force-pushed the feat/validate-oncreate branch 3 times, most recently from b7dd924 to 65ed3db on June 15, 2026 12:55
fpotier force-pushed the feat/validate-oncreate branch from 65ed3db to 1986291 on June 16, 2026 12:35
fpotier marked this pull request as ready for review on June 16, 2026 12:44
fpotier requested a review from a team as a code owner on June 16, 2026 12:44

sourcery-ai Bot left a comment

Hey - I've found 3 issues, and left some high level feedback:

  • In the resync test (`"should set 'metadata.publicationYear' on resync"`), the `POST /resync` request is not returned/awaited, so the test can proceed to the `GET` before resync has completed, leading to potential flakiness; return the `request(appUrl).post(...)` chain or `await` it before issuing the `GET`.
  • Setting a default value of `{}` on the optional `metadata?: Record<string, unknown> = {}` in `UpdatePublishedDataV4Dto` changes the semantics from "possibly absent" to "always present but possibly empty"; consider whether callers or persistence logic rely on `metadata` being `undefined` when not provided, and if not required, drop the default to avoid unintended side effects.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/published-data/dto/update-published-data.v4.dto.ts" line_range="55" />
<code_context>
   @IsObject()
   @IsOptional()
-  readonly metadata?: Record<string, unknown>;
+  readonly metadata?: Record<string, unknown> = {};
 }

</code_context>
<issue_to_address>
**issue (bug_risk):** Defaulting `metadata` to `{}` undermines `@IsOptional()` semantics and changes the API contract for omitted vs empty metadata.

With this default, `metadata` is never `undefined`, so `@IsOptional()` no longer has any effect and callers can’t tell “not provided” from “provided but empty”. If `metadata` should always exist, make it required and remove `@IsOptional()` and the `?`. If it should stay optional, drop the default and normalize `undefined` to `{}` at the appropriate boundary (e.g., service layer).
</issue_to_address>
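
A minimal sketch of the second option (keep the field optional and normalize at a boundary). The helper name `withMetadataDefault` and the sample DTO shapes are hypothetical; the sketch only shows that the DTO can keep distinguishing "omitted" from "empty" while the enrichment step still receives an object.

```javascript
// Hypothetical service-layer helper: the DTO itself is left untouched, and
// the default is applied only on the copy handed to the enrichment step.
function withMetadataDefault(dto) {
  return { ...dto, metadata: dto.metadata ?? {} };
}

const omitted = { title: "No metadata sent" };
const empty = { title: "Empty metadata sent", metadata: {} };

// The original DTOs still carry the "omitted vs. empty" distinction.
console.log("metadata" in omitted); // false
console.log("metadata" in empty); // true

// The normalized copies always expose an object for enrichment.
const normalizedOmitted = withMetadataDefault(omitted);
const normalizedEmpty = withMetadataDefault(empty);
console.log(normalizedOmitted.metadata, normalizedEmpty.metadata);
```

With this shape, persistence logic can still inspect the original DTO to decide whether `metadata` should be written at all.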

### Comment 2
<location path="test/PublishedDataV4.js" line_range="509-515" />
<code_context>
+        });
+    });
+
+    it("should set 'metadata.publicationYear' on resync", async () => {
+      request(appUrl)
+        .post(`/api/v4/PublishedData/${id}/resync`)
+        .send(strippedPublishedData)
+        .set("Accept", "application/json")
+        .set({ Authorization: `Bearer ${accessTokenAdminIngestor}` })
+        .expect(TestData.EntryCreatedStatusCode);
+
+      return request(appUrl)
</code_context>
<issue_to_address>
**issue (testing):** The resync test does not await the POST request, so the test can pass even if the resync fails.

In `"should set 'metadata.publicationYear' on resync"`, the initial `POST /resync` is fired but neither awaited nor returned. Mocha can therefore move on to the `GET` (or finish the test) before resync completes, and any failure status from the POST is ignored. Please `return`/`await` the POST call and add status/content-type expectations so the test actually waits for and validates the resync before asserting the `GET` response.
</issue_to_address>
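
The ordering problem can be illustrated without supertest. In this sketch `fakeRequest` is a hypothetical timer-based stand-in for an HTTP call, and the delays are arbitrary; it only demonstrates that a promise which is not awaited gives no ordering guarantee relative to the next call.

```javascript
// Hypothetical stand-in for an HTTP request: completes after `ms`
// milliseconds and records its completion in `log`.
function fakeRequest(name, ms, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name);
      resolve(name);
    }, ms);
  });
}

// Pattern flagged in the review: the POST is started but not awaited,
// so the GET completes without waiting for it.
async function getWithoutAwait() {
  const log = [];
  const pending = fakeRequest("POST /resync", 20, log);
  await fakeRequest("GET", 5, log);
  const seenWhenGetFinished = [...log];
  await pending; // only so the demo's timer settles before returning
  return seenWhenGetFinished;
}

// Suggested pattern: await the POST so the GET observes its result.
async function getAfterAwait() {
  const log = [];
  await fakeRequest("POST /resync", 20, log);
  await fakeRequest("GET", 5, log);
  return [...log];
}

const demo = Promise.all([getWithoutAwait(), getAfterAwait()]);
demo.then(([withoutAwait, withAwait]) => {
  console.log(withoutAwait); // the GET finished before the POST
  console.log(withAwait); // the POST finished first, then the GET
});
```

In the real test the same effect is obtained by `await`ing (or returning) the `request(appUrl).post(...)` chain, which also lets its `.expect(...)` assertions fail the test.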

### Comment 3
<location path="test/PublishedDataV4.js" line_range="471-473" />
<code_context>
       .expect("Content-Type", /json/);
   });
+
+  describe("Ajv extensions are executed on create/save", () => {
+    const { metadata, ...strippedPublishedData } = publishedData;
+    const expectedPublicationYear = new Date().getFullYear();
+    let id;
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add tests that cover ignored validation errors to prove the new behavior of running validation but not failing on errors.

The current tests only cover the happy path where `publicationYear` is auto-populated; they don’t verify that records are still created/updated when the payload violates the schema.

Please add at least one test that sends a payload which fails AJV validation (e.g., missing a required field or containing an invalid value) and asserts that the endpoint still responds with success and persists the record. This will ensure the "validate and enrich, but do not block on errors" behavior and protect against regressions where validation becomes blocking.

Suggested implementation:

```javascript
  describe("Ajv extensions are executed on create/save", () => {
    const { metadata, ...strippedPublishedData } = publishedData;
    const expectedPublicationYear = new Date().getFullYear();
    let id;

    it("should set 'metadata.publicationYear' on create", async () => {
      const res = await request(appUrl)
        .post("/api/v4/PublishedData")
        .send(strippedPublishedData)
        .set("Accept", "application/json")
        .expect(201)
        .expect("Content-Type", /json/);

      expect(res.body).to.have.nested.property(
        "metadata.publicationYear",
        expectedPublicationYear
      );

      id = res.body.id;
      expect(id).to.be.ok;
    });

    it("should still create a record when payload fails AJV validation", async () => {
      // Construct a payload that violates the AJV schema but should not block persistence.
      // Example: set an invalid type for publicationYear (string instead of number).
      const invalidPayload = {
        ...strippedPublishedData,
        metadata: {
          ...metadata,
          publicationYear: "not-a-number",
        },
      };

      const createRes = await request(appUrl)
        .post("/api/v4/PublishedData")
        .send(invalidPayload)
        .set("Accept", "application/json")
        .expect(201)
        .expect("Content-Type", /json/);

      const createdId = createRes.body.id;
      expect(createdId).to.be.ok;

      // Verify that the record was actually persisted and is readable via the API.
      const getRes = await request(appUrl)
        .get(`/api/v4/PublishedData/${createdId}`)
        .set("Accept", "application/json")
        .expect(200)
        .expect("Content-Type", /json/);

      expect(getRes.body).to.have.property("id", createdId);
      // We don't assert on metadata.publicationYear here; the important part is that
      // AJV validation did not block creation, even though the payload was invalid.
    });

```

1. This change assumes you are using Chai's `expect` style with `.to.have` and `.to.be` as in other tests in this file. If the assertion library or style differs, adjust the expectations accordingly.
2. The validation failure in the new test is simulated by setting `metadata.publicationYear` to a string. If your AJV schema enforces other required fields or formats instead, you may want to tweak `invalidPayload` to better match an actual failing case (e.g., omit a required field).
3. The status codes `201` for create and `200` for get are inferred; if your API uses different codes, update the `.expect(<status>)` calls to match your existing tests.
</issue_to_address>

minottic left a comment

Member

thanks! I wonder if this might be a problem in some edge cases, I will add an issue to raise that and suggest a possible later solution

fpotier commented Jun 17, 2026

Member Author

> thanks! I wonder if this might be a problem in some edge cases, I will add an issue to raise that and suggest a possible later solution

what edge case do you have in mind?
If the dto doesn't have any metadata property, then it will not be initialized (see the 1st sourcery comment)
