Skip to content

feat(OrigDatablock): add file count endpoints#2795

Open
abdimo101 wants to merge 3 commits into
masterfrom
origdatablock-files-count
Open

feat(OrigDatablock): add file count endpoints#2795
abdimo101 wants to merge 3 commits into
masterfrom
origdatablock-files-count

Conversation

@abdimo101

@abdimo101 abdimo101 commented Jun 22, 2026

Copy link
Copy Markdown
Member

Description

This PR adds new endpoints to count files across origdatablocks linked to a specific dataset. It also includes API tests for the public and private file count endpoints.

Motivation

Fixes

  • Bug fixed (#X)

Changes:

  • changes made

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

Summary by Sourcery

Add API support to count files across origdatablocks, including public and authenticated variants, and cover them with tests.

New Features:

  • Expose a secured /api/v4/origdatablocks/files/count endpoint to return the number of files matching optional filters.
  • Expose a public /api/v4/origdatablocks/public/files/count endpoint to return the number of public files matching optional filters.
  • Add a service-layer aggregation to count files across origdatablocks with support for filters and projections.

Tests:

  • Add authenticated and unauthenticated tests for the private origdatablocks files count endpoint.
  • Add tests for the public origdatablocks files count endpoint and adjust existing test case numbering.

@abdimo101 abdimo101 requested a review from a team as a code owner June 22, 2026 09:18
@abdimo101 abdimo101 requested review from Junjiequan and nitrosx June 22, 2026 09:19

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 5 issues, and left some high level feedback:

  • In OrigDatablocksService.countFiles, accessing result.count will throw if the aggregation returns no documents; consider guarding against an empty aggregation result (e.g. defaulting result to { count: 0 } or checking for undefined) before reading count.
  • In countFiles, fieldsProjection is typed as string[] but defaulted to {}, which is inconsistent; defaulting to an empty array (e.g. filter.fields ?? []) would better match the type and avoid passing an object into parsePipelineProjection.
  • The @Query decorator call in countFiles has mismatched closing parentheses/commas in the diff; it would be good to simplify/format this parameter declaration to avoid syntactic ambiguity and make it easier to read.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `OrigDatablocksService.countFiles`, accessing `result.count` will throw if the aggregation returns no documents; consider guarding against an empty aggregation result (e.g. defaulting `result` to `{ count: 0 }` or checking for undefined) before reading `count`.
- In `countFiles`, `fieldsProjection` is typed as `string[]` but defaulted to `{}`, which is inconsistent; defaulting to an empty array (e.g. `filter.fields ?? []`) would better match the type and avoid passing an object into `parsePipelineProjection`.
- The `@Query` decorator call in `countFiles` has mismatched closing parentheses/commas in the diff; it would be good to simplify/format this parameter declaration to avoid syntactic ambiguity and make it easier to read.

## Individual Comments

### Comment 1
<location path="src/origdatablocks/origdatablocks.v4.controller.ts" line_range="745" />
<code_context>
+    )
+    queryFilter: string,
+  ) {
+    const parsedFilter = JSON.parse(queryFilter ?? "{}");
+
+    return this.origDatablocksService.countFiles(parsedFilter);
</code_context>
<issue_to_address>
**issue (bug_risk):** Unvalidated JSON.parse may throw on malformed filter input, causing 500 responses.

`JSON.parse(queryFilter ?? "{}")` will throw on any malformed `filter` query and result in a 500. Unless `FilterValidationPipe` guarantees valid JSON, please either wrap this in try/catch (or use a safe parse helper) and return an appropriate 4xx instead, and mirror the same handling in `countFilesPublic`.
</issue_to_address>

### Comment 2
<location path="src/origdatablocks/origdatablocks-public.v4.controller.ts" line_range="285-287" />
<code_context>
+        },
+      ),
+    )
+    queryFilter: string,
+  ) {
+    const parsedFilter = JSON.parse(queryFilter ?? "{}");
</code_context>
<issue_to_address>
**suggestion:** The queryFilter parameter is typed as required but used as if it may be undefined.

Here `queryFilter` is required, but `JSON.parse(queryFilter ?? "{}")` assumes it may be `undefined`. Since Nest can pass `undefined` when the query param is missing, please type this as `string | undefined` (or `queryFilter?: string`) so the typing matches actual runtime behavior.

```suggestion
    )
    queryFilter?: string,
  ) {
```
</issue_to_address>

### Comment 3
<location path="src/origdatablocks/origdatablocks.service.ts" line_range="390" />
<code_context>
+    filter: FilterQuery<OrigDatablockDocument>,
+  ): Promise<CountApiResponse> {
+    const whereFilter: FilterQuery<OrigDatablockDocument> = filter.where ?? {};
+    const fieldsProjection: string[] = filter.fields ?? {};
+
+    const pipeline: PipelineStage[] = [{ $match: whereFilter }];
</code_context>
<issue_to_address>
**issue (bug_risk):** Defaulting fieldsProjection to an object conflicts with its string[] type and intended usage.

`fieldsProjection` is declared as `string[]`, but the fallback is `{}`, which can cause type errors or break `isEmpty`/`parsePipelineProjection` if they expect an array. Please default to an empty array (e.g. `filter.fields ?? []`) or allow `undefined` in the type and handle it explicitly.
</issue_to_address>

### Comment 4
<location path="src/origdatablocks/origdatablocks.service.ts" line_range="421-425" />
<code_context>
+
+    pipeline.push({ $count: "count" });
+
+    const [result] = await this.origDatablockModel
+      .aggregate<{ count: number }>(pipeline)
+      .exec();
+
+    return { count: result.count ?? 0 };
+  }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Accessing result.count without guarding against an empty aggregation result can throw.

If the aggregation returns no documents, the destructured `result` will be `undefined`, so `result.count` will throw before `?? 0` is applied. Using optional chaining avoids this:

```ts
const [result] = await this.origDatablockModel
  .aggregate<{ count: number }>(pipeline)
  .exec();

return { count: result?.count ?? 0 };
```
</issue_to_address>

### Comment 5
<location path="src/origdatablocks/origdatablocks.service.ts" line_range="386" />
<code_context>
     return { count };
   }

+  async countFiles(
+    filter: FilterQuery<OrigDatablockDocument>,
+  ): Promise<CountApiResponse> {
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying the new countFiles aggregation pipeline by counting list sizes directly and cleaning up the fields projection handling.

You can simplify `countFiles` while keeping behavior intact.

**1. Remove `$unwind`/`$count` in favor of `$project` + `$group`**

You don’t need to explode `dataFileList` just to count its elements. Project its size and sum:

```ts
async countFiles(
  filter: FilterQuery<OrigDatablockDocument>,
): Promise<CountApiResponse> {
  const whereFilter: FilterQuery<OrigDatablockDocument> = filter.where ?? {};
  const fieldsProjection = filter.fields ?? [];

  const pipeline: PipelineStage[] = [{ $match: whereFilter }];
  this.addLookupFields(pipeline, filter.include);

  if (!isEmpty(fieldsProjection)) {
    const projection = parsePipelineProjection(fieldsProjection);
    pipeline.push({ $project: projection });
  }

  // keep dataset constraint but simplify the stages
  pipeline.push({
    $lookup: {
      from: "Dataset",
      as: "dataset_temp",
      let: { datasetId: "$datasetId" },
      pipeline: [{ $match: { $expr: { $eq: ["$pid", "$$datasetId"] } } }],
    },
  });

  // only keep origdatablocks for which a dataset exists
  pipeline.push({
    $match: {
      $expr: { $gt: [{ $size: "$dataset_temp" }, 0] },
    },
  });

  pipeline.push({ $unset: "dataset_temp" });

  // count files by summing sizes of dataFileList
  pipeline.push(
    {
      $project: {
        fileCount: { $size: "$dataFileList" },
      },
    },
    {
      $group: {
        _id: null,
        count: { $sum: "$fileCount" },
      },
    },
  );

  const [result] = await this.origDatablockModel
    .aggregate<{ count: number }>(pipeline)
    .exec();

  return { count: result?.count ?? 0 };
}
```

This keeps the dataset-existence constraint but removes `$addFields`, `$unwind`, and `$count`, making the intent “sum of list sizes” explicit.

**2. Fix `fieldsProjection` typing/initialization**

Right now `fieldsProjection` is typed as `string[]` but initialized with `{}` if `filter.fields` is an object (or falsy). Ensure its type/value align:

```ts
type FieldsInput = string[] | undefined; // adjust to whatever your filter type is

const fieldsProjection: string[] =
  Array.isArray(filter.fields) ? filter.fields : [];
```

Or, if `filter.fields` can already be guaranteed as `string[]`, drop the `{}` default entirely:

```ts
const fieldsProjection: string[] = filter.fields ?? [];
```

This removes the implicit type mismatch and makes the projection logic easier to follow.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/origdatablocks/origdatablocks.v4.controller.ts
Comment thread src/origdatablocks/origdatablocks-public.v4.controller.ts
Comment thread src/origdatablocks/origdatablocks.service.ts Outdated
Comment thread src/origdatablocks/origdatablocks.service.ts Outdated
return { count };
}

async countFiles(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (complexity): Consider simplifying the new countFiles aggregation pipeline by counting list sizes directly and cleaning up the fields projection handling.

You can simplify countFiles while keeping behavior intact.

1. Remove $unwind/$count in favor of $project + $group

You don’t need to explode dataFileList just to count its elements. Project its size and sum:

async countFiles(
  filter: FilterQuery<OrigDatablockDocument>,
): Promise<CountApiResponse> {
  const whereFilter: FilterQuery<OrigDatablockDocument> = filter.where ?? {};
  const fieldsProjection = filter.fields ?? [];

  const pipeline: PipelineStage[] = [{ $match: whereFilter }];
  this.addLookupFields(pipeline, filter.include);

  if (!isEmpty(fieldsProjection)) {
    const projection = parsePipelineProjection(fieldsProjection);
    pipeline.push({ $project: projection });
  }

  // keep dataset constraint but simplify the stages
  pipeline.push({
    $lookup: {
      from: "Dataset",
      as: "dataset_temp",
      let: { datasetId: "$datasetId" },
      pipeline: [{ $match: { $expr: { $eq: ["$pid", "$$datasetId"] } } }],
    },
  });

  // only keep origdatablocks for which a dataset exists
  pipeline.push({
    $match: {
      $expr: { $gt: [{ $size: "$dataset_temp" }, 0] },
    },
  });

  pipeline.push({ $unset: "dataset_temp" });

  // count files by summing sizes of dataFileList
  pipeline.push(
    {
      $project: {
        fileCount: { $size: "$dataFileList" },
      },
    },
    {
      $group: {
        _id: null,
        count: { $sum: "$fileCount" },
      },
    },
  );

  const [result] = await this.origDatablockModel
    .aggregate<{ count: number }>(pipeline)
    .exec();

  return { count: result?.count ?? 0 };
}

This keeps the dataset-existence constraint but removes $addFields, $unwind, and $count, making the intent “sum of list sizes” explicit.

2. Fix fieldsProjection typing/initialization

Right now fieldsProjection is typed as string[] but initialized with {} if filter.fields is an object (or falsy). Ensure its type/value align:

type FieldsInput = string[] | undefined; // adjust to whatever your filter type is

const fieldsProjection: string[] =
  Array.isArray(filter.fields) ? filter.fields : [];

Or, if filter.fields can already be guaranteed as string[], drop the {} default entirely:

const fieldsProjection: string[] = filter.fields ?? [];

This removes the implicit type mismatch and makes the projection logic easier to follow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant