Fix legacy pdfjs dynamic import #6410

Open: cansu-jarvis wants to merge 2 commits into FlowiseAI:main from cansu-jarvis:fix/legacy-pdfjs-build

@cansu-jarvis commented May 19, 2026

Summary

  • Add a shared loadLegacyPdfJs helper that resolves pdfjs-dist/legacy/build/pdf.mjs and loads it with a native dynamic file:// import.
  • Use the helper for legacy PDF loading in the PDF document loader, generic File loader, and Arxiv tool.
  • Add the full-file-upload UI toggle for pdfFile.legacyBuild, preserving existing upload config defaults.
  • Add a unit test for the helper.
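
A minimal sketch of how such a helper can be exercised without pdfjs-dist installed, assuming it accepts an injectable importer and resolver (as the signature quoted later in this review suggests). The names loadLegacyPdfJsSketch, fakeResolver, fakeImporter, and the /tmp/fake path are hypothetical and are not taken from the PR's actual test.

```typescript
import { pathToFileURL } from 'node:url'

// Type names mirror those quoted in the review; their definitions here are assumptions.
type NativeModuleImporter = (specifier: string) => Promise<any>
type ModuleResolver = (specifier: string) => string

// Sketch of the helper's shape: resolve the legacy build on disk,
// then hand the importer a file:// URL instead of a bare path.
const loadLegacyPdfJsSketch = async (importer: NativeModuleImporter, resolver: ModuleResolver): Promise<any> => {
    const modulePath = resolver('pdfjs-dist/legacy/build/pdf.mjs')
    return importer(pathToFileURL(modulePath).href)
}

// Fakes that record what the helper asked for, so no real module is loaded.
const seen: { resolved: string[]; imported: string[] } = { resolved: [], imported: [] }

const fakeResolver: ModuleResolver = (specifier) => {
    seen.resolved.push(specifier)
    return '/tmp/fake/pdf.mjs' // hypothetical path, never read from disk
}

const fakeImporter: NativeModuleImporter = async (href) => {
    seen.imported.push(href)
    return { getDocument: () => undefined, version: '0.0.0-fake' }
}

// Drive the helper with the fakes and inspect what it requested.
const loaded = loadLegacyPdfJsSketch(fakeImporter, fakeResolver)
loaded.then((mod) => console.log(seen.resolved[0], seen.imported[0], mod.version))
```

Injecting both functions keeps the test independent of the installed pdfjs-dist version and of the test runner's own module loader.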

Testing

  • PATH="/opt/homebrew/opt/node@20/bin:$PATH" /Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin/pnpm --filter flowise-components test -- src/utils.test.ts --runInBand
  • PATH="/opt/homebrew/opt/node@20/bin:$PATH" /Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin/pnpm --filter flowise-components build
  • PATH="/opt/homebrew/opt/node@20/bin:$PATH" ./node_modules/.bin/eslint packages/components/src/utils.ts packages/components/src/utils.test.ts packages/components/nodes/documentloaders/Pdf/Pdf.ts packages/components/nodes/documentloaders/File/File.ts packages/components/nodes/tools/Arxiv/core.ts --ext ts --report-unused-disable-directives --max-warnings 0
  • PATH="/opt/homebrew/opt/node@20/bin:$PATH" node -e "const u=require('./packages/components/dist/src/utils.js'); u.loadLegacyPdfJs().then(m=>console.log(typeof m.getDocument, m.version)).catch(e=>{console.error(e); process.exit(1)})"
  • PATH="/opt/homebrew/opt/node@20/bin:/Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin:$PATH" ./node_modules/.bin/eslint packages/ui/src/ui-component/extended/FileUpload.jsx --ext js,jsx --report-unused-disable-directives --max-warnings 0
  • PATH="/opt/homebrew/opt/node@20/bin:$PATH" /Users/hermes-runtime/.npm/_npx/ee7ca6831d726ff5/node_modules/.bin/pnpm --filter flowise-ui build

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a loadLegacyPdfJs utility function in packages/components/src/utils.ts that dynamically loads the pdfjs-dist legacy ESM build. It stays compatible with a CommonJS environment by keeping the import out of TypeScript's transpilation. The utility is integrated into the File and Pdf document loaders and the Arxiv tool. One review comment suggests that loadLegacyPdfJs should return the entire module object rather than a subset containing only getDocument and version, both for consistency with a standard dynamic import and to avoid breaking consumers that expect other module exports.
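
The PR text does not show how nativeImport is defined. One common way to keep import() from being rewritten into require() when TypeScript emits CommonJS is to build the call at runtime, sketched below under that assumption. nativeImportSketch is a hypothetical name, and the real helper may use a different technique.

```typescript
// Assumed shape of the importer type named in the review.
type NativeModuleImporter = (specifier: string) => Promise<any>

// The function body is a string, so the compiler never sees the import()
// expression and cannot down-level it to require(). Node evaluates it as a
// genuine dynamic import, which is able to load ES modules such as .mjs files.
const nativeImportSketch: NativeModuleImporter = new Function(
    'specifier',
    'return import(specifier)'
) as NativeModuleImporter

// Usage with a built-in module so the sketch runs without extra packages.
const pathModule = nativeImportSketch('node:path')
pathModule.then((m) => console.log(typeof m.basename))
```

Test runners that execute code inside a sandboxed VM context may reject this kind of import, which is one reason to make the importer an injectable parameter.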

Comment on lines +42 to +50
): Promise<{ getDocument: unknown; version?: string }> => {
    const modulePath = resolver('pdfjs-dist/legacy/build/pdf.mjs')
    const pdfjs = await importer(pathToFileURL(modulePath).href)

    return {
        getDocument: pdfjs.getDocument,
        version: pdfjs.version
    }
}

medium

The loadLegacyPdfJs function currently returns a subset of the pdfjs-dist module (getDocument and version). This might break compatibility if the consumer (e.g., PDFLoader or a user-defined script) expects other parts of the module, such as GlobalWorkerOptions or OPS. It is safer and more flexible to return the entire module object, matching the behavior of a standard dynamic import(). Per repository guidelines, using a default implementation is preferred unless specific behavior is required, and signature changes are acceptable for internal functions if callers are unaffected.

export const loadLegacyPdfJs = async (
    importer: NativeModuleImporter = nativeImport,
    resolver: ModuleResolver = (specifier) => require.resolve(specifier)
): Promise<any> => {
    const modulePath = resolver('pdfjs-dist/legacy/build/pdf.mjs')
    return await importer(pathToFileURL(modulePath).href)
}
References
  1. Use a default (fallback) implementation unless the specific implementation has meaningfully different behavior or provides better error messages.
  2. A signature change that would be a breaking change for a public API is acceptable for an internal function if all its callers are known and unaffected by the change.
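
To illustrate the concern, here is a small sketch that compares the two return shapes against a stand-in module. fakePdfJs is not the real pdfjs-dist module; only the export names (getDocument, version, GlobalWorkerOptions, OPS) come from the comment above, and loadSubset and loadWhole are hypothetical names.

```typescript
type Importer = (href: string) => Promise<any>

// Stand-in for the module namespace; the values are placeholders.
const fakePdfJs = {
    getDocument: () => ({}),
    version: '0.0.0-fake',
    GlobalWorkerOptions: { workerSrc: '' },
    OPS: {}
}
const fakeImport: Importer = async () => fakePdfJs

// Shape currently in the PR: only two exports survive.
const loadSubset = async (importer: Importer) => {
    const pdfjs = await importer('file:///fake/pdf.mjs')
    return { getDocument: pdfjs.getDocument, version: pdfjs.version }
}

// Shape proposed in the review: pass the module through untouched.
const loadWhole = async (importer: Importer): Promise<any> => importer('file:///fake/pdf.mjs')

const shapes = Promise.all([loadSubset(fakeImport), loadWhole(fakeImport)])
shapes.then(([subset, whole]) => {
    // A consumer reading GlobalWorkerOptions only finds it on the whole module.
    console.log('GlobalWorkerOptions' in subset, 'GlobalWorkerOptions' in whole)
})
```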
