Skip to content

Replace deprecated pyarrow.HadoopFileSystem import with pyarrow.fs.HadoopFileSystem#870

Open
huyhoang171106 wants to merge 1 commit intonteract:mainfrom
huyhoang171106:fix/replace-deprecated-pyarrow-hadoopfilesys
Open

Replace deprecated pyarrow.HadoopFileSystem import with pyarrow.fs.HadoopFileSystem#870
huyhoang171106 wants to merge 1 commit intonteract:mainfrom
huyhoang171106:fix/replace-deprecated-pyarrow-hadoopfilesys

Conversation

@huyhoang171106
Copy link
Copy Markdown

Summary

The warning is caused by an outdated import in papermill/iorw.py (around the reported line 50): from pyarrow import HadoopFileSystem. In PyArrow 2.0+, this symbol is deprecated at the top-level and should be imported from pyarrow.fs. Papermill’s optional dependency already allows pyarrow>=2, so current environments will keep emitting FutureWarning until this import is updated. This is not an immediate runtime failure, but it is a forward-compatibility risk (likely breakage in a future PyArrow release).

Files changed

  • papermill/iorw.py (modified)

Testing

  • Not run in this environment.

What does this PR do?

Fixes #<issue_number>

Closes #816

…pyarrow.fs.hadoopfilesystem`

Signed-off-by: Nguyen Huy Hoang <181364121+huyhoang171106@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 14:49
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates Papermill’s PyArrow/HDFS optional dependency handling in papermill/iorw.py, intending to address a deprecation warning / forward-compatibility concern around HadoopFileSystem imports.

Changes:

  • Adds a missing-dependency fallback assignment for FileSelector when pyarrow is not installed.
  • Keeps (already-present) pyarrow.fs import for FileSelector and HadoopFileSystem.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread papermill/iorw.py
Comment on lines +52 to 54
FileSelector = missing_dependency_generator("pyarrow", "hdfs")
HadoopFileSystem = missing_dependency_generator("pyarrow", "hdfs")

Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing_dependency_generator returns a zero-arg function. Assigning it to FileSelector/HadoopFileSystem means later calls like HadoopFileSystem(host="default") and FileSelector(path) will raise TypeError (unexpected args) instead of the intended PapermillOptionalDependencyException. Wrap the generator so the stub accepts *args, **kwargs (or adjust missing_dependency_generator) and raises the optional-dependency exception for both symbols; consider adding a regression test for missing-pyarrow behavior in the HDFS handler.

Suggested change
FileSelector = missing_dependency_generator("pyarrow", "hdfs")
HadoopFileSystem = missing_dependency_generator("pyarrow", "hdfs")
_missing_pyarrow_hdfs = missing_dependency_generator("pyarrow", "hdfs")
def FileSelector(*args, **kwargs):
return _missing_pyarrow_hdfs()
def HadoopFileSystem(*args, **kwargs):
return _missing_pyarrow_hdfs()

Copilot uses AI. Check for mistakes.
Comment thread papermill/iorw.py
Comment on lines 49 to +52
try:
from pyarrow.fs import FileSelector, HadoopFileSystem
except ImportError:
FileSelector = missing_dependency_generator("pyarrow", "hdfs")
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title/description says it replaces a deprecated from pyarrow import HadoopFileSystem import, but the code already imports HadoopFileSystem from pyarrow.fs here. Please double-check whether the intended import change is already present on the target branch (and update the PR description/title), or whether this diff is missing the actual deprecated-import replacement.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FutureWarning: pyarrow.HadoopFileSystem is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.

2 participants