Skip to content

  • Projects
  • Groups
  • Snippets
  • Help
    • Loading...
    • Help
    • Submit feedback
    • Contribute to GitLab
  • Sign in / Register
erp5
erp5
  • Project
    • Project
    • Details
    • Activity
    • Releases
    • Cycle Analytics
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributors
    • Graph
    • Compare
    • Charts
  • Merge Requests 135
    • Merge Requests 135
  • CI / CD
    • CI / CD
    • Pipelines
    • Jobs
    • Schedules
    • Charts
  • Wiki
    • Wiki
  • Snippets
    • Snippets
  • Members
    • Members
  • Collapse sidebar
  • Activity
  • Graph
  • Charts
  • Jobs
  • Commits
  • nexedi
  • erp5erp5
  • Merge Requests
  • !1615

Merged
Opened May 03, 2022 by Levin Zimmermann@levin.zimmermann
  • Report abuse
Report abuse

Allow patched pandas.read_* in restricted Python

(please see wendelin!99 (closed))

This merge request aims to add restricted (secure) versions of selected pandas.read_* functions, so that they can be used in the zope sandbox. It does so by monkey-patching the respective functions as suggested by @tatuya.

The monkey-patched versions only allow str inputs, instances of any other data type (e.g. pathlib.Path, bytes) are prohibited. If the parsed argument is a str, it will convert it to a StringIO instance and further parse it to the original function.

In this way the restriction prevents that users can parse urls, file paths, etc. which would be downloaded by pandas. The implementation is heuristic, since it makes the assumption that pandas won't search for urls etc. in StringIO instances (which is true in 0.19.x, as the tests showed).

Besides general code issues, my main question is: can the given solution be considered to be secure or should I move to other solutions (see also solution in Wendelin SR and Note below)?

Then: I didn't add read_html yet, because it has an additional unresolved dependency (html5lib). Should we add this dependency and also add read_html?

Regarding the tests: For "the proof of security" the tests need to use working links from which pandas can actually initialize DataFrame instances if the function wouldn't be restricted in correct way. For now I simply added random real-word links, which should be replaced.

Note:

Initially I wanted to patch read_json more explicitly, e.g. more or less directly parse the given argument to pandas internal FrameParser or SeriesParser and simply bypass all of pandas url-etc. parsing parts. But this solution has other disadvantages:

  1. Since we would have to rely on code which doesn't belong to the public api of pandas it would potentially break when switching pandas version (in fact in pandas up-to-date version the mentioned classes have been moved to other modules).

  2. We may have to duplicate code in order to provide users with mostly the same api (rich set of different kwargs and args) and we would have to adapt it when moving to newer pandas versions in order to avoid unexpected behaviour.

  3. This patch wouldn't be generic for the selected read_* functions, because the code structure between pandas different read functions differ widely.

Edited May 18, 2022 by Levin Zimmermann

Check out, review, and merge locally

Step 1. Fetch and check out the branch for this merge request

git fetch https://lab.nexedi.com/levin.zimmermann/erp5.git add-restricted-pandas-read-functions
git checkout -b levin.zimmermann/erp5-add-restricted-pandas-read-functions FETCH_HEAD

Step 2. Review the changes locally

Step 3. Merge the branch and fix any conflicts that come up

git fetch origin
git checkout origin/master
git merge --no-ff levin.zimmermann/erp5-add-restricted-pandas-read-functions

Step 4. Push the result of the merge to GitLab

git push origin master

Note that pushing to GitLab requires write access to this repository.

Tip: You can also checkout merge requests locally by following these guidelines.

  • Discussion 52
  • Commits 1
  • Changes 3
Assignee
Assign to
None
Milestone
None
Assign milestone
Time tracking
0
Labels
None
Assign labels
  • View project labels
Reference: nexedi/erp5!1615

Revert this merge request

This will create a new commit in order to revert the existing changes.

Switch branch
Cancel
A new branch will be created in your fork and a new merge request will be started.

Cherry-pick this merge request

Switch branch
Cancel
A new branch will be created in your fork and a new merge request will be started.
GitLab Nexedi Edition | About GitLab | About Nexedi | 沪ICP备14008524号