nexedi / cloudooo / Merge requests / !18

Merged
Created Sep 14, 2018 by Jérome Perrin (@jerome, Owner)

Support UTF-8 encoded CSV


Even though the CSV format does not specify an encoding, the de facto "standard" nowadays seems to be UTF-8.

  • Google Sheets exports in UTF-8 by default and autodetects the encoding on import
  • LibreOffice asks users for the encoding when saving and importing
  • with Excel, users can choose from the list of formats: "CSV" (which is latin1, at least with a French locale) or "CSV UTF-8" (which is UTF-8)

The problem was that cloudooo only accepted latin1, and because latin1 can represent only a small subset of the characters that UTF-8 can encode, it was not even possible to work around this by re-encoding the file before sending it to cloudooo.

So this MR changes the default encoding cloudooo uses when importing CSV from latin1 to UTF-8. Some backward compatibility is kept: if the file cannot be decoded as UTF-8, we just use latin1.
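
The fallback described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from this MR: the function name `decode_csv_bytes` is hypothetical, and the sketch assumes the whole file is available as bytes.

```python
from typing import Tuple


def decode_csv_bytes(data: bytes) -> Tuple[str, str]:
    """Decode CSV bytes, preferring UTF-8 and falling back to latin1.

    Hypothetical helper for illustration; returns (text, encoding_used).
    """
    try:
        # Strict decoding: invalid UTF-8 byte sequences raise UnicodeDecodeError.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to a character, so this decode cannot fail.
        return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_csv_bytes("prénom;âge".encode("utf-8")))
    print(decode_csv_bytes("prénom;âge".encode("latin-1")))
```

Trying UTF-8 first is reasonably safe because latin1 text containing accented characters is very rarely valid UTF-8, whereas the reverse order would never fail and would silently produce mojibake.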

There are some other slightly unrelated changes, but since we are here:

  • "modernize" the tests a bit by using the unittest built-in loader instead of re-implementing one. Also split a big class into smaller classes to group tests by topic.
  • fix some warnings at startup about deprecated arguments
  • cloudooo also only supported ; as the field delimiter. Delimiter detection is now smarter thanks to Python's csv.Sniffer
  • Fix the current test failures on master. I did not investigate much and simply ignored docy the same way doc was already ignored. We can do better if you want.
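
For reference, delimiter detection with csv.Sniffer might look something like the sketch below. The helper names (`sniff_delimiter`, `read_rows`), the candidate delimiter set, and the choice to fall back to ; when sniffing fails are all assumptions made for illustration, not necessarily what this MR does.

```python
import csv
import io
from typing import List


def sniff_delimiter(sample: str, candidates: str = ",;\t", default: str = ";") -> str:
    """Guess the field delimiter of a CSV sample (hypothetical helper)."""
    try:
        # Restrict the guess to plausible delimiters to avoid odd matches.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide (e.g. single-column data): use the default.
        return default


def read_rows(text: str) -> List[List[str]]:
    """Parse CSV text using a sniffed delimiter."""
    # Sniffing a prefix is enough; the sample size here is an arbitrary choice.
    delimiter = sniff_delimiter(text[:4096])
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    print(read_rows("a;b;c\n1;2;3\n4;5;6\n"))
    print(read_rows("a,b,c\n1,2,3\n4,5,6\n"))
```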
Source branch: feat/csv-utf8