cpython: Commit 5cd2f0d4
Authored Sep 21, 2000 by Marc-André Lemburg
Updated according to the changes made to the "s#" parser marker
and bumped the version number to 1.7.
Parent: b425f5e3
Showing 1 changed file with 27 additions and 20 deletions.

Misc/unicode.txt (+27 -20) @ 5cd2f0d4
 =============================================================================
-Python Unicode Integration Proposal Version: 1.6
+Python Unicode Integration Proposal Version: 1.7
 -----------------------------------------------------------------------------
...
@@ -738,16 +738,26 @@ type).
 Buffer Interface:
 -----------------

-Implement the buffer interface using the <defenc> Python string
-object as basis for bf_getcharbuf (corresponds to the "t#" argument
-parsing marker) and the internal buffer for bf_getreadbuf (corresponds
-to the "s#" argument parsing marker). If bf_getcharbuf is requested
-and the <defenc> object does not yet exist, it is created first.
+Implement the buffer interface using the <defenc> Python string object
+as basis for bf_getcharbuf and the internal buffer for
+bf_getreadbuf. If bf_getcharbuf is requested and the <defenc> object
+does not yet exist, it is created first.
+
+Note that as special case, the parser marker "s#" will not return raw
+Unicode UTF-16 data (which the bf_getreadbuf returns), but instead
+tries to encode the Unicode object using the default encoding and then
+returns a pointer to the resulting string object (or raises an
+exception in case the conversion fails). This was done in order to
+prevent accidentely writing binary data to an output stream which the
+other end might not recognize.

 This has the advantage of being able to write to output streams (which
 typically use this interface) without additional specification of the
 encoding to use.

+If you need to access the read buffer interface of Unicode objects,
+use the PyObject_AsReadBuffer() interface.
+
 The internal format can also be accessed using the 'unicode-internal'
 codec, e.g. via u.encode('unicode-internal').

...
@@ -815,14 +825,11 @@ These markers are used by the PyArg_ParseTuple() APIs:
 "s": For Unicode objects: return a pointer to the object's
 <defenc> buffer (which uses the <default encoding>).

-"s#": Access to the Unicode object via the bf_getreadbuf buffer interface
-(see Buffer Interface); note that the length relates to the buffer
-length, not the Unicode string length (this may be different
-depending on the Internal Format).
+"s#": Access to the default encoded version of the Unicode object
+(see Buffer Interface); note that the length relates to the length
+of the default encoded string rather than the Unicode object length.

-"t#": Access to the Unicode object via the bf_getcharbuf buffer interface
-(see Buffer Interface); note that the length relates to the buffer
-length, not necessarily to the Unicode string length.
+"t#": Same as "s#".

 "es":
 Takes two parameters: encoding (const char *) and
...
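The practical effect of the revised "s#" entry (and of "t#", which is now the same) can be modelled in a few lines of Python. The helper `s_hash_view` below is hypothetical and not part of any API; it mimics what the marker hands to C code after this change: the Unicode object encoded with the default encoding, plus the length of that encoded string. The default encoding is a parameter because its value is an assumption here; in current CPython 3, "s#" converts str objects using UTF-8.

```python
def s_hash_view(obj, default_encoding="utf-8"):
    """Hypothetical model of what the "s#" marker yields after revision 1.7.

    Unicode (str) objects are encoded with the default encoding and the
    returned length is that of the encoded string, not len(obj). A failed
    conversion propagates as an exception, as the proposal requires.
    """
    if isinstance(obj, str):
        data = obj.encode(default_encoding)  # may raise UnicodeEncodeError
    elif isinstance(obj, (bytes, bytearray)):
        data = bytes(obj)  # 8-bit strings pass through unchanged
    else:
        raise TypeError("expected str or bytes-like object")
    return data, len(data)


text = "na\u00efve"  # 5 characters, one of them non-ASCII
data, length = s_hash_view(text)
print(len(text), length)  # 5 6

# With an ASCII default encoding the conversion fails and an exception
# is raised instead of returning data.
try:
    s_hash_view(text, default_encoding="ascii")
except UnicodeEncodeError as exc:
    print("conversion failed:", exc.reason)
```

The mismatch between 5 characters and 6 bytes is exactly why the spec now says the length relates to the default encoded string rather than to the Unicode object.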
@@ -934,14 +941,13 @@ Using "es#" with a pre-allocated buffer:
 File/Stream Output:
 -------------------

-Since file.write(object) and most other stream writers use the "s#"
-argument parsing marker for binary files and "t#" for text files, the
-buffer interface implementation determines the encoding to use (see
-Buffer Interface).
+Since file.write(object) and most other stream writers use the "s#" or
+"t#" argument parsing marker for querying the data to write, the
+default encoded string version of the Unicode object will be written
+to the streams (see Buffer Interface).

-For explicit handling of files using Unicode, the standard
-stream codecs as available through the codecs module should
-be used.
+For explicit handling of files using Unicode, the standard stream
+codecs as available through the codecs module should be used.

 The codecs module should provide a short-cut open(filename,mode,encoding)
 available which also assures that mode contains the 'b' character when
...
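Following the advice to use the codecs module for explicit Unicode file handling, here is a minimal sketch in modern Python 3 that wraps a binary stream in a codecs stream writer and reader, so the encoding is stated explicitly instead of falling back to the default encoding used by "s#"/"t#". The choice of UTF-16-LE and the use of io.BytesIO in place of a real file opened in 'b' mode are illustrative assumptions.

```python
import codecs
import io

# An in-memory binary stream stands in for a file opened in 'b' mode.
raw = io.BytesIO()

# Wrap it in a stream codec so the output encoding is explicit.
writer = codecs.getwriter("utf-16-le")(raw)
writer.write("caf\u00e9")

print(raw.getvalue())  # b'c\x00a\x00f\x00\xe9\x00'

# Reading the data back requires the matching stream reader.
raw.seek(0)
reader = codecs.getreader("utf-16-le")(raw)
print(reader.read())  # café
```

Python's codecs.open() follows the short-cut described above: when an encoding is given, it appends 'b' to the mode if it is missing.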
@@ -1043,6 +1049,7 @@ Encodings:
 History of this Proposal:
 -------------------------

+1.7: Added note about the changed behaviour of "s#".
 1.6: Changed <defencstr> to <defenc> since this is the name used in the
 implementation. Added notes about the usage of <defenc> in the
 buffer protocol implementation.
...