Kirill Smelkov / cpython
Commit 5cd2f0d4, authored Sep 21, 2000 by Marc-André Lemburg

Updated according to the changes made to the "s#" parser marker
and bumped the version number to 1.7.

Parent: b425f5e3
Showing 1 changed file with 27 additions and 20 deletions

Misc/unicode.txt (+27 -20)
 =============================================================================
-Python Unicode Integration Proposal Version: 1.6
+Python Unicode Integration Proposal Version: 1.7
 -----------------------------------------------------------------------------
@@ -738,16 +738,26 @@ type).
 Buffer Interface:
 -----------------

-Implement the buffer interface using the <defenc> Python string
-object as basis for bf_getcharbuf (corresponds to the "t#" argument
-parsing marker) and the internal buffer for bf_getreadbuf (corresponds
-to the "s#" argument parsing marker). If bf_getcharbuf is requested
-and the <defenc> object does not yet exist, it is created first.
+Implement the buffer interface using the <defenc> Python string object
+as basis for bf_getcharbuf and the internal buffer for
+bf_getreadbuf. If bf_getcharbuf is requested and the <defenc> object
+does not yet exist, it is created first.
+
+Note that as special case, the parser marker "s#" will not return raw
+Unicode UTF-16 data (which the bf_getreadbuf returns), but instead
+tries to encode the Unicode object using the default encoding and then
+returns a pointer to the resulting string object (or raises an
+exception in case the conversion fails). This was done in order to
+prevent accidentely writing binary data to an output stream which the
+other end might not recognize.

 This has the advantage of being able to write to output streams (which
 typically use this interface) without additional specification of the
 encoding to use.

+If you need to access the read buffer interface of Unicode objects,
+use the PyObject_AsReadBuffer() interface.
+
 The internal format can also be accessed using the 'unicode-internal'
 codec, e.g. via u.encode('unicode-internal').
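
As a rough illustration of the "s#" behaviour described in this hunk, the Python sketch below contrasts a default-encoded view of a string with its raw UTF-16 data, and shows the conversion failing with an exception. It is not the C implementation: `str.encode()` stands in for the C-level conversion, ASCII is assumed as the default encoding, and the names `DEFAULT_ENCODING`, `default_encoded`, and `raw_utf16` are hypothetical helpers introduced for this example.

```python
# Sketch only: str.encode() stands in for the C-level "s#" conversion.
DEFAULT_ENCODING = "ascii"  # assumption: the configured default encoding


def default_encoded(text, encoding=DEFAULT_ENCODING):
    """Return the default-encoded bytes; raises UnicodeEncodeError on failure."""
    return text.encode(encoding)


def raw_utf16(text):
    """Approximate the raw internal data as little-endian UTF-16 without a BOM."""
    return text.encode("utf-16-le")


plain = "abc"
accented = "caf\u00e9"

encoded_plain = default_encoded(plain)  # what a stream would receive
raw_plain = raw_utf16(plain)            # binary data a peer may not recognize

try:
    default_encoded(accented)
    conversion_failed = False
except UnicodeEncodeError:
    # Mirrors "raises an exception in case the conversion fails".
    conversion_failed = True
```

The raw form contains NUL bytes even for plain ASCII text, which is the kind of binary data the note wants to keep out of output streams.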
@@ -815,14 +825,11 @@ These markers are used by the PyArg_ParseTuple() APIs:
 "s": For Unicode objects: return a pointer to the object's
      <defenc> buffer (which uses the <default encoding>).

-"s#": Access to the Unicode object via the bf_getreadbuf buffer interface
-      (see Buffer Interface); note that the length relates to the buffer
-      length, not the Unicode string length (this may be different
-      depending on the Internal Format).
+"s#": Access to the default encoded version of the Unicode object
+      (see Buffer Interface); note that the length relates to the length
+      of the default encoded string rather than the Unicode object length.

-"t#": Access to the Unicode object via the bf_getcharbuf buffer interface
-      (see Buffer Interface); note that the length relates to the buffer
-      length, not necessarily to the Unicode string length.
+"t#": Same as "s#".

 "es":
       Takes two parameters: encoding (const char *) and
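
The point about lengths in the new "s#" text can be seen with a small Python sketch. It assumes UTF-8 as the default encoding purely for illustration (the proposal leaves the default encoding configurable), and `lengths` is a hypothetical helper, not part of any API.

```python
def lengths(text, encoding="utf-8"):
    """Return (Unicode object length, length of the encoded string)."""
    return len(text), len(text.encode(encoding))


# For pure ASCII text both lengths agree.
ascii_lengths = lengths("abc")

# A non-ASCII character occupies more than one byte once encoded,
# so the length reported alongside the encoded data is larger.
accented_lengths = lengths("caf\u00e9")
```

Code that receives a length from "s#" should therefore treat it as a byte count of the encoded string, not as a character count.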
@@ -934,14 +941,13 @@ Using "es#" with a pre-allocated buffer:
 File/Stream Output:
 -------------------

-Since file.write(object) and most other stream writers use the "s#"
-argument parsing marker for binary files and "t#" for text files, the
-buffer interface implementation determines the encoding to use (see
-Buffer Interface).
+Since file.write(object) and most other stream writers use the "s#" or
+"t#" argument parsing marker for querying the data to write, the
+default encoded string version of the Unicode object will be written
+to the streams (see Buffer Interface).

-For explicit handling of files using Unicode, the standard
-stream codecs as available through the codecs module should
-be used.
+For explicit handling of files using Unicode, the standard stream
+codecs as available through the codecs module should be used.

 The codecs module should provide a short-cut open(filename,mode,encoding)
 available which also assures that mode contains the 'b' character when
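
For the explicit route recommended in this hunk, a minimal Python sketch using the standard library's `codecs.open(filename, mode, encoding)` might look as follows. The temporary file, the UTF-8 choice, and the helper name `write_and_read_back` are assumptions made for the example, not part of the proposal.

```python
import codecs
import os
import tempfile


def write_and_read_back(text, encoding="utf-8"):
    """Write text through a stream codec, then return (bytes on disk, decoded text)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # The encoding is stated explicitly instead of relying on the default.
        with codecs.open(path, "w", encoding) as stream:
            stream.write(text)
        # Inspect the raw bytes that actually reached the file.
        with open(path, "rb") as raw:
            on_disk = raw.read()
        # Read the file back through the same codec.
        with codecs.open(path, "r", encoding) as stream:
            decoded = stream.read()
    finally:
        os.remove(path)
    return on_disk, decoded


on_disk, decoded = write_and_read_back("caf\u00e9")
```

Naming the encoding at the point where the file is opened avoids depending on whatever default encoding the interpreter happens to use.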
@@ -1043,6 +1049,7 @@ Encodings:
 History of this Proposal:
 -------------------------
+1.7: Added note about the changed behaviour of "s#".
 1.6: Changed <defencstr> to <defenc> since this is the name used in the
      implementation. Added notes about the usage of <defenc> in the
      buffer protocol implementation.