- 15 Aug, 2008 1 commit
-
-
Stefan Behnel authored
String literals pass through the compiler as follows:

  - unicode string literals are stored as unicode strings and encoded to UTF-8 on the way out
  - byte string literals are stored as correctly encoded byte strings by unescaping the source string literal into the corresponding byte sequence. No further encoding is done later on!
  - char literals are stored as byte strings of length 1. The parser can now verify this, e.g. a non-ASCII char literal in UTF-8 source code will result in an error, as it would end up as two or more bytes in the C code, which can no longer be represented as a C char.

Storing byte strings is necessary as we otherwise lose the ability to encode byte string literals on the way out. They do not necessarily contain only bytes that fit into the source code encoding, as the source can use escape sequences to represent them. Previously, ASCII encoded source code could not contain byte string literals with properly escaped non-ASCII bytes.

Another bug that was fixed: in Python, escape sequences behave differently in unicode strings (where they represent the character code) and byte strings (where they represent a byte value). Previously, they resulted in the same byte value in Cython code. This is only a problem for non-ASCII escapes, since the character code and the byte value of ASCII characters are identical.
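The difference between the two kinds of escapes can be seen in plain Python. This is only an illustrative sketch, not code from the compiler: `check_char_literal` is a hypothetical helper that mimics the length check described for char literals, and its `source_encoding` parameter is an assumption made for the example.

```python
# The same escape means different things in the two literal types.
text = u"\xe9"    # one character, code point U+00E9
data = b"\xe9"    # one byte, value 0xE9

# Unicode literals are encoded to UTF-8 on the way out, so the single
# character becomes two bytes, while the byte string stays untouched.
utf8_text = text.encode("utf-8")


def check_char_literal(literal, source_encoding="utf-8"):
    """Hypothetical helper: accept a char literal only if it is one byte."""
    encoded = literal.encode(source_encoding)
    if len(encoded) != 1:
        raise ValueError(
            "char literal must be a single byte, got %d bytes" % len(encoded))
    return encoded
```

With UTF-8 source code, `check_char_literal(u"a")` passes, while `check_char_literal(u"\xe9")` raises an error because the character needs two bytes and no longer fits into a C char.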
-
- 14 Aug, 2008 6 commits
-
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Robert Bradshaw authored
-
Robert Bradshaw authored
-
Stefan Behnel authored
-
Robert Bradshaw authored
-
- 13 Aug, 2008 12 commits
-
-
Dag Sverre Seljebotn authored
-
Dag Sverre Seljebotn authored
-
Dag Sverre Seljebotn authored
-
Dag Sverre Seljebotn authored
-
Dag Sverre Seljebotn authored
-
Robert Bradshaw authored
-
Robert Bradshaw authored
-
Robert Bradshaw authored
-
Robert Bradshaw authored
-
Robert Bradshaw authored
-
Robert Bradshaw authored
All tests pass except bufaccess, tnumpy, and r_mang1.
-
Robert Bradshaw authored
-
- 12 Aug, 2008 11 commits
-
-
Stefan Behnel authored
-
Stefan Behnel authored
fixes the unicode literal indexing problem (only for unicode strings, not for byte strings!)
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
otherwise, different ways of spelling special characters can end up being correctly escaped or not in the C file
-
Stefan Behnel authored
-
- 11 Aug, 2008 5 commits
-
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
- 10 Aug, 2008 5 commits
-
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
-
Stefan Behnel authored
fixed unicode escape handling in byte strings
unescape \xXY in string literals, as C allows it to conflict with trailing hex numbers - output string escaping will do the right thing
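The conflict arises because a C hex escape has no fixed length: in `"\x41bc"`, the escape also consumes the hex digits `b` and `c`. One common way to avoid this when writing out a C string literal is to use three-digit octal escapes, which stop after three digits. The sketch below shows that technique under stated assumptions: `escape_c_bytes` is a hypothetical helper, not the escaping routine used by this commit, and it does not handle every C corner case (trigraphs, for example).

```python
def escape_c_bytes(data):
    """Hypothetical helper: render bytes as the body of a C string literal."""
    out = []
    for byte in bytearray(data):
        ch = chr(byte)
        if ch in '\\"':
            # Backslash and double quote need a backslash in C.
            out.append("\\" + ch)
        elif 32 <= byte < 127:
            # Printable ASCII is copied through unchanged.
            out.append(ch)
        else:
            # A three-digit octal escape cannot swallow following characters.
            out.append("\\%03o" % byte)
    return "".join(out)
```

For example, `escape_c_bytes(b"\xe9abc")` gives `\351abc`, where the letters after the escape remain ordinary characters.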
-