Commit 2d969d5d authored by Guido van Rossum

Cosmetic changes; added sections on notation and on objects;

new grammar (global, '==').
parent 03228857
......@@ -60,6 +60,69 @@ informal introduction to the language, see the {\em Python Tutorial}.
This reference manual describes the Python programming language.
It is not intended as a tutorial.
While I am trying to be as precise as possible, I chose to use English
rather than formal specifications for everything except syntax and
lexical analysis. This should make the document more understandable
to the average reader, but will leave room for ambiguities.
Consequently, if you were coming from Mars and tried to re-implement
Python from this document alone, you might in fact be implementing
quite a different language. On the other hand, if you are using
Python and wonder what the precise rules about a particular area of
the language are, you should be able to find them here.
It is dangerous to add too many implementation details to a language
reference document -- the implementation may change, and other
implementations of the same language may work differently. On the
other hand, there is currently only one Python implementation, and
particular quirks of it are sometimes worth mentioning, especially
where it differs from the ``ideal'' specification.
Every Python implementation comes with a number of built-in and
standard modules. These are not documented here, but in the separate
{\em Python Library Reference} document. A few built-in modules are
mentioned when they interact in a significant way with the language
definition.
The descriptions of lexical analysis and syntax use a modified BNF
grammar notation. This uses the following style of definition:
name: lcletter (lcletter | "_")*
lcletter: "a"..."z"
The first line says that a \verb\name\ is a \verb\lcletter\ followed by
a sequence of zero or more \verb\lcletter\s and underscores. A
\verb\lcletter\ in turn is any of the single characters `a' through `z'.
(This rule is actually adhered to for the names defined in syntax and
grammar rules in this document.)
Each rule begins with a name (which is the name defined by the rule)
followed by a colon. Each rule is wholly contained on one line. A
vertical bar (\verb\|\) is used to separate alternatives; it is the
least binding operator in this notation. A star (\verb\*\) means zero
or more repetitions of the preceding item; likewise, a plus (\verb\+\)
means one or more repetitions and a question mark (\verb\?\) zero or
one (in other words, the preceding item is optional). These three
operators bind as tightly as possible; parentheses are used for
grouping. Literal strings are enclosed in double quotes. White space
is only meaningful to separate tokens.
In lexical definitions (such as the example above), two more conventions
are used: Two literal characters separated by three dots mean a choice
of any single character in the given (inclusive) range of ASCII
characters. A phrase between angle brackets (\verb\<...>\) gives an
informal description of the symbol defined; e.g., this could be used
to describe the notion of `control character' if needed.
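As a rough illustration, which is not part of the manual itself, the sample rule above can be transcribed into a regular expression. The sketch uses Python's \verb\re\ module; the names \verb\LCLETTER\, \verb\NAME_RE\ and \verb\is_name\ are hypothetical, and the range \verb\"a"..."z"\ is assumed to correspond to the character class \verb\[a-z]\.

```python
import re

# name:      lcletter (lcletter | "_")*
# lcletter:  "a"..."z"
LCLETTER = "[a-z]"
NAME_RE = re.compile(LCLETTER + "(?:" + LCLETTER + "|_)*")

def is_name(text):
    """Return True if the whole text matches the 'name' rule."""
    return NAME_RE.fullmatch(text) is not None

print(is_name("lc_letter"))   # True
print(is_name("_private"))    # False: must start with a lowercase letter
print(is_name("Name"))        # False: uppercase is outside "a"..."z"
```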
Although the notation used is almost the same, there is a big
difference between the meaning of lexical and syntactic definitions:
a lexical definition operates on the individual characters of the
input source, while a syntax definition operates on the stream of
tokens generated by the lexical analysis.
\chapter{Lexical analysis}
A Python program is read by a {\em parser}. Input to the parser is a
......@@ -130,11 +193,6 @@ Spaces and tabs are not tokens, but serve to delimit tokens. Where
ambiguity exists, a token comprises the longest possible string that
forms a legal token, when read from left to right.
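The longest-match rule can be sketched as follows. This is only an illustration under stated assumptions: the operator table is copied from the operator list later in this document, and the function name \verb\split_operators\ is hypothetical, not part of any real tokenizer.

```python
# Hypothetical operator table, copied from the operator list in this document.
OPERATORS = ["<<", ">>", "<=", ">=", "<>", "!=", "==",
             "<", ">", "+", "-", "*", "/", "%", "&", "|", "^", "~"]

def split_operators(text):
    """Split a run of operator characters using the longest-match rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t":
            pos += 1              # spaces and tabs only delimit tokens
            continue
        # Try longer candidates first so that "<<" wins over "<".
        for op in sorted(OPERATORS, key=len, reverse=True):
            if text.startswith(op, pos):
                tokens.append(op)
                pos += len(op)
                break
        else:
            raise ValueError("no legal token at position %d" % pos)
    return tokens

print(split_operators("<<<"))   # ['<<', '<']
print(split_operators("< <"))   # ['<', '<']
```

Reading from left to right, the first two characters of \verb\<<<\ form the longest legal token, so the third is left over as a separate token; inserting a space forces a different split.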
Tokens are described using an extended regular expression notation.
This is similar to the extended BNF notation used later, except that
the notation \verb\<...>\ is used to give an informal description of a
character, and that spaces and tabs are not to be ignored.
Identifiers are described by the following regular expressions:
......@@ -142,9 +200,9 @@ Identifiers are described by the following regular expressions:
identifier: (letter|"_") (letter|digit|"_")*
letter: lowercase | uppercase
lowercase: "a"|"b"|...|"z"
uppercase: "A"|"B"|...|"Z"
digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
lowercase: "a"..."z"
uppercase: "A"..."Z"
digit: "0"..."9"
Identifiers are unlimited in length. Case is significant.
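A minimal sketch of the identifier rule as a regular expression, assuming Python's \verb\re\ module. The names \verb\IDENTIFIER_RE\ and \verb\is_identifier\ are hypothetical, and the sketch does not exclude keywords.

```python
import re

# identifier: (letter|"_") (letter|digit|"_")*
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """Return True if the whole text matches the 'identifier' rule."""
    return IDENTIFIER_RE.fullmatch(text) is not None

print(is_identifier("_x1"))      # True
print(is_identifier("1abc"))     # False: may not start with a digit
print("spam" == "Spam")          # False: case is significant
```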
......@@ -156,13 +214,14 @@ keywords} of the language, and may not be used as ordinary
identifiers. They must be spelled exactly as written here:
and del for is raise
break elif from not return
class else if or try
continue except import pass while
def finally in print
and del for in print
break elif from is raise
class else global not return
continue except if or try
def finally import pass while
% # This Python program sorts and formats the above table
% import string
% l = []
% try:
......@@ -185,8 +244,8 @@ String literals are described by the following regular expressions:
stringliteral: "'" stringitem* "'"
stringitem: stringchar | escapeseq
stringchar: <any character except newline or "\" or "'">
escapeseq: "\" <any character except newline>
stringchar: <any ASCII character except newline or "\" or "'">
escapeseq: "\" <any ASCII character except newline>
String literals cannot span physical line boundaries. Escape
......@@ -208,7 +267,7 @@ are:
\verb/\t/ & ASCII Horizontal Tab (TAB) \\
\verb/\v/ & ASCII Vertical Tab (VT) \\
\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\
\verb/\x/{\em xx...} & ASCII character with hex value {\em xx} \\
\verb/\x/{\em xx...} & ASCII character with hex value {\em xx...} \\
......@@ -221,9 +280,10 @@ are used...).
All unrecognized escape sequences are left in the string {\em
unchanged}, i.e., the backslash is left in the string. (This rule is
useful when debugging: if an escape sequence is mistyped, the
resulting output is more easily recognized as broken. It also helps
somewhat for string literals used as regular expressions or otherwise
passed to other modules that do their own escape handling.)
resulting output is more easily recognized as broken. It also helps a
great deal for string literals used as regular expressions or
otherwise passed to other modules that do their own escape handling --
but you may end up quadrupling backslashes that must appear literally.)
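The rule for unrecognized escape sequences can be sketched with a small decoder. This is an illustration only: \verb\SIMPLE_ESCAPES\ is a deliberately incomplete, hypothetical table, and \verb\decode_escapes\ is not the interpreter's actual routine.

```python
# Illustrative subset of the recognized escapes; the real table is longer.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", "'": "'"}

def decode_escapes(body):
    """Decode the text between the quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
            else:
                # Unrecognized escape: keep the backslash and the character.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(repr(decode_escapes(r"a\tb")))   # 'a\tb'  (a real tab character)
print(repr(decode_escapes(r"a\db")))   # 'a\\db' (backslash left in place)
```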
\subsection{Numeric literals}
......@@ -239,9 +299,9 @@ decimalinteger: nonzerodigit digit* | "0"
octinteger: "0" octdigit+
hexinteger: "0" ("x"|"X") hexdigit+
nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
nonzerodigit: "1"..."9"
octdigit: "0"..."7"
hexdigit: digit|"a"..."f"|"A"..."F"
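The three integer rules can be checked against sample text with regular expressions. The sketch below assumes Python's \verb\re\ module, and the names \verb\DECIMAL_RE\, \verb\OCT_RE\, \verb\HEX_RE\ and \verb\classify_integer\ are hypothetical. It follows the grammar as written in this document; current Python versions spell octal literals differently.

```python
import re

DECIMAL_RE = re.compile(r"[1-9][0-9]*|0")      # nonzerodigit digit* | "0"
OCT_RE = re.compile(r"0[0-7]+")                # "0" octdigit+
HEX_RE = re.compile(r"0[xX][0-9a-fA-F]+")      # "0" ("x"|"X") hexdigit+

def classify_integer(text):
    """Return the name of the rule matching the whole text, or None."""
    for rule, pattern in (("hexinteger", HEX_RE),
                          ("octinteger", OCT_RE),
                          ("decimalinteger", DECIMAL_RE)):
        if pattern.fullmatch(text):
            return rule
    return None

print(classify_integer("42"))     # decimalinteger
print(classify_integer("017"))    # octinteger
print(classify_integer("0x1F"))   # hexinteger
print(classify_integer("09"))     # None: 9 is not an octdigit
```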
Floating point numbers are described by the following regular expressions:
......@@ -260,16 +320,20 @@ The following tokens are operators:
+ - * / %
<< >> & | ^ ~
< = == > <= <> != >=
< == > <= <> != >=
The comparison operators \verb\<>\ and \verb\!=\ are alternate
spellings of the same operator.
The following tokens are delimiters:
The following tokens serve as delimiters or otherwise have a special
meaning:
( ) [ ] { }
; , : . `
; , : . ` =
The following printing ASCII characters are currently not used;
......@@ -281,35 +345,83 @@ their occurrence is an unconditional error:
\chapter{Execution model}
(XXX This chapter should explain the general model
of the execution of Python code and
the evaluation of expressions.
It should introduce objects, values, code blocks, scopes, name spaces,
name binding,
types, sequences, numbers, mappings,
exceptions, and other technical terms needed to make the following
chapters concise and exact.)
(XXX This chapter should explain the general model of the execution of
Python code and the evaluation of expressions. It should introduce
objects, values, code blocks, scopes, name spaces, name binding,
types, sequences, numbers, mappings, exceptions, and other technical
terms needed to make the following chapters concise and exact.)
\section{Objects, values and types}
I won't try to define rigorously here what an object is, but I'll give
some properties of objects that are important to know about.
Every object has an identity, a type and a value. An object's {\em
identity} never changes once it has been created; think of it as the
object's (permanent) address. An object's {\em type} determines the
operations that an object supports (e.g., can its length be taken?)
and also defines the ``meaning'' of the object's value; it also never
changes. The {\em value} of some objects can change; whether an
object's value can change is a property of its type.
Objects are never explicitly destroyed; however, when they become
unreachable they may be garbage-collected. An implementation is
allowed to delay garbage collection or omit it altogether -- it is a
matter of implementation quality how garbage collection is
implemented. (Implementation note: the current implementation uses a
reference-counting scheme which collects most objects as soon as they
become unreachable, but does not detect garbage containing circular
references.)
(Some objects contain references to ``external'' resources such as
open files. It is understood that these resources are freed when the
object is garbage-collected, but since garbage collection is not
guaranteed, such objects also provide an explicit way to release the
external resource (e.g., a \verb\close\ method), and programs are
advised to use it.)
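A small sketch of releasing an external resource explicitly rather than relying on garbage collection. As an assumption made so that the example needs no real file on disk, \verb\io.StringIO\ stands in for an open file.

```python
import io

# io.StringIO stands in for an open file here (an assumption for the sketch).
stream = io.StringIO("some data")
try:
    data = stream.read()
finally:
    # Release the resource explicitly instead of waiting for collection.
    stream.close()

print(data)            # some data
print(stream.closed)   # True
```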
Some objects contain references to other objects. These references
are part of the object's value; in most cases, when such a
``container'' object is compared to another (of the same type), the
comparison takes the {\em values} of the referenced objects into
account (not their identities).
Except for their identity, types affect almost every aspect of objects.
Even object identities are affected in some sense: for immutable
types, operations that compute new values may actually return a
reference to an existing object with the same type and value, while
for mutable objects this is not allowed. E.g., after
a = 1; b = 1; c = []; d = []
\verb\a\ and \verb\b\ may or may not refer to the same object, but
\verb\c\ and \verb\d\ are guaranteed to refer to two different, unique,
newly created lists.
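The difference between identity and value can be observed directly. The sketch below deliberately asserts nothing about whether \verb\a\ and \verb\b\ share an object, since that is left to the implementation.

```python
a = 1; b = 1; c = []; d = []

# For immutable values the implementation may or may not share one object,
# so nothing is claimed here about "a is b".
print(a == b)      # True: same value

# Two list displays always create two distinct objects.
print(c is d)      # False
print(c == d)      # True: equal values, different identities

c.append(1)
print(d)           # []: d is a separate object and is unaffected
```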
\section{Execution frames, name spaces, and scopes}
\chapter{Expressions and conditions}
(From now on, extended BNF notation will be used to describe
syntax, not lexical analysis.)
(XXX Explain the notation.)
From now on, extended BNF notation will be used to describe syntax,
not lexical analysis.
This chapter explains the meaning of the elements of expressions and
conditions. Conditions are a superset of expressions, and a condition
may be used where an expression is required by enclosing it in
parentheses. The only place where an unparenthesized condition
is not allowed is on the right-hand side of the assignment operator,
because this operator is the same token (\verb\=\) as used for comparisons.
The comma plays a somewhat special role in Python's syntax.
It is an operator with a lower precedence than all others, but
occasionally serves other purposes as well (e.g., it has special
semantics in print statements). When a comma is accepted by the
syntax, one of the syntactic categories \verb\expression_list\
or \verb\condition_list\ is always used.
parentheses. The only place where an unparenthesized condition is not
allowed is on the right-hand side of the assignment operator, because
this operator is the same token (\verb\=\) as used for comparisons.
The comma plays a somewhat special role in Python's syntax. It is an
operator with a lower precedence than all others, but occasionally
serves other purposes as well (e.g., it has special semantics in print
statements). When a comma is accepted by the syntax, one of the
syntactic categories \verb\expression_list\ or \verb\condition_list\
is always used.
When (one alternative of) a syntax rule has the form
......@@ -351,11 +463,11 @@ Syntax rules for atoms:
atom: identifier | literal | parenth_form | string_conversion
literal: stringliteral | integer | longinteger | floatnumber
parenth_form: enclosure | list_display | dict_display
enclosure: '(' [condition_list] ')'
list_display: '[' [condition_list] ']'
dict_display: '{' [key_datum (',' key_datum)* [',']] '}'
key_datum: condition ':' condition
string_conversion:'`' condition_list '`'
enclosure: "(" [condition_list] ")"
list_display: "[" [condition_list] "]"
dict_display: "{" [key_datum ("," key_datum)* [","]] "}"
key_datum: condition ":" condition
string_conversion:"`" condition_list "`"
\subsection{Identifiers (Names)}
......@@ -413,10 +525,9 @@ define the entries of the dictionary:
each key object is used as a key into the dictionary to store
the corresponding datum pair.
Key objects must be strings, otherwise a {\tt TypeError}
exception is raised.
Clashes between keys are not detected; the last datum stored for a given
key value prevails.
Keys must be strings, otherwise a {\tt TypeError} exception is raised.
Clashes between keys are not detected; the last datum (textually
rightmost in the display) stored for a given key value prevails.
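A short illustration of the clash rule, run under a current interpreter: when a key is repeated in a display, the textually rightmost datum is the one that is kept.

```python
d = {'spam': 1, 'eggs': 2, 'spam': 3}

print(d['spam'])   # 3: the rightmost datum for 'spam' prevails
print(len(d))      # 2: the clash is not reported as an error
```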
\subsection{String conversions}
......@@ -445,10 +556,10 @@ Their syntax is:
primary: atom | attributeref | call | subscription | slicing
attributeref: primary '.' identifier
call: primary '(' [condition_list] ')'
subscription: primary '[' condition ']'
slicing: primary '[' [condition] ':' [condition] ']'
attributeref: primary "." identifier
call: primary "(" [condition_list] ")"
subscription: primary "[" condition "]"
slicing: primary "[" [condition] ":" [condition] "]"
\subsection{Attribute references}
......@@ -465,7 +576,7 @@ Factors represent the unary numeric operators.
Their syntax is:
factor: primary | '-' factor | '+' factor | '~' factor
factor: primary | "-" factor | "+" factor | "~" factor
The unary \verb\-\ operator yields the negative of its numeric argument.
......@@ -483,7 +594,7 @@ a {\tt TypeError} exception is raised.
Terms represent the most tightly binding binary operators:
term: factor | term '*' factor | term '/' factor | term '%' factor
term: factor | term "*" factor | term "/" factor | term "%" factor
The \verb\*\ operator yields the product of its arguments.
......@@ -494,13 +605,13 @@ and then multiplied together.
In the latter case, string repetition is performed; a negative
repetition factor yields the empty string.
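A brief example of string repetition, including the negative case described above. The variable names are chosen for illustration only.

```python
repeated = 'ab' * 3
zero_times = 'ab' * 0
negative_times = 'ab' * -2

print(repeated)                # ababab
print(zero_times == '')        # True
print(negative_times == '')    # True: a negative factor yields the empty string
```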
The \verb|'/'| operator yields the quotient of its arguments.
The \verb|"/"| operator yields the quotient of its arguments.
The numeric arguments are first converted to a common type.
(Short or long) integer division yields an integer of the same type,
truncating towards zero.
Division by zero raises a {\tt RuntimeError} exception.
The \verb|'%'| operator yields the remainder from the division
The \verb|"%"| operator yields the remainder from the division
of the first argument by the second.
The numeric arguments are first converted to a common type.
The outcome of $x \% y$ is defined as $x - y*trunc(x/y)$.
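The definition $x - y*trunc(x/y)$ can be written out as a helper function. The name \verb\trunc_rem\ is hypothetical. Note that the \verb\%\ operator of current Python versions rounds the quotient towards negative infinity instead, so the two disagree when the operands have different signs.

```python
import math

def trunc_rem(x, y):
    """Remainder as defined in the text: x - y*trunc(x/y)."""
    return x - y * math.trunc(x / y)

print(trunc_rem(7, 3))     # 1
print(trunc_rem(-7, 3))    # -1: the result takes the sign of x
print(-7 % 3)              # 2 under a current interpreter (floored quotient)
```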
......@@ -511,28 +622,28 @@ $3.14 \% 0.7$ equals $0.34$.
\section{Arithmetic expressions}
arith_expr: term | arith_expr '+' term | arith_expr '-' term
arith_expr: term | arith_expr "+" term | arith_expr "-" term
The \verb|'+'| operator yields the sum of its arguments.
The \verb|"+"| operator yields the sum of its arguments.
The arguments must either both be numbers, or both strings.
In the former case, the numbers are converted to a common type
and then added together.
In the latter case, the strings are concatenated directly,
without inserting a space.
The \verb|'-'| operator yields the difference of its arguments.
The \verb|"-"| operator yields the difference of its arguments.
The numeric arguments are first converted to a common type.
\section{Shift expressions}
shift_expr: arith_expr | shift_expr '<<' arith_expr | shift_expr '>>' arith_expr
shift_expr: arith_expr | shift_expr "<<" arith_expr | shift_expr ">>" arith_expr
These operators accept short integers as arguments only.
They shift their left argument to the left or right by the number of bits
given by the right argument. Shifts are ``logical'', e.g., bits shifted
given by the right argument. Shifts are ``logical'', i.e., bits shifted
out on one end are lost, and bits shifted in are zero;
negative numbers are shifted as if they were unsigned in C.
Negative shift counts and shift counts greater than {\em or equal to}
......@@ -541,7 +652,7 @@ the word size yield undefined results.
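Since integers in current Python versions have no fixed width, the ``logical'' shifts described here have to be emulated with a mask. The sketch below assumes a 32-bit word; \verb\WORD_BITS\, \verb\logical_shift_left\ and \verb\logical_shift_right\ are hypothetical names, and results are returned as unsigned values.

```python
WORD_BITS = 32                    # assumed word size
MASK = (1 << WORD_BITS) - 1

def _check_count(n):
    # Counts that are negative or >= the word size are undefined in the text.
    if not 0 <= n < WORD_BITS:
        raise ValueError("shift count outside the defined range")

def logical_shift_right(x, n):
    """Shift right, treating x as an unsigned word; zeros are shifted in."""
    _check_count(n)
    return (x & MASK) >> n

def logical_shift_left(x, n):
    """Shift left; bits shifted out of the word are lost."""
    _check_count(n)
    return (x << n) & MASK

print(logical_shift_right(-1, 28))   # 15
print(logical_shift_left(3, 31))     # 2147483648
```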
\section{Bitwise AND expressions}
and_expr: shift_expr | and_expr '&' shift_expr
and_expr: shift_expr | and_expr "&" shift_expr
This operator yields the bitwise AND of its arguments,
......@@ -550,7 +661,7 @@ which must be short integers.
\section{Bitwise XOR expressions}
xor_expr: and_expr | xor_expr '^' and_expr
xor_expr: and_expr | xor_expr "^" and_expr
This operator yields the bitwise exclusive OR of its arguments,
......@@ -559,7 +670,7 @@ which must be short integers.
\section{Bitwise OR expressions}
or_expr: xor_expr | or_expr '|' xor_expr
or_expr: xor_expr | or_expr "|" xor_expr
This operator yields the bitwise OR of its arguments,
......@@ -569,7 +680,7 @@ which must be short integers.
expression: or_expr
expr_list: expression (',' expression)* [',']
expr_list: expression ("," expression)* [","]
An expression list containing at least one comma yields a new tuple.
......@@ -587,7 +698,7 @@ To create an empty tuple, use an empty pair of parentheses: \verb\()\.
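A short example of how the comma, rather than the parentheses, creates a tuple. The variable names are chosen for illustration only.

```python
t = 1, 2, 3         # an expression list with commas yields a tuple
single = 1,         # a trailing comma makes a one-element tuple
empty = ()          # the empty tuple needs an empty pair of parentheses
not_a_tuple = (1)   # parentheses alone do not create a tuple

print(t)                    # (1, 2, 3)
print(len(single))          # 1
print(len(empty))           # 0
print(not_a_tuple == 1)     # True
```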
comparison: expression (comp_operator expression)*
comp_operator: '<'|'>'|'='|'=='|'>='|'<='|'<>'|'!='|['not'] 'in'|'is' ['not']
comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in"
Comparisons yield integer values: 1 for true, 0 for false.
......@@ -605,12 +716,9 @@ $e_{n-1} op_n e_n$, except that each expression is evaluated at most once.
Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
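The ``evaluated at most once'' property of chained comparisons can be observed by recording each evaluation. The helper \verb\chain\ is hypothetical and exists only for this illustration. Current Python versions print \verb\True\ and \verb\False\ rather than 1 and 0.

```python
def chain(a, b, c):
    """Evaluate e0 < e1 > e2 and report which operands were evaluated."""
    calls = []

    def value(name, v):
        calls.append(name)   # record every evaluation
        return v

    result = value("e0", a) < value("e1", b) > value("e2", c)
    return result, calls

print(chain(1, 5, 3))   # (True, ['e0', 'e1', 'e2'])
print(chain(5, 1, 0))   # (False, ['e0', 'e1'])
```

In the first call the middle operand is evaluated once although it takes part in two comparisons; in the second, the first comparison already fails, so the last operand is not evaluated at all.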
For the benefit of C programmers,
the comparison operators \verb\=\ and \verb\==\ are equivalent,
and so are \verb\<>\ and \verb\!=\.
Use of the C variants is discouraged.
The forms \verb\<>\ and \verb\!=\ are equivalent.
The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "<>"} compare
the values of two objects. The objects needn't have the same type.
If both are numbers, they are converted to a common type.
Otherwise, objects of different types {\em always} compare unequal,
......@@ -652,9 +760,9 @@ $x {\tt is not} y$ yields the inverse truth value.
condition: or_test
or_test: and_test | or_test 'or' and_test
and_test: not_test | and_test 'and' not_test
not_test: comparison | 'not' not_test
or_test: and_test | or_test "or" and_test
and_test: not_test | and_test "and" not_test
not_test: comparison | "not" not_test
In the context of Boolean operators, and also when conditions are
......@@ -686,7 +794,7 @@ Several simple statements may occur on a single line separated
by semicolons. The syntax for simple statements is:
stmt_list: simple_stmt (';' simple_stmt)* [';']
stmt_list: simple_stmt (";" simple_stmt)* [";"]
simple_stmt: expression_stmt
| assignment
| pass_stmt
......@@ -697,6 +805,7 @@ simple_stmt: expression_stmt
| break_stmt
| continue_stmt
| import_stmt
| global_stmt
\section{Expression statements}
......@@ -718,9 +827,9 @@ do not cause any output.)
assignment: target_list ('=' target_list)* '=' expression_list
target_list: target (',' target)* [',']
target: identifier | '(' target_list ')' | '[' target_list ']'
assignment: target_list ("=" target_list)* "=" expression_list
target_list: target ("," target)* [","]
target: identifier | "(" target_list ")" | "[" target_list "]"
| attributeref | subscription | slicing
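The alternatives of the \verb\target\ rule can be illustrated with a few assignments. This is a sketch run under a current interpreter; the variable names are chosen for illustration only.

```python
a = b = 0                   # several target lists sharing one expression list
x, y = 1, 2                 # target list with two identifiers
(p, q), r = (1, 2), 3       # parenthesized (nested) target list
[u, v] = 4, 5               # bracketed target list

items = [0, 0, 0]
items[0] = 5                # subscription as a target
items[1:3] = [6, 7]         # slicing as a target

print(a, b)        # 0 0
print(p, q, r)     # 1 2 3
print(items)       # [5, 6, 7]
```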
......@@ -835,7 +944,7 @@ messages.)
\section{The {\tt pass} statement}
pass_stmt: 'pass'
pass_stmt: "pass"
{\tt pass} is a null operation -- when it is executed,
......@@ -844,7 +953,7 @@ nothing happens.
\section{The {\tt del} statement}
del_stmt: 'del' target_list
del_stmt: "del" target_list
Deletion is recursively defined similarly to assignment.
......@@ -866,7 +975,7 @@ right type (but even this is determined by the sliced object).
\section{The {\tt print} statement}
print_stmt: 'print' [ condition (',' condition)* [','] ]
print_stmt: "print" [ condition ("," condition)* [","] ]
{\tt print} evaluates each condition in turn and writes the resulting
......@@ -897,7 +1006,7 @@ standard output instead, but this is not safe, and should be fixed.)
\section{The {\tt return} statement}
return_stmt: 'return' [condition_list]
return_stmt: "return" [condition_list]
\verb\return\ may only occur syntactically nested in a function
......@@ -917,7 +1026,7 @@ before really leaving the function.
\section{The {\tt raise} statement}
raise_stmt: 'raise' condition [',' condition]
raise_stmt: "raise" condition ["," condition]
\verb\raise\ evaluates its first condition, which must yield
......@@ -930,7 +1039,7 @@ with the second one (or \verb\None\) as its parameter.
\section{The {\tt break} statement}
break_stmt: 'break'
break_stmt: "break"
\verb\break\ may only occur syntactically nested in a \verb\for\
......@@ -949,7 +1058,7 @@ before really leaving the loop.
\section{The {\tt continue} statement}
continue_stmt: 'continue'
continue_stmt: "continue"
\verb\continue\ may only occur syntactically nested in a \verb\for\
......@@ -962,9 +1071,17 @@ It continues with the next cycle of the nearest enclosing loop.
\section{The {\tt import} statement}
import_stmt: 'import' identifier (',' identifier)*
| 'from' identifier 'import' identifier (',' identifier)*
| 'from' identifier 'import' '*'
import_stmt: "import" identifier ("," identifier)*
| "from" identifier "import" identifier ("," identifier)*
| "from" identifier "import" "*"
(XXX To be done.)
\section{The {\tt global} statement}
global_stmt: "global" identifier ("," identifier)*
(XXX To be done.)
......@@ -982,48 +1099,49 @@ suite: statement | NEWLINE INDENT statement+ DEDENT
\section{The {\tt if} statement}
if_stmt: 'if' condition ':' suite
('elif' condition ':' suite)*
['else' ':' suite]
if_stmt: "if" condition ":" suite
("elif" condition ":" suite)*
["else" ":" suite]
\section{The {\tt while} statement}
while_stmt: 'while' condition ':' suite ['else' ':' suite]
while_stmt: "while" condition ":" suite ["else" ":" suite]
\section{The {\tt for} statement}
for_stmt: 'for' target_list 'in' condition_list ':' suite
['else' ':' suite]
for_stmt: "for" target_list "in" condition_list ":" suite
["else" ":" suite]
\section{The {\tt try} statement}
try_stmt: 'try' ':' suite
('except' condition [',' condition] ':' suite)*
['finally' ':' suite]
try_stmt: "try" ":" suite
("except" condition ["," condition] ":" suite)*
["finally" ":" suite]
\section{Function definitions}
funcdef: 'def' identifier '(' [parameter_list] ')' ':' suite
parameter_list: parameter (',' parameter)*
parameter: identifier | '(' parameter_list ')'
funcdef: "def" identifier "(" [parameter_list] ")" ":" suite
parameter_list: parameter ("," parameter)*
parameter: identifier | "(" parameter_list ")"
\section{Class definitions}
classdef: 'class' identifier '(' ')' [inheritance] ':' suite
inheritance: '=' identifier '(' ')' (',' identifier '(' ')')*
classdef: "class" identifier [inheritance] ":" suite
inheritance: "(" expression ("," expression)* ")"
XXX Syntax for scripts, modules
XXX Syntax for interactive input, eval, exec, input
XXX New definition of expressions (as conditions)
......@@ -60,6 +60,69 @@ informal introduction to the language, see the {\em Python Tutorial}.
This reference manual describes the Python programming language.
It is not intended as a tutorial.
While I am trying to be as precise as possible, I chose to use English
rather than formal specifications for everything except syntax and
lexical analysis. This should make the document better understandable
to the average reader, but will leave room for ambiguities.
Consequently, if you were coming from Mars and tried to re-implement
Python from this document alone, you might in fact be implementing
quite a different language. On the other hand, if you are using
Python and wonder what the precise rules about a particular area of
the language are, you should be able to find it here.
It is dangerous to add too many implementation details to a language
reference document -- the implementation may change, and other
implementations of the same language may work differently. On the
other hand, there is currently only one Python implementation, and
particular quirks of it are sometimes worth mentioning, especially
where it differs from the ``ideal'' specification.
Every Python implementation comes with a number of built-in and
standard modules. These are not documented here, but in the separate
{\em Python Library Reference} document. A few built-in modules are
mentioned when they interact in a significant way with the language
The descriptions of lexical analysis and syntax use a modified BNF
grammar notation. This uses the following style of definition:
name: lcletter (lcletter | "_")*
lcletter: "a"..."z"
The first line says that a \verb\name\ is a \verb\lcletter\ followed by
a sequence of zero or more \verb\lcletter\s and underscores. A
\verb\lcletter\ in turn is any of the single characters `a' through `z'.
(This rule is actually adhered to for the names defined in syntax and
grammar rules in this document.)
Each rule begins with a name (which is the name defined by the rule)
followed by a colon. Each rule is wholly contained on one line. A
vertical bar (\verb\|\) is used to separate alternatives, it is the
least binding operator in this notation. A star (\verb\*\) means zero
or more repetitions of the preceding item; likewise, a plus (\verb\+\)
means one or more repetitions and a question mark (\verb\?\) zero or
one (in other words, the preceding item is optional). These three
operators bind as tight as possible; parentheses are used for
grouping. Literal strings are enclosed in double quotes. White space
is only meaningful to separate tokens.
In lexical definitions (as the example above), two more conventions
are used: Two literal characters separated by three dots mean a choice
of any single character in the given (inclusive) range of ASCII
characters. A phrase between angular brackets (\verb\<...>\) gives an
informal description of the symbol defined; e.g., this could be used
to describe the notion of `control character' if needed.
Although the notation used is almost the same, there is a big
difference between the meaning of lexical and syntactic definitions:
a lexical definition operates on the individual characters of the
input source, while a syntax definition operates on the stream of
tokens generated by the lexical analysis.
\chapter{Lexical analysis}
A Python program is read by a {\em parser}. Input to the parser is a
......@@ -130,11 +193,6 @@ Spaces and tabs are not tokens, but serve to delimit tokens. Where
ambiguity exists, a token comprises the longest possible string that
forms a legal token, when read from left to right.
Tokens are described using an extended regular expression notation.
This is similar to the extended BNF notation used later, except that
the notation \verb\<...>\ is used to give an informal description of a
character, and that spaces and tabs are not to be ignored.
Identifiers are described by the following regular expressions:
......@@ -142,9 +200,9 @@ Identifiers are described by the following regular expressions:
identifier: (letter|"_") (letter|digit|"_")*
letter: lowercase | uppercase
lowercase: "a"|"b"|...|"z"
uppercase: "A"|"B"|...|"Z"
digit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
lowercase: "a"..."z"
uppercase: "A"..."Z"
digit: "0"..."9"
Identifiers are unlimited in length. Case is significant.
......@@ -156,13 +214,14 @@ keywords} of the language, and may not be used as ordinary
identifiers. They must be spelled exactly as written here:
and del for is raise
break elif from not return
class else if or try
continue except import pass while
def finally in print
and del for in print
break elif from is raise
class else global not return
continue except if or try
def finally import pass while
% # This Python program sorts and formats the above table
% import string
% l = []
% try:
......@@ -185,8 +244,8 @@ String literals are described by the following regular expressions:
stringliteral: "'" stringitem* "'"
stringitem: stringchar | escapeseq
stringchar: <any character except newline or "\" or "'">
escapeseq: "'" <any character except newline>
stringchar: <any ASCII character except newline or "\" or "'">
escapeseq: "'" <any ASCII character except newline>
String literals cannot span physical line boundaries. Escape
......@@ -208,7 +267,7 @@ are:
\verb/\t/ & ASCII Horizontal Tab (TAB) \\
\verb/\v/ & ASCII Vertical Tab (VT) \\
\verb/\/{\em ooo} & ASCII character with octal value {\em ooo} \\
\verb/\x/{em xx...} & ASCII character with hex value {\em xx} \\
\verb/\x/{em xx...} & ASCII character with hex value {\em xx...} \\
......@@ -221,9 +280,10 @@ are used...).
All unrecognized escape sequences are left in the string {\em
unchanged}, i.e., the backslash is left in the string. (This rule is
useful when debugging: if an escape sequence is mistyped, the
resulting output is more easily recognized as broken. It also helps
somewhat for string literals used as regular expressions or otherwise
passed to other modules that do their own escape handling.)
resulting output is more easily recognized as broken. It also helps a
great deal for string literals used as regular expressions or
otherwise passed to other modules that do their own escape handling --
but you may end up quadrupling backslashes that must appear literally.)
\subsection{Numeric literals}
......@@ -239,9 +299,9 @@ decimalinteger: nonzerodigit digit* | "0"
octinteger: "0" octdigit+
hexinteger: "0" ("x"|"X") hexdigit+
nonzerodigit: "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"
octdigit: "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"
hexdigit: digit|"a"|"b"|"c"|"d"|"e"|"f"|"A"|"B"|"C"|"D"|"E"|"F"
nonzerodigit: "1"..."9"
octdigit: "0"..."7"
hexdigit: digit|"a"..."f"|"A"..."F"
Floating point numbers are described by the following regular expressions:
......@@ -260,16 +320,20 @@ The following tokens are operators:
+ - * / %
<< >> & | ^ ~
< = == > <= <> != >=
< == > <= <> != >=
The comparison operators \verb\<>\ and \verb\!=\ are alternate
spellings of the same operator.
The following tokens are delimiters:
The following tokens serve as delimiters or otherwise have a special
( ) [ ] { }
; , : . `
; , : . ` =
The following printing ASCII characters are currently not used;
......@@ -281,35 +345,83 @@ their occurrence is an unconditional error:
\chapter{Execution model}
(XXX This chapter should explain the general model
of the execution of Python code and
the evaluation of expressions.
It should introduce objects, values, code blocks, scopes, name spaces,
name binding,
types, sequences, numbers, mappings,
exceptions, and other technical terms needed to make the following
chapters concise and exact.)
(XXX This chapter should explain the general model of the execution of
Python code and the evaluation of expressions. It should introduce
objects, values, code blocks, scopes, name spaces, name binding,
types, sequences, numbers, mappings, exceptions, and other technical
terms needed to make the following chapters concise and exact.)
\section{Objects, values and types}
I won't try to define rigorously here what an object is, but I'll give
some properties of objects that are important to know about.
Every object has an identity, a type and a value. An object's {\em
identity} never changes once it has been created; think of it as the
object's (permanent) address. An object's {\em type} determines the
operations that an object supports (e.g., can its length be taken?)
and also defines the ``meaning'' of the object's value; it also never
changes. The {\em value} of some objects can change; whether an
object's value can change is a property of its type.
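The three properties can be observed directly. The following is only a sketch, assuming a present-day interpreter that provides the built-in functions \verb\id\ and \verb\type\ (neither is described in this document); the variable names are made up for illustration.

```python
# Sketch: identity and type stay fixed while the value of a mutable
# object changes. id() and type() are assumed to be available.
items = [1, 2]
identity_before = id(items)      # stands in for the object's "address"
type_before = type(items)

items.append(3)                  # changes the value in place

same_identity = id(items) == identity_before
same_type = type(items) is type_before
print(same_identity, same_type, items)
```

Only the value differs after the call to \verb\append\; the list is still the same object of the same type.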
Objects are never explicitly destroyed; however, when they become
unreachable they may be garbage-collected. An implementation,
however, is allowed to delay garbage collection or omit it altogether
-- it is a matter of implementation quality how garbage collection is
implemented. (Implementation note: the current implementation uses a
reference-counting scheme which collects most objects as soon as they
become unreachable, but does not detect garbage containing circular
references.)
(Some objects contain references to ``external'' resources such as
open files. It is understood that these resources are freed when the
object is garbage-collected, but since garbage collection is not
guaranteed such objects also provide an explicit way to release the
external resource (e.g., a \verb\close\ method) and programs are
recommended to use this.)
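A minimal sketch of this recommendation, assuming a present-day interpreter and its standard \verb\tempfile\ and \verb\os\ modules (used here only to obtain a scratch file): the file is closed explicitly instead of waiting for garbage collection.

```python
import os
import tempfile

# Create a scratch file so that the example is self-contained.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w")
try:
    f.write("hello\n")
finally:
    f.close()            # release the external resource explicitly

closed_after = f.closed  # the file object itself may live on
os.remove(path)
```

The file object remains reachable after \verb\close\ has been called, but the external resource it referred to has already been released.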
Some objects contain references to other objects. These references
are part of the object's value; in most cases, when such a
``container'' object is compared to another (of the same type), the
comparison takes the {\em values} of the referenced objects into
account (not their identities).
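For instance, two distinct lists holding distinct but equal inner lists compare equal. This is a sketch with made-up names, assuming the \verb\is\ operator behaves as in a present-day interpreter.

```python
inner_a = [1, 2]
inner_b = [1, 2]                      # equal value, different object
outer_a = [inner_a, "x"]
outer_b = [inner_b, "x"]

values_equal = outer_a == outer_b     # compares the referenced values
same_object = outer_a is outer_b      # compares identities
inner_same = inner_a is inner_b
print(values_equal, same_object, inner_same)
```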
Except for their identity, types affect almost any aspect of objects.
Even object identities are affected in some sense: for immutable
types, operations that compute new values may actually return a
reference to an existing object with the same type and value, while
for mutable objects this is not allowed. E.g., after
a = 1; b = 1; c = []; d = []
\verb\a\ and \verb\b\ may or may not refer to the same object, but
\verb\c\ and \verb\d\ are guaranteed to refer to two different, unique,
newly created lists.
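The guarantee for the two lists can be checked as follows (a sketch only; the names after the first line are made up). Whether \verb\a\ and \verb\b\ are the same object is deliberately not tested, since a program should not rely on it.

```python
a = 1; b = 1; c = []; d = []

lists_distinct = c is not d    # guaranteed: two separate lists
c.append(0)                    # changing c ...
d_unchanged = d == []          # ... leaves d untouched

ints_equal = a == b            # the values are equal in any case
# "a is b" may be true or false depending on the implementation.
print(lists_distinct, d_unchanged, ints_equal)
```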
\section{Execution frames, name spaces, and scopes}
\chapter{Expressions and conditions}
(From now on, extended BNF notation will be used to describe
syntax, not lexical analysis.)
(XXX Explain the notation.)
From now on, extended BNF notation will be used to describe syntax,
not lexical analysis.
This chapter explains the meaning of the elements of expressions and
conditions. Conditions are a superset of expressions, and a condition
may be used where an expression is required by enclosing it in
parentheses. The only place where an unparenthesized condition
is not allowed is on the right-hand side of the assignment operator,
because this operator is the same token (\verb\=\) as used for
The comma plays a somewhat special role in Python's syntax.
It is an operator with a lower precedence than all others, but
occasionally serves other purposes as well (e.g., it has special
semantics in print statements). When a comma is accepted by the
syntax, one of the syntactic categories \verb\expression_list\
or \verb\condition_list\ is always used.
parentheses. The only place where an unparenthesized condition is not
allowed is on the right-hand side of the assignment operator, because
this operator is the same token (\verb\=\) as used for comparisons.
The comma plays a somewhat special role in Python's syntax. It is an
operator with a lower precedence than all others, but occasionally
serves other purposes as well (e.g., it has special semantics in print
statements). When a comma is accepted by the syntax, one of the
syntactic categories \verb\expression_list\ or \verb\condition_list\
is always used.
When (one alternative of) a syntax rule has the form
......@@ -351,11 +463,11 @@ Syntax rules for atoms:
atom: identifier | literal | parenth_form | string_conversion
literal: stringliteral | integer | longinteger | floatnumber
parenth_form: enclosure | list_display | dict_display
enclosure: '(' [condition_list] ')'
list_display: '[' [condition_list] ']'
dict_display: '{' [key_datum (',' key_datum)* [',']] '}'
key_datum: condition ':' condition
string_conversion:'`' condition_list '`'
enclosure: "(" [condition_list] ")"
list_display: "[" [condition_list] "]"
dict_display: "{" [key_datum ("," key_datum)* [","]] "}"
key_datum: condition ":" condition
string_conversion:"`" condition_list "`"
\subsection{Identifiers (Names)}
......@@ -413,10 +525,9 @@ define the entries of the dictionary:
each key object is used as a key into the dictionary to store
the corresponding datum pair.
Key objects must be strings, otherwise a {\tt TypeError}
exception is raised.
Clashes between keys are not detected; the last datum stored for a given
key value prevails.
Keys must be strings, otherwise a {\tt TypeError} exception is raised.
Clashes between keys are not detected; the last datum (textually
rightmost in the display) stored for a given key value prevails.
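A short sketch of the clash rule, using string keys as required above; the key and variable names are made up. It assumes a present-day interpreter, which follows the same rule for repeated keys.

```python
# "spam" appears twice; the textually rightmost datum wins.
table = {"spam": 1, "eggs": 2, "spam": 3}

spam_value = table["spam"]
entry_count = len(table)
print(spam_value, entry_count)
```

No error is reported for the repeated key; the earlier datum is silently replaced.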
\subsection{String conversions}
......@@ -445,10 +556,10 @@ Their syntax is:
primary: atom | attributeref | call | subscription | slicing
attributeref: primary '.' identifier
call: primary '(' [condition_list] ')'
subscription: primary '[' condition ']'
slicing: primary '[' [condition] ':' [condition] ']'
attributeref: primary "." identifier
call: primary "(" [condition_list] ")"
subscription: primary "[" condition "]"
slicing: primary "[" [condition] ":" [condition] "]"
\subsection{Attribute references}
......@@ -465,7 +576,7 @@ Factors represent the unary numeric operators.
Their syntax is:
factor: primary | '-' factor | '+' factor | '~' factor
factor: primary | "-" factor | "+" factor | "~" factor
The unary \verb\-\ operator yields the negative of its numeric argument.
......@@ -483,7 +594,7 @@ a {\tt TypeError} exception is raised.
Terms represent the most tightly binding binary operators:
term: factor | term '*' factor | term '/' factor | term '%' factor
term: factor | term "*" factor | term "/" factor | term "%" factor
The \verb\*\ operator yields the product of its arguments.
......@@ -494,13 +605,13 @@ and then multiplied together.
In the latter case, string repetition is performed; a negative
repetition factor yields the empty string.
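String repetition can be sketched as follows; the variable names are made up.

```python
repeated = "ab" * 3      # three copies, concatenated
nothing = "ab" * 0       # zero copies
negative = "ab" * -2     # a negative factor also yields the empty string
print(repr(repeated), repr(nothing), repr(negative))
```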
The \verb|'/'| operator yields the quotient of its arguments.
The \verb|"/"| operator yields the quotient of its arguments.
The numeric arguments are first converted to a common type.
(Short or long) integer division yields an integer of the same type,
truncating towards zero.
Division by zero raises a {\tt RuntimeError} exception.
The \verb|'%'| operator yields the remainder from the division
The \verb|"%"| operator yields the remainder from the division
of the first argument by the second.
The numeric arguments are first converted to a common type.
The outcome of $x \% y$ is defined as $x - y*trunc(x/y)$.
......@@ -511,28 +622,28 @@ $3.14 \% 0.7$ equals $0.34$.
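The definition $x - y*trunc(x/y)$ can be written out directly. This is a sketch, assuming a present-day interpreter with the standard \verb\math\ module; \verb\trunc_remainder\ is a made-up helper, and it is only reliable for operands small enough that the floating point quotient is exact enough to truncate correctly. Note that present-day interpreters define the built-in \verb\%\ operator differently for negative operands, which the last lines show.

```python
import math

def trunc_remainder(x, y):
    """Remainder as defined in the text: x - y*trunc(x/y)."""
    return x - y * math.trunc(x / y)

float_case = trunc_remainder(3.14, 0.7)    # approximately 0.34
negative_case = trunc_remainder(-7, 2)     # sign follows the first argument

# For comparison only: the built-in operators of a present-day
# interpreter round the quotient towards minus infinity instead.
builtin_remainder = -7 % 2
builtin_quotient = -7 // 2
truncated_quotient = math.trunc(-7 / 2)
print(float_case, negative_case, builtin_remainder)
```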
\section{Arithmetic expressions}
arith_expr: term | arith_expr '+' term | arith_expr '-' term
arith_expr: term | arith_expr "+" term | arith_expr "-" term
The \verb|'+'| operator yields the sum of its arguments.
The \verb|"+"| operator yields the sum of its arguments.
The arguments must either both be numbers, or both strings.
In the former case, the numbers are converted to a common type
and then added together.
In the latter case, the strings are concatenated directly,
without inserting a space.
The \verb|'-'| operator yields the difference of its arguments.
The \verb|"-"| operator yields the difference of its arguments.
The numeric arguments are first converted to a common type.
\section{Shift expressions}
shift_expr: arith_expr | shift_expr '<<' arith_expr | shift_expr '>>' arith_expr
shift_expr: arith_expr | shift_expr "<<" arith_expr | shift_expr ">>" arith_expr
These operators accept short integers as arguments only.
They shift their left argument to the left or right by the number of bits
given by the right argument. Shifts are ``logical'', e.g., bits shifted
given by the right argument. Shifts are ``logical'', e.g., bits shifted
out on one end are lost, and bits shifted in are zero;
negative numbers are shifted as if they were unsigned in C.
Negative shift counts and shift counts greater than {\em or equal to}
......@@ -541,7 +652,7 @@ the word size yield undefined results.
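Logical shifts on a fixed word can be imitated with masking. The sketch below assumes a word size of 32 bits, which the text does not specify; \verb\WORD_BITS\ and both helper functions are made up for illustration. It returns the result as a non-negative bit pattern and rejects the shift counts for which the text leaves the result undefined.

```python
WORD_BITS = 32                       # assumed word size, not from the text
WORD_MASK = (1 << WORD_BITS) - 1

def check_count(count):
    # The text leaves these cases undefined; this sketch refuses them.
    if not 0 <= count < WORD_BITS:
        raise ValueError("shift count out of range for the assumed word size")

def logical_shift_right(value, count):
    """Shift right, treating the value as an unsigned word."""
    check_count(count)
    return (value & WORD_MASK) >> count

def logical_shift_left(value, count):
    """Shift left; bits shifted out of the word are lost."""
    check_count(count)
    return (value << count) & WORD_MASK

high_bits = logical_shift_right(-1, 28)   # zeros are shifted in from the left
lost_bit = logical_shift_left(3, 31)      # the upper 1 bit falls off the word
print(hex(high_bits), hex(lost_bit))
```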
\section{Bitwise AND expressions}
and_expr: shift_expr | and_expr '&' shift_expr
and_expr: shift_expr | and_expr "&" shift_expr
This operator yields the bitwise AND of its arguments,
......@@ -550,7 +661,7 @@ which must be short integers.
\section{Bitwise XOR expressions}
xor_expr: and_expr | xor_expr '^' and_expr
xor_expr: and_expr | xor_expr "^" and_expr
This operator yields the bitwise exclusive OR of its arguments,
......@@ -559,7 +670,7 @@ which must be short integers.
\section{Bitwise OR expressions}
or_expr: xor_expr | or_expr '|' xor_expr
or_expr: xor_expr | or_expr "|" xor_expr
This operator yields the bitwise OR of its arguments,
......@@ -569,7 +680,7 @@ which must be short integers.
expression: or_expression
expr_list: expression (',' expression)* [',']
expr_list: expression ("," expression)* [","]
An expression list containing at least one comma yields a new tuple.
......@@ -587,7 +698,7 @@ To create an empty tuple, use an empty pair of parentheses: \verb\()\.
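The role of the comma can be sketched as follows (made-up names; \verb\type\ is assumed to be available as in a present-day interpreter). It is the comma, not the parentheses, that creates a tuple, except in the empty case.

```python
pair = 1, 2            # a comma yields a tuple
single = 1,            # a trailing comma is enough
empty = ()             # the empty tuple needs parentheses
grouped = (1)          # parentheses alone do not make a tuple

print(pair, single, empty, grouped)
```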
comparison: expression (comp_operator expression)*
comp_operator: '<'|'>'|'='|'=='|'>='|'<='|'<>'|'!='|['not'] 'in'|'is' ['not']
comp_operator: "<"|">"|"=="|">="|"<="|"<>"|"!="|"is" ["not"]|["not"] "in"
Comparisons yield integer values: 1 for true, 0 for false.
......@@ -605,12 +716,9 @@ $e_{n-1} op_n e_n$, except that each expression is evaluated at most once.
Note that $e_0 op_1 e_1 op_2 e_2$ does not imply any kind of comparison
between $e_0$ and $e_2$, e.g., $x < y > z$ is perfectly legal.
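The following sketch traces how often each operand of a chained comparison is evaluated. It assumes a present-day interpreter, which stops evaluating as soon as one comparison fails; \verb\traced\ and \verb\calls\ are made-up names.

```python
calls = []

def traced(label, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(label)
    return value

# x < y > z: y is compared with both neighbours but evaluated only once,
# and x is never compared with z.
result = traced("x", 1) < traced("y", 5) > traced("z", 3)
calls_after_first = list(calls)

# Here the first comparison fails, so the third operand is not evaluated.
short = traced("a", 5) < traced("b", 1) > traced("c", 0)
print(result, short, calls)
```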
For the benefit of C programmers,
the comparison operators \verb\=\ and \verb\==\ are equivalent,
and so are \verb\<>\ and \verb\!=\.
Use of the C variants is discouraged.
The forms \verb\<>\ and \verb\!=\ are equivalent.
The operators {\tt '<', '>', '=', '>=', '<='}, and {\tt '<>'} compare
The operators {\tt "<", ">", "==", ">=", "<="}, and {\tt "<>"} compare
the values of two objects. The objects needn't have the same type.
If both are numbers, they are converted to a common type.
Otherwise, objects of different types {\em always} compare unequal,
......@@ -652,9 +760,9 @@ $x {\tt is not} y$ yields the inverse truth value.
condition: or_test
or_test: and_test | or_test 'or' and_test
and_test: not_test | and_test 'and' not_test
not_test: comparison | 'not' not_test
or_test: and_test | or_test "or" and_test
and_test: not_test | and_test "and" not_test
not_test: comparison | "not" not_test
In the context of Boolean operators, and also when conditions are
......@@ -686,7 +794,7 @@ Several simple statements may occur on a single line separated
by semicolons. The syntax for simple statements is:
stmt_list: simple_stmt (';' simple_stmt)* [';']
stmt_list: simple_stmt (";" simple_stmt)* [";"]
simple_stmt: expression_stmt
| assignment
| pass_stmt
......@@ -697,6 +805,7 @@ simple_stmt: expression_stmt
| break_stmt
| continue_stmt
| import_stmt
| global_stmt
\section{Expression statements}
......@@ -718,9 +827,9 @@ do not cause any output.)
assignment: target_list ('=' target_list)* '=' expression_list
target_list: target (',' target)* [',']
target: identifier | '(' target_list ')' | '[' target_list ']'
assignment: target_list ("=" target_list)* "=" expression_list
target_list: target ("," target)* [","]
target: identifier | "(" target_list ")" | "[" target_list "]"
| attributeref | subscription | slicing
......@@ -835,7 +944,7 @@ messages.)
\section{The {\tt pass} statement}
pass_stmt: 'pass'
pass_stmt: "pass"
{\tt pass} is a null operation -- when it is executed,
......@@ -844,7 +953,7 @@ nothing happens.
\section{The {\tt del} statement}
del_stmt: 'del' target_list
del_stmt: "del" target_list
Deletion is defined recursively, much as assignment is.
......@@ -866,7 +975,7 @@ right type (but even this is determined by the sliced object).
\section{The {\tt print} statement}
print_stmt: 'print' [ condition (',' condition)* [','] ]
print_stmt: "print" [ condition ("," condition)* [","] ]
{\tt print} evaluates each condition in turn and writes the resulting
......@@ -897,7 +1006,7 @@ standard output instead, but this is not safe, and should be fixed.)
\section{The {\tt return} statement}
return_stmt: 'return' [condition_list]
return_stmt: "return" [condition_list]
\verb\return\ may only occur syntactically nested in a function
......@@ -917,7 +1026,7 @@ before really leaving the function.
\section{The {\tt raise} statement}
raise_stmt: 'raise' condition [',' condition]
raise_stmt: "raise" condition ["," condition]
\verb\raise\ evaluates its first condition, which must yield
......@@ -930,7 +1039,7 @@ with the second one (or \verb\None\) as its parameter.
\section{The {\tt break} statement}
break_stmt: 'break'
break_stmt: "break"
\verb\break\ may only occur syntactically nested in a \verb\for\
......@@ -949,7 +1058,7 @@ before really leaving the loop.
\section{The {\tt continue} statement}
continue_stmt: 'continue'
continue_stmt: "continue"
\verb\continue\ may only occur syntactically nested in a \verb\for\
......@@ -962,9 +1071,17 @@ It continues with the next cycle of the nearest enclosing loop.
\section{The {\tt import} statement}
import_stmt: 'import' identifier (',' identifier)*
| 'from' identifier 'import' identifier (',' identifier)*
| 'from' identifier 'import' '*'
import_stmt: "import" identifier ("," identifier)*
| "from" identifier "import" identifier ("," identifier)*
| "from" identifier "import" "*"
(XXX To be done.)
\section{The {\tt global} statement}
global_stmt: "global" identifier ("," identifier)*
(XXX To be done.)
......@@ -982,48 +1099,49 @@ suite: statement | NEWLINE INDENT statement+ DEDENT
\section{The {\tt if} statement}
if_stmt: 'if' condition ':' suite
('elif' condition ':' suite)*
['else' ':' suite]
if_stmt: "if" condition ":" suite
("elif" condition ":" suite)*
["else" ":" suite]
\section{The {\tt while} statement}
while_stmt: 'while' condition ':' suite ['else' ':' suite]
while_stmt: "while" condition ":" suite ["else" ":" suite]
\section{The {\tt for} statement}
for_stmt: 'for' target_list 'in' condition_list ':' suite
['else' ':' suite]
for_stmt: "for" target_list "in" condition_list ":" suite
["else" ":" suite]
\section{The {\tt try} statement}
try_stmt: 'try' ':' suite
('except' condition [',' condition] ':' suite)*
['finally' ':' suite]
try_stmt: "try" ":" suite
("except" condition ["," condition] ":" suite)*
["finally" ":" suite]
\section{Function definitions}
funcdef: 'def' identifier '(' [parameter_list] ')' ':' suite
parameter_list: parameter (',' parameter)*
parameter: identifier | '(' parameter_list ')'
funcdef: "def" identifier "(" [parameter_list] ")" ":" suite
parameter_list: parameter ("," parameter)*
parameter: identifier | "(" parameter_list ")"
\section{Class definitions}
classdef: 'class' identifier '(' ')' [inheritance] ':' suite
inheritance: '=' identifier '(' ')' (',' identifier '(' ')')*
classdef: "class" identifier [inheritance] ":" suite
inheritance: "(" expression ("," expression)* ")"
XXX Syntax for scripts, modules
XXX Syntax for interactive input, eval, exec, input
XXX New definition of expressions (as conditions)