Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
C
cpython
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
Analytics
Analytics
Repository
Value Stream
Wiki
Wiki
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Commits
Issue Boards
Open sidebar
Kirill Smelkov
cpython
Commits
fc06a192
Commit
fc06a192
authored
Mar 12, 2019
by
Raymond Hettinger
Committed by
GitHub
Mar 12, 2019
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
bpo-35892: Fix mode() and add multimode() (#12089)
parent
3e936431
Changes
4
Show whitespace changes
Inline
Side-by-side
Showing
4 changed files
with
97 additions
and
48 deletions
+97
-48
Doc/library/statistics.rst
Doc/library/statistics.rst
+30
-8
Doc/whatsnew/3.8.rst
Doc/whatsnew/3.8.rst
+8
-0
Lib/statistics.py
Lib/statistics.py
+40
-32
Lib/test/test_statistics.py
Lib/test/test_statistics.py
+19
-8
No files found.
Doc/library/statistics.rst
View file @
fc06a192
...
@@ -37,7 +37,7 @@ Averages and measures of central location
...
@@ -37,7 +37,7 @@ Averages and measures of central location
These functions calculate an average or typical value from a population
These functions calculate an average or typical value from a population
or sample.
or sample.
======================= =============================================
======================= =============================================
==================
:func:`mean` Arithmetic mean ("average") of data.
:func:`mean` Arithmetic mean ("average") of data.
:func:`fmean` Fast, floating point arithmetic mean.
:func:`fmean` Fast, floating point arithmetic mean.
:func:`harmonic_mean` Harmonic mean of data.
:func:`harmonic_mean` Harmonic mean of data.
...
@@ -45,8 +45,9 @@ or sample.
...
@@ -45,8 +45,9 @@ or sample.
:func:`median_low` Low median of data.
:func:`median_low` Low median of data.
:func:`median_high` High median of data.
:func:`median_high` High median of data.
:func:`median_grouped` Median, or 50th percentile, of grouped data.
:func:`median_grouped` Median, or 50th percentile, of grouped data.
:func:`mode` Mode (most common value) of discrete data.
:func:`mode` Single mode (most common value) of discrete or nominal data.
======================= =============================================
:func:`multimode` List of modes (most common values) of discrete or nomimal data.
======================= ===============================================================
Measures of spread
Measures of spread
------------------
------------------
...
@@ -287,12 +288,12 @@ However, for reading convenience, most of the examples show sorted sequences.
...
@@ -287,12 +288,12 @@ However, for reading convenience, most of the examples show sorted sequences.
.. function:: mode(data)
.. function:: mode(data)
Return the
most common data point from discrete or nominal *data*. The mode
Return the
single most common data point from discrete or nominal *data*.
(when it exists) is the most typical value, and is a robust measure of
The mode (when it exists) is the most typical value and serves as a
central location.
measure of
central location.
If
*data* is empty, or if there is not exactly one most common value,
If
there are multiple modes, returns the first one encountered in the *data*.
:exc:`StatisticsError` is raised.
If *data* is empty,
:exc:`StatisticsError` is raised.
``mode`` assumes discrete data, and returns a single value. This is the
``mode`` assumes discrete data, and returns a single value. This is the
standard treatment of the mode as commonly taught in schools:
standard treatment of the mode as commonly taught in schools:
...
@@ -310,6 +311,27 @@ However, for reading convenience, most of the examples show sorted sequences.
...
@@ -310,6 +311,27 @@ However, for reading convenience, most of the examples show sorted sequences.
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
'red'
.. versionchanged:: 3.8
Now handles multimodal datasets by returning the first mode encountered.
Formerly, it raised :exc:`StatisticsError` when more than one mode was
found.
.. function:: multimode(data)
Return a list of the most frequently occurring values in the order they
were first encountered in the *data*. Will return more than one result if
there are multiple modes or an empty list if the *data* is empty:
.. doctest::
>>> multimode('aabbbbccddddeeffffgg')
['b', 'd', 'f']
>>> multimode('')
[]
.. versionadded:: 3.8
.. function:: pstdev(data, mu=None)
.. function:: pstdev(data, mu=None)
...
...
Doc/whatsnew/3.8.rst
View file @
fc06a192
...
@@ -282,6 +282,9 @@ Added :func:`statistics.fmean` as a faster, floating point variant of
...
@@ -282,6 +282,9 @@ Added :func:`statistics.fmean` as a faster, floating point variant of
:func:`statistics.mean()`. (Contributed by Raymond Hettinger and
:func:`statistics.mean()`. (Contributed by Raymond Hettinger and
Steven D'Aprano in :issue:`35904`.)
Steven D'Aprano in :issue:`35904`.)
Added :func:`statistics.multimode` that returns a list of the most
common values. (Contributed by Raymond Hettinger in :issue:`35892`.)
Added :class:`statistics.NormalDist`, a tool for creating
Added :class:`statistics.NormalDist`, a tool for creating
and manipulating normal distributions of a random variable.
and manipulating normal distributions of a random variable.
(Contributed by Raymond Hettinger in :issue:`36018`.)
(Contributed by Raymond Hettinger in :issue:`36018`.)
...
@@ -591,6 +594,11 @@ Changes in the Python API
...
@@ -591,6 +594,11 @@ Changes in the Python API
* The function :func:`platform.popen` has been removed, it was deprecated since
* The function :func:`platform.popen` has been removed, it was deprecated since
Python 3.3: use :func:`os.popen` instead.
Python 3.3: use :func:`os.popen` instead.
* The :func:`statistics.mode` function no longer raises an exception
when given multimodal data. Instead, it returns the first mode
encountered in the input data. (Contributed by Raymond Hettinger
in :issue:`35892`.)
* The :meth:`~tkinter.ttk.Treeview.selection` method of the
* The :meth:`~tkinter.ttk.Treeview.selection` method of the
:class:`tkinter.ttk.Treeview` class no longer takes arguments. Using it with
:class:`tkinter.ttk.Treeview` class no longer takes arguments. Using it with
arguments for changing the selection was deprecated in Python 3.6. Use
arguments for changing the selection was deprecated in Python 3.6. Use
...
...
Lib/statistics.py
View file @
fc06a192
...
@@ -17,6 +17,7 @@ median_low Low median of data.
...
@@ -17,6 +17,7 @@ median_low Low median of data.
median_high High median of data.
median_high High median of data.
median_grouped Median, or 50th percentile, of grouped data.
median_grouped Median, or 50th percentile, of grouped data.
mode Mode (most common value) of data.
mode Mode (most common value) of data.
multimode List of modes (most common values of data)
================== =============================================
================== =============================================
Calculate the arithmetic mean ("the average") of data:
Calculate the arithmetic mean ("the average") of data:
...
@@ -79,10 +80,9 @@ A single exception is defined: StatisticsError is a subclass of ValueError.
...
@@ -79,10 +80,9 @@ A single exception is defined: StatisticsError is a subclass of ValueError.
__all__
=
[
'StatisticsError'
,
'NormalDist'
,
__all__
=
[
'StatisticsError'
,
'NormalDist'
,
'pstdev'
,
'pvariance'
,
'stdev'
,
'variance'
,
'pstdev'
,
'pvariance'
,
'stdev'
,
'variance'
,
'median'
,
'median_low'
,
'median_high'
,
'median_grouped'
,
'median'
,
'median_low'
,
'median_high'
,
'median_grouped'
,
'mean'
,
'mode'
,
'harmonic_mean'
,
'fmean'
,
'mean'
,
'mode'
,
'
multimode'
,
'
harmonic_mean'
,
'fmean'
,
]
]
import
collections
import
math
import
math
import
numbers
import
numbers
import
random
import
random
...
@@ -92,8 +92,8 @@ from decimal import Decimal
...
@@ -92,8 +92,8 @@ from decimal import Decimal
from
itertools
import
groupby
from
itertools
import
groupby
from
bisect
import
bisect_left
,
bisect_right
from
bisect
import
bisect_left
,
bisect_right
from
math
import
hypot
,
sqrt
,
fabs
,
exp
,
erf
,
tau
,
log
,
fsum
from
math
import
hypot
,
sqrt
,
fabs
,
exp
,
erf
,
tau
,
log
,
fsum
from
operator
import
itemgetter
from
collections
import
Counter
# === Exceptions ===
# === Exceptions ===
...
@@ -249,20 +249,6 @@ def _convert(value, T):
...
@@ -249,20 +249,6 @@ def _convert(value, T):
raise
raise
def
_counts
(
data
):
# Generate a table of sorted (value, frequency) pairs.
table
=
collections
.
Counter
(
iter
(
data
)).
most_common
()
if
not
table
:
return
table
# Extract the values with the highest frequency.
maxfreq
=
table
[
0
][
1
]
for
i
in
range
(
1
,
len
(
table
)):
if
table
[
i
][
1
]
!=
maxfreq
:
table
=
table
[:
i
]
break
return
table
def
_find_lteq
(
a
,
x
):
def
_find_lteq
(
a
,
x
):
'Locate the leftmost value exactly equal to x'
'Locate the leftmost value exactly equal to x'
i
=
bisect_left
(
a
,
x
)
i
=
bisect_left
(
a
,
x
)
...
@@ -334,9 +320,9 @@ def fmean(data):
...
@@ -334,9 +320,9 @@ def fmean(data):
nonlocal
n
nonlocal
n
n
+=
1
n
+=
1
return
x
return
x
total
=
math
.
fsum
(
map
(
count
,
data
))
total
=
fsum
(
map
(
count
,
data
))
else
:
else
:
total
=
math
.
fsum
(
data
)
total
=
fsum
(
data
)
try
:
try
:
return
total
/
n
return
total
/
n
except
ZeroDivisionError
:
except
ZeroDivisionError
:
...
@@ -523,19 +509,38 @@ def mode(data):
...
@@ -523,19 +509,38 @@ def mode(data):
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'
'red'
If there is not exactly one most common value, ``mode`` will raise
If there are multiple modes, return the first one encountered.
StatisticsError.
>>> mode(['red', 'red', 'green', 'blue', 'blue'])
'red'
If *data* is empty, ``mode``, raises StatisticsError.
"""
"""
# Generate a table of sorted (value, frequency) pairs.
data
=
iter
(
data
)
table
=
_counts
(
data
)
try
:
if
len
(
table
)
==
1
:
return
Counter
(
data
).
most_common
(
1
)[
0
][
0
]
return
table
[
0
][
0
]
except
IndexError
:
elif
table
:
raise
StatisticsError
(
'no mode for empty data'
)
from
None
raise
StatisticsError
(
'no unique mode; found %d equally common values'
%
len
(
table
)
)
def
multimode
(
data
):
else
:
""" Return a list of the most frequently occurring values.
raise
StatisticsError
(
'no mode for empty data'
)
Will return more than one result if there are multiple modes
or an empty list if *data* is empty.
>>> multimode('aabbbbbbbbcc')
['b']
>>> multimode('aabbbbccddddeeffffgg')
['b', 'd', 'f']
>>> multimode('')
[]
"""
counts
=
Counter
(
iter
(
data
)).
most_common
()
maxcount
,
mode_items
=
next
(
groupby
(
counts
,
key
=
itemgetter
(
1
)),
(
0
,
[]))
return
list
(
map
(
itemgetter
(
0
),
mode_items
))
# === Measures of spread ===
# === Measures of spread ===
...
@@ -836,6 +841,7 @@ if __name__ == '__main__':
...
@@ -836,6 +841,7 @@ if __name__ == '__main__':
from
math
import
isclose
from
math
import
isclose
from
operator
import
add
,
sub
,
mul
,
truediv
from
operator
import
add
,
sub
,
mul
,
truediv
from
itertools
import
repeat
from
itertools
import
repeat
import
doctest
g1
=
NormalDist
(
10
,
20
)
g1
=
NormalDist
(
10
,
20
)
g2
=
NormalDist
(
-
5
,
25
)
g2
=
NormalDist
(
-
5
,
25
)
...
@@ -893,3 +899,5 @@ if __name__ == '__main__':
...
@@ -893,3 +899,5 @@ if __name__ == '__main__':
S
=
NormalDist
.
from_samples
([
x
-
y
for
x
,
y
in
zip
(
X
.
samples
(
n
),
S
=
NormalDist
.
from_samples
([
x
-
y
for
x
,
y
in
zip
(
X
.
samples
(
n
),
Y
.
samples
(
n
))])
Y
.
samples
(
n
))])
assert_close
(
X
-
Y
,
S
)
assert_close
(
X
-
Y
,
S
)
print
(
doctest
.
testmod
())
Lib/test/test_statistics.py
View file @
fc06a192
...
@@ -1769,7 +1769,7 @@ class TestMode(NumericTestCase, AverageMixin, UnivariateTypeMixin):
...
@@ -1769,7 +1769,7 @@ class TestMode(NumericTestCase, AverageMixin, UnivariateTypeMixin):
def
test_range_data
(
self
):
def
test_range_data
(
self
):
# Override test from UnivariateCommonMixin.
# Override test from UnivariateCommonMixin.
data
=
range
(
20
,
50
,
3
)
data
=
range
(
20
,
50
,
3
)
self
.
assert
Raises
(
statistics
.
StatisticsError
,
self
.
func
,
data
)
self
.
assert
Equal
(
self
.
func
(
data
),
20
)
def
test_nominal_data
(
self
):
def
test_nominal_data
(
self
):
# Test mode with nominal data.
# Test mode with nominal data.
...
@@ -1790,13 +1790,14 @@ class TestMode(NumericTestCase, AverageMixin, UnivariateTypeMixin):
...
@@ -1790,13 +1790,14 @@ class TestMode(NumericTestCase, AverageMixin, UnivariateTypeMixin):
# Test mode with bimodal data.
# Test mode with bimodal data.
data
=
[
1
,
1
,
2
,
2
,
2
,
2
,
3
,
4
,
5
,
6
,
6
,
6
,
6
,
7
,
8
,
9
,
9
]
data
=
[
1
,
1
,
2
,
2
,
2
,
2
,
3
,
4
,
5
,
6
,
6
,
6
,
6
,
7
,
8
,
9
,
9
]
assert
data
.
count
(
2
)
==
data
.
count
(
6
)
==
4
assert
data
.
count
(
2
)
==
data
.
count
(
6
)
==
4
#
Check for an exception.
#
mode() should return 2, the first encounted mode
self
.
assert
Raises
(
statistics
.
StatisticsError
,
self
.
func
,
data
)
self
.
assert
Equal
(
self
.
func
(
data
),
2
)
def
test_unique_data
_failure
(
self
):
def
test_unique_data
(
self
):
# Test mode
exception
when data points are all unique.
# Test mode when data points are all unique.
data
=
list
(
range
(
10
))
data
=
list
(
range
(
10
))
self
.
assertRaises
(
statistics
.
StatisticsError
,
self
.
func
,
data
)
# mode() should return 0, the first encounted mode
self
.
assertEqual
(
self
.
func
(
data
),
0
)
def
test_none_data
(
self
):
def
test_none_data
(
self
):
# Test that mode raises TypeError if given None as data.
# Test that mode raises TypeError if given None as data.
...
@@ -1809,8 +1810,18 @@ class TestMode(NumericTestCase, AverageMixin, UnivariateTypeMixin):
...
@@ -1809,8 +1810,18 @@ class TestMode(NumericTestCase, AverageMixin, UnivariateTypeMixin):
# Test that a Counter is treated like any other iterable.
# Test that a Counter is treated like any other iterable.
data
=
collections
.
Counter
([
1
,
1
,
1
,
2
])
data
=
collections
.
Counter
([
1
,
1
,
1
,
2
])
# Since the keys of the counter are treated as data points, not the
# Since the keys of the counter are treated as data points, not the
# counts, this should raise.
# counts, this should return the first mode encountered, 1
self
.
assertRaises
(
statistics
.
StatisticsError
,
self
.
func
,
data
)
self
.
assertEqual
(
self
.
func
(
data
),
1
)
class
TestMultiMode
(
unittest
.
TestCase
):
def
test_basics
(
self
):
multimode
=
statistics
.
multimode
self
.
assertEqual
(
multimode
(
'aabbbbbbbbcc'
),
[
'b'
])
self
.
assertEqual
(
multimode
(
'aabbbbccddddeeffffgg'
),
[
'b'
,
'd'
,
'f'
])
self
.
assertEqual
(
multimode
(
''
),
[])
class
TestFMean
(
unittest
.
TestCase
):
class
TestFMean
(
unittest
.
TestCase
):
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment