cpython commit 1f58f4fa
authored Mar 06, 2019 by Raymond Hettinger
committed by Miss Islington (bot), Mar 06, 2019

Refine statistics.NormalDist documentation and improve test coverage (GH-12208)

parent 318d537d
Showing 2 changed files with 26 additions and 29 deletions:

  Doc/library/statistics.rst    +24  -28
  Lib/test/test_statistics.py    +2   -1

Doc/library/statistics.rst
@@ -479,7 +479,7 @@ measurements as a single entity.

 Normal distributions arise from the `Central Limit Theorem
 <https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
-of applications in statistics, including simulations and hypothesis testing.
+of applications in statistics.

 .. class:: NormalDist(mu=0.0, sigma=1.0)
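
The class signature shown in the context line above defaults to mu=0.0 and sigma=1.0, so a bare constructor call yields the standard normal distribution. A minimal sketch (not part of the patch):

    from statistics import NormalDist

    # Default construction per the signature above: the standard normal
    # distribution with mu=0.0 and sigma=1.0.
    Z = NormalDist()
    print(Z.mean, Z.stdev)   # 0.0 1.0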
@@ -492,19 +492,19 @@ of applications in statistics, including simulations and hypothesis testing.

    .. attribute:: mean

-      A read-only property representing the `arithmetic mean
+      A read-only property for the `arithmetic mean
       <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
       distribution.

    .. attribute:: stdev

-      A read-only property representing the `standard deviation
+      A read-only property for the `standard deviation
       <https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
       distribution.

    .. attribute:: variance

-      A read-only property representing the `variance
+      A read-only property for the `variance
       <https://en.wikipedia.org/wiki/Variance>`_ of a normal
       distribution.  Equal to the square of the standard deviation.
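
A quick illustration of the three read-only properties this hunk rewords, using arbitrary values (not from the patch):

    from statistics import NormalDist

    # mean, stdev and variance are read-only properties; variance is the
    # square of the standard deviation.
    dist = NormalDist(mu=10.0, sigma=2.0)
    print(dist.mean, dist.stdev, dist.variance)   # 10.0 2.0 4.0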

@@ -584,8 +584,8 @@ of applications in statistics, including simulations and hypothesis testing.

    Dividing a constant by an instance of :class:`NormalDist` is not supported.

    Since normal distributions arise from additive effects of independent
-   variables, it is possible to `add and subtract two normally distributed
-   random variables
+   variables, it is possible to `add and subtract two independent normally
+   distributed random variables
    <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
    represented as instances of :class:`NormalDist`.  For example:
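
A small sketch of the rule this hunk rewords, with made-up numbers: adding two independent NormalDist instances adds the means and combines the standard deviations in quadrature (3 and 4 combine to 5 here).

    from statistics import NormalDist

    # Illustrative values only, not the documentation's example.
    X = NormalDist(mu=10.0, sigma=3.0)
    Y = NormalDist(mu=5.0, sigma=4.0)
    total = X + Y
    print(total.mean, total.stdev)   # 15.0 5.0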

@@ -607,15 +607,15 @@ of applications in statistics, including simulations and hypothesis testing.

    For example, given `historical data for SAT exams
    <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
-   are normally distributed with a mean of 1060 and standard deviation of 192,
+   are normally distributed with a mean of 1060 and a standard deviation of 192,
    determine the percentage of students with scores between 1100 and 1200:

    .. doctest::

       >>> sat = NormalDist(1060, 195)
-      >>> fraction = sat.cdf(1200) - sat.cdf(1100)
+      >>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
       >>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
-      '18.2% score between 1100 and 1200'
+      '18.4% score between 1100 and 1200'

    What percentage of men and women will have the same height in `two normally
    distributed populations with known means and standard deviations
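
The corrected doctest can be rerun outside the documentation. Since SAT scores are reported as integers, evaluating the cdf half a point beyond each endpoint (a continuity correction) is what shifts the result from 18.2% to 18.4%:

    from statistics import NormalDist

    # Same numbers as the doctest above.
    sat = NormalDist(1060, 195)
    fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
    print(f'{fraction * 100 :.1f}% score between 1100 and 1200')   # 18.4%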

@@ -644,20 +644,12 @@ model:

 Normal distributions commonly arise in machine learning problems.

-Wikipedia has a `nice example with a Naive Bayesian Classifier
+Wikipedia has a `nice example of a Naive Bayesian Classifier
 <https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_.  The challenge
-is to guess a person's gender from measurements of normally distributed
-features including height, weight, and foot size.
-
-The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of
-being male or female is 50%:
-
-.. doctest::
-
-    >>> prior_male = 0.5
-    >>> prior_female = 0.5
-
-We also have a training dataset with measurements for eight people.  These
+is to predict a person's gender from measurements of normally distributed features
+including height, weight, and foot size.
+
+We're given a training dataset with measurements for eight people.  The
 measurements are assumed to be normally distributed, so we summarize the data
 with :class:`NormalDist`:
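
The "summarize the data with :class:`NormalDist`" step referred to in the new text amounts to fitting each feature with NormalDist.from_samples(), which estimates mu and sigma from raw measurements. A sketch with hypothetical height samples (not the documentation's training set):

    from statistics import NormalDist

    # Hypothetical sample values used only to show the fitting step.
    height_male = NormalDist.from_samples([6.0, 5.9, 5.6, 5.9])
    height_female = NormalDist.from_samples([5.0, 5.5, 5.4, 5.8])
    print(round(height_male.mean, 2), round(height_male.stdev, 2))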

@@ -670,8 +662,8 @@ with :class:`NormalDist`:

     >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
     >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

-We observe a new person whose feature measurements are known but whose gender
-is unknown:
+Next, we encounter a new person whose feature measurements are known but whose
+gender is unknown:

 .. doctest::
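
Using the foot-size distributions shown as context in this hunk, and the new person's foot size from the next hunk's context, pdf() gives the per-feature likelihoods the classifier multiplies together:

    from statistics import NormalDist

    # Values taken from the context lines of the diff.
    foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
    foot_size_female = NormalDist.from_samples([6, 8, 7, 9])
    fs = 8   # foot size of the new person
    print(foot_size_male.pdf(fs), foot_size_female.pdf(fs))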

@@ -679,19 +671,23 @@ is unknown:

     >>> wt = 130     # weight
     >>> fs = 8       # foot size

-The posterior is the product of the prior times each likelihood of a
-feature measurement given the gender:
+Starting with a 50% `prior probability
+<https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
+we compute the posterior as the prior times the product of likelihoods for the
+feature measurements given the gender:

 .. doctest::

+    >>> prior_male = 0.5
+    >>> prior_female = 0.5
     >>> posterior_male = (prior_male * height_male.pdf(ht) *
     ...                   weight_male.pdf(wt) * foot_size_male.pdf(fs))
     >>> posterior_female = (prior_female * height_female.pdf(ht) *
     ...                     weight_female.pdf(wt) * foot_size_female.pdf(fs))

-The final prediction is awarded to the largest posterior -- this is known as
-the `maximum a posteriori
+The final prediction goes to the largest posterior.  This is known as the
+`maximum a posteriori
 <https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:

 .. doctest::
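
A self-contained sketch of the MAP decision the new text describes, using hypothetical single-feature training data and the 50% priors that now live inside the doctest; the documentation's own example compares its two posteriors the same way:

    from statistics import NormalDist

    # Hypothetical one-feature classifier: the class with the larger
    # posterior (prior times likelihood) wins.
    height_male = NormalDist.from_samples([6.0, 5.9, 5.6, 5.9])
    height_female = NormalDist.from_samples([5.0, 5.5, 5.4, 5.8])
    ht = 6.0
    prior_male = prior_female = 0.5
    posterior_male = prior_male * height_male.pdf(ht)
    posterior_female = prior_female * height_female.pdf(ht)
    print('male' if posterior_male > posterior_female else 'female')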

Lib/test/test_statistics.py
@@ -2123,6 +2123,7 @@ class TestNormalDist(unittest.TestCase):
                 0.3605, 0.3589, 0.3572, 0.3555, 0.3538,
             ]):
             self.assertAlmostEqual(Z.pdf(x / 100.0), px, places=4)
+            self.assertAlmostEqual(Z.pdf(-x / 100.0), px, places=4)
         # Error case: variance is zero
         Y = NormalDist(100, 0)
         with self.assertRaises(statistics.StatisticsError):
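
The added assertion exercises the symmetry of the standard normal pdf; the same property can be checked directly in a few lines:

    from statistics import NormalDist

    # The standard normal pdf is an even function: pdf(-x) == pdf(x).
    Z = NormalDist()
    for x in (0.25, 0.5, 1.0, 2.0):
        assert abs(Z.pdf(x) - Z.pdf(-x)) < 1e-12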

@@ -2262,7 +2263,7 @@ class TestNormalDist(unittest.TestCase):
         self.assertEqual(X * y, NormalDist(1000, 150))        # __mul__
         self.assertEqual(y * X, NormalDist(1000, 150))        # __rmul__
         self.assertEqual(X / y, NormalDist(10, 1.5))          # __truediv__
-        with self.assertRaises(TypeError):
+        with self.assertRaises(TypeError):                    # __rtruediv__
             y / X

     def test_equality(self):
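
The annotated assertions cover scaling by a constant: multiplying or dividing a NormalDist by a constant rescales both mu and sigma, while dividing a constant by a NormalDist is left unsupported, which is what the TypeError check verifies. A short sketch mirroring those assertions:

    from statistics import NormalDist

    X = NormalDist(100.0, 15.0)
    print((X * 10).mean, (X * 10).stdev)   # 1000.0 150.0
    print((X / 10).mean, (X / 10).stdev)   # 10.0 1.5
    try:
        10 / X                             # no __rtruediv__, as the test expects
    except TypeError:
        print('constant / NormalDist raises TypeError')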