Commit 1f58f4fa authored by Raymond Hettinger, committed by Miss Islington (bot)

Refine statistics.NormalDist documentation and improve test coverage (GH-12208)

parent 318d537d
...@@ -479,7 +479,7 @@ measurements as a single entity. ...@@ -479,7 +479,7 @@ measurements as a single entity.
Normal distributions arise from the `Central Limit Theorem Normal distributions arise from the `Central Limit Theorem
<https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range <https://en.wikipedia.org/wiki/Central_limit_theorem>`_ and have a wide range
of applications in statistics, including simulations and hypothesis testing. of applications in statistics.
.. class:: NormalDist(mu=0.0, sigma=1.0) .. class:: NormalDist(mu=0.0, sigma=1.0)
...@@ -492,19 +492,19 @@ of applications in statistics, including simulations and hypothesis testing. ...@@ -492,19 +492,19 @@ of applications in statistics, including simulations and hypothesis testing.
.. attribute:: mean .. attribute:: mean
A read-only property representing the `arithmetic mean A read-only property for the `arithmetic mean
<https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal <https://en.wikipedia.org/wiki/Arithmetic_mean>`_ of a normal
distribution. distribution.
.. attribute:: stdev .. attribute:: stdev
A read-only property representing the `standard deviation A read-only property for the `standard deviation
<https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal <https://en.wikipedia.org/wiki/Standard_deviation>`_ of a normal
distribution. distribution.
.. attribute:: variance .. attribute:: variance
A read-only property representing the `variance A read-only property for the `variance
<https://en.wikipedia.org/wiki/Variance>`_ of a normal <https://en.wikipedia.org/wiki/Variance>`_ of a normal
distribution. Equal to the square of the standard deviation. distribution. Equal to the square of the standard deviation.
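As a minimal sketch of the three read-only properties described in this hunk (the parameters 100 and 15 are arbitrary and not taken from the commit):

```python
from statistics import NormalDist

# Arbitrary illustrative parameters; not taken from the commit.
dist = NormalDist(mu=100, sigma=15)

print(dist.mean)      # arithmetic mean
print(dist.stdev)     # standard deviation
print(dist.variance)  # square of the standard deviation

# The properties are read-only, so assignment is rejected.
try:
    dist.mean = 0
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print("mean is read-only:", assignment_failed)
```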
...@@ -584,8 +584,8 @@ of applications in statistics, including simulations and hypothesis testing. ...@@ -584,8 +584,8 @@ of applications in statistics, including simulations and hypothesis testing.
Dividing a constant by an instance of :class:`NormalDist` is not supported. Dividing a constant by an instance of :class:`NormalDist` is not supported.
Since normal distributions arise from additive effects of independent Since normal distributions arise from additive effects of independent
variables, it is possible to `add and subtract two normally distributed variables, it is possible to `add and subtract two independent normally
random variables distributed random variables
<https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_ <https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables>`_
represented as instances of :class:`NormalDist`. For example: represented as instances of :class:`NormalDist`. For example:
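A hedged sketch of that arithmetic, using made-up parameters rather than the documentation's own example, which this hunk does not show. It assumes the two variables are independent, as the revised wording requires:

```python
from statistics import NormalDist

# Hypothetical independent variables; sigmas of 4 and 3 keep the result exact.
a = NormalDist(mu=3, sigma=4)
b = NormalDist(mu=1, sigma=3)

total = a + b        # means add; variances add, so sigma = sqrt(4**2 + 3**2)
difference = a - b   # means subtract; variances still add

print(total)
print(difference)
```

Note that the standard deviations do not add directly: it is the variances (16 and 9) that sum, for both the sum and the difference.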
...@@ -607,15 +607,15 @@ of applications in statistics, including simulations and hypothesis testing. ...@@ -607,15 +607,15 @@ of applications in statistics, including simulations and hypothesis testing.
For example, given `historical data for SAT exams For example, given `historical data for SAT exams
<https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores <https://blog.prepscholar.com/sat-standard-deviation>`_ showing that scores
are normally distributed with a mean of 1060 and standard deviation of 192, are normally distributed with a mean of 1060 and a standard deviation of 192,
determine the percentage of students with scores between 1100 and 1200: determine the percentage of students with scores between 1100 and 1200:
.. doctest:: .. doctest::
>>> sat = NormalDist(1060, 195) >>> sat = NormalDist(1060, 195)
>>> fraction = sat.cdf(1200) - sat.cdf(1100) >>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)
>>> f'{fraction * 100 :.1f}% score between 1100 and 1200' >>> f'{fraction * 100 :.1f}% score between 1100 and 1200'
'18.2% score between 1100 and 1200' '18.4% score between 1100 and 1200'
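The changed line widens the interval by 0.5 on each side, which reads as a continuity correction for scores reported as whole numbers. The sketch below compares both versions using the doctest's parameters (mean 1060, sigma 195); the variable names `plain` and `corrected` are introduced here for illustration only:

```python
from statistics import NormalDist

sat = NormalDist(1060, 195)  # parameters as used in the doctest

# Old version: probability mass strictly between the two endpoints.
plain = sat.cdf(1200) - sat.cdf(1100)

# New version: extend each endpoint by half a unit so that scores
# rounding to 1100 or 1200 are counted as inside the range.
corrected = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5)

print(f'{plain * 100:.1f}% without the correction')
print(f'{corrected * 100:.1f}% with the correction')
```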
What percentage of men and women will have the same height in `two normally What percentage of men and women will have the same height in `two normally
distributed populations with known means and standard deviations distributed populations with known means and standard deviations
...@@ -644,20 +644,12 @@ model: ...@@ -644,20 +644,12 @@ model:
Normal distributions commonly arise in machine learning problems. Normal distributions commonly arise in machine learning problems.
Wikipedia has a `nice example with a Naive Bayesian Classifier Wikipedia has a `nice example of a Naive Bayesian Classifier
<https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge <https://en.wikipedia.org/wiki/Naive_Bayes_classifier>`_. The challenge is to
is to guess a person's gender from measurements of normally distributed predict a person's gender from measurements of normally distributed features
features including height, weight, and foot size. including height, weight, and foot size.
The `prior probability <https://en.wikipedia.org/wiki/Prior_probability>`_ of We're given a training dataset with measurements for eight people. The
being male or female is 50%:
.. doctest::
>>> prior_male = 0.5
>>> prior_female = 0.5
We also have a training dataset with measurements for eight people. These
measurements are assumed to be normally distributed, so we summarize the data measurements are assumed to be normally distributed, so we summarize the data
with :class:`NormalDist`: with :class:`NormalDist`:
...@@ -670,8 +662,8 @@ with :class:`NormalDist`: ...@@ -670,8 +662,8 @@ with :class:`NormalDist`:
>>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10]) >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
>>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9]) >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9])
We observe a new person whose feature measurements are known but whose gender Next, we encounter a new person whose feature measurements are known but whose
is unknown: gender is unknown:
.. doctest:: .. doctest::
...@@ -679,19 +671,23 @@ is unknown: ...@@ -679,19 +671,23 @@ is unknown:
>>> wt = 130 # weight >>> wt = 130 # weight
>>> fs = 8 # foot size >>> fs = 8 # foot size
The posterior is the product of the prior times each likelihood of a Starting with a 50% `prior probability
feature measurement given the gender: <https://en.wikipedia.org/wiki/Prior_probability>`_ of being male or female,
we compute the posterior as the prior times the product of likelihoods for the
feature measurements given the gender:
.. doctest:: .. doctest::
>>> prior_male = 0.5
>>> prior_female = 0.5
>>> posterior_male = (prior_male * height_male.pdf(ht) * >>> posterior_male = (prior_male * height_male.pdf(ht) *
... weight_male.pdf(wt) * foot_size_male.pdf(fs)) ... weight_male.pdf(wt) * foot_size_male.pdf(fs))
>>> posterior_female = (prior_female * height_female.pdf(ht) * >>> posterior_female = (prior_female * height_female.pdf(ht) *
... weight_female.pdf(wt) * foot_size_female.pdf(fs)) ... weight_female.pdf(wt) * foot_size_female.pdf(fs))
The final prediction is awarded to the largest posterior -- this is known as The final prediction goes to the largest posterior. This is known as the
the `maximum a posteriori `maximum a posteriori
<https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP: <https://en.wikipedia.org/wiki/Maximum_a_posteriori_estimation>`_ or MAP:
.. doctest:: .. doctest::
......
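The height and weight training data fall outside the hunks shown, so the following is only a reduced sketch of the MAP step. It assumes a single feature (foot size, using the samples and the measurement `fs = 8` that do appear in the diff) and the 50% priors. It is not the documentation's full three-feature classifier:

```python
from statistics import NormalDist

# Training samples for the one feature visible in the diff.
foot_size_male = NormalDist.from_samples([12, 11, 12, 10])
foot_size_female = NormalDist.from_samples([6, 8, 7, 9])

fs = 8               # foot size of the new person
prior_male = 0.5
prior_female = 0.5

# Posterior is proportional to prior times likelihood (one feature only here).
posterior_male = prior_male * foot_size_male.pdf(fs)
posterior_female = prior_female * foot_size_female.pdf(fs)

# MAP: pick the class with the larger posterior.
prediction = 'male' if posterior_male > posterior_female else 'female'
print(prediction)
```

The posteriors are left unnormalized because only their ordering matters for the MAP decision.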
...@@ -2123,6 +2123,7 @@ class TestNormalDist(unittest.TestCase): ...@@ -2123,6 +2123,7 @@ class TestNormalDist(unittest.TestCase):
0.3605, 0.3589, 0.3572, 0.3555, 0.3538, 0.3605, 0.3589, 0.3572, 0.3555, 0.3538,
]): ]):
self.assertAlmostEqual(Z.pdf(x / 100.0), px, places=4) self.assertAlmostEqual(Z.pdf(x / 100.0), px, places=4)
self.assertAlmostEqual(Z.pdf(-x / 100.0), px, places=4)
# Error case: variance is zero # Error case: variance is zero
Y = NormalDist(100, 0) Y = NormalDist(100, 0)
with self.assertRaises(statistics.StatisticsError): with self.assertRaises(statistics.StatisticsError):
...@@ -2262,7 +2263,7 @@ class TestNormalDist(unittest.TestCase): ...@@ -2262,7 +2263,7 @@ class TestNormalDist(unittest.TestCase):
self.assertEqual(X * y, NormalDist(1000, 150)) # __mul__ self.assertEqual(X * y, NormalDist(1000, 150)) # __mul__
self.assertEqual(y * X, NormalDist(1000, 150)) # __rmul__ self.assertEqual(y * X, NormalDist(1000, 150)) # __rmul__
self.assertEqual(X / y, NormalDist(10, 1.5)) # __truediv__ self.assertEqual(X / y, NormalDist(10, 1.5)) # __truediv__
with self.assertRaises(TypeError): with self.assertRaises(TypeError): # __rtruediv__
y / X y / X
def test_equality(self): def test_equality(self):
......
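The behaviours these test changes exercise can be sketched outside the test suite as follows. The values `X = NormalDist(100, 15)` and `y = 10` are assumptions inferred from the asserted results, since their definitions are not shown in the hunk:

```python
import math
from statistics import NormalDist

# Symmetry of the standard normal density: pdf(x) == pdf(-x).
Z = NormalDist()
symmetric = all(math.isclose(Z.pdf(x), Z.pdf(-x)) for x in (0.0, 0.5, 1.0, 2.5))

# Assumed operands, inferred from the expected values in the assertions.
X = NormalDist(100, 15)
y = 10

scaled = X * y     # __mul__ scales both mu and sigma
divided = X / y    # __truediv__ divides both mu and sigma

# Dividing a constant by a NormalDist is unsupported and raises TypeError.
try:
    y / X
    rtruediv_raises = False
except TypeError:
    rtruediv_raises = True

print(symmetric, scaled, divided, rtruediv_raises)
```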