Kirill Smelkov / cpython
Commit d8c93aa5, authored Sep 05, 2019 by Raymond Hettinger; committed by GitHub on Sep 05, 2019.
More refinements to the statistics docs (GH-15713)
Parent: 6b519985
Changes: 1 changed file (Doc/library/statistics.rst), with 33 additions and 27 deletions (+33, -27).
Doc/library/statistics.rst
@@ -19,17 +19,21 @@
 --------------
 This module provides functions for calculating mathematical statistics of
-numeric (:class:`Real`-valued) data.
+numeric (:class:`~numbers.Real`-valued) data.
-.. note::
-Unless explicitly noted otherwise, these functions support :class:`int`,
-:class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`.
-Behaviour with other types (whether in the numeric tower or not) is
-currently unsupported. Collections with a mix of types are also undefined
-and implementation-dependent. If your input data consists of mixed types,
-you may be able to use :func:`map` to ensure a consistent result, for
-example: ``map(float, input_data)``.
+The module is not intended to be a competitor to third-party libraries such
+as `NumPy <https://numpy.org>`_, `SciPy <https://www.scipy.org/>`_, or
+proprietary full-featured statistics packages aimed at professional
+statisticians such as Minitab, SAS and Matlab. It is aimed at the level of
+graphing and scientific calculators.
+Unless explicitly noted, these functions support :class:`int`,
+:class:`float`, :class:`~decimal.Decimal` and :class:`~fractions.Fraction`.
+Behaviour with other types (whether in the numeric tower or not) is
+currently unsupported. Collections with a mix of types are also undefined
+and implementation-dependent. If your input data consists of mixed types,
+you may be able to use :func:`map` to ensure a consistent result, for
+example: ``map(float, input_data)``.
 Averages and measures of central location
 -----------------------------------------
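The mixed-type advice in this hunk can be illustrated with a minimal sketch. The sample values are illustrative assumptions, not part of the commit; only ``map(float, input_data)`` comes from the documented text.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean

# Hypothetical mixed-type input. The documentation describes behaviour on
# such collections as undefined and implementation-dependent.
input_data = [1, Fraction(1, 2), Decimal("0.25")]

# Coerce every element to float first so the result type is predictable.
result = mean(map(float, input_data))
print(type(result).__name__, result)
```

Coercing to ``float`` trades exactness for consistency; if exact arithmetic matters, converting everything to ``Fraction`` instead would be a reasonable alternative.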
@@ -107,7 +111,7 @@ However, for reading convenience, most of the examples show sorted sequences.
 :func:`median` and :func:`mode`.
 The sample mean gives an unbiased estimate of the true population mean,
-which means that, taken on average over all the possible samples,
+so that when taken on average over all the possible samples,
 ``mean(sample)`` converges on the true mean of the entire population. If
 *data* represents the entire population rather than a sample, then
 ``mean(data)`` is equivalent to calculating the true population mean μ.
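The unbiasedness claim in this hunk can be checked on a toy case. The sketch below assumes a hypothetical four-element population and enumerates every sample of size two (drawn without replacement); averaging the sample means recovers the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population, chosen so every sample can be enumerated.
population = [1.0, 2.0, 3.0, 4.0]

# The mean of each possible sample of size 2.
sample_means = [mean(sample) for sample in combinations(population, 2)]

# Averaged over all possible samples, the sample mean matches the
# population mean.
average_of_sample_means = mean(sample_means)
population_mean = mean(population)
print(average_of_sample_means, population_mean)
```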
@@ -163,8 +167,16 @@ However, for reading convenience, most of the examples show sorted sequences.
 will be equivalent to ``3/(1/a + 1/b + 1/c)``.
 The harmonic mean is a type of average, a measure of the central
-location of the data. It is often appropriate when averaging quantities
-which are rates or ratios, for example speeds. For example:
+location of the data. It is often appropriate when averaging
+rates or ratios, for example speeds.
+Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr.
+What is the average speed?
+.. doctest::
+>>> harmonic_mean([40, 60])
+48.0
 Suppose an investor purchases an equal value of shares in each of
 three companies, with P/E (price/earning) ratios of 2.5, 3 and 10.
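To see why the harmonic mean answers the car question added in this hunk, the average speed can also be computed from first principles as total distance over total time. This is a sketch using the figures from the example; the variable names are illustrative.

```python
from statistics import harmonic_mean

distance_per_leg_km = 10
speeds_kmh = [40, 60]

# Average speed is total distance divided by total travel time.
total_distance = distance_per_leg_km * len(speeds_kmh)
total_time = sum(distance_per_leg_km / speed for speed in speeds_kmh)
average_speed = total_distance / total_time

# Both approaches agree because each leg covers the same distance.
print(average_speed, harmonic_mean(speeds_kmh))
```

The agreement depends on the legs having equal length; with unequal distances a weighted calculation would be needed.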
@@ -175,9 +187,6 @@ However, for reading convenience, most of the examples show sorted sequences.
 >>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio.
 3.6
-Using the arithmetic mean would give an average of about 5.167, which
-is well over the aggregate P/E ratio.
 :exc:`StatisticsError` is raised if *data* is empty, or any element
 is less than zero.
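The sentence removed in this hunk compared the arithmetic mean with the aggregate P/E ratio. A minimal sketch of that comparison follows, assuming a stake of one currency unit per company (an assumption introduced here for illustration).

```python
from statistics import harmonic_mean, mean

pe_ratios = [2.5, 3, 10]
investment_per_company = 1.0  # assumed equal stake in each company

# Aggregate P/E is total price paid divided by total earnings bought.
total_price = investment_per_company * len(pe_ratios)
total_earnings = sum(investment_per_company / pe for pe in pe_ratios)
aggregate_pe = total_price / total_earnings

harmonic = harmonic_mean(pe_ratios)   # matches the aggregate P/E
arithmetic = mean(pe_ratios)          # overstates it
print(aggregate_pe, harmonic, arithmetic)
```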
@@ -190,9 +199,9 @@ However, for reading convenience, most of the examples show sorted sequences.
 middle two" method. If *data* is empty, :exc:`StatisticsError` is raised.
 *data* can be a sequence or iterator.
-The median is a robust measure of central location, and is less affected by
-the presence of outliers in your data. When the number of data points is
-odd, the middle data point is returned:
+The median is a robust measure of central location and is less affected by
+the presence of outliers. When the number of data points is odd, the
+middle data point is returned:
 .. doctest::
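The robustness claim in this hunk can be illustrated by replacing one value with an outlier. The data below is hypothetical and chosen only to make the contrast visible.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # one extreme value replaces the 5

# The median stays at the middle data point in both cases.
median_clean = median(clean)
median_outlier = median(with_outlier)

# The mean is pulled strongly toward the outlier.
mean_clean = mean(clean)
mean_outlier = mean(with_outlier)
print(median_clean, median_outlier, mean_clean, mean_outlier)
```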
@@ -210,13 +219,10 @@ However, for reading convenience, most of the examples show sorted sequences.
 This is suited for when your data is discrete, and you don't mind that the
 median may not be an actual data point.
-If your data is ordinal (supports order operations) but not numeric (doesn't
-support addition), you should use :func:`median_low` or :func:`median_high`
+If the data is ordinal (supports order operations) but not numeric (doesn't
+support addition), consider using :func:`median_low` or :func:`median_high`
 instead.
-.. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped`
 .. function:: median_low(data)
 Return the low median of numeric data. If *data* is empty,
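The advice about ordinal data in this hunk can be sketched with strings, which support ordering but not averaging. The letter grades are a hypothetical example; the ``TypeError`` reflects how ``median`` averages the middle two values for an even-sized input.

```python
from statistics import median, median_high, median_low

# Hypothetical ordinal data: orderable, but the middle two values
# cannot be averaged.
grades = ["A", "B", "C", "D"]

low = median_low(grades)    # lower of the two middle values
high = median_high(grades)  # higher of the two middle values

# median() tries to average the middle pair, which fails for strings.
try:
    median(grades)
    median_failed = False
except TypeError:
    median_failed = True

print(low, high, median_failed)
```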
@@ -319,7 +325,7 @@ However, for reading convenience, most of the examples show sorted sequences.
 desired instead, use ``min(multimode(data))`` or ``max(multimode(data))``.
 If the input *data* is empty, :exc:`StatisticsError` is raised.
-``mode`` assumes discrete data, and returns a single value. This is the
+``mode`` assumes discrete data and returns a single value. This is the
 standard treatment of the mode as commonly taught in schools:
 .. doctest::
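The ``min(multimode(data))`` and ``max(multimode(data))`` idiom mentioned in the context lines can be sketched as follows. This assumes Python 3.8 or later, where ``multimode`` exists and ``mode`` returns the first mode encountered; the data is hypothetical.

```python
from statistics import mode, multimode

# Hypothetical data with two equally common values, 3 and 1.
data = [3, 1, 1, 3, 2]

first_mode = mode(data)        # single value: the first mode encountered
all_modes = multimode(data)    # every mode, in order of first appearance

# Pick the smallest or largest mode explicitly when that is what is wanted.
smallest_mode = min(all_modes)
largest_mode = max(all_modes)
print(first_mode, all_modes, smallest_mode, largest_mode)
```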
@@ -522,7 +528,7 @@ However, for reading convenience, most of the examples show sorted sequences.
 cut-point will evaluate to ``104``.
 The *method* for computing quantiles can be varied depending on
-whether the data in *data* includes or excludes the lowest and
+whether the *data* includes or excludes the lowest and
 highest possible values from the population.
 The default *method* is "exclusive" and is used for data sampled from
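The effect of the *method* argument described in this hunk can be seen by computing quartiles both ways on the same data. This is a sketch assuming Python 3.8 or later, where ``statistics.quantiles`` is available; the data is hypothetical.

```python
from statistics import quantiles

# Hypothetical data set used only to contrast the two methods.
data = [0, 25, 50, 75, 100]

# "exclusive" (the default) treats the data as a sample that may not
# contain the population's extreme values.
exclusive_quartiles = quantiles(data, n=4, method="exclusive")

# "inclusive" treats the minimum and maximum of the data as the
# population's lowest and highest possible values.
inclusive_quartiles = quantiles(data, n=4, method="inclusive")

print(exclusive_quartiles, inclusive_quartiles)
```

For this data the exclusive cut points lie further from the centre than the inclusive ones, since the exclusive method allows for population values beyond the observed extremes.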