authorTymoteusz Wołodźko <twolodzko@users.noreply.github.com>2021-04-25 11:45:09 (GMT)
committerGitHub <noreply@github.com>2021-04-25 11:45:09 (GMT)
commit09aa6f914dc313875ff18474770a0a7c13ea8dea (patch)
tree8f4ea916f3016fd3845b87705b1eb6f85c4fb190 /Doc/library
parent172c0f2752d8708b6dda7b42e6c5a3519420a4e8 (diff)
bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)
Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
Diffstat (limited to 'Doc/library')
-rw-r--r--Doc/library/statistics.rst103
1 files changed, 103 insertions, 0 deletions
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 695fb49..117d2b6 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -68,6 +68,17 @@ tends to deviate from the typical or average values.
:func:`variance` Sample variance of data.
======================= =============================================
+Statistics for relations between two inputs
+-------------------------------------------
+
+These functions calculate statistics regarding relations between two inputs.
+
+========================= =====================================================
+:func:`covariance` Sample covariance for two variables.
+:func:`correlation` Pearson's correlation coefficient for two variables.
+:func:`linear_regression` Intercept and slope for simple linear regression.
+========================= =====================================================
+
Function details
----------------
@@ -566,6 +577,98 @@ However, for reading convenience, most of the examples show sorted sequences.
.. versionadded:: 3.8
+.. function:: covariance(x, y, /)
+
+ Return the sample covariance of two inputs *x* and *y*. Covariance
+ is a measure of the joint variability of two inputs.
+
+ Both inputs must be of the same length (no less than two), otherwise
+ :exc:`StatisticsError` is raised.
+
+ Examples:
+
+ .. doctest::
+
+ >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+ >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
+ >>> covariance(x, y)
+ 0.75
+ >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
+ >>> covariance(x, z)
+ -7.5
+ >>> covariance(z, x)
+ -7.5
+
+ .. versionadded:: 3.10
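As an illustration of what the doctest above computes, the following is a minimal sketch of the usual sample covariance formula (sum of products of deviations from the means, divided by ``n - 1``). The helper name ``sample_covariance`` is hypothetical, and the sketch raises :exc:`ValueError` rather than :exc:`StatisticsError`; it is not the implementation added by this patch.

```python
from statistics import mean


def sample_covariance(x, y):
    """Hypothetical helper: sample covariance with an n - 1 denominator."""
    if len(x) != len(y):
        raise ValueError("both inputs must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")
    xbar, ybar = mean(x), mean(y)
    # Sum of products of deviations from the respective means.
    total = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return total / (n - 1)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(sample_covariance(x, y))
print(sample_covariance(x, z))
```

Because the formula is symmetric in its arguments, swapping the inputs gives the same value, which matches the ``covariance(x, z)`` and ``covariance(z, x)`` pair in the doctest.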
+
+.. function:: correlation(x, y, /)
+
+ Return the `Pearson's correlation coefficient
+ <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
+ for two inputs. Pearson's correlation coefficient *r* takes values
+ between -1 and +1. It measures the strength and direction of the linear
+ relationship, where +1 means a perfect positive linear relationship,
+ -1 a perfect negative linear relationship, and 0 no linear relationship.
+
+ Both inputs must be of the same length (no less than two), and neither
+ may be constant, otherwise :exc:`StatisticsError` is raised.
+
+ Examples:
+
+ .. doctest::
+
+ >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+ >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
+ >>> correlation(x, x)
+ 1.0
+ >>> correlation(x, y)
+ -1.0
+
+ .. versionadded:: 3.10
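The coefficient described above can be sketched directly from its textbook definition: the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The helper name ``pearson_r`` is hypothetical and the sketch raises :exc:`ValueError`; it only illustrates the formula and why constant inputs have to be rejected (the denominator would be zero).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Hypothetical helper: Pearson's r from sums of deviations."""
    if len(x) != len(y):
        raise ValueError("both inputs must have the same length")
    if len(x) < 2:
        raise ValueError("at least two data points are required")
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # A constant input has zero variance, so r is undefined.
        raise ValueError("at least one of the inputs is constant")
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(pearson_r(x, x))
print(pearson_r(x, y))
```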
+
+.. function:: linear_regression(regressor, dependent_variable)
+
+ Return the intercept and slope of a `simple linear regression
+ <https://en.wikipedia.org/wiki/Simple_linear_regression>`_
+ estimated using ordinary least squares. Simple linear
+ regression describes the relationship between the *regressor* and the
+ *dependent variable* in terms of a linear function:
+
+ *dependent_variable = intercept + slope \* regressor + noise*
+
+ where ``intercept`` and ``slope`` are the regression parameters that are
+ estimated, and the noise term is an unobserved random variable that accounts
+ for the variability of the data not explained by the linear regression
+ (it is equal to the difference between the predicted and the actual values
+ of the dependent variable).
+
+ Both inputs must be of the same length (no less than two), and the regressor
+ must not be constant, otherwise :exc:`StatisticsError` is raised.
+
+ For example, if we took the data on `release dates of the Monty
+ Python films <https://en.wikipedia.org/wiki/Monty_Python#Films>`_ and used
+ it to model the cumulative number of Monty Python films produced, we could
+ predict how many films they would have made by the year
+ 2019, assuming that they had kept up the pace.
+
+ .. doctest::
+
+ >>> year = [1971, 1975, 1979, 1982, 1983]
+ >>> films_total = [1, 2, 3, 4, 5]
+ >>> intercept, slope = linear_regression(year, films_total)
+ >>> round(intercept + slope * 2019)
+ 16
+
+ We could also use it to "predict" how many Monty Python films existed when
+ Brian Cohen was born.
+
+ .. doctest::
+
+ >>> round(intercept + slope * 1)
+ -610
+
+ .. versionadded:: 3.10
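The ordinary least squares estimates used in the example above have a simple closed form, sketched below: the slope is the ratio of the sum of cross-deviations to the sum of squared deviations of the regressor, and the intercept follows from the means. The helper name ``ols_fit`` is hypothetical, it raises :exc:`ValueError`, and it returns ``(intercept, slope)`` in the same order as the doctest in this section; it is an illustration under those assumptions, not the code added by this patch.

```python
from statistics import mean


def ols_fit(regressor, dependent_variable):
    """Hypothetical helper: closed-form OLS for one regressor.

    Returns (intercept, slope), matching the order used in this section.
    """
    if len(regressor) != len(dependent_variable):
        raise ValueError("both inputs must have the same length")
    if len(regressor) < 2:
        raise ValueError("at least two data points are required")
    xbar, ybar = mean(regressor), mean(dependent_variable)
    sxy = sum((x - xbar) * (y - ybar)
              for x, y in zip(regressor, dependent_variable))
    sxx = sum((x - xbar) ** 2 for x in regressor)
    if sxx == 0:
        # A constant regressor gives no information about the slope.
        raise ValueError("regressor is constant")
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


year = [1971, 1975, 1979, 1982, 1983]
films_total = [1, 2, 3, 4, 5]
intercept, slope = ols_fit(year, films_total)

# Residuals are the part of the data the fitted line does not explain
# (the "noise" term in the formula above), here actual minus predicted.
residuals = [y - (intercept + slope * x) for x, y in zip(year, films_total)]
print(round(intercept + slope * 2019))
print(residuals)
```

With an intercept in the model, the residuals sum to zero up to floating-point rounding, which is a quick sanity check on any fit of this kind.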
+
Exceptions
----------