author | Tymoteusz Wołodźko <twolodzko@users.noreply.github.com> | 2021-04-25 11:45:09 (GMT) |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-04-25 11:45:09 (GMT) |
commit | 09aa6f914dc313875ff18474770a0a7c13ea8dea (patch) | |
tree | 8f4ea916f3016fd3845b87705b1eb6f85c4fb190 /Doc/library | |
parent | 172c0f2752d8708b6dda7b42e6c5a3519420a4e8 (diff) | |
bpo-38490: statistics: Add covariance, Pearson's correlation, and simple linear regression (#16813)
Co-authored-by: Tymoteusz Wołodźko <twolodzko+gitkraken@gmail.com>
Diffstat (limited to 'Doc/library')
-rw-r--r-- | Doc/library/statistics.rst | 103 |
1 files changed, 103 insertions, 0 deletions
diff --git a/Doc/library/statistics.rst b/Doc/library/statistics.rst
index 695fb49..117d2b6 100644
--- a/Doc/library/statistics.rst
+++ b/Doc/library/statistics.rst
@@ -68,6 +68,17 @@ tends to deviate from the typical or average values.
 :func:`variance`         Sample variance of data.
 =======================  =============================================
 
+Statistics for relations between two inputs
+-------------------------------------------
+
+These functions calculate statistics regarding relations between two inputs.
+
+=========================  =====================================================
+:func:`covariance`         Sample covariance for two variables.
+:func:`correlation`        Pearson's correlation coefficient for two variables.
+:func:`linear_regression`  Intercept and slope for simple linear regression.
+=========================  =====================================================
+
 Function details
 ----------------
 
@@ -566,6 +577,98 @@ However, for reading convenience, most of the examples show sorted sequences.
 
    .. versionadded:: 3.8
 
+.. function:: covariance(x, y, /)
+
+   Return the sample covariance of two inputs *x* and *y*. Covariance
+   is a measure of the joint variability of two inputs.
+
+   Both inputs must be of the same length (no less than two), otherwise
+   :exc:`StatisticsError` is raised.
+
+   Examples:
+
+   .. doctest::
+
+      >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+      >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
+      >>> covariance(x, y)
+      0.75
+      >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
+      >>> covariance(x, z)
+      -7.5
+      >>> covariance(z, x)
+      -7.5
+
+   .. versionadded:: 3.10
+
+.. function:: correlation(x, y, /)
+
+   Return the `Pearson's correlation coefficient
+   <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
+   for two inputs. Pearson's correlation coefficient *r* takes values
+   between -1 and +1. It measures the strength and direction of the linear
+   relationship, where +1 means very strong, positive linear relationship,
+   -1 very strong, negative linear relationship, and 0 no linear relationship.
+
+   Both inputs must be of the same length (no less than two), and need
+   not to be constant, otherwise :exc:`StatisticsError` is raised.
+
+   Examples:
+
+   .. doctest::
+
+      >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
+      >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
+      >>> correlation(x, x)
+      1.0
+      >>> correlation(x, y)
+      -1.0
+
+   .. versionadded:: 3.10
+
+.. function:: linear_regression(regressor, dependent_variable)
+
+   Return the intercept and slope of `simple linear regression
+   <https://en.wikipedia.org/wiki/Simple_linear_regression>`_
+   parameters estimated using ordinary least squares. Simple linear
+   regression describes relationship between *regressor* and
+   *dependent variable* in terms of linear function:
+
+      *dependent_variable = intercept + slope \* regressor + noise*
+
+   where ``intercept`` and ``slope`` are the regression parameters that are
+   estimated, and noise term is an unobserved random variable, for the
+   variability of the data that was not explained by the linear regression
+   (it is equal to the difference between prediction and the actual values
+   of dependent variable).
+
+   Both inputs must be of the same length (no less than two), and regressor
+   needs not to be constant, otherwise :exc:`StatisticsError` is raised.
+
+   For example, if we took the data on the data on `release dates of the Monty
+   Python films <https://en.wikipedia.org/wiki/Monty_Python#Films>`_, and used
+   it to predict the cumulative number of Monty Python films produced, we could
+   predict what would be the number of films they could have made till year
+   2019, assuming that they kept the pace.
+
+   .. doctest::
+
+      >>> year = [1971, 1975, 1979, 1982, 1983]
+      >>> films_total = [1, 2, 3, 4, 5]
+      >>> intercept, slope = linear_regression(year, films_total)
+      >>> round(intercept + slope * 2019)
+      16
+
+   We could also use it to "predict" how many Monty Python films existed when
+   Brian Cohen was born.
+
+   .. doctest::
+
+      >>> round(intercept + slope * 1)
+      -610
+
+   .. versionadded:: 3.10
+
 Exceptions
 ----------
 
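
The doctest values in this patch can be reproduced from the textbook definitions of the three statistics. The block below is a minimal pure-Python sketch of those definitions, not the code this commit adds to the ``statistics`` module. The helper names (``sample_covariance``, ``pearson_r``, ``ols_fit``) are hypothetical, and the length and constant-input checks described in the documentation are left out for brevity.

```python
from math import sqrt


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    mean_x, mean_y = _mean(x), _mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / (len(x) - 1)


def pearson_r(x, y):
    """Covariance scaled by both standard deviations (the n - 1 factors cancel)."""
    mean_x, mean_y = _mean(x), _mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_fit(regressor, dependent_variable):
    """Ordinary least squares for one regressor; returns (intercept, slope)."""
    mean_x, mean_y = _mean(regressor), _mean(dependent_variable)
    sxy = sum((a - mean_x) * (b - mean_y)
              for a, b in zip(regressor, dependent_variable))
    sxx = sum((a - mean_x) ** 2 for a in regressor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Data taken from the doctests in the patch.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
year = [1971, 1975, 1979, 1982, 1983]
films_total = [1, 2, 3, 4, 5]

print(sample_covariance(x, y))           # 0.75
print(sample_covariance(x, z))           # -7.5
print(pearson_r(x, x), pearson_r(x, z))  # 1.0 -1.0

intercept, slope = ols_fit(year, films_total)
print(round(intercept + slope * 2019))   # 16
print(round(intercept + slope * 1))      # -610
```

The sketch returns ``(intercept, slope)`` in the same order as the ``linear_regression`` signature documented in this patch, so the two prediction expressions match the doctests term for term.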