
What?

Although I don't know how the formula was derived, the practical part was very easy for me. I have verified one example (see code) using OpenOffice Calc (I learned that you can display the formula for the trend line as well as the coefficient of correlation - great).

Why?

• In recipe 578111 I print out the current error rate for different training sessions in mental arithmetic.
• Beyond that, I would like to be able to give information - approximately - about how much you have improved since you started training.

What has changed?

• Revision 2: Did not compile with Jython 2.5.3b1 because of unsupported exception syntax. Now it works without exceptions.
• Revision 3: Removed a test data row for the failure case that had been left in.
Python, 96 lines
```
"""
   @author Thomas Lehmann
   @file Main.py
   @brief correlation and regression analyse

   Referring to document at (german):
   http://www.faes.de/Basis/Basis-Statistik/Basis-Statistik-Korrelation-Re/basis-statistik-korrelation-re.html
"""
import sys
import math

EPSILON = 0.0000001

class SimpleLinearRegression:
    """ tool class as help for calculating a linear function """

    def __init__(self, data):
        """ initializes members with defaults """
        self.data = data  # list of (x,y) pairs
        self.a = 0        # "a" of y = a + b*x
        self.b = 0        # "b" of y = a + b*x
        self.r = 0        # coefficient of correlation

    def run(self):
        """ calculates the coefficient of correlation and
            the parameters for the linear function """
        sumX, sumY, sumXY, sumXX, sumYY = 0, 0, 0, 0, 0
        n = float(len(self.data))

        for x, y in self.data:
            sumX += x
            sumY += y
            sumXY += x * y
            sumXX += x * x
            sumYY += y * y

        denominator = math.sqrt((sumXX - 1/n * sumX**2) * (sumYY - 1/n * sumY**2))
        if denominator < EPSILON:
            return False

        # coefficient of correlation
        self.r = (sumXY - 1/n * sumX * sumY)
        self.r /= denominator

        # is there no relationship between 'x' and 'y'?
        if abs(self.r) < EPSILON:
            return False

        # calculating 'a' and 'b' of y = a + b*x
        self.b = sumXY - sumX * sumY / n
        self.b /= (sumXX - sumX**2 / n)

        self.a = sumY - self.b * sumX
        self.a /= n
        return True

    def function(self, x):
        """ linear function (be aware of the current
            coefficient of correlation) """
        return self.a + self.b * x

    def __repr__(self):
        """ current linear function for print """
        return "y = f(x) = %(a)f + %(b)f*x" % self.__dict__

def example():
    """ provides an example with error rates (one per session)
        @note linear function verified in open office calc """
    print("Simple linear regression v0.3 by Thomas Lehmann 2012")
    print("...Python %s" % sys.version.replace("\n", ""))
    data = [(1.0, 18.0), (2, 15.0), (3, 19.0), (4, 10.0)]
    print("...data is %s" % data)

    linRegr = SimpleLinearRegression(data)
    if not linRegr.run():
        print("...error: failed to calculate parameters")
        return

    print("...the coefficient of correlation r = %f (r**2 is %f)" % (linRegr.r, linRegr.r**2))
    print("...parameter a of y = f(x) = a + b*x is %f" % linRegr.a)
    print("...parameter b of y = f(x) = a + b*x is %f" % linRegr.b)
    print("...linear function is then %s" % linRegr)
    print("...forecast of next value: f(5) = %f" % linRegr.function(5))

    firstY = linRegr.function(1)
    lastY = linRegr.function(4)
    change = (lastY - firstY) / firstY * 100.0

    # keep in mind: reduction of the error rate (inverse valuation)!
    if change < 0:
        print("...the trend is about %.1f%% improvement" % -change)
    else:
        print("...the trend is about %.1f%% to the worse" % change)

if __name__ == "__main__":
    example()
```

Example

The value of 'x' is the nth session. Each session consists of tasks, and for each task you either give the right answer or a wrong one. From this you can calculate the error rate 'y'. Let's say you have a session with 10 tasks and you fail two of them; then your error rate is 2 * 100 / 10 = 20% (the error rates below are not taken from a real example):

```
C:\Python32\python.exe F:/Programming/Python/Projects/SimpleLinearRegression/Main.py
Simple linear regression v0.1 by Thomas Lehmann 2012
...Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
...data is [(1.0, 18.0), (2, 15.0), (3, 19.0), (4, 10.0)]
...the coefficient of correlation r = -0.638877 (r**2 is 0.408163)
...parameter a of y = f(x) = a + b*x is 20.500000
...parameter b of y = f(x) = a + b*x is -2.000000
...linear function is then y = f(x) = 20.500000 + -2.000000*x
...forecast of next value: f(5) = 10.500000
...the trend is about 32.4% improvement
```
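The error-rate computation described above can be written as a one-liner (the function name is an editorial choice, not part of the recipe):

```python
def error_rate(failures, tasks):
    """Error rate of one session, in percent."""
    return failures * 100.0 / tasks

print(error_rate(2, 10))  # 2 wrong answers out of 10 tasks -> 20.0
```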

s_h_a_i_o (10 years ago):

There seems to be an error in the line `self.r = (sumXY - 1/n * sumX * sumY)`. If I am correct, this is the covariance term E(XY) - E(X)*E(Y), which should read `self.r = (sumXY - 1/n * sumX * 1/n * sumY)`.

You may use `assert abs(self.r) < 1.000` to detect such errors.

s_h_a_i_o (10 years ago):

This also holds for `denominator = math.sqrt((sumXX - 1/n * sumX**2) * (sumYY - 1/n * sumY**2))`

=> `denominator = math.sqrt((sumXX - 1/n * 1/n * sumX**2) * (sumYY - 1/n * 1/n * sumY**2))`

but since you did not write the extra 1/n on top and bottom, your initial computation is correct!

s_h_a_i_o (10 years ago):

In any case, linear regression can be performed fast with ONE left matrix division, for any number of explanatory variables (not just 2 as in your code).

Thomas Lehmann:

Well, interesting - I was just referring to the formula from the link. Could you please give me another reference that shows your version? (I additionally used OpenOffice to verify the values.) I also found another page with the same formula (without a squared 'n'): http://math.hws.edu/javamath/ryan/Regression.html

And here: http://easycalculation.com/statistics/learn-regression.php - here the only difference seems to be that the formula has been extended on top and bottom by 'N', so that the division turns into a multiplication by 'N'. So it is the same formula as above.

But still no squaring of 'N'.

s_h_a_i_o (9 years, 11 months ago):

I used the formula regression beta = Var(X)^{-1} * Cov(X,Y), which can be computed as "Cov(X,Y) left-divided by Var(X)" and which holds for any number of explanatory variables X.

The formula holds when the model includes a constant term. In case it does not, the formula becomes E(X^2)^{-1} * E(X*Y).

Created by Thomas Lehmann on Sat, 12 May 2012 (MIT)
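For completeness, the matrix left-division approach suggested in the comments can be sketched with NumPy (numpy is an editorial assumption here; the original recipe uses only the standard library):

```python
import numpy as np

# sessions and error rates from the recipe's example
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([18.0, 15.0, 19.0, 10.0])

# design matrix with a constant column; additional explanatory
# variables would simply be additional columns here
X = np.column_stack([np.ones_like(x), x])

# one least-squares solve ("left division"): finds beta with X @ beta ~= y
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # beta[0] = intercept a, beta[1] = slope b
```

This reproduces the recipe's a = 20.5 and b = -2.0 and generalizes to any number of columns in X.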