Simple Linear Regression with Pure Python (Python recipe) by Chaobin Tang (唐超斌)
ActiveState Code (http://code.activestate.com/recipes/578914/)

Linear regression is a simple and useful way to predict values from a set of training data. The outcome of the regression is a best-fitting line, which by definition is the line that minimizes the sum of the squared errors. (When plotted on a two-dimensional coordinate system, the errors are the vertical distances between each actual Y and the predicted Y' on the line.) In machine learning, the line equation Y' = b*x + A is often solved with gradient descent, which approaches the solution step by step. There is also a statistical approach that solves the equation directly, without an iterative algorithm.
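To make the contrast concrete, here is a minimal sketch of the iterative route, assuming plain batch gradient descent on the mean squared error. The function name fit_gradient_descent, the learning rate, the step count, and the sample data are all illustrative assumptions, not part of the recipe.

```python
def fit_gradient_descent(X, Y, learning_rate=0.01, steps=20000):
    """Approximate b and A in Y' = b*x + A by batch gradient descent.

    Hypothetical helper for illustration; the defaults are assumptions
    that happen to converge on the small sample below.
    """
    b, A = 0.0, 0.0
    n = len(X)
    for _ in range(steps):
        # Error of each prediction: predicted Y' minus actual Y.
        errors = [(b * x + A) - y for x, y in zip(X, Y)]
        # Partial derivatives of the mean squared error w.r.t. b and A.
        grad_b = 2.0 / n * sum(e * x for e, x in zip(errors, X))
        grad_A = 2.0 / n * sum(errors)
        # Step against the gradient.
        b -= learning_rate * grad_b
        A -= learning_rate * grad_A
    return b, A


# Made-up sample data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
b, A = fit_gradient_descent(X, Y)
```

On this small data set the loop settles near b = 0.6 and A = 2.2, the same line the direct statistical solution gives. A learning rate that is too large for the data would make the loop diverge, which is one reason the direct solution is attractive for a single predictor.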

This recipe is a pure Python implementation of this statistical approach. It has no dependencies outside the standard library (only the math module).
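For reference, the direct solution can be written out as below. The notation is introduced here and does not appear in the original text: bars denote means, s_X and s_Y are sample standard deviations, r is the Pearson correlation coefficient, and n is the number of data points.

```latex
\begin{align}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
s_X &= \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}, \qquad
s_Y = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}} \\
r &= \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
          {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \\
b &= r\,\frac{s_Y}{s_X}, \qquad A = \bar{y} - b\,\bar{x}
\end{align}
```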

If you have pandas and numpy installed, you can check its results by uncommenting the assert lines (they assume "import numpy as np" and "from pandas import Series").

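import math


def fit(X, Y):
    """Fit a least-squares line to X and Y; return a function x -> b*x + A."""

    def mean(Xs):
        # float() keeps true division under Python 2 as well.
        return sum(Xs) / float(len(Xs))
    m_X = mean(X)
    m_Y = mean(Y)

    def std(Xs, m):
        # Sample standard deviation (n - 1 in the denominator).
        normalizer = len(Xs) - 1
        return math.sqrt(sum((pow(x - m, 2) for x in Xs)) / normalizer)
    # assert np.round(Series(X).std(), 6) == np.round(std(X, m_X), 6)

    def pearson_r(Xs, Ys):
        # Pearson correlation coefficient; uses the means m_X and m_Y
        # computed above, so it expects the same X and Y passed to fit().
        sum_xy = 0
        sum_sq_v_x = 0
        sum_sq_v_y = 0

        for (x, y) in zip(Xs, Ys):
            var_x = x - m_X  # deviation of x from its mean
            var_y = y - m_Y  # deviation of y from its mean
            sum_xy += var_x * var_y
            sum_sq_v_x += pow(var_x, 2)
            sum_sq_v_y += pow(var_y, 2)
        return sum_xy / math.sqrt(sum_sq_v_x * sum_sq_v_y)
    # assert np.round(Series(X).corr(Series(Y)), 6) == np.round(pearson_r(X, Y), 6)

    r = pearson_r(X, Y)

    # Slope b and intercept A of the best-fitting line.
    b = r * (std(Y, m_Y) / std(X, m_X))
    A = m_Y - b * m_X

    def line(x):
        return b * x + A
    return line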
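As a rough cross-check, the slope r * (std(Y) / std(X)) simplifies algebraically to the sum of cross-deviations divided by the sum of squared X deviations. The sketch below computes the slope both ways on made-up data; the name slope_intercept and the sample values are hypothetical, not part of the recipe.

```python
import math


def slope_intercept(X, Y):
    """Return (b, A) for the least-squares line Y' = b*x + A.

    Hypothetical helper: computes the slope by two equivalent routes
    and checks that they agree.
    """
    n = len(X)
    m_X = sum(X) / float(n)
    m_Y = sum(Y) / float(n)
    s_xy = sum((x - m_X) * (y - m_Y) for x, y in zip(X, Y))
    s_xx = sum((x - m_X) ** 2 for x in X)
    s_yy = sum((y - m_Y) ** 2 for y in Y)

    # Route 1: slope as r * (std_Y / std_X), with sample standard deviations.
    r = s_xy / math.sqrt(s_xx * s_yy)
    std_X = math.sqrt(s_xx / (n - 1))
    std_Y = math.sqrt(s_yy / (n - 1))
    b_from_r = r * (std_Y / std_X)

    # Route 2: the simplified form, cross-deviations over squared X deviations.
    b_short = s_xy / s_xx
    assert math.isclose(b_from_r, b_short)

    A = m_Y - b_short * m_X
    return b_short, A


# Made-up sample data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
b, A = slope_intercept(X, Y)
```

For this data the slope is 0.6 and the intercept 2.2, up to floating-point rounding. The function returned by fit(X, Y) evaluates the same line, so calling it with 6 should give roughly 5.8.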

      

Tags: algorithms, machine_learning, python, regression

Created by Chaobin Tang (唐超斌) on Thu, 31 Jul 2014 (MIT)

Licensed under the MIT License