Linear Regression - R tool vs Manual Calculations

Linear regression establishes a relationship between a dependent variable Y and one or more independent variables X using a best-fit line known as the regression line.
The equation of the regression line can then be used to predict the value of Y for any given X.

Regression analysis is an important statistical tool for establishing the relationship between two variables. The values of the predictor variable are gathered through experiments. The values of the response variable are then estimated from the predictor variable.

The general mathematical equation for linear regression is y = ax + b
Here,
y is the response variable
x is the predictor variable
a and b are constants called coefficients: a is the slope and b is the intercept.

The following images show the manual method of finding the Linear Regression equations for the set of points
X = (6, 2, 10, 4, 8)
Y = (9, 11, 5, 8, 7)
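
As a rough sketch of the manual method, the least-squares formulas can be written out step by step in R. The variable names (x_bar, y_bar, Sxy, Sxx, Syy, and the a_/b_ coefficients) are illustrative choices, not taken from the original working. The sketch assumes the standard formulas: slope = Sxy / Sxx for Y on X, and slope = Sxy / Syy for X on Y.

```r
# Data points from the example
X <- c(6, 2, 10, 4, 8)
Y <- c(9, 11, 5, 8, 7)

# Means of the two variables
x_bar <- mean(X)
y_bar <- mean(Y)

# Sum of cross-products and sums of squared deviations
Sxy <- sum((X - x_bar) * (Y - y_bar))
Sxx <- sum((X - x_bar)^2)
Syy <- sum((Y - y_bar)^2)

# Regression of Y on X: Y = b_yx * X + a_yx
b_yx <- Sxy / Sxx
a_yx <- y_bar - b_yx * x_bar

# Regression of X on Y: X = b_xy * Y + a_xy
b_xy <- Sxy / Syy
a_xy <- x_bar - b_xy * y_bar

print(c(slope = b_yx, intercept = a_yx))
print(c(slope = b_xy, intercept = a_xy))
```

For these points the sketch gives Y = 11.9 - 0.65 X for Y on X, and X = 16.4 - 1.3 Y for X on Y.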





Let's calculate the same coefficients and fit the straight line using R.

Step 1: Let's create two vectors named X and Y.
X <- c(6, 2, 10, 4, 8)
Y <- c(9, 11, 5, 8, 7)

Step 2: Combine the vectors into a data frame named Z.
Z <- data.frame(X, Y)

Step 3: Select the Y column from the data frame and name it x1, and similarly select the X column and name it y1. The roles are swapped on purpose so that X is regressed on Y.
y1 <- Z[, "X"]
x1 <- Z[, "Y"]

Step 4: Create a model using lm(), R's function for fitting linear models. Its syntax is lm(formula, data).
model1 <- lm(y1 ~ x1)
Here, the model is created for the regression equation of X on Y.

Output:
Call:
lm(formula = y1 ~ x1)

Coefficients:
(Intercept)           x1
       16.4         -1.3
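
If only the numbers are needed, the coefficients can also be pulled out of the fitted model with coef(). The sketch below repeats the earlier steps so that it runs on its own. The names cf, intercept1, and slope1 are illustrative.

```r
# Rebuild the data and the X-on-Y model from the steps above
X <- c(6, 2, 10, 4, 8)
Y <- c(9, 11, 5, 8, 7)
Z <- data.frame(X, Y)
y1 <- Z[, "X"]
x1 <- Z[, "Y"]
model1 <- lm(y1 ~ x1)

# coef() returns a named numeric vector with entries "(Intercept)" and "x1"
cf <- coef(model1)
intercept1 <- unname(cf["(Intercept)"])
slope1 <- unname(cf["x1"])

print(cf)
```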

Step 5: Plot the given points using the command plot().
plot(y1 ~ x1)

Step 6: Draw the fitted regression line on the plot using the command abline().
abline(model1, col="blue", lwd=3)

Results:

Repeat the same process, but this time build the model using the formula y2 ~ x2, where
y2 <- Z[, "Y"]
x2 <- Z[, "X"]

Solving this in R gives:
the intercept is 11.90
the coefficient of x2 is -0.65

These are the results for the regression equation of Y on X.
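
Once the Y on X model is fitted, it can be used to predict Y for a new value of X. The following is a minimal sketch, assuming the model is built directly from the data frame with lm(Y ~ X, data = Z) rather than from the separate y2 and x2 vectors, so that predict() can match the column name. The names model2 and new_points, and the new value X = 5, are illustrative.

```r
# Rebuild the data frame from the earlier steps
X <- c(6, 2, 10, 4, 8)
Y <- c(9, 11, 5, 8, 7)
Z <- data.frame(X, Y)

# Regression of Y on X, using column names from the data frame
model2 <- lm(Y ~ X, data = Z)

# New predictor values must be supplied in a data frame with a column named X
new_points <- data.frame(X = 5)
predicted <- predict(model2, newdata = new_points)

print(predicted)
```

With the coefficients above, the prediction for X = 5 is 11.9 - 0.65 * 5 = 8.65.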

Comparing the answers from the manual statistical method and from R, they are the same!
Solving the regression equations manually with the long statistical formulae is tedious and time-consuming.
With R, we can arrive at the final answer in just a few steps.
Thus, R is an easier and more effective way to carry out this kind of mathematical analysis.

Let's explore more using data science tools.
