Understanding Gradient Descent and Regression Techniques
Gradient Descent
Definition: An optimization algorithm used in machine learning to minimize the cost function of a model by iteratively adjusting its parameters in the opposite direction of the gradient.
How Does Gradient Descent Work?
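As a worked example of the update, assume the loss is the sum of squared residuals (SSR) of a straight line $\hat{y}_i = \beta_0 + \beta_1 x_i$. The symbols $\beta_0$ (intercept), $\beta_1$ (slope) and $\eta$ (learning rate) are introduced here for illustration. The gradient collects the two partial derivatives, and each parameter moves against its own derivative:

```latex
\begin{aligned}
\mathrm{SSR}(\beta_0, \beta_1) &= \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2 \\
\frac{\partial\,\mathrm{SSR}}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr) \\
\frac{\partial\,\mathrm{SSR}}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr) \\
\beta_j &\leftarrow \beta_j - \eta \, \frac{\partial\,\mathrm{SSR}}{\partial \beta_j}, \qquad j \in \{0, 1\}
\end{aligned}
```

Here $\eta \, \partial\mathrm{SSR}/\partial\beta_j$ is the step size (the slope of the loss times the learning rate), so the steps shrink automatically as the derivatives approach 0 near the minimum.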
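Gradient Descent Summary
The sum of the squared residuals is just one type of loss function; there are many other loss functions, but gradient descent works the same way for all of them. Specifically, the steps of the gradient descent algorithm are:
1. Take the derivative of the loss function with respect to each parameter in it (that is, take the gradient of the loss function).
2. Pick random values for the parameters.
3. Plug the parameter values into the derivatives.
4. Calculate the step sizes: step size = slope • learning rate.
5. Calculate the new parameters: new parameter = old parameter - step size.
6. Go back to step 3 and repeat until the step size is very small or a maximum number of steps is reached.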
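The gradient descent steps can be sketched in Python for a straight line, y = intercept + slope * x, with the sum of squared residuals as the loss. This is a minimal sketch, not a reference implementation: the function name `gradient_descent_line`, the random starting range, the default learning rate, and the stopping threshold are all illustrative choices.

```python
import random


def gradient_descent_line(xs, ys, learning_rate=0.01, max_steps=10_000,
                          min_step_size=1e-6, seed=0):
    """Fit y = intercept + slope * x by minimizing the sum of squared residuals."""
    # Step 1: the derivatives of the SSR with respect to the intercept and
    # the slope are written out inside the loop (d_intercept, d_slope).
    # Step 2: pick random starting values for the parameters.
    rng = random.Random(seed)
    intercept, slope = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(max_steps):
        # Step 3: plug the current parameter values into the derivatives.
        residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
        d_intercept = -2 * sum(residuals)
        d_slope = -2 * sum(x * r for x, r in zip(xs, residuals))
        # Step 4: step size = slope of the loss * learning rate.
        step_intercept = d_intercept * learning_rate
        step_slope = d_slope * learning_rate
        # Step 5: new parameter = old parameter - step size.
        intercept -= step_intercept
        slope -= step_slope
        # Step 6: stop once every step size is very small.
        if max(abs(step_intercept), abs(step_slope)) < min_step_size:
            break
    return intercept, slope


# Three made-up data points; the least-squares line through them has
# intercept ~0.949 and slope ~0.641.
print(gradient_descent_line([0.5, 2.3, 2.9], [1.4, 1.9, 3.2]))
```

With a learning rate that is much too large for this data (roughly above 0.06 here), the same loop overshoots and diverges instead of converging.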
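Stochastic Gradient Descent
Stochastic gradient descent (SGD) updates the model’s parameters using the gradient of one training example at a time. At each step it randomly selects a training example, computes the gradient of the cost function for that example, and updates the parameters in the opposite direction. Because each update is cheap, stochastic gradient descent is computationally efficient on large datasets and can converge faster than batch gradient descent. However, its updates are noisy: it tends to fluctuate around the minimum rather than settle exactly on it, and it may not converge to the global minimum.
Difference Between Gradient Descent and Stochastic Gradient Descent: Gradient descent uses all the data points to calculate the loss and its derivatives at every step, while stochastic gradient descent uses a randomly selected subset of the data (a single point, or a small mini-batch) to calculate them. Gradient descent can be computationally expensive and slow for large datasets, while stochastic gradient descent can be noisy and may not converge to the global minimum.
Ridge Regression
Ridge regression is a regularized version of linear regression. Instead of minimizing only the sum of the squared residuals, it adds a penalty on the size of the slope, which makes the best-fit line less sensitive to the particular training data (it accepts a little bias in exchange for lower variance). The parameters can be found with gradient descent or, for ridge, in closed form. For the fitted line $\hat{y} = \text{intercept} + \text{slope} \cdot \text{weight}$, ridge regression minimizes $\sum_i (y_i - \hat{y}_i)^2 + \lambda \cdot \text{slope}^2$, where $\lambda \ge 0$ sets the strength of the penalty ($\lambda = 0$ gives ordinary least squares). With several predictors, the penalty sums the squares of all the slopes; the intercept is not penalized.
Lasso Regression
Lasso regression is a regularization technique that also helps eliminate irrelevant parameters, so it performs feature selection while it regularizes the model. It adds an L1 penalty, minimizing $\sum_i (y_i - \hat{y}_i)^2 + \lambda \cdot |\text{slope}|$. Unlike the squared ridge penalty, the absolute-value penalty can shrink the coefficients of insignificant predictor variables all the way to zero.
Elastic Net Regression
Elastic net regression uses the penalties from both lasso and ridge to regularize the regression model. It combines the L1 penalty from lasso regression and the L2 penalty from ridge regression, minimizing $\sum_i (y_i - \hat{y}_i)^2 + \lambda_1 \cdot |\text{slope}| + \lambda_2 \cdot \text{slope}^2$.
Advantages: Lasso can eliminate many features, which reduces overfitting in a linear model. Ridge reduces the impact of features that are not important for predicting the y values by shrinking their coefficients toward, but not exactly to, zero. Elastic net combines feature elimination from lasso with coefficient shrinkage from ridge to improve the model’s predictions.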
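A minimal sketch of the stochastic variant, fitting a straight line y = intercept + slope * x by minimizing the sum of squared residuals. The name `sgd_line` and its parameters are hypothetical. Each update uses the gradient of one randomly chosen point's squared residual. The shrinking learning rate, `learning_rate / (1 + decay * t)`, is an assumption added for this sketch rather than something stated above; it is a common way to make the noisy updates settle near the minimum instead of bouncing around it.

```python
import random


def sgd_line(xs, ys, learning_rate=0.05, decay=0.01, steps=50_000, seed=0):
    """Fit y = intercept + slope * x by stochastic gradient descent on the SSR."""
    rng = random.Random(seed)
    intercept, slope = 0.0, 0.0
    for t in range(steps):
        # Pick one training example at random.
        i = rng.randrange(len(xs))
        residual = ys[i] - (intercept + slope * xs[i])
        # Gradient of this single example's squared residual only.
        d_intercept = -2 * residual
        d_slope = -2 * xs[i] * residual
        # Decaying learning rate (an assumption of this sketch) damps the noise.
        lr = learning_rate / (1 + decay * t)
        intercept -= lr * d_intercept
        slope -= lr * d_slope
    return intercept, slope


# Same made-up points as a batch fit would use; the least-squares line
# has intercept ~0.949 and slope ~0.641.
print(sgd_line([0.5, 2.3, 2.9], [1.4, 1.9, 3.2]))
```

The result lands close to, but not exactly on, the least-squares line. Each update touches only one point, which is what makes SGD cheap when the dataset is large.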
Ridge regression tends to do better when most variables are relevant. Lasso regression tends to do better when many variables are useless, because it can remove them from the model. Elastic net regression is a good choice when there are many variables and we don’t know in advance which ones are relevant.
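To see how the penalties differ, here is a minimal sketch that minimizes SSR + lam1 * Σ|slope_j| + lam2 * Σ slope_j² by coordinate descent, a standard solver for these penalties. The names `elastic_net`, `soft_threshold`, `lam1`, and `lam2` are hypothetical. Setting lam1 = 0 gives ridge, lam2 = 0 gives lasso, and both nonzero gives elastic net; the intercept is not penalized.

```python
def soft_threshold(a, t):
    """Return sign(a) * max(|a| - t, 0)."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def elastic_net(X, y, lam1, lam2, sweeps=200):
    """Minimize SSR + lam1 * sum|b_j| + lam2 * sum b_j**2 by coordinate descent.

    X is a list of rows. lam1 = 0 gives ridge; lam2 = 0 gives lasso.
    """
    n, p = len(X), len(X[0])
    # Center the columns and y so the (unpenalized) intercept drops out.
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Residual after removing every feature's contribution except j's.
            partial = [yc[i] - sum(Xc[i][k] * b[k] for k in range(p) if k != j)
                       for i in range(n)]
            rho = sum(Xc[i][j] * partial[i] for i in range(n))
            z = sum(Xc[i][j] ** 2 for i in range(n))
            # The L1 term thresholds rho; the L2 term enlarges the denominator.
            b[j] = soft_threshold(rho, lam1 / 2) / (z + lam2)
    intercept = y_mean - sum(b[j] * x_means[j] for j in range(p))
    return intercept, b


# Made-up data: y depends strongly on feature 1 (slope 2) and only weakly
# on feature 2 (slope 0.1).
X = [[0, 2], [2, 0], [4, 0], [6, 2]]
y = [-0.9, 2.9, 6.9, 11.1]
print("ridge", elastic_net(X, y, lam1=0, lam2=4)[1])    # ~[1.667, 0.05]
print("lasso", elastic_net(X, y, lam1=2, lam2=0)[1])    # ~[1.95, 0.0]
print("enet ", elastic_net(X, y, lam1=2, lam2=4)[1])    # ~[1.625, 0.0]
```

Ridge shrinks the weak feature's coefficient (0.1 down to 0.05) but keeps it, while lasso and elastic net set it exactly to zero because its correlation with the residual (0.4) is below the threshold lam1 / 2 = 1.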
Step Size for Gradient Descent
The step size is the amount by which a parameter (for example, the intercept) changes at each step as we search for the optimum. To avoid overshooting or undershooting the minimum, the step size should be large when we are far from the minimum and small when we are close to it. Because the step size is proportional to the slope of the loss function, this happens naturally: as the slope approaches 0 near the minimum, the steps become smaller and smaller.
Step size = slope • learning rate
Learning Rate
A higher learning rate can speed up the descent to the minimum. However, a learning rate that is too high can cause the algorithm to overshoot the minimum and miss it, and a learning rate that is too low can make the algorithm take too long to reach it. The learning rate often has to be adjusted by hand to find a good balance between speed and accuracy.
Stemming
Definition: Stemming reduces words to their root form. The performance of sentiment analysis algorithms can sometimes be improved this way, because words with similar meaning are represented by a single root word, which the algorithm can then weight more accurately. Stemming is a blunt approach that normalizes words by removing their suffixes. For example, connected, connecting, connection, and connections all reduce to connect.
Stop Words
A stop word is a commonly used word (such as “the”, “a”, “an”, “in”) that carries little importance in search queries. Search engines are often programmed to ignore stop words, both when indexing entries for searching and when retrieving them as the result of a search query.
Sentiment Analysis
Sentiment analysis attempts to understand and summarize the feeling and intent behind large volumes of text. It is contextual mining of text that identifies and extracts subjective information from source material, helping a business understand the social sentiment around its brand, product, or service while monitoring online conversations.
N-Grams
N-grams are contiguous sequences of words, symbols, or tokens in a document.
In technical terms, an n-gram is a sequence of n neighboring items in a document; for example, a bigram is a pair of adjacent words. Here, n-grams are built from groupings of two or more words, and we count how often each word group appears in the text for a specific rating. This is useful for finding the most frequently recurring word groups behind high or low ratings of the topic in question, and it can give significant insight into what people are really saying about it.
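The preprocessing and counting described in this section can be sketched with the Python standard library. Everything here is an illustrative assumption: the tiny stop-word list, the crude suffix-stripping stemmer (real stemmers such as the Porter stemmer apply many more rules), and the function names `naive_stem`, `preprocess`, `ngrams`, and `top_ngrams_by_rating`. The sketch removes stop words, stems what is left, and counts bigrams separately for each rating.

```python
from collections import Counter

# Illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "in", "is", "it", "was", "and", "this"}
# Toy suffix list, checked in order (longer suffixes first).
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words, stem."""
    tokens = [w.strip(".,!?") for w in text.lower().split()]
    return [naive_stem(w) for w in tokens if w and w not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_rating(reviews, n=2, k=3):
    """Count n-grams separately for each rating and return the k most common."""
    counts = {}
    for text, rating in reviews:
        counts.setdefault(rating, Counter()).update(ngrams(preprocess(text), n))
    return {rating: c.most_common(k) for rating, c in counts.items()}


# Made-up reviews paired with star ratings.
reviews = [
    ("Great battery life and fast shipping", 5),
    ("The battery life is great", 5),
    ("Screen cracked in a week", 1),
]
print(top_ngrams_by_rating(reviews))
```

On these made-up reviews, ("battery", "life") is the most frequent bigram among the 5-star reviews. Note that "shipping" is reduced to "shipp": like the stemming described above, the toy stemmer is blunt.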