mlgrp4log.txt2
<ankit_> : It first pushes after 10 messages, I will share the log link then
<vadi> : ok awesome
<ujdcodr> : why are there 2 Prabhjots?
<Prabhjot_> : sorry, one wasn't working
<ujdcodr> : people, remember to use "/quit" when you want to exit the channel anytime
<ujdcodr> : without the quotes
<ujdcodr> : that will safely disconnect you from the server
<ankit_> : If he is unable to use it because of network issues, quit won't work either
<vadi> : ok so I'll start. all there?
<random> : testing
<random> : This is Ujjwal
<random> : Everyone connected?
<Shivani> : yes
<swastik> : Yep
<Prabhjot_> : yes
<VS> : Yes
<random> : great!! all set
<Vikram> : Yes
<vadi> : ok I'll begin now
<vadi> : So thanks for coming and showing more interest in machine learning
<vadi> : If you haven't understood anything from the last session pls let us know now
<vadi> : As logistic regression is pretty similar to the way linear regression works
<vadi> : I'll recap a bit first
<vadi> : basically machine learning has 3 main types
<vadi> : supervised, unsupervised and reinforcement learning
<vadi> : in supervised we know the target labels (e.g. housing prices given area etc)
<vadi> : in unsupervised we don't know much about the data (e.g. clustering news articles into sports, politics etc)
<vadi> : in reinforcement learning the algorithm gets penalized or rewarded for decisions it made (used in AlphaGo and chess engines)
<vadi> : in these sessions we will concentrate on the first 2 only
<vadi> : again, in supervised we have 2 categories
<vadi> : 1) Regression
<vadi> : 2) Classification
<vadi> : Yesterday you learnt an algorithm to handle regression problems, where the result variable was continuous
<vadi> : In classification problems the result variable is discrete
<vadi> : e.g. classify an image as bicycle/car
<vadi> : the result falls into multiple classes
<vadi> : now assuming you ppl have understood linear regression, let's consider a problem
<vadi> : say we want to classify an email as spam or not using the number of times "congratulations" appears in the mail
<vadi> : say the data points are as shown https://awwapp.com/b/uwuhbxpk8/
<vadi> : now can someone draw the line linear regression would have drawn?
<vadi> : i hope you can see the drawing board?
<vadi> : lol
<vadi> : ok don't draw pls XD
<vadi> : ok so the blue line shows the answer
<vadi> : that's what linear regression would have done
<vadi> : but say we had an anomaly
<vadi> : see the red circle
<vadi> : now if we apply linear regression we get the purple line
<vadi> : so if we want to classify the email we need a threshold
<vadi> : say we set it at 0.5
<vadi> : so any point with a value > 0.5 is spam
<vadi> : but the purple line is misclassifying spam datapoints
<vadi> : the green border points get misclassified
<vadi> : so linear regression is failing to classify in this case
<vadi> : so we need a better algorithm that can classify data points into classes
<vadi> : are you guys following till now?
<vadi> : hello? xD
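A small Python/NumPy sketch of the failure mode described above (not from the session): fit ordinary least squares to 0/1 labels, threshold at 0.5, and watch one extreme point drag the line so a previously correct point gets misclassified. The toy spam counts and the helper names fit_line/classify are made up for illustration.

import numpy as np

def fit_line(x, y):
    # least-squares slope and intercept for y ~ m*x + c
    m, c = np.polyfit(x, y, 1)
    return m, c

def classify(x, m, c, threshold=0.5):
    # predict "spam" whenever the fitted line value exceeds the threshold
    return (m * x + c) > threshold

x = np.array([1, 2, 3, 6, 7, 8], dtype=float)   # times "congratulations" appears
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)   # 0 = not spam, 1 = spam

m, c = fit_line(x, y)
print(classify(x, m, c))    # all six points land on the correct side of 0.5

# now add one extreme (but still spam) email, like the red circle in the drawing
x2 = np.append(x, 50.0)
y2 = np.append(y, 1.0)
m2, c2 = fit_line(x2, y2)
print(classify(x2, m2, c2)) # the line flattens; the spam email at x = 6 drops below 0.5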
<swastik> : Yeah
<vadi> : so are you following?
<vadi> : ok, is there anything particular that needs clarification?
<swastik> : No
<vadi> : i guess many are afk
<vadi> : ok so moving on
<vadi> : let's consider the case of binary classification
<vadi> : where we have only 2 classes
<vadi> : so we want our hypothesis function to output values between 0 and 1
<vadi> : where 0 and 1 are the 2 classes
<vadi> : so we choose the sigmoid / logistic function which does this
<ujdcodr> : Guys... a simple yes goes a long way in restoring our confidence that you are still interested
<ujdcodr> : :-P
<vadi> : https://upload.wikimedia.org/wikipedia/commons/8/88/Logistic-curve.svg
<vadi> : it's basically 1/(1+e^-x)
<vadi> : where x is the hypothesis given by linear regression (basically a linear combination of features with parameters)
<vadi> : so the hypothesis function of logistic regression is sigmoid(parameters . features) (. -> dot product)
<ujdcodr> : Yep... AFK AF
<vadi> : XD
<vadi> : so we predict that the output is class 1 if hypothesis > 0.5
<vadi> : we predict that the output is class 0 if hypothesis < 0.5
<vadi> : now that we have our hypothesis we have to find the optimum parameters that minimize a cost function
<ujdcodr> : Is there a problem with the channel?
<ujdcodr> : Whoever is "not" AFK please acknowledge
<ujdcodr> : Right now
<swastik> : Not afk
<Shivani> : not afk
<ujdcodr> : C'mon guys
<Prabhjot_> : not
<vadi> : we could use the same cost function as linear regression, but it turns out that the function becomes non convex
<Vikram> : Not
<vadi> : as now our hypothesis has a sigmoid as well
<ujdcodr> : proceed
<vadi> : non convex means that we will not reach a global minimum, we will only reach local minima, that's it
<vadi> : i.e. gradient descent does not give us the global minimum if we use the same cost function as before
<vadi> : which was sigma((y - hypothesis(x))^2)
<vadi> : so we need a new cost function to find the optimum parameters
<vadi> : the cost function of logistic regression is
<vadi> : - log(hyp) if y = 1
<vadi> : - log(1 - hyp) if y = 0
<vadi> : you can see the graphs here https://www.google.co.in/search?q=-+log+x, https://www.google.co.in/search?q=-+log+(1-x)
<vadi> : basically - log(hyp) tends to 0 if hyp is close to 1
<vadi> : and tends to infinity if hyp is close to 0
<vadi> : so we are penalizing misclassifications using this cost function
<vadi> : so when we apply gradient descent to this cost function we can get the optimal parameters
<vadi> : so then we can take the dot product of features and parameters and then take the sigmoid
<vadi> : if the value > 0.5 then we have class 1, else class 0
<vadi> : the cost function can also be written as y * -log(hyp) + (1-y) * -log(1 - hyp)
<vadi> : basically it combines both parts
<vadi> : so when we apply logistic regression to our previous example we get a much smoother curve
<vadi> : the cost function can also be derived from statistics using maximum likelihood etc, so it's not that we got it randomly
<vadi> : so to list out some key points
<vadi> : let's say your problem has 2 features f1 and f2
<vadi> : say size of the house and number of rooms
<vadi> : and you want to predict if the house will sell or not
<vadi> : so first you will associate random parameters with your features
<vadi> : say p1 and p2
<vadi> : so your hypothesis will be sigmoid(p1 * f1 + p2 * f2 + p3 (bias term))
<vadi> : what this means is that if sigmoid(p1 * f1 + p2 * f2 + p3) > 0.5 then the house will sell
<vadi> : or else you predict that the house will not sell
<vadi> : ok now we need to find the parameters p1, p2 and p3
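A minimal Python/NumPy sketch of what was just described: the sigmoid hypothesis, the combined cost y * -log(hyp) + (1-y) * -log(1 - hyp), and a gradient descent step. The toy data and the function names (sigmoid, hypothesis, cost, gradient_step) are made up for illustration, not part of the session.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(X, params):
    # sigmoid of the linear combination of features and parameters
    return sigmoid(X @ params)

def cost(X, y, params):
    # mean of y * -log(hyp) + (1 - y) * -log(1 - hyp) over the training set
    h = hypothesis(X, params)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient_step(X, y, params, lr=0.1):
    # one gradient descent update; the gradient is X^T (hyp - y) / m,
    # the same form as in linear regression
    h = hypothesis(X, params)
    grad = X.T @ (h - y) / len(y)
    return params - lr * grad

# toy run: one feature plus a column of ones for the bias term
X = np.array([[1.0, 1.0], [2.0, 1.0], [6.0, 1.0], [8.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
params = np.zeros(2)
for _ in range(5000):
    params = gradient_step(X, y, params)
print(cost(X, y, params))            # should be small after training
print(hypothesis(X, params) > 0.5)   # predicted classes for the training points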
<vadi> : for that we need a cost function with a minimization objective
<vadi> : so that cost function is y * -log(hyp) + (1-y) * -log(1 - hyp)
<vadi> : where hyp is sigmoid(p1*f1 + p2*f2 + p3)
<vadi> : and we can apply gradient descent on this as it is a convex function (has only 1 minimum)
<vadi> : so we apply gradient descent to obtain the parameters that fit the training data the best
<vadi> : now if we are given a new house with features f1 and f2
<vadi> : we simply evaluate the hypothesis and predict if the house will sell or not
<vadi> : is this example clear
<vadi> : ?
<Vikram> : Yes
<Shivani> : yes
<vadi> : ok nice
<vadi> : so we can have situations like this as well
<vadi> : https://awwapp.com/b/uawvhv43s/
<vadi> : where the boundary may be non linear
<vadi> : so how can we make linear regression and logistic regression work for such cases
<vadi> : we can do this by simply adding polynomial features
<vadi> : if we add f1^2 and f2^2 as well as f1*f2 as features we can get non linear decision boundaries
<vadi> : also we can apply logistic regression for multiclass classification
<vadi> : for example if we have 3 classes
<vadi> : i.e. we want to classify an image as bicycle/car/bike
<vadi> : then we use a method called one vs all
<vadi> : basically we first solve the problem of bike vs not a bike
<vadi> : and then car vs not a car
<vadi> : basically we combine all classes except one and then apply logistic regression
<vadi> : so whichever classification gives the highest score, we choose that one
<vadi> : so that's all i wanted to discuss about logistic regression
<vadi> : objectives, hypothesis, cost function, non linear boundary, multiclass classification
<vadi> : a good place to learn more is Andrew Ng's lectures on Coursera
<vadi> : also the book "An Introduction to Statistical Learning" is good
<vadi> : if you didn't understand any part you can go through the logs again and post doubts on the WhatsApp group
<vadi> : ML is a very powerful tool used at large scale by most tech companies
<vadi> : also, just as an intro to unsupervised learning
<vadi> : it's basically trying to identify some pattern in the data and give us clusters
<Vikram> : Why sigmoid? There are many continuous functions that map to {0,1}?
<vadi> : yeah, people have got good results with it as it's symmetric and also flattens out well ~ 3/4
<vadi> : also the cost function associated with it is good for fast computation
<vadi> : and it turns out that the partial derivative has the same form for both linear and logistic regression
<vadi> : k-means is a popular clustering algorithm, is simple to understand, and will be discussed in the coming sessions
<vadi> : https://www.quora.com/Logistic-Regression-Why-sigmoid-function
<vadi> : yeah so that's it for today! pls reach out on WhatsApp if you have any doubts. also watch Andrew Ng's videos for more clarity
<Vikram> : Okay
<Prabhjot_> : Thanks
<Shivani> : Thank you!
<VS> : Thanks
<swastik> : "so whichever classification gives the highest score, we choose that one" - what exactly did you mean?
<vadi> : so when you do bike vs not bike
<vadi> : your hypothesis function (sigmoid(linear combination))
<vadi> : gives a value as output (between 0 and 1)
<vadi> : so say it gives 0.7
<vadi> : so it means that the ML model thinks it's a bike with score 0.7
<vadi> : when you do car vs not a car
<vadi> : if the output is 0.6
<vadi> : then it's more likely it's a bike
<vadi> : so you choose the class corresponding to the higher value of the hypothesis
<swastik> : Ohh got it. Thanks
<vadi> : ok thank you all again for coming!
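A minimal one vs all sketch of the "highest score wins" answer above, written with scikit-learn for brevity; the toy bicycle/car/bike data and the choice of LogisticRegression are illustrative assumptions, not what was used in the session.

import numpy as np
from sklearn.linear_model import LogisticRegression

# two made-up features per image, three classes: 0 = bicycle, 1 = car, 2 = bike
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0], [9.0, 1.0], [8.5, 0.5]])
y = np.array([0, 0, 1, 1, 2, 2])

# train one binary classifier per class: "class k" vs "not class k"
models = [LogisticRegression().fit(X, (y == k).astype(int)) for k in range(3)]

def predict(x):
    # each model outputs sigmoid(parameters . features) as a score for its class;
    # pick the class whose model gives the highest score
    scores = [m.predict_proba([x])[0, 1] for m in models]
    return int(np.argmax(scores))

print(predict([1.2, 1.9]))   # expected: 0 (bicycle) for this toy data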