Using Microsoft SQL Server
1. Checking duplicates rows by using Group By & Having and delete duplicate rows
SELECT course_title,course_id
FROM udemy_courses
GROUP BY course_title,course_id
HAVING count(course_title)>1 and count(course_id) >1;
Duplicate rows
- Delete duplicate rows
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY course_id,course_title ORDER BY course_id,course_title) AS RN
FROM udemy_courses
)
DELETE FROM CTE WHERE RN<>1;
2.Checking typing errors of category groups (level, subject)
SELECT DISTINCT [level]
FROM udemy_courses
SELECT DISTINCT [subject]
FROM udemy_courses;
---> There are 3 categorys type having typing error as 'Business Ffinance', 'Graphic Design', 'Musical Instrucmentss'
- Update typing errors
UPDATE udemy_courses
SET subject = REPLACE(subject,'Business Ffinance','Business Finance')
UPDATE udemy_courses
SET subject = REPLACE(subject,'Graphic Design','Graphic Design')
UPDATE udemy_courses
SET subject = REPLACE(subject,'Musical Instrumentss','Musical Instruments')
3. Replace Null Values
UPDATE udemy_courses
SET is_paid = 'FALSE'
WHERE is_paid is NULL
4. Add one more column as Date as save file csv as "Results_Final1.csv"
SELECT *, CONVERT(VARCHAR(10), published_timestamp, 114) AS Date
FROM udemy_courses
- First 5 rows :
course_id | course_title | url | is_paid | price | num_subscribers | num_reviews | num_lectures | level | content_duration | published_timestamp | subject | Date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1070968 | Ultimate Investment Banking Course. | https://www.udemy.com/ultimate-investment-banking-course/ | TRUE | 200 | 2147 | 23 | 51 | All Levels | 1.5 | 1/19/2017 3:58 | Business Finance | 1/18/2017 |
1113822 | Complete GST Course & Certification - Grow Your CA Practice | https://www.udemy.com/goods-and-services-tax/ | TRUE | 75 | 2792 | 923 | 274 | All Levels | 39 | 3/9/2017 23:34 | Business Finance | 3/9/2017 |
1006314 | Financial Modeling for Business Analysts and Consultants | https://www.udemy.com/financial-modeling-for-business-analysts-and-consultants/ | TRUE | 45 | 2174 | 74 | 51 | Intermediate Level | 2.5 | 12/20/2016 2:26 | Business Finance | 12/20/2016 |
1210588 | Beginner to Pro - Financial Analysis in Excel 2017 | https://www.udemy.com/complete-excel-finance-course-from-beginner-to-pro/ | TRUE | 95 | 2451 | 11 | 36 | All Levels | 3 | 5/31/2017 3:07 | Business Finance | 5/31/2017 |
1011058 | How To Maximize Your Profits Trading Options | https://www.udemy.com/how-to-maximize-your-profits-trading-options/ | TRUE | 200 | 1276 | 45 | 26 | Intermediate Level | 2 | 12/13/2016 21:57 | Business Finance | 12/13/2016 |
192870 | Trading Penny Stocks: A Guide for All Levels In 2017 | https://www.udemy.com/trading-penny-stocks-a-guide-for-all-levels/ | TRUE | 150 | 9221 | 138 | 25 | All Levels | 3 | 5/2/2014 22:13 | Business Finance | 5/2/2014 |
1. Transform Data : Use First rows as Header, Change type of Is_paid to text data type, replace value "True" to "Paid" & "False" to "Free".
2. Add one more column Total_Income = Price * num_subscribers
3.Add new measure dax - To calculate income different between quarter ?