DSjobtracker

What skills and qualifications are required for data science related jobs?

The DSjobtracker package compiles data from 1172 job advertisements in data science and statistics, which makes it a useful resource for learning which skills these fields require. The word cloud shows the 97 skills requested in the job advertisements and highlights the most sought-after ones.

Installation

You can install the development version from GitHub:

# install.packages("devtools")  # run this first if devtools is not installed
devtools::install_github("thiyangt/DSjobtracker")
library(DSjobtracker)

Glimpse of tidy data

tibble::glimpse(DStidy)
Rows: 1,172
Columns: 109
$ ID                                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, …
$ Consultant                         <chr> "Thiyanga", "Jayani", "Jayani", "Ja…
$ DateRetrieved                      <dttm> 2020-08-05, 2020-08-07, 2020-08-07…
$ DatePublished                      <dttm> NA, 2020-07-31, 2020-08-06, 2020-0…
$ Job_title                          <chr> NA, "Junior Data Scientist", "Engin…
$ Company                            <chr> NA, "Dialog Axiata PLC", "London St…
$ R                                  <dbl> 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0,…
$ SAS                                <dbl> 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,…
$ SPSS                               <dbl> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,…
$ Python                             <dbl> 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0,…
$ MAtlab                             <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Scala                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `C#`                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `MS Word`                          <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,…
$ `Ms Excel`                         <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,…
$ `OLE/DB`                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `Ms Access`                        <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,…
$ `Ms PowerPoint`                    <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0,…
$ Spreadsheets                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Data_visualization                 <dbl> 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ Presentation_Skills                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Communication                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ BigData                            <dbl> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0,…
$ Data_warehouse                     <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ cloud_storage                      <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Google_Cloud                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ AWS                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Machine_Learning                   <dbl> 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0,…
$ `Deep Learning`                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Computer_vision                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Java                               <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0,…
$ `C++`                              <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ C                                  <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ `Linux/Unix`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ SQL                                <dbl> 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0,…
$ NoSQL                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ RDBMS                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Oracle                             <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,…
$ MySQL                              <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0,…
$ PHP                                <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ SPL                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ web_design_and_development_tools   <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ AI                                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `Natural_Language_Processing(NLP)` <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ `Microsoft Power BI`               <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Google_Analytics                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ graphics_and_design_skills         <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,…
$ Data_marketing                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ SEO                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Content_Management                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Tableau                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,…
$ D3                                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,…
$ Alteryx                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ KNIME                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Spotfire                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Spark                              <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,…
$ S3                                 <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ Redshift                           <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ DigitalOcean                       <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ Javascript                         <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ Kafka                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Storm                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Bash                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Hadoop                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ Data_Pipelines                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ MPP_Platforms                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Qlik                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Pig                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Hive                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ Tensorflow                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `Map/Reduce`                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Impala                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Solr                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Teradata                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ MongoDB                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Elasticsearch                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ YOLO                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `agile execution`                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ Data_management                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ pyspark                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Data_mining                        <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ Data_science                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
$ Web_Analytic_tools                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ IOT                                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Numerical_Analysis                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Economic                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Finance_Knowledge                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Investment_Knowledge               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Problem_Solving                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Team_Handling                      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Debtor_reconcilation               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Payroll_management                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Bayesian                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Optimization                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Knowledge_in                       <chr> NA, NA, "Elasticsearch, Logstash, K…
$ City                               <chr> NA, "Colombo", "Colombo", "Colombo"…
$ Educational_qualifications         <chr> NA, "Degree in Engineering / IT or …
$ Salary                             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ URL                                <chr> NA, "https://www.google.com/search?…
$ Search_Term                        <chr> NA, "Data Analysis Jobs in Sri Lank…
$ Job_Category                       <chr> "Unimportant", "Data Science", "Dat…
$ Experience_Category                <chr> "More than 2 and less than 5 years"…
$ Location                           <chr> NA, "Sri Lanka", "Sri Lanka", "Sri …
$ `Payment Frequency`                <chr> "unspecified", "unspecified", "unsp…
$ BSc_needed                         <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1,…
$ MSc_needed                         <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,…
$ PhD_needed                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ `English Needed`                   <dbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ year                               <dbl> 2020, 2020, 2020, 2020, 2020, 2020,…
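
Because each skill is stored as a 0/1 indicator column, counting how often a skill is requested reduces to a column sum. The sketch below uses a small made-up data frame (`jobs`, with invented values) that mimics the layout shown above, so it runs without the package installed; the choice of `skill_cols` is an illustrative assumption, not a list defined by the package.

```r
# Toy stand-in for DStidy: same 0/1 indicator layout, invented values.
jobs <- data.frame(
  ID     = 1:4,
  R      = c(1, 1, 0, 1),
  Python = c(1, 0, 1, 0),
  SQL    = c(1, 0, 0, 0)
)

# Hypothetical selection of skill columns to summarise.
skill_cols <- c("R", "Python", "SQL")

# Number of advertisements that mention each skill, most requested first.
skill_counts <- sort(colSums(jobs[skill_cols], na.rm = TRUE), decreasing = TRUE)
skill_counts
```

With the package loaded, the same idea should carry over as `colSums(DStidy[skill_cols], na.rm = TRUE)`, where `skill_cols` holds whichever indicator columns are of interest. Column names containing spaces or symbols, such as `"Ms Excel"` or `"C++"`, can be passed as plain strings.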

2021 survey data

df2021 <- get_data(2021)
df2021
# A tibble: 382 × 115
      ID Consultant URL        Search_Term DateRetrieved DatePublished Job_Field
   <dbl> <chr>      <chr>      <chr>       <chr>         <chr>         <chr>    
 1     1 Abhishanya https://w… statistics… 11/10/2021    11/10/2021    Informat…
 2     2 Abhishanya https://w… statistics… 11/11/2021    11/11/2021    Education
 3     3 Abhishanya https://w… Data scien… 11/11/2021    11/9/2021     Other    
 4     4 Abhishanya https://w… Data scien… 11/11/2021    10/13/2021    Other    
 5     5 Abhishanya https://w… Statistics… 11/11/2021    10/14/2021    Informat…
 6     6 Abhishanya https://w… Data Scien… 11/11/2021    10/14/2020    Health   
 7     7 Abhishanya https://c… Data engin… 11/11/2021    11/8/2021     Other    
 8     8 Abhishanya https://l… Data engin… 11/11/2021    11/9/2021     Other    
 9     9 Abhishanya https://w… Data engin… 11/11/2021    10/20/2021    Informat…
10    10 Abhishanya https://w… Data engin… 11/11/2021    10/21/2021    Finance  
# ℹ 372 more rows
# ℹ 108 more variables: Job_title <chr>, Company <chr>, Knowledge_in <chr>,
#   `Minimum Experience in Years` <dbl>, City <chr>, Location <chr>,
#   Educational_qualifications <chr>, `Payment Frequency` <chr>,
#   Currency <chr>, Salary <dbl>, `English Needed` <dbl>,
#   `English proficiency description` <chr>, Additional_languages <chr>,
#   AI <dbl>, `Natural_Language_Processing(NLP)` <dbl>, Data_Pipelines <dbl>, …
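
In the 2021 output above, `DateRetrieved` and `DatePublished` are character columns, and the printed rows appear to follow a month/day/year pattern (for example `10/13/2021`). A minimal sketch of converting such strings to dates and measuring how long an advertisement had been online is shown below. The vectors are copied from the first rows of the printout; that every row in the full data follows the same format is an assumption.

```r
# Values taken from the printed rows above (month/day/year strings).
retrieved <- c("11/10/2021", "11/11/2021")
published <- c("11/10/2021", "10/13/2021")

# Convert to Date using an explicit month/day/year format.
retrieved_date <- as.Date(retrieved, format = "%m/%d/%Y")
published_date <- as.Date(published, format = "%m/%d/%Y")

# Days between publication and retrieval.
days_online <- as.numeric(retrieved_date - published_date)
days_online
```

Applied to the real columns (for example `as.Date(df2021$DateRetrieved, format = "%m/%d/%Y")`), a gap like this would also make unusual entries easy to spot, such as row 6 in the printout, which was published on 10/14/2020 and retrieved on 11/11/2021.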

Preview of the tidy version of the dataset

head(DStidy)
# A tibble: 6 × 109
     ID Consultant DateRetrieved       DatePublished       Job_title     Company
  <dbl> <chr>      <dttm>              <dttm>              <chr>         <chr>  
1     1 Thiyanga   2020-08-05 00:00:00 NA                  <NA>          <NA>   
2     2 Jayani     2020-08-07 00:00:00 2020-07-31 00:00:00 Junior Data … Dialog…
3     3 Jayani     2020-08-07 00:00:00 2020-08-06 00:00:00 Engineer, An… London…
4     4 Jayani     2020-08-07 00:00:00 2020-07-24 00:00:00 CI-Statistic… E.D. B…
5     5 Jayani     2020-08-07 00:00:00 2020-07-24 00:00:00 DA-Data Anal… E.D. B…
6     6 Jayani     2020-08-07 00:00:00 2020-08-13 00:00:00 Data Scienti… Emirat…
# ℹ 103 more variables: R <dbl>, SAS <dbl>, SPSS <dbl>, Python <dbl>,
#   MAtlab <dbl>, Scala <dbl>, `C#` <dbl>, `MS Word` <dbl>, `Ms Excel` <dbl>,
#   `OLE/DB` <dbl>, `Ms Access` <dbl>, `Ms PowerPoint` <dbl>,
#   Spreadsheets <dbl>, Data_visualization <dbl>, Presentation_Skills <dbl>,
#   Communication <dbl>, BigData <dbl>, Data_warehouse <dbl>,
#   cloud_storage <dbl>, Google_Cloud <dbl>, AWS <dbl>, Machine_Learning <dbl>,
#   `Deep Learning` <dbl>, Computer_vision <dbl>, Java <dbl>, `C++` <dbl>, …
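
The tidy layout also makes grouped comparisons simple. As a rough sketch, the share of advertisements requesting a skill within each `Job_Category` can be computed with base R's `aggregate()`. The data frame below is a made-up stand-in with invented values; only the column names and the category labels `"Data Science"` and `"Unimportant"` come from the output above.

```r
# Toy stand-in for DStidy with invented values.
jobs <- data.frame(
  Job_Category = c("Data Science", "Data Science", "Unimportant", "Data Science"),
  R            = c(1, 0, 1, 1),
  Python       = c(1, 1, 0, 1)
)

# The mean of a 0/1 indicator is the proportion of ads requesting the skill.
share <- aggregate(cbind(R, Python) ~ Job_Category, data = jobs, FUN = mean)
share
```

Note that the formula interface of `aggregate()` drops rows with missing values in the listed columns by default, which may matter for the real data.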

Preview of the untidy version of the dataset

head(DSraw)
# A tibble: 6 × 152
     ID Consultant DateRetrieved DatePublished Job_title     Company     R   SAS
  <dbl> <chr>      <chr>         <chr>         <chr>         <chr>   <dbl> <dbl>
1     1 Thiyanga   05/08/2020    <NA>          <NA>          <NA>        1     1
2     2 Jayani     07/08/2020    31/07/2020    Junior Data … Dialog…     1     0
3     3 Jayani     07/08/2020    06/08/20      Engineer, An… London…     0     0
4     4 Jayani     07/08/2020    24/07/2020    CI-Statistic… E.D. B…     1     1
5     5 Jayani     07/08/2020    24/07/2020    DA-Data Anal… E.D. B…     0     1
6     6 Jayani     07/08/2020    13/08/2020    Data Scienti… Emirat…     1     0
# ℹ 144 more variables: SPSS <dbl>, Python <dbl>, MAtlab <dbl>, Scala <dbl>,
#   `C#` <dbl>, `MS Word` <dbl>, `Ms Excel` <dbl>, `OLE/DB` <dbl>,
#   `Ms Access` <dbl>, `Ms PowerPoint` <dbl>, Spreadsheets <dbl>,
#   Data_visualization <dbl>, Presentation_Skills <dbl>, Communication <dbl>,
#   BigData <dbl>, Data_warehouse <dbl>, cloud_storage <dbl>,
#   Google_Cloud <dbl>, AWS <dbl>, Machine_Learning <dbl>,
#   `Deep Learning` <dbl>, Computer_vision <dbl>, Java <dbl>, `C++` <dbl>, …
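
One visible difference between the two versions is the date columns: `DSraw` stores them as day/month/year strings, some with a two-digit year (`06/08/20`), while `DStidy` stores them as date-times. The sketch below shows one way such mixed strings could be parsed in base R. It is an illustration only and not necessarily how the package authors cleaned the data.

```r
# Strings copied from the DSraw printout above, plus a missing value.
raw_dates <- c("05/08/2020", "31/07/2020", "06/08/20", NA)

# Flag entries written with a two-digit year.
short_year <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", raw_dates)

# Parse four-digit years first, then re-parse the two-digit ones.
# Without this step "%Y" would read "20" as the year 20, not 2020.
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")
parsed[short_year] <- as.Date(raw_dates[short_year], format = "%d/%m/%y")
parsed
```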