Data analysis of Lagou
This repository is designed for analyzing job data from Lagou. Its main features are:
- Crawl job data from Lagou to get the latest job listings
- Analyze and visualize the data
- Crawl job details and generate a word cloud as a "Job Impression"
- Store interviewees' comments in MongoDB so they can be used to train NLP models with machine learning
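How the analysis works depends on how the crawled fields are stored. Here is a minimal sketch. It assumes each job record is a dict with hypothetical `city` and `salary` keys, and that salaries are strings like "15k-25k" (a range in thousands). Under those assumptions, the midpoint of each range can be averaged per city:

```python
import re
from collections import defaultdict
from statistics import mean

# Assumption: salary strings look like "15k-25k" (lower and upper bound in
# thousands). This format is assumed, not taken from this repository's code.
SALARY_RE = re.compile(r"(\d+)[kK]\s*-\s*(\d+)[kK]")


def parse_salary(text):
    """Return the midpoint of a "15k-25k" style range, or None if it does not match."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


def average_salary_by_city(jobs):
    """Group job records by city and average the parsed salary midpoints."""
    by_city = defaultdict(list)
    for job in jobs:
        midpoint = parse_salary(job["salary"])
        if midpoint is not None:  # skip entries that do not match the assumed format
            by_city[job["city"]].append(midpoint)
    return {city: mean(values) for city, values in by_city.items()}


jobs = [  # hypothetical sample records
    {"city": "Beijing", "salary": "15k-25k"},
    {"city": "Beijing", "salary": "20k-30k"},
    {"city": "Shanghai", "salary": "10k-20k"},
    {"city": "Shanghai", "salary": "negotiable"},
]
print(average_salary_by_city(jobs))  # {'Beijing': 22.5, 'Shanghai': 15.0}
```

Entries that do not match the assumed pattern are skipped rather than guessed. As a result, the averages reflect only parseable salaries.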
- Install the third-party libraries:
  sudo pip3 install -r requirements.txt
- Install MongoDB and start the MongoDB service:
  sudo service mongod start
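Once the service is running, comments could be written to MongoDB with pymongo. The sketch below is not this repository's code. The document schema, the `build_comment_docs` and `store_comments` helpers, and the `lagou`/`comments` database and collection names are all hypothetical. The sketch assumes pymongo is installed and mongod is listening on the default port 27017.

```python
from datetime import datetime, timezone


def build_comment_docs(job_id, comments):
    """Turn raw comment strings into MongoDB documents (hypothetical schema)."""
    now = datetime.now(timezone.utc)
    return [
        {"job_id": job_id, "content": text.strip(), "crawled_at": now}
        for text in comments
        if text and text.strip()  # drop empty or whitespace-only comments
    ]


def store_comments(docs, uri="mongodb://localhost:27017/",
                   db_name="lagou", collection_name="comments"):
    """Insert documents into MongoDB and return how many were inserted."""
    if not docs:  # insert_many rejects an empty list, so return early
        return 0
    # Imported lazily so build_comment_docs works even without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        result = client[db_name][collection_name].insert_many(docs)
        return len(result.inserted_ids)
    finally:
        client.close()
```

For example, `store_comments(build_comment_docs(123, ["Friendly interviewers"]))` inserts one document when a local server is available. Here, 123 is a made-up job ID.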
- Clone this project from GitHub
- Change the file paths in the source code to match your environment
- Run lagou_spider.py to crawl job data and export it to an Excel file
- Run hot_words.py to segment the text into words and return the top 30 hot words
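Once the text has been segmented, finding the hot words reduces to a frequency count. Here is a minimal sketch. It assumes segmentation has already been done; for Chinese text, a segmenter such as jieba's `jieba.lcut` is a common choice, though this is not confirmed to be what hot_words.py uses. It also uses a tiny hypothetical stop-word list:

```python
from collections import Counter

# Hypothetical stop-word list; a real list for Chinese text would be much larger.
STOP_WORDS = {"的", "和", "and", "the", "with"}


def top_hot_words(tokens, n=30, stop_words=STOP_WORDS):
    """Count already-segmented tokens and return the n most common (word, count) pairs."""
    counts = Counter()
    for token in tokens:
        word = token.strip().lower()  # normalize whitespace and Latin-letter case
        if word and word not in stop_words:
            counts[word] += 1
    return counts.most_common(n)


tokens = ["Python", "熟悉", "Python", "的", "MySQL", "熟悉", "python"]
print(top_hot_words(tokens, n=2))  # [('python', 3), ('熟悉', 2)]
```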
For more information, please see my answer on Zhihu.
In addition, another repository may also help you!
The PPT report can be found here.
Inspired by Google I/O 2017: we have collected the data, but how can we go beyond simple statistics to deeper analysis? With the help of machine learning, we can make full use of this data.
Here are several ideas I have so far:
- Train a machine learning model to judge which companies are worth joining. This article describes basic job data mining with machine learning.
- More features are under development.
- If you are interested in machine learning or data mining, you are welcome to join us!
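The linked article describes the actual approach. The sketch below only illustrates one standard technique: logistic regression trained with batch gradient descent. The features (a normalized salary and a normalized company rating) and labels (1 means "worth joining") are hypothetical:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability that a company is 'worth joining' under the fitted model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    m = len(X)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            # For log loss, the gradient w.r.t. the logit is (prediction - label).
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Hypothetical training data: [normalized salary, normalized company rating].
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(X, y)
print(predict_proba(weights, bias, [0.85, 0.85]))
print(predict_proba(weights, bias, [0.15, 0.2]))
```

In practice, a library such as scikit-learn (`sklearn.linear_model.LogisticRegression`) would replace the hand-written training loop. The loop is spelled out here only to show what the model learns.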