
💡 Use LinkedIn to Find Current Positions and Facilitate Recommendations [Proposal] #33

Closed
EssamWisam opened this issue Jan 11, 2024 · 15 comments

@EssamWisam
Owner

EssamWisam commented Jan 11, 2024

Background Information:

  • Selenium is known for its ability to scrape websites, regardless of whether they offer an API, by simulating a browser
  • The scraper could run periodically via a GitHub Action

Idea:

  • Suppose we display a colored ring around the picture of anyone who is not currently hired (as determined by scraping LinkedIn)
  • Suppose we also extract their LinkedIn Top Skills (these are always five)
    • In this case, everyone could learn about this feature as we announce CMPDocs to other classes
    • If someone working at company X, in any class, notices someone else (whether in their class or not) with the "looking for a job" ring, and their top skills match what's needed for a role at X, they could take the initiative to recommend them at the company.

Further Motivation:
This solves a three-part problem: a CMP graduate (i) may not know who in the class/department is looking for a job, and it's not easy to enumerate this manually; (ii) their company may be looking for employees; and (iii) they would be willing to recommend someone in the class/department if it weren't for (i).

Bonus Features:
Could also scrape information such as the current job (e.g., for stats or just viewing) and the profile picture (to close #29)

Formal Description:
Given a list of LinkedIn profile links, extract the profile image, current role (if hired), and top skills (if presented) for each. This information will be used on the class page.
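The extraction step could be sketched as below. This is a minimal illustration using Python's standard `html.parser` on hypothetical markup: in practice Selenium would drive a browser, log in, and hand the rendered page source to a parser like this. The class names (`current-role`, `top-skill`) are placeholders, not LinkedIn's actual markup.

```python
from html.parser import HTMLParser

class ProfileParser(HTMLParser):
    """Collects the current role and top skills from (hypothetical) profile HTML."""
    def __init__(self):
        super().__init__()
        self.role = None
        self.skills = []
        self._capture = None  # which field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "")
        if "current-role" in classes:
            self._capture = "role"
        elif "top-skill" in classes:
            self._capture = "skill"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._capture is None:
            return
        if self._capture == "role":
            self.role = text
        else:
            self.skills.append(text)
        self._capture = None

def extract_profile(html: str) -> dict:
    """Return the fields the class page needs for one profile."""
    parser = ProfileParser()
    parser.feed(html)
    # LinkedIn shows at most five top skills; "hired" means a current role exists
    return {"role": parser.role,
            "hired": parser.role is not None,
            "top_skills": parser.skills[:5]}
```

For example, `extract_profile('<div class="current-role">ML Engineer</div>')` would report the person as hired with role "ML Engineer", while an empty page yields `hired: False`.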

Feasibility:
In a friendly chat, I discussed this idea with Tarek @KnockerPulsar, who was also helping scrape another website with Selenium in another project. I asked him to confirm the feasibility of this, and he (thankfully) confirmed that it could be achieved with Selenium.

What do you think of the proposal @Iten-No-404 @KnockerPulsar ?

@KnockerPulsar Do GitHub Actions support the Chrome/browser driver needed for Selenium?

@Iten-No-404
Collaborator

  • Well, firstly, I think it could benefit a decent number of students, not just graduates. It could help students who are looking for an internship or a part-time job (especially in third or fourth year) if applied to younger classes as well, since by then they tend to have a clearer view of what they want to specialize in.

  • Secondly, I like the idea of adding more details about each student, specifically their field, skills, and current job. It would invite younger students to seek advice from those already walking a path similar to the one they'd like to take later in their careers.

  • Thirdly, it would fix and close 🪲 Bug regarding squished images #29, since it seems that LinkedIn periodically changes the addresses of the profile picture images for some reason.

  • Finally, the main problem I foresee with using Selenium, or any web scraper in general, is that some LinkedIn accounts are private. Since that is not the general case, though, I don't think it would cause any major problems, and if need be, the handful of private LinkedIn profiles can just be scraped manually. And since Tarek says it's achievable, it almost certainly is.

All in all, I think it's a formidable idea and would be a great addition to the website.

@EssamWisam
Owner Author

It could help students who are looking for an internship or a part-time job

Yes, absolutely; thanks for shedding light on that. I emphasized graduates because this is where it hurts more when someone doesn't find a job; meanwhile, it is much more tolerable for younger students (and the salaries are a joke anyway). But again, your point is perfectly valid.

It would invite more younger students to seek advice from the ones that already walk a path similar...

Another good point 👌🏻👌🏻.

Thirdly, it would fix and close #29

Thanks for re-emphasizing

some LinkedIn accounts are private...can just be scraped manually

Well, in my opinion, these could be completely ignored because they chose to make their profiles private (unless they want to do it manually themselves via a PR with changes to their own profile only). We could mention at the top of the page that any profile is expected to be public and to list the five top skills that LinkedIn asks for (anyone not complying with that may simply not be interested).

All in all, I think it's a formidable idea and would be a great addition to the website.

Thanks. Unless Tarek is interested in working on it (he may be), I have no issue scheduling myself for it in the upcoming weeks.

@Iten-No-404
Collaborator

I emphasized graduates because this is where it hurts more when someone doesn't find a job

You're absolutely right.

in my opinion these could be completely ignored because they chose to make their profile private...anyone not complying with that may then be not interested.

Alright, I see your point and I agree.

Thanks. Unless Tarek is interested in working on it (he may be), I have no issue scheduling myself for it in the upcoming weeks.

Great, and you're welcome. I have no previous experience with Selenium, but if there's anything you think I can help with, feel free to let me know.

@C-Nubela

Hey there @Iten-No-404 @EssamWisam

I work for Proxycurl, a B2B data provider that extensively scrapes LinkedIn, and I just wanted to chime in:

You're gonna have a tough time scraping LinkedIn. Be prepared to deal with proxies, cookies, rotating LinkedIn accounts, and beyond.

That said, our whole thing at Proxycurl is taking care of the headache that is scraping LinkedIn for you.

We offer several endpoints that you could integrate into your product, such as our Person Profile Endpoint, which could grab details like work history, skills, and beyond.

Send us an email to "hello@nubela.co" if you have any questions!

@EssamWisam
Owner Author

@C-Nubela
Thanks for letting us know. We will surely get in touch should we conclude that our budget and current scraping abilities warrant it.

@EssamWisam
Owner Author

@KnockerPulsar The last thing I heard from you was decent progress towards this. Can I get a report on how far along we are on the scraping part of this feature?

@EssamWisam
Owner Author

@KnockerPulsar I understand you may be busy, but could you make a PR with the work so far...

@KnockerPulsar
Collaborator

@KnockerPulsar I understand you may be busy, but could you make a PR with the work so far...

My sincerest apologies. I really cannot apologize enough for this delay. I'll try to make a PR tomorrow after work.

@EssamWisam
Owner Author

@KnockerPulsar
Thanks. I'm looking forward to having this feature up by the finals. Hopefully, it will see real use after that.

@Iten-No-404
Collaborator

@EssamWisam, @KnockerPulsar.
I believe we're missing one last thing before we can close this issue: preparing a GitHub Action to run the script automatically on all the available YAML files (the English ones only, since the Arabic ones are already mapped accordingly), for example once every two weeks.
I am going to give it a try on a new branch. Let me know if you have any thoughts regarding this.
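A workflow along these lines could look like the sketch below. The script name (`scrape_linkedin.py`), the requirements, and the secret names are assumptions about this repo's layout, not the actual files; the cron expression fires on the 1st and 15th of each month, roughly every two weeks.

```yaml
name: LinkedIn scrape
on:
  schedule:
    - cron: "0 0 1,15 * *"   # roughly every two weeks
  workflow_dispatch:          # also allow manual runs
jobs:
  scrape:
    runs-on: ubuntu-latest    # Chrome comes preinstalled on GitHub-hosted Ubuntu runners
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install selenium
      - run: python scrape_linkedin.py   # hypothetical script name
        env:
          LINKEDIN_EMAIL: ${{ secrets.LINKEDIN_EMAIL }}
          LINKEDIN_PASSWORD: ${{ secrets.LINKEDIN_PASSWORD }}
      - run: |
          git config user.name "github-actions"
          git config user.email "github-actions@users.noreply.github.com"
          git add -A && git commit -m "Update scraped profiles" && git push || echo "No changes"
```

Credentials would have to live in repository secrets so they never appear in the workflow file or logs.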

@EssamWisam
Owner Author

Indeed. I don't know why I initially thought that GitHub Actions might not support installing a browser driver, which is necessary for the script.

I think if we can turn it into a GitHub Action, it's easier to configure the periodic run time without worrying about anything, since it can support running as often as every six hours (but LinkedIn won't be happy). So we could even make it every three days or so in that case.

Okay, good luck, and don't hesitate to mention if any assistance is needed.

@Iten-No-404
Collaborator

Iten-No-404 commented May 25, 2024

Okay, so I looked into creating a workflow for running the LinkedIn scraping script. You can find the latest version of the YAML file here.

The Chrome driver didn't take much time to set up. I am facing a different problem, though. From my understanding, when you try to log in from a new IP/MAC address, LinkedIn presents the following verification check:
[screenshots: LinkedIn security verification prompt]

I don't think this can be bypassed by scripting, and unless we create a dedicated container with a stable IP/MAC address where we have logged in manually once before, I don't see any other way of overcoming this. So, until we come up with another idea, the script can be run locally once a week and its outputs pushed as a normal commit.

@EssamWisam & @KnockerPulsar, let me know if you have any ideas.

@EssamWisam
Owner Author

I think it's completely fine for us to run the script locally. I will likely just make a commit that adds support for Microsoft Edge as well, since I don't use Chrome. Maybe we can instead make a GitHub Action that runs every two weeks and opens a pull request asking us (reminding us) to run the script.

In the other issue related to this, we can also add the steps for running the script (which are quite simple).

Other than that, I wonder, does this help: https://stackoverflow.com/questions/66970875/is-it-possible-to-use-a-static-ip-when-using-github-actions
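A reminder automation in that spirit might look like the sketch below. As a simplification it opens an issue rather than a pull request (a PR needs committed changes to exist first); the `gh` CLI is preinstalled on GitHub-hosted runners. The schedule and wording are placeholders.

```yaml
name: Scrape reminder
on:
  schedule:
    - cron: "0 9 1,15 * *"   # twice a month, 09:00 UTC
jobs:
  remind:
    runs-on: ubuntu-latest
    permissions:
      issues: write          # needed so GITHUB_TOKEN can open issues
    steps:
      - run: >
          gh issue create --repo "$GITHUB_REPOSITORY"
          --title "Reminder: run the LinkedIn scraping script"
          --body "Please run the scraper locally and push the updated YAML files."
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```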

@Iten-No-404
Collaborator

I think it's completely fine for us to run the script locally. I will likely just make a commit that adds support for Microsoft Edge as well, since I don't use Chrome. Maybe we can instead make a GitHub Action that runs every two weeks and opens a pull request asking us (reminding us) to run the script.

Sounds good!

In the other issue related to this, we can also add the steps for running the script (which are quite simple).

They can be easily added to the README file.

Other than that, I wonder, does this help: https://stackoverflow.com/questions/66970875/is-it-possible-to-use-a-static-ip-when-using-github-actions

This mentions two separate ideas: larger runners and self-hosted runners. Neither set of documentation states explicitly whether we can access the VM's UI or not. I believe there is a chance that self-hosted runners can be accessed through a UI, so we could log in manually and pass the verification once before running the script. Either way, this needs further investigation.

Regarding the current issue, how about we close it and create a new one purely concerned with automatically running the script? That is, if we decide to go through with it.

@EssamWisam
Owner Author

OK, we can close this. I only delayed responding because I initiated communication with a credit friend, who may have experience regarding this issue, but they seem to be busy (exams time). I will come back, either here or in another issue if we make one, if my friend responds.

As for adding to the README, I was regarding this issue as the candidate place for the LinkedIn feature-specific details, just to keep the original README simpler for the broader audience that won't have much to do with adding their class or running the script.
