💡 Use LinkedIn to Find Current Positions and Facilitate Recommendations [Proposal] #33
Comments
All in all, I think it's a formidable idea and would be a great addition to the website.
Yes, absolutely, thanks for shedding light on that. I emphasized graduates because this is where it hurts more when someone doesn't find a job; meanwhile, it's much more tolerable (and the salaries are a joke anyway) for younger students. But again, your point is perfectly valid.
Another good point 👌🏻👌🏻.
Thanks for re-emphasizing
Well, in my opinion these could be completely ignored because they chose to make their profile private (unless they want to do it manually themselves via a PR with changes to their profile only). We could mention at the top of the page that any profile is expected to be public and to list the top five skills that LinkedIn asks for (anyone not complying with that may simply not be interested).
Thanks. Unless Tarek is interested in working on it (he may be), I have no issue scheduling myself for it in the upcoming weeks.
You're absolutely right.
Alright, I see your point and I agree.
Great, and you're welcome. I have no previous experience with Selenium, but if there's something you think I can help with, feel free to let me know.
Hey there @Iten-No-404 @EssamWisam. I work for Proxycurl, a B2B data provider that extensively scrapes LinkedIn, and I just wanted to chime in: you're gonna have a tough time scraping LinkedIn. Be prepared to deal with proxies, cookies, rotating LinkedIn accounts, and beyond. That said, our whole thing at Proxycurl is taking care of the headache of scraping LinkedIn for you. We offer several endpoints that you could integrate into your product, such as our Person Profile Endpoint, which can grab details like work history, skills, and beyond. Send us an email at "hello@nubela.co" if you have any questions!
@C-Nubela
@KnockerPulsar The last thing I heard from you was decent progress towards this. Can I have a report on how far along we are on the scraping part of this feature?
@KnockerPulsar I understand you may be busy, but could you make a PR with the work so far...
My sincerest apologies. I really cannot apologize enough for this delay. I'll try to make a PR tomorrow after work.
@KnockerPulsar
@EssamWisam, @KnockerPulsar.
Indeed, I don't know why I initially thought that GitHub Actions might not support installing a browser driver in the first place, which is necessary for the script. I think if we can turn it into a GitHub Action, it's easier to configure the periodic running time without worrying about anything, since it can support running every six hours (but LinkedIn won't be happy). So we can even make it every three days or something in that case. Okay, good luck, and don't hesitate to mention it if any assistance is needed.
Okay, so I checked creating a workflow for running the LinkedIn scraping script. You can find the latest version of the YAML file here. The Chrome driver didn't take much time to set up. I am facing a different problem though. From my understanding, when you try to log in from a new IP/MAC address, LinkedIn gives you a verification check. I don't think this can be bypassed by scripting, and unless we create a dedicated container with a stable IP/MAC address where we have logged in manually once before, I don't see any other way of overcoming this. So, until we come up with another idea, the script can be run locally once a week and its outputs pushed as any normal commit. @EssamWisam & @KnockerPulsar, let me know if you have any ideas.
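One idea that might soften the new-IP verification problem (untested, just a sketch): log in manually once, save the session cookies, and have later runs restore them instead of logging in from scratch. Everything below is an assumption on my part; the file name `linkedin_cookies.pkl` is made up, and it assumes Selenium 4+ with Chrome.

```python
# Sketch (not the project's actual script): persist LinkedIn session cookies after a
# one-time manual login so later runs may skip the new-IP verification check.
import pickle
from pathlib import Path

from selenium import webdriver

COOKIE_FILE = Path("linkedin_cookies.pkl")  # hypothetical file name


def get_driver_with_session() -> webdriver.Chrome:
    driver = webdriver.Chrome()
    driver.get("https://www.linkedin.com")  # must be on the domain before adding cookies
    if COOKIE_FILE.exists():
        for cookie in pickle.loads(COOKIE_FILE.read_bytes()):
            driver.add_cookie(cookie)
        driver.refresh()  # reload so the restored session takes effect
    else:
        input("Log in manually in the opened browser window, then press Enter...")
        COOKIE_FILE.write_bytes(pickle.dumps(driver.get_cookies()))
    return driver
```

No idea whether LinkedIn still challenges a restored session from a new IP, so this would need testing.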
I think it's completely fine for us to run the script locally. I will likely just make a commit that makes it support Microsoft Edge as well, since I don't use Chrome (see the sketch below). Maybe we can instead make a GitHub Action that runs every two weeks and opens a pull request asking us (reminding us) to run the script. In the other issue related to this, we can also add the steps for running the script (which are quite simple). Other than that, I wonder, does this help: https://stackoverflow.com/questions/66970875/is-it-possible-to-use-a-static-ip-when-using-github-actions
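If we go the multi-browser route, the driver setup could be factored into one small helper so the same script works with either Chrome or Edge. This is only a sketch assuming Selenium 4+ (which can fetch the drivers itself via Selenium Manager); the `BROWSER` environment variable is a name I made up.

```python
import os

from selenium import webdriver


def make_driver():
    """Return a headless driver for the browser named in the BROWSER env var."""
    browser = os.environ.get("BROWSER", "chrome").lower()
    if browser == "edge":
        options = webdriver.EdgeOptions()
        options.add_argument("--headless=new")
        return webdriver.Edge(options=options)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```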
Sounds good!
They can be easily added to the README file.
This mentions two separate ideas: larger runners and self-hosted runners. Neither documentation states explicitly whether we can access the UI of the VM or not, though I believe there is a chance that we can. Regarding this current issue, how about we close it and create a new one purely concerned with automatically running the script? That's if we decide to go through with it.
OK. We can close this; I only delayed responding because I had reached out to a friend from the credit-hours program, who may have experience with this, but they seem to be busy (exam time). I'll come back here, or to a new issue if we make one, if my friend responds. As for adding to the README, I was considering this issue as a candidate place for the LinkedIn feature-specific steps, just to keep the original README simpler for the broader audience that won't have much to do with adding their class or running the script.
Background Information:
Idea:
Further Motivation:
This solves the problem where a CMP graduate: (i) may not know who is looking for a job in the class/department, and it's not easy to enumerate this manually; (ii) works at a company that is looking for employees; and (iii) would be willing to recommend someone in the class/department if it weren't for (i).
Bonus Features:
It could also scrape information such as the current job (e.g., for stats or just viewing) and the profile picture (to close #29).
Formal Description:
Given a list of LinkedIn profile links, extract the profile image, current role (if hired), and top skills (if listed) for each. This information will then be used on the class page. A rough sketch of what this could look like is given below.
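For concreteness, here is a minimal Selenium sketch of the extraction loop, not an actual implementation: it assumes the session is already logged in, the CSS selectors are illustrative guesses rather than LinkedIn's real markup, top-skills extraction is omitted for brevity, and `profiles.txt` / `profiles.json` are hypothetical file names.

```python
import json

from selenium import webdriver
from selenium.webdriver.common.by import By


def scrape_profiles(links):
    driver = webdriver.Chrome()
    results = []
    for url in links:
        driver.get(url)
        entry = {"url": url}
        try:
            # Selector names below are guesses; LinkedIn's markup changes often.
            entry["image"] = driver.find_element(
                By.CSS_SELECTOR, "img.pv-top-card-profile-picture__image"
            ).get_attribute("src")
            entry["headline"] = driver.find_element(
                By.CSS_SELECTOR, "div.text-body-medium"
            ).text
        except Exception:
            pass  # private profile, missing element, or changed markup
        results.append(entry)
    driver.quit()
    return results


if __name__ == "__main__":
    with open("profiles.txt") as f:  # hypothetical input: one profile URL per line
        links = [line.strip() for line in f if line.strip()]
    with open("profiles.json", "w") as f:  # hypothetical output consumed by the class page
        json.dump(scrape_profiles(links), f, indent=2)
```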
Feasibility:
In a friendly chat, I discussed this idea with Tarek @KnockerPulsar, who has also helped scrape another website with Selenium in another project. I asked him to confirm the feasibility of this, and he (thankfully) confirmed that it could be achieved with Selenium.
What do you think of the proposal @Iten-No-404 @KnockerPulsar ?
@KnockerPulsar Do GitHub Actions support the Chrome/browser driver needed for Selenium?