Comments (30)
Go Ahead @kshittijagrawal
from amazing-python-scripts.
Hey @kaustubhgupta !
I wrote a script recently, but it uses Selenium to scrape data from the HTML tags and from the data received from the server. I am currently trying to figure out a way to do it using an API. Please give me a day or two to do so.
@XZANATOL The previous contributor was given 11 days to complete the task but failed to deliver. You will be given the usual 1 week, and I will ask about your progress on the 3rd day.
@XZANATOL You can complete the documentation and update the PR at your own pace 😃. I will review your code only in the PR, as I have a lot of PRs to review at the moment. I really appreciate your effort in uploading the code, but it's easier for me to suggest changes on GitHub than on the actual code on the drive 😃.
P.S: It's currently 11:20 am here and I just woke up to review PRs 😁
Well, you can take a rest from me, as I'm on my way to college right now 😂
Good luck with the reviewing :D
Hey, I would like to work on this. Can you assign it to me?
@lara-sahoo ..Sure.. Go ahead.
Hi, I would like to contribute more to this.
Can I have a go at this if the previous person has not delivered?
If so, you can assign this one to me.
Thank you
@hardikkhurana It's already assigned. You can work on a Reddit scraper if you want to.
@avinashkranjan I cannot find an issue or a README file for that scraper.
Can you please send a link for it or explain what needs to be scraped?
Thank You
@hardikkhurana You can write a script to scrape a subreddit's top 10 or 20 posts, and if you want it to be user-friendly, I can add an option that lets the user choose the type of posts they want. Create an issue for the same!
@lara-sahoo Any Updates??
@hardikkhurana Do you still want to work on this?
I want to work on this issue. Please assign it to me.
@avinashkranjan Can you please assign this issue to me as a SWOC 2021 participant?
Issues are first come, first served, so it's assigned to @shubhigupta991, as she commented first.
Assigned to you @shubhigupta991
Any updates @shubhigupta991
@avinashkranjan I would like to work on this issue. Please assign it to me.
Sure Go ahead @pritamp17
Any Updates @pritamp17
Hey there!
I'd like to contribute to the project. I want to address and fix this issue. My implementation will be in Python, with clean code and documentation. Kindly assign this issue to me.
I'm also a GSSOC'21 participant.
@kshittijagrawal Updates on this?
@kaustubhgupta
will make a PR by tomorrow :)
@kshittijagrawal Should I mark it "up-for-grab", or are you still trying this out?
Okay take your time 💯
@kshittijagrawal Any updates? It's already been 2 days. If you fail to make the PR today, this issue will be marked "up-for-grab".
Hi, I'm interested in building this project. I did some research, and it turns out that the LinkedIn Developers API is not the ideal approach, as it limits what you can access. One of these limitations is getting the "My Connections" list itself. So, the other approach is to use Selenium and regex to scrape the web pages with the help of ChromeDriver.
Advantages
- The user doesn't have to bother with setting up API tokens, etc.; just the account username and password will be enough.
- ChromeDriver saves cookies like any normal browser. This makes it easier to go deeper and scrape each connection's profile.
- The script will support input arguments, so the user doesn't have to enter the username and password step by step. A one-line command will save some time. :D
Disadvantages
- Every profile has its own pattern for its headline, so it's not possible to reliably extract the latest job position programmatically, even with regex, but we can put the whole headline into one cell.
- Saving the skills of each profile will be time-consuming, as the script will have to visit each profile one by one and extract the skill set (many accounts these days have 500+ connections). We can add an argument to save the skills only if the user wants to. (This will be described in the documentation.)
Further notes
- I am beginning my second semester in college tomorrow, so I can't guarantee that I can finish the script in 3 days, as was mentioned with the previous contributor. However, I will try my best to finish it by the end of this week, if not sooner. I will keep you updated.
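For illustration, the proposed one-line invocation with an opt-in skills flag could look roughly like the sketch below. It uses Python's standard `argparse` module; the flag names (`--email`, `--password`, `--skills`) and the function name `build_parser` are hypothetical and are not taken from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line interface (all flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Scrape your LinkedIn connections into a CSV file."
    )
    parser.add_argument("-e", "--email", required=True,
                        help="LinkedIn account email or username")
    parser.add_argument("-p", "--password", required=True,
                        help="LinkedIn account password")
    # Skills extraction visits every profile, so it is opt-in.
    parser.add_argument("-s", "--skills", action="store_true",
                        help="also visit each profile and extract its skills (slow)")
    return parser
```

With such a parser, the script would call `build_parser().parse_args()` at start-up, and a run that includes skills might be launched as `python scraper.py -e me@example.com -p secret --skills` (the file name is also an assumption).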
@kaustubhgupta Hi there, :D
I have some good news: the script is 80% ready. I managed to get the connections list with the corresponding names and headlines. I've added the option to get the top skills of each profile; I just need to combine everything into one nice and tidy dataframe and save it to a CSV file. However, I'm facing a small challenge that I can't figure out a solution to.
LinkedIn's Ajax calls are triggered only when you scroll down to the relevant area, which is mostly what I did to get the full connection list. I was trying to do the same on each profile to be able to click the "Show more" button in the skills section. Everything was going fine, but at the moment of selecting that button, it seems I can't locate/click it somehow. I printed the error message, and it seems some other element is supposed to receive that click (I tried to locate it, but the script couldn't). The error was:
Other element would receive the click and had the tag name <use>
I tried using "driver.switch_to.active_element", but it turned out that it selected the button in the Interests section! 😂
I thought of selecting an inactive element that lies before the button and then reusing the same approach, but whatever I select, it still skips to the button in the Interests section!
I can go on and skip extracting the detailed info, but I'm very curious about what is causing that button to be skipped. Any ideas?
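As a rough illustration of the scroll-to-load approach described in this comment, the sketch below keeps scrolling to the bottom of the page until the document height stops growing. It assumes a Selenium-style `driver` that exposes `execute_script`; the function name `scroll_until_loaded` and the `pause` and `max_rounds` parameters are hypothetical, and the real script may have done this differently.

```python
import time


def scroll_until_loaded(driver, pause=1.0, max_rounds=50):
    """Scroll to the bottom repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the Ajax call time to append new content
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # nothing new was loaded, so the list is assumed complete
        last_height = new_height
    return rounds
```

The `max_rounds` cap is there so that a page which keeps growing cannot trap the script in an endless loop.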
@XZANATOL Hmm, I can't give an exact solution for this until I debug the code myself. For now, you can skip the part that extracts the detailed info and make the PR. However, if you want me to debug it for you, comment out that logic and put a reference to this GitHub comment in the code comment. If I find a solution, I will discuss it with you; if not, we will drop this idea. Cool?
@kaustubhgupta I did figure out the solution before I went to sleep yesterday. xd
It turned out that Selenium depends not only on the code but also on low-level mouse interactions: it clicks at the element's position on the screen. The button was being selected, as the code printed out, but every time, my screen was showing the "Show more" button in the Interests section, so using "driver.switch_to.active_element" just jumped to the only active element visible on the screen. All that had to be done was to load the actions library, scroll to the element, and click it. :D
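A minimal sketch of that fix, assuming "the actions library" refers to Selenium's `ActionChains`: scroll the element into view, then move the pointer to it and click. The helper name `scroll_and_click` and the `actions_factory` parameter are hypothetical; the latter exists only so the helper can be exercised without a real browser.

```python
def scroll_and_click(driver, element, actions_factory=None):
    """Scroll `element` into view, then click it through an action chain."""
    if actions_factory is None:
        # Imported lazily so the helper can be tested without Selenium installed.
        from selenium.webdriver.common.action_chains import ActionChains
        actions_factory = ActionChains
    # Bring the element into the viewport, centred so that fixed headers
    # or overlays are less likely to cover it.
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    # Move the pointer over the element and click it.
    actions_factory(driver).move_to_element(element).click().perform()
```

In the scraper, `element` would be the "Show more" button located inside the skills section, so that the click no longer lands on whatever happens to be visible on screen.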
I did finish the script, though I haven't finished the documentation yet. I uploaded the script, along with some result files generated from my profile, here:
https://drive.google.com/drive/folders/1AdSRNXmhWoU81ylOc0jlHAvNcK6Hry6F?usp=sharing
The CSV file contains everything but the skills, because extracting them is time-consuming, so I pulled out that piece of code and tried it on my own profile, and it worked as expected. There is a "CAPTURE.PNG" file containing the printed list of my skills scraped from the profile. Feel free to test it out and inform me of any bugs you notice. I will be online tonight to finish the documentation and make the PR.
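To make the CSV step concrete, here is a small sketch of writing scraped names and headlines to a file. The thread mentions building a dataframe first; this sketch swaps in Python's standard `csv` module instead, and the column names and the function name `save_connections` are assumptions.

```python
import csv


def save_connections(rows, path):
    """Write connection records (dicts with 'Name' and 'Headline') to a CSV file."""
    fieldnames = ["Name", "Headline"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        # Headlines containing commas are quoted automatically by the writer.
        writer.writerows(rows)
```

Keeping the whole headline in one column matches the earlier proposal to store it as a single cell rather than trying to parse out the job title.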
Note:
Place ChromeDriver in the same directory as the script. Here is the download link:
https://sites.google.com/a/chromium.org/chromedriver/downloads
Download the one that matches the version of Chrome on your machine.