Comments (30)

avinashkranjan commented on May 30, 2024

Go Ahead @kshittijagrawal

from amazing-python-scripts.

kshittijagrawal commented on May 30, 2024

Hey @kaustubhgupta !
I wrote a script recently, but it uses Selenium to scrape data from the HTML tags and the data received from the server. I'm currently trying to figure out a way to do it using an API. Please give me a day or two to do so.

kaustubhgupta commented on May 30, 2024

@XZANATOL The previous contributor was given 11 days to complete the task but he failed to make it. You will be given the usual 1 week where I will ask about the progress on the 3rd day.

kaustubhgupta commented on May 30, 2024

@XZANATOL You can complete the documentation and update the PR at your own pace 😃. I will review your code only in the PR, as I have a lot of PRs to review at the moment. I really appreciate your effort to upload the code to the drive, but it's easier for me to suggest changes on GitHub than on the actual code in the drive 😃.

P.S: It's currently 11:20 am here and I just woke up to review PRs 😁

XZANATOL commented on May 30, 2024

Well, you can take a rest from me, as I'm on my way to college right now 😂
Good luck with the reviewing :D

lara-sahoo commented on May 30, 2024

hey I would like to work on this. Can you assign it to me?

avinashkranjan commented on May 30, 2024

@lara-sahoo ..Sure.. Go ahead.

hardikkhurana commented on May 30, 2024

Hi, I would Like to contribute more to this.
Can I have a go on this if the person before has not delivered?
and you can assign me this one
Thank You

avinashkranjan commented on May 30, 2024

@hardikkhurana It's already assigned.. You can work on a reddit scraper if you want to..

hardikkhurana commented on May 30, 2024

@avinashkranjan I cannot find an issue or a README file for that scraper.
Can you please send a link for that, or explain what has to be scraped?
Thank You

avinashkranjan commented on May 30, 2024

@hardikkhurana You can write a script to scrape a subreddit's top 10 or 20 posts, and if you want it to be user-friendly, I can add an option that lets the user choose the type of posts they want. Create an issue for the same!

avinashkranjan commented on May 30, 2024

@lara-sahoo Any Updates??

avinashkranjan commented on May 30, 2024

@hardikkhurana Do You Still want to work on this?

shubhigupta991 commented on May 30, 2024

i want to work on this issue please assign it to me.

amandp13 commented on May 30, 2024

@avinashkranjan Can you please assign this issue to me as a SWOC 2021 participant?

avinashkranjan commented on May 30, 2024

Issues are first come, first served, so it's assigned to @shubhigupta991, as she commented first.
Assigned to you @shubhigupta991

avinashkranjan commented on May 30, 2024

Any updates @shubhigupta991

pritamp17 commented on May 30, 2024

@avinashkranjan I would like to work on this issue please assign this issue to me.

avinashkranjan commented on May 30, 2024

Sure Go ahead @pritamp17

avinashkranjan commented on May 30, 2024

Any Updates @pritamp17

kshittijagrawal commented on May 30, 2024

Hey there!
I'd like to contribute to the project. I want to address and fix this issue. My implementation will be in Python, with clean code and documentation. Kindly assign this issue to me.
I'm also a GSSOC'21 participant.

kaustubhgupta commented on May 30, 2024

@kshittijagrawal Updates on this?

kshittijagrawal commented on May 30, 2024

@kaustubhgupta
will make a PR by tomorrow :)

kaustubhgupta commented on May 30, 2024

@kshittijagrawal Should I make it "up-for-grab", or are you still trying this out?

kaustubhgupta commented on May 30, 2024

Okay take your time 💯

kaustubhgupta commented on May 30, 2024

@kshittijagrawal Updates? It's already been 2 days. If you fail to make the PR today, this issue will be up-for-grab.

XZANATOL commented on May 30, 2024

Hi, I'm interested in building this project. I did some research, and it turned out that using the LinkedIn Developers API is not the ideal approach, as it limits what you can get access to. One of these limitations is actually getting the "My Connections" list. So the other approach is to use Selenium and regex to scrape the web pages with the help of ChromeDriver.

Advantages

  • The user doesn't have to bother with setting up API tokens, etc.; just the account username and password will be enough.
  • ChromeDriver saves cookies like any normal browser, so this makes it easier to go deeper and scrape each connection's profile.
  • The script will support input arguments, so the user doesn't have to go step by step to enter the username and password. A one-liner will save some time. :D

Disadvantages

  • Every profile has its own pattern for its headline, so it's not practical to use even regex to extract the latest job position, but we can put the whole headline into one cell.
  • Saving the skills of each profile will be time-consuming, as the script will have to visit each profile one by one and extract the skill set (many accounts these days have 500+ connections). We can add an argument here to save the skills only if the user wants to. (This will be described in the documentation.)

Further notes

  • I am beginning my second semester in college tomorrow, so I can't guarantee that I can finish the script in 3 days, as was mentioned to the previous contributor. However, I will try my best to finish it by the end of this week, if not sooner. I will keep you updated.
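
The input arguments proposed above could be sketched roughly as follows, using Python's standard argparse module. This is only an illustration: the flag names (--username, --password, --skills, --output) and the default file name are hypothetical and are not taken from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser. All flag names here are hypothetical."""
    parser = argparse.ArgumentParser(
        description="Export LinkedIn connections to a CSV file (sketch)."
    )
    parser.add_argument("-u", "--username", required=True, help="account e-mail")
    parser.add_argument("-p", "--password", required=True, help="account password")
    # Skills need one page visit per connection, so they are opt-in.
    parser.add_argument(
        "--skills", action="store_true", help="also scrape each profile's skills"
    )
    parser.add_argument(
        "-o", "--output", default="connections.csv", help="path of the CSV to write"
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-u", "me@example.com", "-p", "secret", "--skills"])
print(args.username, args.skills, args.output)
```

Keeping the skills scrape behind a flag means the default run stays fast, and the slow per-profile visits only happen when the user asks for them.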

XZANATOL commented on May 30, 2024

@kaustubhgupta Hi there, :D
I have some good news: the script is 80% ready. I managed to get the connections list with the corresponding names and headlines. I've added the option to get the top skills of each profile; I just need to append everything into one nice and tidy dataframe and save it to a CSV file. However, I'm facing a small challenge that I can't figure out a solution to.

LinkedIn's Ajax calls are triggered only when you scroll down to the relevant area, and that's mostly what I did to get the full connection list. I was trying to do the same on each profile to be able to click the "Show more" button in the skills section. Everything was going fine, but at the moment of selecting that button, it seems I can't locate/click it somehow. I printed the error message, and it seems some other element is supposed to receive that click (I tried to locate it, but the script couldn't). The error was:

Other element would receive the click and had the tag name <use>

I tried to use "driver.switch_to.active_element", but it turned out that it selected the button in the Interests section! 😂
I thought of selecting an inactive element that lies before the button and then reusing the same approach, but whatever I select, it still skips to the button in the Interests section!

I can go on and skip extracting the detailed info, but I'm very curious about what is causing that button to be skipped. Any ideas?
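
The scroll-to-load approach mentioned above for the connections list might look something like this. It is a sketch under assumptions, not the script's actual code: the function name, the pause length and the round limit are made up for illustration, and it relies only on the WebDriver's execute_script method.

```python
import time


def scroll_until_loaded(driver, pause: float = 1.0, max_rounds: int = 50) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is a Selenium WebDriver; only `execute_script` is used.
    Returns the number of scroll rounds that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded, so we reached the end
        last_height = new_height
    return max_rounds
```

The round limit is there so that a page which keeps growing cannot trap the script in an endless loop.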

kaustubhgupta commented on May 30, 2024

@XZANATOL Hmm, I can't give an exact solution for this until I debug the code myself. For now, you can skip the part that extracts the detailed info and make the PR. However, if you want me to debug it for you, then comment out that logic and put a reference to this GitHub comment in the code comment. If I find a solution, I will discuss it with you, and if not, we will drop this idea. Cool?

XZANATOL commented on May 30, 2024

@kaustubhgupta I did figure out the solution before I went to sleep yesterday. xd

It turned out that Selenium doesn't depend only on code, but also on low-level mouse interactions: it clicks buttons by their position on the screen. The button was getting selected, as the code printed out, but every time, my screen was showing the "Show more" button of the Interests section, so "driver.switch_to.active_element" would just skip to the only active element shown on the user's screen. All that had to be done was to load the actions library, scroll to the element, and click it. :D
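
A rough sketch of the "scroll to the element, then click it" idea. Note that this is a variant, not the code from the script: the comment above uses Selenium's actions library (with ActionChains that would be ActionChains(driver).move_to_element(element).click().perform()), while the version below swaps in a JavaScript scrollIntoView call so that it needs no extra imports. The function name is hypothetical, and the element is assumed to have been located already.

```python
def scroll_into_view_and_click(driver, element) -> None:
    """Bring `element` into the viewport, then click it.

    `driver` is a Selenium WebDriver and `element` is a WebElement that has
    already been located (for example the skills "Show more" button).
    """
    # Centre the element so it is less likely to be covered by another element.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    element.click()
```

Either way the point is the same: the button has to be on screen before the click is sent, otherwise another element may receive it.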

I did finish the script tho, but didn't finish the documentation yet. I uploaded the script with some result files I used from my profile here:
https://drive.google.com/drive/folders/1AdSRNXmhWoU81ylOc0jlHAvNcK6Hry6F?usp=sharing
The CSV file contains everything but the skills, because extracting them is time-consuming, so I pulled out that piece of code and tried it on my own profile, and it worked as expected. There is a "CAPTURE.PNG" file containing the printed list of my skills scraped from the profile. Feel free to test it out and let me know of any bugs you notice. I will be online tonight to finish the documentation and make the PR.
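
For illustration, writing such a CSV with an optional skills column could be sketched like this. It uses the standard csv module rather than a dataframe, and the function name and the row keys ("name", "headline", "skills") are assumptions, not the script's real column names.

```python
import csv
from typing import Iterable


def save_connections(rows: Iterable[dict], path: str, with_skills: bool = False) -> None:
    """Write one CSV row per connection; the skills column is optional."""
    fields = ["name", "headline"] + (["skills"] if with_skills else [])
    with open(path, "w", newline="", encoding="utf-8") as fh:
        # extrasaction="ignore" drops keys that are not in `fields`.
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            record = dict(row)
            if with_skills:
                # Keep the whole skill list in a single cell.
                record["skills"] = "; ".join(record.get("skills", []))
            writer.writerow(record)
```

Calling save_connections(rows, "connections.csv") writes only names and headlines; passing with_skills=True adds the third column.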

Note:
Place the chromedriver in the same directory as the script. Here is the download link:
https://sites.google.com/a/chromium.org/chromedriver/downloads
Download the one that matches the version of Chrome on your machine.
