0xprateek / stardox
Github stargazers information gathering tool
License: GNU General Public License v3.0
The code currently under if __name__ == '__main__': can be moved into a function. Then, when we add new features that do not need this part (for example, fetching details by username), it simply will not be called.
Also, to add new arguments, I suggest making a separate arguments.py file, which will store the action classes of new arguments.
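The suggestion above could look roughly like the following. This is only a sketch: StripSlashAction, build_parser, main, and the --repo / --verbose flag names are hypothetical and are not taken from Stardox's actual code.

```python
import argparse


class StripSlashAction(argparse.Action):
    """Hypothetical action class of the kind arguments.py could hold."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Store the value without surrounding whitespace or a trailing slash.
        setattr(namespace, self.dest, values.strip().rstrip("/"))


def build_parser():
    """Create the parser; new arguments only need one more add_argument call."""
    parser = argparse.ArgumentParser(description="Stardox (illustrative sketch)")
    parser.add_argument("-r", "--repo", action=StripSlashAction,
                        help="repository URL (flag name is an assumption)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra details (flag name is an assumption)")
    return parser


def main(argv=None):
    """Entry point holding what used to live under the __main__ guard."""
    return build_parser().parse_args(argv)


# In stardox.py the guard would then reduce to:
# if __name__ == "__main__":
#     main()
```

Features that do not need the interactive part can import their own functions without ever triggering main().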
Improve code readability by adding comments and cleaning up the code style.
Use the PEP 8 standard for formatting code.
You can read about it here.
Logging of all the;
Stardox should run even if only the short form owner/repo-name is entered instead of the complete link to the repository.
The user will have the option to enter either the complete link or this short form.
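A minimal sketch of accepting both input forms. The function name to_repository_url and the set of characters allowed in the short form are assumptions, not part of Stardox.

```python
import re

# owner/repo-name, e.g. 0xprateek/stardox (the allowed characters are an assumption)
SHORT_FORM = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def to_repository_url(user_input):
    """Return a full GitHub URL for either a complete link or owner/repo-name."""
    value = user_input.strip().rstrip("/")
    if value.startswith(("https://github.com/", "http://github.com/")):
        return value  # already a complete link
    if SHORT_FORM.match(value):
        return "https://github.com/" + value  # expand the short form
    raise ValueError("expected a GitHub link or owner/repo-name: %r" % user_input)
```

Raising ValueError on anything else lets the caller decide whether to exit or ask again.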
Along with the other details of each stargazer's Github profile, showing the bio and location as well would help to know them better.
In the README there are a few typos, i.e. "It scraps Github" and "information of yours/someone's"
I might be wrong on this one, but if this app doesn't use Github's security tokens, fetching member info from various repositories is severely limited to just a few entries no matter what.
Top right avatar / Settings / Developer settings / Personal access tokens / Generate new token
I think this raises the limit from 60 requests per hour to 5,000 per hour for authenticated API requests. I've actually bumped into a similar app that also worked around the limit by waiting, which is handy too!
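Both ideas from this comment (sending a token, and waiting when the limit is hit) can be sketched as below. This assumes the GitHub REST API is used, which Stardox may not do since it appears to scrape pages. The function names are hypothetical, and the header names are the rate-limit headers the API returns.

```python
import time


def auth_headers(token=None):
    """Build request headers, adding the personal access token when given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = "Bearer " + token
    return headers


def seconds_until_reset(response_headers, now=None):
    """Return how long to wait before the next request.

    Expects a mapping with X-RateLimit-Remaining and X-RateLimit-Reset
    (epoch seconds). A plain dict is case-sensitive, so the keys must
    match exactly unless a case-insensitive mapping is passed in.
    """
    now = time.time() if now is None else now
    remaining = int(response_headers.get("X-RateLimit-Remaining", 1))
    reset_at = int(response_headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0.0  # requests are still available, no need to wait
    return max(0.0, reset_at - now)
```

A caller would pass the result to time.sleep() before retrying.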
Looks like a great tool! After following the installation instructions I run:
$ python3 stardox.py
The Stardox logo appears, then the following error appears:
[-] Error importing requests module.
Suggestions appreciated!
As a new feature, using the username to fetch someone's github profile details can prove to be helpful. Also, their repositories will be listed and then one can look at the repo information.
A new feature to get the details of all the contributors of a repository.
The current approach to command-line arguments in our code makes it hard to add a new argument.
New, easy-to-extend command-line argument handling is needed to keep things simple.
Make sure to document your changes and the usage method in readme.md.
Make Stardox usable via command-line arguments.
Stardox takes a long time to produce results. This issue is for resolving the speed problem.
I have listed things we can do to speed it up (or for a fast mode):
The repo has 2.3k stargazers, but it can only show me 1,192 stargazers' info.
When exporting the results to csv, I am not receiving more than 1201 rows returned.
Traceback (most recent call last):
  File "stardox.py", line 346, in <module>
    stardox(repository_link, verbose, max_threads=16)
  File "stardox.py", line 232, in stardox
    soup1 = BeautifulSoup(html, "lxml")
NameError: name 'BeautifulSoup' is not defined
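A NameError like this can happen when an import fails earlier and the script carries on anyway. That is an assumption here, since the import code is not shown. A minimal sketch of failing fast instead, with a hypothetical helper named require:

```python
import importlib
import sys


def require(module_name, install_name=None):
    """Import a module, or stop with a message naming the missing package."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        package = install_name or module_name
        # Exit immediately so later code never runs with the name undefined.
        sys.exit("[-] Could not import %s; try: %s -m pip install %s"
                 % (module_name, sys.executable, package))
```

Usage would be along the lines of BeautifulSoup = require("bs4", "beautifulsoup4").BeautifulSoup, so the script stops at the import rather than at line 232.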
An --email only flag is required, as it has been requested by many users of Stardox.
It will output only the emails of the stargazers.
With this feature, users will be able to save the doxed stargazer's information into a text file.
Increase the speed of the scraper by using multithreading.
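A minimal sketch of the multithreading idea using the standard library. fetch_all and fetch_profile are hypothetical names. fetch_profile stands for whatever function scrapes a single stargazer, and the default of 16 threads mirrors the max_threads=16 seen in a traceback elsewhere on this page.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(usernames, fetch_profile, max_threads=16):
    """Run fetch_profile for every username on a pool of worker threads.

    Results come back in the same order as usernames.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(fetch_profile, usernames))
```

Threads suit this workload because most of the time is spent waiting on network responses rather than on computation.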
Add Windows color support.
Use something like pipenv?
It's not a good practice to install requirements globally for an application. We can use a virtual environment where the dependencies are installed only for the application, not globally.
This also reduces errors due to environment.
Stardox currently supports Linux platforms only and has not been tested on Windows. Looking forward to making it compatible for Windows users as well.
When the user enters wrong information or a wrong repository, the CLI exits the script. Instead of exiting right away, we should give the user up to 3 attempts, or at least one more attempt, which will make entering the information easier.
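The retry behaviour requested above could be sketched like this. ask_repository is a hypothetical name, and validate is assumed to be any function that returns a cleaned value or raises ValueError.

```python
def ask_repository(validate, attempts=3, read=input):
    """Prompt until validate() accepts the answer or attempts run out."""
    for attempt in range(1, attempts + 1):
        answer = read("Enter the repository address :: ")
        try:
            return validate(answer)
        except ValueError as error:
            # Report the problem and let the loop ask again.
            print("[-] %s (attempt %d of %d)" % (error, attempt, attempts))
    raise SystemExit("[-] Too many invalid attempts, exiting.")
```

The read parameter defaults to input and exists only so the prompt can be replaced in tests.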
If the repo has only 1 stargazer, Stardox cannot get any info.
Scrape the email ID of each stargazer and display it in the tree list view.
First of all, thank you...
I've tried this on a few repos and am getting this response:
Traceback (most recent call last):
  File "stardox.py", line 384, in <module>
    stardox(repository_link,verbose,issave)
  File "stardox.py", line 327, in stardox
    structer.plotdata(len(data.username_list), pos, count)
  File "/Users/xxxxxxxxx/Desktop/Stardox/src/structer.py", line 20, in plotdata
    data.star_list[pos].strip(),
IndexError: list index out of range
Any ideas?
I'm using this for the first time, and it seems to get the numbers wrong on big repositories. For example, the actual stargazer count for this repo is about 1000x bigger:
Enter the repository address :: https://github.com/freeCodeCamp/freeCodeCamp
[+] Got the repository data
[+] Repository Title : freeCodeCamp
[+] Total watchers : 83
[+] Total stargazers : 306
[+] Total Forks : 231
[*] Fetching stargazers list
[*] Doxing started ...
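One possible cause, which is only an assumption since the report does not confirm it, is that large counts are displayed in abbreviated form (for example 306k) and the suffix is lost when the number is parsed. A minimal sketch of a parser that keeps the suffix, with parse_count as a hypothetical name:

```python
def parse_count(text):
    """Convert a displayed count such as '306', '1,192', '2.3k' or '1.2m' to an int."""
    value = text.strip().lower().replace(",", "")
    multipliers = {"k": 1_000, "m": 1_000_000}
    if value and value[-1] in multipliers:
        # round() guards against float artifacts such as 2.3 * 1000 = 2299.99...
        return int(round(float(value[:-1]) * multipliers[value[-1]]))
    return int(value)
```

An abbreviated count is rounded, so the result is approximate for values such as 2.3k.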
This fails to work on my linux machine for some reason...
dread@FreezingMoon:~$ git clone https://github.com/0xprateek/stardox
Cloning into 'stardox'...
remote: Enumerating objects: 28, done.
remote: Counting objects: 100% (28/28), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 211 (delta 15), reused 4 (delta 0), pack-reused 183
Receiving objects: 100% (211/211), 79.87 KiB | 614.00 KiB/s, done.
Resolving deltas: 100% (110/110), done.
dread@FreezingMoon:~$ cd stardox/
dread@FreezingMoon:~/stardox$ pip install -r requirements.txt
Collecting requests (from -r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl
Collecting beautifulsoup4 (from -r requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/f9/d9/183705a87492249b212d88eef740995f55076195bcf45ed59306c146e42d/beautifulsoup4-4.8.1-py2-none-any.whl
Collecting lxml (from -r requirements.txt (line 3))
Using cached https://files.pythonhosted.org/packages/e4/f4/65d145cd6917131826050b0479be35aaccba2847b7f80fc4afc6bec6616b/lxml-4.4.1-cp27-cp27mu-manylinux1_x86_64.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests->-r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/b4/40/a9837291310ee1ccc242ceb6ebfd9eb21539649f193a7c8c86ba15b98539/urllib3-1.25.7-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests->-r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/18/b0/8146a4f8dd402f60744fa380bc73ca47303cccf8b9190fd16a827281eac2/certifi-2019.9.11-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests->-r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.9,>=2.5 (from requests->-r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Collecting soupsieve>=1.2 (from beautifulsoup4->-r requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/81/94/03c0f04471fc245d08d0a99f7946ac228ca98da4fa75796c507f61e688c2/soupsieve-1.9.5-py2.py3-none-any.whl
Collecting backports.functools-lru-cache; python_version < "3" (from soupsieve>=1.2->beautifulsoup4->-r requirements.txt (line 2))
Using cached https://files.pythonhosted.org/packages/da/d1/080d2bb13773803648281a49e3918f65b31b7beebf009887a529357fd44a/backports.functools_lru_cache-1.6.1-py2.py3-none-any.whl
Installing collected packages: urllib3, certifi, chardet, idna, requests, backports.functools-lru-cache, soupsieve, beautifulsoup4, lxml
Successfully installed backports.functools-lru-cache-1.6.1 beautifulsoup4-4.8.1 certifi-2019.9.11 chardet-3.0.4 idna-2.8 lxml-4.4.1 requests-2.22.0 soupsieve-1.9.5 urllib3-1.25.7
dread@FreezingMoon:~/stardox$ cd src/
dread@FreezingMoon:~/stardox/src$ python3 stardox.py
(Stardox ASCII-art logo) Made By : Pr0t0n
[-] Error importing requests module.
dread@FreezingMoon:~/stardox/src$
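In the log above, pip downloads Python 2 builds (the lxml wheel is tagged cp27, and backports.functools-lru-cache is only pulled in for python_version < "3"), while the script is started with python3. The packages may therefore have been installed for a different interpreter than the one running stardox.py. A minimal sketch for checking this from inside Python, with hypothetical function names:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(packages):
    """Build a pip invocation bound to the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install"] + list(packages)
```

Running missing_modules(["requests", "bs4", "lxml"]) under python3 would show what that interpreter lacks, and the list from pip_command can be passed to subprocess.run so the install targets the same interpreter.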