Comments (10)
What's the link, so that I can try to reproduce it? Also, can you provide more information, such as:
- Operating system
- Which version of TorBot you're using
- How you're executing the application
- Tor configuration
from torbot.
Thanks for the quick reply!
I am trying these links:
http://alphvmmm27o3abo3r2mlmjrpdmzle3rykajqc5xsj7j7ejksbpsa36ad.onion/
http://noescapemsqxvizdxyl7f7rmg5cdjwp33pg2wpmiaaibilb4btwzttad.onion/
Operating system: Ubuntu 22
TorBot version: current dev version (I git cloned it)
How I'm executing the application:
python3 torbot -u http://website.onion --depth 2
Tor configuration: default config, installed and started with:
sudo apt install tor
sudo service tor start
Also, is there a way to crawl based on a text file of email addresses?
from torbot.
You're welcome, and thanks for providing the information. I'll look into it later today or sometime this week. There is no feature to crawl email addresses. The current program operates on HTML retrieved from sites, so I don't know how that would be possible with email addresses. If you have suggestions for a new feature, feel free to submit a ticket and it'll be looked into. If you already know how the feature should be implemented, you can take a crack at it and submit a pull request to the repo.
from torbot.
Correction: a text file of websites, not email addresses. And thanks for looking into it. I'll go and mess around with the settings and see what happens. Two other things:
- Is it recommended to amend the torrc config file? I haven't touched it at all.
- Can I get a link to the Slack channel? The link on the main page has expired.
Thanks once again!
from torbot.
- It's your choice. I've created CLI flags to dynamically define the SOCKS5 proxy when instantiating the HTTPS client.
- The link should still work, but the Slack channel is not highly used. If you have suggestions, thoughts, or problems, you'll likely get the quickest response from posting here.
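As a rough illustration of the first point, here is a minimal sketch of defining a SOCKS5 proxy from CLI flags. The flag names `--host` and `--port` and the helper names are hypothetical, chosen for this example rather than taken from TorBot's code; the defaults assume Tor's usual local SOCKS listener on 127.0.0.1:9050. The resulting dict uses the `socks5h://` scheme in the shape that HTTP clients such as `requests` accept for their `proxies` argument.

```python
import argparse


def build_proxies(host: str, port: int) -> dict:
    """Return a proxies mapping that routes HTTP and HTTPS through SOCKS5.

    The socks5h scheme asks the proxy to resolve hostnames, which is
    what .onion addresses need.
    """
    proxy = f"socks5h://{host}:{port}"
    return {"http": proxy, "https": proxy}


def parse_args(argv):
    # Hypothetical flag names for illustration only.
    parser = argparse.ArgumentParser(description="SOCKS5 proxy flags (sketch)")
    parser.add_argument("--host", default="127.0.0.1", help="SOCKS5 proxy host")
    parser.add_argument("--port", type=int, default=9050, help="SOCKS5 proxy port")
    return parser.parse_args(argv)


# Explicit argument list so the example does not depend on sys.argv.
args = parse_args(["--port", "9050"])
print(build_proxies(args.host, args.port))
```

With flags like these, the torrc file can stay at its defaults as long as the proxy address passed on the command line matches the port Tor is listening on.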
from torbot.
There's no way for us to crawl multiple websites at once, right?
from torbot.
Not currently. It'd probably be a fairly straightforward feature to implement, but no one has requested it. If you want to know what's possible or not, check the README. If you have ideas or suggestions, create a new ticket.
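Until such a feature exists, one workaround is a small wrapper that reads a text file of sites and prepares one TorBot invocation per line. This is only a sketch: `read_urls` and `build_commands` are hypothetical helper names, the file format (one URL per line, `#` for comments) is an assumption, and the command line simply mirrors the one quoted earlier in this thread.

```python
import shlex
from pathlib import Path


def read_urls(path):
    """Read one URL per line, skipping blank lines and '#' comments."""
    urls = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def build_commands(urls, depth=2):
    """Build one argument list per URL, mirroring the command used above."""
    return [
        ["python3", "torbot", "-u", url, "--depth", str(depth)]
        for url in urls
    ]


# In practice the list would come from read_urls("sites.txt").
example_urls = ["http://website.onion"]
for command in build_commands(example_urls):
    # Printing keeps the sketch side-effect free; each list could instead
    # be handed to subprocess.run(command, check=True).
    print(shlex.join(command))
```

This runs the sites one after another rather than concurrently, which keeps the load on the Tor circuit predictable.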
from torbot.
Or build it out yourself and submit it if you're capable.
from torbot.
I checked the URLs, and the reason it's only returning the host domain is that all of the links on those pages are paths within the same domain, not links to different sites. The scraper only looks for unique host domains in fully qualified URIs, so relative paths are never reported.
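To make the distinction concrete, here is a small sketch using Python's standard `urllib.parse`. It is not TorBot's actual implementation: `unique_hosts` and `resolve_paths` are hypothetical names, and `example.onion` is a placeholder address. The first function only counts links that carry their own scheme and host, so a page full of relative paths yields nothing; the second shows how such paths could be resolved against the page URL.

```python
from urllib.parse import urljoin, urlparse


def unique_hosts(links):
    """Collect hosts from fully qualified links; relative paths are skipped."""
    hosts = set()
    for link in links:
        parsed = urlparse(link)
        if parsed.scheme and parsed.netloc:
            hosts.add(parsed.netloc)
    return hosts


def resolve_paths(base_url, links):
    """Resolve every link against the page URL and deduplicate the result."""
    return sorted({urljoin(base_url, link) for link in links})


base = "http://example.onion/"  # placeholder page URL
links = ["/about", "contact.html", "/about"]

print(unique_hosts(links))         # no hosts: every link is a relative path
print(resolve_paths(base, links))  # same-domain URLs built from the paths
```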
from torbot.
I'll look into modifying the feature to identify paths.
from torbot.