Comments (10)

KingAkeem commented on June 14, 2024

What's the link, so that I can try to reproduce it? Also, can you provide more information, such as:

  • Operating system
  • Which version of TorBot you're using
  • How you're executing the application
  • Tor configuration

from torbot.

0xEnders commented on June 14, 2024

Thanks for the quick reply!

I am trying these links:

http://alphvmmm27o3abo3r2mlmjrpdmzle3rykajqc5xsj7j7ejksbpsa36ad.onion/
http://noescapemsqxvizdxyl7f7rmg5cdjwp33pg2wpmiaaibilb4btwzttad.onion/

Operating System: Ubuntu 22
Which version of TorBot that you're using? The current dev version; I git cloned it.

How you're executing the application?
python3 torbot -u http://website.onion --depth 2

TOR configuration: default config
sudo apt install tor
sudo service tor start

Also, is there a way to crawl based on a text file of email addresses?

KingAkeem commented on June 14, 2024

You're welcome, and thanks for providing the information. I'll look into it later today or sometime this week. There is no feature to crawl email addresses. The current program operates on HTML retrieved from sites, so I don't know how that would be possible with email addresses. If you have suggestions for a new feature, feel free to submit a ticket and it'll be looked into. If you already know how the feature should be implemented, you can take a crack at it and submit a pull request to the repo.

0xEnders commented on June 14, 2024

Correction: a text file of websites*, not email addresses. And thanks for looking into it. I'll go and mess around with the settings and see what happens. Two other things:

  1. Is it recommended to amend the torrc config file? I didn't touch it at all.
  2. Can I get a link to the Slack channel? The link on the main page has expired.

Thanks once again!

KingAkeem commented on June 14, 2024

  1. It's your choice. I've created CLI flags to dynamically define the SOCKS5 proxy when instantiating the HTTPS client.
  2. The link should still work, but the Slack channel is not heavily used. If you have suggestions, thoughts, or problems, you'll likely get the quickest response by posting here.
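
For anyone reproducing this with the default Tor setup, here is a rough sketch of checking that a local SOCKS5 proxy is reachable before crawling. It assumes Tor's default SOCKS address of 127.0.0.1:9050. The helper names `tor_proxy_url` and `is_port_open` are made up for illustration and are not part of TorBot, and the `socks5h://` scheme is the form accepted by clients such as `requests` with SOCKS support, which may differ from what TorBot's flags expect.

```python
import socket


def tor_proxy_url(host="127.0.0.1", port=9050):
    """Build a SOCKS5 proxy URL (hypothetical helper, not TorBot's API).

    The "socks5h" scheme asks the client to resolve hostnames through the
    proxy, which .onion addresses need.
    """
    return f"socks5h://{host}:{port}"


def is_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


proxy = tor_proxy_url()
print(proxy, "reachable" if is_port_open() else "not reachable")
```

If the port is not reachable, the Tor service is probably not running or is listening on a different port than the one assumed here.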

0xEnders commented on June 14, 2024

There's no way for us to crawl multiple websites at once, right?

KingAkeem commented on June 14, 2024

Not currently. It'd probably be a fairly straightforward feature to implement, but no one has requested it. If you want to know what's possible or not, check the README. If you have ideas or suggestions, create a new ticket.
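
Until something like that exists, one possible workaround is a small wrapper that reads a text file of sites and runs the existing command once per URL. The sketch below is only that, a sketch: it reuses the `python3 torbot -u <url> --depth <n>` invocation quoted earlier in this thread, and the helper names (`load_urls`, `build_commands`, `run_all`) and the one-URL-per-line file format are assumptions, not TorBot features.

```python
import subprocess


def load_urls(lines):
    """Return non-empty, non-comment lines as a list of URLs."""
    urls = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def build_commands(urls, depth=2):
    """Build one torbot command (as an argv list) per URL."""
    return [
        ["python3", "torbot", "-u", url, "--depth", str(depth)]
        for url in urls
    ]


def run_all(commands):
    """Run the commands one after another; a failure does not stop the rest."""
    for cmd in commands:
        subprocess.run(cmd, check=False)


# Example with an in-memory list; a real file would be read with open(path).
sample = ["# sites to crawl", "http://website.onion", "", "http://other.onion"]
for cmd in build_commands(load_urls(sample)):
    print(" ".join(cmd))
```

This runs the sites sequentially rather than at once, which keeps the load on a single Tor circuit low; calling `run_all` on the built commands would start the actual crawls.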

KingAkeem commented on June 14, 2024

Or build it out yourself and submit it if you're capable.

KingAkeem commented on June 14, 2024

I checked the URLs. It's only returning the host domain because all of the links on those pages are paths within the same domain, not links to different sites. The scraper looks for unique host domains given as fully qualified URIs.
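
To illustrate the distinction, here is a minimal sketch using Python's standard `urllib.parse`. It is not TorBot's actual scraper code: `split_links` and the `example.onion` / `other.onion` hosts are hypothetical. It resolves each `href` against the page URL and separates paths on the same host from links that point to a different host.

```python
from urllib.parse import urljoin, urlparse


def split_links(page_url, hrefs):
    """Split hrefs into same-host links and links to other hosts."""
    base_host = urlparse(page_url).netloc
    same_host, other_hosts = [], []
    for href in hrefs:
        # Resolve relative paths such as "/about" against the page URL.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == base_host:
            same_host.append(absolute)
        else:
            other_hosts.append(absolute)
    return same_host, other_hosts


page = "http://example.onion/"
hrefs = ["/about", "posts/1", "http://other.onion/index.html"]
same_host, other_hosts = split_links(page, hrefs)
print("same host:", same_host)
print("other hosts:", other_hosts)
```

A page whose links all fall into the first group would yield only its own host when the scraper collects unique host domains, which matches the behaviour reported here.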

KingAkeem commented on June 14, 2024

I'll look into modifying the feature to identify paths.
