

The Orobouros Framework

Orobouros is a C# framework for web scraping. Many projects in various languages have attempted this, but Orobouros takes a different approach thanks to the patented OrobourosModule™ system, which allows anyone to write their own plugin for any website.
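To make the plugin idea more concrete, here is a minimal, self-contained C# sketch of how a framework could route a URL to a site-specific module. It does not use the Orobouros API. The `modules` list, the `Dispatch` helper, and the host-matching rule are hypothetical stand-ins chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical module registry (NOT the Orobouros API): each entry pairs a
// module name with the hosts it claims and a scrape function for those hosts.
var modules = new List<(string Name, string[] Hosts, Func<Uri, string> Scrape)>
{
    ("ExampleModule", new[] { "www.test.com", "test.com" }, u => $"Scraped {u.AbsolutePath}"),
    ("OtherModule", new[] { "example.org" }, u => "Other data"),
};

// Route a URL to the first module whose declared hosts include the URL's host.
// Returns null for invalid URLs or when no module claims the host.
string? Dispatch(string rawUrl)
{
    if (!Uri.TryCreate(rawUrl, UriKind.Absolute, out Uri? url))
        return null;

    foreach (var module in modules)
    {
        // Host names are compared case-insensitively.
        if (module.Hosts.Contains(url.Host, StringComparer.OrdinalIgnoreCase))
            return module.Scrape(url);
    }
    return null;
}

Console.WriteLine(Dispatch("https://www.test.com/posts/posthere") ?? "No module found");
Console.WriteLine(Dispatch("https://unknown.net/") ?? "No module found");
```

Matching on `Uri.Host` is only one possible routing rule; a real module system might also weigh paths, priorities, or several modules claiming the same site.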

Installation

Orobouros is available as a NuGet package and from the GitHub Actions page. Keep in mind that the pre-compiled builds on GitHub do not include dependencies. If you prefer the .NET CLI, you can also simply run:

dotnet add package Orobouros

On its own, Orobouros does nothing and needs modules to function. A list of publicly available modules for download is maintained on the GitHub repository.

Building

If you insist on compiling this yourself, all you need is the .NET 8 SDK. I would not recommend running the tests, as they require specific configurations I use for debugging.

Development

Take a look at the TestModule project included in this repo to get a general idea of how to use this framework. XML annotations are also provided. At some point I will create a wiki with relevant information, but for now the core functionality takes priority. Obfuscated code is allowed (in the sense that the framework won't refuse to execute it) but strongly discouraged due to malware concerns. If you really feel the need to keep your source code hidden, just don't share your module.

Example code to submit a scrape request to the loaded module stack:

ScrapingManager.InitializeModules(); // Only call this once at the entry point of your application
List<ModuleContent> requestedInfo = new List<ModuleContent> { ModuleContent.Text }; // Content you want to request from the modules. How this is handled is entirely dependent on the module's developer.
ModuleData? data = ScrapingManager.ScrapeURL("https://www.test.com/posts/posthere", requestedInfo); // Perform scrape request and wait for the returned data.
ScrapingManager.FlushSupplementaryMethods(); // Stop background methods. This should be called at least once when the application is exiting.

Example code to return a simple line of text from a module's scrape method:

ModuleData data = new ModuleData();
ProcessedScrapeData exampleInstance = new ProcessedScrapeData(ModuleContent.Text, parameters.URL, "Hello World!");
data.Content.Add(exampleInstance);
return data;
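The comment on `requestedInfo` in the scrape request example notes that how requested content is handled depends entirely on each module's developer. As a hypothetical sketch in plain C# (not the Orobouros API), a module might only produce the content kinds the caller asked for. `ScrapeSketch`, the string content kinds, and the image URL are placeholders; a real module would work with `ModuleContent` values and return a `ModuleData` object as in the example just above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a module's scrape logic. Content kinds are plain
// strings ("Text", "Images") here; the real framework uses ModuleContent values.
List<(string Kind, string Url, object Value)> ScrapeSketch(string url, IReadOnlyCollection<string> requested)
{
    var results = new List<(string Kind, string Url, object Value)>();

    // Only do the work for content the caller actually asked for.
    if (requested.Contains("Text"))
        results.Add(("Text", url, "Hello World!"));

    if (requested.Contains("Images"))
        results.Add(("Images", url, new[] { "https://www.test.com/example.png" })); // placeholder URL

    return results;
}

var data = ScrapeSketch("https://www.test.com/posts/posthere", new[] { "Text" });
foreach (var entry in data)
    Console.WriteLine($"{entry.Kind}: {entry.Value}");
```

Skipping unrequested content early is one reasonable design choice, since scraping images or other media can be far more expensive than extracting text.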

Please consult the XML documentation or the TestModule project for further code examples.

Copyright

This repository holds no responsibility for any modules that programmers develop for this framework. No copyrighted content is included in this repo, and none ever will be. If someone has made a module for your website and you don't like it, I cannot help you; you must contact them to resolve such matters. This also applies to potentially illicit or illegal content scraped with modules created by the community.

TODO:

  • Dynamic module loading
  • Raw HTTP support
  • Downloader service
  • Attribute scanning
  • Custom attributes
  • Module init method
  • Module supplementary methods
  • Module scrape method
  • Module options
  • Module return data
  • Module GUIDs
  • Custom library support
  • Referenced library support (deprecated)
  • SQLite support
  • Dynamic database support
  • Website API support (separate from raw HTTP)
  • Cross-module support
  • XML annotations
  • Module security checks
  • Module sanity checks
  • Multiple modules for same website support
  • Improved module error handling
  • SQLite module integration
  • Public module downloader tool
  • Switch to protobuf for caching
  • General framework configuration class
  • Cross-language support (extremely advanced)
  • Data language translation toolkit
  • Overhaul download class & integrate better
  • Logging overhaul
  • Module developer web toolkit
  • Framework-level exception handling
  • Bulk data downloading functions (stored in RAM)

Credits

  • Branden Stober - Main Project Lead
  • ImSoupp - Reflection Help & Database Help
  • CTAG - Database Help

