Jetsam is a tool designed to sanitize IRC logs stored in the Driftwood format. It identifies and flags lines for further review. Written in Rust.

Home Page: http://linktr.ee/apple_fritter

License: MIT License


Jetsam

Jetsam is a tool designed to sanitize IRC logs stored in the Driftwood format. It helps identify and flag lines in the log files that contain potentially sensitive or inappropriate content for further review or moderation. Jetsam is meant to pair closely with flotsam, which aggregates a per-user metric of flagged contributions.

Features

  • Parses log files in the Driftwood format, whose columns are separated by a distinctive Unicode character used as the field separator (☕ in the example below).
  • Flags log lines for review or moderation by prefixing them with a "#" symbol, placed before the first column.
  • Flexible wordlists: accepts either a single wordlist file or a directory of wordlist files, so the content being screened for is customizable.
  • Reads wordlist directories recursively, so nested directory structures are supported.
  • Processes log files with the .txt extension, as specified by the Driftwood format.
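To illustrate the column split described above, the sketch below parses a single Driftwood line into its fields. It assumes, based only on the example entry in this README, that ☕ is the separator, that each line begins and ends with it, and that the five fields are hour, minute, second, nick, and message. The `DriftwoodEntry` type and `parse_driftwood_line` function are hypothetical names, not part of Jetsam's code.

```rust
/// Field separator, assumed from the README's example entry.
const SEP: char = '☕';

/// Hypothetical representation of one Driftwood log line.
#[derive(Debug, PartialEq)]
struct DriftwoodEntry<'a> {
    hour: &'a str,
    minute: &'a str,
    second: &'a str,
    nick: &'a str,
    message: &'a str,
}

/// Split a Driftwood line into its five columns.
/// Returns None if the line does not have the expected shape.
fn parse_driftwood_line(line: &str) -> Option<DriftwoodEntry<'_>> {
    // Drop the leading and trailing separators, then split the rest.
    let inner = line.strip_prefix(SEP)?.strip_suffix(SEP)?;
    // splitn(5, ..) keeps any separator inside the message in the last field.
    let mut fields = inner.splitn(5, SEP);
    Some(DriftwoodEntry {
        hour: fields.next()?,
        minute: fields.next()?,
        second: fields.next()?,
        nick: fields.next()?,
        message: fields.next()?,
    })
}

fn main() {
    let e = parse_driftwood_line("☕12☕34☕56☕GitHubFAN23☕Hello, world!☕").unwrap();
    // Prints "[12:34:56] <GitHubFAN23> Hello, world!"
    println!("[{}:{}:{}] <{}> {}", e.hour, e.minute, e.second, e.nick, e.message);
}
```

Using `splitn` rather than `split` means a message that happens to contain the separator character stays in one piece instead of producing extra columns.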

Usage

To use Jetsam, run the following command:

jetsam <log directory path> <wordlist path>
  • <log directory path>: Path to the directory containing the log files in the Driftwood format.
  • <wordlist path>: Path to a wordlist file, or a directory of wordlist files, whose entries determine which lines are flagged.
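A minimal sketch of how the two arguments could be read and checked before any file is touched, in the spirit of the Input Validation note under Considerations. The `parse_args` function and its error messages are hypothetical; the README does not describe how Jetsam itself handles bad arguments.

```rust
use std::env;
use std::path::PathBuf;
use std::process;

/// Hypothetical argument check: expects exactly a log directory and a
/// wordlist path (file or directory), and confirms both exist.
fn parse_args(args: &[String]) -> Result<(PathBuf, PathBuf), String> {
    if args.len() != 3 {
        return Err(format!(
            "usage: {} <log directory path> <wordlist path>",
            args.first().map(String::as_str).unwrap_or("jetsam")
        ));
    }
    let log_dir = PathBuf::from(&args[1]);
    let wordlist = PathBuf::from(&args[2]);
    if !log_dir.is_dir() {
        return Err(format!("log directory not found: {}", log_dir.display()));
    }
    if !wordlist.exists() {
        return Err(format!("wordlist path not found: {}", wordlist.display()));
    }
    Ok((log_dir, wordlist))
}

fn main() {
    let args: Vec<String> = env::args().collect();
    match parse_args(&args) {
        Ok((log_dir, wordlist)) => {
            println!("logs: {}, wordlist: {}", log_dir.display(), wordlist.display());
        }
        Err(msg) => {
            // Exit with a non-zero status so scripts can detect the failure.
            eprintln!("{msg}");
            process::exit(1);
        }
    }
}
```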

Output

Jetsam modifies the log files in place, prefixing each flagged line with a "#" symbol so it can be found during review or moderation.

Example Driftwood Entry:

☕12☕34☕56☕GitHubFAN23☕Hello, world!☕

If the line matches an entry in the wordlist, it is rewritten as:

#☕12☕34☕56☕GitHubFAN23☕Hello, world!☕
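The flagging step might look like the sketch below. The README does not say how a line is matched against the wordlist, so this version assumes a case-insensitive substring match over the whole line; `flag_line` is a hypothetical name, and Jetsam's actual matching rule may differ. Skipping lines that already start with "#" is also an assumption, meant to keep repeated runs from adding "##".

```rust
/// Prefix a line with "#" if it contains any wordlist entry.
/// Assumption: case-insensitive substring match over the whole line.
fn flag_line(line: &str, wordlist: &[String]) -> String {
    let lower = line.to_lowercase();
    let hit = wordlist
        .iter()
        .filter(|w| !w.is_empty())
        .any(|w| lower.contains(&w.to_lowercase()));
    // Assumption: an already-flagged line is left unchanged.
    if hit && !line.starts_with('#') {
        format!("#{line}")
    } else {
        line.to_string()
    }
}

fn main() {
    let wordlist = vec!["world".to_string()];
    let line = "☕12☕34☕56☕GitHubFAN23☕Hello, world!☕";
    println!("{}", flag_line(line, &wordlist));
}
```

With a wordlist containing "world", the example entry above comes out as `#☕12☕34☕56☕GitHubFAN23☕Hello, world!☕`.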

Jetsam log file entry

Each modified line is also recorded in a Jetsam log file, for example:

Timestamp: 20230613-143200
Path: /logs/freenode/programming/2003/01/01.txt
Line Number: 2
Original Line: 123456 GitHubFAN23 Hello, world!
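Each change recorded in the Jetsam log has the four fields shown above. A sketch of formatting one entry, assuming the caller supplies the timestamp already rendered in the `YYYYMMDD-HHMMSS` layout of the example (so no date crate is needed). `ChangeRecord` and `format_log_entry` are hypothetical names.

```rust
use std::fmt::Write;
use std::path::PathBuf;

/// Hypothetical record of one flagged line.
struct ChangeRecord {
    timestamp: String, // e.g. "20230613-143200"
    path: PathBuf,
    line_number: usize, // numbering base is not specified in the README
    original: String,
}

/// Render one entry in the layout of the example above.
fn format_log_entry(rec: &ChangeRecord) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so unwrap is safe here.
    writeln!(out, "Timestamp: {}", rec.timestamp).unwrap();
    writeln!(out, "Path: {}", rec.path.display()).unwrap();
    writeln!(out, "Line Number: {}", rec.line_number).unwrap();
    writeln!(out, "Original Line: {}", rec.original).unwrap();
    out
}

fn main() {
    let rec = ChangeRecord {
        timestamp: "20230613-143200".to_string(),
        path: PathBuf::from("/logs/freenode/programming/2003/01/01.txt"),
        line_number: 2,
        original: "123456 GitHubFAN23 Hello, world!".to_string(),
    };
    print!("{}", format_log_entry(&rec));
}
```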

Considerations

  • Input Validation: Make sure the log directory and wordlist paths are correct and exist before running Jetsam. It performs little input validation of its own, so invalid paths may cause errors.
  • Backups: Back up your original log files before running Jetsam. Because it edits files in place, a backup protects against accidental data loss or unintended modifications.
  • Data Security: Log files may contain sensitive information, so handle them with care: restrict access permissions and follow standard security practices.
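Because the log files are rewritten in place, one cautious pattern is to copy each file aside before writing to it. The sketch below is a hypothetical helper, `backup_then_rewrite`, not something Jetsam is documented to do; the `.bak` suffix is an assumption. It applies a per-line transform (for example, the "#" flagging) after saving the copy.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: copy `path` to `<path>.bak`, then rewrite `path`
/// with `transform` applied to every line. Returns the backup location.
/// Note: an existing `.bak` file is overwritten, and line endings are
/// normalized to "\n" with a trailing newline.
fn backup_then_rewrite<F>(path: &Path, mut transform: F) -> io::Result<PathBuf>
where
    F: FnMut(&str) -> String,
{
    let mut backup = path.as_os_str().to_owned();
    backup.push(".bak");
    let backup = PathBuf::from(backup);
    // Keep an untouched copy before anything is modified.
    fs::copy(path, &backup)?;

    let original = fs::read_to_string(path)?;
    let mut rewritten = String::with_capacity(original.len() + 16);
    for line in original.lines() {
        rewritten.push_str(&transform(line));
        rewritten.push('\n');
    }
    fs::write(path, rewritten)?;
    Ok(backup)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("jetsam_example.txt");
    fs::write(&path, "☕12☕34☕56☕GitHubFAN23☕Hello, world!☕\n")?;
    let backup = backup_then_rewrite(&path, |line| format!("#{line}"))?;
    println!("backup at {}", backup.display());
    Ok(())
}
```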

Flowchart

┌─ Start Program
│
├─ Load Log Directory
│   ├─ Load Wordlist
│   │   ├─ Read Wordlist File
│   │   └─ Recursively Read Wordlist Directory
│   │
│   ├─ Read Log Files
│   │   ├─ Read Log File
│   │   │   └─ Process Log Lines
│   │   │       ├─ Sanitize Line Content
│   │   │       ├─ Check for Wordlist Match
│   │   │       └─ Modify Line Number and Content
│   │   │
│   │   └─ Recursively Read Log Directory
│   │
│   └─ Log Changes to Jetsam Log
│       └─ Create Jetsam Log File
│           └─ Iterate Modified Lines
│               ├─ Get Timestamp
│               ├─ Get Log File Path
│               ├─ Get Line Number
│               ├─ Get Original Line Content
│               └─ Write to Jetsam Log File
│
└─ End Program
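Both recursive branches of the flowchart (reading a wordlist directory and reading the log directory) come down to walking a directory tree. A minimal sketch using only the standard library: `collect_files` gathers every file under a path, optionally filtered by extension, and `load_wordlist` reads one entry per line. Both names, and the one-entry-per-line wordlist layout, are assumptions; the README does not document the wordlist file format.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect files under `root`. If `root` is itself a file it is
/// considered directly. When `ext` is given, only files with that extension
/// are kept. Symlinked directories are followed, so cycles would recurse.
fn collect_files(root: &Path, ext: Option<&str>, out: &mut Vec<PathBuf>) -> io::Result<()> {
    if root.is_dir() {
        for entry in fs::read_dir(root)? {
            collect_files(&entry?.path(), ext, out)?;
        }
    } else if root.is_file() {
        let keep = match ext {
            Some(e) => root.extension().and_then(|x| x.to_str()) == Some(e),
            None => true,
        };
        if keep {
            out.push(root.to_path_buf());
        }
    }
    Ok(())
}

/// Load wordlist entries from a file or directory (assumed: one entry per
/// line). Blank lines are skipped; the result is sorted and deduplicated.
fn load_wordlist(path: &Path) -> io::Result<Vec<String>> {
    let mut files = Vec::new();
    collect_files(path, None, &mut files)?;
    files.sort(); // read_dir order is unspecified; sort for repeatable results
    let mut words = Vec::new();
    for file in files {
        let text = fs::read_to_string(&file)?;
        for line in text.lines() {
            let w = line.trim();
            if !w.is_empty() {
                words.push(w.to_string());
            }
        }
    }
    words.sort();
    words.dedup();
    Ok(words)
}

fn main() -> io::Result<()> {
    let mut logs = Vec::new();
    collect_files(Path::new("logs"), Some("txt"), &mut logs)?;
    let words = load_wordlist(Path::new("wordlists"))?;
    println!("{} log files, {} wordlist entries", logs.len(), words.len());
    Ok(())
}
```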

🤪 IRC Meta

@apple-fritter's IRC Repositories:


Driftwood Suite of IRC Analytics

Driftwood utilities
  • driftwood: A unified IRC log format definition. (Rust)
  • flotsam: Aggregate a per-user metric of each user's flagged contributions. (Rust)
  • jetsam: Flag lines of driftwood-formatted IRC logs for sanitization, moderation, or further review. (Rust)
  • scrimshaw: Create a quote list for any given user from your driftwood-formatted logs. (Rust)
Driftwood native logging plugins
  • weechat.driftwood: Natively log WeeChat messages in the driftwood standard. (Python)

IRC usage considerations

When working with any project involving IRC (Internet Relay Chat), it's important to keep the following considerations in mind to ensure a positive and respectful environment for all participants.

Philosophy of Use

Tailor your project's behavior and responses to align with the expected norms and conventions of IRC. Take into account the preferences and expectations of IRC users, ensuring that your project provides a seamless and familiar experience within the IRC ecosystem.

Foster a Positive and Inclusive Environment

Respect and adhere to the guidelines and policies of the IRC platform you are using. Familiarize yourself with the platform's rules regarding script usage, automation, and acceptable behavior. Comply with the platform's Terms of Service, and be mindful of any limitations or restrictions imposed by the platform. Strive to create an inclusive and welcoming environment where all users can engage respectfully and comfortably.

Respect the Rights and Dignity of Other Users

Maintain a polite and courteous demeanor in all interactions. Uphold the fundamental principles of respect, avoiding engagement in illegal, inappropriate, or offensive behavior. This includes refraining from using derogatory or inflammatory language, sharing explicit, triggering, or offensive content, engaging in harassment, or launching personal attacks. Obtain explicit consent before interacting with other users or sending automated responses. Respect the privacy of other users and avoid invading their personal space without their permission.

Respect the IRC Community and Channels

Avoid disrupting the normal flow of conversation within IRC channels. Ensure that your project's actions and responses do not cause unnecessary disruptions or inconvenience to other users. Implement mechanisms to prevent spamming or flooding the channel with excessive or irrelevant messages. Handle errors gracefully, preventing unintended behavior or disruptions to the IRC platform or the experiences of other users.

Ensure Compatibility

Consider the potential variations in behavior across different IRC platforms and clients. While aiming for compatibility, be aware that certain functionalities may not be available or consistent across all platforms. Test your project on multiple IRC platforms and clients to ensure compatibility and provide the best possible experience for users.


Contributing

Contributions are welcome! If you'd like to contribute, please follow these steps:

  1. Fork the repository.
  2. Create a new branch for your feature or bug fix.
  3. Make your changes and commit them.
  4. Push your changes to your forked repository.
  5. Submit a pull request to the main repository.

This software is provided "as is" and without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

The authors do not endorse or support any harmful or malicious activities that may be carried out with the software. It is the user's responsibility to ensure that their use of the software complies with all applicable laws and regulations.


License

This project is licensed under the MIT License.
