brandon689 / htmlconverter

HtmlToJsonParser: A versatile C# library for converting HTML to JSON with multiple parsing modes and customizable options.

Home Page: https://htmltojsonconverter.azurewebsites.net/

License: MIT License

C# 100.00%
anglesharp csharp dotnet-core html html-parser html-to-json json json-converter web-development web-tool converter parsing

HtmlToJsonParser 🔄

HtmlToJsonParser is a versatile C# library that converts HTML to JSON using various parsing modes and customizable options. It leverages AngleSharp for HTML parsing and provides flexible output formatting.

✨ Features

  • Multiple parsing modes:
    • Generic: Converts all HTML nodes to JSON
    • Table: Converts HTML tables to structured JSON
    • JSON-LD: Extracts JSON-LD data from HTML
  • Customizable options:
    • New line conversion in values
    • Attribute prefix customization
    • Text property name customization
    • Output indentation control
    • JSON unescaping
    • Inside word trimming
    • Multiple table conversion

🚀 Installation

To use HtmlToJsonParser in your project, you need to install the following NuGet packages:

  • Install-Package AngleSharp
  • Install-Package Newtonsoft.Json

🔍 Parsing Modes

Generic Mode

Converts all HTML nodes to JSON objects and properties.
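
The README does not show the library's own API, so the following is only a minimal sketch of what a generic node-to-JSON conversion could look like when built directly on AngleSharp and Newtonsoft.Json. The class name `GenericModeSketch`, the `@` attribute prefix, and the `#text` property name are assumptions chosen for illustration, not the library's documented defaults.

```csharp
using System;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class GenericModeSketch
{
    // Hypothetical values; the library exposes these as AttributePrefix and
    // TextPropertyName options, but its real defaults are not stated here.
    const string AttributePrefix = "@";
    const string TextPropertyName = "#text";

    // Recursively converts an element into a JSON object:
    // attributes become prefixed properties, child elements become
    // properties named after their tag, and text goes under TextPropertyName.
    public static JObject ToJson(IElement element)
    {
        var obj = new JObject();

        foreach (var attr in element.Attributes)
            obj[AttributePrefix + attr.Name] = attr.Value;

        foreach (var node in element.ChildNodes)
        {
            if (node is IElement child)
                Append(obj, child.LocalName, ToJson(child));
            else if (node.NodeType == NodeType.Text && !string.IsNullOrWhiteSpace(node.TextContent))
                Append(obj, TextPropertyName, node.TextContent.Trim());
        }

        return obj;
    }

    // Stores a value under a name; repeated names are collected into an array.
    static void Append(JObject obj, string name, JToken value)
    {
        var existing = obj[name];
        if (existing == null) obj[name] = value;
        else if (existing is JArray array) array.Add(value);
        else obj[name] = new JArray(existing, value);
    }

    public static void Main()
    {
        var document = new HtmlParser().ParseDocument("<p class=\"note\">Hello <b>world</b></p>");
        var paragraph = document.QuerySelector("p");
        Console.WriteLine(ToJson(paragraph).ToString(Formatting.Indented));
    }
}
```

Collecting repeated names into an array is one possible design choice; the library may represent sibling elements differently.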

Table Mode

Converts HTML tables into a structured JSON format. Each row becomes a JSON object with column headers as keys.
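
As a rough illustration of the row-to-object mapping described above, the sketch below uses AngleSharp and Newtonsoft.Json directly. It assumes the first row of the table holds the column headers and that the table contains no nested tables; `TableModeSketch` and `TableToJson` are hypothetical names, not part of the library.

```csharp
using System;
using System.Linq;
using AngleSharp.Dom;
using AngleSharp.Html.Parser;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class TableModeSketch
{
    // Converts one table element into an array of row objects.
    // Assumption: the first <tr> contains the headers (as <th> or <td>).
    public static JArray TableToJson(IElement table)
    {
        var result = new JArray();
        var rows = table.QuerySelectorAll("tr").ToList();
        if (rows.Count == 0) return result;

        var headers = rows[0]
            .QuerySelectorAll("th, td")
            .Select(cell => cell.TextContent.Trim())
            .ToList();

        foreach (var row in rows.Skip(1))
        {
            var cells = row.QuerySelectorAll("th, td").ToList();
            var obj = new JObject();

            // Cells beyond the number of headers are ignored in this sketch.
            for (int i = 0; i < cells.Count && i < headers.Count; i++)
                obj[headers[i]] = cells[i].TextContent.Trim();

            result.Add(obj);
        }

        return result;
    }

    public static void Main()
    {
        const string html =
            "<table>" +
            "<tr><th>Name</th><th>Age</th></tr>" +
            "<tr><td>Ada</td><td>36</td></tr>" +
            "<tr><td>Alan</td><td>41</td></tr>" +
            "</table>";

        var document = new HtmlParser().ParseDocument(html);

        // Only the first table is converted here; iterating over
        // QuerySelectorAll("table") would cover every table in the page.
        var table = document.QuerySelector("table");
        Console.WriteLine(TableToJson(table).ToString(Formatting.Indented));
    }
}
```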

JSON-LD Mode

Extracts and parses JSON-LD data from HTML documents.
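
JSON-LD is conventionally embedded in `<script type="application/ld+json">` elements, so an extraction step can be sketched as below. This is an illustration built on AngleSharp and Newtonsoft.Json, not the library's actual implementation; `JsonLdSketch` and `Extract` are hypothetical names, and silently skipping malformed blocks is an assumption about error handling.

```csharp
using System;
using AngleSharp.Html.Parser;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class JsonLdSketch
{
    // Collects every parseable JSON-LD block in the document into one array.
    public static JArray Extract(string html)
    {
        var document = new HtmlParser().ParseDocument(html);
        var result = new JArray();

        foreach (var script in document.QuerySelectorAll("script[type='application/ld+json']"))
        {
            try
            {
                result.Add(JToken.Parse(script.TextContent));
            }
            catch (JsonReaderException)
            {
                // Assumption: malformed blocks are skipped rather than reported.
            }
        }

        return result;
    }

    public static void Main()
    {
        const string html = @"<html><head>
<script type=""application/ld+json"">{ ""@type"": ""Person"", ""name"": ""Ada"" }</script>
</head><body></body></html>";

        Console.WriteLine(Extract(html).ToString(Formatting.Indented));
    }
}
```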

⚙️ Options

  • ValueNewLineConversion: Specifies how to handle new lines in text values.
  • AttributePrefix: Sets the prefix for HTML attributes in the JSON output.
  • TextPropertyName: Defines the property name for text nodes.
  • Indent: Controls whether the output JSON is indented.
  • UnescapeJson: Attempts to unescape the input if it appears to be HTML wrapped in a JSON string.
  • TrimInsideWords: Collapses runs of consecutive spaces inside text values into a single space.
  • ConvertAllTables: Controls whether all tables or just the first one should be converted in Table mode.
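
To make the text-handling options more concrete, here is a small sketch of the kind of transformation that TrimInsideWords and ValueNewLineConversion describe, using only the .NET standard library. The helper names and the idea of passing a replacement string for new lines are assumptions; the README does not list the values ValueNewLineConversion accepts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextOptionsSketch
{
    // Collapses two or more consecutive spaces into one,
    // similar in spirit to the TrimInsideWords option.
    public static string CollapseSpaces(string text) =>
        Regex.Replace(text, " {2,}", " ");

    // Replaces every line break (CRLF, CR, or LF) with the given string.
    // The replacement parameter is a hypothetical stand-in for whatever
    // modes ValueNewLineConversion actually supports.
    public static string ReplaceNewLines(string text, string replacement) =>
        Regex.Replace(text, @"\r\n|\r|\n", replacement);

    public static void Main()
    {
        Console.WriteLine(CollapseSpaces("a   b  c"));                        // prints "a b c"
        Console.WriteLine(ReplaceNewLines("line1\r\nline2\nline3", " "));     // prints "line1 line2 line3"
    }
}
```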

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE.md file for details.
