folder2sitemap

Background

I built this tool to generate a sitemap from a webtozip.com download.

It should also work with other website export tools such as HTTrack, SiteSucker, or Archivarix, but I haven't tested it with them.

The download is a zip file that contains the entire website. The root directory contains the site's index.html file, and the rest of the website is organized into subdirectories.

Each directory contains an index.html file that represents the page. The script scans the directories recursively to build a nested structure of the website.

Overview

folder2sitemap is a Node.js script designed to generate a JSON representation of a website's structure based on its directory and file organization.

It scans a specified directory for HTML files, particularly looking for index.html files in each directory to determine the structure of the website.

Note that index.html files ARE REQUIRED for the script to work properly.
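As a rough illustration of why index.html matters, here is a minimal sketch of the recursive scan. It is not the script's actual code: the `buildTree` name is hypothetical, and skipping any directory that lacks an index.html is an assumption about the behaviour. Titles are left out to keep the sketch short.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: build a { slug, children } node for `dir`.
// Assumption: a directory without index.html is not a page and is skipped,
// together with everything below it.
function buildTree(dir, slug = '/') {
  if (!fs.existsSync(path.join(dir, 'index.html'))) return null;

  const node = { slug };
  const children = fs
    .readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((entry) => buildTree(path.join(dir, entry.name), `${slug}${entry.name}/`))
    .filter((child) => child !== null);

  // Only attach `children` when the page actually has sub-pages.
  if (children.length > 0) node.children = children;
  return node;
}
```

Calling `buildTree('./example.com')` would return a nested object whose slugs mirror the directory layout.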

The script extracts the title of each page from the <title> tag in the HTML files and uses the directory structure to create a hierarchical JSON object representing the site's structure.
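Title extraction can be sketched with a single regular expression. This is an illustrative approach under stated assumptions, not necessarily how the script does it: `extractTitle` is a hypothetical name, only the first <title> element is considered, and HTML entities are not decoded.

```javascript
// Hypothetical helper: return the text of the first <title> element, or null.
function extractTitle(html) {
  const match = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (!match) return null;
  // Collapse runs of whitespace so multi-line titles become one line.
  return match[1].replace(/\s+/g, ' ').trim();
}

console.log(extractTitle('<html><head><title>About</title></head></html>'));
```

A regular expression keeps the sketch dependency-free, in line with the "No Dependencies" feature, at the cost of not handling every edge case a real HTML parser would.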

Features

  • Title Extraction: Extracts titles directly from the HTML files to accurately represent each page.
  • Recursive Directory Traversal: Scans directories recursively to build a nested structure of the website.
  • JSON Output: Outputs the website structure in a readable JSON format.
  • CSV Output: Outputs the website structure in CSV format.
  • Exclusion of Directories: Allows you to exclude specific directories from the sitemap generation.
  • Custom Output File: Option to save the output directly to a file.
  • No Dependencies: Requires only Node.js to run.

Installation

  1. Ensure you have Node.js installed on your system.
  2. Clone this repository or download the script to your local machine.

Usage

To use folder2sitemap, run the script from the command line, passing the path to the root directory of your website as an argument:

node folder2sitemap ./example.com

The script will output the structure of your website in JSON format to the console. You can redirect this output to a file if needed:

node folder2sitemap ./example.com > site_structure.json

Saving Output to a File Directly

To save the output directly to a file, use the --output flag followed by the file name:

node folder2sitemap ./example.com --output site_structure.json

Selecting Output Format

By default, the script outputs the website structure in JSON format. If you prefer to output the structure in CSV format, use the --format=csv flag:

node folder2sitemap ./example.com --format=csv

This prints the website structure in CSV format to the console, and you can redirect it to a file as well. For example:

slug,title
"/","Home"
"/about/","About"
"/blog/","Blog"
"/blog/post1/","Post 1"
"/blog/post2/","Post 2"
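The CSV rows are a flattened view of the nested structure. A minimal sketch of that flattening follows. The `toCsv` name is hypothetical, and the quoting rule (wrap every field in double quotes, double any embedded quote) is an assumption based on the sample above.

```javascript
// Hypothetical helper: flatten a { slug, title, children } tree into CSV text.
function toCsv(root) {
  // Quote every field and double any embedded quotes (RFC 4180 style).
  const quote = (value) => `"${String(value).replace(/"/g, '""')}"`;
  const lines = ['slug,title'];

  // Depth-first, parent before children, which yields the order shown above.
  const visit = (node) => {
    lines.push(`${quote(node.slug)},${quote(node.title)}`);
    (node.children || []).forEach(visit);
  };
  visit(root);

  return lines.join('\n');
}
```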

Excluding Directories

To exclude specific directories from the sitemap generation, use the --exclude flag followed by the directory name relative to the site root. You can specify multiple directories to exclude by using multiple --exclude flags. For example:

node folder2sitemap ./example.com --exclude=contentassets --exclude=zh-cn

This command will generate the sitemap without including the directories /contentassets and /zh-cn.
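To make the flag syntax concrete, here is a sketch of how the options in this README could be parsed. It is an assumption-laden illustration rather than the script's real parser: `parseArgs` and `isExcluded` are hypothetical names, and only the forms shown in this README (`--exclude=NAME`, `--format=csv`, `--output FILE`) are handled.

```javascript
// Hypothetical parser for the flags described in this README.
function parseArgs(argv) {
  const options = { root: null, format: 'json', output: null, exclude: [] };

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith('--exclude=')) {
      // Repeated --exclude flags accumulate into a list.
      options.exclude.push(arg.slice('--exclude='.length));
    } else if (arg.startsWith('--format=')) {
      options.format = arg.slice('--format='.length);
    } else if (arg === '--output') {
      options.output = argv[++i] ?? null; // value is the next argument
    } else if (!arg.startsWith('--') && options.root === null) {
      options.root = arg; // first positional argument is the site root
    }
  }
  return options;
}

// A directory is skipped when its path relative to the site root is excluded.
const isExcluded = (relativeDir, options) => options.exclude.includes(relativeDir);
```

In a real script this would typically be called as `parseArgs(process.argv.slice(2))`.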

Example Output

Given a website with a simple structure, the output might look like this:

{
  "slug": "/",
  "title": "Home",
  "children": [
    {
      "slug": "/about/",
      "title": "About"
    },
    {
      "slug": "/blog/",
      "title": "Blog",
      "children": [
        {
          "slug": "/blog/post1/",
          "title": "Post 1"
        },
        {
          "slug": "/blog/post2/",
          "title": "Post 2"
        }
      ]
    }
  ]
}

Visualizing the Sitemap

You can use online tools like JSON Crack to visualize the JSON output in a more structured format. Simply paste the JSON output into the tool to see a visual representation of the website structure.
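For a quick look in the terminal, the same JSON can also be rendered as an indented outline. This is a small sketch that only assumes the `slug`, `title`, and `children` fields shown in the example output. `renderOutline` is a hypothetical helper and not part of the script.

```javascript
// Hypothetical helper: render a { slug, title, children } tree as indented text.
function renderOutline(node, depth = 0) {
  const line = `${'  '.repeat(depth)}${node.title} (${node.slug})`;
  const childLines = (node.children || []).map((child) => renderOutline(child, depth + 1));
  return [line, ...childLines].join('\n');
}
```

After saving the output with `--output site_structure.json`, you could load the file with `JSON.parse` and pass the result to `renderOutline`.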
