radam9 / bookmarks-converter

Parse db/html/json bookmarks file from (Chrome - Firefox - Custom source) and convert it to db/html/json format.

License: MIT License

HTML 48.51% Python 51.49%

bookmarks bookmarks-converter bookmarks-parser html-bookmarks netscape-bookmark json-bookmarks

bookmarks-converter's Introduction

Bookmarks Converter

Bookmarks Converter is a package that converts web page bookmarks between DataBase, HTML, and JSON formats, in any direction. It can be used as a Python module or from the CLI.

The DataBase files supported are custom SQLite database files created by the SQLAlchemy ORM model found in models.py.
The HTML files supported are Netscape-Bookmark files from either Chrome or Firefox. The output HTML files adhere to the Firefox format.
The JSON files supported are the Chrome .json bookmarks file, the Firefox .json bookmarks export file, and the custom json file created by this package.

To see an example of the structure or layout of the DataBase, HTML, or JSON versions supported by the package, check the corresponding file in the data folder on the GitHub page, or see bookmarks_file_structure.md.

Python and OS Support

The package has been tested on Github Actions with the following OSs and Python versions:

OS \ Python	`3.12`	`3.11`	`3.10`	`3.9`
`macos-latest`	✓	✓	✓	✓
`ubuntu-latest`	✓	✓	✓	✓
`windows-latest`	✓	✓	✓	✓

Dependencies

The package relies on the following libraries:

BeautifulSoup4: used to parse the HTML files.
SQLAlchemy: used to create and manage the database files.

Install

Bookmarks Converter is available on PyPI:

python -m pip install bookmarks-converter

Test

To test the package you will need to clone the git repository.

# Cloning with HTTPS
git clone https://github.com/radam9/bookmarks-converter.git

# Cloning with SSH
git clone git@github.com:radam9/bookmarks-converter.git

Then create the environment and install the dependencies using Poetry.

# navigate to repo's folder
cd bookmarks-converter
# install the dependencies
poetry install
# run the tests
poetry run pytest

Usage as Module

from bookmarks_converter import BookmarksConverter

# initialize the class passing in the path to the bookmarks file to convert
bookmarks = BookmarksConverter("/path/to/bookmarks_file")

# parse the file passing the format of the source file; "db", "html" or "json"
bookmarks.parse("html")

# convert the bookmarks to the desired format by passing the format as a string; "db", "html", or "json"
bookmarks.convert("json")

# at this point the converted bookmarks are stored in the 'bookmarks' attribute,
# which can be used directly or exported to a file.
bookmarks.save()

Usage as CLI

# Activate the virtual environment if the "bookmarks-converter" package was installed inside one.

# run bookmarks-converter with the desired settings

# bookmarks-converter input_format output_format filepath
bookmarks-converter db json /path/to/file.db

# use -h to show the help message (shown in the code block below)
bookmarks-converter -h

The help message:

usage: bookmarks-converter [-h] [-V] input_format output_format filepath

Convert your browser bookmarks file from (db, html, json) to (db, html, json).

positional arguments:
  input_format   Format of the input bookmarks file. one of (db, html, json).
  output_format  Format of the output bookmarks file. one of (db, html, json).
  filepath       Path to bookmarks file to convert.

optional arguments:
  -h, --help     show this help message and exit
  -V, --version  show program's version number and exit

License

MIT License

bookmarks-converter's Issues

Feature Request: back-convert from JSON/db format to Chrome/Firefox format

This is a feature request to implement converting back to Chrome or Firefox-formatted JSON from the currently-implemented JSON and/or sqlite3 format.

The reason for this is simply to keep bookmarks in sync. If someone is using your utility to manage and back up bookmarks, it would be amazing to be able to convert directly to the Chrome and Firefox JSON formats in order to replace their respective Bookmarks files programmatically, without having to go through the browser and import an HTML file manually (which could also result in duplicates).

An example workflow would be as follows:

User exports Bookmarks JSON file from Chrome/Firefox to bookmarks.db
User manages/updates bookmarks directly in bookmarks.db (which can be easily implemented in other applications that could require the bookmarks-converter project in requirements.txt)
User converts bookmarks.db back to Chrome/Firefox JSON format using bookmarks-converter
User replaces Chrome/Firefox Bookmarks JSON file with resulting file from bookmarks-converter

Bingo-bango, bookmarks are kept in sync.

This could also be used as a sync mechanism to keep Chrome and Firefox bookmarks in sync: a cron script could use bookmarks-converter to convert from the Firefox native JSON format to the universal Bookmarkie JSON format, and then from the Bookmarkie format to the Chrome native JSON format (and vice versa).

I feel like this could be an absolute game-changer and would cause this project to absolutely explode, as it is currently the only project on the Internet that implements its current abilities, and with the addition of being backward-compatible, it would make this project absolutely unstoppable.

I also feel like this could be implemented fairly easily, since you already know the various JSON formats involved. I would try to help, but I feel like you could do this in a fraction of the time it would take me. The basic changes (at least for Chrome) would be to [re-]re-structure the root object (rename it to roots), remove the extraneous fields, add a checksum (hashlib.sha256(f"fake_placeholder_hash".encode('utf-8')).hexdigest()), convert the 3 child items to dictionary objects so that roots.children is a dictionary instead of a list, and rename the appropriate keys back to the Chrome-specific naming conventions.
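
The restructuring described in this paragraph could be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's code: the input is assumed to be a list of exactly three root folders whose nodes use the keys title, type, url, and children, and the order of the three roots is assumed to match bookmark_bar, other, synced. The function names are hypothetical. The checksum is the placeholder hash from the paragraph, not a value Chrome would compute, and a real Chrome file carries more fields (ids, timestamps) that are omitted here.

```python
import hashlib

# Assumed mapping from position in the custom root list to Chrome's root keys.
CHROME_ROOT_KEYS = ("bookmark_bar", "other", "synced")


def to_chrome_node(node):
    """Convert one (assumed) custom-format node into a Chrome-style node."""
    chrome = {"name": node.get("title", ""), "type": node["type"]}
    if node["type"] == "folder":
        # Recurse so nested folders keep their structure.
        chrome["children"] = [to_chrome_node(c) for c in node.get("children", [])]
    else:
        chrome["url"] = node["url"]
    return chrome


def to_chrome_bookmarks(custom_roots):
    """Turn a list of three root folders into a dict keyed by Chrome root name."""
    if len(custom_roots) != len(CHROME_ROOT_KEYS):
        raise ValueError("expected exactly three root folders")
    roots = {
        key: to_chrome_node(folder)
        for key, folder in zip(CHROME_ROOT_KEYS, custom_roots)
    }
    return {
        # Placeholder only; this is not how Chrome derives its checksum.
        "checksum": hashlib.sha256(b"fake_placeholder_hash").hexdigest(),
        "roots": roots,
    }
```

Whether Chrome accepts a file with a placeholder checksum is something that would need to be verified against a real profile.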

By implementing this back-conversion functionality, you could additionally implement direct Chrome-to-Firefox and Firefox-to-Chrome functionalities that essentially do the exact same thing but hide the middle step from the user. You would convert Firefox JSON to universal JSON/sqlite, then universal JSON/sqlite to Chrome JSON (and vice-versa).

I sincerely hope that you take this into serious consideration, as I believe it could be implemented quite easily. Please let me know of any potential caveats that you could see that could prevent this functionality.

Import custom JSON based on file system hierarchy?

I am building a bookmark manager and would like to include your project, but when I generate custom JSON and try to convert it, it messes everything up.

I have a feeling this comes from the IDs, which appear to determine folder hierarchy.

Would there be a way to generate custom IDs to auto-generate the hierarchy? The folder stack appears completely correct, but the final output is unable to be imported without looking messed up.

I have attached a few example files for reference.

bookmarks_troubleshooting.zip
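
One way to generate consistent IDs from a nested folder structure is a depth-first walk that numbers each node and records its parent. The sketch below assumes the custom JSON uses the keys id, parent_id, index, and children; those names are an assumption for illustration and should be checked against bookmarks_file_structure.md. The function name assign_ids is hypothetical.

```python
from itertools import count


def assign_ids(node, parent_id=None, index=0, counter=None):
    """Return a copy of `node` with id, parent_id and index filled in depth-first."""
    if counter is None:
        counter = count(1)  # ids start at 1 for the root
    # Copy every field except the children, which are rebuilt below.
    numbered = {key: value for key, value in node.items() if key != "children"}
    numbered["id"] = next(counter)
    numbered["parent_id"] = parent_id
    numbered["index"] = index  # position of the node inside its parent
    if "children" in node:
        numbered["children"] = [
            assign_ids(child, numbered["id"], position, counter)
            for position, child in enumerate(node["children"])
        ]
    return numbered
```

Because the ids are derived from the nesting itself, the parent_id of every node always points at the folder that actually contains it.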

json input file from Android Chromium browser, wrong file structure

My best regards. I've been using Adblock Browser (a Chromium browser for Android) for a long time and I wanted to export its bookmarks. Unfortunately, unlike other Chromium browsers available for Android, it has the export feature disabled (one of the reasons behind the switch), so I manually saved the bookmarks file (bookmarks_unedited.json).
It clearly looks like JSON to me, but bookmarks-converter doesn't accept it as valid. I then checked the expected file structure here and found that it isn't the same, so I tried editing the original file to obtain bookmarks.json, but it still complains.
I then found this javascript (thanks to one of the suggestions given here), which worked like a charm and solved my problem, but I'd still like to understand what I'm doing wrong here. This is clearly not an issue with bookmarks-converter itself, and I want to stress that.
Thanks in advance and have a nice day!! :)

Adding support for Safari Bookmarks

I'm interested in helping add support for Safari bookmarks.

I have done some research, and here's what I know so far:

The default location where Safari stores bookmarks on macOS is ~/Library/Safari/Bookmarks.plist (ref)
The file is stored in Apple's binary property list format, which can be broken down by following this awesome guide
Luckily, we also have plistlib which should be very helpful for parsing the file
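
As a starting point, here is a minimal sketch of reading such a file with plistlib. The key names (WebBookmarkType, Children, Title, URLString, URIDictionary) reflect the commonly described layout of Bookmarks.plist and are an assumption here; they should be verified against a real file. The function names are hypothetical and not part of bookmarks-converter.

```python
import plistlib


def iter_safari_bookmarks(node, path=()):
    """Yield (folder_path, title, url) for each bookmark leaf under `node`."""
    kind = node.get("WebBookmarkType")
    if kind == "WebBookmarkTypeLeaf":
        title = node.get("URIDictionary", {}).get("title", "")
        yield path, title, node.get("URLString", "")
    elif kind == "WebBookmarkTypeList":
        title = node.get("Title", "")
        # The root list usually has an empty title, so skip it in the path.
        child_path = path + (title,) if title else path
        for child in node.get("Children", []):
            yield from iter_safari_bookmarks(child, child_path)
    # Any other node type is ignored in this sketch.


def load_safari_bookmarks(data: bytes):
    """Parse plist bytes (binary or XML) and return a flat list of bookmarks."""
    root = plistlib.loads(data)  # format is detected automatically
    return list(iter_safari_bookmarks(root))
```

To try it on a real profile, read the file in binary mode, for example `load_safari_bookmarks(open(path, "rb").read())`. Note that macOS may restrict access to this file.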

Looking at the code, I found what seems to be a well-organized structure for how this library coordinates its different components/classes. I was hoping there is a guide that could walk me through the steps to add support for this.

Thank you

Combine PRs security fix

I'm the maintainer of https://github.com/hrvey/combine-prs-workflow and we just made a new release - https://github.com/hrvey/combine-prs-workflow/releases/tag/1.2.0 - to fix a potential injection attack based on a PR with a malicious branch name. I wanted to let you know since a GitHub search showed me you're using a prior version.

Cannot parse Chrome Bookmarks file, and _iterate_folder_html doesn't appear to be recursive

core.py looks for a list in the roots dict if the browser is Chrome. However, the Bookmarks file looks like this:

{ 
  "checksum": "example",
  "roots": {
    "bookmark_bar": { "name": "Bookmarks bar",  "id": "1", "children": [(stuff)], ...},
    "other": { "name": "Other Bookmarks", "id": "2", "children": [], ...},
    "synced": { "name": "Mobile Bookmarks", "id": "3", "children": [], ...}
  }
}

Trying to parse this file yields an error because, as I said, it's looking for a list instead of a dict.
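
For illustration, the dict-shaped roots object can be turned into the list a parser might expect with a small helper. This is a sketch of one possible workaround, not the code in core.py; the function name chrome_root_folders is hypothetical.

```python
import json


def chrome_root_folders(bookmarks):
    """Return Chrome's root folders as a list, in the order they appear in the file."""
    roots = bookmarks["roots"]
    if not isinstance(roots, dict):
        raise TypeError("expected 'roots' to be a dict keyed by root name")
    # Defensive filter: keep only values that look like folder nodes.
    return [
        node for node in roots.values()
        if isinstance(node, dict) and "children" in node
    ]


raw = """
{
  "checksum": "example",
  "roots": {
    "bookmark_bar": {"name": "Bookmarks bar", "id": "1", "children": []},
    "other": {"name": "Other Bookmarks", "id": "2", "children": []},
    "synced": {"name": "Mobile Bookmarks", "id": "3", "children": []}
  }
}
"""
folders = chrome_root_folders(json.loads(raw))
print([folder["name"] for folder in folders])
```

Passing the whole list of root folders, rather than only roots['bookmark_bar']['children'], keeps the three top-level folders intact.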

However, if I try to pass in roots['bookmark_bar']['children'] as the actual JSON to parse, this "works", but the html output is nothing that can be imported in any logical order because of a possible oversight (?) in core.py in the _iterate_folder_html method. A short snippet of the final html output I get is as follows (note that my "categories" folder in my "Bookmarks bar" is quite extensive with lots of nested folders):

<TITLE>Bookmarks</TITLE>
<H1>Bookmarks Menu</H1>

<DL><p>
<DT><H3 ADD_DATE="1655413626" LAST_MODIFIED="0" PERSONAL_TOOLBAR_FOLDER="true">Bookmarks bar</H3>
<DL><p>
<DT><H3 ADD_DATE="1655413626" LAST_MODIFIED="0">categories</H3>
<DL><p>
<folder4><folder5><folder6><folder7><folder10><folder14><folder15><folder16><folder17><folder18><folder19><folder20><folder22><folder23><folder76><folder94><folder95><folder122><folder123><folder124><folder125><folder126><folder128><folder132><folder133><folder135><folder136><folder137><folder138><folder139><folder140><DT><H3 ADD_DATE="1655413626" LAST_MODIFIED="0">mon</H3>
<DL><p>
<DT><H3 ADD_DATE="1655413626" LAST_MODIFIED="0">alertsite</H3>
<DL><p>
<DT><A HREF="http://www.alertsite.com/cgi-bin/helpme.cgi?page=monitoring_locations.html" ADD_DATE="1655413626" LAST_MODIFIED="0" ICON_URI="None" ICON="None">AlertSite Monitoring Locations-IPs</A>
</DL><p>

Based on the code, it is my understanding that these placeholders are supposed to get replaced with the actual information that resides in the bookmark stack (which is absolutely correct when viewed with a debugger), but they don't seem to be getting replaced or are overlooked somehow from non-recursion or something.
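
For comparison, a fully recursive renderer never needs placeholders, because each folder emits its children inline before closing its own list. The sketch below is not the library's _iterate_folder_html; it assumes nodes with the keys type ("folder" or "url"), title, url, and children, and it leaves out attributes such as ADD_DATE for brevity. The function name render_node is hypothetical.

```python
from html import escape


def render_node(node, depth=1):
    """Return the Netscape-style HTML lines for `node` and everything below it."""
    indent = "    " * depth
    if node["type"] == "url":
        href = escape(node["url"], quote=True)
        return [f'{indent}<DT><A HREF="{href}">{escape(node["title"])}</A>']
    # Folder: open the list, render every child recursively, then close it.
    lines = [f'{indent}<DT><H3>{escape(node["title"])}</H3>', f"{indent}<DL><p>"]
    for child in node.get("children", []):
        lines.extend(render_node(child, depth + 1))
    lines.append(f"{indent}</DL><p>")
    return lines
```

Calling `"\n".join(render_node(folder))` on a folder yields its whole subtree in document order, so deeply nested folders such as "categories" come out in place. Very deep trees could hit Python's recursion limit, which an explicit stack would avoid.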

That being said, initially parsing a bookmarks.html file into the Bookmarkie formatted JSON, and subsequently parsing back to HTML works just fine.

The trouble seems to happen when parsing the native Chrome Bookmarks JSON file, where things seem to get all out of order.