laxamentumtech / audnexus Goto Github PK

An audiobook data aggregation API that harmonizes data from multiple sources into a unified stream. It offers a consistent and user-friendly source of audiobook data for various applications.

Home Page: https://audnex.us/

License: GNU General Public License v3.0

Dockerfile 0.21% TypeScript 99.28% JavaScript 0.51%

audiobooks api typescript fastify redis mongodb metadata papr docker audnexus

audnexus's Introduction

audnexus

An audiobook data aggregation API, combining multiple sources of data into one, consistent source.

📝 Table of Contents

About
Getting Started
Deployment
Usage
Built Using
TODO
Contributing
Authors
Acknowledgments

🧐 About

Nexus - noun: a connection or series of connections linking two or more things.

Looking around for audiobook metadata, we realized there's no solid (or open) single source of truth. Further, some solutions had community curated data, only to close their API. As such, this project has been created to enable development to include multiple sources of audiobook content in one response.

This project also makes integration into existing media servers very streamlined. Since all data can be returned with 1-2 API calls, there's little to no overhead processing on the client side. This enables rapid development of stable client plugins. Audnexus serves as a provider during the interim of waiting for a community driven audiobook database, at which time audnexus will be a seeder for such a database.

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

There are 2 ways to deploy this project - for the purposes of this project, this guide will only cover Docker deployment:
- Docker Swarm
- Directly, via pnpm run or pm2
  - Mongo 4 or greater
  - Node/NPM 16 or greater
  - Redis
- Registered Audible device keys, ADP_TOKEN and PRIVATE_KEY, for chapters. You will need Python and audible for this. More on that here

Installing locally

Install Mongo, Node and Redis on your system
pnpm install from project directory to get dependencies
Set ADP_TOKEN and PRIVATE_KEY environment variables as mentioned above if you are using the chapters endpoint.
pnpm run watch-debug to start the server

Test an API call with

http://localhost:3000/books/${ASIN}

🔧 Running the tests

Tests for this project use the Jest framework. Tests can be done locally in a dev environment:

pnpm test

After the tests have run, you may also browse the test coverage. This is generated in coverage/lcov-report/index.html under the project directory.

🎈 Usage

API usage documentation can be read here: https://audnex.us/

Pre-rendered HTML documentation is also included in docs/index.html.

HTML can be re-generated from the spec, using:

pnpm run build-docs

🚀 Deployment

Once you have Docker Swarm setup, grab the docker-compose.yml from this repo, and use it to start the stack. Using something like Portainer for a Swarm GUI will make this much easier.

The stack defaults to 15 replicas for the node-server container. Customize this as needed.

Environment variables to add:

NODE_ADP_TOKEN: Aforementioned ADP_TOKEN value
NODE_MAX_REQUESTS: Maximum amount of requests per 1 minute period from a single source (default 100)
NODE_MONGODB_URI: MongoDB connection URL, such as mongodb://mongo/audnexus
NODE_PRIVATE_KEY: Aforementioned PRIVATE_KEY value
NODE_REDIS_URL: Redis connection URL, such as redis://redis:6379
NODE_UPDATE_INTERVAL: Frequency (in days) to run scheduled update tasks (default 30). Update task is also run at startup.
NODE_UPDATE_THRESHOLD: Minimum number of days after an item is updated, to allow it to check for updates again (either scheduled or param).
TRAEFIK_DOMAIN: FQDN for the API server
TRAEFIK_EMAIL: Email to register SSL cert with

Once the stack is up, test an API call with

https://${TRAEFIK_DOMAIN}/books/${ASIN}

Set up DB indexes to keep item lookups fast and to support searches.

Connect to the DB either from inside the mongodb container terminal or a MongoDB Compass/MongoSH session.
Switch to the correct DB:
```
use audnexus
```

Create the recommended indexes:

db.authors.createIndex( { asin: 1, region: 1 } )

db.books.createIndex( { asin: 1, region: 1 } )

db.chapters.createIndex( { asin: 1, region: 1 } )

db.authors.createIndex( { name: "text" } )

⛏️ Built Using

Fastify - Server Framework
MongoDB - Database
NodeJs - Server Environment
Papr - Databse connection
Redis - Cached responses

✍️ Authors

@djdembeck - Idea & Initial work

🎉 Acknowledgements

Huge thanks to mkb79 and their audible project for a great starting point.
macr0dev for introducing us to scraping.
seanap for passionately standardizing audiobook organization.
Bookcamp for giving us a reason to have awesome audiobook data.

audnexus's People

Contributors

Stargazers

Watchers

Forkers

jeffjose oasis256 ichlmoku csandman l2g2h justcallmelarry

audnexus's Issues

Write documentation

README
OpenAPI docs

make "https://api.audible.*" configurable

As I learned today, every audible domain has its own catalog, so some titles are not available in other languages (I think trading rights are in the way..)
So it would be nice to allow asking other domains, maybe at first via environment variables, but in the long run like https://api.audnex.us/de/books (and with https://api.audnex.us/books referring to https://apt.audnex.us/com/books for compatibility)

update `ChapterHelper` `requiredKeys` implementation to match `ApiHelper`

Promise may not be fulfilled with 'undefined' when statusCode is not 204

On asin B002V1NOW2

Attempt call again on error 500 of api

Errors such as:

{"level":50,"time":1633376518566,"pid":1,"hostname":"5e18cf643200","reqId":"req-226t","req":{"method":"GET","url":"/books/B008LUTE9E","hostname":"api.audnex.us","remoteAddress":"192.168.16.2","remotePort":44364},"res":{"statusCode":500},"err":{"type":"FetchError","message":"request to https://www.audible.com/pd/B008LUTE9E/ failed, reason: getaddrinfo EAI_AGAIN www.audible.com","stack":"FetchError: request to https://www.audible.com/pd/B008LUTE9E/ failed, reason: getaddrinfo EAI_AGAIN www.audible.com\n    at ClientRequest.<anonymous> (/app/node_modules/node-fetch/lib/index.js:1461:11)\n    at ClientRequest.emit (node:events:394:28)\n    at TLSSocket.socketErrorListener (node:_http_client:447:9)\n    at TLSSocket.emit (node:events:394:28)\n    at emitErrorNT (node:internal/streams/destroy:157:8)\n    at emitErrorCloseNT (node:internal/streams/destroy:122:3)\n    at processTicksAndRejections (node:internal/process/task_queues:83:21)","errno":"EAI_AGAIN","code":"EAI_AGAIN","name":"FetchError"},"msg":"request to https://www.audible.com/pd/B008LUTE9E/ failed, reason: getaddrinfo EAI_AGAIN www.audible.com"}

Getting details for this book returns a 500 error

https://api.audnex.us/books/B00WJ7RCB2

It looks like its expecting a field from Audible which it doesn't get, and hence its erroring.

Using region "au", book is not returning

I am using the Audnexus API (through the Audiobookshelf server application) and it is not returning specific books. I can get the book to return correctly through the Audible API, but using the ASIN provided by the Audible API with the Audnexus API, is returning a 500.

Here are the two calls:

Audible API
https://api.audible.com.au/1.0/catalog/products?title=nolyn

Audnexus API
https://api.audnex.us/books/B089XC53RW?region=au

I initially opened an issue on the Audiobookshelf github page but the problem was narrowed down to Audnexus.

advplyr/audiobookshelf#1191

Reduce CPU usage

Either through less query usage or better processing. CPU on 2vcpu is stuck at 100%

Any reason not to use the Audible API for your "extended" genres?

In another issue I made I remember you mentioning that you wanted this to be an API first app with heavy safeguards for anything scraped from HTML. However, when looking through your code I noticed that you're using cheerio to scrape the extended genres from each book. I was curious if you were aware of the category_ladders response group when using the Audible API, and if you are, is there any reason you're not using that instead?

For example, the response for this endpoint:

https://api.audible.com/1.0/catalog/products/B002UZKI96?response_groups=contributors,media,product_attrs,product_desc,product_extended_attrs,product_plan_details,product_plans,rating,review_attrs,reviews,sample,sku,category_ladders

or more simply:

https://api.audible.com/1.0/catalog/products/B002UZKI96?response_groups=category_ladders

gives you the following object in your response:

{
  "product": {
    "asin": "B002UZKI96",
    "category_ladders": [
      {
        "ladder": [
          {
            "id": "18572091011",
            "name": "Children's Audiobooks"
          },
          {
            "id": "18572092011",
            "name": "Action & Adventure"
          }
        ],
        "root": "Genres"
      },
      {
        "ladder": [
          {
            "id": "18572091011",
            "name": "Children's Audiobooks"
          }
        ],
        "root": "Genres"
      },
      {
        "ladder": [
          {
            "id": "18572091011",
            "name": "Children's Audiobooks"
          },
          {
            "id": "18572491011",
            "name": "Literature & Fiction"
          },
          {
            "id": "18572505011",
            "name": "Family Life"
          }
        ],
        "root": "Genres"
      },
      {
        "ladder": [
          {
            "id": "18573267011",
            "name": "Education & Learning"
          }
        ],
        "root": "Genres"
      },
      {
        "ladder": [
          {
            "id": "18574784011",
            "name": "Relationships, Parenting & Personal Development"
          },
          {
            "id": "18574814011",
            "name": "Relationships"
          }
        ],
        "root": "Genres"
      }
    ]
  },
  "response_groups": [
    "always-returned",
    "category_ladders"
  ]
}

This response_group is available for both the search and the individual product details endpoints and as far as I can tell it returns all of the genres and tags you're including in your extended genre field. All you have to do is filter the genres to unique ASINs and you're good to go.

Just figured I'd let you know in case you weren't aware this field was available or ask about your reasoning for not using them if you were, as it could influence whether or not I use them in my own app.

API status

Before filing any issues, please first check the uptime/status of any endpoint you may be having trouble with:

https://status.audnex.us/

Incidents and downtimes can be reported in this issue.

Question: Why do you need a token for chapters?

I've been trying to set up a docker build of this project for the first time, and I'm curious about something. Why do you require ADP_TOKEN for hitting the /chapters endpoint? As far as I'm aware, the /content/asin/metadata endpoint is always public. I've been using it to get chapters in my own project for a while and have never seen a reason to include an auth token.

Here's an example endpoint you can access in your browser fine without authentication: https://api.audible.com/1.0/content/B07FMMSF5H/metadata?response_groups=always-returned%2Ccontent_url%2Cchapter_info%2Ccontent_reference&drm_type=Adrm&quality=High

Just curious!

Find chapters by asin returning internal server error 500

https://api.audnex.us/books/B002V8DFQ0/chapters

According to docs, it should be returning a 404 if there are no chapters found

Cc: Neurrone/beets-audible#6

TypeError : Cannot read property 'length' of undefined

ASIN: B00WNBF0RM

[2021-09-07T02:54:40.265Z] INFO  aorus-283611/API: => GET /book?asin=B00WNBF0RM
[2021-09-07T02:54:40.267Z] INFO  aorus-283611/API:    Call 'book.getBook' action
[2021-09-07T02:54:41.544Z] INFO  aorus-283611/TRACER: ┌──────────────────────────────────────────────────────────────────────────────────────────────────┐
[2021-09-07T02:54:41.544Z] INFO  aorus-283611/TRACER: │ ID: dceea731-24ad-4111-bfbd-9a9f9a1094c3                                       Depth: 2 Total: 2 │
[2021-09-07T02:54:41.545Z] INFO  aorus-283611/TRACER: ├──────────────────────────────────────────────────────────────────────────────────────────────────┤
[2021-09-07T02:54:41.545Z] INFO  aorus-283611/TRACER: │ GET /api/book?asin=B00WNBF0RM ×                    1s [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] │
[2021-09-07T02:54:41.545Z] INFO  aorus-283611/TRACER: │ └─── action 'book.getBook' ×                       1s [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■.] │
[2021-09-07T02:54:41.545Z] INFO  aorus-283611/TRACER: └──────────────────────────────────────────────────────────────────────────────────────────────────┘
[2021-09-07T02:54:41.545Z] ERROR aorus-283611/API:    Request error! TypeError : Cannot read property 'length' of undefined 
 TypeError: Cannot read property 'length' of undefined
    at stitchHelper.setSeriesOrder (/home/djdembeck/projects/audnexus/helpers/audibleStitch.js:17:24)
    at stitchHelper.process (/home/djdembeck/projects/audnexus/helpers/audibleStitch.js:34:14)
    at /home/djdembeck/projects/audnexus/services/book.service.js:52:35
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Service.handler (/home/djdembeck/projects/audnexus/services/book.service.js:50:24)
    at async Service.callAction (/home/djdembeck/projects/audnexus/node_modules/moleculer-web/src/index.js:621:16)
    at async /home/djdembeck/projects/audnexus/node_modules/moleculer-web/src/index.js:446:22 
Data: undefined

tests: write new tests that expect errors

Tracking them here before writing the tests:

2022-06-23T20:37:02.097167980Z {"level":50,"time":1656016622094,"pid":1,"hostname":"7cbefc5a2912","reqId":"req-16t","req":{"method":"GET","url":"/authors/B001HOUWOW","hostname":"localhost:3000","remoteAddress":"127.0.0.1","remotePort":57378},"res":{"statusCode":500},"err":{"type":"Error","message":"Author name not available","stack":"Error: Author name not available\n at ScrapeHelper.parseResponse (/app/dist/helpers/authors/audible/ScrapeHelper.js:133:19)\n at ScrapeHelper.process (/app/dist/helpers/authors/audible/ScrapeHelper.js:143:21)\n at runMicrotasks (<anonymous>)\n at processTicksAndRejections (node:internal/process/task_queues:96:5)\n at async Object.<anonymous> (/app/dist/config/routes/authors/show.js:48:28)"},"msg":"Author name not available"}

2022-06-23T21:28:05.162185566Z An error has occured while scraping HTML 500: https://www.audible.com/pd/B00QHBQZV4/

2022-06-24T11:33:45.187577413Z {"level":50,"time":1656070425186,"pid":1,"hostname":"15b8c377c8fa","reqId":"req-6ca","req":{"method":"GET","url":"/books/B06Y12KS5X/chapters","hostname":"dev.audnex.us","remoteAddress":"10.0.1.9","remotePort":35092},"res":{"statusCode":500},"err":{"type":"FetchError","message":"request to https://api.audible.com/1.0/content/B06Y12KS5X/metadata?response_groups=chapter_info failed, reason: read ECONNRESET","stack":"FetchError: request to https://api.audible.com/1.0/content/B06Y12KS5X/metadata?response_groups=chapter_info failed, reason: read ECONNRESET\n at ClientRequest.<anonymous> (/app/node_modules/.pnpm/[email protected]/node_modules/node-fetch/lib/index.js:1491:11)\n at ClientRequest.emit (node:events:520:28)\n at TLSSocket.socketErrorListener (node:_http_client:442:9)\n at TLSSocket.emit (node:events:520:28)\n at emitErrorNT (node:internal/streams/destroy:157:8)\n at emitErrorCloseNT (node:internal/streams/destroy:122:3)\n at processTicksAndRejections (node:internal/process/task_queues:83:21)","errno":"ECONNRESET","code":"ECONNRESET","name":"FetchError"},"msg":"request to https://api.audible.com/1.0/content/B06Y12KS5X/metadata?response_groups=chapter_info failed, reason: read ECONNRESET"}

Plex Plug-In doesn't pull any data

When using the audnexus plug in for Plex, nothing is found. No Match Found constantly shows for author and album and nothing will match.

Unexpected token < in JSON at position 0

B07QG4DZFS, B002V5GRXG

"msg":"invalid json response body at https://api.audible.com/1.0/catalog/products/B07QG4DZFS/?response_groups=contributors,product_desc,product_extended_attrs,product_attrs,media reason: Unexpected token < in JSON at position 0"}

Fix all `any` types

Main occurences are request, reply, fastify, and options

Got an "Internal Server Error" when calling the find a book API

For example this call:
https://api.audnex.us/books/B086VNH61Z
The response is:
{"statusCode":500,"error":"Internal Server Error","message":"Required key 'authors' does not exist in Audible API response for ASIN B086VNH61Z"}

Make sure tests are properly isolated and mocked

Some tests still call outside helpers

DRY for ShowHelper's

Currently, AuthorShowHelper and BookShowHelper share over 90% of the same logic. This could probably be made generic to work with both and remove duplicate code and simplify testing.

ChapterShowHelper has some different logic, so that would need some extra additions to the generics, but otherwise could mesh.

Normalize people names

This is largely an issue when a source like Audible lists a narrator with different punctuation ["R.C. Bray", "R.C Bray", "RC Bray", "R C Bray"]
Maybe do this by seeking an external source if the name matches?

FastifyDeprecation: Variadic listen method is deprecated

[FSTDEP011] FastifyDeprecation: Variadic listen method is deprecated. Please use ".listen(optionsObject)" instead. The variadic signature will be removed in fastify@5.

If `image` doesn't exist on an entry in the DB, it cannot be updated with new logic

{asin: '1774241250'}

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Warning

These dependencies are deprecated:

Datasource	Name	Replacement PR?
npm	`standard-version`

Pending Status Checks

These updates await pending status checks. To force their creation now, click the checkbox below.

build(deps): Upgrade @types/node to v20.16.1

Other Branches

These updates are pending. To force PRs open, click the checkbox below.

build(deps): Replace standard-version with commit-and-tag-version 9.5.0

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

build(deps): Upgrade axios to v1.7.4
build(deps): Upgrade cheerio to v1.0.0
build(deps): Upgrade typescript-eslint monorepo to v8.1.0 (@typescript-eslint/eslint-plugin, @typescript-eslint/parser)
build(deps): Lock file maintenance
Click on this checkbox to rebase all open PRs at once

Detected dependencies

docker-compose

docker-compose.yml

ghcr.io/laxamentumtech/audnexus latest@sha256:cdb7159b7ec0576e2a00a944709859e8fdf51bdb414aedf48093745abcf162ff

mongo 7@sha256:ae1cf99fa7bfb007db8416ad4f3980c46054d949fa55d28e6d301a813fee6c06

redis alpine@sha256:eaea8264f74a95ea9a0767c794da50788cbd9cf5223951674d491fa1b3f4f2d2

traefik v3.1@sha256:ec1a82940b8e00eaeef33fb4113aa1d1573b2ebb6440e10c023743fe96f08475

dockerfile

Dockerfile

node lts-alpine@sha256:eb8101caae9ac02229bd64c024919fe3d4504ff7f329da79ca60a04db08cef52

github-actions

.github/workflows/conventional-commits.yml

actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332

webiny/action-conventional-commits v1.3.0@8bc41ff4e7d423d56fa4905f6ff79209a78776c7

.github/workflows/deploy-caprover.yml

actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332

actions/setup-node v4@1e60f620b9541d16bece96c5465dc8ee9832be0b

.github/workflows/docker-publish.yml

actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332

docker/login-action 9780b0c442fbb1117ed29e0efdff1e18412f7567

docker/metadata-action 60a0d343a0d8a18aedee9d34e62251f752153bdb

docker/build-push-action 5cd11c3a4ced054e52742c5fd54dca954e0edd85

.github/workflows/jest-coverage.yml

actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332

pnpm/action-setup v4.0.0@fe02b34f77f8bc703788d5817da081398fad5dd2

ArtiomTr/jest-coverage-report-action v2@c026e98ae079f4b0b027252c8e957f5ebd420610

.github/workflows/node.js.yml

actions/checkout v4@692973e3d937129bcbf40652eb9f2f61becf3332

pnpm/action-setup v4.0.0@fe02b34f77f8bc703788d5817da081398fad5dd2

actions/setup-node v4@1e60f620b9541d16bece96c5465dc8ee9832be0b

.github/workflows/release-please.yml

googleapis/release-please-action v4@7987652d64b4581673a76e33ad5e98e3dd56832f

npm

package.json

@fastify/cors 9.0.1

@fastify/helmet ^11.0.0

@fastify/rate-limit 9.1.0

@fastify/redis 6.2.0

@fastify/schedule ^4.1.1

axios ^1.2.4

cheerio 1.0.0-rc.12

domhandler 5.0.3

fastify 4.28.1

html-to-text 9.0.5

jest-mock-extended 3.0.7

jsrsasign ^11.0.0

lodash 4.17.21

module-alias 2.2.3

moment ^2.29.4

mongodb 6.8.0

papr 15.2.2

toad-scheduler ^3.0.0

typescript 5.5.4

zod ^3.20.6

@eslint/compat 1.1.1

@eslint/eslintrc 3.1.0

@eslint/js 9.9.0

@jest/types 29.6.3

@redocly/cli 1.19.0

@types/html-to-text 9.0.4

@types/jest 29.5.12

@types/jsrsasign 10.5.14

@types/lodash 4.17.7

@types/node 20.14.15

@typescript-eslint/eslint-plugin 8.0.1

@typescript-eslint/parser 8.0.1

concurrently 8.2.2

eslint 9.9.0

eslint-config-prettier 9.1.0

eslint-plugin-jest 28.8.0

eslint-plugin-simple-import-sort 12.1.1

globals 15.9.0

jest 29.7.0

nodemon 3.1.4

prettier 3.3.3

standard-version 9.5.0

ts-jest 29.2.4

ts-node 10.9.2

nvm

.nvmrc

Check this box to trigger a request for Renovate to run again on this repository

Region is ignored for /authors

Example:
"Jussi Adler-Olsen": "B0043BVIJE"

https://api.audnex.us/authors/B0043BVIJE?region=de
=> returns an English description

https://www.audible.de/author/Jussi-Adler-Olsen/B0043BVIJE
=> Audible has a German description

audiobooks from audible.de (german audible) has sometime no series data

Hey,

nice project, thanks for that!

Some audiobooks from audible.de in the audnex-API (public) has no series data. The json-element doesn't exist.

Here are two books that are working and two that aren't working.
Is this a problem off audible? I think not, because the series data is visible on the normal audible website.

Working audiobooks

not working

I hope you can use the sample books, when you need more, I can try to find some and add it here.

Remove key for loop

Write update scheduler

Refine log levels

Messages such as 'updating asin' can probably be info, and add others to debug.

Potentially switch to fastify logger:
https://www.fastify.io/docs/latest/Reference/Logging/

Update incorrect???, Adopt text format from content text

Hello,

you have already solved the problem I described here and also submitted it as a pull request. However, it looks like the update is not executed. Can you please check this again and restart or repair if necessary?

Is it possible to transfer the content text you retrieve with the tags and categories for example from audible.de with their text formatting (blank lines, structure)? You don't need the font but it would be very nice if it would be possible that the text has the same structure as on the proviederseiten (e.g. the website of audible.de or audible.com).

Podcast support

Some requests to https://api.audnex.us/ failing with 504 gateway time out error

Some requests to https://api.audnex.us/ are failing with a 504 error...

% curl -v https://api.audnex.us/books/B083QSLMGD
*   Trying 104.21.52.190:443...
* Connected to api.audnex.us (104.21.52.190) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=sni.cloudflaressl.com
*  start date: Aug  9 00:00:00 2022 GMT
*  expire date: Aug  9 23:59:59 2023 GMT
*  subjectAltName: host "api.audnex.us" matched cert's "*.audnex.us"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fd57700ca00)
> GET /books/B083QSLMGD HTTP/2
> Host: api.audnex.us
> user-agent: curl/7.79.1
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 504 
< date: Wed, 07 Sep 2022 03:52:53 GMT
< content-type: text/html
< cf-cache-status: BYPASS
< report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=XbNWX7BJKnf2FzhMoHk%2FNjPNHNTHYR7cAQhgGUAHmmVMiCZCPRbPh%2B1iGdiFdfYjKJmEn%2FoOQDgygl11zXZmjxItjhxEGo2cw59fJ0vg89Ikir53GUP9ZfraX9SDB35t"}],"group":"cf-nel","max_age":604800}
< nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
< server: cloudflare
< cf-ray: 746c7d1059878986-SIN
< alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
< 
<html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Connection #0 to host api.audnex.us left intact

While others are fine...

% curl -v https://api.audnex.us/books/0241548705
*   Trying 172.67.202.222:443...
* Connected to api.audnex.us (172.67.202.222) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Cloudflare, Inc.; CN=sni.cloudflaressl.com
*  start date: Aug  9 00:00:00 2022 GMT
*  expire date: Aug  9 23:59:59 2023 GMT
*  subjectAltName: host "api.audnex.us" matched cert's "*.audnex.us"
*  issuer: C=US; O=Cloudflare, Inc.; CN=Cloudflare Inc ECC CA-3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fe9b7811400)
> GET /books/0241548705 HTTP/2
> Host: api.audnex.us
> user-agent: curl/7.79.1
> accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 200 
< date: Wed, 07 Sep 2022 03:54:24 GMT
< content-type: application/json; charset=utf-8
< content-length: 2100
< vary: Origin
< last-modified: Fri, 02 Sep 2022 03:02:47 GMT
< cache-control: max-age=432000
< cf-cache-status: HIT
< accept-ranges: bytes
< report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=OmAZbsZahiSZNkqjgFKNkh%2Ft2zKdukVvXjTPUC%2FM1pf3VvhXXrovO8ofHAuK7he70Em5mrDkIICP3W4JOC2y7S6Ne6kpd%2FhyqpI74%2FqrqDvk%2FVytxJlamiAnv%2FljUgzv"}],"group":"cf-nel","max_age":604800}
< nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
< server: cloudflare
< cf-ray: 746c80bff87187d1-SIN
< alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
< 
* Connection #0 to host api.audnex.us left intact
{"asin":"0241548705","authors":[{"asin":"B000AQ74C6","name":"Philip Pullman"}],"description":"Lyra finds herself in a shimmering, haunted otherworld - Cittàgazze, where soul-eating Spectres stalk the streets and wingbeats of distant angels sound against the sky. But she is not without allies....","formatType":"unabridged","image":"https://m.media-amazon.com/images/I/A1hzSma-3gS.jpg","language":"english","narrators":[{"name":"Philip Pullman"},{"name":"full cast"}],"publisherName":"Penguin Audio","rating":"4.9","releaseDate":"2021-06-24T00:00:00.000Z","runtimeLengthMin":533,"subtitle":"His Dark Materials, Book 2","summary":"<p><b>Brought to you by Penguin.</b></p> <p><b>From the world of Philip Pullman's </b><b><i>His Dark Materials - </i></b><b>now a major critically acclaimed BBC series.</b><br /> <br /> <b><i>She had asked: what is he? A friend or an enemy?</i></b><br /> <b><i>The alethiometer answered: he is a murderer.</i></b><br /> <b><i>When she saw the answer, she relaxed at once.</i></b></p> <p>Lyra finds herself in a shimmering, haunted otherworld - Cittàgazze, where soul-eating Spectres stalk the streets and wingbeats of distant angels sound against the sky.</p> <p>But she is not without allies: 12-year-old Will Parry, fleeing for his life after taking another's, has also stumbled into this strange new realm.</p> <p>On a perilous journey from world to world, Lyra and Will uncover a deadly secret: an object of extraordinary and devastating power.</p> <p>And with every step, they move closer to an even greater threat - and the shattering truth of their own destiny.</p> <p><b>Read by Philip Pullman and a full cast of narrators.</b></p>","title":"The Subtle Knife","genres":[{"asin":"18580715011","name":"Teen & Young Adult","type":"genre"},{"asin":"18581048011","name":"Science Fiction & Fantasy","type":"genre"},{"asin":"18581054011","name":"Dark Fantasy","type":"tag"},{"asin":"18581055011","name":"Epic","type":"tag"},{"asin":"18581057011","name":"Magical Realism","type":"tag"}],"seriesPrimary":{"asin":"B006K1ML9G","name":"His Dark Materials","position":"2"}}

As an additional datapoint, I'm currently running a local container with laxamentumtech/audnexus:develop and it's retrieving the same ASINs fine.

% curl -v https://audnexus.example.com/books/B083QSLMGD
*   Trying 192.168.10.6:443...
* Connected to audnexus.example.com (192.168.10.6) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.example.com
*  start date: Aug 14 16:00:27 2022 GMT
*  expire date: Nov 12 16:00:26 2022 GMT
*  subjectAltName: host "audnexus.example.com" matched cert's "*.example.com"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fab06811400)
> GET /books/B083QSLMGD HTTP/2
> Host: audnexus.example.com
> user-agent: curl/7.79.1
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200
< server: nginx/1.18.0
< date: Wed, 07 Sep 2022 04:04:28 GMT
< content-type: application/json; charset=utf-8
< content-length: 2222
< vary: Origin
< x-robots-tag: noindex, nofollow, nosnippet, noarchive
<
* Connection #0 to host audnexus.example.com left intact
{"asin":"B083QSLMGD","authors":[{"asin":"B07C9483MP","name":"David Reiss"}],"description":"His reputation had been secure. He was the world's most feared supervillain! But then he saved the world....","formatType":"unabridged","genres":[{"asin":"18580606011","name":"Science Fiction & Fantasy","type":"genre"},{"asin":"18580607011","name":"Fantasy","type":"tag"},{"asin":"18580626011","name":"Superhero","type":"tag"},{"asin":"18580628011","name":"Science Fiction","type":"tag"},{"asin":"18580629011","name":"Adventure","type":"tag"}],"image":"https://m.media-amazon.com/images/I/810t6JP87aL.jpg","language":"english","narrators":[{"name":"George Napier"}],"publisherName":"Atian Press","rating":"4.8","releaseDate":"2020-01-11T00:00:00.000Z","runtimeLengthMin":576,"seriesPrimary":{"asin":"B07R3QSR7S","name":"The Chronicles of Fid","position":"2"},"subtitle":"The Chronicles of Fid, Book 2","summary":"<p><b>His reputation had been secure. </b></p> <p><b>He was the world's most feared supervillain! </b></p> <p><b><i>But then he saved the world....</i></b></p> <p>News cameras had captured every moment of the battle in which Doctor Fid single-handedly averted an alien invasion. As details emerge, the public discovers how close the Earth had come to inescapable subjugation...or to complete annihilation. </p> <p>In the aftermath, there are many who wonder if the veteran supervillain has changed his ways. There are many who think that Doctor Fid may not be a monster after all. </p> <p>Notoriety is important to many of Doctor Fid's long-term plans to punish the unworthy, and this shift in public perception threatens to undermine decades' worth of effort.... But it also presents a tempting opportunity. </p> <p>New dangers arise, and through it all Doctor Fid must struggle to decide what role he will play. Can the notorious supervillain set aside his endless quest? Can Doctor Fid become the hero that the world needs? </p> <p>Or will he remain the villain that the world's heroes deserve? </p> <p><b>Any fan of the superhero genre will love this supervillainous novel. Read book two in the series that critics have called innovative, snarky, and ridiculously fun!</b></p>","title":"Behind Distant Stars"}

Maybe scrape html with account

In some cases, like this podcast, genres are available on the logged in page, not in incognito:
https://www.audible.com/pd/Hit-Job-Podcast/B091ZH3RCM

API down

The public-facing API is currently down. We are investigating what happened and working with the datacenter to resolve it.

Will update here once it's back.

Use string/literal file for messages

Handle '1-2' in book number

B079LRSMNN

Normalize chapter names to title case

iTunes Hi-Res Cover Art

Hey, first of all, love the direction that this project is going in! I think it has a lot of potential for the Plex agent aspect of it.

I've been working on my own audiobook project for a while, a self hosted server for downloading books you have checked out on OverDrive, and your library from Audible. It also scrapes metadata from Audible, and cover art from iTunes, merges the OverDrive books into a single, properly tagged M4B file, and cracks the DRM on the Audible books, also converting them to M4B (and retagging them so they're in the same format). The project isn't open source because its kind of in a legal gray area but I could add you as a viewer if you'd like to check out the source, It's entirely written in JS and could offer some ideas.

Anyway, what I was getting at was that the iTunes API offers the highest resolution covers for audiobooks from pretty much any source, and its free to use. I originally happened upon this when I found the iTunes Artwork Finder from Ben Dodson. If you properly manipulate the URL format for the images it returns, you can get the highest resolution version they have available, anywhere from 1000x1000px to 3000x3000px with the latter being the most common. These covers easily beat out the measly 500x500px covers offered by Audible and look way better on something like Plex.

I extracted all of the code I use in my project for searching the API, matching the book, and converting the cover link and put it all in a Gist so you can check it out: https://gist.github.com/csandman/0615901502012976122a10b3b2db161f

There are a couple things to note when using this API. One is that occasionally the link for the cover will fail with a server error, but I've never actually encountered this when using the API programmatically, only in browser, and reloading the page has always fixed the problem. The other is that very rarely the cover image the API returns won't be a perfect square, but will be something very close, so for my project I run a resizing script on them to make both dimensions match the smaller dimension. They've never been off by enough to make the covers look stretched after resizing. I included those scripts as a separate file in my Gist in case you're curious.

Anyway, I hope you'll consider adding this. Its one of the few things I'm really anal about with my audiobooks, making sure the covers look nice. And if you have any questions about using this (or want to be added to my project) definitely let me know!

Why does the API return a 500 when querying a book that has not been released?

https://api.audnex.us/books/B09P3VZGZX returns a 500, although the information for this book is in the Audible API.

Incorrect interpretation of genre/tags from the "Audible.de" website

Hello, I use the audiobookshelf programme to manage audiobooks. This in turn uses the Audnexus software for scraping the metadata for the audiobooks from the website audible.de or audible.com and various other languages. Unfortunately, Audnexus misinterprets the genre and tags on, for example, audible.de. Here I have described the problems in detail with screenshots.

Here the link: [Bug]: Incorrect transfer of genres and tags when scraping with audible (all variants)

Multi-region support

The first implementation step would be other English based domains, such as:

.co.uk
.ca

These may need to be denoted either in a separate DB, or with a new key, like region.

Other language regions will be a much more involved process and are currently not of of interest.

Feature Request: Original Publish Date

One thing which I have not been able to find a consistent source for is an original publish date for audiobooks. I'm not talking about the date an audiobook was released, I'm talking specifically about the date the first edition of a book was released. In terms of organizing an Audiobook collection (in my case, in Plex), it is generally far more convenient to allow things to be sorted by when the books were published. This is especially true for books in a series which may have multiple releases for their audiobook versions.

I know GoodReads has this info, but I also know they closed their API access. However, it's surely still possible to scrape the info from their website with something like cheerio. I know Readarr has GoodReads scraping integration but I haven't had a chance to look through their code for how they do it yet.

There is also the Google Books API, specifically the volume search which is a convenient way to search the Google books library with a JSON response, but they don't include the original publish date in that response. Which is weird because they do offer that info on their book pages. It could be realistic to use their API to find a book match and then use the books ID to scrape the Google Books page for that book.

Anyway, I'm not sure if attempting to scrape more sources like this is in the scope for this project, I've just been thinking that the original publish date for a book is one of the only things I haven't been able to get from Audible that is actually useful to have. I'm curious if you have any thoughts on the topic!

Question on chapters endpoint, and using public instance as the source for a Beets plugin

Hi,

I'm currently writing a plugin for Beets to help organize audiobooks. I found that Beets really likes to have metadata on chapters of a book for it to do its matching well.

I see that the response for the /chapters endpoint has a isAccurate field. In what cases would that be false, i.e how reliable is this data? For the few examples that I've tested, this is very accurate.
I had to specify a browser user-agent to avoid being blocked when using the audnex.us instance. I'm wondering if it would be OK for me to release the plugin while using that endpoint, or do you prefer that people host their own instances? I notice that the plex agent does use the endpoint but given the user agent blocking, I wanted to check just in case.

Square author photos

One feature that could be nice to add to this package, would be returning a square version of an author's photo in addition to the full sized one. The main benefit I can see for this is the Plex agent that was written for this package. When a non-square image is used for an artist photo in Plex, it is always cropped at the center, whereas by default the Audible author images are centered on their faces. And because there are no standards for the shape of the Audible author images, they always look inconsistent in Plex.

Plex:

Audible:

I assume Audible locates the face using Amazon Rekognition or something of the sort.

By default, the Audible author images are only thumbnails, which are obviously too low in resolution to be useful. However, if you know the dimensions of the raw image you generate a square image from, it's possible to generate a facially centered square image by modifying the initial image URL.

By default, the Author images come in the following format:

https://images-na.ssl-images-amazon.com/images/I/81iHT1SI1lL.__01_SX120_CR0,0,120,120__.jpg

And from looking at your code, I know you remove the __01_SX120_CR0,0,120,120__. part to get the URL for the full sized image, which looks something like this:

https://images-na.ssl-images-amazon.com/images/I/81iHT1SI1lL.jpg

Now, I'm not sure if you've tried messing with their initial URL format, but if you just increase all the numbers to some arbitrarily high resolution number, borders will be added to fill in the extra space if the image isn't large enough to accommodate:

https://images-na.ssl-images-amazon.com/images/I/81iHT1SI1lL.__01_SX1500_CR0,0,1500,1500__.jpg

However, I have found that if you pull in the image and find the smaller dimension (usually the width but sometimes the photos are in landscape), and replace all the 120 values in the original thumbnail URL with that number, you'll get the largest image size available while still being a square image centered on the author's face. Taking this image for example, the width of the full size image is 464 (still pretty small but much bigger than 120), and the modified original URL would look like this:

https://images-na.ssl-images-amazon.com/images/I/81iHT1SI1lL.__01_SX464_CR0,0,464,464__.jpg

Which is much more useful for something like Plex! The best part is, you don't have to do any subject identification on the photo itself because Amazon has already done the work for you. Plus, this whole process can be scripted pretty easily in Node:

import sizeOf from 'image-size';
import fetch from 'node-fetch';

const AUTHOR_IMAGE_URL =
  'https://images-na.ssl-images-amazon.com/images/I/81iHT1SI1lL.__01_SX120_CR0,0,120,120__.jpg';

const getSquareImage = async (url) => {
  const fullAuthorUrl = url.replace('__01_SX120_CR0,0,120,120__.', '');

  const imageRes = await fetch(fullAuthorUrl);

  const imgBuffer = await imageRes.buffer();

  const imageSize = await sizeOf(imgBuffer);

  const minDimension = Math.min(imageSize.width, imageSize.height);

  const largeSquareImgUrl = url.replace(/120/g, minDimension);
  console.log(largeSquareImgUrl);

  return largeSquareImgUrl;
};

getSquareImage(AUTHOR_IMAGE_URL);

Idk if this is outside the scope of what you intend to deliver with this project, but it's something worth thinking about! I think it would be a great alternative to using the full size image for Plex thumbnails.

The only issue with this process is that you'd have to load a lot of images into memory to check their dimensions, which would probably affect processing time/power. I'm not sure how that would ultimately affect the performance of this tool, but I think it's worth investigating.

Please add AudiobookGuild catalog to the audnexus db.

Please add AudiobookGuild catalog to the audnexus db.
Audible had been abandoned by many important authors. The new rising publisher is AudiobookGuild, and most of the new audiobooks are exclusively on their platform. Not even on Amazon.
For example the popular series "Summoner" by Eric Vall is only available till volume 16 on Audible. If you want volume 17 to 22, you can only find them on AudiobookGuild.
If I search for metadata for Summoner 22, I cannot find any result because it is not even on Audible. But the metadata is there on AudiobookGuild (see: https://audiobookguild.com/collections/summoner-by-eric-vall/products/summoner-22 ).
Can you add it?

Handle missing required fields

These shouldn't have attempted to insert into the db:

B002UZL4P8 maybe don't require image?

"schemaRulesNotSatisfied":[{"operatorName":"required","specifiedAs":{"required":["_id","asin","description","formatType","image","language","publisherName","releaseDate","summary","title"]},"missingProperties":["image"]}]}}},"msg":"Document failed validation"}

B00SLW90N2

"specifiedAs":{"required":["_id","asin","description","formatType","image","language","publisherName","releaseDate","summary","title"]},"missingProperties":["formatType","language","publisherName","summary","title"]}]}}},"msg":"Document failed validation"}

Add cache busting param

Only 500 Internal Server Error

I always get this message.

https://api.audnex.us/books/B004V3F0RK
{"statusCode":500,"error":"Internal Server Error","message":"Required key 'authors' does not exist in Audible API response for ASIN B004V3F0RK"}

It seems to me the whole api is completley down even if the Status page tells sth different.

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (:automergeEslintWeekly)

Podcast that fails mongo validation

{"method":"GET","url":"/books/B08GC9K8XJ","hostname":"api.audnex.us","remoteAddress":"10.0.1.5","remotePort":48132},"res":{"statusCode":500},"err":{"type":"MongoServerError","message":"Document failed validation","stack":"MongoServerError: Document failed validation\n at /app/node_modules/mongodb/lib/operations/insert.js:51:33\n at /app/node_modules/mongodb/lib/cmap/connection_pool.js:273:25\n at handleOperationResult (/app/node_modules/mongodb/lib/sdam/server.js:363:9)\n at MessageStream.messageHandler (/app/node_modules/mongodb/lib/cmap/connection.js:474:9)\n at MessageStream.emit (node:events:520:28)\n at processIncomingData (/app/node_modules/mongodb/lib/cmap/message_stream.js:108:16)\n at MessageStream._write (/app/node_modules/mongodb/lib/cmap/message_stream.js:28:9)\n at writeOrBuffer (node:internal/streams/writable:389:12)\n at _write (node:internal/streams/writable:330:10)\n at MessageStream.Writable.write (node:internal/streams/writable:334:10)","index":0,"code":121,"errInfo":{"failingDocumentId":"621ed0f76742d31b21ab04b1","details":{"operatorName":"$jsonSchema","schemaRulesNotSatisfied":[{"operatorName":"properties","propertiesNotSatisfied":[{"propertyName":"genres","details":[{"operatorName":"items","reason":"At least one item did not match the sub-schema","itemIndex":0,"details":[{"operatorName":"type","specifiedAs":{"type":"object"},"reason":"type did not match","consideredValue":null,"consideredType":"null"}]}]}]}]}}},"msg":"Document failed validation"}

B08GC9K8XJ