
hn_summary's Introduction


HN Summary is an open source bot that summarizes top stories on Hacker News and publishes the summaries to a Telegram channel.

Join the HN Summary channel on Telegram to see the bot in action and enjoy the story summaries:
https://t.me/hn_summary

Flag bad summaries on the Telegram channel with 👎 to help mitigate and improve.

You can find summaries of the current top Hacker News articles here as well:
https://news.jiggy.ai

Feel free to open a PR or issue, or DM me at @wskish on Telegram or Twitter with feedback.

Overview

Whenever a new story appears on the Hacker News API /topstories.json endpoint, this bot summarizes it (currently using OpenAI gpt-3.5-turbo) and sends the story title, summary, and URL to the hn_summary channel on Telegram.
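The polling step described above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the function names are hypothetical, and a plain in-memory collection stands in for the PostgreSQL table that tracks items already seen.

```python
import json
import urllib.request

# Public Hacker News API endpoint for the current top stories.
TOPSTORIES_URL = "https://hacker-news.firebaseio.com/v0/topstories.json"


def fetch_top_story_ids(url=TOPSTORIES_URL, timeout=30):
    """Return the current list of top story ids (performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def new_story_ids(top_ids, seen_ids):
    """Return the ids in top_ids that are not in seen_ids, preserving order."""
    seen = set(seen_ids)  # assumption: stands in for the database lookup
    return [story_id for story_id in top_ids if story_id not in seen]
```

A polling loop would call `fetch_top_story_ids()` periodically, pass the result through `new_story_ids()`, and summarize only the ids that come back.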

The purpose of this project is to help build intuition on the capabilities of the current generation of large language models while surfacing a broader swath of top Hacker News content. It could also serve as a platform for experimentation with other language model capabilities such as semantic search.

Limitations

Large language models such as GPT-3 are prone to crazy hallucinations and sometimes make things up while writing in a very authoritative tone.

The code for extracting text from HTML is very basic and error prone. (PRs welcome.) In addition, many sites (such as news sites) are either paywalled or make it difficult to extract text. We now attempt to catch this case via prompt engineering, but when one does slip through we tend to get fanciful hallucinations based on just the title and FQDN.

Links to content types other than PDF and HTML are currently ignored.

Text extraction from Reddit, Twitter, and other commercial links is broken and probably produces wildly hallucinated summaries.

Telegram messages are limited to 4096 characters. Currently the response is truncated to that length.

Major Dependencies

The following environment variables are used to inject credentials and other required configuration for the major dependencies:

OpenAI

  • OPENAI_API_KEY # your OpenAI API key

PostgreSQL

Database for keeping track of items we have already seen and associated item info.

  • HNSUM_POSTGRES_HOST # The database FQDN
  • HNSUM_POSTGRES_USER # The database username
  • HNSUM_POSTGRES_PASS # The database password

Telegram

  • HNSUM_TELEGRAM_API_TOKEN # The bot's telegram API token
  • HNSUM_TELEGRAM_CHANNEL_ID # the telegram chat where the bot will post the summaries
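One possible way to read and validate these variables at startup is sketched below. The variable names come from the list above; the `load_config` helper and its fail-fast behavior are assumptions for illustration, not necessarily how the bot does it.

```python
import os

# Environment variable names as documented above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "HNSUM_POSTGRES_HOST",
    "HNSUM_POSTGRES_USER",
    "HNSUM_POSTGRES_PASS",
    "HNSUM_TELEGRAM_API_TOKEN",
    "HNSUM_TELEGRAM_CHANNEL_ID",
)


def load_config(env=None):
    """Return a dict of the required settings, or raise if any are missing.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup with the full list of missing names tends to be easier to debug than a credential error surfacing later from one of the services.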

hn_summary's People

Contributors

amy-why, erjanmx, priyavrat-misra, wskish


hn_summary's Issues

twitter links produce hallucinations

the title text and the "javascript not available" response make the model produce some interesting hallucinations.

Should filter twitter links or figure out a way of accessing the content.
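The filtering option could look something like the sketch below. The `BLOCKED_DOMAINS` tuple and the `is_blocked` helper are hypothetical names, and the list contains only the domain named in this issue.

```python
from urllib.parse import urlparse

# Assumption: a hand-maintained list of domains whose pages cannot be
# extracted reliably. Only twitter.com is taken from the issue text.
BLOCKED_DOMAINS = ("twitter.com",)


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the url's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in blocked)
```

Matching on the parsed hostname rather than on a substring of the URL avoids false positives such as an article whose path merely contains "twitter.com".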

retry failed requests.get()

currently uses a timeout of 30 seconds with no retries when attempting to retrieve story url content. consider adding a mechanism for retry, or enqueue overloaded links for download in the future when they recover.
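A simple retry mechanism might look like the following sketch. The `get_with_retries` helper is hypothetical. It takes the fetch function as a parameter so that it does not depend on any particular HTTP library, and it waits twice as long after each failed attempt.

```python
import time


def get_with_retries(fetch, url, attempts=3, backoff=2.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, sleeping between failures.

    The delay doubles after each failed attempt (backoff, 2 * backoff, ...).
    If every attempt fails, the last exception is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:  # in practice, narrow this to the library's error type
            if attempt == attempts - 1:
                raise
            sleep(backoff * (2 ** attempt))
```

With the requests library this could be called as `get_with_retries(lambda u: requests.get(u, timeout=30), url)`. Note that `requests.get` only raises for connection-level failures unless `raise_for_status()` is also called on the response.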

telegram summary messages are truncated to 4K

Telegram has a 4K limit on message size. Currently we avoid this limit by simply truncating the message to 4096 characters.

For summaries longer than 4K we could consider asking the LM to shorten the summary, or we could break the summary into multiple telegram messages.
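The second option, breaking the summary into multiple messages, could be sketched as below. `split_message` is a hypothetical helper; it prefers to break on whitespace and only cuts mid-word when a chunk contains none.

```python
TELEGRAM_MAX_CHARS = 4096  # message size limit stated in this issue


def split_message(text, limit=TELEGRAM_MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Breaks at the last newline or space inside the window when one exists,
    and falls back to a hard cut at `limit` otherwise.
    """
    chunks = []
    while len(text) > limit:
        cut = max(text.rfind("\n", 0, limit + 1), text.rfind(" ", 0, limit + 1))
        if cut <= 0:
            cut = limit  # no usable whitespace: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each returned chunk would then be sent as its own Telegram message, in order.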

surface prompts to help assess summary

a common failure mode is that we fail to extract the actual story text from a web site for a variety of reasons. (paywall, javascript requirement, anti-bot logic, etc)

make the prompt visible via a web link to provide additional context into suspicious summaries.

support pdf content

currently only HTML content is supported. it would be interesting to support PDF content.

HN_comments_summary

Great bot! Would love to see a similar bot for HN comments - probably waiting 24h before summarising all the comments on the item.
