nyannyanovich / nyan

Automatic news aggregator in Telegram

Home Page: https://t.me/nyannews

License: Apache License 2.0

Python 97.06% Shell 0.95% HTML 2.00%
aggregator clustering embeddings news python telegram news-aggregator telegram-channel newsfeed newsfeed-aggregator

nyan's Introduction

НЯН

НЯН (Nyan) is a news aggregator that scrapes news from different Telegram channels, clusters similar posts, and builds a unified feed. All sources are split into several groups so that readers can judge how far to trust each one.
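
The "clusters similar posts" step can be illustrated with a toy sketch. The repository topics mention embeddings; this stand-in deliberately swaps in word-overlap (Jaccard) similarity so that it runs with the standard library only. The function names and the 0.5 threshold are hypothetical and are not taken from nyan's code.

```python
# Toy stand-in for "cluster similar posts": greedy grouping by word overlap.
# Assumption: nyan itself works with embeddings; Jaccard similarity is used
# here only to keep the sketch dependency-free.
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words the two posts have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_posts(posts: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Put each post into the first cluster whose first post is similar enough."""
    clusters: List[List[str]] = []
    for post in posts:
        for cluster in clusters:
            if jaccard(tokens(post), tokens(cluster[0])) >= threshold:
                cluster.append(post)
                break
        else:
            # No existing cluster matched, so the post starts a new one.
            clusters.append([post])
    return clusters


posts = [
    "central bank raises key rate",
    "the central bank raises key rate",
    "new phone model released today",
]
clusters = cluster_posts(posts)
```

Here the first two posts share five of six distinct words and land in one cluster, while the third starts its own.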

Channel itself: NyanNews

Extensive description (in Russian): Whitepaper

Detailed instructions (in Russian): Как поднять свой НЯН (How to set up your own НЯН)

Install

Install git and pip

sudo apt-get install git python3-pip

Clone repo

git clone https://github.com/NyanNyanovich/nyan

Install Python requirements (run from inside the cloned nyan directory)

pip3 install -r requirements.txt

Download models

bash download_models.sh

Install Docker and Docker Compose.

Provide Telegram API credentials in configs/client_config.json.

Run

Run Mongo container

docker-compose up

Run crawler

bash crawl.sh

Run server

bash send.sh

nyan's People

Contributors

nyannyanovich, thepaket

nyan's Issues

./send.sh

root@root:~/nyan# bash send.sh
./send.sh: line 12: 2052900 Killed python3 -m nyan.send --channels-info-path channels.json --client-config-path configs/client_config.json --mongo-config-path configs/mongo_config.json --annotator-config-path configs/annotator_config.json --renderer-config-path configs/renderer_config.json --daemon-config-path configs/daemon_config.json

@ parsing issue

https://t.me/nyannews/8265

As the Telegram channel «Ху Херсон» reports, citing
... — Медуза — LIVE

However, the original meduzalive message behind the link reads:

As the Telegram channel «Ху@вый Херсон» reports, citing
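
One plausible cause, which is an assumption and not confirmed anywhere in this thread, is a mention-stripping pattern that matches "@" even in the middle of a word. The sketch below reproduces the reported symptom with such a pattern and shows a variant that only treats "@" as a mention when it does not follow a word character. Both patterns and function names are hypothetical, not nyan's actual code.

```python
import re

# Hypothetical reproduction: neither pattern is taken from nyan's code.
NAIVE_MENTION = re.compile(r"@\w+")        # matches "@" anywhere, even mid-word
SAFE_MENTION = re.compile(r"(?<!\w)@\w+")  # only when "@" does not follow a word character


def strip_mentions_naive(text: str) -> str:
    """Remove every "@word" fragment, including ones inside a word."""
    return NAIVE_MENTION.sub("", text)


def strip_mentions_safe(text: str) -> str:
    """Remove "@word" only when it starts a token of its own."""
    return SAFE_MENTION.sub("", text)


original = "«Ху@вый Херсон»"
naive_result = strip_mentions_naive(original)  # loses "@вый", as in the report
safe_result = strip_mentions_safe(original)    # leaves the channel name intact
```

With the lookbehind in place, a standalone mention such as "@meduzalive" is still removed, while a channel name that merely contains "@" is preserved.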

Mistake in renderer.py

  File "/home/bumrus/nyan/nyan/renderer.py", line 141, in render_ratings
    if cluster.issue != issue_name:
AttributeError: 'Cluster' object has no attribute 'issue'. Did you mean: 'issues'?
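
The interpreter's hint suggests that the attribute is now plural. A minimal sketch of the corresponding change, assuming that `issues` holds a collection of issue names (the traceback does not confirm this), replaces the equality check with a membership test. The `Cluster` class and `clusters_for_issue` helper below are hypothetical stand-ins, not the project's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Hypothetical stand-in; the real nyan Cluster class has more fields."""
    title: str
    issues: List[str] = field(default_factory=list)  # assumed: names of issues


def clusters_for_issue(clusters: List[Cluster], issue_name: str) -> List[Cluster]:
    """Keep only clusters that belong to the given issue."""
    selected = []
    for cluster in clusters:
        # The failing check was `cluster.issue != issue_name`; with a plural
        # `issues` attribute the equivalent test is membership.
        if issue_name not in cluster.issues:
            continue
        selected.append(cluster)
    return selected


clusters = [Cluster("a", ["main", "tech"]), Cluster("b", ["economy"])]
tech = clusters_for_issue(clusters, "tech")
```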

Is this really true?

main, clusters after first filter: 0

tech and economy work as expected, but for main the data is not being clustered.

Issue: main, clusters after first filter: 0

Issue: tech, clusters after first filter: 3
Added as no other clusters: 164531 Артемий Лебедев стал директором по дизайну соцсети «ВКонтакте». Он будет руководить дизайном продуктов, формировать...
Added as no other clusters: 1228 Разработчик софта для строительства и машиностроения Autodesk завершил ликвидацию своего российского подразделения. Компания является...
Added as no other clusters: 3271 Расследование The Bell. Кто такой Степан Ковальчук, который должен превратить «ВКонтакте» в «Первый канал»...

Issue: economy, clusters after first filter: 3
Added as no other clusters: 1228 Разработчик софта для строительства и машиностроения Autodesk завершил ликвидацию своего российского подразделения. Компания является...
Added as no other clusters: 49984 Россия считает высокими риски втягивания новых сторон в палестино-израильский конфликт — МИД.
Added as no other clusters: 61875 конфликт на Ближнем Востоке может спровоцировать рост мировых цен на нефть — Politico.

Issue: belarus, clusters after first filter: 0

6 clusters in all issues after filtering...

Problem with AgglomerativeClustering

I could not find a solution. I looked at a few examples of how to fix it, but without success. Any idea where my problem is?

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/nyan/nyan/send.py", line 47, in <module>
    main(**vars(args))
  File "/home/user/nyan/nyan/send.py", line 27, in main
    daemon.run(
  File "/home/user/nyan/nyan/daemon.py", line 51, in run
    self.__call__(
  File "/home/user/nyan/nyan/daemon.py", line 89, in __call__
    new_clusters = self.clusterer(annotated_docs)
  File "/home/user/nyan/nyan/clusterer.py", line 31, in __call__
    image_idx2cluster = self.find_image_duplicates(docs)
  File "/home/user/nyan/nyan/clusterer.py", line 104, in find_image_duplicates
    clustering = AgglomerativeClustering(
TypeError: AgglomerativeClustering.__init__() got an unexpected keyword argument 'affinity'
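
scikit-learn renamed the `affinity` argument of `AgglomerativeClustering` to `metric`: it was deprecated in version 1.2 and removed in 1.4, so this error appears on newer installations. Two options are pinning an older scikit-learn or changing the keyword. The sketch below shows one way to stay compatible with both, by inspecting the constructor and passing the distance under whichever name it accepts. The helper names and the `distance_threshold` value in the usage comment are hypothetical, not the arguments nyan uses.

```python
import inspect


def distance_kwarg_name(estimator_cls) -> str:
    """Return "metric" if the class constructor accepts it, else "affinity"."""
    params = inspect.signature(estimator_cls.__init__).parameters
    return "metric" if "metric" in params else "affinity"


def make_clusterer(estimator_cls, distance: str, **kwargs):
    """Build the estimator, passing the distance under the name it accepts."""
    kwargs[distance_kwarg_name(estimator_cls)] = distance
    return estimator_cls(**kwargs)


# Usage with scikit-learn (requires scikit-learn to be installed):
#   from sklearn.cluster import AgglomerativeClustering
#   clustering = make_clusterer(
#       AgglomerativeClustering,
#       "precomputed",
#       n_clusters=None,
#       distance_threshold=0.5,  # hypothetical value
#       linkage="average",
#   )
```

If only recent scikit-learn versions need to be supported, renaming `affinity=` to `metric=` at the call site is simpler.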

ValueError: max() arg is an empty sequence

Hi.

We are using the library on macOS with Python 3.8 and Python 3.9. We configured only the main publishing channel; the sources and other settings are unchanged.

Clusters are not saved, and the following error is raised: ValueError: max() arg is an empty sequence

./bash send.sh
Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.
Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.
===== New iteration =====
Reading clusters from Mongo
0 clusters loaded
Reading docs from Mongo
1336 docs loaded
Last document: 04-01-23 12:01
Warning: 0 docs from channel truexanewsua
Warning: 0 docs from channel taboo_news
Warning: 1 docs from channel agentstvonews
Warning: 0 docs from channel consultant_plus
Warning: 0 docs from channel fanimani_official
Warning: 0 docs from channel kojournal
Warning: 1 docs from channel frank_media
Warning: 1 docs from channel newsholod
Reading annotated docs from Mongo: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1336/1336 [00:01<00:00, 1158.36it/s]
1336 docs already annotated, 0 docs to annotate
1215 docs before clustering
0 updated documents
/Users/username/folder/nyan/venv/lib/python3.8/site-packages/sklearn/cluster/_agglomerative.py:983: FutureWarning: Attribute `affinity` was deprecated in version 1.2 and will be removed in 1.4. Use `metric` instead
  warnings.warn(
617 clusters overall

Issue: main, clusters after first filter: 0

Issue: tech, clusters after first filter: 0

0 clusters in all issues after filtering

0 clusters saved to file

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/username/folder/nyan/nyan/send.py", line 47, in <module>
    main(**vars(args))
  File "/Users/username/folder/nyan/nyan/send.py", line 26, in main
    daemon.run(
  File "/Users/username/folder/nyan/nyan/daemon.py", line 44, in run
    self.__call__(
  File "/Users/username/folder/nyan/nyan/daemon.py", line 97, in __call__
    saved_count = posted_clusters.save_to_mongo(mongo_config_path)
  File "/Users/username/folder/nyan/nyan/clusters.py", line 271, in save_to_mongo
    max_cluster_fetch_time = max([cl.fetch_time for cl in self.clid2cluster.values()])
ValueError: max() arg is an empty sequence
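
The traceback shows `max()` being called on an empty list when no clusters survive filtering. A minimal sketch of a guard, using the `default` argument of the built-in `max()`, is given below. The `Cluster` stand-in and the `max_fetch_time` helper are hypothetical; how `save_to_mongo` should behave when nothing is posted is a decision for the maintainers.

```python
from typing import Dict, Optional


class Cluster:
    """Hypothetical stand-in holding only the field used in the traceback."""

    def __init__(self, fetch_time: int) -> None:
        self.fetch_time = fetch_time


def max_fetch_time(clid2cluster: Dict[int, Cluster]) -> Optional[int]:
    """Latest fetch_time, or None when there are no clusters to save."""
    return max((cl.fetch_time for cl in clid2cluster.values()), default=None)


empty_result = max_fetch_time({})  # no exception on an empty mapping
filled_result = max_fetch_time({1: Cluster(10), 2: Cluster(25)})
```

A caller could then return early when the result is `None` and skip the save step.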

[Suggestion] Generate a summary of all News throughout some time interval.

I'm not sure this is the right place for this request, since the feature lies somewhat outside nyan's domain (and probably requires an LLM), but it would be great to have.

Perhaps we need some kind of ranking (by importance and societal resonance) to select the final collection? Or just a general summarization; that still needs some thought.

8-hour summary using GPT not present

After setup, everything works except the 8-hour summary.
The files summary.py, openai.py, and topics.py are present but do not seem to be used anywhere.
Maybe some bash scripts are missing?
I have also read that for OpenAI you need to put your keys into .env, which might need to be mentioned in the setup instructions as well.
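
For readers unfamiliar with .env files, here is a standard-library sketch of how KEY=VALUE lines can be loaded into the process environment. Whether nyan reads its keys this way, and under which variable name, is an assumption: the official openai Python package looks for `OPENAI_API_KEY` by default, but the thread does not confirm what nyan expects. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    parsed: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        parsed[key.strip()] = value.strip().strip('"').strip("'")
    return parsed


def load_dotenv_file(path: str = ".env") -> Dict[str, str]:
    """Export variables from the file without overriding existing ones."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    parsed = parse_dotenv(env_path.read_text(encoding="utf-8"))
    for key, value in parsed.items():
        os.environ.setdefault(key, value)
    return parsed
```

A .env file for this sketch would contain a single line such as `OPENAI_API_KEY=...`, with the actual key in place of the dots.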

Docker container size

Good day.
I installed a Nyan bot. After some time the Docker container grew to about 50 GB (the requirements also mention this) and will probably keep growing.

Is it possible to limit the disk space used?
Maybe there is an option in the configs, or something similar?

I also considered stopping the Nyan bot, deleting the Docker container, and starting the bot again.
How would this affect the bot's performance?
Is it like a reset to the initial state?
What would you advise?

Made a fork

I added a filter for news and channels, and news now comes out much more often (roughly 100 times more often). I changed the logic of how posts are published: there is now substantially more content (photos, videos) and less boring text. The result is a kind of news TikTok.

I will publish my fork on GitHub after some minor tweaks.

Bump https://t.me/upnewsua/14457
