aurelg / feedspora

FeedSpora posts RSS/Atom feeds to your social network accounts.

Python 98.93% Dockerfile 0.39% Makefile 0.68%

rss twitter facebook linkedin mastodon shaarli diaspora wordpress bot python3

feedspora's Introduction

What is FeedSpora?

FeedSpora posts RSS/Atom feeds to your social network accounts. It currently supports Facebook, Twitter, LinkedIn, Diaspora, WordPress, Mastodon and Shaarli. It's a bot written in Python 3, inspired by Fefebot.

Installation

Install dependencies: pip install -r requirements.txt.
Then install FeedSpora with the usual python setup.py install.

Configuration

Create a config file from the provided template feedspora.yml.template. The enabled directive is optional and allows you to selectively enable or disable accounts by setting it to True or False.

Usage

Publish all RSS/Atom entries to your account with: python -m feedspora

Detailed Information

The FeedSpora Wiki contains many more details about configuration and other options.

feedspora's Issues

Detailed documentation needed

FeedSpora has a lot of very nice and useful functionality, and as such, needs a good user-oriented document to outline its features and their usage.

Occasional test failures because of unpredictable client ordering

While trying to set up CI with GitHub/Travis CI, I noticed that the feed and post tests were occasionally failing. The cause is the order of clients, which differs between the expected and tested outputs. Clients are instantiated from a dict which is loaded from the configuration file using PyYAML. Such a dict is not ordered (python<3.7).

A workaround would be to use this trick. However, the root cause of the issue comes from the test itself: it checks the standard output of feedspora instead of testing the code logic. It would be enough to check the output of each client independently (this would be required for #43 as well).

Proposed implementation: each client should be able to store entries, perhaps as a list stored as an attribute. Once all clients/entries have been processed, feedspora_runner.py could check whether we are in testing mode, retrieve the list from all clients, and merge them into a JSON structure {'client1_name': client1_list, 'client2_name': client2_list, ...} that can be loaded later. The comparison with the expected output stored in the golden file (in the same format) would have to take into account dict keys and values, instead of their string representations.
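A minimal sketch of the comparison proposed above, assuming each client's stored entries can be serialized as a list of dicts. The helper names (`merge_client_outputs`, `matches_golden`) and the client names are hypothetical and not part of FeedSpora.

```python
import json


def merge_client_outputs(clients):
    """Collect the entries stored by each client into one dict keyed by client name."""
    return {name: list(entries) for name, entries in clients.items()}


def matches_golden(actual, golden_json):
    """Compare parsed structures (keys and values), not string representations."""
    return actual == json.loads(golden_json)


# Hypothetical run: clients were instantiated in a different order than in the golden file
run_output = merge_client_outputs(
    {"mastodon": [{"title": "Post 1"}], "shaarli": [{"title": "Post 1"}]}
)
golden = json.dumps(
    {"shaarli": [{"title": "Post 1"}], "mastodon": [{"title": "Post 1"}]}
)

print(matches_golden(run_output, golden))  # True: dict equality ignores key order
print(json.dumps(run_output) == golden)    # False: the string form depends on order
```

Comparing the parsed structures makes the test insensitive to the order in which clients were created, while the order of entries within each client's list is still checked.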

Can't get URLs to be posted on Mastodon

So I'm trying to use FeedSpora to publish to Mastodon. The Mastodon connection is functional and toots get published correctly, but I can't get the URLs of the posts to be published along with the content.

These are the params I have under the Mastodon block:
enabled: True
post_include_media: True
post_link_content: True
post_include_content: True
#post_include_link: True
url_shortener: 'tinyurl'

Do I have to put these settings elsewhere, for instance in the Feed block?
Do they need to be specified for each feed?

My first assumption was that these params only need to be configured for each output channel, but I can't get it to work.

Any help would be very much appreciated, thanks in advance !

"ImportError: No module named src.feedspora"

Hello, when I run python setup.py I get this output error, could you help me please?

feedspora-master$ python setup.py
Traceback (most recent call last):
  File "setup.py", line 6, in <module>
    from src.feedspora import version
ImportError: No module named src.feedspora

add Docker support

Hi Aurélien,

the main purpose of my pull requests is to make feedspora work for me for RSS to Mastodon. And YES, feedspora is currently running successfully on my machine 🎉

Because I do not want to install all that Python stuff on my main machine, I have already created a Dockerfile. But at the moment this file only exists locally on my disk.
I am willing to share it, of course. The current problem is that it uses the Git URLs of my forks of your repositories.

Therefore I am waiting for your feedback on my PRs and for them to be merged. Afterwards I will polish the Dockerfile I already have and point it to the repository URLs of your account. Finally, you'll get another pull request with that file.

I hope that is of interest for you.

Best Regards,
Sven

PS: if you like, you can assign this issue to me

Case of hashtags (keywords) should be retained

Currently the code that gathers the list of keywords converts these all to lower-case (to prevent duplication - which is useful), but when the keywords are applied, they should be in the case as originally specified.

Implementation notes:
Since the keywords need to be kept in a list (as already decided), it seems that a parallel dict could be kept that is keyed by the same lower-case keyword, with the original/unchanged keyword as the value. Is there another alternative that could be implemented in a single data structure, without losing the ordering provided by list implementation?
Another alternative is just to use the unaltered keywords as list entries, but this will mean a more complicated operation will be needed to prevent keyword duplication (rather than just lower-casing a potential entry and using "is not in keywords" to know if that new entry should be added).
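One possible single-structure answer to the question above, as a sketch: a dict keyed by the lower-cased keyword with the original spelling as the value. It assumes Python 3.7+, where dicts preserve insertion order (on older versions `collections.OrderedDict` behaves the same way). The function name `add_keyword` is hypothetical.

```python
def add_keyword(keywords, candidate):
    """Add candidate unless a case-insensitive duplicate is already present.

    The first spelling seen is the one that is kept.
    """
    keywords.setdefault(candidate.lower(), candidate)


keywords = {}
for word in ["Python", "RSS", "python", "Mastodon", "rss"]:
    add_keyword(keywords, word)

# Values keep both the original case and the original order
hashtags = ["#" + original for original in keywords.values()]
print(hashtags)  # ['#Python', '#RSS', '#Mastodon']
```

The duplicate check stays as cheap as the current "lower-case and test membership" approach, since it is a plain dict lookup on the lower-cased key.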

Add support for Minds

The Minds social media platform would likely be a good addition for FeedSpora. A Python API for Minds can be found at https://gitlab.com/granitosaurus/minds-api, and there may be others as well.

Add support for `post_prefix` and `post_suffix` in all clients.

Add support for post_prefix, post_suffix in the GenericClient so that it's not limited to TweepyClient. See #13

Improve performance

When many entries have to be published to many clients, the performance of feedspora is disappointing. Entries are created by a generator on the fly, and published sequentially to each client (here).

The low hanging fruit would be to publish to several clients in parallel (w/async). Ideally, each client should be provided with its own generator (fed with the "master" entry generator), out of which entries can be consumed (i.e. published). itertools can probably help.

This would not address a bigger issue: each client is responsible for preparing/formatting an entry according to its configuration/target service. While this makes sense as they are different, some operations can probably be extracted, merged and cached, e.g. using memoization with a static function in GenericClient.
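A rough sketch of the two ideas above, under stated assumptions: `itertools.tee` gives each client its own iterator fed by one "master" generator, and `functools.lru_cache` memoizes a formatting step shared by several clients. The functions `shorten_title` and `entry_generator` and the client names are stand-ins, not FeedSpora code, and parallel publishing is not shown.

```python
import itertools
from functools import lru_cache


@lru_cache(maxsize=None)
def shorten_title(title, max_length):
    """Stand-in for a formatting step that several clients have in common."""
    if len(title) <= max_length:
        return title
    return title[: max_length - 3] + "..."


def entry_generator():
    """Stand-in for the 'master' entry generator."""
    yield from ("First entry", "A second entry with a rather long title")


client_names = ["client_a", "client_b"]

# One independent iterator per client, all fed by the same generator
streams = dict(zip(client_names, itertools.tee(entry_generator(), len(client_names))))

published = {
    name: [shorten_title(title, 20) for title in stream]
    for name, stream in streams.items()
}

print(published["client_a"])
print(shorten_title.cache_info().hits)  # 2: the second client reuses cached results
```

Note that `tee` buffers items that one consumer has read and another has not yet read, so memory use grows when clients progress at very different rates.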

Rewrite feed/post tests w/pytest

The current feed and post tests rely on a shell script (tests/run_tests.sh). They should be reimplemented with pytest.

Shaarli client enhancements

This is of course related to Issue #38.
Only two features are missing from the existing Shaarli client implementation: post_include_media and post_include_content - and actually, the current implementation does include content by default. So in the case of implementing post_include_content functionality, it's more a matter of changing some logic so that this content is only included when the option is specified.

From looking at the Shaarli documentation (and consistent with its purpose as a link-sharing service), it really has no facility for including media at all. So it seems the post_include_media functionality will need to remain unsupported in this client, due to its nature/limitations. This should of course be mentioned in the FeedSpora documentation.

Right now all posts to Shaarli are made public, but I can see the benefit of implementing a post_audience option that could be set to private to override this default behavior. This would be an easy-yet-useful feature to implement, and is planned for inclusion.

Finally, this is a good opportunity to generally clean up the client implementation, including the pylint errors. One of those errors is the assignment-from-no-return error associated with the call to the shaarpy.post_link API function. I think it would be best if that API function provided a return value, which would generally increase its usefulness, and that was what I had planned for this aspect. Two possibilities for this return value might be either the post "token" that is retrieved at the end of that function's successful operation, or perhaps the request result from which that token is derived would be more useful. Any input on this would of course be welcome.

Wrong estimation of the maximum length of a tweet resulting in missing hashtags / wrong content

Until commit ff852be, the maximum "usable" length of a tweet was calculated by removing 22 from the maximum length (280), because twitter counted embedded links as 22 characters (or maybe 20, with a small margin, I don't remember), as URLs are wrapped with t.co (see here). (code is here).

The commit c1d6499 introduced a change, by also removing the full length of the link URL to post. Posting long links consequently drastically reduces the number of available characters for the post content and associated hashtags. (code is here).

Since Twitter automatically wraps links with t.co, and unless I am missing something, I think this is a bug that should be fixed by removing len(post_url). Moreover, the old value of 22 for self._link_cost (here) should be updated to 23, see here (I assume this document is up to date, I haven't called the API directly).
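To illustrate the arithmetic, here is a small sketch assuming a maximum length of 280 and a fixed link cost of 23, as suggested above. The function names are hypothetical, and the second function only mimics the behavior described for commit c1d6499.

```python
MAX_TWEET_LENGTH = 280
LINK_COST = 23  # assumed cost of a t.co-wrapped link, per the issue text


def usable_length():
    """Characters left for content and hashtags, independent of the real URL length."""
    return MAX_TWEET_LENGTH - LINK_COST


def usable_length_with_url(post_url):
    """Behavior described as the bug: the full URL length is subtracted as well."""
    return MAX_TWEET_LENGTH - LINK_COST - len(post_url)


long_url = "https://example.org/" + "a" * 100  # 120 characters

print(usable_length())                   # 257
print(usable_length_with_url(long_url))  # 137
```

With a 120-character URL, subtracting `len(post_url)` leaves 120 fewer characters for the post content and hashtags, which matches the missing hashtags described in the title.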

@wilddeej Any thoughts about this?

Client feature consistency

(NOTE: This is probably more of a project than an issue, but I'm not sure how to reflect this differently, and am open to other options)

All implemented clients should support a defined "core" set of functionality, with variances only where it makes sense for a specific client platform that supports some sort of unique feature, or if there is no mechanism available to support a particular feature. This does not mean that every client will implement the core functionality identically, but as much consistency as possible should be attempted.

The core set of functionality is proposed as:

Enable/disable client via enabled [already implemented]
Ability to limit the number of posts (or "seed" the published db) via max_posts
Ability to include content in posts via post_include_content
Ability to post with an image via post_include_media
Ability to prefix a post via post_prefix [ref. Issue #14]
Ability to suffix a post via post_suffix [ref. Issue #14]
Ability to post with hashtags (keywords, tags) [probably already implemented]
Ability to specify user-defined hashtags (keywords,tags) via keywords
Ability to limit the number of hashtags via max_tags
Ability to shorten post link via url_shortener and url_shortener_opts

I think the best way to approach this is by first implementing any/all functionality that will be needed by all clients (this includes Issue #30), including documentation of that functionality (Issue #36 - which will likely be an incremental process throughout this "project").
After finishing the common core implementation, we would go through each client in turn to fully implement all the core functionality aspects for that client, or understand (and document) where this is not possible. The client-specific implementation will necessarily require detailed understanding of and access to the relevant client, and that should be part of the client-specific documentation as well, of course.

If this sounds like a viable plan, and once all details are understood and agreed upon, it probably then makes sense to:

Create an issue for the common core implementation (and associated documentation)
Create an issue for each client's implementation of the core functionality (and associated documentation)

Thoughts?

Provide feed-specific options

The idea here is to provide some of the client-specific options on a feed basis, specifically:

enabled
max_posts
max_tags
post_prefix
post_suffix
post_include_content
post_include_media
tags
tag_filter_opts
url_shortener
url_shortener_opts

These options would work the same way as the associated client option, but would apply only to postings from the specified feed. Using this functionality, it would be possible to post 1 article from one feed and 3 from another, each with their own posting prefix, for example.

The proposed use model is that any feed-specific options would override the equivalent client-specific options, if both were present.
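The proposed override rule could be sketched with `collections.ChainMap`, which looks up a key in the first mapping and falls back to the next one. The option values below are made up for illustration, and this is not necessarily how FeedSpora would implement it.

```python
from collections import ChainMap

# Hypothetical option sets read from the configuration file
client_options = {"max_posts": 3, "post_prefix": "New:", "max_tags": 4}
feed_options = {"max_posts": 1, "post_prefix": "Blog:"}

# Lookups hit feed_options first, then fall back to client_options
effective = ChainMap(feed_options, client_options)

print(effective["max_posts"])    # 1      (feed-specific value wins)
print(effective["post_prefix"])  # Blog:  (feed-specific value wins)
print(effective["max_tags"])     # 4      (only set on the client)
```

A plain `{**client_options, **feed_options}` merge gives the same result when a static copy is enough.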

config file location

Is there a way to specify where the config file is located? I'd like to run this for multiple users on a server, but without pipenv.

Add support for images in Atom feeds, if any.

The RSS parser is able to find the URL of an image embedded in a post with the find_rss_image_url function. It would be nice to have the same feature for Atom feeds. Let's find one with images inside, and implement it!

LinkedIn support

The LinkedIn client should be enhanced to include functionality for the post_include_content and post_include_media options (ref. Issue #38).
The existing LinkedIn client implementation uses python3-linkedin. Another alternative might be python-linkedin. Depending upon the functionality each provides, the best alternative should be used.

Reorganize tests

Problem 1

In client_test.py, the tests override the default __init__ function of each client with new_init. Among other things, this new_init overrides the real provider with a fake one which just echoes what it is being sent.

The return value of the post function of clients is consequently ambiguous:

when called from the production code, it is tested against a boolean in feedspora_runner:197 (suggesting that it should be a boolean as well)
when called from client_test.py, it is tested against a dict

This makes the implementation of the post function unnecessarily complicated and fragile.

Problem 2

On top of this convoluted testing strategy, the post_test tests also define their own way to test the clients, by passing a testing parameter to their __init__. Consequently, some testing-related code has to be implemented in the code being tested, i.e. in the __init__ and post function of each client. This is also bad practice, unnecessarily complicated and fragile.

Solution

The usual solution to problem 2 is to monkeypatch the objects being tested. However, monkeypatching may cause ambiguous return types like problem 1 unless functions are specifically implemented to be easy and unambiguous to test. Since that's not the case here, this should be fixed.
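As a sketch of what "easy and unambiguous to test" could look like: the client below receives its provider as a constructor argument, so a test can pass in a `unittest.mock.Mock` without any testing parameter, and `post` always returns a boolean. `SketchClient` and its `send` provider method are hypothetical and do not mirror the real FeedSpora classes.

```python
from unittest import mock


class SketchClient:
    """Hypothetical client whose post() always returns a boolean."""

    def __init__(self, provider):
        self._provider = provider

    def post(self, entry):
        try:
            self._provider.send(entry["title"])
        except ConnectionError:
            return False
        return True


# In a test, the real provider is replaced by a mock that records calls
provider = mock.Mock()
client = SketchClient(provider)

assert client.post({"title": "Hello"}) is True
provider.send.assert_called_once_with("Hello")

# Simulate a failing provider: post() still returns a boolean
provider.send.side_effect = ConnectionError
assert client.post({"title": "Hello"}) is False
```

What was sent is checked on the mock itself, so the return value of `post` keeps a single meaning in both production and test code.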

The order of hashtags / keywords should reflect the order of the categories in the parent feed

The current implementation filters out duplicated categories by converting the list of categories into a set in the Atom and RSS parsers.

The order of categories is not defined in the Atom and RSS specifications. We can't tell whether the order is relevant, so it makes sense not to alter it.
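One way to drop duplicates without losing the feed's order, sketched with made-up category names: `dict.fromkeys` keeps the first occurrence of each key, and dicts preserve insertion order from Python 3.7 on.

```python
categories = ["python", "rss", "python", "atom", "rss"]

# Unlike set(categories), this keeps the order of first appearance
unique_in_order = list(dict.fromkeys(categories))

print(unique_in_order)  # ['python', 'rss', 'atom']
```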

AttributeError: 'NoneType' object has no attribute 'text'

Help!
I installed feedspora following the directions.

When I run python3 -m feedspora, here is the result:

INFO:root:Feed read.
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/dist-packages/feedspora/__main__.py", line 100, in <module>
    main()
  File "/usr/local/lib/python3.9/dist-packages/feedspora/__main__.py", line 96, in main
    feedspora.run()
  File "/usr/local/lib/python3.9/dist-packages/feedspora/feedspora_runner.py", line 243, in run
    entry_count = self._process_feed(entry_count, feed)
  File "/usr/local/lib/python3.9/dist-packages/feedspora/feedspora_runner.py", line 209, in _process_feed
    for entry in entry_generator:
  File "/usr/local/lib/python3.9/dist-packages/feedspora/generic_feed.py", line 257, in parse_rss
    fse.content = entry.find('description').text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
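The traceback suggests the feed contains an item without a description element, so `find('description')` returns None. A defensive version of that lookup might look like the sketch below. It uses the standard library's `xml.etree.ElementTree` as a stand-in for the parser FeedSpora actually uses, and `entry_content` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def entry_content(item):
    """Return the stripped <description> text, or '' when it is missing or empty."""
    description = item.find("description")
    if description is None or description.text is None:
        return ""
    return description.text.strip()


with_description = ET.fromstring(
    "<item><title>A</title><description> Hello </description></item>"
)
without_description = ET.fromstring("<item><title>B</title></item>")

print(repr(entry_content(with_description)))     # 'Hello'
print(repr(entry_content(without_description)))  # ''
```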

Store the `account` dict in the GenericClient

Currently, account parameters are read from the parameter file as a dict. Some of its key/value pairs are manually extracted and stored in the GenericClient instance by its set_common_opts member function.

The number of account parameters is increasing. Copying and extracting them manually looks like an unnecessary step. Instead, the account dict could be stored as is in the GenericClient. Such code would be simpler, more abstract and more flexible.

Originally posted by @aurelg in a pull request review discussion.

Atom feed parser should use summary if content is missing

According to the documentation at W3 Feed Validator (https://validator.w3.org/feed/docs/atom.html), the data in the content element might be empty, in which case the contents of the summary element should be used. An example of a stream that manifests this behavior (and so makes a good test case) can be found at http://www.javarticles.com/feed/atom.
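The fallback described above could be sketched as follows, using `xml.etree.ElementTree` and the Atom namespace. The function name `atom_entry_text` and the sample entry are illustrative only, and the real parser in FeedSpora may be structured differently.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def atom_entry_text(entry):
    """Prefer <content>; fall back to <summary> when content is missing or empty."""
    for tag in ("content", "summary"):
        element = entry.find(ATOM + tag)
        if element is not None and element.text and element.text.strip():
            return element.text.strip()
    return ""


entry = ET.fromstring(
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<title>Example</title><content/><summary>Short summary</summary>"
    "</entry>"
)

print(atom_entry_text(entry))  # Short summary
```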

Clean up initialization/usage of client vars

The client variables _tags, _tag_filter_opts, and _url_shortener_opts should each be initialized to None. This will also mean that various places within the module (and its tests) will need to compensate for a value of None (that code is currently expecting at least an empty list/dict as a value).

SyntaxError: invalid syntax

I'm trying to get feedspora to work with 1 RSS feed to Mastodon, I have followed the instructions, the wiki, it seems I'm almost there but when I run the command I'm getting this error :

$ python3 -m feedspora
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/feedspora/__main__.py", line 18, in <module>
    from feedspora.tweepy_client import TweepyClient  # @UnusedImport
  File "/usr/local/lib/python3.7/dist-packages/feedspora/tweepy_client.py", line 6, in <module>
    import tweepy
  File "/home/user2/.local/lib/python3.7/site-packages/tweepy/__init__.py", line 17, in <module>
    from tweepy.streaming import Stream, StreamListener
  File "/home/user2/.local/lib/python3.7/site-packages/tweepy/streaming.py", line 358
    def _start(self, async):
                         ^
SyntaxError: invalid syntax

Can you help pin/nudge me in the right direction to solve this ?
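For context on the error above: `async` became a reserved keyword in Python 3.7, so a function parameter with that name no longer compiles. The installed tweepy copy under ~/.local appears to predate that change, so upgrading tweepy is the likely remedy (an assumption based on the traceback, not verified against this setup). The sketch below only demonstrates the language behavior; `is_async` is just an example of a non-reserved name.

```python
import keyword


def compiles(source):
    """Return True if the source text compiles on the running interpreter."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# On Python 3.7 and later:
print(keyword.iskeyword("async"))                          # True
print(compiles("def _start(self, async): pass"))       # False
print(compiles("def _start(self, is_async): pass"))    # True
```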

WordPress client enhancements

(Genesis from Issue #38)
The WordPress client needs to have functionality implemented for the post_include_media and post_include_content client options.

Implementation of post_include_media should be pretty straightforward, as documented at https://python-wordpress-xmlrpc.readthedocs.io/en/latest/examples/media.html.

The post_include_content option is not as simple, however. Upon further investigation into the WPClient code, I found that it is not actually posting based on RSS/Atom feed item contents, but rather on the contents of the link references from those entries. All other clients limit themselves to the feed item contents, not the content from the referenced link. I think this modified functionality in this client should be under the control of a new option (proposed post_link_contents), and the basic functionality should be changed to post only from feed item data. This approach also resolves what should be involved with the post_include_content option as well. I will wait for confirmation of this idea from @aurelg before proceeding with this implementation plan.

New opportunity ?

Hey buddy,

I happened to find your repository after the issue you created here: DEKHTIARJonathan/python3-linkedin#5

It seems like we share a lot of common goals, so I will allow myself to bring your attention to a project of mine which has common features with yours: https://github.com/DEKHTIARJonathan/FeedCrunch.IO

The website is available here: https://www.feedcrunch.io/
And my personal page: https://www.feedcrunch.io/@dataradar/

The core concepts we share:

a unified platform to handle RSS Feeds
automatically repost on various social networks entries of an RSS Feed
- LinkedIn
- Facebook
- Twitter
- Slack
- more coming

I invite you to read the readme of the project: https://github.com/DEKHTIARJonathan/FeedCrunch.IO

Maybe you would be open to some discussion and we could find a way to work together ? You tell me.

Tweepy client and/or mkrichtext formatting bug

I've noticed a few tweets produced by FeedSpora that have incorrect formatting. A real-life example is:

AD: New! "Introverts Unite" T-Shirt: #introversion | #unity #causes #activism #satire <link>

The hashtags are the problematic detail: this client is configured to only include 4 hashtags, and clearly 5 have been included. Additionally, the title/hashtag inclusion formatting is funky, using both a colon and a pipe as data separators.
Although minor, this is incorrect and needs resolution (as well as a pertinent test case to detect/report this issue, obviously).

Remove getters/setters in `GenericClient`, either use @property if they are required or remove them

I noticed several getters/setters in GenericClient. The implementation of the corresponding attributes is quite simple, and any (unlikely) changes would only impact feedspora itself and would be easy to manage without side effects. I think having direct read/write access to attributes would be enough and would simplify the code.

OTOH, if there's a good reason I don't see yet for having getters and setters, they should probably be implemented with the @property decorator (see here) to prevent direct access to attributes (and allow for more readable code when calling them).
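For reference, a minimal sketch of the @property approach. The class `SketchClient`, the attribute `max_tags` and the validation rule are illustrative assumptions, not taken from GenericClient.

```python
class SketchClient:
    """Hypothetical client exposing max_tags through a property."""

    def __init__(self):
        self._max_tags = 100

    @property
    def max_tags(self):
        return self._max_tags

    @max_tags.setter
    def max_tags(self, value):
        # A setter is only worth having if it does something, e.g. validation
        if value < 0:
            raise ValueError("max_tags must not be negative")
        self._max_tags = value


client = SketchClient()
client.max_tags = 4     # reads like plain attribute access
print(client.max_tags)  # 4
```

Callers use plain attribute syntax either way, so starting with simple attributes and switching to a property later would not change calling code.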

Error reading vimeo feed --> KeyError: 'medium'

Hi,
I want to read my Vimeo feed with feedspora, but I get an error:

INFO:root:Found database file feedspora.db
INFO:root:Trying to read https://vimeo.com/strubbl/likes/rss as a file.
INFO:root:File not found.
INFO:root:Trying to read https://vimeo.com/strubbl/likes/rss as a URL.
INFO:root:Feed read.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/site-packages/feedspora/__main__.py", line 100, in <module>
    main()
  File "/usr/local/lib/python3.6/site-packages/feedspora/__main__.py", line 96, in main
    feedspora.run()
  File "/usr/local/lib/python3.6/site-packages/feedspora/feedspora_runner.py", line 243, in run
    entry_count = self._process_feed(entry_count, feed)
  File "/usr/local/lib/python3.6/site-packages/feedspora/feedspora_runner.py", line 209, in _process_feed
    for entry in entry_generator:
  File "/usr/local/lib/python3.6/site-packages/feedspora/generic_feed.py", line 276, in parse_rss
    fse.media_url = self.find_rss_image_url(entry, fse.link)
  File "/usr/local/lib/python3.6/site-packages/feedspora/generic_feed.py", line 210, in find_rss_image_url
    entry.find('media:content')['medium'] == 'image':
  File "/usr/local/lib/python3.6/site-packages/bs4/element.py", line 997, in __getitem__
    return self.attrs[key]
KeyError: 'medium'
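The traceback indicates that the media:content element in this feed has no medium attribute, so indexing with ['medium'] raises KeyError. A tolerant check could use `.get()` instead, which bs4 tags also provide. In the sketch below a plain dict stands in for the element's attributes, and `is_image` is a hypothetical helper.

```python
def is_image(media_attrs):
    """media_attrs stands in for the attribute mapping of a <media:content> element."""
    # .get() returns None for a missing key instead of raising KeyError
    return media_attrs.get("medium") == "image"


print(is_image({"url": "https://example.org/a.jpg", "medium": "image"}))   # True
print(is_image({"url": "https://example.org/v.mp4", "type": "video/mp4"}))  # False
```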

TypeError: load() missing 1 required positional argument: 'Loader'

Could anyone assist? After installing using the instructions posted, I run python3 -m feedspora, I get this error:

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/dist-packages/feedspora/__main__.py", line 100, in <module>
    main()
  File "/usr/local/lib/python3.9/dist-packages/feedspora/__main__.py", line 84, in main
    config = read_config_file(root_name + '.yml')
  File "/usr/local/lib/python3.9/dist-packages/feedspora/__main__.py", line 33, in read_config_file
    return load(config_file)
TypeError: load() missing 1 required positional argument: 'Loader'
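This error is what PyYAML 6 raises when `yaml.load` is called without a Loader, since that argument became mandatory. A sketch of a config reader using `yaml.safe_load`, which needs no Loader argument, is shown below. The function body and the sample configuration are assumptions for illustration, not the project's actual code or config layout.

```python
import os
import tempfile

import yaml  # PyYAML


def read_config_file(filename):
    """Load a YAML configuration file without needing an explicit Loader."""
    with open(filename, encoding="utf-8") as config_file:
        return yaml.safe_load(config_file)


# Hypothetical config content, written to a temporary file for the demo
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feedspora.yml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("accounts:\n  - name: example\n    enabled: true\n")
    config = read_config_file(path)

print(config)  # {'accounts': [{'name': 'example', 'enabled': True}]}
```

Alternatively, `yaml.load(config_file, Loader=yaml.SafeLoader)` is equivalent, and pinning an older PyYAML release would also avoid the error without code changes.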

Feed parser consistency

Atom and RSS parsers should support the same set of functionality. Currently, there are differences, such as find_rss_image_url, which is available for RSS but not for Atom.

Support of chatrooms?

Would you merge support for chatrooms?
Like xmpp/jabber (or irc)?