
mastodon-archive's Introduction

mastodon-archive's People

Contributors

ashwinvis avatar bartg95 avatar codesections avatar cutiful avatar frainz-de avatar izzysoft avatar jikamens avatar kensanata avatar kingu-gidora avatar legogo29 avatar mattieb avatar mellerin avatar sivy avatar


mastodon-archive's Issues

Can't install on Korean Windows


On running the command pip3 install mastodon-archive:

  • Windows 10, Korean
  • Korean defaults to CP949 on Windows cmd.exe
  • I changed code page to 65001 (unicode) before installing by typing chcp 65001
  • admin privileged cmd.exe shows same error message

I guess it will happen on other CJK Windows versions too, but I haven't tested on Chinese or Japanese Windows.

Output is unreadable

It seems a pity to write out text (JSON) that isn't easy for a casual user to read, when a bit of indentation would help a lot.
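A minimal sketch of what that indentation could look like, using sample data standing in for the real archive file:

```python
import json
import tempfile

# Sample data standing in for the real JSON archive.
archive = {"statuses": [{"id": 1, "content": "hello"}]}
path = tempfile.mkstemp(suffix=".json")[1]

# indent=2 makes the file readable; ensure_ascii=False keeps
# non-ASCII characters (umlauts, emoji) as-is instead of \uXXXX.
with open(path, "w", encoding="utf-8") as fp:
    json.dump(archive, fp, ensure_ascii=False, indent=2)

with open(path, encoding="utf-8") as fp:
    print(fp.read())
```

The trade-off is file size: an indented archive is noticeably larger, which may matter for archives with tens of thousands of statuses.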

Search

A way to search your backup (and produce text output).

Redundant code

The last line is fp.close(), but the enclosing with statement should already close the file.
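To illustrate why the explicit close is redundant:

```python
import tempfile

# The with statement closes the file when the block exits,
# so a trailing fp.close() is dead code.
path = tempfile.mkstemp(suffix=".json")[1]
with open(path, "w") as fp:
    fp.write("{}")

print(fp.closed)  # True: the context manager already closed it
```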

Expire only boosts

Hello,

Is it possible to add an option to expire only our boosted toots, instead of expiring all statuses?

Thanks

HTML export to include mentions

Hello,

(first of all, thanks a lot for this great tool ;)

I tried to download an archive with --with-mentions; it finished correctly and displayed [some plausible number] mentions.
But in the HTML file I don't see any mentions, and I can't see a conversation except by opening the original web URL (which defeats the goal of archiving mentions).
NB: I didn't do any "Clear mentions".
(I'm also not sure all the mentions are there; the number feels a bit too small.)

Do you know what's happening?

Media requests should also take --pace

Media requests are throttled and eventually denied, yet invoking the command again restarts the requests from the first archived post all over again. Pacing these requests might be one way to solve this; alternatively, storing the last requested attachment would also ease the pain.

Downloading |####                            | 1810/14002                                                                
Not Found: https://social.wxcafe.net/system/media_attachments/files/001/217/485/original/2717e40cab36a23d.png?1539819022 
Downloading |####                            | 1813/14002                                                                
Not Found: https://social.wxcafe.net/system/media_attachments/files/001/217/347/small/c2de73f9f4725c5a.png?1539815814    
Downloading |####                            | 1814/14002                                                                
Not Found: https://social.wxcafe.net/system/media_attachments/files/001/217/347/original/c2de73f9f4725c5a.png?1539815814 
Downloading |####                            | 1815/14002Traceback (most recent call last):                              
  File "/usr/lib/python3.6/urllib/request.py", line 1318, in do_open                                                     
    encode_chunked=req.has_header('Transfer-encoding'))                                                                  
  File "/usr/lib/python3.6/http/client.py", line 1239, in request                                                        
    self._send_request(method, url, body, headers, encode_chunked)                                                       
  File "/usr/lib/python3.6/http/client.py", line 1285, in _send_request                                                  
    self.endheaders(body, encode_chunked=encode_chunked)                                                                 
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders                                                     
    self._send_output(message_body, encode_chunked=encode_chunked)                                                       
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output                                                   
    self.send(msg)                                                                                                       
  File "/usr/lib/python3.6/http/client.py", line 964, in send                                                            
    self.connect()                                                                                                       
  File "/usr/lib/python3.6/http/client.py", line 1392, in connect                                                        
    super().connect()                                                                                                    
  File "/usr/lib/python3.6/http/client.py", line 936, in connect                                                         
    (self.host,self.port), self.timeout, self.source_address)                                                            
  File "/usr/lib/python3.6/socket.py", line 724, in create_connection                                                    
    raise err                                                                                                            
  File "/usr/lib/python3.6/socket.py", line 713, in create_connection                                                    
    sock.connect(sa)                                                                                                     
OSError: [Errno 101] Network is unreachable                                         
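Both ideas from the report above can be sketched together: pause between requests, and return the index of the last attempted download so a caller could persist it and resume after a crash. The function names and data shapes here are hypothetical, not the project's actual code.

```python
import time


def paced_download(urls, fetch, delay=1.0, start_at=0):
    """Fetch each URL with a pause in between, resuming from start_at.

    `fetch` is whatever callable performs the actual request. The index
    of the last attempted URL is returned, so a caller could write it
    to disk and pass it back as start_at on the next run.
    """
    last = start_at
    for i, url in enumerate(urls[start_at:], start=start_at):
        fetch(url)
        last = i
        time.sleep(delay)  # pace requests to avoid server throttling
    return last
```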

Support account migration

In theory we could write a tool that reposts statuses to another instance. We would change all the public statuses to visibility unlisted in order to not pollute the local timeline of the destination instance. But it would still be super annoying for all the people mentioned in the statuses.

  • Should we export just the last twenty?
  • Should we limit ourselves to statuses not mentioning anybody?
  • What about hashtags? We don't want to pollute hashtag searches, I think? Perhaps not a problem if we limit ourselves to twenty statuses?

[Proposal] Prevent specific statuses from being expired

As a user, I would like to expire all statuses older than X, but keep a few of them.
In my particular case, I want to keep some pinned toots while expiring every status around them.

I see several ways to do that, more or less specific to this exact use case:

  • right now I could split the archive into 3 pieces around 1 status (1 before / 1 with the status / 1 after), but it's really inconvenient
  • mark certain toots (by their URL/id?) as "on hold", so they can't be deleted
  • add an option to keep pinned toots. That would be very convenient but very specific, and I don't know if it's doable
  • allow expiring toots except the ones marked as favorites. Not very specific, but not completely compatible with the removal of favorites.

Which option would be easiest for you, and how can we help you implement it if you're willing to?
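The "keep pinned toots" option could be a simple filter applied before expiring; `pinned_ids` would have to come from the server, which is an assumption about the data available:

```python
def expire_candidates(statuses, pinned_ids):
    """Drop pinned toots from the list of statuses to expire.

    `pinned_ids` is assumed to come from the API; Mastodon.py's
    account_statuses supports a pinned filter, though treat the exact
    call as an assumption rather than a confirmed part of this tool.
    """
    return [s for s in statuses if s["id"] not in pinned_ids]
```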

`mastodon-backup-to-html.py` dies with `UnicodeError`

When I try to turn the backup (which works fine) to HTML, mastodon-backup-to-html.py dies with UnicodeError :-( I have three accounts, and all of them have the same problem. Full error:

Traceback (most recent call last):
  File "./mastodon-backup-to-html.py", line 294, in <module>
    print(html)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2663' in position 561: ordinal not in range(128)

I have the impression that my main account (in mastodon.social) chokes right away because of my second surname ("Velázquez"). The others seem to choke on concrete toots.

I'm happy to provide the backup JSON file if it helps (it doesn't have any private information, I take it? I haven't sent anything private in the most recent account, at least).
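The traceback suggests stdout's encoding is ASCII (e.g. under a C locale), so any non-ASCII character like '\u2663' (♣) blows up on print. One workaround, assuming that diagnosis is right, is to encode explicitly and write to the underlying byte stream:

```python
import sys

# When sys.stdout.encoding is 'ascii', print(html) raises
# UnicodeEncodeError for characters like '\u2663'. Encoding to UTF-8
# ourselves and writing the bytes sidesteps the locale entirely.
html = "Vel\u00e1zquez \u2663"
sys.stdout.buffer.write(html.encode("utf-8"))
sys.stdout.buffer.write(b"\n")
```

Setting the environment variable PYTHONIOENCODING=utf-8 before running the script is another way to get the same effect without code changes.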

Also delete toots

It would be nice if there was an option --retention-period or something like that, so that someone could invoke the program with --retention-period=4w . That would archive all toots that are older than four weeks and then delete them from the instance. Of course, this should also automatically archive the attachments, before deleting the toot.

Archive bookmarks

It seems like bookmarks are not archived. Toots that are also favourited have the bookmark property set in the JSON archive, but those that are only bookmarked are not present.
Also, it probably makes sense to add a bookmarks collection for the html and text generation.

Report which users have gone silent

I'd like to see who among the people I am following has fallen silent for a customizable time window, say eight weeks. In addition to that, I'd love to unfollow these people with some command. In order to support manual editing of this list, I think it would be nice if we used stdin and stdout. Maybe like this?

mastodon-archive inactive-users --silent-for 8 [email protected] | mastodon-archive unfollow --stdin [email protected]
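The silence check itself is simple to sketch, assuming each account record carries the timestamp of its latest status (a guess at the data model, not how mastodon-archive actually stores it):

```python
from datetime import datetime, timedelta, timezone


def inactive_accounts(accounts, weeks=8, now=None):
    """Return the accounts whose latest status is older than `weeks`.

    `accounts` is assumed to be a list of (acct, last_status_time)
    pairs with timezone-aware datetimes.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(weeks=weeks)
    return [acct for acct, last in accounts if last < cutoff]
```

Printing one account per line would then make the result easy to edit by hand and pipe into an unfollow command, as proposed above.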

Database rotation

As far as I can tell, the database will just grow without bounds, even if one expires old posts. This is what I want, since I am using mastodon-archive to archive old posts before expiring them. But the file will become unwieldy, and I imagine the program will eventually start running out of memory.

I'm not sure what the best approach is. I'm thinking that what I'd like is for the monotonically growing bits like statuses, favorites, mentions, and media to be split up somehow by date, with periodic snapshots of the other data. Since the data is already in a pretty simple format, this is already pretty easily doable with external tools, but it seems like the sort of thing that should be built into the software itself.

Media download fails roughly halfway through

The download tends to fail after about 1400-1600 downloads, and doesn't retain the ones it succeeded in downloading on retry:

mastodon-archive media [email protected]
3352 urls in your backup (half of them are previews)
Downloading |###############                 | 1599/3352Traceback (most recent call last):
  File "/usr/local/bin/mastodon-archive", line 9, in <module>
    load_entry_point('mastodon-archive', 'console_scripts', 'mastodon-archive')()
  File "/home/user/mastodon-backup/mastodon_archive/__init__.py", line 65, in main
    args.command(args)
  File "/home/user/mastodon-backup/mastodon_archive/media.py", line 68, in media
    download.start(blocking = False)
  File "/usr/local/lib/python3.5/dist-packages/pySmartDL/pySmartDL.py", line 250, in start
    urlObj = urllib.request.urlopen(req, timeout=self.timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "/usr/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 1297, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.5/urllib/request.py", line 1257, in do_open
    r = h.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

General API Problem on use

I followed the instructions up to running the first backup, and ran into an error after a short while:

./mastodon-backup.py [email protected]
Get user info
Get statuses (this may take a while)
Traceback (most recent call last):
  File "./mastodon-backup.py", line 79, in <module>
    first_page = statuses)
  File "/usr/local/lib/python3.5/dist-packages/mastodon/Mastodon.py", line 907, in fetch_remaining
    current_page = self.fetch_next(current_page)
  File "/usr/local/lib/python3.5/dist-packages/mastodon/Mastodon.py", line 866, in fetch_next
    return self.__api_request(method, endpoint, params)
  File "/usr/local/lib/python3.5/dist-packages/mastodon/Mastodon.py", line 1124, in __api_request
    raise MastodonAPIError('General API problem.')
mastodon.Mastodon.MastodonAPIError: General API problem.

Invalid scope when re-authorizing for expire

I'm trying to expire my old toots from mastodon.technology with the following command:

mastodon-archive expire --older-than 4  [email protected] --confirmed

and when I follow the URL for authorization:

https://mastodon.technology/oauth/authorize?client_id=CENSORED&response_type=code&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=read+write

the instance shows me a message saying "The requested scope is invalid, unknown, or malformed." Changing the scope to just "write" results in the same error, though changing it to "read" makes it work.

Toots not deleted?

Hello @kensanata ,
I tried the "expire" command tonight (with the options --older-than 15 and --confirmed). Everything seemed to work well; the script ran for around 6 hours without reporting any error. But when I visited my account this morning... all my toots are still there! Did I miss something?
Note: I did archive my toots before expiring them.

Option to avoid expiring pinned toots

I use pinned toots to add stuff I'd otherwise put in my profile but won't fit. I don't want to expire these. It would be nice to have an option to never expire them.

Coloured text search?

The only thing I can do now is by piping the output into | pygmentize -s -l md, but it is not colourful enough :)

Travis CI and PyPI

Release 1.2 uses Travis CI to upload a package called mastodon-archive to PyPI but unfortunately it doesn't work. Travis says:

writing manifest file 'MANIFEST'
creating mastodon_archive-0.0.1
creating mastodon_archive-0.0.1/mastodon_archive
making hard links in mastodon_archive-0.0.1...
hard linking setup.py -> mastodon_archive-0.0.1
hard linking mastodon_archive/__init__.py -> mastodon_archive-0.0.1/mastodon_archive
hard linking mastodon_archive/archive.py -> mastodon_archive-0.0.1/mastodon_archive
hard linking mastodon_archive/html.py -> mastodon_archive-0.0.1/mastodon_archive
hard linking mastodon_archive/media.py -> mastodon_archive-0.0.1/mastodon_archive
hard linking mastodon_archive/text.py -> mastodon_archive-0.0.1/mastodon_archive
creating dist
Creating tar archive
removing 'mastodon_archive-0.0.1' (and everything under it)
Uploading distributions to https://upload.pypi.org/legacy/
Uploading mastodon_archive-0.0.1.tar.gz
UnicodeEncodeError: latin-1

No stash entries found.


Done. Your build exited with 0.

And yet, I cannot find it. Any ideas?

Incremental backup

Fetch new pages ourselves, and only fetch another page if the current one contains no known ids.
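A sketch of that loop; `fetch_page` stands in for the API call returning the next page of statuses (newest first), and the pagination details of the real client differ:

```python
def fetch_new(fetch_page, known_ids):
    """Collect statuses page by page until an already-known id appears.

    Stops as soon as a status from a previous backup is seen, so only
    the new statuses are fetched and returned.
    """
    new = []
    max_id = None
    while True:
        page = fetch_page(max_id)
        if not page:
            break
        for status in page:
            if status["id"] in known_ids:
                return new
            new.append(status)
        max_id = page[-1]["id"]  # continue with the next older page
    return new
```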

Delete media from mentions

I downloaded my archive with my toots + mentions.
Then I downloaded the media.
The issue is that the media archive contains the media from the mentions.
I would like to get rid of them.

Is it possible to add an option to download/remove the media from the mentions?

Archive broken after interrupt

My archive seems to be broken.
I launched a mastodon-archive command by mistake and interrupted it during the Loading archive step.
Now running any mastodon-archive command (including archiving) raises an error:

File "/home/myaccount/.local/bin/mastodon-archive", line 10, in <module>
   sys.exit(main())
File "/home/myaccount/.local/lib/python3.7/site-packages/mastodon_archive/__init__.py", line 259, in main
   args.command(args)
File "/home/myaccount/.local/lib/python3.7/site-packages/mastodon_archive/expire.py", line 71, in expire
   data = core.load(status_file, required = True)
File "/home/myaccount/.local/lib/python3.7/site-packages/mastodon_archive/core.py", line 190, in load
   return json.load(fp)
File "/usr/lib/python3.7/json/__init__.py", line 296, in load
   parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
   return _default_decoder.decode(s)
File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
   obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
   obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1870349 column 30 (char 89941590)

How can I fix this?
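A first diagnostic step, before attempting any repair, is to confirm where the file breaks; the char offset in the exception tells you how far the valid prefix extends. This is just an inspection helper, not a fix:

```python
import json


def find_error(path):
    """Return the character offset where the JSON file stops parsing,
    or None if the file is valid."""
    with open(path, encoding="utf-8") as fp:
        text = fp.read()
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        return e.pos  # offset of the first invalid token
```

Restoring from a backup copy of the archive, if one exists, is the safest repair; hand-editing the file around that offset is the fallback.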

Step: Making a backup

Just to make this official. ;)

I'm using macOS. I've installed Python 3.6.3.

I run into problems early on when trying to authorize with the account. Here's the commands history:

$ python3
Python 3.6.3 (v3.6.3:2c5fed86e0, Oct  3 2017, 00:32:08) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> ^D
$ pip3 install Mastodon.py
Collecting Mastodon.py
  Downloading Mastodon.py-1.1.2-py2.py3-none-any.whl
... etc ...
Successfully installed Mastodon.py-1.1.2 certifi-2017.11.5 chardet-3.0.4 idna-2.6 python-dateutil-2.6.1 pytz-2017.3 requests-2.18.4 six-1.11.0 urllib3-1.22
$ ./mastodon-backup.py [email protected]
-bash: ./mastodon-backup.py: No such file or directory

First question is, where is mastodon-backup.py supposed to be installed?

Command "python setup.py egg_info" failed with error code 1

Hello kensanata,
I tried to install with the command: pip3 install mastodon-archive
But I got this error message:

Collecting mastodon-archive
  Using cached https://files.pythonhosted.org/packages/b9/ac/ac703bd0a8aac433c00fea2f3c360542b6cc897ad741cedf30df43d04b4c/mastodon_archive-1.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ModuleNotFoundError: No module named 'setuptools'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-4kf7emri/mastodon-archive/

Can you help me?

Sorting

Reverse the sort order for text?

mastodon-archive media: TypeError: can only concatenate str (not "bytes") to str

$ mastodon-archive media [email protected]
1132 urls in your backup (half of them are previews)
Downloading |###                             | 110/1132Traceback (most recent call last):
  File "/usr/local/bin/mastodon-archive", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/mastodon_archive/__init__.py", line 293, in main
    args.command(args)
  File "/usr/local/lib/python3.7/dist-packages/mastodon_archive/media.py", line 66, in media
    file_name = media_dir + path
TypeError: can only concatenate str (not "bytes") to str

This is with 1.3.1 installed today via pip3 .
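The error means `path` arrived as bytes while `media_dir` is a str. One plausible defensive fix, which is a guess at the cause rather than the project's actual code, is to normalize the path before concatenating:

```python
from urllib.parse import urlparse, unquote


def local_path(media_dir, url):
    """Derive a local file path from a media URL.

    If the parsed path somehow comes back as bytes, decode it so the
    concatenation stays str + str.
    """
    path = urlparse(url).path
    if isinstance(path, bytes):  # normalize bytes to str
        path = path.decode("utf-8")
    return media_dir + unquote(path)
```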

Could you release a windows executable that includes Python?

Split command doesn't exist?

mastodon-archive split --older-than=100 [email protected]

mastodon-archive: error: invalid choice: 'split' (choose from 'archive', 'media', 'text', 'context', 'html', 'expire', 'report', 'followers', 'following', 'mutuals', 'whitelist', 'login')

What's going on?

Version 1.2.0

Bad Gateway causes failure to retrieve _all_ statuses

A bad gateway error during status retrieval causes the entire process to halt, and all statuses retrieved so far are lost.

Traceback (most recent call last):
  File "/usr/bin/mastodon-archive", line 11, in <module>
    load_entry_point('mastodon-archive==1.3.0', 'console_scripts', 'mastodon-archive')()
  File "/usr/lib/python3.7/site-packages/mastodon_archive/__init__.py", line 293, in main
    args.command(args)
  File "/usr/lib/python3.7/site-packages/mastodon_archive/archive.py", line 131, in archive
    first_page = statuses)
  File "/usr/lib/python3.7/site-packages/mastodon/Mastodon.py", line 2977, in fetch_remaining
    current_page = self.fetch_next(current_page)
  File "/usr/lib/python3.7/site-packages/mastodon/Mastodon.py", line 2936, in fetch_next
    return self.__api_request(method, endpoint, params)
  File "/usr/lib/python3.7/site-packages/mastodon/Mastodon.py", line 3307, in __api_request
    error_msg)
mastodon.Mastodon.MastodonBadGatewayError: ('Mastodon API returned error', 502, 'Bad Gateway', None)

Media missing for favourites

In the HTML output, pictures attached to boosted toots are shown, but those attached to favourited ones are not. Instead, a little question mark is shown. Instead of a file: URI, it points to a string like this:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADIAAAAyCAYAAAAeP4ixAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4QsUETQjvc7YnAAAACZpVFh0Q29tbWVudAAAAAAAQ3JlYXRlZCB3aXRoIEdJTVAgb24gYSBNYWOV5F9bAAADfElEQVRo3u1ZTUhUURg997sPy9TnDxk4IeqkmKghCKLBLNq2MCXQTXsXomFMIdK0EIoIskWLcNVKaFEulBIzxEgQLBRFXZmiaIOkboSe2Lv3tsg3+Hr96Mx7OhNzl/PuvHfPPd/5zvfdy5719Cj8B4Pwn4wkkCSQRAGilIIQ4q9zpJSQUrr6Xc2tF0kpQURgjGF2fh4zc3P4vLyMza0t27xzubkoKy1FoLYW532+yP/iBohhGHjR349P09M/qSaCUs7M/nVzE1vb23g/Po6zOTkItrcjPS0NnPOYvs9i9RGlFJ739eHj1BSI6EghY81va2nBxZKSmJiJidPvpon+wUHMzs1FwgsAGGN/3rkDz6z5T3t7sR4O/1NbnoUWA3AqJQXm/gKsHS7Iz0egrg7lZWXI1PUI6PmFBbwcGMDW9rbjXQ8eP8aznp4T1AhjICIIIVBVWYkbzc1IPX0apmnaQkXjHJXl5ai6dAmvBgbwbmzMwdSHiQlcrqmJSi+uiD0jPR3BtjZkZ2VFwkXTNMdCrQVer6/H6toaFpeWbJqamJxEoK7u+DVCRKiqqMD9UCgSQocRrJQSTQ0NNhBKKSyvrJyM2IkIvry8QwM4yM55ny++nD2alPm3rHbihnjUKiC8seEAl52VlVhFIxFhcGjIxiZjDDXV1VF7ybEz8t008SUcxsy+iR5k6drVq78ta+KOEdM0sbe3h4dPnjieNTU2QggRtX6OjREhBKSUCN69C8ZYZOeJCBeKinAlEIh/sVs7fbOz0waCc46c7Gzcam2N/w7RAtF2546ttOecI1PX0d3VFbUujg2IlBKcc7QGg7Zql3MOPSMD90MhKKVc8RVPgRARbodC4JzbQJxJTcWDe/dcA+GpRqSUeD08jG+GYauphBB41N3tWovrOSNEhDcjI46OsbOjw1Hixy0jUkqsh8OOEkTXdRTk5yfOuZZSCqtraw4gZSUliXVAp5TCzs6Oo5bK1PWY+vIT0Yhpmo6MpGmaK54RN9Vv4gPxoKHyViMAjN1dmx6EEDAMA17dKrHkjVWcDU9LlF9dnYhcd3TPnX1xacl2AEdEKPb7Uez3ewLGUyBvR0cj58La/imjv7AwcYBYYEwhbJlLKpUUezJrudGPaAeuBzTOQR46u+YViGK/39anW78lVPq1Fu0vLExsH/F60cmslQSSBHL08QPK53LVsfanXQAAAABJRU5ErkJggg==

See the following screenshot from the favourites collection for an example. The top post was also boosted so the image is there but the bottom one was only favourited.
(screenshot: Screenshot_2020-07-16 frainz)

The base64-encoded image is the little question mark itself, so I guess the media just isn't downloaded when creating the archive.

Argument of NoneType is not iterable

Installing via pip3 install mastodon, and invoking like:

~/.local/bin/mastodon-archive replies --pace "$1" || exit 1

Returns the error

Get user info
Traceback (most recent call last):
  File "/home/foo/.local/bin/mastodon-archive", line 11, in <module>
    sys.exit(main())
  File "/home/foo/.local/lib/python3.6/site-packages/mastodon_archive/__init__.py", line 293, in main
    args.command(args)
  File "/home/foo/.local/lib/python3.6/site-packages/mastodon_archive/replies.py", line 58, in replies
    if collection not in data:
TypeError: argument of type 'NoneType' is not iterable

Error is also present when installing via python3.8 -m pip install mastodon_archive, and invoking via python3.8 ~/.local/bin/mastodon-archive replies --pace "$1" || exit 1

This error is present both on Alpine edge and Ubuntu 18.04.4.
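The traceback points at `if collection not in data:` failing because `data` is None, which would happen if no archive exists yet for that account. A defensive check with a clear message is one way to handle it; `check_archive` here is a hypothetical helper, not the project's actual code:

```python
def check_archive(data, collection):
    """Fail with a readable message when no archive has been made yet.

    `data` stands in for the value returned by the archive loader;
    None would mean the archive file is missing.
    """
    if data is None:
        raise SystemExit(
            "No archive found; run 'mastodon-archive archive' first.")
    return collection in data
```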

Linking back to the original for context

Ed Davies wrote: Any way to link back to the “real” toot? E.g., to see what replies were replying to.

I think for replies, we just get an id, e.g. 99033945374420755. But that's the internal URI, not the visible one. That's why https://mastodon.weaponvsac.space/users/kensanata/statuses/99033945374420755 is an error. You would have to call the API with that id and get the real status, and its URL. I guess we could link the timestamp to the original toot, though, i.e. https://mastodon.weaponvsac.space/users/kensanata/statuses/99035442420065671. We should totally do that.

Then again, this won't help you if the original site goes down, or you delete your account after making the backup. After all, the point of the backup is that you need it when you lose access to the original. That's why I feel it might make sense to actually retrieve all the statuses you have replied to and store them somewhere. Except that this is going to require many requests and that's not cool.

Once we go down that line of thought, we might as well request the context of every toot. Tempting!

permissions of secrets

Do you think the permissions on the files containing the tokens should be tighter? We could make sure the files are only readable and writable by the user. Or we could go further and encrypt the files? I'm not sure what we want to do for now. If you have an opinion on this, please comment.

--pace not working (since 1.3.0 ?)

The --pace option is no longer working for me. I'm not sure, but it may have started with my upgrade to 1.3.0.
I've been using the same command for months, so it's not related to how I write it (--pace at the end of the command, but I tried other positions too).

Report media and tags on boosted toots

The current report functionality reports the number of media attachments and the tags used in your toots, not in the toots you boosted.

I don't know if this should change by default, or only as an option. I think this depends a lot on the user: for me, 3/4 of my timeline consists of boosted toots, so I want this as a default. On the other hand, there might be people who see it the other way around.

I also don't know if these stats should be merged with the stats of your own toots. Maybe it should report three things: media in your toots, media in boosted toots and a total number of media attachments. Same for tags: most used tags in your toots, most used tags in boosted toots and most used tags in the combination of those two.

Unordered output after incremental backups

When making an incremental backup a few days after the first one, the new toots are appended to the JSON archive. They also appear at the end of the text and HTML output, even though the original order was newest to oldest.
The straightforward solution would probably be to just sort by toot date, but I'm unsure how well that scales when the archive is very big.
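For scale context: Python's sort is O(n log n), so sorting even tens of thousands of toots is cheap compared to rendering them. A sketch, assuming each status carries a sortable 'created_at' value as in the JSON archive:

```python
def newest_first(statuses):
    """Sort statuses newest-first by creation date.

    Assumes 'created_at' values compare correctly, e.g. ISO-8601
    timestamp strings, which sort lexicographically in date order.
    """
    return sorted(statuses, key=lambda s: s["created_at"], reverse=True)
```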

Pagination for HTML output

Currently most users don't have many toots on their account, so a single HTML page is enough for viewing all the contents, but it may not work well with a large number of toots. I have around 5000 toots now and it's already quite a long page. I think pagination would help when browsing a large archive, for example 3000 toots per page.
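The splitting itself is a one-liner; each chunk would then be rendered to its own HTML file:

```python
def paginate(statuses, per_page=3000):
    """Split statuses into fixed-size pages for separate HTML files."""
    return [statuses[i:i + per_page]
            for i in range(0, len(statuses), per_page)]
```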

media fetching sometimes fails

I found another issue with media fetching where sometimes a URLError or HTTPError is thrown when requesting a URL. Here's a quick patch I did:

diff --git a/mastodon_archive/media.py b/mastodon_archive/media.py
index cf5a9d7..a1da360 100644
--- a/mastodon_archive/media.py
+++ b/mastodon_archive/media.py
@@ -19,6 +19,8 @@ import sys
 import json
 import time
 import urllib.request
+from urllib.error import HTTPError
+from urllib.error import URLError
 from progress.bar import Bar
 from urllib.parse import urlparse
 
@@ -70,9 +72,14 @@ def media(args):
                     url, data=None,
                     headers={'User-Agent': 'Mastodon-Archive/1.1 '
                              '(+https://github.com/kensanata/mastodon-backup#mastodon-archive)'})
-                with urllib.request.urlopen(req) as response, open(file_name, 'wb') as fp:
+                try:
+                  with urllib.request.urlopen(req) as response, open(file_name, 'wb') as fp:
                     data = response.read()
                     fp.write(data)
+                except HTTPError as he:
+                  print("\nFailed to open " + url + " during a media request.")
+                except URLError as ue:
+                  print("\nFailed to open " + url + " during a media request.")
             except OSError as e:
                 print("\n" + e.msg + ": " + url, file=sys.stderr)
                 errors += 1
