sibusten / derpibooru-downloader
A downloader for imageboards running Philomena, such as Derpibooru.
License: MIT License
Hi, I run the site this downloader is meant for.
As far as I can tell, your downloader fetches image API metadata by paging into the result set with the page parameter.
derpibooru-downloader/DerpibooruDownloader/downloadmanager.cpp
Lines 176 to 179 in b89b8e6
derpibooru-downloader/DerpibooruDownloader/downloadmanager.cpp
Lines 309 to 310 in b89b8e6
This is an extremely inefficient way to get results from the API, because it requires the search engine to load page * per_page results into memory and sort all of them, again in memory. I regularly see requests at pages far beyond what should be requested in the course of normal operation.
Please change this program to page by specifying either the maximum image ID or maximum created_at date for each request, using a query string such as id.lt:1234 or created_at.lt:2019-06-18T00:00Z. You can find the maximum ID or created_at value in the last image in the sort order of each request.
This improved method only requires loading per_page results into memory and sorting them, leading to faster response times for you, and less server load for me.
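The cursor-based paging described above can be sketched as follows (a minimal Python sketch; the helper name and parameter layout are illustrative assumptions, not the downloader's actual API):

```python
def build_search_params(query, per_page, last_id=None):
    """Build query parameters for one search request.

    Instead of increasing `page`, keep `page=1` and narrow the search with
    an `id.lt:` cursor taken from the last image of the previous response,
    as the site admin suggests.
    """
    q = query if last_id is None else f"({query}), id.lt:{last_id}"
    return {"q": q, "sf": "id", "sd": "desc", "per_page": per_page, "page": 1}

# After each response, feed the smallest image id seen back in as the cursor:
# build_search_params("safe", 50)                 # first request
# build_search_params("safe", 50, last_id=1234)   # next request
```

Because the cursor restricts the result set itself, the server only ever sorts per_page rows per request, regardless of how deep into the results the client is.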
Hello! Will other Philomena boorus be supported in the future as well, or will this project stay Derpibooru exclusive?
Hello. I have a list of favorite artists, around four hundred of them. I used to download each of them into their own folder with a different downloader, but since it is no longer maintained, I decided to try yours. How can I automate this whole process? Do I really have to download each individual artist manually through a preset?
The tool looks nice, but unfortunately crashes on starting (Win 7, ZIP).
0xc000007b
Any software required otherwise?
I keep trying to work out what I'm supposed to do, but I can't figure it out. How am I supposed to apply my Derpibooru account filters to the Derpibooru Downloader? My account is on the .org version of Derpibooru, for the record.
I do have a quick question.. Does this program detect not-yet-rendered files that just got uploaded? I have a thing that downloads all the images as they come out and a lot of files turn out like this instead of what the image is supposed to be:
Thank you!
Originally posted by @themaneiac in #13 (comment)
I get that error with both my old installation and a fresh installation when I try to download something.
It appears that Derpibooru has started to have Cloudflare Captcha screens while trying to download.
Is there a way to add a way to deal with the captcha?
Hello, I noticed while using this program that it seems to wait for one picture to finish downloading before it begins the next. This is terribly inefficient and causes outrageous wait times when trying to download anything more than a few hundred pictures. This could be solved by creating batches of asynchronous requests (say, keeping 10 open at a time, just to pick a number) so that downloads run more efficiently. Thank you for your work!
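The batching idea above can be sketched with a thread pool (a hypothetical sketch; `fetch` stands in for whatever per-image download routine the program uses):

```python
from concurrent.futures import ThreadPoolExecutor

def download_batch(urls, fetch, max_workers=10):
    """Run up to max_workers downloads concurrently instead of one at a time.

    `fetch` is the per-URL download callable supplied by the caller; results
    come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

A real implementation would also want per-request error handling and a rate limit, so a burst of parallel requests doesn't hammer the server.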
After downloading items through the downloader, the metadata is saved in JSON files.
Is it possible to put all the image-tag data into a Database for ease of search (instead of just flipping through the deluge of JSON files)?
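One way to make the saved JSON searchable is to index it into SQLite (a sketch, under the assumption that each record follows the API's image schema with `id`, `score`, and `tags` fields):

```python
import sqlite3

def index_metadata(db_path, records):
    """Load image metadata records into a small, queryable tag database."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS images (id INTEGER PRIMARY KEY, score INTEGER)")
    con.execute("CREATE TABLE IF NOT EXISTS image_tags (image_id INTEGER, tag TEXT)")
    for rec in records:
        con.execute("INSERT OR REPLACE INTO images VALUES (?, ?)",
                    (rec["id"], rec.get("score", 0)))
        # Re-index tags from scratch so re-runs don't duplicate rows
        con.execute("DELETE FROM image_tags WHERE image_id = ?", (rec["id"],))
        con.executemany("INSERT INTO image_tags VALUES (?, ?)",
                        [(rec["id"], t) for t in rec.get("tags", [])])
    con.commit()
    return con

# Then queries like: SELECT image_id FROM image_tags WHERE tag = 'safe'
```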
Hello! Thank you for your hard work on creating this downloader project. :D
I'm running into a fatal error when attempting to connect to the Derpibooru API using the downloader:
[05:28:05] - Network Error - c - Unable to init SSL Context:
Since this is a fully portable program it'd make more sense to save the settings into a local file instead of the registry. If it's due to security concerns you could save the API key into registry but everything else into a normal file so it'd still stay portable without any security concerns.
Program has crashed on me several times after extended use (anywhere from one day to several) and running many downloads over that time.
Unsure what is causing it, could be a memory leak or some other unhandled exception. Process does seem to grow in memory after each run.
Will have to look more into it sometime.
If I use {name}.{ext} for this picture https://derpibooru.org/931457, the program will not download it. This has to do with the :| tag, because ":" and "|" are not accepted characters in filenames on Windows. Saving the pic from the website directly will not add that tag to the file's name, can this also be implemented for your program?
Edit: if I set the download filter to :| specifically, 359 images are detected, but by the end only 20 pics are actually saved.
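A fix along the lines the reporter asks for would replace Windows-reserved characters before writing the file (a hypothetical helper, not the program's current behavior):

```python
import re

# Characters Windows forbids in file names: < > : " / \ | ? *
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')

def sanitize_filename(name, replacement="_"):
    """Replace characters that are invalid in Windows filenames."""
    return _WINDOWS_FORBIDDEN.sub(replacement, name)
```

This matches what the website effectively does when saving directly: the forbidden characters never reach the filename.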
It may seem simple, but this might be a problem if you are arranging massive amounts of time-based images.
There seems to be a problem when it comes to loading presets written in a different language, e.g. when the artist's name is written in Korean or in Japanese kanji. The images can be downloaded just fine into the specified artist's folder, but upon loading the preset, the file options are empty.
I am using v1.3.2.
Example tag: artist:부시벅
I was wondering if you can add a way to schedule downloads? For example, allowing the program to be controlled with CMD?
While downloading the last 200,000 images I got 112 errors. 111 were files not found because they were missing a filename and 1 was because it was missing an extension. Is there a way to identify the images and fix the errors?
Missing extension:
[23:07:00] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/6/28/2077126__safe_derpy+hooves_cutie+mark_cutie+mark+only_no+pony_solo_svg_-dot-svg+available_vector. - server replied: Not Found
Missing filenames:
[23:17:56] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/5/15/.png - server replied: Not Found
[23:20:09] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/5/6/.jpeg - server replied: Not Found
[23:22:58] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/4/25/.jpeg - server replied: Not Found
[23:24:30] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/4/20/.jpeg - server replied: Not Found
[23:27:34] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/4/9/.jpeg - server replied: Not Found
I can provide the whole log if you want. :)
Been having trouble for a few weeks now. Even with the most basic searches, it just sits "loading" at zero without actually starting the download process.
The program has issues where downloaded images can corrupt. When this happens, the image will be downloaded partially and the rest of it will be black. But I don't have the technical details as to why this happens.
Sometimes a redownload fixes it, but this is really inconvenient with archiving (parts of) the site.
But in some cases it corrupts the same way the second time.
Is there a way to change the location where the images are downloaded?
Filters are working badly.
I tried to download this artist's content; he has around 400 pics, but after being processed by my filter, only 320 were left on the webpage.
But when I try to add the API filter in the app, it only downloaded 53 pics and then stopped. The same happens if, instead of the API filter, I just type the tags I want to hide into the search query.
There are a lot of included pics that aren't being downloaded.
And if I use my user API key, it just starts to download EVERYTHING, even the pics I don't want. It's weird considering I have my filter on.
I was wondering if it's possible to have an input list? Like, if I have 10 images and I know all 10 IDs, could I put them in 'input.txt' or something and have it download those 10? Is that possible, or do I need to keep looking? (I'm redownloading all the images I have at the highest quality, 500+.)
When I create a task to only update jsons it updates about 1 per second. Things are really fast if I have it download both images and jsons, though. If there is an easy fix that would be great. 😄
It would keep my tasks from downloading images at the same time at a couple points each day (which might cause corruption).
It does this in both the GUI and CMD versions.
Network Error - 302 - Error transferring https://www.derpibooru.org/search.json?q=test&page=1&perpage=50&sf=created_at&sd=desc&key=[omitted] - server replied: Bad Request
Error occurs with all queries.
Site is up and loads without issues.
API key is provided to the downloader.
Would it be possible to add the artist as an option when naming files? I would like to save images in the format of Downloads/{artist}/{id}.{ext}. Using post 8584 as an example, it would download the image to Downloads/kloudmutt/8584.jpg.
I had it scheduled to run at midnight every night but it seems it didn't download based on my current profile :/
It downloaded 104 GB in C:\Windows\System32\Downloads before I found out it was still running.
It downloaded just the images without the json files like the built in default profile.
Did I do something wrong?
If it had used my custom profile, it would have only downloaded yesterday's images, since I already have all of Derpibooru downloaded.
Issues on startup.
I unzipped the file and then had to restore it after my AV somehow flagged it as a virus.
Then nothing; no GUI, nothing.
It's not running, and nothing shows in Task Manager.
Putting in tags no longer downloads anything. Is there something preventing things from being downloaded?
Hi, I love the program.
I was hoping to ask about maybe improving the file handling; most Windows installs support a * wildcard.
The idea is to have a star mean you'd like some logic to be applied.
So say I have a folder for Sunset and Starlight Glimmer: it would know I already downloaded stuff from there, so I don't download the same photo every time.
It would be even better if it used some tag logic to figure out whether it's a solo work or not.
I don't have that much memory, and it has an issue loading after a folder gets maybe 200 photos in it, more so when it's around 10K.
ponerpics.org's JSON responses use relative URIs instead of URIs with a FQDN, which breaks the downloader. Their view_url values are things like "/img/view/2012/1/2/1__safe_[...]" instead of "https://derpicdn.net/img/view/2012/1/2/1__[...]".
Sample output of https://ponerpics.org/api/v1/json/images/1 for illustration:
{"image":{"legacy_faves":1377,"derpi_faves":1377,"tag_count":37,"deletion_reason":null,"combined_score":1996,"aspect_ratio":1.0,"legacy_score":1996,"width":900,"orig_sha512_hash":null,"description":"","representations":{"full":"/img/view/2012/1/2/1.png","large":"/img/2012/1/2/1/large.png","medium":"/img/2012/1/2/1/medium.png","small":"/img/2012/1/2/1/small.png","tall":"/img/2012/1/2/1/tall.png","thumb":"/img/2012/1/2/1/thumb.png","thumb_small":"/img/2012/1/2/1/thumb_small.png","thumb_tiny":"/img/2012/1/2/1/thumb_tiny.png"},"size":812228,"mime_type":"image/png","animated":false,"derpi_score":1996,"score":12,"name":"1__safe_fluttershy_solo_cloud_happy_flying_upvotes+galore_artist-colon-speccysy_get_sunshine","combined_faves":1377,"height":900,"sha512_hash":"e31a01e5df99a0d7a0f036f2ffb3c7e1abda9996cb00938e0c9978069073c2af1928b2be79728ffa1221f14143166ac97df22857447afd32e3b7146d3b82f66e","tag_ids":[1458,15442,23275,23294,24249,24672,26776,27141,27724,27764,29630,33258,33983,34506,34678,36872,37319,38185,40482,41700,41916,42350,43338,43567,43587,46526,47596,80809,83982,93524,182100,187857,227349,321220,364605,449198,999999],"format":"png","hidden_from_users":false,"combined_downvotes":17,"wilson_score":0.6439530063653618,"legacy_downvotes":17,"updated_at":"2020-07-29T19:10:16","created_at":"2012-01-02T03:12:33","downvotes":0,"combined_upvotes":2013,"view_url":"/img/view/2012/1/2/1__safe_fluttershy_solo_female_pony_mare_pegasus_smiling_cute_wings_eyes+closed_spread+wings_flying_happy_cloud_signature_dead+source_sky_shyabetes_o.png","source_url":"https://speccysy.deviantart.com/art/Afternoon-Flight-215193985","uploader_id":2,"thumbnails_generated":true,"derpi_downvotes":17,"first_seen_at":"2012-01-02T03:12:33","duplicate_of":null,"duration":null,"legacy_upvotes":2013,"faves":7,"processed":true,"id":1,"comment_count":1,"intensities":{"ne":71.08886,"nw":79.232849,"se":70.149523,"sw":72.540942},"upvotes":12,"spoilered":false,"tags":["artifact","artist:speccysy","cloud","cloudy","cute","dead source","eyes closed","female","fluttershy","flying","happy","long hair","mare","messy mane","milestone","outdoors","pegasus","pony","safe","signature","sky","solo","stretching","sunlight","sunshine","upside down","wings","shyabetes","sweet dreams fuel","weapons-grade cute","smiling","spread wings","index get","derpibooru legacy","first fluttershy picture on derpibooru","one of the first","imported from derpibooru"],"uploader":"Derpi Imported","derpi_upvotes":2013},"interactions":[]}
When this happens, the following stack trace is printed before exit:
Unhandled exception: Dasync.Collections.ParallelForEachException: One or more errors occurred. (Invalid URI: The format of the URI could not be determined.)
---> System.UriFormatException: Invalid URI: The format of the URI could not be determined.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.Uri..ctor(String uriString)
at Sibusten.Philomena.Client.Images.PhilomenaImage.get_Name()
at Sibusten.Philomena.Downloader.ImageDownloader.GetFileForPathFormat(IPhilomenaImage image, String filePath, Boolean isSvgImage)
at Sibusten.Philomena.Downloader.ImageDownloader.GetFileForImage(IPhilomenaImage image)
at Sibusten.Philomena.Client.Images.Downloaders.PhilomenaImageFileDownloader.Download(IPhilomenaImage downloadItem, CancellationToken cancellationToken, IProgress`1 progress)
at Sibusten.Philomena.Client.Images.Downloaders.SequentialPhilomenaImageDownloader.Download(IPhilomenaImage image, CancellationToken cancellationToken, IProgress`1 progress)
at Sibusten.Philomena.Client.Images.Downloaders.ConditionalImageDownloader.Download(IPhilomenaImage downloadItem, CancellationToken cancellationToken, IProgress`1 progress)
at Sibusten.Philomena.Client.Images.Downloaders.SequentialPhilomenaImageDownloader.Download(IPhilomenaImage image, CancellationToken cancellationToken, IProgress`1 progress)
at Sibusten.Philomena.Client.Images.Downloaders.ConditionalImageDownloader.Download(IPhilomenaImage downloadItem, CancellationToken cancellationToken, IProgress`1 progress)
at Sibusten.Philomena.Client.Images.Downloaders.SequentialPhilomenaImageDownloader.Download(IPhilomenaImage image, CancellationToken cancellationToken, IProgress`1 progress)
at Sibusten.Philomena.Client.Images.Downloaders.ParallelPhilomenaImageSearchDownloader.<>c__DisplayClass5_0.<<BeginDownload>b__1>d.MoveNext()
--- End of inner exception stack trace ---
at Sibusten.Philomena.Client.Images.Downloaders.ParallelPhilomenaImageSearchDownloader.BeginDownload(CancellationToken cancellationToken, IProgress`1 searchProgress, IProgress`1 searchDownloadProgress, IReadOnlyCollection`1 individualDownloadProgresses)
at Sibusten.Philomena.Downloader.ImageDownloader.StartDownload(CancellationToken cancellation, IImageDownloadReporter downloadReporter)
at Sibusten.Philomena.Downloader.Cmd.Commands.DownloadCommand.DownloadCommandFunc(DownloadArgs args)
at System.CommandLine.Invocation.CommandHandler.GetResultCodeAsync(Object value, InvocationContext context)
at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseParseErrorReporting>b__21_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass25_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseParseDirective>b__20_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseDebugDirective>b__11_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__10_0>d.MoveNext()
--- End of stack trace from previous location ---
at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<<UseExceptionHandler>b__0>d.MoveNext()
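A plausible fix for the relative view_url problem above is to resolve the URL against the booru's base URL before constructing the Uri (an illustrative sketch; the helper name is an assumption):

```python
from urllib.parse import urljoin

def absolute_view_url(booru_base, view_url):
    """Resolve a possibly-relative view_url against the booru's base URL.

    Absolute URLs (as Derpibooru returns) pass through unchanged; relative
    paths (as PonerPics returns) are joined onto the booru's own domain.
    """
    return urljoin(booru_base, view_url)
```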
While the downloader is running, the ETA status bar doesn't update at all and remains at --:--:--
The 'Images Per Minute' readout still works, and so does the total image count.
I don't know what went wrong, but I have to fall back to 1.4.1.
1.4.3:
1.4.1:
Also, I already checked that SSL is working again, but I still get the error from #20 (comment).
Hello!
In recent days an error started to occur whenever I want to use the downloader.
Network Error - 302 - Error transferring https://derpibooru.org/search.json?q=batpony&page=1&perpage=50&sf=score&sd=desc&filter_id=100073 - server replied: Bad Request
What I found is that when I remove the '.json' part of the link, it opens in the browser without any issues.
I suspect that Derpi might have changed something on their side, but I'm not sure. I'm also not sure if I'm the only one experiencing this issue.
Using a tag in the search query with a "&" symbol in it doesn't work properly.
For example, it's impossible to search for "q&a" since it is interpreted as "q,a".
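The likely cause is that the query is interpolated into the URL without percent-encoding, so the & splits the query string into separate parameters. Percent-encoding the parameters avoids this (an illustrative sketch, not the program's actual code):

```python
from urllib.parse import urlencode

def search_query_string(query, per_page=50):
    """Percent-encode search parameters so characters like & survive intact."""
    return urlencode({"q": query, "per_page": per_page})
```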
Like the program so far, but I have a small question.
Say I want to run this every night and only download the new photos that match my request.
So say I have 1,000 photos on Monday and I start it the next day: will it redownload all the photos again, or just start from 1,001?
If it's within the page offset, then I can wait for the next stable patch. Currently on 2.1.0.
I want to download all pictures from one artist.
If I do that manually, then I can for example go to:
https://derpibooru.org/2052038
Then click Download to get the largest image size.
Repeat for other images
Does this program extract the largest image size too? If so that would be a great help.
When I first used the downloader it worked perfectly. But after turning my machine off it now won't work except for when there's no tags selected. Then it starts downloading everything. I don't know what's wrong. I put the tags in the box, hit the button, there's a brief flash of a progress bar. And then nothing. I'm unsure as to what's wrong or how to fix it. Any help with this? I've tried both versions of the downloader and have tried deleting and re-downloading them a few times each now.
I am using version 2.0.0 and have tried with both the GUI and CMD line options. The downloader gets stuck repeatedly trying to download the first page of images. If I stop the download and change the 'Start page' to 2, it will then download only the second page and again get stuck looping through those images.
Would it be possible to have an option to add leading zeroes to IDs when saving files? I imagine it could work similarly to the ID floor option, where a symbol could indicate the minimum length of the ID. For example, an option such as Downloads/{%6}.{ext} would result in filenames such as 000123.jpg, 001234.jpg, 1234567.jpg.
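The proposed {%6} option amounts to zero-padding the ID (a sketch of the padding rule; the placeholder syntax itself is the requester's suggestion, not an existing feature):

```python
def format_id(image_id, min_width=0):
    """Zero-pad an image ID to min_width digits; longer IDs are left as-is."""
    return str(image_id).zfill(min_width)
```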
Note that there is no (known) way to preserve the order of images when using Random, so only one page can be guaranteed to be truly random.
Hi! I'm a Derpibooru site developer, and recently pushed an update to resolve this. To get a predictable order in a random search, you can now provide a numeric seed in the sf-argument like so:
sf=random:7213143
Another seed will produce a different but consistent order.
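Using the seeded sort the site developer describes, a request's query string could be built like this (an illustrative sketch; the sf=random:SEED syntax follows the post above):

```python
from urllib.parse import urlencode

def random_search_params(query, seed, per_page=50):
    """Build search parameters with a seeded random sort for a stable order."""
    return urlencode({"q": query, "sf": f"random:{seed}", "per_page": per_page})
```

Paging with the same seed across requests keeps the random order consistent from page to page.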
When the user sets a booruUrl with www. in the UI but the philomena website has a http redirect to the domain without www, querying the API for search results fails silently.
It would be nice if the app supported HTTP 301 or HTTP 302 responses and followed them to the correct search API page, or at least reported such responses instead of failing silently.
Would it be possible to use a post's rating as an option when naming files? This would allow users to save images in formats such as Downloads/{rating}/{id}.{ext} or Downloads/{rating}_{id}.{ext}. Using post 8584 as an example, it would download the image to Downloads/safe/8584.jpg or Downloads/safe_8584.jpg.
In the cases where multiple ratings are present, I believe it should take the format ratingA+ratingB, where ratingA is one of the mutually exclusive tags safe/suggestive/questionable/explicit, and ratingB (and ratingC, and so on) comes from the non-mutually-exclusive rating tags semi-grimdark/grimdark/grotesque.
When I use the page offset in the GUI or through the command line, the program appears to ignore it, always starting the download with the first image in the search results on the first page. My present workaround is to pass in a search query that includes the id field, e.g. "id.lt:300000", to get results from later pages.
Thank you for your work on this tool, it is incredibly convenient.
Derpibooru has migrated away from Rails and released their new server back-end, Philomena, that includes breaking API changes.
https://derpibooru.org/forums/meta/topics/philomena-open-beta-breaking-api-changes
API changes
In the interest of preserving API compatibility, we will attempt to rewrite requests to our existing Rails JSON API for as long as is reasonably feasible after the migration (~ several months). We hope most applications will be able to migrate over to the new API format before we remove the Rails app completely.
As before, you can use the filter_id parameter to override your current filter, and the key parameter to authenticate yourself to the API. Here are the new API routes:
[example] GET /api/v1/json/images/:image_id
[example] GET /api/v1/json/search
[example] GET /api/v1/json/oembed
There is also a POST route which takes a query parameter named URL. However, it depends on the scraper; do not expect it to work reliably until the scraper is fixed.
POST /api/v1/json/search/reverse
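For clients migrating, the new routes compose into URLs like this (a trivial sketch; the paths follow the announcement above):

```python
def image_api_url(base, image_id):
    """URL for the new single-image endpoint."""
    return f"{base}/api/v1/json/images/{image_id}"

def search_api_url(base):
    """URL for the new search endpoint (add q=... parameters as needed)."""
    return f"{base}/api/v1/json/search"
```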
This is a C#-based scraper that can only be called from the terminal. If one wants to use this to categorize images, à la Python ML, what would be the best course of action?
What the title says.
I can upload some of the JSONs to prove it, if it seems to be working for you.
I noticed a pattern where the downloader will not progress past certain image ids, which also have a 404 Not Found on the download link to the image on the site. So the problem I have with searching through search tag query is that I will be stuck (e.g. 24/620 images, Elapsed 04:20:16) for an indefinite amount of time because the downloader can't download the image it is on.