derpibooru-downloader's People

Contributors

sibusten, trixiether

derpibooru-downloader's Issues

Deep pagination

Hi, I run the site this downloader is meant for.

As far as I can tell, your downloader fetches image API metadata by paging into the result set with the page parameter.

//Download more if needed
//This will automatically call getMetadataResults when it is finished.
QUrl searchUrl = DerpiJson::getSearchUrl(searchSettings);
metaDownloader.download(searchUrl);

//Increment page by 1 in the searchSettings
searchSettings.page++;

This is an extremely inefficient way to get results from the API, because it requires the search engine to load page*per_page results into memory and sort all of them, again in memory. I regularly see requests at pages far beyond what should be requested in the course of normal operation.

Please change this program to page by specifying either the maximum image ID or maximum created_at date for each request, using a query string such as id.lt:1234 or created_at.lt:2019-06-18T00:00Z. You can find the maximum ID or created_at value in the last image in the sort order of each request.

This improved method only requires loading per_page results into memory and sorting them, leading to faster response times for you, and less server load for me.
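
A minimal sketch of that keyset-style paging, independent of the Qt code above (C#; the search/images route, the "images" response field, and the placeholder query are assumptions based on the API routes quoted later on this page):

using System;
using System.Net.Http;
using System.Text.Json;

// Page by constraining the maximum image ID instead of an ever-growing page parameter.
using var http = new HttpClient();
long? lastId = null;

while (true)
{
    // After the first request, only ask for images older than the smallest ID seen so far.
    string query = "my+tags" + (lastId.HasValue ? $",id.lt:{lastId}" : "");
    string url = $"https://derpibooru.org/api/v1/json/search/images?q={query}&per_page=50&sf=id&sd=desc";

    using JsonDocument doc = JsonDocument.Parse(await http.GetStringAsync(url));
    JsonElement images = doc.RootElement.GetProperty("images");
    if (images.GetArrayLength() == 0)
        break;  // no more results

    foreach (JsonElement image in images.EnumerateArray())
    {
        lastId = image.GetProperty("id").GetInt64();  // results are sorted by id descending
        // ... queue the image for download here ...
    }
}

Each request then only ever asks the search engine for per_page results past a fixed cutoff, which is exactly the behaviour described above.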

Downloading a list of artists

Hello. I have a list of favorite artists, around four hundred of them. I used to download them with another downloader, each into their own folder, but since it is no longer maintained, I decided to try yours. How can I automate this whole process? Do I really have to download each individual artist manually through a preset?

App crash 0xc000007b

The tool looks nice, but unfortunately it crashes on startup (Win 7, ZIP).
0xc000007b

Is any additional software required?

Program downloading unrendered images

I do have a quick question. Does this program detect not-yet-rendered files that have just been uploaded? I have a setup that downloads all the images as they come out, and a lot of files turn out like this instead of what the image is supposed to be:
[sample image]

Thank you!

Originally posted by @themaneiac in #13 (comment)

Cloudflare Captcha

It appears that Derpibooru has started to have Cloudflare Captcha screens while trying to download.
Is there a way to handle the captcha in the downloader?

Asynchronous Requests

Hello, I noticed while using this program that it seems to wait for one picture to finish downloading before it begins the next. This is terribly inefficient and causes outrageous wait times when trying to download anything more than a few hundred pictures. This could be solved by creating batches of asynchronous requests, say keeping 10 open at a time just to pick a number, so that downloads run more efficiently. Thank you for your work!
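
A minimal sketch of the suggested batching, assuming the image URLs are already known from the search metadata (C#; the 10-at-a-time limit mirrors the number picked above and the save-to-current-directory behaviour is an assumption):

using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

string[] imageUrls = { /* ...URLs taken from the search metadata... */ };

using var http = new HttpClient();
using var gate = new SemaphoreSlim(10);  // keep at most 10 downloads in flight at once

var tasks = imageUrls.Select(async url =>
{
    await gate.WaitAsync();
    try
    {
        // Save each image under the file name taken from its URL.
        byte[] data = await http.GetByteArrayAsync(url);
        await File.WriteAllBytesAsync(Path.GetFileName(new Uri(url).LocalPath), data);
    }
    finally
    {
        gate.Release();
    }
});

await Task.WhenAll(tasks);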

Tag Metadata Search/Query

After downloading items through the downloader, the metadata is saved in JSON files.
Is it possible to put all the image-tag data into a Database for ease of search (instead of just flipping through the deluge of JSON files)?
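
Until something like that is built in, a minimal sketch of doing it externally, assuming each saved JSON file contains the API's id and tags fields and lives under a "metadata" folder (C#, using Microsoft.Data.Sqlite and System.Text.Json; the folder name and table schema are assumptions):

using System.IO;
using System.Text.Json;
using Microsoft.Data.Sqlite;

// Build a simple (image_id, tag) table from the downloader's saved JSON metadata.
using var db = new SqliteConnection("Data Source=tags.db");
db.Open();

using (var create = db.CreateCommand())
{
    create.CommandText = "CREATE TABLE IF NOT EXISTS image_tags (image_id INTEGER, tag TEXT)";
    create.ExecuteNonQuery();
}

foreach (string path in Directory.EnumerateFiles("metadata", "*.json", SearchOption.AllDirectories))
{
    using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(path));
    long id = doc.RootElement.GetProperty("id").GetInt64();

    foreach (JsonElement tag in doc.RootElement.GetProperty("tags").EnumerateArray())
    {
        // One row per (image, tag) pair; wrap in a transaction for large archives.
        using var insert = db.CreateCommand();
        insert.CommandText = "INSERT INTO image_tags (image_id, tag) VALUES ($id, $tag)";
        insert.Parameters.AddWithValue("$id", id);
        insert.Parameters.AddWithValue("$tag", tag.GetString());
        insert.ExecuteNonQuery();
    }
}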

Network Error: Unable to Init SSL

Hello! Thank you for your hard work on creating this downloader project. :D

I'm running into a fatal error when attempting to connect to the Derpibooru API using the downloader:

[05:28:05] - Network Error - c - Unable to init SSL Context: 

Add feature to save to settings file instead of registry

Since this is a fully portable program, it would make more sense to save the settings to a local file instead of the registry. If it's due to security concerns, you could keep the API key in the registry but save everything else to a normal file, so it would still stay portable without any security concerns.

Potential memory leak causing crash after extended use

The program has crashed on me several times after extended use (anywhere from one day to several) with many downloads run over that time.
I'm unsure what is causing it; it could be a memory leak or some other unhandled exception. The process does seem to grow in memory after each run.

Will have to look more into it sometime.

Program not downloading certain pics

If I use {name}.{ext} for this picture https://derpibooru.org/931457, the program will not download it. This has to do with the :| tag, because ":" and "|" are not accepted characters in filenames on Windows. Saving the pic from the website directly will not add that tag to the file's name; can this also be implemented in your program?

Edit: if I set the download filter to :| specifically, 359 images are detected, but by the end only 20 pics are actually saved.
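
A minimal sketch of the kind of filename sanitization being requested (C#; the helper name and the decision to drop rather than replace invalid characters are assumptions):

using System;
using System.IO;
using System.Linq;

// Remove characters Windows does not allow in filenames (':' and '|' among them).
static string SanitizeFileName(string name)
{
    char[] invalid = Path.GetInvalidFileNameChars();
    return new string(name.Where(c => !invalid.Contains(c)).ToArray());
}

Console.WriteLine(SanitizeFileName("derpy :|.png"));  // "derpy .png"

A name made up only of invalid characters sanitizes to an empty string, so a fallback such as the image ID would still be needed in that case.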

Foreign Language Loading

There seems to be a problem when loading a preset written in a different language, e.g. when the artist's name is written in Korean or in Japanese kanji. The image can be downloaded just fine into the specified artist's folder, but upon loading the preset, the file options are empty.

I am using v1.3.2.

Example tag: artist:부시벅

Way to schedule downloads?

I was wondering if you can add a way to schedule downloads? For example, allowing the program to be controlled with CMD?

Files Missing Names

While downloading the last 200,000 images I got 112 errors. 111 were files not found because they were missing a filename and 1 was because it was missing an extension. Is there a way to identify the images and fix the errors?

Missing extension:
[23:07:00] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/6/28/2077126__safe_derpy+hooves_cutie+mark_cutie+mark+only_no+pony_solo_svg_-dot-svg+available_vector. - server replied: Not Found

Missing filenames:
[23:17:56] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/5/15/.png - server replied: Not Found
[23:20:09] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/5/6/.jpeg - server replied: Not Found
[23:22:58] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/4/25/.jpeg - server replied: Not Found
[23:24:30] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/4/20/.jpeg - server replied: Not Found
[23:27:34] - Network Error - 203 - Error transferring https://derpicdn.net/img/view/2019/4/9/.jpeg - server replied: Not Found

I can provide the whole log if you want. :)

No longer downloading.

I've been having trouble for a few weeks now. Even with the most basic searches, it just sits "loading" at zero without actually starting the download process.

Image corruption

The program has issues where downloaded images can become corrupted. When this happens, the image is downloaded partially and the rest of it is black. I don't have the technical details as to why this happens.

Sometimes a redownload fixes it, but this is really inconvenient when archiving (parts of) the site.

In some cases, though, it corrupts the same way the second time.

Download Directory

Is there a way to change the location where the images are downloaded?

Error with filters.

Filters are working badly.

I tried to download this artist's content; he has around 400 pics, but after being processed by my filter, only 320 were left on the webpage.

But when I add the API filter in the app, it only downloads 53 pics and then stops. The same happens if, instead of the API filter, I just type the tags I want to hide into the search query.
There are a lot of included pics that aren't being downloaded.

And if I use my user API key, it just starts to download EVERYTHING, even the pics I don't want. It's weird considering I have my filter on.

Question/Request

I was wondering if it's possible to have an input list? Like if I have 10 images and I know all 10 IDs, I could put them in 'input.txt' or something and it would download those 10. Is that possible, or do I need to keep looking? (I'm redownloading all the images I have at the highest quality, 500+.)

Downloading/updating jsons without saving images is extremely slow

When I create a task to only update jsons it updates about 1 per second. Things are really fast if I have it download both images and jsons, though. If there is an easy fix that would be great. 😄
It would keep my tasks from downloading images at the same time at a couple points each day (which might cause corruption).
It does this in both the GUI and CMD versions.

Artist File Name Option

Would it be possible to add the artist as an option when naming files? I would like to save images in the format of Downloads/{artist}/{id}.{ext}. Using post 8584 as an example, it would download the image to Downloads/kloudmutt/8584.jpg.

The CMD version didn't go off of my default profile

I had it scheduled to run at midnight every night, but it seems it didn't download based on my current profile :/
It downloaded 104 GB in C:\Windows\System32\Downloads before I found out it was still running.
It downloaded just the images without the JSON files, like the built-in default profile.
Did I do something wrong?
If it had used my custom profile, it would have only downloaded yesterday's images, since I already have all of Derpibooru downloaded.

philomena-downloader-win-x64

Issues on startup.
I unzipped the file and then had to restore it after my AV somehow flagged it as a virus.
Then nothing: no GUI, nothing at all.

It's not running; nothing shows in Task Manager.

Downloader no longer works

Putting in tags no longer downloads anything. Is there something preventing things from being downloaded?

Possible idea.

Hi, I love the program.
I was hoping to ask about maybe improving the file handling: most Windows installs have a * wildcard, and the idea is that a wildcard would tell the program to apply some extra logic.

So say I have a folder for Sunset and Starlight Glimmer: the program would know I already downloaded stuff from there.
It would be even better if it used some tag logic to figure out whether an image is a solo work or not.

That way I wouldn't download the same photo every time. I don't have that much storage, and the program has an issue loading once a folder has maybe 200 photos in it, and more so when it's around 10K.

Throws exception on relative URIs in JSON API responses

ponerpics.org's JSON responses use relative URIs instead of URIs with an FQDN, which breaks the downloader. Their view_url values are things like "/img/view/2012/1/2/1__safe_[...]" instead of "https://derpicdn.net/img/view/2012/1/2/1__[...]".

Sample output of https://ponerpics.org/api/v1/json/images/1 for illustration:

{"image":{"legacy_faves":1377,"derpi_faves":1377,"tag_count":37,"deletion_reason":null,"combined_score":1996,"aspect_ratio":1.0,"legacy_score":1996,"width":900,"orig_sha512_hash":null,"description":"","representations":{"full":"/img/view/2012/1/2/1.png","large":"/img/2012/1/2/1/large.png","medium":"/img/2012/1/2/1/medium.png","small":"/img/2012/1/2/1/small.png","tall":"/img/2012/1/2/1/tall.png","thumb":"/img/2012/1/2/1/thumb.png","thumb_small":"/img/2012/1/2/1/thumb_small.png","thumb_tiny":"/img/2012/1/2/1/thumb_tiny.png"},"size":812228,"mime_type":"image/png","animated":false,"derpi_score":1996,"score":12,"name":"1__safe_fluttershy_solo_cloud_happy_flying_upvotes+galore_artist-colon-speccysy_get_sunshine","combined_faves":1377,"height":900,"sha512_hash":"e31a01e5df99a0d7a0f036f2ffb3c7e1abda9996cb00938e0c9978069073c2af1928b2be79728ffa1221f14143166ac97df22857447afd32e3b7146d3b82f66e","tag_ids":[1458,15442,23275,23294,24249,24672,26776,27141,27724,27764,29630,33258,33983,34506,34678,36872,37319,38185,40482,41700,41916,42350,43338,43567,43587,46526,47596,80809,83982,93524,182100,187857,227349,321220,364605,449198,999999],"format":"png","hidden_from_users":false,"combined_downvotes":17,"wilson_score":0.6439530063653618,"legacy_downvotes":17,"updated_at":"2020-07-29T19:10:16","created_at":"2012-01-02T03:12:33","downvotes":0,"combined_upvotes":2013,"view_url":"/img/view/2012/1/2/1__safe_fluttershy_solo_female_pony_mare_pegasus_smiling_cute_wings_eyes+closed_spread+wings_flying_happy_cloud_signature_dead+source_sky_shyabetes_o.png","source_url":"https://speccysy.deviantart.com/art/Afternoon-Flight-215193985","uploader_id":2,"thumbnails_generated":true,"derpi_downvotes":17,"first_seen_at":"2012-01-02T03:12:33","duplicate_of":null,"duration":null,"legacy_upvotes":2013,"faves":7,"processed":true,"id":1,"comment_count":1,"intensities":{"ne":71.08886,"nw":79.232849,"se":70.149523,"sw":72.540942},"upvotes":12,"spoilered":false,"tags":["artifact","artist:speccysy","cloud","cloudy","cute","dead source","eyes closed","female","fluttershy","flying","happy","long hair","mare","messy mane","milestone","outdoors","pegasus","pony","safe","signature","sky","solo","stretching","sunlight","sunshine","upside down","wings","shyabetes","sweet dreams fuel","weapons-grade cute","smiling","spread wings","index get","derpibooru legacy","first fluttershy picture on derpibooru","one of the first","imported from derpibooru"],"uploader":"Derpi Imported","derpi_upvotes":2013},"interactions":[]}

When this happens, the following stack trace is printed before exit:

Unhandled exception: Dasync.Collections.ParallelForEachException: One or more errors occurred. (Invalid URI: The format of the URI could not be determined.)
 ---> System.UriFormatException: Invalid URI: The format of the URI could not be determined.
   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
   at System.Uri..ctor(String uriString)
   at Sibusten.Philomena.Client.Images.PhilomenaImage.get_Name()
   at Sibusten.Philomena.Downloader.ImageDownloader.GetFileForPathFormat(IPhilomenaImage image, String filePath, Boolean isSvgImage)
   at Sibusten.Philomena.Downloader.ImageDownloader.GetFileForImage(IPhilomenaImage image)
   at Sibusten.Philomena.Client.Images.Downloaders.PhilomenaImageFileDownloader.Download(IPhilomenaImage downloadItem, CancellationToken cancellationToken, IProgress`1 progress)
   at Sibusten.Philomena.Client.Images.Downloaders.SequentialPhilomenaImageDownloader.Download(IPhilomenaImage image, CancellationToken cancellationToken, IProgress`1 progress)
   at Sibusten.Philomena.Client.Images.Downloaders.ConditionalImageDownloader.Download(IPhilomenaImage downloadItem, CancellationToken cancellationToken, IProgress`1 progress)
   at Sibusten.Philomena.Client.Images.Downloaders.SequentialPhilomenaImageDownloader.Download(IPhilomenaImage image, CancellationToken cancellationToken, IProgress`1 progress)
   at Sibusten.Philomena.Client.Images.Downloaders.ConditionalImageDownloader.Download(IPhilomenaImage downloadItem, CancellationToken cancellationToken, IProgress`1 progress)
   at Sibusten.Philomena.Client.Images.Downloaders.SequentialPhilomenaImageDownloader.Download(IPhilomenaImage image, CancellationToken cancellationToken, IProgress`1 progress)
   at Sibusten.Philomena.Client.Images.Downloaders.ParallelPhilomenaImageSearchDownloader.<>c__DisplayClass5_0.<<BeginDownload>b__1>d.MoveNext()
   --- End of inner exception stack trace ---
   at Sibusten.Philomena.Client.Images.Downloaders.ParallelPhilomenaImageSearchDownloader.BeginDownload(CancellationToken cancellationToken, IProgress`1 searchProgress, IProgress`1 searchDownloadProgress, IReadOnlyCollection`1 individualDownloadProgresses)
   at Sibusten.Philomena.Downloader.ImageDownloader.StartDownload(CancellationToken cancellation, IImageDownloadReporter downloadReporter)
   at Sibusten.Philomena.Downloader.Cmd.Commands.DownloadCommand.DownloadCommandFunc(DownloadArgs args)
   at System.CommandLine.Invocation.CommandHandler.GetResultCodeAsync(Object value, InvocationContext context)
   at System.CommandLine.Invocation.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseParseErrorReporting>b__21_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass16_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass25_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass23_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__22_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseParseDirective>b__20_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseDebugDirective>b__11_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__10_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass14_0.<<UseExceptionHandler>b__0>d.MoveNext()
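
A minimal sketch of how the relative view_url could be resolved before the Uri constructor is called, assuming the booru's base URL is available at that point (this is not the project's actual code):

using System;

// Resolve a possibly-relative view_url against the booru's base address.
Uri baseUri = new Uri("https://ponerpics.org/");
string viewUrl = "/img/view/2012/1/2/1__safe_fluttershy_solo.png";

// Uri(Uri, string) leaves absolute strings as-is and resolves relative ones
// against the base, so derpicdn-style absolute URLs keep working too.
Uri resolved = new Uri(baseUri, viewUrl);
Console.WriteLine(resolved);  // https://ponerpics.org/img/view/2012/1/2/1__safe_fluttershy_solo.png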

ETA doesn't work

While the downloader is running, the ETA status bar doesn't update at all and remains at --:--:--.

The 'Images Per Minute' counter still works, and so does the total image count.

Bad request

Hello!

In recent days an error started to occur whenever I want to use the downloader.

Network Error - 302 - Error transferring https://derpibooru.org/search.json?q=batpony&page=1&perpage=50&sf=score&sd=desc&filter_id=100073 - server replied: Bad Request

What I found is that when I remove the '.json' part of the link, it opens in the browser without any issues.
I suspect that derpi might have changed something on their side, but I'm not sure. I'm also not sure if I'm the only one experiencing this issue.

Tags with "&" in them don't work

Using a tag with a "&" symbol in the search query doesn't work properly.
For example, it's impossible to search for "q&a", since it is interpreted as "q,a".
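
A minimal sketch of the encoding that avoids this, assuming the query string is currently built by plain concatenation (C#):

using System;

// Percent-encode the search query so '&' stays inside the tag instead of
// splitting the query string into separate parameters.
string query = "q&a";
string url = $"https://derpibooru.org/api/v1/json/search?q={Uri.EscapeDataString(query)}";
Console.WriteLine(url);  // ...?q=q%26a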

Question

I like the program so far, but I have a small question.
Say I want to run this every night and only download the new photos that match my request.

So say I have 1,000 photos on Monday,
and I start it again the next day.
Will it redownload all the photos again or just start from 1,001?
If this is covered by the page offset, then I can wait for the next stable patch; I'm currently on 2.1.0.

Downloader not working.

When I first used the downloader it worked perfectly, but after turning my machine off it now won't work except when no tags are selected; then it starts downloading everything. I don't know what's wrong. I put the tags in the box, hit the button, there's a brief flash of a progress bar, and then nothing. I'm unsure what's wrong or how to fix it. Any help with this? I've tried both versions of the downloader and have tried deleting and re-downloading them a few times each now.

Only downloading first page of images

I am using version 2.0.0 and have tried both the GUI and CMD line options. The downloader gets stuck repeatedly trying to download the first page of images. If I stop the download and change the 'Start page' to 2, it will then download only the second page and again get stuck looping through those images.

ID File Name Option for Leading Zeroes

Would it be possible to have an option to add leading zeroes to IDs when saving files? I imagine it could work similar to the ID floor option, where a symbol could indicate the minimum length of the ID. For example, an option such as Downloads/{%6}.{ext} would result in filenames such as 000123.jpg, 001234.jpg, 1234567.jpg.
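
For reference, the padding itself is straightforward once such a token is parsed; a minimal sketch (C#; the {%6} token and its parsing are the proposal above, not existing behaviour):

using System;

int width = 6;  // taken from a token such as {%6}
foreach (long id in new long[] { 123, 1234, 1234567 })
    Console.WriteLine(id.ToString().PadLeft(width, '0') + ".jpg");
// 000123.jpg
// 001234.jpg
// 1234567.jpg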

Preserving order in randomly sorted searches

Note that there is no (known) way to preserve the order of images when using Random, so only one page can be guaranteed to be truly random.

Hi! I'm a Derpibooru site developer, and I recently pushed an update to resolve this. To get a predictable order in a random search, you can now provide a numeric seed in the sf argument like so:

sf=random:7213143

Another seed will produce a different but consistent order.

Add support for following HTTP 302 and 301 redirects

When the user sets a booruUrl with www. in the UI but the Philomena website has an HTTP redirect to the domain without www, querying the API for search results fails silently.

It would be nice if the app supported HTTP 301 or HTTP 302 responses and followed them to the correct search API page, or at least reported such responses properly instead of failing silently.
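
A minimal sketch of the requested behaviour at the HTTP level (C#; the booru URL is a placeholder, and where this would hook into the app is an assumption):

using System;
using System.Net.Http;

// Follow 301/302 responses (e.g. a www. host redirecting to the bare domain)
// instead of silently ending up with an unusable search result.
var handler = new HttpClientHandler
{
    AllowAutoRedirect = true,        // the default, shown explicitly
    MaxAutomaticRedirections = 5
};
using var http = new HttpClient(handler);

string json = await http.GetStringAsync("https://www.example-booru.org/api/v1/json/search?q=safe");
Console.WriteLine(json.Length);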

Rating File Name Option

Would it be possible to use a post's rating as an option when naming files? This would allow users to save images in formats such as Downloads/{rating}/{id}.{ext} or Downloads/{rating}_{id}.{ext}. Using post 8584 as an example, it would download the image to Downloads/safe/8584.jpg or Downloads/safe_8584.jpg.

In cases where multiple ratings are present, I believe it should take the format ratingA+ratingB, where ratingA is one of the mutually exclusive tags safe/suggestive/questionable/explicit and ratingB (and ratingC and so on) comes from the non-mutually-exclusive rating tags semi-grimdark/grimdark/grotesque.
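
A minimal sketch of that combination rule (C#; the tag lists come straight from the grouping above, while the ordering within each group is an assumption):

using System;
using System.Collections.Generic;
using System.Linq;

// Compose a {rating} value such as "suggestive+grimdark" from an image's tags.
static string BuildRating(IEnumerable<string> tags)
{
    string[] exclusive = { "safe", "suggestive", "questionable", "explicit" };
    string[] additive  = { "semi-grimdark", "grimdark", "grotesque" };

    var parts = new List<string>();
    parts.AddRange(tags.Where(t => exclusive.Contains(t)));  // at most one of these
    parts.AddRange(tags.Where(t => additive.Contains(t)));   // zero or more of these
    return string.Join("+", parts);
}

Console.WriteLine(BuildRating(new[] { "pony", "suggestive", "grimdark" }));  // suggestive+grimdark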

Page offset not working as expected

When I use the page offset in the GUI or through the command line, the program appears to ignore it, always starting the download with the first image in the search results on the first page. My present workaround is to pass in a search query that includes the id field, e.g. "id.lt:300000", to get results from later pages.

Thank you for your work on this tool, it is incredibly convenient.

Derpibooru API Changes

Derpibooru has migrated away from Rails and released their new server back-end, Philomena, that includes breaking API changes.

https://derpibooru.org/forums/meta/topics/philomena-open-beta-breaking-api-changes

API changes

In the interest of preserving API compatibility, we will attempt to rewrite requests to our existing Rails JSON API for as long as is reasonably feasible after the migration (~ several months). We hope most applications will be able to migrate over to the new API format before we remove the Rails app completely.

As before, you can use the filter_id parameter to override your current filter, and the key parameter to authenticate yourself to the API. Here are the new API routes:

GET /api/v1/json/images/:image_id
GET /api/v1/json/search
GET /api/v1/json/oembed

There is also a POST route which takes a query parameter named URL. However, it depends on the scraper; do not expect it to work reliably until the scraper is fixed.

POST /api/v1/json/search/reverse
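
For illustration, a minimal request against the quoted routes (C#; the filter_id, API key, and image ID values are placeholders taken from examples elsewhere on this page):

using System;
using System.Net.Http;

using var http = new HttpClient();

// Search with an explicit filter override and API key.
string searchJson = await http.GetStringAsync(
    "https://derpibooru.org/api/v1/json/search?q=batpony&per_page=50&filter_id=100073&key=YOUR_API_KEY");

// Fetch a single image's metadata by ID.
string imageJson = await http.GetStringAsync("https://derpibooru.org/api/v1/json/images/8584");

Console.WriteLine($"{searchJson.Length} bytes of search results, {imageJson.Length} bytes of image metadata");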

Downloader function for Python/JS

This is a C#-based scraper that can only be called from the terminal. If one wants to use it to categorize images à la Python ML, what would be the best course of action?

404 Error Handling

I noticed a pattern where the downloader will not progress past certain image IDs, which also return a 404 Not Found on the image's download link on the site. The problem with searching through a tag query is that I get stuck (e.g. 24/620 images, Elapsed 04:20:16) for an indefinite amount of time because the downloader can't download the image it is on.
