deer-spangle / faexport
The API for Furaffinity you wish existed
Home Page: https://faexport.spangle.org.uk
License: BSD 3-Clause "New" or "Revised" License
Example link: https://faexport.boothale.net/submission/7083912.json
FAExport has been returning errors like this for the past couple of days. Something on FA's side may have changed with the recent addition of Cloudflare DDoS protection.
And here I come with another bug report!
The link to the full submission file seems to be broken since the last FA update. Example response: "full": "http:/full/20015828/",
I'm using the JSON API.
Got a request over Telegram to support multi-architecture Docker images.
I don't know much about how they work yet, though, so I need to find out how to build them and how to test them.
Here's the docs:
https://docs.docker.com/build/building/multi-platform/
Looks like the scraper account needs a new password....
GET /user/{name} should return a "guest_access" key with a bool representing whether the user has selected "allow guests" or "block guests" in their FA account's Site Settings>Privacy Options>Disable Guest Access settings.
Unfortunately, AFAIK there is no simple way to scrape this information directly from the user's page; it would require loading the page a second time in a logged-out session and checking for:
<!-- {redirect} -->
<div id="standardpage">
<section class="aligncenter notice-message" style="margin: 30px auto; max-width: 800px;">
<div class="section-body alignleft">
<h2>System Message</h2>
<div class="redirect-message"><p class="link-override">The owner of this page has elected to make it available to registered users only.<br />To view the contents of this page please <a href="https://www.furaffinity.net/login">log in</a> or <a href="/register/">create an account</a>.</p></div>
<div class="alignright">
<a class="button standard go" href="javascript: history.back(-1)">Continue »</a>
</div>
</div>
</section>
</div>
</div>
<!-- /<div id="site-content"> -->
My personal use case is detecting whether a SFW submission will fail to produce a thumbnail embed on Discord (SFW submissions from private accounts behave like NSFW ones in this regard), so that my image preview bot is only triggered when required.
Does this seem feasible or would it require a lot of work?
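A minimal sketch of the check described above, assuming it is enough to look for the registered-users-only notice in a logged-out fetch (the helper names and the matched string are my assumptions, taken from the snippet above, not existing FAExport code):

```ruby
require "net/http"
require "uri"

# Does the given HTML look like FA's guests-blocked redirect page?
def blocked_page?(html)
  html.include?("make it available to registered users only")
end

# Hypothetical helper: fetch a user's page with no session cookies and
# report whether guests are allowed to view it.
def guest_access?(name)
  html = Net::HTTP.get(URI("https://www.furaffinity.net/user/#{name}/"))
  !blocked_page?(html)
end
```

The string check itself can be exercised offline against the snippet quoted above.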
Not seen head nor tail of boothale in many months, so I have now forked this project a bit more firmly.
My repo is over here: https://github.com/Deer-Spangle/faexport
I should see issues and PRs filed in either place.
There's a public and updated copy of the API up here: http://faexport.spangle.org.uk
I am tempted to do some refactoring, but I will try my best to ensure backward compatibility at every stage.
When searching with a search query that has an @ in it, for example
@keywords test
search.json and search.xml return the results fine, but search.rss returns
TypeError: Failed to construct 'URL': Invalid URL
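That failure is consistent with the @ not being percent-encoded when the query is embedded in a feed URL. A sketch of the likely fix (escaping the query before building the link is my assumption about where the bug lives):

```ruby
require "cgi"

query = "@keywords test"
# Percent-encode the query so '@' and the space are safe inside a feed URL.
escaped = CGI.escape(query)  # "@" becomes "%40", " " becomes "+"
feed_url = "https://faexport.spangle.org.uk/search.rss?q=#{escaped}"
```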
Would it be possible to replace all post links in the submission RSS feed with fxfuraffinity.net links instead of the original ones? This would allow posts to have embeds with functional previews.
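If this were implemented, the rewrite itself would be a simple host substitution on each item link (a sketch, not existing FAExport code; whether and how to expose it as an option would be up to the maintainer):

```ruby
# Swap the host on a submission link so Discord uses fxfuraffinity's embed.
def fx_link(url)
  url.sub("www.furaffinity.net", "fxfuraffinity.net")
end
```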
For example, the admin notice today for FA United:
<table class="maintable" id="admin_notice_do_not_adblock" width="95%" cellspacing="1" cellpadding="3" border="0">
<tbody><tr>
<td class="cat" valign="top">
<table width="100%" cellspacing="0" cellpadding="2" border="0">
<tbody><tr>
<td width="32" valign="top" align="center">
<img alt="Error!" src="/themes/classic/img/icons/Error.png">
</td>
<td valign="top" align="left">
<h4>Administrator notice:</h4>
FA United is LIVE! We're currently streaming the Fox and Pepper Show (<a href="http://www.furaffinity.net/user/foxamoore">Fox Amoore</a> and <a href="http://www.furaffinity.net/user/peppercoyote">Pepper Coyote</a>). Join us for live music at FAU Live. <a href="https://www.youtube.com/watch?v=f7jWFeuW0b0"><b>Click here to watch!</b></a> </td>
</tr>
</tbody></table>
</td>
</tr>
</tbody></table>
Hello,
I'm experiencing an issue where response times for submission requests are very slow. Each request takes about a second to complete; here's the output.
[17/Oct/2018:16:57:25 +0000] "GET /submission/28191881.json HTTP/1.1" 200 1096 0.9489
[17/Oct/2018:16:57:26 +0000] "GET /submission/27737463.json HTTP/1.1" 200 1085 0.9656
[17/Oct/2018:16:57:27 +0000] "GET /submission/27737441.json HTTP/1.1" 200 1168 0.9720
[17/Oct/2018:16:57:28 +0000] "GET /submission/26505248.json HTTP/1.1" 200 1188 0.9796
[17/Oct/2018:16:57:29 +0000] "GET /submission/26505213.json HTTP/1.1" 200 1162 0.9602
[17/Oct/2018:16:57:30 +0000] "GET /submission/26015495.json HTTP/1.1" 200 1113 0.9771
[17/Oct/2018:16:57:31 +0000] "GET /submission/25835173.json HTTP/1.1" 200 1219 0.9949
[17/Oct/2018:16:57:32 +0000] "GET /submission/25135350.json HTTP/1.1" 200 1235 0.9902
[17/Oct/2018:16:57:33 +0000] "GET /submission/25060895.json HTTP/1.1" 200 1261 0.9377
[17/Oct/2018:16:57:34 +0000] "GET /submission/25049362.json HTTP/1.1" 200 1352 1.0004
[17/Oct/2018:16:57:35 +0000] "GET /submission/24880819.json HTTP/1.1" 200 1023 0.9701
[17/Oct/2018:16:57:36 +0000] "GET /submission/24078853.json HTTP/1.1" 200 1270 1.0597
[17/Oct/2018:16:57:37 +0000] "GET /submission/24068126.json HTTP/1.1" 200 1137 0.8902
[17/Oct/2018:16:57:38 +0000] "GET /submission/23544162.json HTTP/1.1" 200 1126 0.9866
I made a little project just for testing submission requests from the API, the code can be found here. (excuse the sloppiness)
Also, I must add that response times from the API improve significantly when it's done through a web browser. I have absolutely no idea why this is. Here's the output.
[17/Oct/2018:17:06:28 +0000] "GET /submission/28191881.json HTTP/1.1" 200 1096 0.0787
[17/Oct/2018:17:06:32 +0000] "GET /submission/27737463.json HTTP/1.1" 200 1085 0.0730
[17/Oct/2018:17:06:35 +0000] "GET /submission/27737463.json HTTP/1.1" 200 1085 0.0009
[17/Oct/2018:17:06:42 +0000] "GET /submission/26505248.json HTTP/1.1" 200 1188 0.0987
[17/Oct/2018:17:06:49 +0000] "GET /submission/26505213.json HTTP/1.1" 200 1162 0.0573
[17/Oct/2018:17:06:55 +0000] "GET /submission/26015495.json HTTP/1.1" 200 1113 0.0618
[17/Oct/2018:17:06:59 +0000] "GET /submission/26015495.json HTTP/1.1" 200 1113 0.0008
[17/Oct/2018:17:07:01 +0000] "GET /submission/25835173.json HTTP/1.1" 200 1219 0.0792
Does anyone else have this issue? Any help would be greatly appreciated!
Recently I've been getting an error when I try to post a journal, saying that I need a valid FA_COOKIE header, even though I'm providing one. I think this worked at some point last year.
Other endpoints are working fine.
Here's the code I'm using to send the request: https://github.com/libertyernie/FAExportLib/blob/master/FAUserClient.vb#L37
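For reference, a minimal Ruby sketch of how the header is sent (the route path is my assumption, and the cookie string is a placeholder for real a/b session cookies):

```ruby
require "net/http"
require "uri"

uri = URI("https://faexport.spangle.org.uk/journal.json")  # assumed route
req = Net::HTTP::Post.new(uri)
# FAExport reads the FA session from this header; the value is a placeholder.
req["FA_COOKIE"] = "b=cookie-b-value; a=cookie-a-value"
req.set_form_data("title" => "Test", "description" => "Hello")
# Uncomment to actually send the request:
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
```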
Just something to pull IDs from the index page.
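Pulling the IDs could be sketched with a plain regex over the index HTML (assuming /view/{id}/ links appear on the page; this is not the project's existing parser):

```ruby
# Extract unique submission IDs from any chunk of FA HTML.
def submission_ids(html)
  html.scan(%r{/view/(\d+)/}).flatten.uniq
end
```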
Hello! It's me again, the submission info gathering has died!
Example: http://faexport.boothale.net/submission/2080496.json
Sorry to keep bringing bad news!
Given this is a public API, that some systems seem to use and rely on, shouldn't it have version numbers by now?
https://semver.org/ has a good guide to semantic versioning, but it's just Major.Minor.Patch. Major would need to be at least 1, because this is public and has things relying on it. We would increment Minor when we add functionality, Patch when we make bug fixes, and Major if we ever introduce non-backward-compatible changes, which we try not to do.
And certainly having the version number listed in the docs, and a listing of releases here, would be useful for knowing which features are available in a given deployment of the API. I've certainly found myself using the wrong API version at times!
EDIT: on second thoughts, CalVer (https://calver.org/) seems much more appropriate.
Is your project time-sensitive in any way? Do other external changes drive new project releases?
Business requirements, such as Ubuntu's focus on support schedules.
Security updates, such as certifi's need to update certificates.
Political shifts, such as pytz's handling of timezone changes.
If you answered yes to any of these questions, CalVer's semantics make it a strong choice for your project.
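Either way, Ruby's built-in Gem::Version compares dotted version strings correctly, so both schemes would sort cleanly:

```ruby
# Gem::Version compares segments numerically, so "1.10" sorts after "1.9",
# and CalVer-style year-based versions order chronologically too.
semver_ok = Gem::Version.new("1.10.0")  > Gem::Version.new("1.9.0")
calver_ok = Gem::Version.new("2020.1.1") > Gem::Version.new("2019.12.2")
```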
It would be very handy to have some automated tests, run every day or so in Travis, that just ensure FA hasn't changed their site and broken everything.
After the shinies update I threw together a little manual checklist of endpoints and expectations, so that might be a good place to start.
I don't really know the first thing about automated testing in Ruby, though: which libraries are preferred, what style of testing to use, how to add the required gems to the Gemfile, and so on.
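A framework-free sketch of what the daily smoke check could assert: that a scraped submission hash still carries the expected keys (the key list here is my guess at a subset, not the full schema):

```ruby
# Keys a scraped submission is expected to contain (illustrative subset).
EXPECTED_KEYS = %w[title name profile link download].freeze

# Return the expected keys missing from a scraped submission hash;
# an empty result means the scrape still looks healthy.
def missing_keys(submission)
  EXPECTED_KEYS - submission.keys
end
```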
With the New Year's site UI update announcement, all accounts were forced over to the modern theme. This has broken the publicly available API at https://faexport.boothale.net/ because the code expects the underlying account being used to be on the Classic theme.
Fix the theme settings of the account so that it uses the classic theme.
Scraping stuff from FA is a useful thing to be able to do on its own, without serving it as a REST API.
In fact, I myself would want to use it in a local script.
Would it be OK to extract the scraper into a Ruby gem of its own (it would need its own repo or branch) and then refactor the FAExport API to rely on that gem?
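The split could start from a gemspec along these lines (the gem name and metadata here are hypothetical):

```ruby
# faexport-scraper.gemspec - hypothetical name for the extracted scraper gem.
spec = Gem::Specification.new do |s|
  s.name    = "faexport-scraper"
  s.version = "0.1.0"
  s.summary = "FA scraping logic extracted from FAExport"
  s.authors = ["FAExport contributors"]
  s.files   = Dir["lib/**/*.rb"]
end
```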
Somehow every artist and every title now comes back as "click here".
Hi,
I'm experiencing an issue where the API submission list will cut off at 72 submissions if the user's gallery exceeds that number. This happens with both sfw & nsfw submissions. (all links are sfw)
http://faexport.boothale.net/user/strange-fox/gallery.xml?sfw=1
https://www.furaffinity.net/gallery/strange-fox/
This and everything before it isn't in the list: https://www.furaffinity.net/view/2555669/
http://faexport.boothale.net/user/-lofi/gallery.xml?sfw=1
https://www.furaffinity.net/gallery/-lofi/
This and everything before it isn't in the list: https://www.furaffinity.net/view/19306091/
http://faexport.boothale.net/user/kenket/gallery.xml?sfw=1
https://www.furaffinity.net/gallery/kenket/
This and everything before it isn't in the list: https://www.furaffinity.net/view/22732910/
Thanks in advance!
It would be nice to add the latest submissions (and maybe latest favs) to the user profile endpoint.
It would only be a few of their submissions, but it would also include the latest upload date, which would be handy for checking whether a user is active without needing to make multiple requests.
Furaffinity changed how pagination on favorites works. It now tracks which pages to go to next using favorite ids rather than explicit page numbers. Will possibly need a breaking API change for this.
When I subscribe to two or more feeds from one artist, I can't distinguish which feed is from the gallery, which is from scraps, etc., because they share the same title. So I suggest using <description> as <title>.
Looks like the parsing has broken for that at some point. Profile name and profile link are still good, but name is broken.
Picking random numbers until I found a valid journal as an example:
https://faexport.boothale.net/journal/233442.json
"name": "\n \n ",
"profile": "https://www.furaffinity.net/user/xids/",
"profile_name": "xids",
(I've got a bunch of feature requests I would love too, but I shall put them in their own issue once I gather them up. Or maybe try to submit a PR; I have been wanting to learn Ruby!)
The search feature is not working anymore; it used to work, though.
If you go to the final page of a user's favourites and specify ?next={fav_id} for the last fav on that page, it displays the same page again.
This seems to be an FA issue, but it's one we can mitigate in the API: simply check whether the last fav id equals the next parameter and, if so, return an empty list.
I think this would be much neater functionality for the API, what do you think?
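The mitigation described above is tiny (a sketch, with favs assumed to be a list of id strings already in page order):

```ruby
# If the "next" cursor equals the last fav on the page, FA has repeated
# the final page: treat it as the end of the list.
def favs_page(favs, next_param)
  return [] if next_param && favs.last == next_param
  favs
end
```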
It would be nice to filter out NSFW submissions without having to get the details for each one. I wonder if you could implement this by using sfw.furaffinity.net instead of www.furaffinity.net, or by using different a and b cookies?
A change to the website a while ago meant users can now specify any title to themselves on their profile instead of preset types like "Watcher".
https://github.com/boothale/faexport/blob/master/lib/faexport/scraper.rb#L236
There is a field for name and submission ID, but I'm not sure how to get FAExport to work with the search query.
I've tried https://furaffinityapi.herokuapp.com/search/q=birdo.xml and https://furaffinityapi.herokuapp.com/search/birdo.xml, but both returned "Not Found".
I was trying to write some artist discovery logic, only to find that the watching list only returns the first page of the list.
Ideally there should be a route that either gets a certain page or every page.
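An every-page route could just loop the existing single-page scrape until a page comes back empty. A sketch, where fetch_page stands in for that scrape (a hypothetical callable, passed in here so the loop is self-contained):

```ruby
# Collect every page of a watching list by calling fetch_page(name, page)
# until it returns an empty batch.
def all_watching(name, fetch_page)
  results = []
  page = 1
  loop do
    batch = fetch_page.call(name, page)
    break if batch.empty?
    results.concat(batch)
    page += 1
  end
  results
end
```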
My use case: I scrape a submission and then want to look at the user profile of the poster. While attempting to scrape submission 26386469, I noticed the name property is more of a display name, and therefore you cannot just feed it to FAExport's /user/{name}.json.
I worked around it by parsing the username out of the profile property, but it would be nice if I could just ask the API.
Can do similar to gallery endpoint, with a ?full=1 parameter.
It would be great to add the actual profile links; also, users can have different symbols alongside their name indicating their status:
~ Normal standing
! Suspended
- Banned
โ Deceased
I made sure that the fork is up to date and wanted to use search.rss to see if it is faster on Heroku than on your site; however, it cannot find the feed even after 30 minutes.
Could it be something to do with RedisToGo? I am on the free Nano plan with 5 MB of caching.
How does the caching work, by the way? How long is the cache stored?
When I try to get a submission using /submission/{id}, it returns a 500 error, or NoMethodError at /submission/34459126.json: undefined method `at_css' for nil:NilClass when I just put the URL in my browser. Is this user error?
Probably a new FA update screwing up the scraper?
While logged in, running the script is returning an empty string.