auger's People

Contributors: liminalcrab, maxdeviant, thomasorus

auger's Issues

URLs being rejected by the pull scraper (and date)

Need to figure out why the following URLs are being rejected and implement a fix:

URL https://longest.voyage/index.xml is fucked up.
URL https://kokorobot.ca/links/rss.xml is fucked up.
URL https://ameyama.com/blog/rss.xml is fucked up.
URL http://npisanti.com/rss.xml is fucked up.
URL https://phse.net/post/index.xml is fucked up.
URL https://rosano.ca/feed is fucked up.
URL https://teknari.com/feed.xml is fucked up.
URL https://serocell.com/feeds/serocell.xml is fucked up.
URL https://eli.li/feed.rss is fucked up.
URL https://resevoir.net/rss.xml is fucked up.
URL https://sixey.es/feed.xml is fucked up.
URL https://royniang.com/rss.xml is fucked up.
URL https://0xff.nu/feed.xml is fucked up.
URL https://system32.simone.computer/rss.xml is fucked up.
URL https://simply.personal.jenett.org/feed/ is fucked up.
URL https://q.pfiffer.org/feed.xml is fucked up.
URL https://www.edwinwenink.xyz/index.xml is fucked up.
URL https://materialfuture.net/rss.xml is fucked up.
URL https://travisshears.com/index.xml is fucked up.
URL https://www.juliendesrosiers.com/feed.xml is fucked up.
URL https://metasyn.pw/rss.xml is fucked up.
URL https://wolfmd.me/feed.xml is fucked up.
URL https://darch.dk/feed/page:feed.xml is fucked up.
URL https://natehn.com/index.xml is fucked up.
URL https://www.gr0k.net/blog/feed.xml is fucked up.
URL https://wiki.xxiivv.com/links/rss.xml is fucked up.

This line of code fails to split tags when the feed's root element is an un-namespaced 'rss':
links = [x for x in root if x.tag.split("}")[1] in ("entry", "item")]
https://github.com/LiminalCrab/auggar/blob/main/data/pull.py#L63
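A quick illustration of why that comprehension raises for plain RSS 2.0 documents (the feeds below are minimal hypothetical examples, not the actual responses): Atom tags carry a `{namespace}` prefix, so `split("}")` yields two parts, while an un-namespaced `<rss>` root yields tags with no `}` to split on.

```python
import xml.etree.ElementTree as ET

# Atom feeds are namespaced, so child tags look like "{...}entry".
atom = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry/></feed>'
)
print(atom[0].tag.split("}")[1])  # "entry"

# Plain RSS 2.0 has no namespace: "channel".split("}") returns
# ["channel"], so indexing [1] raises IndexError.
rss = ET.fromstring("<rss><channel><item/></channel></rss>")
try:
    [x.tag.split("}")[1] for x in rss]
except IndexError:
    print("IndexError: un-namespaced 'rss' root")
```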

failed
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7eb810>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7ebcc0>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079d0e8400>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7b4db0>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7eba90>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7a7c70>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7a8540>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7ad400>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079d0c5900>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c8020e0>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c802360>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c774e00>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c691360>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7e1ae0>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7ad040>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7ad810>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7edd60>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c801950>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7b4f90>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c659810>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7ef8b0>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7b4220>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c7a8090>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c801bd0>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c659220>
EXCEPTION_ROOT:<Element 'rss' at 0x7f079c774b30>

accepted
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079d0c54f0>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c7e15e0>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079d0e8810>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c7a8e00>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c7ed2c0>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c802bd0>
TRY2_ROOT:<Element '{http://www.sitemaps.org/schemas/sitemap/0.9}urlset' at 0x7f079c659ef0>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c673770>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c7a8d60>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c801090>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c673e50>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c6c4cc0>
TRY2_ROOT:<Element '{http://www.w3.org/2005/Atom}feed' at 0x7f079c6c3e00>

PG/SL - Replace any string manipulation done in SQL.

Updating the database this way is dangerous and prone to SQL injection. Data was originally manipulated with SQL itself, which slowed the application to a grinding halt and also opened a route for SQL injection.
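A minimal sketch of the intended fix, using sqlite3 from the standard library (the table and column names here are placeholders, not the app's real schema): do string manipulation in Python, then pass values as query parameters so the driver escapes them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (site TEXT, title TEXT, url TEXT)")

# Do any string clean-up in Python, not in SQL ...
title = "  Example Post  ".strip()

# ... and bind values as parameters instead of formatting them into
# the statement string.
conn.execute(
    "INSERT INTO posts (site, title, url) VALUES (?, ?, ?)",
    ("example.com", title, "https://example.com/post"),
)

# A malicious value is stored verbatim rather than executed as SQL.
conn.execute(
    "INSERT INTO posts (site, title, url) VALUES (?, ?, ?)",
    ("evil.com", "'); DROP TABLE posts; --", "https://evil.com"),
)
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```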

Refactor modules, reimplement Asyncio.

Coroutines aren't implemented correctly, and the asyncio library is not being used to its full potential; it's surprising the application worked at all. Breaking each main function into smaller functions with separate responsibilities should let asyncio overlap the I/O-bound work and significantly speed up the application.
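One possible shape for that refactor (the function names below are illustrative stand-ins, not the actual functions in pull.py): split fetching and parsing into separate coroutines and run them concurrently with asyncio.gather, so slow feeds no longer serialize the whole run.

```python
import asyncio

# Stand-in for a real network fetch.
async def fetch_feed(url: str) -> str:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"<rss from {url}>"

# Stand-in for real feed parsing.
async def parse_feed(raw: str) -> int:
    return len(raw)

async def process(url: str) -> int:
    raw = await fetch_feed(url)
    return await parse_feed(raw)

async def main(urls: list[str]) -> list[int]:
    # gather() runs every fetch concurrently instead of awaiting
    # each one inside a sequential loop.
    return await asyncio.gather(*(process(u) for u in urls))

results = asyncio.run(main(["https://a.example/rss", "https://b.example/rss"]))
print(results)
```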

urls.py - Consolidate lists into dictionary.

There are two cases for this. Since the website URL is simply an identifier, the dictionary could be formatted as follows:

"site": "feedurl"

In the case of adding instance usernames to the list, a nested dictionary might be a better solution:

"site": {
    "username": "user",
    "feed": "feedurl"
}

Fix yr shit

From 242f5517d73a073dc5fcee4bd0d940426d2c647b Mon Sep 17 00:00:00 2001
From: Quinlan Pfiffer <[email protected]>
Date: Thu, 8 Apr 2021 09:16:10 -0700
Subject: [PATCH] Fix yr shit.

---
 data/pull.py | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/data/pull.py b/data/pull.py
index 9789755..e237de8 100644
--- a/data/pull.py
+++ b/data/pull.py
@@ -65,15 +65,7 @@ async def main():
             try:
                 links = [x for x in root if x.tag.split("}")[1] in ("entry", "item")]
             except IndexError:
-                links = [x for x in root if x.tag in ("entry", "item")]
-                for match in re.findall('mlns:[^=]+="(?P<url>[^"]+)', response.text):
-                    print("REGEX:", match)
-                    
-                
-                print("LINKS:", links)
-            
-                #print("URL {} is fucked up.".format(url))
-                continue
+                links = [x for x in root[0] if x.tag in ("entry", "item")]
 
             for link in links:
                 title = [x.text for x in link if x.tag.split("}")[1] == "title"]
-- 
2.25.1

This fixes the fucked up ones.
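The patch works because RSS 2.0 nests its items under a `<channel>` element, so when the namespace split raises IndexError, `root[0]` is the channel and the items sit directly beneath it. A minimal sketch of that fallback, using hypothetical feed documents:

```python
import xml.etree.ElementTree as ET

def extract_links(root):
    # Atom: children are namespaced ("{...}entry"), so the split works.
    try:
        return [x for x in root if x.tag.split("}")[1] in ("entry", "item")]
    except IndexError:
        # RSS 2.0: <rss><channel><item/>...</channel></rss>, so the
        # items live under root[0], the channel element.
        return [x for x in root[0] if x.tag in ("entry", "item")]

rss = ET.fromstring("<rss><channel><title/><item/><item/></channel></rss>")
print(len(extract_links(rss)))  # 2
```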

Implement - FastAPI

FastAPI will act as a bridge between data processing on the backend and content rendering on the frontend, retrieving and modifying data through periodic API calls. This will replace both the inefficient nested for loop the application is wrapped in and the cron job. It will also provide a cleaner way of transporting data to separate components and make future changes easier without affecting how the entire application works.
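Independent of the web framework chosen, the periodic refresh that replaces cron can be sketched as an asyncio background task updating a cache the API endpoints would serve from. Everything here is a hypothetical stand-in (the cache, the refresh logic, and the bounded cycle count, which a real service would replace with an endless loop):

```python
import asyncio

CACHE: dict[str, str] = {}  # data the API endpoints would serve

async def refresh(urls: list[str]) -> None:
    # Stand-in for the real pull: fetch and parse each feed, then
    # update the cache/database.
    for url in urls:
        CACHE[url] = f"refreshed {url}"

async def periodic_refresh(urls: list[str], interval: float, cycles: int) -> None:
    # Bounded here so the sketch terminates; a real service would
    # loop forever instead of counting cycles.
    for _ in range(cycles):
        await refresh(urls)
        await asyncio.sleep(interval)

asyncio.run(periodic_refresh(["https://a.example/rss"], 0.01, 2))
print(CACHE)
```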
