astraluma / catfind
Sphinx indexer & search engine
It's possible for sites to die and go poof.
Detect this and automatically clean them up.
2021-12-20T20:51:36.994145+00:00 app[scheduler.4565]: Updating OpenColorIO (https://opencolorio.readthedocs.io/en/latest/objects.inv)
2021-12-20T20:51:40.165852+00:00 app[scheduler.4565]: Traceback (most recent call last):
2021-12-20T20:51:40.165879+00:00 app[scheduler.4565]: File "/app/.heroku/python/bin/flask", line 8, in <module>
2021-12-20T20:51:40.166012+00:00 app[scheduler.4565]: sys.exit(main())
2021-12-20T20:51:40.166029+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
2021-12-20T20:51:40.166295+00:00 app[scheduler.4565]: cli.main(args=sys.argv[1:])
2021-12-20T20:51:40.166302+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
2021-12-20T20:51:40.166486+00:00 app[scheduler.4565]: return super().main(*args, **kwargs)
2021-12-20T20:51:40.166493+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
2021-12-20T20:51:40.166765+00:00 app[scheduler.4565]: rv = self.invoke(ctx)
2021-12-20T20:51:40.166775+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
2021-12-20T20:51:40.167171+00:00 app[scheduler.4565]: return _process_result(sub_ctx.command.invoke(sub_ctx))
2021-12-20T20:51:40.167180+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
2021-12-20T20:51:40.167542+00:00 app[scheduler.4565]: return ctx.invoke(self.callback, **ctx.params)
2021-12-20T20:51:40.167542+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
2021-12-20T20:51:40.167775+00:00 app[scheduler.4565]: return __callback(*args, **kwargs)
2021-12-20T20:51:40.167776+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
2021-12-20T20:51:40.167843+00:00 app[scheduler.4565]: return f(get_current_context(), *args, **kwargs)
2021-12-20T20:51:40.167859+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
2021-12-20T20:51:40.167994+00:00 app[scheduler.4565]: return __ctx.invoke(f, *args, **kwargs)
2021-12-20T20:51:40.168002+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
2021-12-20T20:51:40.168209+00:00 app[scheduler.4565]: return __callback(*args, **kwargs)
2021-12-20T20:51:40.168217+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
2021-12-20T20:51:40.168283+00:00 app[scheduler.4565]: return f(get_current_context(), *args, **kwargs)
2021-12-20T20:51:40.168290+00:00 app[scheduler.4565]: File "/app/catfind/__init__.py", line 270, in auto_index
2021-12-20T20:51:40.168398+00:00 app[scheduler.4565]: ctx.invoke(index, url=proj.inv_url)
2021-12-20T20:51:40.168405+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
2021-12-20T20:51:40.168628+00:00 app[scheduler.4565]: return __callback(*args, **kwargs)
2021-12-20T20:51:40.168638+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
2021-12-20T20:51:40.168711+00:00 app[scheduler.4565]: return f(get_current_context(), *args, **kwargs)
2021-12-20T20:51:40.168718+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
2021-12-20T20:51:40.168876+00:00 app[scheduler.4565]: return __ctx.invoke(f, *args, **kwargs)
2021-12-20T20:51:40.168883+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
2021-12-20T20:51:40.169109+00:00 app[scheduler.4565]: return __callback(*args, **kwargs)
2021-12-20T20:51:40.169117+00:00 app[scheduler.4565]: File "<string>", line 2, in index
2021-12-20T20:51:40.169213+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 552, in new_func
2021-12-20T20:51:40.169394+00:00 app[scheduler.4565]: reraise(exc_type, exc, tb)
2021-12-20T20:51:40.169402+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/utils/utils.py", line 95, in reraise
2021-12-20T20:51:40.169482+00:00 app[scheduler.4565]: try: raise exc.with_traceback(tb)
2021-12-20T20:51:40.169491+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
2021-12-20T20:51:40.169653+00:00 app[scheduler.4565]: result = func(*args, **kwargs)
2021-12-20T20:51:40.169670+00:00 app[scheduler.4565]: File "/app/catfind/__init__.py", line 203, in index
2021-12-20T20:51:40.169756+00:00 app[scheduler.4565]: ent = Entry.get(project=proj, domain=domain, role=role, name=item.name)
2021-12-20T20:51:40.169772+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4014, in get
2021-12-20T20:51:40.170766+00:00 app[scheduler.4565]: try: return entity._find_one_(kwargs) # can throw MultipleObjectsFoundError
2021-12-20T20:51:40.170776+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4121, in _find_one_
2021-12-20T20:51:40.171856+00:00 app[scheduler.4565]: if obj is None: obj = entity._find_in_db_(avdict, unique, for_update, nowait, skip_locked)
2021-12-20T20:51:40.171865+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4181, in _find_in_db_
2021-12-20T20:51:40.172913+00:00 app[scheduler.4565]: objects = entity._fetch_objects(cursor, attr_offsets, 1, for_update, avdict)
2021-12-20T20:51:40.172922+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4315, in _fetch_objects
2021-12-20T20:51:40.174012+00:00 app[scheduler.4565]: obj._db_set_(avdict)
2021-12-20T20:51:40.174021+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4951, in _db_set_
2021-12-20T20:51:40.175256+00:00 app[scheduler.4565]: throw(UnrepeatableReadError,
2021-12-20T20:51:40.175281+00:00 app[scheduler.4565]: File "/app/.heroku/python/lib/python3.9/site-packages/pony/utils/utils.py", line 106, in throw
2021-12-20T20:51:40.175373+00:00 app[scheduler.4565]: raise exc
2021-12-20T20:51:40.175404+00:00 app[scheduler.4565]: pony.orm.core.UnrepeatableReadError: Value of Entry.last_indexed for OpenColorIO_v2_2dev::Baker::getFormatMetadata was updated outside of current transaction (was: datetime.datetime(2021, 12, 20, 20, 51, 36, 994208, tzinfo=datetime.timezone.utc), now: datetime.datetime(2021, 12, 20, 20, 51, 36, 994208))
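The last line of that traceback shows the same timestamp twice, once with tzinfo=utc and once without, and Pony treats the two as different values. A minimal sketch of one possible fix, assuming the column stores naive timestamps and that naive values are meant as UTC: normalize before assigning. The helper name to_naive_utc is hypothetical, not something in catfind.

```python
from datetime import datetime, timezone


def to_naive_utc(dt: datetime) -> datetime:
    """Return dt as a naive datetime expressed in UTC.

    Aware values are converted to UTC and stripped of tzinfo. Naive values
    are returned unchanged (this sketch assumes they are already UTC).
    """
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# The two values quoted in the error message.
aware = datetime(2021, 12, 20, 20, 51, 36, 994208, tzinfo=timezone.utc)
naive = datetime(2021, 12, 20, 20, 51, 36, 994208)

# Aware and naive datetimes never compare equal, even for the same instant.
mismatch_before = aware != naive
match_after = to_naive_utc(aware) == naive
```

Storing aware datetimes everywhere would work too; what matters is that the value written and the value read back have the same shape.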
Add nice 404 pages, including distinct ones for "idk what this route is" and "I couldn't find the thing you're looking for".
When crawling, memory usage appears to keep increasing, indicating a leak.
Fix this.
(I expect it's the use of @functools.cache on methods.)
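If that guess is right, the mechanism would be that functools.cache on a method stores self as part of the cache key in a cache that lives on the class, so every instance that ever called the method stays alive. A small sketch that shows the effect and one alternative (a per-instance dict); the class names and the fake fetch body are made up for illustration.

```python
import functools
import gc
import weakref


class LeakyFetcher:
    @functools.cache  # one cache for the whole class, keyed on (self, url)
    def fetch(self, url):
        return f"body of {url}"


class PerInstanceFetcher:
    def __init__(self):
        # One cache per instance, freed together with the instance.
        self._cache = {}

    def fetch(self, url):
        if url not in self._cache:
            self._cache[url] = f"body of {url}"
        return self._cache[url]


def is_freed_after_use(cls):
    """Create an instance, call fetch once, drop it, and report if it was freed."""
    obj = cls()
    obj.fetch("https://example.invalid/objects.inv")
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is None
```

Here is_freed_after_use(LeakyFetcher) returns False because the class-level cache still references the instance, while is_freed_after_use(PerInstanceFetcher) returns True.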
Heroku doesn't have this feature. Add it to the app.
Currently, if PyPI or RTD have outages when catfind is doing discovery, it may or may not be able to tell the difference.
Handle that more smarter.
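A rough sketch of one way to make that distinction, assuming discovery does plain HTTP requests and can see the status code (or the lack of one on a timeout). Outcome and classify are hypothetical names, and the status groupings are a judgment call rather than anything PyPI or RTD document.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    FOUND = "found"
    MISSING = "missing"          # upstream answered: no such thing
    UNAVAILABLE = "unavailable"  # upstream could not answer; retry later


def classify(status: Optional[int]) -> Outcome:
    """Map an HTTP status (None for timeouts/connection errors) to an Outcome."""
    if status is None or status == 429 or status >= 500:
        return Outcome.UNAVAILABLE
    if status in (404, 410):
        return Outcome.MISSING
    if 200 <= status < 300:
        return Outcome.FOUND
    # Other 3xx/4xx codes are ambiguous; treating them as unavailable
    # avoids throwing data away on a misread.
    return Outcome.UNAVAILABLE
```

Only MISSING would count as evidence that something is really gone; UNAVAILABLE just means "try again later".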
Make the HTML not browser-default ugly.
Add flask-native Prometheus metrics, including:
Project.last_indexed statistics
Updating reliability (https://reliability.readthedocs.io/en/latest/objects.inv)
Traceback (most recent call last):
File "/app/.heroku/python/bin/flask", line 8, in <module>
sys.exit(main())
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
cli.main(args=sys.argv[1:])
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
return super().main(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/catfind/__init__.py", line 270, in auto_index
ctx.invoke(index, url=proj.inv_url)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "<string>", line 2, in index
File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
result = func(*args, **kwargs)
File "/app/catfind/__init__.py", line 203, in index
ent = Entry.get(project=proj, domain=domain, role=role, name=item.name)
File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4014, in get
try: return entity._find_one_(kwargs) # can throw MultipleObjectsFoundError
File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4112, in _find_one_
avdict[attr] = attr.validate(val, None, entity, from_db=False)
File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 2547, in validate
throw(ValueError, 'Attribute %s is required' % (
File "/app/.heroku/python/lib/python3.9/site-packages/pony/utils/utils.py", line 106, in throw
raise exc
ValueError: Attribute Entry.role is required
PyPI seems to have some kind of pro-active mirroring support with incremental updates.
Implement support for that, and use it to feed PyPI guessing.
Be able to positively identify when a custom domain is an RTD site.
Updating (https://bernardphp-com.readthedocs.io/projects/chute/objects.inv)
Traceback (most recent call last):
File "/app/.heroku/python/bin/flask", line 8, in <module>
sys.exit(main())
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
cli.main(args=sys.argv[1:])
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
return super().main(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/catfind/__init__.py", line 270, in auto_index
ctx.invoke(index, url=proj.inv_url)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "<string>", line 2, in index
File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
result = func(*args, **kwargs)
File "/app/catfind/__init__.py", line 184, in index
inv = Inventory.load_uri(url)
File "/app/catfind/inventory.py", line 121, in load_uri
self = cls.load(
File "/app/catfind/inventory.py", line 135, in load
return cls.load_v2(reader, uri, joinfunc)
File "/app/catfind/inventory.py", line 168, in load_v2
raise ValueError('invalid inventory header (not compressed): %s' % line)
ValueError: invalid inventory header (not compressed):
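The empty text after the colon suggests the fourth header line was blank, for example because the response was cut short or wasn't an inventory at all. A sketch of a pre-check that names the problem instead of raising, based on the usual four-line header of a version 2 Sphinx inventory. describe_header_problem is a hypothetical helper, not part of catfind's inventory.py.

```python
from typing import Optional

V2_MAGIC = "# Sphinx inventory version 2"


def describe_header_problem(data: bytes) -> Optional[str]:
    """Return a human-readable problem with a v2 inventory header, or None."""
    # The header is four text lines; everything after it is zlib data.
    lines = data.split(b"\n", 4)[:4]
    if len(lines) < 4:
        return f"truncated header: only {len(lines)} line(s)"
    text = [line.decode("utf-8", errors="replace").rstrip() for line in lines]
    if text[0] != V2_MAGIC:
        return f"unexpected first line: {text[0]!r}"
    if "zlib" not in text[3]:
        return f"fourth line does not announce zlib compression: {text[3]!r}"
    return None
```

A result like "unexpected first line: '<!DOCTYPE html>'" would make it obvious when a site is serving an error page at the inventory URL.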
When configured for bounce mode, catfind redirects to another catfind instance whenever it doesn't know about a thing.
This would allow private instances indexing internal modules without every instance needing to do a full discovery/crawl.
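A sketch of how the redirect target could be built, assuming the upstream instance serves the same routes and query parameters as the local one. bounce_target and its arguments are hypothetical; nothing here reflects an existing catfind setting.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def bounce_target(upstream: Optional[str], path: str,
                  query: Mapping[str, str]) -> Optional[str]:
    """Return the URL on `upstream` mirroring this request, or None if bounce mode is off."""
    if not upstream:
        return None
    base = urlsplit(upstream)
    # Keep any path prefix the upstream is mounted under.
    joined = base.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((base.scheme, base.netloc, joined, urlencode(query), ""))
```

For example, bounce_target("https://catfind.example/", "/search", {"q": "os.path"}) gives "https://catfind.example/search?q=os.path", and a local miss would answer with a redirect to that URL instead of a 404.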
Updating (https://xilinx.github.io/chipscopy/2021.1/objects.inv)
Traceback (most recent call last):
File "/app/.heroku/python/bin/flask", line 8, in <module>
sys.exit(main())
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
cli.main(args=sys.argv[1:])
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
return super().main(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/catfind/__init__.py", line 270, in auto_index
ctx.invoke(index, url=proj.inv_url)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "<string>", line 2, in index
File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
result = func(*args, **kwargs)
File "/app/catfind/__init__.py", line 184, in index
inv = Inventory.load_uri(url)
File "/app/catfind/inventory.py", line 121, in load_uri
self = cls.load(
File "/app/catfind/inventory.py", line 135, in load
return cls.load_v2(reader, uri, joinfunc)
File "/app/catfind/inventory.py", line 170, in load_v2
for line in stream.read_compressed_lines():
File "/app/catfind/inventory.py", line 93, in read_compressed_lines
for chunk in self.read_compressed_chunks():
File "/app/catfind/inventory.py", line 87, in read_compressed_chunks
yield decompressor.decompress(self.buffer)
zlib.error: Error -3 while decompressing data: invalid distance too far back
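In the tracebacks above, one bad inventory takes down the whole auto_index run. A sketch of isolating failures per inventory so the rest still get indexed; index_all is a hypothetical wrapper, and index_one stands in for whatever does the real indexing.

```python
import zlib
from typing import Callable, Dict, Iterable


def index_all(urls: Iterable[str],
              index_one: Callable[[str], None]) -> Dict[str, str]:
    """Index each URL, returning {url: error text} for the ones that failed."""
    failures = {}
    for url in urls:
        try:
            index_one(url)
        except (ValueError, zlib.error) as exc:
            # Bad header or corrupt payload: record it and keep going.
            # zlib.error is not a ValueError, so it is listed separately.
            failures[url] = str(exc)
    return failures
```

The returned dict could then feed logging or a "never indexed successfully" report, instead of the scheduler dying on the first broken site.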
select * from project where last_indexed is null and name != '';
These inventories have never been indexed successfully.
jesus christ do I need a test suite.
Support languages for catfind-generated text, and selection of said language.
Support user language preferences and tracking what language various editions of documentation are in.
PonyORM's built-in create_tables feature only creates whole tables.
Need something that can at least create columns, and preferably delete and modify them, too.
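A minimal sketch of the "at least create columns" part, done outside the ORM with plain SQL. It uses the standard library's sqlite3 only so the example is self-contained; the real database may well be something else, and add_missing_columns is a hypothetical helper.

```python
import sqlite3
from typing import Dict, List


def add_missing_columns(conn: sqlite3.Connection, table: str,
                        wanted: Dict[str, str]) -> List[str]:
    """Add each column in `wanted` ({name: SQL type}) that `table` lacks."""
    # Column 1 of each table_info row is the column name.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    added = []
    for name, sql_type in wanted.items():
        if name not in existing:
            # Identifiers cannot be bound as parameters, so `table`, `name`
            # and `sql_type` must come from trusted code, never user input.
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{name}" {sql_type}')
            added.append(name)
    return added
```

Running it again with the same arguments adds nothing, so it is safe to call at every startup. Deleting or modifying columns is a harder problem and probably wants a real migration tool.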
It's pretty clear that the current schema (simply tracking the latest inventory and hoping no others get picked up) isn't working.
Also, many PyPI projects will link to docs that are not their own.
It's clear that discovery will be an on-going process, not something we can do just once and be done with it.
Design a next-gen version of this that'll better handle having many inventories for the same project.