catfind's People

Contributors

astraluma

catfind's Issues

Clean up dead projects

It's possible for sites to die and go poof.

Detect this and automatically clean them up.

UnrepeatableReadError

2021-12-20T20:51:36.994145+00:00 app[scheduler.4565]: Updating OpenColorIO (https://opencolorio.readthedocs.io/en/latest/objects.inv)

2021-12-20T20:51:40.165852+00:00 app[scheduler.4565]: Traceback (most recent call last):

2021-12-20T20:51:40.165879+00:00 app[scheduler.4565]:   File "/app/.heroku/python/bin/flask", line 8, in <module>

2021-12-20T20:51:40.166012+00:00 app[scheduler.4565]:     sys.exit(main())

2021-12-20T20:51:40.166029+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main

2021-12-20T20:51:40.166295+00:00 app[scheduler.4565]:     cli.main(args=sys.argv[1:])

2021-12-20T20:51:40.166302+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main

2021-12-20T20:51:40.166486+00:00 app[scheduler.4565]:     return super().main(*args, **kwargs)

2021-12-20T20:51:40.166493+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main

2021-12-20T20:51:40.166765+00:00 app[scheduler.4565]:     rv = self.invoke(ctx)

2021-12-20T20:51:40.166775+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke

2021-12-20T20:51:40.167171+00:00 app[scheduler.4565]:     return _process_result(sub_ctx.command.invoke(sub_ctx))

2021-12-20T20:51:40.167180+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke

2021-12-20T20:51:40.167542+00:00 app[scheduler.4565]:     return ctx.invoke(self.callback, **ctx.params)

2021-12-20T20:51:40.167542+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke

2021-12-20T20:51:40.167775+00:00 app[scheduler.4565]:     return __callback(*args, **kwargs)

2021-12-20T20:51:40.167776+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func

2021-12-20T20:51:40.167843+00:00 app[scheduler.4565]:     return f(get_current_context(), *args, **kwargs)

2021-12-20T20:51:40.167859+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator

2021-12-20T20:51:40.167994+00:00 app[scheduler.4565]:     return __ctx.invoke(f, *args, **kwargs)

2021-12-20T20:51:40.168002+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke

2021-12-20T20:51:40.168209+00:00 app[scheduler.4565]:     return __callback(*args, **kwargs)

2021-12-20T20:51:40.168217+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func

2021-12-20T20:51:40.168283+00:00 app[scheduler.4565]:     return f(get_current_context(), *args, **kwargs)

2021-12-20T20:51:40.168290+00:00 app[scheduler.4565]:   File "/app/catfind/__init__.py", line 270, in auto_index

2021-12-20T20:51:40.168398+00:00 app[scheduler.4565]:     ctx.invoke(index, url=proj.inv_url)

2021-12-20T20:51:40.168405+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke

2021-12-20T20:51:40.168628+00:00 app[scheduler.4565]:     return __callback(*args, **kwargs)

2021-12-20T20:51:40.168638+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func

2021-12-20T20:51:40.168711+00:00 app[scheduler.4565]:     return f(get_current_context(), *args, **kwargs)

2021-12-20T20:51:40.168718+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator

2021-12-20T20:51:40.168876+00:00 app[scheduler.4565]:     return __ctx.invoke(f, *args, **kwargs)

2021-12-20T20:51:40.168883+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke

2021-12-20T20:51:40.169109+00:00 app[scheduler.4565]:     return __callback(*args, **kwargs)

2021-12-20T20:51:40.169117+00:00 app[scheduler.4565]:   File "<string>", line 2, in index

2021-12-20T20:51:40.169213+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 552, in new_func

2021-12-20T20:51:40.169394+00:00 app[scheduler.4565]:     reraise(exc_type, exc, tb)

2021-12-20T20:51:40.169402+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/utils/utils.py", line 95, in reraise

2021-12-20T20:51:40.169482+00:00 app[scheduler.4565]:     try: raise exc.with_traceback(tb)

2021-12-20T20:51:40.169491+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func

2021-12-20T20:51:40.169653+00:00 app[scheduler.4565]:     result = func(*args, **kwargs)

2021-12-20T20:51:40.169670+00:00 app[scheduler.4565]:   File "/app/catfind/__init__.py", line 203, in index

2021-12-20T20:51:40.169756+00:00 app[scheduler.4565]:     ent = Entry.get(project=proj, domain=domain, role=role, name=item.name)

2021-12-20T20:51:40.169772+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4014, in get

2021-12-20T20:51:40.170766+00:00 app[scheduler.4565]:     try: return entity._find_one_(kwargs)  # can throw MultipleObjectsFoundError

2021-12-20T20:51:40.170776+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4121, in _find_one_

2021-12-20T20:51:40.171856+00:00 app[scheduler.4565]:     if obj is None: obj = entity._find_in_db_(avdict, unique, for_update, nowait, skip_locked)

2021-12-20T20:51:40.171865+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4181, in _find_in_db_

2021-12-20T20:51:40.172913+00:00 app[scheduler.4565]:     objects = entity._fetch_objects(cursor, attr_offsets, 1, for_update, avdict)

2021-12-20T20:51:40.172922+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4315, in _fetch_objects

2021-12-20T20:51:40.174012+00:00 app[scheduler.4565]:     obj._db_set_(avdict)

2021-12-20T20:51:40.174021+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4951, in _db_set_

2021-12-20T20:51:40.175256+00:00 app[scheduler.4565]:     throw(UnrepeatableReadError,

2021-12-20T20:51:40.175281+00:00 app[scheduler.4565]:   File "/app/.heroku/python/lib/python3.9/site-packages/pony/utils/utils.py", line 106, in throw

2021-12-20T20:51:40.175373+00:00 app[scheduler.4565]:     raise exc

2021-12-20T20:51:40.175404+00:00 app[scheduler.4565]: pony.orm.core.UnrepeatableReadError: Value of Entry.last_indexed for OpenColorIO_v2_2dev::Baker::getFormatMetadata was updated outside of current transaction (was: datetime.datetime(2021, 12, 20, 20, 51, 36, 994208, tzinfo=datetime.timezone.utc), now: datetime.datetime(2021, 12, 20, 20, 51, 36, 994208))
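The `was:` and `now:` values in the final line differ only in `tzinfo`, which suggests the column drops the timezone on the way through the database. A minimal sketch of one possible workaround, assuming the column stores naive datetimes and that everything written to it is UTC. `to_naive_utc` is a hypothetical helper, not part of catfind or Pony.

```python
from datetime import datetime, timezone


def to_naive_utc(dt: datetime) -> datetime:
    """Convert to UTC and drop tzinfo so the value round-trips unchanged."""
    if dt.tzinfo is None:
        return dt  # assumption: naive values are already UTC
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# The two values from the error message above.
aware = datetime(2021, 12, 20, 20, 51, 36, 994208, tzinfo=timezone.utc)
naive = datetime(2021, 12, 20, 20, 51, 36, 994208)

print(aware == naive)                # False: aware and naive are never equal
print(to_naive_utc(aware) == naive)  # True
```

Normalizing before assignment (for example `ent.last_indexed = to_naive_utc(now)`) would keep the in-memory value identical to what comes back from the database.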

Friendly 404 pages

Add nice 404 pages, including distinct ones for "idk what this route is" and "I couldn't find the thing you're looking for".

Fix discovery memory leak

While crawling, memory use appears to grow steadily, which suggests a leak.

Fix this.

(I expect it's the use of @functools.cache on methods)
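A small sketch of why `@functools.cache` on a method would leak, if that guess is right: the cache lives on the class and uses `self` as part of the key, so every instance that ever called the method stays alive. The class and method names here are made up for illustration.

```python
import functools
import gc
import weakref


class Leaky:
    @functools.cache  # one cache on the class, keyed on (self, url)
    def fetch(self, url):
        return url.upper()


class Scoped:
    def __init__(self):
        # One cache per instance. It forms a reference cycle with the
        # instance, which the garbage collector can free.
        self.fetch = functools.cache(self._fetch)

    def _fetch(self, url):
        return url.upper()


def survives_collection(cls):
    """Return True if an instance is still alive after being dropped."""
    obj = cls()
    obj.fetch("https://example.invalid/objects.inv")
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is not None


print(survives_collection(Leaky))   # True: the class-level cache holds it
print(survives_collection(Scoped))  # False
```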

Be durable against server failures

Currently, if PyPI or RTD has an outage while catfind is doing discovery, catfind may not be able to tell the outage apart from a project that is genuinely gone.

Handle that more smartly.
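One way to start: sort every fetch result into "fine", "gone", or "try again later", and only ever act destructively on "gone". A rough sketch, assuming the HTTP client follows redirects and reports `None` when there was no response at all. `Outcome` and `classify` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    GONE = "gone"            # the resource itself is missing
    TRANSIENT = "transient"  # server or network trouble; retry later


def classify(status: Optional[int]) -> Outcome:
    """Classify an HTTP status; None means timeout, DNS failure, or reset."""
    if status is None:
        return Outcome.TRANSIENT
    if 200 <= status < 300:
        return Outcome.OK
    if status in (404, 410):
        return Outcome.GONE
    # 408, 429, 5xx, and anything unexpected: assume an outage,
    # so that nothing gets dropped by mistake.
    return Outcome.TRANSIENT


print(classify(503))  # Outcome.TRANSIENT
print(classify(404))  # Outcome.GONE
```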

Styling

Make the HTML not browser-default ugly.

Add Prometheus metrics

Add flask-native Prometheus metrics, including:

  • Performance by route
  • Estimated count of projects and entries
  • Some Project.last_indexed statistics

Missing role

Updating reliability (https://reliability.readthedocs.io/en/latest/objects.inv)
Traceback (most recent call last):
  File "/app/.heroku/python/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
    cli.main(args=sys.argv[1:])
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
    return super().main(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/catfind/__init__.py", line 270, in auto_index
    ctx.invoke(index, url=proj.inv_url)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "<string>", line 2, in index
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
    result = func(*args, **kwargs)
  File "/app/catfind/__init__.py", line 203, in index
    ent = Entry.get(project=proj, domain=domain, role=role, name=item.name)
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4014, in get
    try: return entity._find_one_(kwargs)  # can throw MultipleObjectsFoundError
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 4112, in _find_one_
    avdict[attr] = attr.validate(val, None, entity, from_db=False)
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 2547, in validate
    throw(ValueError, 'Attribute %s is required' % (
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/utils/utils.py", line 106, in throw
    raise exc
ValueError: Attribute Entry.role is required

Implement the PyPI mirroring protocol

PyPI seems to have some kind of proactive mirroring support with incremental updates.

Implement support for that, and use it to feed PyPI guessing.

Handle bad inventories

Updating  (https://bernardphp-com.readthedocs.io/projects/chute/objects.inv)
Traceback (most recent call last):
  File "/app/.heroku/python/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
    cli.main(args=sys.argv[1:])
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
    return super().main(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/catfind/__init__.py", line 270, in auto_index
    ctx.invoke(index, url=proj.inv_url)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "<string>", line 2, in index
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
    result = func(*args, **kwargs)
  File "/app/catfind/__init__.py", line 184, in index
    inv = Inventory.load_uri(url)
  File "/app/catfind/inventory.py", line 121, in load_uri
    self = cls.load(
  File "/app/catfind/inventory.py", line 135, in load
    return cls.load_v2(reader, uri, joinfunc)
  File "/app/catfind/inventory.py", line 168, in load_v2
    raise ValueError('invalid inventory header (not compressed): %s' % line)
ValueError: invalid inventory header (not compressed): 
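The bare `ValueError` takes down the whole `auto_index` run. A sketch of one way to contain it, assuming the standard four-line Sphinx v2 header: raise a dedicated exception and skip the offending inventory. `BadInventory`, `read_v2_header`, and `index_all` are hypothetical names, not the ones in `catfind/inventory.py`.

```python
import io


class BadInventory(ValueError):
    """Hypothetical exception: the inventory is malformed."""


def _line(stream):
    return stream.readline().decode("utf-8", "replace").rstrip()


def read_v2_header(stream):
    """Read a Sphinx v2 inventory header; return (project, version)."""
    first = _line(stream)
    if first != "# Sphinx inventory version 2":
        raise BadInventory(f"unexpected first line: {first!r}")
    project = _line(stream)[11:]  # strip "# Project: "
    version = _line(stream)[11:]  # strip "# Version: "
    marker = _line(stream)
    if "zlib" not in marker:
        raise BadInventory(f"invalid inventory header (not compressed): {marker!r}")
    return project, version


def index_all(inventories):
    """Try every (url, raw_bytes) pair; return (indexed, skipped) URL lists."""
    indexed, skipped = [], []
    for url, raw in inventories:
        try:
            read_v2_header(io.BytesIO(raw))
        except BadInventory:
            skipped.append(url)  # one bad inventory does not stop the run
        else:
            indexed.append(url)
    return indexed, skipped
```

The skipped list could also feed whatever ends up tracking inventories that never index successfully.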

Bounce mode?

In bounce mode, when catfind doesn't know about a thing, it redirects to another catfind instance instead.

This would let private instances index internal modules without every instance needing to do a full discovery/crawl.
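A sketch of the URL side of this, assuming the upstream instance serves the same routes as the local one. `bounce_target` and the example host and path are invented for illustration; the actual redirect would happen wherever the lookup currently returns a 404.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def bounce_target(upstream: Optional[str], path: str, query: str = "") -> Optional[str]:
    """Return the URL to redirect to, or None if bounce mode is off."""
    if not upstream:
        return None
    base = urlsplit(upstream)
    # Keep any path prefix the upstream instance is mounted under.
    joined = base.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((base.scheme, base.netloc, joined, query, ""))


print(bounce_target("https://catfind.example", "/py/os.path"))
print(bounce_target(None, "/py/os.path"))
```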

zlib error

Updating  (https://xilinx.github.io/chipscopy/2021.1/objects.inv)
Traceback (most recent call last):
  File "/app/.heroku/python/bin/flask", line 8, in <module>
    sys.exit(main())
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 994, in main
    cli.main(args=sys.argv[1:])
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 600, in main
    return super().main(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/catfind/__init__.py", line 270, in auto_index
    ctx.invoke(index, url=proj.inv_url)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/flask/cli.py", line 444, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/app/.heroku/python/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "<string>", line 2, in index
  File "/app/.heroku/python/lib/python3.9/site-packages/pony/orm/core.py", line 533, in new_func
    result = func(*args, **kwargs)
  File "/app/catfind/__init__.py", line 184, in index
    inv = Inventory.load_uri(url)
  File "/app/catfind/inventory.py", line 121, in load_uri
    self = cls.load(
  File "/app/catfind/inventory.py", line 135, in load
    return cls.load_v2(reader, uri, joinfunc)
  File "/app/catfind/inventory.py", line 170, in load_v2
    for line in stream.read_compressed_lines():
  File "/app/catfind/inventory.py", line 93, in read_compressed_lines
    for chunk in self.read_compressed_chunks():
  File "/app/catfind/inventory.py", line 87, in read_compressed_chunks
    yield decompressor.decompress(self.buffer)
zlib.error: Error -3 while decompressing data: invalid distance too far back
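Same shape of problem as a bad header: corrupt data raises `zlib.error` from deep inside the generator and kills the run. A minimal sketch of wrapping the decompression step so the caller can skip the inventory. `CorruptInventory`, `decompress_chunks`, and `read_body` are hypothetical stand-ins for the code in `catfind/inventory.py`.

```python
import zlib


class CorruptInventory(ValueError):
    """Hypothetical exception: the compressed body cannot be decoded."""


def decompress_chunks(chunks):
    """Yield decompressed data, turning zlib.error into CorruptInventory."""
    decompressor = zlib.decompressobj()
    try:
        for chunk in chunks:
            yield decompressor.decompress(chunk)
        yield decompressor.flush()
    except zlib.error as exc:
        raise CorruptInventory(str(exc)) from exc


def read_body(chunks):
    """Return the decompressed body, or None if the stream is corrupt."""
    try:
        return b"".join(decompress_chunks(chunks))
    except CorruptInventory:
        return None


payload = zlib.compress(b"os.path py:module 1 library/os.path.html -\n")
print(read_body([payload[:5], payload[5:]]))
print(read_body([b"not zlib data"]))  # None
```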

Problematic Inventories

select * from project where last_indexed is null and name != '';

These inventories have never been indexed successfully.

Tests!!

jesus christ do I need a test suite.

Internationalization

Support languages for catfind-generated text, and selection of said language.

Language support

Support user language preferences, and track what language each edition of the documentation is in.

Schema management

PonyORM's built-in create_tables feature only creates whole tables.

Need something that can at least create columns, and preferably delete and modify them, too.
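Short of a full migration tool, the "at least create columns" part can be done by checking for the column and issuing `ALTER TABLE` when it is missing. A sketch using SQLite purely for illustration, since the production backend isn't stated here (PostgreSQL would query `information_schema.columns` instead of `PRAGMA table_info`). `ensure_column` is a hypothetical helper.

```python
import sqlite3


def ensure_column(conn, table, column, decl):
    """Add `column` to `table` if it is missing. Return True if it was added.

    The names are interpolated into SQL, so they must come from trusted
    code, never from user input.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (name TEXT PRIMARY KEY)")
print(ensure_column(conn, "project", "last_indexed", "TIMESTAMP"))  # True
print(ensure_column(conn, "project", "last_indexed", "TIMESTAMP"))  # False
```

Running this at startup, after `create_tables`, would be idempotent. Deleting and modifying columns would still need something heavier.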

Restructure how inventory & projects are tracked

It's pretty clear that the current schema (simply tracking the latest inventory and hoping no others get picked up) isn't working.

Also, many PyPI projects will link to docs that are not their own.

It's clear that discovery will be an ongoing process, not something we can do just once and be done with.

Design a next-gen version of this that'll better handle having many inventories for the same project.
