
poliscoops's Introduction

Poliscoops

Bugs and feature requests

Have a bug or a feature request? Please first read the issue guidelines and search for existing and closed issues. If your problem or idea is not addressed yet, please open a new issue.

Install and usage

See this guide to install the Poliscoops API using Docker. There are also a few usage commands to get you started.

Removing items

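To remove an item from the index, run the following in the pfl_backend_1 container:

    curl -XDELETE 'http://elasticsearch:9200/pfl_combined_index_fixed/item/60f7e58767aec51e657ae6848ed15c1c36fe8185'

Replace the final path segment with the ID of the item you want to delete.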
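The same deletion can be scripted from Python when several items need to go. This is a minimal sketch that assumes the host, index name and `item` document type from the curl command above. The `item` path segment only exists on older Elasticsearch versions that still use mapping types. The helper names are hypothetical and not part of the Poliscoops code base.

```python
import urllib.parse
import urllib.request

# Assumed values, copied from the curl example; adjust for your deployment.
ES_HOST = "http://elasticsearch:9200"
INDEX = "pfl_combined_index_fixed"
DOC_TYPE = "item"  # mapping-type segment, only valid on older Elasticsearch


def build_delete_url(item_id, host=ES_HOST, index=INDEX, doc_type=DOC_TYPE):
    """Return the URL that addresses a single indexed item."""
    # Quote the ID so unusual characters cannot change the path.
    quoted_id = urllib.parse.quote(item_id, safe="")
    return f"{host}/{index}/{doc_type}/{quoted_id}"


def delete_item(item_id):
    """Send an HTTP DELETE for one item and return the status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    e.g. 404 when no item with this ID exists.
    """
    request = urllib.request.Request(build_delete_url(item_id), method="DELETE")
    with urllib.request.urlopen(request) as response:
        return response.status
```

Run `delete_item("60f7e58767aec51e657ae6848ed15c1c36fe8185")` from inside the container, where the `elasticsearch` hostname resolves.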

Documentation

The documentation of the Poliscoops API can be found at docs.poliscoops.eu.

We use Sphinx to create the documentation. The source files are included in this repo under the docs directory.

Contributing

Please read through our contributing guidelines. Included are directions for opening issues, coding standards, and notes on development.

Authors and contributors

The Poliscoops API is based on the Open Raadsinformatie API. Authors and contributors of both projects are:

Authors:

Contributors:

Copyright and license

The Poliscoops API is distributed under the GNU Lesser General Public License v3. The documentation is released under the Creative Commons Attribution 4.0 International license.

poliscoops's People

Contributors

breyten, justinvw, siccovansas, bartdegoede, jurrian, bennokr, mickdelange, joostrothweiler, nl5887, raz0rwire, coret, frankstrater, ajslaghu

Stargazers

Jiří Podhorecký, RogerVerhoeven, Matty Smith, ʇɐ,ƃuɐldıʞ uıqoᴚ, Nikolaus Schlemm, nodearcnode

Watchers

Stef van Grieken, James Cloos

poliscoops's Issues

Index clean text instead of HTML

This will improve search results. For example, if you search for open data (without quotes) on the website, you get items that contain these terms only in their HTML source, not in their visible text.
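A minimal sketch of the cleaning step, using only Python's standard `html.parser`. It assumes the HTML body of an item is available as a string before indexing; which field it is read from and written to in Poliscoops is not specified here. `html_to_text` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent elements does not run together,
    # then collapse every run of whitespace into a single space.
    return " ".join(" ".join(parser.parts).split())
```

Attribute values such as `href="/open-data"` never reach `handle_data`, so a query for open data no longer matches on markup alone.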

Crawl more evenly

We currently run updates in one massive batch. We should spread them out to avoid hitting the Facebook API rate limit and to avoid being detected by other sites.
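One way to spread the load is to give every source a fixed start offset inside the update window instead of starting them all at once. This sketch assumes the 6-hour update interval mentioned elsewhere on this page and an arbitrary split into 72 slots of 5 minutes; `start_offset` and the slot count are hypothetical.

```python
import hashlib

# Assumption: updates run every 6 hours, split into equally sized slots.
WINDOW_SECONDS = 6 * 60 * 60


def start_offset(source_id, slots=72, window_seconds=WINDOW_SECONDS):
    """Map a source to a stable start offset (in seconds) within the window."""
    # hashlib gives the same value in every process, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    digest = hashlib.sha1(source_id.encode("utf-8")).hexdigest()
    slot = int(digest, 16) % slots
    return slot * (window_seconds // slots)
```

If the crawl jobs are Celery tasks (the worker log format below suggests so, but that is an assumption), the offset could be passed as the `countdown` argument of `apply_async`. Hashing keeps each source in the same slot across runs, so the gap between two crawls of one site stays close to the full interval.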

'click' on original article links in RSS to retrieve all text

Some RSS feeds show only a summary (or no summary at all, only the article text). In these cases we should open the link to the original article and scrape all the text there. First check whether this is necessary: if, for example, all PvdA subsites have RSS feeds that include the full text, we do not need this feature for PvdA (which saves scraping time).

Some examples:
This VVD subsite shows only a summary in its RSS feed:
https://poliflw.nl/zoeken?location=Wassenaar&parties=VVD
https://wassenaar.vvd.nl/feeds/nieuws.rss
https://wassenaar.vvd.nl/nieuws/25325/wassenaarders-op-bezoek-in-de-tweede-kamer

This GroenLinks subsite shows only the article text, not the summary, in its RSS feed:
https://poliflw.nl/zoeken?location=Gooise+Meren&parties=GroenLinks
https://gooisemeren.groenlinks.nl/rss.xml
https://gooisemeren.groenlinks.nl/nieuws/groen-lintje-voor-repair-caf%C3%A9-bussum
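Deciding per entry whether to follow the link could start from a simple heuristic. This is a sketch under loud assumptions: the 500-character threshold and the list of truncation markers are guesses to tune per source, the description is assumed to be plain text with HTML already stripped, and `needs_full_fetch` is a hypothetical name.

```python
import re

# Hypothetical threshold: shorter descriptions are treated as summaries.
MIN_FULL_TEXT_CHARS = 500
# Endings that suggest the feed cut the text off ("lees meer" = "read more").
_TRUNCATION_MARKERS = ("...", "\u2026", "[...]", "lees meer", "read more")


def needs_full_fetch(description):
    """Guess whether a feed entry's description is only a summary."""
    text = re.sub(r"\s+", " ", description or "").strip()
    if not text:
        return True  # no description at all: the article page must be visited
    if text.lower().endswith(_TRUNCATION_MARKERS):
        return True  # the feed explicitly truncated the text
    return len(text) < MIN_FULL_TEXT_CHARS
```

Running this over a sample of entries per party would also show whether, say, all PvdA feeds already carry the full text, in which case the extra scraping can be switched off for them.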

NER entity extractor broken

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/src/app/app/__init__.py", line 40, in articles_entities
    return post_article_ner(request.data)
  File "/usr/src/app/app/endpoints.py", line 23, in post_article_ner
    simple_doc = translate_doc(doc)
  File "/usr/src/app/app/modules/common/utils.py", line 35, in translate_doc
    'id': get_document_identifier(document),
  File "/usr/src/app/app/modules/common/utils.py", line 22, in get_document_identifier
    url = document['meta']['pfl_url']
KeyError: 'pfl_url'

This happens because, in this case, the enricher is called before the object is saved to Elasticsearch, so pfl_url is not yet available. (Should pfl_url even be necessary at this point?)
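According to the traceback, `get_document_identifier` reads `document['meta']['pfl_url']` directly. A defensive variant could fall back to an identifier the caller already has and otherwise fail with a clear message instead of a bare `KeyError`. This is only a sketch: what the real function in `app/modules/common/utils.py` does with the URL afterwards is not shown, and the `fallback` parameter and `MissingIdentifierError` class are hypothetical.

```python
class MissingIdentifierError(ValueError):
    """Raised when a document has neither meta.pfl_url nor a fallback id."""


def get_document_identifier(document, fallback=None):
    """Return the document's pfl_url, or a caller-supplied fallback.

    Hypothetical sketch: the `fallback` argument would be an identifier that
    is known before the item has been saved to Elasticsearch.
    """
    url = document.get("meta", {}).get("pfl_url")
    if url:
        return url
    if fallback is not None:
        return fallback
    raise MissingIdentifierError(
        "document has no meta.pfl_url (not saved to Elasticsearch yet?) "
        "and no fallback identifier was given"
    )
```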

45-minute pause in daily updates

In the updates that run every 6 hours, there is always a 45-minute pause between the GroenLinks and VVD pages. We have no idea why:

[2017-12-07 01:33:35,216: WARNING/Worker-5] Got https://westland.groenlinks.nl/actie-block/denk-mee with status code : 200
[2017-12-07 01:33:35,446: WARNING/Worker-5] Got https://westland.groenlinks.nl/nieuws/groenlinks-westland-wil-rust-aan-de-kust with status code : 200
[2017-12-07 01:33:35,507: WARNING/Worker-2] Got https://zoetermeer.groenlinks.nl/nieuws/van-de-voorzitter-1 with status code : 200
[2017-12-07 01:33:35,509: WARNING/Worker-3] Got https://zoetermeer.groenlinks.nl/nieuws/nieuws-van-de-campagne with status code : 200
[2017-12-07 01:33:35,552: WARNING/Worker-6] Got https://zoetermeer.groenlinks.nl/agenda/kerstborrel-0 with status code : 200
[2017-12-07 01:33:35,635: WARNING/Worker-1] Got https://zoetermeer.groenlinks.nl/nieuws/gezocht-bestuursleden-vm with status code : 200
[2017-12-07 01:33:35,659: WARNING/Worker-7] Got https://zoetermeer.groenlinks.nl/nieuws/groenlinks-gaat-voor-een-groene-sociale-en-open-stad with status code : 200
[2017-12-07 01:33:35,700: WARNING/Worker-8] Got https://zoetermeer.groenlinks.nl/nieuws/jordy-boerboom-lijsttrekker-groenlinks-zoetermeer with status code : 200
[2017-12-07 01:33:35,724: WARNING/Worker-5] Got https://zoetermeer.groenlinks.nl/nieuws/begrotingsdebat-tweede-termijn-een-groene-en-sociale-stad-een-klein-stukje-dichterbij with status code : 200
[2017-12-07 01:33:35,747: WARNING/Worker-2] Got https://zoetermeer.groenlinks.nl/nieuws/gemeenteraadsverkiezingen-2018-wil-jij-op-onze-kandidatenlijst with status code : 200
[2017-12-07 01:33:35,752: WARNING/Worker-4] Got https://zoetermeer.groenlinks.nl/nieuws/fractievoorzitter-marcel-van-der-tol-stelt-zich-niet-kandidaat-voor-een-tweede-termijn-als with status code : 200
[2017-12-07 01:33:35,901: WARNING/Worker-5] Got https://zoetermeer.groenlinks.nl/nieuws/sponsor-een-stationsposter with status code : 200
[2017-12-07 01:33:41,791: WARNING/Worker-7] Got https://zwijndrecht.groenlinks.nl/nieuws/het-recht-om-uit-te-dagen with status code : 200
[2017-12-07 01:33:41,806: WARNING/Worker-1] Got https://zwijndrecht.groenlinks.nl/nieuws/algemene-beschouwingen-kadernota-2017 with status code : 200
[2017-12-07 01:33:41,809: WARNING/Worker-4] Got https://zwijndrecht.groenlinks.nl/nieuws/politici-laten-luisteren with status code : 200
[2017-12-07 01:33:41,822: WARNING/Worker-5] Got https://zwijndrecht.groenlinks.nl/nieuws/kandidaten-voor-de-gemeenteraden-gezocht with status code : 200
[2017-12-07 01:33:41,847: WARNING/Worker-2] Got https://zwijndrecht.groenlinks.nl/nieuws/aan-de-slag-voor-groenlinks with status code : 200
[2017-12-07 01:33:41,875: WARNING/Worker-8] Got https://zwijndrecht.groenlinks.nl/nieuws/vrijwilligerscompliment-van-groenlinks with status code : 200
[2017-12-07 01:33:41,930: WARNING/Worker-6] Got https://zwijndrecht.groenlinks.nl/nieuws/groenlinks-betreurt-vertrek-wethouder-mirck with status code : 200
[2017-12-07 01:33:41,958: WARNING/Worker-3] Got https://zwijndrecht.groenlinks.nl/nieuws/mens-en-natuur-visie-kadernota with status code : 200
[2017-12-07 01:33:41,992: WARNING/Worker-4] Got https://zwijndrecht.groenlinks.nl/nieuws/uw-zorg-onze-zorg with status code : 200
[2017-12-07 01:33:42,020: WARNING/Worker-2] Got https://zwijndrecht.groenlinks.nl/nieuws/veiligheid-de-lift with status code : 200
[2017-12-07 02:15:16,593: WARNING/Worker-8] Got https://www.vvdamsterdam.nl/nieuws/25281/masterclass-vvd-amsterdam-gaat-weer-van-start with status code : 200
[2017-12-07 02:15:16,645: WARNING/Worker-3] Got https://www.vvdamsterdam.nl/nieuws/24543/de-vvd-zoekt-de-handhaver-van-het-jaar with status code : 200
[2017-12-07 02:15:16,775: WARNING/Worker-3] Got https://www.vvdamsterdam.nl/nieuws/25255/definitieve-kandidatenlijst-amsterdamse-vvd with status code : 200
[2017-12-07 02:15:17,736: WARNING/Worker-6] Got https://www.vvdamsterdam.nl/nieuws/25694/nieuwjaarsreceptie-amsterdamse-vvd-9-januari with status code : 200
[2017-12-07 02:15:17,878: WARNING/Worker-6] Got https://www.vvdamsterdam.nl/nieuws/25223/definitief-verkiezingsprogramma-vastgesteld with status code : 200
[2017-12-07 02:15:18,208: WARNING/Worker-5] Got https://www.vvdamsterdam.nl/nieuws/25721/libertijn-11-12-handhaver-van-het-jaar with status code : 200
[2017-12-07 02:15:18,393: WARNING/Worker-8] Got https://www.vvdamsterdam.nl/nieuws/25114/vvd-ontruim-kraakpand-we-are-here-groep with status code : 200
[2017-12-07 02:15:18,403: WARNING/Worker-5] Got https://www.vvdamsterdam.nl/nieuws/25226/maakindustrie-en-circulaire-bedrijven-krijgen-toegang-tot-het-havengebied with status code : 200
[2017-12-07 02:15:20,497: WARNING/Worker-1] Got https://www.vvdamsterdam.nl/nieuws/25538/tjakko-dijk-wil-hoorzitting-met-deskundigen-over-haven-stad with status code : 200
[2017-12-07 02:15:20,576: WARNING/Worker-7] Got https://www.vvdamsterdam.nl/nieuws/25375/marianne-poot-eenduidige-definitie-antisemitisme with status code : 200
[2017-12-07 02:15:20,583: WARNING/Worker-4] Got https://www.vvdamsterdam.nl/nieuws/25263/thema-avond-duurzaam-ondernemen-met-vvd-landsmeer with status code : 200
[2017-12-07 02:15:20,583: WARNING/Worker-2] Got https://www.vvdamsterdam.nl/nieuws/25262/rik-torn-over-graffiti-geen-welstandscommissie-voor-vandalisme with status code : 200
[2017-12-07 02:15:20,688: WARNING/Worker-3] Got https://www.vvdamsterdam.nl/nieuws/24989/ondernemen-in-de-stad-jeroen-de-zeeuw with status code : 200
[2017-12-07 02:15:20,756: WARNING/Worker-6] Got https://www.vvdamsterdam.nl/nieuws/24888/rik-torn-breinsessies-afvaloverlast-bizar-plan with status code : 200
[2017-12-07 02:15:20,810: WARNING/Worker-7] Got https://www.vvdamsterdam.nl/nieuws/25115/meer-aandacht-voor-programmeren-in-het-onderwijs with status code : 200
[2017-12-07 02:15:20,934: WARNING/Worker-5] Got https://www.vvdamsterdam.nl/nieuws/24990/vvd-verbetert-doorstroom-verkeer-op-schellingwouderbrug with status code : 200
[2017-12-07 02:15:20,935: WARNING/Worker-2] Got https://www.vvdamsterdam.nl/nieuws/25074/bierfietsen-straks-in-de-hele-stad-verboden with status code : 200
[2017-12-07 02:15:20,937: WARNING/Worker-1] Got https://www.vvdamsterdam.nl/nieuws/25040/ondernemen-in-de-stad-kerstbomen-amsterdam with status code : 200
[2017-12-07 02:15:21,082: WARNING/Worker-1] Got https://www.vvdamsterdam.nl/nieuws/24887/libertijn-13-11-veiligheid-boven-privacy with status code : 200
[2017-12-07 02:15:21,791: WARNING/Worker-4] Got https://www.vvdamsterdam.nl/nieuws/25077/vvd-blij-met-komst-willibrordusgarage-in-de-pijp with status code : 200
[2017-12-07 02:15:33,796: WARNING/Worker-6] Got http://www.vvdachtkarspelen.nl/?p=508 with status code : 200
[2017-12-07 02:15:33,804: WARNING/Worker-4] Got http://www.vvdachtkarspelen.nl/?p=528 with status code : 200
[2017-12-07 02:15:33,805: WARNING/Worker-8] Got http://www.vvdachtkarspelen.nl/?p=562 with status code : 200
[2017-12-07 02:15:33,815: WARNING/Worker-1] Got http://www.vvdachtkarspelen.nl/?p=505 with status code : 200
[2017-12-07 02:15:33,816: WARNING/Worker-7] Got http://www.vvdachtkarspelen.nl/?p=581 with status code : 200
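To check whether the pause really occurs in every run, the worker logs can be scanned for gaps automatically. This sketch assumes every relevant line starts with a timestamp in the format shown above. `find_gaps` and the 5-minute threshold are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp prefix of the worker log lines above, e.g.
# "[2017-12-07 01:33:35,216: WARNING/Worker-5] Got ..."
_TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):")


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (line_before, line_after, gap) for every pause longer than threshold."""
    previous = None  # (timestamp, line) of the last line that had a timestamp
    for line in lines:
        match = _TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if previous is not None and stamp - previous[0] > threshold:
            yield previous[1], line, stamp - previous[0]
        previous = (stamp, line)
```

Iterating over an open log file, e.g. `for before, after, gap in find_gaps(open(path)): print(gap, after)`, lists each pause together with the first URL fetched after it, which shows whether the pause always sits before the same source.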

What to do with documents without a description

Some documents don't have a description, e.g.: https://api.poliflw.nl/v0/combined_index/1a466b696c8f6861498faff10897cb9c7c011b7f

Why is this? If you go to the source of that document you see that there actually is text available. Did the parsing fail? Was there no text when we scraped it?

What should we do when we end up with no text for a document? Keep it for completeness' sake (users can check the source themselves), or not save the document at all?

Location search on home page not working

When I enter Utrecht in the bottom search bar, the URL changes to https://poliflw.nl/?location=Utrecht.
It should link to https://poliflw.nl/zoeken?location=Utrecht.
In other words, the URL should point to the /zoeken page.
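The target URL can be built with proper query encoding, so multi-word locations come out in the same form as the search links elsewhere on this page (e.g. `?location=Gooise+Meren`). The front end may not be written in Python; this sketch only pins down the expected URL shape, and `build_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Search page taken from the issue text.
SEARCH_URL = "https://poliflw.nl/zoeken"


def build_search_url(location):
    """Return the search-results URL for a location typed on the home page."""
    # urlencode quotes with quote_plus, so spaces become "+".
    return f"{SEARCH_URL}?{urlencode({'location': location})}"
```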

Front-end design issues

  • Discover more Scoops button bigger than expected

Screenshot 2020-04-08 at 09 35 15

Screenshot 2020-04-08 at 09 36 30

  • Missing easy toggle buttons between languages / countries

Screenshot 2020-04-08 at 09 37 47

  • Boxes on the front page not aligned

Screenshot 2020-04-08 at 09 39 16

  • Strange line through the "Contribute Now" box

Screenshot 2020-04-08 at 09 40 33

  • "Collect Now" box too big at the bottom

Screenshot 2020-04-08 at 09 41 12

In design:
Screenshot 2020-04-08 at 09 41 44

  • Alignment issues in the Collect & Subscribe box

Screenshot 2020-04-08 at 09 42 37

  • Spelling error in European Union in country overview

Locations should be stored in normalized form

Location specifications vary from (political) source to source. These specifications should be normalized. Things that need to be done:

  • Create a normalization file (thanks, arjan)
  • Make the script that generates sources use the normalization file
  • Make a script that converts locations in existing source configurations
  • Make an item transformer that normalizes the location of existing content
  • Run this item transformer
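The normalization file and the item transformer from the list above could look roughly like this. Everything here is an assumption for illustration: the JSON format (variant spelling mapped to canonical name), the sample entries, the `location` field name, and all function names are hypothetical.

```python
import json

# Hypothetical normalization file contents: variant spelling -> canonical name.
EXAMPLE_NORMALIZATION_JSON = """
{
  "'s-Gravenhage": "Den Haag",
  "Den haag": "Den Haag",
  "Gooise meren": "Gooise Meren"
}
"""


def _key(location):
    """Lookup key that ignores case and repeated whitespace."""
    return " ".join(location.split()).casefold()


def load_normalization(text):
    """Build a case- and whitespace-insensitive lookup from the JSON text."""
    mapping = json.loads(text)
    return {_key(variant): canonical for variant, canonical in mapping.items()}


def normalize_location(location, lookup):
    """Return the canonical name, or the cleaned input if it is not listed."""
    cleaned = " ".join(location.split())
    return lookup.get(_key(cleaned), cleaned)


def normalize_item(item, lookup, field="location"):
    """Hypothetical item transformer: rewrite one item's location in place."""
    if item.get(field):
        item[field] = normalize_location(item[field], lookup)
    return item
```

The same `normalize_location` call could serve both the source-generating script and the one-off conversion of existing source configurations, so all three steps share one lookup.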

Limit memory usage of Redis?

Redis apparently uses too much memory at times, so we should cap its memory usage.
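Redis supports a hard cap through the `maxmemory` directive, combined with a `maxmemory-policy` that decides what happens at the limit. This sketch builds a `redis-server` command line with both settings, since `redis-server` accepts any `redis.conf` directive as a `--name value` argument. The 256mb value is a placeholder, and how Redis is started in this deployment (e.g. a Docker Compose `command:`) is an assumption. `noeviction` makes writes fail once the cap is reached instead of silently dropping keys, which is the safer choice if Redis serves as a task queue broker; `allkeys-lru` suits a pure cache.

```python
# Eviction policies documented for Redis 4.0 and later.
EVICTION_POLICIES = {
    "noeviction", "allkeys-lru", "allkeys-lfu", "allkeys-random",
    "volatile-lru", "volatile-lfu", "volatile-random", "volatile-ttl",
}


def redis_server_args(maxmemory="256mb", policy="noeviction"):
    """Return an argv list that starts redis-server with a memory cap.

    The 256mb default is a placeholder, not a measured value.
    """
    if policy not in EVICTION_POLICIES:
        raise ValueError(f"unknown maxmemory-policy: {policy}")
    return ["redis-server", "--maxmemory", maxmemory, "--maxmemory-policy", policy]
```

`" ".join(redis_server_args())` yields `redis-server --maxmemory 256mb --maxmemory-policy noeviction`, which can be used directly as a container command.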
