offenesparlament's People

Contributors

benfreu, fin, horrendus, laszlokiraly, lyrixderaven, mfit, shellmayr, sk7, stefanm7, themistress


offenesparlament's Issues

ES search returns 0 results for categories containing ":" or "/"

Steps to reproduce:
Search for current LLP, Suchtyp: Gesetze, Kategorie: "Regierungsvorlage: Bundes(verfassungs)gesetz"

The server seems to understand the query correctly, as it logs this to the console:

INFO:offenesparlament.views.search:Searching <class 'op_scraper.models.Law'> with arguments [{'limit': 50, 'facet_filters': {'category': u'Regierungsvorlage: Bundes(verfassungs)gesetz', 'llps': u'XXV'}, 'offset': 0}]

The same happens for "Vorl. ü. Initiative/Beschluss des Europ. Rates und des Rates" and "Regierungsvorlage: Staatsvertrag".
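One possible workaround, sketched as a minimal helper (the name escape_query_value is hypothetical, and where it belongs depends on where the raw facet value reaches the ES query string):

import re

# Characters the Lucene/ES query-string parser treats as operators,
# including the ':' and '/' from the failing categories; '&&' and '||'
# are two-character operators.
LUCENE_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')

def escape_query_value(value):
    """Backslash-escape special characters so facet values like
    'Regierungsvorlage: Bundes(verfassungs)gesetz' match literally."""
    return LUCENE_SPECIAL.sub(r'\\\1', value)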

Error when creating an email alert based on search result

When trying to set up an email alert, I get the message:
"Benachrichtigungen abonnieren
Ihr Abo für Periode XXV: amtsgeheimnis konnte nicht eingerichtet werden. Bitte versuchen Sie es erneut."
(i.e. "Subscribe to notifications: Your subscription for period XXV: amtsgeheimnis could not be set up. Please try again.")

I searched for "amtsgeheimnis" on the front page, then tried to set up an alert for that search.

Committee Scraper: Active vs. Non-Active Committees produce integrity errors for NR

The 'unique_together' meta info for the committees doesn't work as intended. For BR committees it might make sense, but since the meta attribute applies to NR committees as well, we get duplicates that are identical except for their status. This needs to be fixed: subsequent runs of the person scraper produce errors when trying to update-or-create committees, since the lookup then returns two results for the respective committee.
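A sketch of the direction a fix could take; the model and field names below are assumptions, not the project's actual code:

from django.db import models

class Comittee(models.Model):
    parl_id = models.CharField(max_length=30)
    nrbr = models.CharField(max_length=20)  # 'Nationalrat' or 'Bundesrat'
    legislative_period = models.ForeignKey('LegislativePeriod', null=True, blank=True)
    active = models.BooleanField(default=True)

    class Meta:
        # 'active' deliberately excluded: it is mutable state, not identity,
        # so a status change updates the existing row instead of creating a
        # near-duplicate that breaks later lookups.
        unique_together = ("parl_id", "nrbr", "legislative_period")

The person scraper's update_or_create() would then be keyed on the three identity fields only, with the status passed via defaults={'active': ...}.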

Distribute issues

bootstrap.sh fails, apparently in one of the "pip install" stages, with:

No module named pkg_resources

(This usually means setuptools is missing or broken in the Python environment pip installs into.)

[CodeBounty] Free-Form

Free-Form:

Data visualization and data consolidation (Datendarstellung bzw. -zusammenführung)

Frontend: € 400

A free-form code bounty!
Do you have exciting ideas for data visualizations that could be integrated into OffenesParlament.at?
Can different areas of parliamentary activity that aren't yet linked be brought together?

Here's your chance to try!

Acceptance of this code bounty is highly subjective; we are happy to give feedback on design ideas!

Subscriptions: Single/Detail Page subscription URL isn't working properly

Subscribing to a detail page (like the Person detail page for Rosa Ecker) doesn't produce a proper single-request link based on the parl_id (or parl_id and llp for laws); instead it uses the generic 'personen' search, which results in this link:

http://offenesparlament.vm:8000/personen/search?llps=XXV&type=Personen&limit=-1&fieldset=all

That, of course, isn't correct and causes problems when trying to subscribe and/or collect the changes.
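A hypothetical helper showing the direction a fix could take (parameter names mirror the existing search URLs; filtering by parl_id is an assumption):

import urllib

def detail_subscription_url(base, parl_id, llp=None):
    """Build a single-object subscription URL from the object's identifiers
    instead of the generic list search."""
    params = {'parl_id': parl_id, 'fieldset': 'all'}
    if llp is not None:  # laws additionally need the legislative period
        params['llps'] = llp
    return base + '?' + urllib.urlencode(params)

# e.g. detail_subscription_url('/personen/search', 'PAD_51564')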

Use django-configurations to create & maintain different test/deploy configs

Search View doesn't change the URL when on Detail page

When I'm on a detail page (for instance, this one) and change/edit the search bar in any way, I get the search results again (which, as far as I can tell, is intended behaviour). But then these things happen:

  • the URL in the address bar doesn't change
    • a refresh of the page reloads the detail page, not the search page
    • subscribing the content will presumably subscribe the detail page, not the search page (please confirm this, I'm just guessing here)
  • the title of the page is still that of the detail page

This shouldn't happen, I think.

Haystack error when indexing freshly scraped models

ERROR:root:Error updating op_scraper using default
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 188, in handle_label
    self.update_backend(label, using)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 233, in update_backend
    do_update(backend, index, qs, start, end, total, verbosity=self.verbosity, commit=self.commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 96, in do_update
    backend.update(index, current_qs, commit=commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/backends/elasticsearch_backend.py", line 166, in update
    prepped_data = index.full_prepare(obj)
  File "/usr/local/lib/python2.7/dist-packages/haystack/indexes.py", line 212, in full_prepare
    self.prepared_data = self.prepare(obj)
  File "/usr/local/lib/python2.7/dist-packages/haystack/indexes.py", line 203, in prepare
    self.prepared_data[field.index_fieldname] = field.prepare(obj)
  File "/usr/local/lib/python2.7/dist-packages/haystack/fields.py", line 103, in prepare
    raise SearchFieldError("The model '%s' combined with model_attr '%s' returned None, but doesn't allow a default or null value." % (repr(obj), self.model_attr))
SearchFieldError: The model '<Person: Hafenecker Christian, MA>' combined with model_attr 'ts' returned None, but doesn't allow a default or null value.
Traceback (most recent call last):
  File "manage.py", line 21, in <module>
    run()
  File "manage.py", line 14, in run
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 330, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 390, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 441, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/rebuild_index.py", line 26, in handle
    call_command('update_index', **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 120, in call_command
    return command.execute(*args, **defaults)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 441, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 183, in handle
    return super(Command, self).handle(*items, **options)
  File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 619, in handle
    label_output = self.handle_label(label, **options)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 188, in handle_label
    self.update_backend(label, using)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 233, in update_backend
    do_update(backend, index, qs, start, end, total, verbosity=self.verbosity, commit=self.commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/management/commands/update_index.py", line 96, in do_update
    backend.update(index, current_qs, commit=commit)
  File "/usr/local/lib/python2.7/dist-packages/haystack/backends/elasticsearch_backend.py", line 166, in update
    prepped_data = index.full_prepare(obj)
  File "/usr/local/lib/python2.7/dist-packages/haystack/indexes.py", line 212, in full_prepare
    self.prepared_data = self.prepare(obj)
  File "/usr/local/lib/python2.7/dist-packages/haystack/indexes.py", line 203, in prepare
    self.prepared_data[field.index_fieldname] = field.prepare(obj)
  File "/usr/local/lib/python2.7/dist-packages/haystack/fields.py", line 103, in prepare
    raise SearchFieldError("The model '%s' combined with model_attr '%s' returned None, but doesn't allow a default or null value." % (repr(obj), self.model_attr))
haystack.exceptions.SearchFieldError: The model '<Person: Hafenecker Christian, MA>' combined with model_attr 'ts' returned None, but doesn't allow a default or null value.
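Two ways out, sketched against haystack's documented field options (the index class below is an illustration, not the project's actual search_indexes.py):

from haystack import indexes
from op_scraper.models import Person

class PersonIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    # Option 1: tolerate a missing timestamp instead of aborting the
    # whole rebuild_index run.
    ts = indexes.DateTimeField(model_attr='ts', null=True)

    def get_model(self):
        return Person

    def index_queryset(self, using=None):
        # Option 2: keep half-scraped objects out of the index entirely.
        return self.get_model().objects.exclude(ts__isnull=True)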

fix celery config

default install gives me

ProgrammingError: relation "celery_taskmeta" does not exist
LINE 1: ...taskmeta"."hidden", "celery_taskmeta"."meta" FROM "celery_ta...

when running scrapers from the admin console.
django-celery still seems to be required for using the Django ORM as the celery results backend.
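A sketch of the settings change, assuming django-celery is kept as the results backend (the celery_taskmeta table only exists after djcelery's migrations have run):

INSTALLED_APPS += ('djcelery',)
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
# afterwards: python manage.py migrate djcelery   (creates celery_taskmeta)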

Upgrade Django Version

I just ran into the following bug: https://code.djangoproject.com/ticket/24513
Upgrading to 1.8.7 fixed it.

I suggest changing the Django version in requirements.txt to: Django>=1.8.0,<1.9

Law Details Page: "None" in last_updated, no Categories set

It looks like I have reached an interesting state after a first scrape session:

[two screenshots of the Law Details page]

No entry has a value for "Aktualisierung", but we shouldn't put "None" in there, IMHO.
Also, some entries don't have an "Art" set, even though they are of the same parliamentary type (ME) as the others.

AJAX Requests in Facet Search

When I open the person search page and then select the search bar, one request is made to fetch the available facets, as expected. But as soon as I then select the party facet to add a party filter, an AJAX request like this is made:

http://offenesparlament.vm:8000/personen/search?llps=XXV&type=Personen&party=

This is not only unnecessary but also quite costly, since it returns a large amount of data that isn't necessarily relevant to the list (with the new addition of debates, this can be a request of up to 30 MB!). If this behaviour also exists for laws, the amount of data that needs transferring might be even bigger.

In my opinion, we shouldn't make a new AJAX request to the search view before the user has selected a value for the facet in question (i.e. not before they actually select the party they want to filter by).

syncdb fails on empty database

CommandError: Conflicting migrations detected (0004_auto_20150814_1941, 0005_keyword__title_urlsafe in op_scraper).
To fix them run 'python manage.py makemigrations --merge'

Can you reproduce this?

Subscriptions: Add POST parameter to subscription requests for page category

Given that we have to split the changes we collect for different subscriptions into one of the four categories person, law, debate or search, I added a new field named category to the SubscribedContent model, which I need to set when creating a new subscription. For this I need to know what is being subscribed, so I need a POST parameter named 'category' containing one of these four values:

  • person
  • law
  • debate
  • search

The first three obviously refer to a single-result search; the last one can contain search results of any type (I have to figure this out in the subscription-changes code myself anyway).
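A minimal sketch of the new field (the choices layout is an assumption; existing fields are elided):

from django.db import models

CATEGORY_CHOICES = (
    ('person', 'person'),
    ('law', 'law'),
    ('debate', 'debate'),
    ('search', 'search'),
)

class SubscribedContent(models.Model):
    # ... existing fields ...
    category = models.CharField(max_length=10, choices=CATEGORY_CHOICES,
                                default='search')

The subscription view would then read it via request.POST.get('category', 'search').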

[CodeBounty] Site-Wide Text Filters

Site-wide text filters (Seitenweite Textfilter)

Infrastructure: € 500

When parsing texts, we want to link certain entities automatically.
The infrastructure that makes this possible is part of this bounty.

Linking parliamentarians

Scraper: € 200

When parliamentarians are mentioned by name but not linked, they should be linked.

Linking parliament URLs

Scraper: € 200

Rewrite parlament.gv.at URLs to OffenesParlament.at URLs.

Linking laws (ris.bka.gv.at integration)

Scraper: € 500
Frontend: € 200

When a law is mentioned, it should be recognized as such.

Petitions: Handle Exceptions for Petitions Scraper properly

Currently, the petitions scraper still throws the occasional exception, for instance:

ERROR:scrapy.core.scraper:Spider error processing <GET     http://www.parlament.gv.at/PAKT/VHG/XXV/BI/BI_00058/index.shtml> (referer: None)
Traceback (most recent call last):
 File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in _runCallbacks
   current.result = callback(current.result, *args, **kw)
 File "/vagrant/offenesparlament/op_scraper/scraper/parlament/spiders/petitions.py", line 174, in parse
   petition_creators = self.parse_creators(response)
 File "/vagrant/offenesparlament/op_scraper/scraper/parlament/spiders/petitions.py", line 442, in parse_creators
   creators = PETITION.CREATORS.xt(response)
 File "/vagrant/offenesparlament/op_scraper/scraper/parlament/resources/extractors/petition.py", line 54, in xt
   parl_id = creator_sel.xpath("//a/@href").extract()[0].split("/")[2]
IndexError: list index out of range

While it's OK for some things to fail during scraping, we need to catch all exceptions; otherwise Django Reversion stops the database commits, and nothing that was scraped ends up saved.
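A defensive sketch along those lines, using the method and extractor names from the traceback above (the logging call assumes a scrapy version that provides Spider.logger):

def parse_creators(self, response):
    try:
        return PETITION.CREATORS.xt(response)
    except IndexError:
        self.logger.warning(
            "Could not extract petition creators from %s, skipping",
            response.url)
        return []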

Error while reindexing with ES

When reindexing via python manage.py rebuild_index, the following error occurs after a few minutes:

WARNING:elasticsearch:POST http://localhost:9200/haystack/modelresult/_bulk [status:N/A request:1.792s]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 74, in perform_request
    response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 608, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python2.7/dist-packages/urllib3/util/retry.py", line 224, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 558, in urlopen
    body=body, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 353, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python2.7/httplib.py", line 966, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1000, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 962, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 822, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 798, in send
    self.sock.sendall(data)
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
ProtocolError: ('Connection aborted.', error(104, 'Connection reset by peer'))
INFO:urllib3.connectionpool:Starting new HTTP connection (2): localhost

This might be linked to the chunk size; the error happens around the indexing of debate_statements.
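Two knobs worth trying, both documented haystack connection options; a settings sketch:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine',
        'URL': 'http://localhost:9200/',
        'INDEX_NAME': 'haystack',
        'TIMEOUT': 60,      # default is 10 seconds
        'BATCH_SIZE': 100,  # default is 1000; smaller bulk posts for the large debate_statements
    },
}

The batch size can also be lowered per run with rebuild_index --batch-size=100.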

Send email on error/stacktrace from production

The logging setup for production should be adapted so that we (an email address of our choice, or a mailing list) receive an email with the error and the link whenever a stacktrace occurs. This would make debugging errors that occur in production much easier.
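A minimal sketch using Django's stock AdminEmailHandler (ADMINS and the EMAIL_* settings must be configured for anything to be sent; the address is a placeholder):

ADMINS = (('OffenesParlament Ops', 'ops@example.org'),)

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'class': 'django.utils.log.AdminEmailHandler',
        },
    },
    'loggers': {
        # emails every unhandled exception in a request, with the URL
        'django.request': {
            'handlers': ['mail_admins'],
            'level': 'ERROR',
            'propagate': True,
        },
    },
}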

[CodeBounty] Petitions & Citizens' Initiatives

Petitions & citizens' initiatives (Petitionen & Bürgerinitiativen)

Scraper: € 600
Frontend: € 200

Petitions are introduced by at least five parliamentarians.
Citizens' initiatives are introduced by at least 500 citizens (with handwritten signatures).

Once introduced, petitions & citizens' initiatives can be "signed" via the parliament website.

Scraper

Entry page: http://www.parlament.gv.at/PAKT/BB/
Example page (BI): http://www.parlament.gv.at/PAKT/VHG/XXV/BI/BI_00083/index.shtml
Example page (Pet): http://www.parlament.gv.at/PAKT/VHG/XXV/PET/PET_00032/index.shtml

To scrape:

  • Initiators (Einbringer)
  • Parliamentary procedure (Parlamentarisches Verfahren)
  • Declarations of support (Zustimmungserklärungen)

Frontend

Number of endorsements, which parties were for/against once concluded:
http://www.parlament.gv.at/PAKT/VHG/XXV/PET/PET_00032/index.shtml#tab-ParlamentarischesVerfahren

[CodeBounty] Protocols: Speeches & Interjections

Protocols: speeches & interjections (Protokolle: Reden, Zwischenrufe)

Scraper: € 2,200
Frontend: € 1,000

Scraper

Entry page: http://www.parlament.gv.at/PAKT/STPROT/
Detail page (example): http://www.parlament.gv.at/PAKT/VHG/XXV/NRSITZ/NRSITZ_00072/fnameorig_447647.html

To do:

  • Scrape the table of contents ('Verhandlungen') of the protocols for speakers (members of parliament as well as members of the government!)
  • Scrape the individual speeches
  • Extract links to persons; replace links to the person detail pages on parlament.gv.at with links to the person detail pages on OffenesParlament.at
  • Cross-referencing: adapt the 'Statement' model so it can hold the full text; link to already existing Statement models, or create new Statement models and link them to the Person models

Attention: because of the data volume, special attention must be paid to performance; if scraping takes too long, a solution for asynchronously triggering the scraping of individual protocols must be developed (so as not to overload the server).

The OffenesParlament team will take care of feeding the scraped statements into ElasticSearch!

Frontend

Tasks:

  • Develop a way to present a detail page for each speech
  • Embed (full-text) speeches/statements into the person detail pages

[CodeBounty] Visualizing the Legislative Process

Visualizing the legislative process

Frontend: € 500

The legislative process is not without pitfalls. We want a sensible way of showing how far along a law is in this process and what still lies ahead of it.

Challenges:

  • How should it work on mobile?
  • Possibly several variants, depending on the available space

Acceptance of this code bounty is highly subjective; we are happy to give feedback on design ideas!

Model/"Enum" for Nationalrat & Bundesrat

In many cases (mandates, committees, session protocols) it would be good to have a central model for the two chambers (Nationalrat, Bundesrat), so that other models can be linked to it.
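A minimal sketch of what such a model could look like (names are assumptions):

from django.db import models

class Chamber(models.Model):
    NATIONALRAT = 'NR'
    BUNDESRAT = 'BR'
    CHAMBER_CHOICES = (
        (NATIONALRAT, 'Nationalrat'),
        (BUNDESRAT, 'Bundesrat'),
    )

    code = models.CharField(max_length=2, choices=CHAMBER_CHOICES, unique=True)

    def __unicode__(self):  # Python 2 / Django 1.8
        return self.get_code_display()

Mandates, committees and session protocols could then reference it via a ForeignKey instead of free-form strings.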

Standardize Functions Model

The Function model is, in my opinion, currently a bit awkward (and therefore doesn't lend itself well to reuse in other parts of the code, e.g. committees).

A few ideas (a normalization sketch follows the list):

  • Map the female & male job titles onto one common form (e.g. Bundesminister -> BundesministerIn, Bundesministerin -> BundesministerIn)
  • Possibly use only the short title from now on (to find all Bundesminister & Bundesministerinnen, Staatssekretäre & Staatssekretärinnen, regardless of ministry) and move the ministry (Ressort) into Mandate or a model of its own?
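A sketch of the proposed normalization (the mapping entries are examples, not a complete list):

# -*- coding: utf-8 -*-
TITLE_MAP = {
    u'Bundesminister': u'BundesministerIn',
    u'Bundesministerin': u'BundesministerIn',
    u'Staatssekretär': u'StaatssekretärIn',
    u'Staatssekretärin': u'StaatssekretärIn',
}

def normalize_title(title):
    """Map gendered title variants onto one canonical short title."""
    return TITLE_MAP.get(title, title)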

/personen/search returns an array of arrays as llps

(actually an array of strings that contain a JSON array)

eg "['1986-12-17 - 1990-11-04 (XVII)', '1983-05-19 - 1986-12-16 (XVI)', '1979-06-05 - 1983-05-18 (XV)']" instead of simply "1986-12-17 - 1990-11-04 (XVII)"

settings cwd issue with celery/configurations

so,

this is a tricky one.

this - in settings.py - is problematic:

# Import scrapy settings
c = os.getcwd()
os.chdir(str(c) + '/op_scraper/scraper')
d = os.getcwd()
path.append(d)
os.chdir(c)
d = os.getcwd()
os.environ['SCRAPY_SETTINGS_MODULE'] = 'parlament.settings'

Especially problematic is the os.chdir, since this changes every import made after the execution of settings.py.

What's currently happening, I think, is that every "import celery" after this point imports offenesparlament/celery.py instead of the global celery package.

However, that's not the cause of the problem I'm currently debugging, just an annoying side effect. Can we fix this anyway?
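A chdir-free sketch: derive the scraper path from the settings file's own location instead of the process CWD (the relative path is an assumption based on the snippet above; adjust the dirname() nesting to where settings.py actually lives):

import os
import sys

PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(PROJECT_ROOT, 'op_scraper', 'scraper'))
os.environ['SCRAPY_SETTINGS_MODULE'] = 'parlament.settings'

For the celery shadowing itself, the usual Python 2 remedy from the Celery-with-Django docs is to put "from __future__ import absolute_import" at the top of offenesparlament/celery.py, so that "import celery" resolves to the installed package rather than the local module.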

Django configuration install error

I followed the instructions up to vagrant up. The system package installation looks good, but during the pip install steps I get this error:

==> offenesparlament: Obtaining django-configurations-head from git+https://github.com/jezdez/django-configurations.git@5ece107044#egg=django-configurations-head (from -r requirements.txt (line 30))
==> offenesparlament:   Directory /vagrant/src/django-configurations-head already exists, and is not a git clone.
==> offenesparlament:   The plan is to install the git repository https://github.com/jezdez/django-configurations.git
==> offenesparlament: Exception:
==> offenesparlament: Traceback (most recent call last):
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 211, in main
==> offenesparlament:     status = self.run(options, args)
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 294, in run
==> offenesparlament:     requirement_set.prepare_files(finder)
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 334, in prepare_files
==> offenesparlament:     functools.partial(self._prepare_file, finder))
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 321, in _walk_req_to_install
==> offenesparlament:     more_reqs = handler(req_to_install)
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/req/req_set.py", line 433, in _prepare_file
==> offenesparlament:     req_to_install.update_editable(not self.is_download)
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/req/req_install.py", line 573, in update_editable
==> offenesparlament:     vcs_backend.obtain(self.source_dir)
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/vcs/git.py", line 109, in obtain
==> offenesparlament:     if self.check_destination(dest, url, rev_options, rev_display):
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/vcs/__init__.py", line 241, in check_destination
==> offenesparlament:     prompt[1])
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/utils/__init__.py", line 135, in ask_path_exists
==> offenesparlament:     return ask(message, options)
==> offenesparlament:   File "/usr/local/lib/python2.7/dist-packages/pip/utils/__init__.py", line 146, in ask
==> offenesparlament:     response = input(message)
==> offenesparlament: EOFError: EOF when reading a line
==> offenesparlament: What to do?  (i)gnore, (w)ipe, (b)ackup Your response ('exit') was not one of the expected responses: i, w, b

which later leads to this:

==> offenesparlament: Successfully installed Django-1.8.6 Jinja2-2.8 MarkupSafe-0.23 Pygments-2.0.2 alabaster-0.7.6 babel-2.1.1 django-debug-toolbar-1.4 django-debug-toolbar-template-timings-0.6.4 docutils-0.12 pytz-2015.7 six-1.10.0 snowballstemmer-1.2.0 sphinx-1.3.1 sphinx-rtd-theme-0.1.9 sqlparse-0.1.18
==> offenesparlament: Traceback (most recent call last):
==> offenesparlament:   File "manage.py", line 21, in <module>
==> offenesparlament:     
==> offenesparlament: run()
==> offenesparlament:   File "manage.py", line 13, in run
==> offenesparlament:     
==> offenesparlament: from configurations.management import execute_from_command_line
==> offenesparlament: ImportError
==> offenesparlament: : No module named configurations.management
==> offenesparlament: Traceback (most recent call last):
==> offenesparlament:   File "manage.py", line 21, in <module>
==> offenesparlament:     
==> offenesparlament: run()
==> offenesparlament:   File "manage.py", line 13, in run
==> offenesparlament:     
==> offenesparlament: from configurations.management import execute_from_command_line
==> offenesparlament: ImportError
==> offenesparlament: : 
==> offenesparlament: No module named configurations.management
==> offenesparlament: Traceback (most recent call last):
==> offenesparlament:   File "./manage.py", line 21, in <module>
==> offenesparlament:     
==> offenesparlament: run()
==> offenesparlament:   File "./manage.py", line 13, in run
==> offenesparlament:     
==> offenesparlament: from configurations.management import execute_from_command_line
==> offenesparlament: ImportError
==> offenesparlament: : 
==> offenesparlament: No module named configurations.management
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

Any ideas on what could be going wrong?

vagrant/instruction issues

Point 8 of the vagrant setup instructions is to run grunt in /vagrant; however,
(1) it complains that grunt-contrib-watch is not installed, and
(2) after npm install, it complains that Ruby and Sass are needed for things to work:

Warning:
You need to have Ruby and Sass installed and in your PATH for this task to work.
More info: https://github.com/gruntjs/grunt-contrib-sass
Used --force, continuing.
Warning: spawn ENOENT Used --force, continuing.
Warning: spawn ENOENT Used --force, continuing.

Could you shed some light on this?

Should we move the public ES endpoints to different URLs?

This is not important, but would it make sense to bundle our current ES search endpoints and more formally make them the first part of our API?

So instead of /search, /gesetze/search, /personen/search we would have something like /api/v1/search/[searchtype] ?
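A sketch of what the bundled routing could look like (the /api/v1/ layout is this issue's proposal; the view module is the one from the logs, but the function name is an assumption):

from django.conf.urls import url

from offenesparlament.views import search  # module exists; search_view is assumed

urlpatterns = [
    url(r'^api/v1/search/$', search.search_view),
    url(r'^api/v1/search/(?P<searchtype>gesetze|personen)/$', search.search_view),
]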

[CodeBounty] Committees (Ausschüsse)

Committees

Scraper: € 400
Frontend: € 400

Build the list of committees plus the connection between the committee model and the parliamentarian model.

Scraper

Entry page: http://www.parlament.gv.at/PAKT/AUS/
Detail page: http://www.parlament.gv.at/PAKT/VHG/XXV/A-AU/A-AU_00001_00361/index.shtml

Tips:

  • The RSS link at the top right of the entry page always reflects the currently selected form settings, which is handy for reverse engineering the URLs.
  • Tabs on the detail pages are implemented purely in the frontend; when scraping, always look at the HTML source!

To keep in mind:

  • Committees can belong to the Bundesrat as well as the Nationalrat; please cover both.
  • Must be scraped: session dates, where available.
  • 'Meldungen der Parlamentskorrespondenz' can be ignored for now.

Example page for committee members: http://www.parlament.gv.at/PAKT/VHG/XXV/A-AU/A-AU_00001_00361/MIT_00361.html
or
http://www.parlament.gv.at/WWER/PAD_51564/#tab-Ausschuesse

It's probably best to start from the parliamentarian side. Note that positions can lie in the past; please parse them anyway (:. In general, it will make sense to create a dedicated model for committee membership (e.g. Membership), since that also allows a start and end date per person/committee.

Frontend

Tasks:

  • A basic committee page, including the list of members
  • Extend the parliamentarian pages with a list of committee memberships

Committee details

Scraper: € 500
Frontend: € 100

Scraper

Tasks:

  • Agendas (Tagesordnungen)
  • Items under negotiation (Verhandlungsgegenstände)
  • List of publications

Example page, agendas:
http://www.parlament.gv.at/PAKT/VHG/XXV/A-AU/A-AU_00001_00361/index.shtml#tab-Sitzungsueberblick

  • The 1st session has agenda items (TOPs). Scrape these and link them to the items under negotiation (laws, petitions), if those have already been scraped.

Example page, items under negotiation:
http://www.parlament.gv.at/PAKT/VHG/XXV/A-AU/A-AU_00001_00361/index.shtml#tab-Verhandlungsgegenstaende

  • Scrape and link to the items under negotiation, if those have already been scraped.

Example page, publications:
http://www.parlament.gv.at/PAKT/VHG/XXV/A-AU/A-AU_00001_00361/index.shtml#tab-VeroeffentlichungenBerichte

Frontend

Simply show the lists for the three categories on the committee page.

[CodeBounty] Parliamentary Questions & Answers

Parliamentary questions & answers (Parl. Anfragen & Beantwortungen)

Scraper: € 600
Frontend: € 400

Scraper

Entry page: http://www.parlament.gv.at/PAKT/JMAB/
Detail page (example): http://www.parlament.gv.at/PAKT/VHG/XXV/J/J_06430/index.shtml
Detail page (example, answered): http://www.parlament.gv.at/PAKT/VHG/XXV/J/J_05835/index.shtml
Detail page 2 (answer): http://www.parlament.gv.at/PAKT/VHG/XXV/AB/AB_05632/index.shtml

To keep in mind:

  • Questions can be raised by the Bundesrat as well as the Nationalrat; please cover both.
  • Always parse the "Steps" (the history of the question, including timestamps).
  • NOTE: there may also be oral answers.

Tips:

  • Tabs on the detail pages are implemented purely in the frontend; when scraping, always look at the HTML source!
  • The RSS link at the top right of the entry page always reflects the currently selected form settings, which is handy for reverse engineering the URLs.

Frontend

Challenge:

  • Display questions and answers side by side, using pdf.js or iframes where HTML versions are available.

Add-on: content from PDFs & rudimentary question matching

Scraper: € 600 without OCR, € 1,800 with OCR
Frontend: € 600

Scraper

  • Run pdf2text or the like over the PDFs
  • Match answers to questions (e.g. map "Zu 1, 2, 3-9" to the corresponding questions)
  • For the bonus €: run OCR over the PDFs to make the answers searchable and to allow rudimentary matching of answers to questions even in scanned documents

Frontend

  • Display questions and answers, on mobile too, in a format in which they can be matched to each other.
  • Switchable ordering: by question order or by answer order.
