City Scrapers St. Louis

CI Cron Build

What are the City Scrapers and why do we want them?

Public meetings are important spaces for democracy where any resident can participate and hold public figures accountable. But how does the public know when meetings are happening? It isn’t easy! These events are spread across dozens of websites, rarely in useful data formats.

Our Mission

The mission of the City Scrapers project is to increase access and transparency around public meetings across St. Louis County by making it easier for everyone to know when and where public meetings are held.

All of the meetings gathered by our spiders can be viewed here!

What can I learn from working on the City Scrapers?

A lot about the City of St. Louis (and other municipalities of the Greater St. Louis area)! What is City Council talking about this week? What are the local school councils, and what community power do they have? What neighborhoods is the police department doing outreach in? Who governs our water?

From building a scraper, you'll gain experience with:

  • How the web works (HTTP requests and responses, reading HTML)
  • Writing functions and tests in Python
  • Version control and collaborative coding (Git and GitHub)
  • A basic data file format (JSON), working with a schema and data validation
  • Problem-solving, finding patterns, designing robust code
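To give a taste of the schema and data-validation bullet above, here is a minimal sketch of checking a scraped meeting record against a schema. The field names are simplified placeholders; the real City Scrapers schema defines many more fields and stricter rules.

```python
import json

# Simplified required fields; the real City Scrapers schema is richer.
REQUIRED_FIELDS = {"title", "start", "location"}

def validate_meeting(record):
    """Return a list of schema problems for a scraped meeting dict."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "start" in record and not isinstance(record["start"], str):
        errors.append("start must be an ISO 8601 string")
    return errors

# A record fresh off the wire, parsed from JSON; "location" is missing.
meeting = json.loads('{"title": "Board of Aldermen", "start": "2021-03-01T18:00:00"}')
print(validate_meeting(meeting))
```

Running checks like this in tests is what catches a website redesign before bad data reaches the public listings.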

Contributing

We welcome both coders and non-coders to help out with our project!

  1. Fill out this form to join our Slack channel and meet the community!
  2. Read about how we collaborate for more details.

Don't see your local public meetings?

Fill out this form to join our Slack channel! We love hearing from the community and learning about how we can better serve our city.

If there are any public meetings that you would like us to create a scraper for, please fill out this form to make a request.

When reviewing scraper requests, we might consider things such as:

  • Are these one-off meetings or recurring?
  • If they are one-off meetings, do we expect more in the future to be announced using a similar structure?
  • Is there historical data that could also be scraped using the same spider and might that be useful?
  • What is the estimated time and effort to write the scraper versus entering meetings manually? (For example, if it takes 2-3 minutes to manually enter a single meeting and there are x meetings, how does that compare with the time needed to write the scraper?)
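The last consideration above is a simple break-even calculation. A sketch, using made-up numbers (the 2.5 minutes per meeting and 4-hour scraper estimate are purely illustrative):

```python
def manual_entry_hours(num_meetings, minutes_per_meeting=2.5):
    """Total hours to hand-enter meetings at roughly 2-3 minutes each."""
    return num_meetings * minutes_per_meeting / 60

def scraper_pays_off(num_meetings, scraper_hours, minutes_per_meeting=2.5):
    """True when writing the scraper costs less time than manual entry."""
    return scraper_hours < manual_entry_hours(num_meetings, minutes_per_meeting)

# A hypothetical 4-hour scraper vs. 100 recurring meetings of manual entry
print(scraper_pays_off(100, scraper_hours=4))
```

Recurring meetings keep accruing, so a scraper that loses the comparison today often wins it within a year.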

Notes

This project is based on a template repository provided by City Bureau. You can read more about what they do at citybureau.org.

We would also like to thank Pat at City Bureau for his patience and help setting up this project.

city-scrapers-stl's People

Contributors

bchao99, ledaliang, pjsier, vishalvishw10


city-scrapers-stl's Issues

Archive scraped pages and documents on the Wayback Machine

Great work getting this up and running so quickly! One of the aspects of the City Scrapers project we haven't documented very well is our use of a Python package we created, scrapy-wayback-middleware.

The overall goal of the City Scrapers project is to improve transparency and create an archive not just of upcoming meetings, but past meetings and related documents as well as how they change over time. An important part of that for us has been archiving (almost) every page and document we scrape on the Internet Archive's Wayback Machine as well as in our static output.

Having a second, more public and accessible location makes the meeting information more available regardless of how long the project goes. We've even used it to track potential violations of open meetings laws, since it provides an external source for seeing what content was or was not on a website at a given time. Here's an example of snapshots of the Chicago Plan Commission's website over time.

The downside of this approach is that it can make cron builds take significantly longer, but we're currently well under the 6-hour GitHub Actions time limit even with over 100 scrapers on the main City Scrapers repo.

If you're interested, you can add scrapy-wayback-middleware as a dependency, and then you'll likely want to subclass the middleware so that it also archives any documents you find, as we've done in our main middleware.py. Then you can add it in your settings/prod.py like we did in our settings.

We're only activating it when the WAYBACK_ENABLED environment variable is set, and the template cron.yml file already sets it, so once the middleware is added in your settings file you should be good to go!
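The environment-variable gating described above might look something like the sketch below. The middleware class path and the priority value are illustrative placeholders, not the project's actual configuration; check the main City Scrapers settings and middleware.py for the real values.

```python
# settings/prod.py — sketch only; the middleware path and priority
# below are hypothetical, not the project's real configuration.
import os

# Only archive to the Wayback Machine when the cron workflow opts in.
WAYBACK_ENABLED = os.getenv("WAYBACK_ENABLED") is not None

if WAYBACK_ENABLED:
    DOWNLOADER_MIDDLEWARES = {
        # A subclass of scrapy_wayback_middleware.WaybackMiddleware that
        # also submits scraped documents (see the project's middleware.py)
        "city_scrapers.middleware.CityScrapersWaybackMiddleware": 950,
    }
```

Gating on the environment variable keeps local development runs fast, since only the scheduled production builds hit the Wayback Machine.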

Let me know if you have any questions, and I'm happy to put in a PR for this if it's helpful.
