popolo-spec's People

Contributors

almereyda, dracos, girogiro, jpmckinney, pm5, tmtmtmtm

popolo-spec's Issues

Entity name in different languages

We should be able to define names in different languages for an entity. For example the organization "Legislative Yuan" should be "立法院" in zh-Hant.

I'd like to propose recommending the use of other_names entries whose note has a locale: prefix for this purpose. The value after locale: is the language of the entry, as defined in the IANA Language Subtag Registry.

For example:

{
  "id": "ly.gov.tw",
  "name": "Legislative Yuan",
  "other_names": [
    {
      "name": "立法院",
      "note": "locale:zh-Hant"
    },
    {
      "name": "Yuan Legislativo",
      "note": "locale:es"
    }
  ]
}
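
A consumer could resolve a localized name from this structure roughly as follows. This is only a sketch of the proposal above: the function name name_for_locale is hypothetical, and it assumes an exact, case-sensitive match on the language tag and a fallback to the main name field.

```python
LOCALE_PREFIX = "locale:"


def name_for_locale(entity, locale):
    """Return the entity's name in `locale`, falling back to its main name.

    Assumes the proposed convention: an other_names entry carries the
    language in its note, written as "locale:<language tag>".
    """
    for other in entity.get("other_names", []):
        note = other.get("note", "")
        if note.startswith(LOCALE_PREFIX) and note[len(LOCALE_PREFIX):] == locale:
            return other["name"]
    return entity.get("name")


legislature = {
    "id": "ly.gov.tw",
    "name": "Legislative Yuan",
    "other_names": [
        {"name": "立法院", "note": "locale:zh-Hant"},
        {"name": "Yuan Legislativo", "note": "locale:es"},
    ],
}

print(name_for_locale(legislature, "zh-Hant"))
print(name_for_locale(legislature, "fr"))
```

The second call asks for a language with no entry, so it falls back to the name field.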

add patronymicName?

From ISA Person Core Vocabulary:

Patronymic names are important in some countries. Iceland does not have a concept of family name in the way that many other European countries do, for example. In Bulgaria and Russia, patronymic names are in everyday usage.

No other vocabulary defines a similar property.

On the other hand, if we add every possible component of a name, the vocabulary will be very large (see CIQ example).

Dublin Core has an excellent article on the subject of name representation.

gender over time?

It's possible for a person's gender to change over time.

  1. Should we track the changes over time, as we do for a person's name?
  2. How do we implement this?

Implementation

New fields strategy

PopIt gives start and end dates for each other name a person has. We can do the same for gender. PopIt also uses other_names to represent alternative names that may not have a start and end date, and allows tagging of the name.

Versioning strategy

We can alternatively version the entire person document, and handle both types of changes using the same mechanism. In Mongoid, versions are embedded documents, in much the same way other_names is an embedded document in PopIt. The only issue with embedding full versions is staying within the maximum MongoDB document size of 16MB (which should be easy).

Lookups

Lookups for previous names and genders would be the same in either strategy: the embedded documents would have start and end dates to locate the appropriate embedded document, and then the name or gender would be read.
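
A minimal sketch of such a lookup, assuming each embedded document carries optional start_date and end_date values written as full ISO 8601 dates (YYYY-MM-DD), so that plain string comparison orders them correctly. The helper name value_on is hypothetical, and other_genders is the field name floated in this issue, not an existing one.

```python
def value_on(entries, key, on_date):
    """Return entry[key] for the first entry whose date range covers on_date.

    A missing start_date or end_date is treated as an open-ended range.
    Returns None when no entry covers the date.
    """
    for entry in entries:
        start = entry.get("start_date")
        end = entry.get("end_date")
        if (start is None or start <= on_date) and (end is None or on_date <= end):
            return entry[key]
    return None


person = {
    "name": "Jane Smith",
    "other_names": [
        {"name": "Jane Doe", "start_date": "1970-01-01", "end_date": "1999-06-30"},
    ],
    "other_genders": [
        {"gender": "male", "end_date": "2001-12-31"},
    ],
}

print(value_on(person["other_names"], "name", "1985-03-01"))
print(value_on(person["other_genders"], "gender", "2000-01-01"))
```

When the lookup returns None, the caller would read the current name or gender from the top-level field.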

Entering historical data

Backfilling data is likely easier to implement using other_names and other_genders fields than using versions. Versions require careful version management, e.g. you would not want to create a new version to correct an error or complete a document, but you would want one to record a change to a person's name due to an event, e.g. knighting, marriage, etc. Using versions may also conflict with edit histories in certain implementations.

Conclusion

Unless we can identify a large number of fields whose values we want to track over time, the "new fields" strategy seems to be preferable in terms of implementation.

What properties should be required on a Post?

Currently, only label is required.

organization_id was required, but when posts are embedded in an organization document, this property is unnecessary. role was required, but it's conceivable that a use case doesn't require categorizing posts.

choice of unique identifier

MongoDB will automatically assign a 12-byte ID, which comes out to a 24-character hexadecimal string. PopIt uses standard MongoDB IDs. Billy sets the ID to the uppercase jurisdiction code, e.g. "CA" for California or "PA-PHILADELPHIA" for Philadelphia, followed by a one-letter code for the document type, e.g. "L" for legislators, and a six-digit number.

Popolo currently has no recommendation for identifiers. Systems may choose any identifier scheme.
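
For illustration, the Billy scheme described above could be sketched like this. The function name billy_style_id is hypothetical, and joining the three parts without a separator is an assumption based on the description, not something taken from Billy's code.

```python
def billy_style_id(jurisdiction, type_code, number):
    """Build an identifier from a jurisdiction code, a type letter and a number.

    Assumption: the parts are concatenated with no separator, with the
    number zero-padded to six digits.
    """
    if len(type_code) != 1 or not type_code.isalpha():
        raise ValueError("type_code must be a single letter")
    if not 0 <= number <= 999999:
        raise ValueError("number must fit in six digits")
    return f"{jurisdiction.upper()}{type_code.upper()}{number:06d}"


print(billy_style_id("ca", "L", 123))
print(billy_style_id("pa-philadelphia", "L", 7))
```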

Specify how to denormalize JSON

e.g. to include a person's memberships on their Person object, use a memberships field, whose value is an array of Membership objects.
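
A rough sketch of that denormalization, assuming Membership objects point at their person through a person_id field. The helper name embed_memberships is hypothetical.

```python
import copy


def embed_memberships(person, memberships):
    """Return a copy of person with its memberships embedded as an array.

    The inputs are left untouched; only memberships whose person_id matches
    the person's id are included.
    """
    result = copy.deepcopy(person)
    result["memberships"] = [
        copy.deepcopy(m) for m in memberships if m.get("person_id") == person["id"]
    ]
    return result


person = {"id": "joe-bloggs", "name": "Joe Bloggs"}
memberships = [
    {"person_id": "joe-bloggs", "organization_id": "parliament", "role": "Member of Parliament"},
    {"person_id": "fred-smith", "organization_id": "parliament", "role": "Member of Parliament"},
]

denormalized = embed_memberships(person, memberships)
print(denormalized["memberships"])
```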

Job sharing

If a parliament allows two people to job-share the post of Member of Parliament for Avalon (a mooted proposal of at least one party here in the UK), how would this be modelled? More generally, how would any Post shared between multiple people be modelled?

how to represent custom fields?

Billy prefixes fields that its schema is not aware of with a +, for example, +tollfree for a tollfree telephone number. This indicates to API users that these fields are not standard, and should not be assumed to exist on all records. Other alternatives are:

  • define the field on the document without a + prefix
  • define the field on an embedded document (the embedded document could be named extra or details)
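
To show how an API user might act on the + prefix convention, here is a small sketch that separates prefixed fields from the rest of a record. The function name split_custom_fields is hypothetical and is not part of Billy.

```python
def split_custom_fields(record, prefix="+"):
    """Split a record into (standard, custom) dictionaries.

    Keys that start with `prefix` go into `custom`, with the prefix removed.
    """
    standard, custom = {}, {}
    for key, value in record.items():
        if key.startswith(prefix):
            custom[key[len(prefix):]] = value
        else:
            standard[key] = value
    return standard, custom


record = {"name": "Jane Smith", "+tollfree": "1-800-555-0100"}
standard, custom = split_custom_fields(record)
print(standard)
print(custom)
```

The custom dictionary has the same shape as the embedded extra or details document mentioned in the second alternative.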

Support for historical property values

If a use case wants to track historical data, e.g. previous headshots, etc., one option is to have the value of the property be an array of objects, with start_date (schema:validFrom) and end_date (schema:validThrough) properties. In RDF, this is already possible. More work is required for a JSON solution. We may consider promoting the schema:validFrom and schema:validThrough properties as metadata properties.

Another option is to maintain a history of changes to the full object (an actual history, not an edit history). Implementations would need to distinguish edits that are corrections from edits that are updates. Implementations would likely initialize the history with all of the object's initial property values, and assign end dates to each value as they go out of date. Consumers would then loop through the history to reconstruct a former object.

Look at how PROV marks up provenance information for inspiration.

Once historical property values are added, the semantics of other_names should be changed to only apply to alternate names, not former names.

Vote class and properties

To close this issue (additional feedback would go in new issues):

  • Add Turtle examples for new classes
  • Add JSON-Schema for new classes
  • Add JSON-LD contexts for new classes
  • Link the new classes from the navigation

PoplusCon notes:

Multi-jurisdiction specifications:

Government data sources (incomplete):

Single-jurisdiction implementations:

Availability of voting results in parliaments

Possibly the best approach is to do a survey of government data sources, and design the specification around that.

Legislative sessions and terms

Billy assigns all objects to a legislative term and/or session. How should this be modeled more generally? Do we need terms/sessions in Popolo? Are terms/sessions sufficiently internationalizable? What alternative modeling is there?

Use cases for sessions:

  • To mark the date on which all orders (bills, motions, etc.) are expunged
  • To disambiguate vote and bill numbers, which are often reset at the start of sessions

Use cases for terms:

  • To mark the date on which all (or a subset of) elected officials lose their seats
  • To describe the term lengths of posts within the organization
  • To determine who has left before the end of the term

If we were to add terms and sessions, we would need to:

  • Survey the meaning of these terms in various countries

Properties for Session:

  • name or number
  • start_date
  • end_date
  • term_id

Properties for Term:

  • name or number
  • start_date
  • end_date
  • organization_id

Both can be a subclass of Event.

time zone information

How should the timezone be communicated for time-sensitive information? Billy uses a capitol_timezone key in its metadata dictionaries. This becomes an issue once tracking multiple jurisdictions. Should a time zone field be added to Organization?

Add dates to ContactDetail

e.g.

older contact details may help to discover that one of the MP's offices three years ago was located at the same address as a company of a man he claims not to know

Use valid_from and valid_through?

Distinguish former from alternate names in the JSON representation

The other_names field conflates alternate names and former names in the JSON representation. (The RDF representation does not have this issue). If the other_names field is preserved (to maintain backwards compatibility), we will need to change the semantics of the field. Two options:

  1. Add a new former_names field
  2. Use a general solution for all former values of properties, which is being discussed in #47

Event class and properties

#18 and #20 point to the need for an Event class, which was identified during the initial spec's development as an eventual requirement beyond the basic people/org classes. Events are particularly relevant to changes to organizations, memberships, people, e.g. mergers, appointments, marriages, etc.

Once Event is added, we might consider adding start_event and end_event properties to Membership, to satisfy the TheyWorkForYou use case described in #20.

Motion and VoteEvent refer to a legislative session, which can/should refer to an Event.

Event

RDF

Postal address class

The INSPIRE vs vCard debate #1 may be moot, given that address data is usually not available as streetAddress, locality, region, country and postalCode (in vCard terms) but more commonly as a chunk of text that would be difficult to parse into those fields. It may be best to just use a single string (that can contain newlines) to store a postal address. This may particularly be true, given the less common addressing systems of Japan and Nicaragua, which neither INSPIRE nor vCard can express except as blocks of text.

Multilingual support

Multilingual support can be added to attract greater international attention. For a website to be available in multiple languages, some fields must be translatable.

For example, within some ORMs, the name field on people and organizations would be a hash, like:

{
  "en": "The Right Honourable Stephen Harper",
  "fr": "Le très honorable Stephen Harper"
}

Adding such hashes may prove too difficult. Options:

Adding multilingual support to the RDF serialization is easy: just add a language tag to a plain literal.

Multilingual support may justify a Role class (like in ORG). Maintaining translations on each Post and Membership would be too great a burden; it would be easier to have a Role class with a translatable name field.

Organization classifications and contact detail types should be taken from controlled vocabularies, which may or may not provide translations of their terms. They can be treated like code lists. Genders and telephone types are taken from code lists and are expected to be formatted by the application.

Example in Relations needing both Membership and Posts

“Some use cases may therefore require both Post and Membership classes to satisfy their requirements.”

I think it would be useful to have an example, say of the mentioned Member of Parliament for Avalon, that requires both Memberships and Posts to both handle historical information and have the position represented when no-one holds it, which I think might be a common situation. Let's take two people, Joe Bloggs and Fred Smith; Joe is the current MP for Avalon, Fred is a previous one; Joe is going to resign in a paragraph or two. If I've understood it right (quite possibly not :) ), am I right in that Joe needs both a Membership and a Post to be able to both assign the right label and have a start date?

Before resignation:

  • Post: label: Member of Parliament for Avalon, role: Member of Parliament, org: Parliament, person: Joe Bloggs.
  • Membership: role: Member of Parliament, person: Fred Smith, org: Parliament, start_date: 2005-02-01. end_date: 2010-04-05.
  • Membership: role: Member of Parliament, person: Joe Bloggs, org: Parliament, start_date: 2010-05-06, end_date: -empty- (or 'future' or whatever is needed if you want to know difference between unknown and future).

Joe resigns:

  • Post: label: Member of Parliament for Avalon, role: Member of Parliament, org: Parliament, person: -empty-.
  • Membership: role: Member of Parliament, person: Fred Smith, org: Parliament, start_date: 2005-02-01. end_date: 2010-04-05.
  • Membership: role: Member of Parliament, person: Joe Bloggs, org: Parliament, start_date: 2010-05-06, end_date: 2012-02-27.

As one question, how do I know who were all the previous Members of Parliament for Avalon?

Use W3C namespace to define missing properties

In order to create the JSON-LD contexts, I had to add a few missing RDF properties. All new properties are, for the time being, in the http://www.w3.org/ns/opengov# namespace, which this Community Group may use according to a W3C FAQ.

  • Define the terms
  • Include the conformance section
  • Write an RDF/OWL file

Use http://www.w3.org/respec/guide.html

People and organizations

  • OtherName
  • Membership#post
  • Membership#onBehalfOf
  • Membership#area, Organization#area, Post#area
  • Membership#contactDetail, Organization#contactDetail, Person#contactDetail, Post#contactDetail
  • Organization#otherName, Person#otherName
  • Organization#dissolutionDate
  • Person#nationalIdentity

Motions and voting

  • VoteEvent
  • VoteEvent#motion
  • VoteEvent#count
  • VoteEvent#vote
  • Count
  • YesCount, NoCount, AbstainCount
  • Count#voteEvent, Vote#voteEvent
  • Vote
  • Vote#party
  • Vote#role
  • Vote#weight
  • Vote#pair

Area class and properties

New issue description: To close this issue (additional feedback would go in new issues):

  • Add JSON-Schema
  • Add JSON-LD context
  • Link Area from the navigation

Some posts (like "MP for Avalon", "UK's Ambassador to the USA" ) require an area. How should areas be represented and stored?

In PopIt we were going to use a string field which could either be the name of the area or a url to an external service that has an entry for an area (like MapIt). More details and a discussion at mysociety/popit#193

In the W3C org ontology, organisations can have sites (an office or premises) and a location (an internal physical location, like where mail is delivered). Posts are a subclass of organisations, so they can have sites too. However, neither site nor location seems intended for an entire administrative area.

Add support for historical organizational hierarchy

http://lists.w3.org/Archives/Public/public-opengov/2013Sep/0005.html

In order to communicate past relations between organizations in terms of organization hierarchy, add a parents property, whose values are instances of a new class, e.g. in JSON:

{
  "parents": [
    {
      "parent_id": "foo",
      "start_date": "2010-01-01",
      "end_date": "2010-12-31"
    },
    {
      "parent_id": "bar",
      "start_date": "2011-01-01"
    }
  ]
}

The new class would have properties start_date, end_date, parent_id and child_id, though generally child_id will be omitted as it's understood to be the embedding object.

As this new class bears some resemblance to the Membership class, it may be worthwhile to devise an appropriate superclass for both.

Naming of dates for easy updating when required.

Summary: Each date should have an optional name that identifies the event it represents. This can subsequently be used to batch update dates - eg. changing imprecise future dates into precise ones when dates for events such as elections are confirmed.

This is something that might not belong in this spec, but it is something that in Mzalendo we would have found very useful in the run up to the election.

All the MPs had positions with an end date of 2012, which is when the election was expected to occur, although the exact date was not known. The election was subsequently pushed to 2013 and then confirmed as being 4 Mar 2013.

This required us to update the end dates of the MP positions. To do this we needed to do a query to find all the MP positions and then trust that the 2012 was in fact the anticipated end date.

Had we had these dates named we could have searched for them more easily, and the users of the data would have had more information about what the date represents. We'd have called it something like "Dissolution of 2007 Parliament".
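
As a sketch of what such a batch update could look like: the code below assumes a date's name is stored next to it in a field with a _name suffix (end_date_name here). That field name and the function update_named_dates are both hypothetical, and the confirmed election date is used as the new value purely for illustration.

```python
def update_named_dates(records, date_field, date_name, new_value):
    """Set date_field to new_value on every record whose date has the given name.

    Assumes the name of a date lives in "<date_field>_name". Returns the
    number of records that were updated.
    """
    name_field = date_field + "_name"
    updated = 0
    for record in records:
        if record.get(name_field) == date_name:
            record[date_field] = new_value
            updated += 1
    return updated


positions = [
    {"person_id": "mp-1", "end_date": "2012", "end_date_name": "Dissolution of 2007 Parliament"},
    {"person_id": "mp-2", "end_date": "2012", "end_date_name": "Dissolution of 2007 Parliament"},
    {"person_id": "clerk-1", "end_date": "2012"},
]

count = update_named_dates(positions, "end_date", "Dissolution of 2007 Parliament", "2013-03-04")
print(count)
```

The unnamed 2012 end date on the third record is left alone, so there is no need to trust that every 2012 value was the anticipated end date.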

Some way to search for existing date names would be required so that it is easy to find and reuse existing date names.

This need not be part of the spec, but it should be possible to add this sort of data to the records. PopIt almost certainly would.

Explanation of 'Post' is slightly confusing

"For example, people in different organizations can all fulfill the role of CEO, but only one person can hold the post of CEO at Apple Inc."

This is slightly confusing at first: it can be read in a couple of different ways, and the correct reading isn't strictly true. I'd suggest something like: "For example, many people fulfill the role of 'CEO' in different organizations, but only one person holds the post of 'CEO at Apple Inc.'"

Add Role class?

Multilingual support may justify a Role class (like in ORG). Maintaining translations on each Post and Membership would be too great a burden; it would be easier to have a Role class with a translatable name field.

email on Person and/or Post?

In many cases, an email address includes the person's name, e.g. [email protected]. In some cases, it is the same email for anyone occupying that post, e.g. [email protected].

Given that the former case is quite common, and to make email lookups more predictable, the email field should be on Person documents only.

Add a party-like property to the Membership class

Once resolved, add more complex examples for the party membership of candidates and members, e.g.

  • simple membership
    • between a person and an organization
    • issue: cannot infer whether the person holding a post represents that party in that post
  • complex membership
    • issue: the person may be a member of a party, but not represent that party
    • resolution: a party-like property must therefore be on the membership object

Specific example: Ross Anderson, a 2012 presidential candidate, ran under many different party labels depending on the state:

  • AZ: Peace and Freedom
  • CT: Justice
  • MT: Write-In
  • NM: New Mexico Independent Party
  • Natural Law Party
  • Progressive
  • Independent
  • Other

Add ethnicity or culture identification to individuals

When representing people, we at @texastribune often want to attach an ethnicity to that Person. The current spec does not offer a way to do that.

It is worth noting that this could possibly be done via the HAL-style links that we have chatted about. Rather than adding to the spec, it might be best to allow for race to be a link. The implementation in JSON might look something like this:

{
  "_links": {
    "ethnicity": {
      "href": "http://example.com/api/ethnicity/{{ ethnicity ID }}",
      "rel": "http://example.com/rels/ethnicity/"
    }
  }
}

Using HAL-style links for such external data provides a documented way to specify ethnicity without cluttering the spec.

Edited: Remove /race/ from the example URIs.

telephone numbers on Address and/or embedding relation?

It is common for elected officials to have offices at the legislature and in their constituencies, each with their own address and telephone numbers. It is also possible for an elected official to have a telephone number that is associated with his seat but that is not tied to either office, e.g. a mobile number.

As in #5, given that the former case is very common, and to make telephone lookups more predictable, telephone fields should be on Address documents only.

Store last modified date

It would be very useful to have a last modified date. In PopIt we would use this to list all the documents that had changed after the last check.

It could be optional, but if included should:

  • be under the key last_modified at the top level
  • be a full timestamp in UTC
  • reflect any change to the document or nested documents - not just meaningful changes
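
A minimal sketch of these requirements, assuming the timestamp is written as an ISO 8601 string ending in Z. The helper names touch and changed_since are hypothetical, and how PopIt would actually store or query the value is not specified here.

```python
from datetime import datetime, timezone


def touch(document, now=None):
    """Set last_modified at the top level to a full UTC timestamp.

    Meant to be called on any change to the document or its nested
    documents. `now` should be a timezone-aware datetime; it defaults to
    the current time.
    """
    now = now or datetime.now(timezone.utc)
    document["last_modified"] = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return document


def changed_since(documents, last_check):
    """Return documents modified after last_check (same timestamp format).

    Documents without last_modified are skipped, since the field is optional.
    """
    return [d for d in documents if d.get("last_modified", "") > last_check]


doc = touch({"id": "p1"}, datetime(2013, 3, 4, 12, 0, 0, tzinfo=timezone.utc))
older = {"id": "p2", "last_modified": "2013-01-01T00:00:00Z"}
print(changed_since([doc, older, {"id": "p3"}], "2013-02-01T00:00:00Z"))
```

Because every timestamp uses the same fixed-width UTC format, comparing the strings directly gives chronological order.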

If a property is not set, how to determine whether it is of unknown value or known to be null?

Uncertainty is a frequent topic in RDF circles - need to research. So far:

For the MongoDB and JSON serializations, we can propose that if a value is inapplicable or known to be zero/none, then that field should not be on the record; if the value is unknown, then it should be set to null.
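
Under that proposal, a consumer could tell the cases apart as sketched below. The function name field_state and its return labels are hypothetical; death_date is used only as an example field.

```python
import json


def field_state(record, field):
    """Classify a field under the proposed convention.

    - key absent: the value is inapplicable or known to be zero/none
    - key present with null: the value is unknown
    - anything else: the value is known
    """
    if field not in record:
        return "inapplicable_or_none"
    if record[field] is None:
        return "unknown"
    return "known"


record = json.loads('{"name": "Jane Smith", "death_date": null}')
print(field_state(record, "name"))
print(field_state(record, "death_date"))
print(field_state(record, "email"))
```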

Add support for electoral candidates

Candidacy is not the same as membership in an organization. Should Candidacy simply be a straight subclass of Membership, with the same properties but with different semantics?

Likely additional properties of Candidacy:

  • incumbent (boolean)
  • election (Election)

Region-specific properties that are unlikely to be in Popolo:

  • write_in (boolean)
  • fundraising_committee_id

Notes:

  • In NY, CT, SC a person may be the candidate for multiple parties. The parties do not necessarily form a coalition. In RDF, we can simply declare onBehalfOf multiple times. In JSON-LD, on_behalf_of can be either an array or a single value. It would be a single value in most implementations.
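
Since on_behalf_of may be either a single value or an array, a consumer would probably normalize it before use. A small sketch, with the hypothetical helper name on_behalf_of_list and made-up party identifiers:

```python
def on_behalf_of_list(candidacy):
    """Return the on_behalf_of values of a candidacy as a list.

    Accepts a missing value, a single value or an array.
    """
    value = candidacy.get("on_behalf_of")
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


single = {"person_id": "p1", "on_behalf_of": "party-a"}
fusion = {"person_id": "p2", "on_behalf_of": ["party-a", "party-b"]}
print(on_behalf_of_list(single))
print(on_behalf_of_list(fusion))
```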

We will likely need to add a class for Election in most use cases (requires research into variety of electoral systems).

Add other_labels to Posts

People and Organizations have alternative/historic names; Posts currently don't.

You might want to say, for example, that the post of Prime Minister used to be called 'First Minister', or that 'CEO' is an alternative name for 'Chief Executive Officer'.

active field versus foundingDate and dissolutionDate

It's necessary to know whether an organization is active or defunct. However, is it necessary to know the precise dates at which it became active or defunct? Options:

  1. require the use of foundingDate and/or dissolutionDate fields
  2. require the use of an active field
  3. require the use of either an active field or date fields to express an organization's status
  4. require the use of an active field with optional date fields

From a developer perspective, (3) is worst because the data may live in one of two places, which is less predictable. All others are roughly equivalent from this standpoint.

From a data integrity perspective, it's cleaner to just have one way to describe an attribute of an object, making (1) and (2) better than (4). From this perspective, it's also best to be precise, making (1) better than (2). Imprecise dates (#7) can be used if specific dates are unknown.

From a data entry perspective, having an active field is attractive (2,3,4), as precise dates are not always needed. Having the option to add precision (4) is also attractive.
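
For comparison, option (1) lets consumers derive the active status instead of storing it. The sketch below assumes the dates are serialized as founding_date and dissolution_date (snake_case spellings of foundingDate and dissolutionDate) and are full ISO 8601 dates; imprecise dates would need extra handling. The function name is_active is hypothetical.

```python
def is_active(organization, today):
    """Derive whether an organization is active on `today` from its dates.

    A missing founding_date is treated as "already founded" and a missing
    dissolution_date as "not dissolved".
    """
    founding = organization.get("founding_date")
    dissolution = organization.get("dissolution_date")
    if founding is not None and founding > today:
        return False
    return dissolution is None or dissolution > today


print(is_active({"founding_date": "1990-01-01"}, "2013-01-01"))
print(is_active({"founding_date": "1990-01-01", "dissolution_date": "2000-01-01"}, "2013-01-01"))
```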

Document class and properties

Add a Document class to represent things like bills, agendas, etc. The base Document class may be fairly generic, with subclasses providing additional properties.

Issues

  • The requirements for drafting legislation are sometimes in tension with those of parsing and displaying legislation. Different specifications/standards may emerge that are better tailored for one or the other.

Existing specifications

Legislative
