
civics_cdf_validator's Introduction

civics_cdf_validator is a script that checks if a NIST 1500-100 data feed follows best practices. It will output errors, warnings, and info messages for common issues.

This is not an official Google product.

INSTALLATION

The package is available from PyPI and can be installed using the command below.

pip install civics_cdf_validator

civics_cdf_validator relies on lxml, which is installed automatically if it is missing. You may need to install the libxslt development libraries in order to build lxml.

USAGE

Branch Definitions

  • master - Branch used in production.
  • staging - Branch that contains the next version of production code, typically available one month in advance of a production push.
  • dev - Branch that contains development code with the latest changes to the validator. This branch is rolled into staging on a monthly basis.

Supported feeds

You can use civics_cdf_validator to check different types of feed:

  • Officeholder
  • Candidate / results
  • Committee
  • Election Dates

List rules

You can list the default validation rules, each with a brief description, by using the "list" command:

civics_cdf_validator list

You can also customize the displayed list by specifying your own set of rules, or filter the default list with parameters such as the feed type or the ignore rules flag.

For more details, see the command help:

civics_cdf_validator list --help

Validate a file

The validate command has 2 required arguments:

  • the election file to be validated
  • the XSD file to validate against

The command to validate the election file against all the rules is:

civics_cdf_validator validate election_file.xml --xsd civics_cdf_spec.xsd

The validator is capable of validating either election or officeholder data feeds, depending on the value of the --rule_set flag (election is the default). To validate an officeholder feed:

civics_cdf_validator validate election_file.xml --xsd civics_cdf_spec.xsd --rule_set officeholder

You can validate against only one or more comma-separated rules by using the -i flag:

civics_cdf_validator validate election_file.xml --xsd civics_cdf_spec.xsd -i Schema

Or exclude one or more comma-separated rules by using the -e flag:

civics_cdf_validator validate election_file.xml --xsd civics_cdf_spec.xsd -e Schema

By default, the script only shows a summary of issues found. You can get a verbose report by adding the -v flag:

civics_cdf_validator validate election_file.xml --xsd civics_cdf_spec.xsd -v

civics_cdf_validator's People

Contributors

ajphukan, alyssavessey, azuser, ccongchen, civics-copybara, ckaminer, jdmgoogle, jloutsenhizer, jmcmanus, kant, klaash, markcwal-google, miano, rahul-nath, riturajj, rsimoes, tenyenhuis

civics_cdf_validator's Issues

Identify primary contests not associated with a party

Contests which are partisan primaries should use the PrimaryPartyIds field to link to the correct political party. For a NIST Election element of ElectionType primary, partisan-primary-open, or partisan-primary-closed, the Contests in that ContestCollection (type CandidateContest) should have a PrimaryPartyIds that is present and non-empty.
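A minimal sketch of how such a check could look, using the standard library's ElementTree rather than the validator's own rule classes. It assumes that the election type is stored in a Type child of Election and that the feed has no default namespace; the function name and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"
PRIMARY_TYPES = {"primary", "partisan-primary-open", "partisan-primary-closed"}


def contests_missing_primary_party(root):
    """Return objectIds of CandidateContests in primaries without PrimaryPartyIds."""
    missing = []
    for election in root.iter("Election"):
        # Assumption: the election type is a direct Type child of Election.
        election_type = (election.findtext("Type") or "").strip()
        if election_type not in PRIMARY_TYPES:
            continue
        for contest in election.iter("Contest"):
            if contest.get(XSI_TYPE) != "CandidateContest":
                continue
            party_ids = (contest.findtext("PrimaryPartyIds") or "").strip()
            if not party_ids:
                missing.append(contest.get("objectId"))
    return missing


# Hypothetical sample feed: cc2 has no PrimaryPartyIds.
SAMPLE = """<ElectionReport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Election>
    <ContestCollection>
      <Contest objectId="cc1" xsi:type="CandidateContest">
        <PrimaryPartyIds>par1</PrimaryPartyIds>
      </Contest>
      <Contest objectId="cc2" xsi:type="CandidateContest"/>
    </ContestCollection>
    <Type>primary</Type>
  </Election>
</ElectionReport>"""

print(contests_missing_primary_party(ET.fromstring(SAMPLE)))
```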

Check if valid enumeration value is encoded as OtherType

We've seen several instances where a valid enumeration value is instead stored in the OtherType field. E.g.,

<GpUnit objectId="ru1" xsi:type="ReportingUnit">
  <Name>Virginia</Name>
  <Type>other</Type>
  <OtherType>state</OtherType>
</GpUnit>

Even though state is a valid ReportingUnitType, it has been represented in the OtherType field. The correct XML should be

<GpUnit objectId="ru1" xsi:type="ReportingUnit">
  <Name>Virginia</Name>
  <Type>state</Type>
</GpUnit>

This is an error since it can cause ingestion to fail.
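One way this check could be sketched is shown here. The set of valid values is a small illustrative subset of ReportingUnitType, not the full enumeration from the schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; the full list is defined by ReportingUnitType in the XSD.
REPORTING_UNIT_TYPES = {"city", "county", "precinct", "state"}


def misused_other_types(root):
    """Return (objectId, value) pairs where OtherType holds a valid enumeration value."""
    findings = []
    for gp_unit in root.iter("GpUnit"):
        other = (gp_unit.findtext("OtherType") or "").strip().lower()
        if other in REPORTING_UNIT_TYPES:
            findings.append((gp_unit.get("objectId"), other))
    return findings


SAMPLE = """<GpUnitCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <GpUnit objectId="ru1" xsi:type="ReportingUnit">
    <Name>Virginia</Name>
    <Type>other</Type>
    <OtherType>state</OtherType>
  </GpUnit>
  <GpUnit objectId="ru2" xsi:type="ReportingUnit">
    <Name>Virginia</Name>
    <Type>state</Type>
  </GpUnit>
</GpUnitCollection>"""

print(misused_other_types(ET.fromstring(SAMPLE)))
```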

Refactor validator to work as a stand-alone library

It would be advantageous to possible future open source projects if we were able to refactor the bulk of the validation code into its own library. Right now it doesn't seem like we're too far from that, but I'd like to take the plunge.

Check for candidates missing party data

Each candidate should really have party data associated with them, even if the referenced Party object is "Unaffiliated", "Non-Partisan" or something similar.
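A rough sketch of the check, assuming the party reference is stored in a PartyId child of each Candidate element. The function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def candidates_missing_party(root):
    """Return objectIds of Candidate elements with no non-empty PartyId child."""
    missing = []
    for candidate in root.iter("Candidate"):
        # Assumption: the party reference is a PartyId child of Candidate.
        party_id = (candidate.findtext("PartyId") or "").strip()
        if not party_id:
            missing.append(candidate.get("objectId"))
    return missing


SAMPLE = """<CandidateCollection>
  <Candidate objectId="can1"><PartyId>par1</PartyId></Candidate>
  <Candidate objectId="can2"/>
  <Candidate objectId="can3"><PartyId>  </PartyId></Candidate>
</CandidateCollection>"""

print(candidates_missing_party(ET.fromstring(SAMPLE)))
```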

Identify issues when non-candidates are encoded as candidate

The validator should identify a BallotMeasureContest that has been formatted or created as a CandidateContest. For example, a Ballot Name of "For", "Against", "Yes", "No", or a similar value indicates that the entries are not people (Candidates) and that the contest is a ballot measure instead.
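The heuristic could be sketched as follows. The word list contains only the four values named above, and the BallotName/Text path is an assumption about where the ballot name text lives; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Only the values named in this issue; a real rule would likely need more.
BALLOT_MEASURE_WORDS = {"for", "against", "yes", "no"}


def measure_like_candidates(root):
    """Return (objectId, ballot name) for Candidates whose name looks like a selection."""
    flagged = []
    for candidate in root.iter("Candidate"):
        # Assumption: ballot name text is stored under BallotName/Text.
        for text in candidate.iterfind("BallotName/Text"):
            name = (text.text or "").strip()
            if name.lower() in BALLOT_MEASURE_WORDS:
                flagged.append((candidate.get("objectId"), name))
                break
    return flagged


SAMPLE = """<CandidateCollection>
  <Candidate objectId="can1"><BallotName><Text language="en">Yes</Text></BallotName></Candidate>
  <Candidate objectId="can2"><BallotName><Text language="en">Jane Doe</Text></BallotName></Candidate>
</CandidateCollection>"""

print(measure_like_candidates(ET.fromstring(SAMPLE)))
```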

Create pre-election and election results modes for validator

Pre-election mode will require more complete information about the candidates, contests, etc, whereas the election results mode will check information about the VoteCounts objects and ensure that SubUnitsReporting and TotalSubUnits information is present.

Improve error messages for OCD-IDs

Improve the error messages surrounding OCD-IDs to distinguish between the following cases:

  1. GpUnit doesn't have any external identifier
  2. GpUnit has an ocd-id but it does not have a valid value
  3. GpUnit has an external identifier but the type is not set as "ocd-id" (it could be "OCD-ID", in which case saying that the GpUnit doesn't have a valid ocd-id could be a bit confusing).
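The three cases above could be separated roughly like this. The sketch works on plain (type, value) pairs rather than XML elements, so it makes no claim about how the validator reads identifiers; the function name and the message wording are hypothetical.

```python
def classify_ocd_id(identifiers, known_ocd_ids):
    """Return an error message for a GpUnit's identifiers, or None if one is valid.

    identifiers: list of (type, value) pairs taken from ExternalIdentifier elements.
    known_ocd_ids: set of valid OCD-ID strings.
    """
    # Case 1: no external identifier at all.
    if not identifiers:
        return "GpUnit has no external identifier"

    ocd_values = [value for id_type, value in identifiers if id_type == "ocd-id"]
    if ocd_values:
        if any(value in known_ocd_ids for value in ocd_values):
            return None
        # Case 2: correctly typed identifier with an unknown value.
        return "GpUnit has an ocd-id but its value is not valid"

    # Case 3: an identifier whose type only differs by case or whitespace.
    near_misses = [t for t, _ in identifiers if t and t.strip().lower() == "ocd-id"]
    if near_misses:
        return "GpUnit has an identifier of type %r; expected 'ocd-id'" % near_misses[0]
    return "GpUnit has external identifiers, but none of type 'ocd-id'"


known = {"ocd-division/country:us/state:va"}
print(classify_ocd_id([("OCD-ID", "ocd-division/country:us/state:va")], known))
```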

Use TLS instead of SSLv[23] in connections

Clients running older versions of Python use http, urllib, and similar libraries that default to SSLv2 or SSLv3 when making HTTPS connections. This leads to a variety of errors, including 'SNIMissingWarning' and 'InsecurePlatformWarning'.

There are similar errors when using the PyGithub module and it attempts to make a secure connection.
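On Python 3.7 or newer, one possible way to rule out the legacy protocols is to build an SSL context with an explicit minimum TLS version. This is a sketch using the standard ssl module only; it does not cover how the requests or PyGithub libraries configure their connections, and the choice of TLS 1.2 as the floor is an assumption.

```python
import ssl


def tls_only_context():
    """Build a certificate-verifying context that refuses anything below TLS 1.2."""
    context = ssl.create_default_context()
    # Assumption: TLS 1.2 is an acceptable minimum for the servers involved.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = tls_only_context()
print(context.minimum_version)
```

The resulting context can be passed to urllib.request.urlopen through its context parameter.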

Add warning if contest names are not unique

The validator should warn users if the names of contests within a file are not unique. We've seen several feeds where the contest name field is just "Congressional", which isn't particularly helpful.
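A small sketch of the uniqueness check, assuming each Contest has a plain-text Name child. The function name and sample data are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_contest_names(root):
    """Return the sorted contest names that appear more than once in the feed."""
    names = [(contest.findtext("Name") or "").strip() for contest in root.iter("Contest")]
    counts = Counter(name for name in names if name)
    return sorted(name for name, count in counts.items() if count > 1)


SAMPLE = """<ContestCollection>
  <Contest objectId="cc1"><Name>Congressional</Name></Contest>
  <Contest objectId="cc2"><Name>Congressional</Name></Contest>
  <Contest objectId="cc3"><Name>Governor</Name></Contest>
</ContestCollection>"""

print(duplicate_contest_names(ET.fromstring(SAMPLE)))
```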

Support default namespaces

Currently, validation fails if a default namespace is defined in the XML document.

When a namespace is defined, find fails to locate elements in the document, as element searches must include the namespace prefix.

Support default namespaces that are defined, for example, by including xmlns="http://www.votegis.com/schema/NISTV50.xsd" on the ElectionReport element.
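The behavior can be reproduced with the standard library's ElementTree, which, like lxml, stores a default namespace as a {uri} prefix on every tag. The sketch below shows one possible workaround: read the namespace from the root element and qualify the tag before searching. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the namespace URI embedded in an element's tag, or '' if none."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:tag.index("}")]
    return ""


def find_all(root, tag):
    """Find all descendants named tag, whether or not a default namespace is set."""
    namespace = namespace_of(root)
    qualified = "{%s}%s" % (namespace, tag) if namespace else tag
    return list(root.iter(qualified))


SAMPLE = """<ElectionReport xmlns="http://www.votegis.com/schema/NISTV50.xsd">
  <GpUnitCollection><GpUnit objectId="ru1"/></GpUnitCollection>
</ElectionReport>"""

root = ET.fromstring(SAMPLE)
print(root.find(".//GpUnit"))          # unqualified search misses the element
print(len(find_all(root, "GpUnit")))   # qualified search finds it
```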

More robust downloading of country-us.csv

Our validator depends on the python requests package, but does not do any sophisticated checking of SSL certs. If the version of Python or the requests package is sufficiently out-of-date, this can lead to the download failing. Because of the way the current code operates, a download failure still leads to country-us.csv being present, but empty. This means the whitelist of valid OCD-IDs is empty, and every check against the list returns false, regardless of whether the OCD-ID is actually valid.
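One way to avoid leaving an empty country-us.csv behind is to validate the downloaded bytes first and only then move them into place. The sketch below covers just that step, not the download itself; the function name and the "header plus at least one row" rule are assumptions.

```python
import os
import tempfile


def save_download(content, destination):
    """Write downloaded bytes to destination only if they look like a usable CSV."""
    lines = [line for line in content.splitlines() if line.strip()]
    # Assumption: a usable file has a header row and at least one data row.
    if len(lines) < 2:
        raise ValueError("downloaded OCD-ID list is empty or has no data rows")

    directory = os.path.dirname(os.path.abspath(destination))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
        # The rename replaces any previous file in one step, so readers
        # never see a partially written or empty list.
        os.replace(tmp_path, destination)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach a failed download raises an error and leaves any previously cached file untouched, instead of silently producing an empty list of valid OCD-IDs.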

Enable logging level on the command-line

This feature lets the user choose on the command line which logging levels are printed: info and worse (info, warning, and error), warning and worse (warning and error), or just error.
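The filtering could be sketched with argparse and the standard logging level constants. The --min_level flag name is purely hypothetical and is not an existing option of the validator.

```python
import argparse
import logging

LEVELS = {"info": logging.INFO, "warning": logging.WARNING, "error": logging.ERROR}


def parse_min_level(argv):
    """Parse a hypothetical --min_level flag and return the numeric threshold."""
    parser = argparse.ArgumentParser(prog="validator-sketch")
    parser.add_argument("--min_level", choices=sorted(LEVELS), default="info")
    return LEVELS[parser.parse_args(argv).min_level]


def visible(messages, min_level):
    """Keep only the (level, text) messages at or above the threshold."""
    return [text for level, text in messages if LEVELS[level] >= min_level]


messages = [
    ("info", "DuplicatedPartyAbbreviation"),
    ("warning", "ElectionStartDates"),
    ("error", "ElectionEndDates"),
]
print(visible(messages, parse_min_level(["--min_level", "warning"])))
```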

Fix OCD-ID Validation with election_results_xml_validator

User feedback:

Your validator reads the local csv file correctly but then does not find the ocd id in the array/object generated from the local csv file.

The problem is the comparison at https://github.com/google/election_results_xml_validator/blob/master/rules.py#L454, which compares
ocd-division/country:ee (a str, from the candidates XML) with

{b'ocd-division/country:ee', b'id'} (a set of bytes, from the local .csv file)

In Python 3 a str never equals a bytes object, so the membership test fails and the if branch is never taken. Without the b prefix in {b'ocd-division/country:ee', b'id'} it works well.
Example:

text = "ocd-division/country:ee"

ocds = {'id', 'ocd-division/country:ee'}

if text in ocds:
    print("in")
else:
    print("not in")

There is also a TODO here in your code https://github.com/google/election_results_xml_validator/blob/master/rules.py#L428

Your Code:
with io.open(countries_file, mode="rb") as fd:
    for line in fd:
        if line is not "":
            # TODO use a CSV Reader
            ocd_id_codes.add(line.split(b",")[0])

Replaced with:

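import csv

# Open in text mode so the IDs are read as str, not bytes.
with open(countries_file, newline="") as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    next(csv_reader, None)  # skip the header row
    for row in csv_reader:
        if row:  # ignore blank lines
            ocd_id_codes.add(row[0])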

Throw a warning if names are IN ALL CAPS

We've seen several feeds where the Name elements of Person, Candidate, or Contest elements are in all caps. These are generally not usable and should trigger a warning.
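A minimal sketch of the test itself, relying on str.isupper(), which is true only when a string contains at least one cased character and all cased characters are uppercase. The function name is hypothetical, and short names such as initials would also be flagged by this simple version.

```python
def is_all_caps(name):
    """Return True if every cased character in the name is uppercase."""
    return name.strip().isupper()


for name in ["JOHN SMITH", "John Smith", "U.S. SENATE", "12345"]:
    print(name, is_all_caps(name))
```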

Validator crashes on ExternalIdentifier missing a Value field

The code that checks for valid OCD-IDs in GpUnits referenced by Contests will crash if an ExternalIdentifier is missing a Value field. Technically not having a Value field means that the feed will fail XSD validation, but we should still be defensive against this error.
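A defensive accessor along these lines would avoid the crash. It is a sketch with a hypothetical helper name, shown with the standard library's ElementTree rather than the validator's actual code.

```python
import xml.etree.ElementTree as ET


def identifier_value(external_identifier):
    """Return the stripped Value text, or None if Value is missing or empty."""
    value = external_identifier.find("Value")
    if value is None or value.text is None:
        return None
    return value.text.strip() or None


SAMPLE = """<ExternalIdentifiers>
  <ExternalIdentifier><Type>ocd-id</Type><Value>ocd-division/country:us</Value></ExternalIdentifier>
  <ExternalIdentifier><Type>ocd-id</Type></ExternalIdentifier>
  <ExternalIdentifier><Type>ocd-id</Type><Value/></ExternalIdentifier>
</ExternalIdentifiers>"""

for identifier in ET.fromstring(SAMPLE).iter("ExternalIdentifier"):
    print(identifier_value(identifier))
```

A rule built on this helper can report the missing Value as its own error instead of raising an exception.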

Add support for JSON feeds

NIST is creating a JSON representation of the 1500-100 spec. Our validator should be able to:

  1. Include an instance of the JSON schema definition as opposed to an XSD.
  2. Use the above JSON schema definition to perform schema validation.
  3. Perform the same content validation as it currently does on XML feeds.

Call to deprecated method get_dir_contents()

Deprecated method warning

$ election_results_xml_validator validate samples/post_election_sample_feed_precincts.xml --xsd election_data_spec.xsd

--------- Results after validating file: samples/post_election_sample_feed_precincts.xml 
Validator version: 0.10.1.1
SHA-512/256 checksum: 0x2d06124ff9213a43c1a1c6c3eae3aba0de5b52f17e5404c273ca9228ed200ba7
/usr/local/lib/python3.7/site-packages/election_results_xml_validator/rules.py:475: DeprecationWarning: Call to deprecated method get_dir_contents. (
        Repository.get_dir_contents() is deprecated, use
        Repository.get_contents() instead.
        )
  dir_contents = self.github_repo.get_dir_contents(self.GITHUB_DIR)
     2 Error messages found
         1 GpUnitsHaveSingleRoot Error message
         1 ElectionEndDates Error message
     1 Warning message found
         1 ElectionStartDates Warning message
    12 Info messages found
         6 DuplicatedPartyAbbreviation Info messages
         6 MissingPartyAbbreviationTranslation Info messages
$ 

Support for 1500-100 v2

Add rules support for NIST 1500-100 v2 spec, currently pending official publishing. A non-exhaustive list of changes is available here.

I believe a great number of rules will work as-is, and others need to be modified only so that they can locate the target nodes.
