weso / cwr-dataapi
CWR-DataApi
License: MIT License
Check if ComponentRecord and AuthoredWorkRecord can be at least partially combined.
The grammar for the fields should be homogeneous: try to create all of them through a method with the same type of parameters (columns, compulsory, name, etc.).
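A minimal sketch of what such a homogeneous factory could look like; every name and parameter here is hypothetical, not the library's actual API:

```python
# Hypothetical sketch: every field kind is built through one entry point
# with the same signature (columns, name, compulsory), instead of each
# field type having its own creation method.

def build_field(field_type, columns, name, compulsory=False):
    """Create a field description from one shared set of parameters."""
    return {
        'type': field_type,
        'columns': columns,
        'name': name,
        'compulsory': compulsory,
    }

# All field kinds go through the same method:
title = build_field('alphanum', columns=60, name='Work Title', compulsory=True)
year = build_field('numeric', columns=4, name='Creation Year')
```

With a single signature like this, the grammar modules no longer need to know which concrete field class they are building.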
Right now the model instances are created in the same module that contains the rules.
Instead, a factory should be used to decouple the parser from the model.
The classes that parse into and from JSON should be redone, adding new tests for them.
At least during the tests, the parsing process seems to be slow. Try to make it as fast as possible.
In a similar way to the problem with CWRTables, the CWRConfiguration is not configurable.
While this problem is mitigated by the fact that it is meant to be set up by editing the configuration files, the class is still referenced directly from several modules.
The CWRConfiguration class should implement an interface or an abstract class, and then be accessed through a factory or a facade.
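A hedged sketch of the proposed decoupling, hiding the configuration behind an abstract interface and a factory; the class and key names below are illustrative, not the project's real API:

```python
# Illustrative sketch: modules depend on an abstract Configuration and a
# factory function instead of referring to CWRConfiguration directly.
from abc import ABC, abstractmethod


class Configuration(ABC):
    @abstractmethod
    def get(self, key):
        """Return the configuration value stored under key."""


class FileConfiguration(Configuration):
    """Concrete implementation backed by values read from a file."""

    def __init__(self, values):
        self._values = values

    def get(self, key):
        return self._values[key]


_default = None


def configuration_factory():
    """Return the shared Configuration instance, creating it lazily."""
    global _default
    if _default is None:
        # A real implementation would read the configuration files here.
        _default = FileConfiguration({'default_version': '2.1'})
    return _default
```

Swapping the implementation (for tests, or for a different configuration source) then only requires changing what the factory returns.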
The file encoding is indicated using:
'# -*- coding: utf-8 -*-'
Instead of:
'# -*- encoding: utf-8 -*-'
This should be corrected in all files.
Create a decoder which transforms a dictionary into a CWR class.
The dictionary parser should be checked with a Transmission containing several Groups.
The country code on the ISRC follows the ISO 3166-1-Alpha-2 standard, and should be validated against it.
Currently it is not doing so.
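A minimal sketch of the missing check; the small code set here is a truncated sample for the example only, and a real implementation should load the full ISO 3166-1 alpha-2 list (e.g. from the CWR tables):

```python
# Illustrative validation of the ISRC country prefix against
# ISO 3166-1 alpha-2. The set below is a truncated sample.
ISO_3166_ALPHA_2 = {'ES', 'GB', 'US', 'FR', 'DE'}


def valid_isrc_country(isrc):
    """Return True when the first two ISRC characters are a known code."""
    return len(isrc) >= 2 and isrc[:2].upper() in ISO_3166_ALPHA_2
```

Usage: `valid_isrc_country('ES-A2B-12-00001')` accepts the code, while a prefix such as 'XX' is rejected.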
Check if the Writer and Publisher records can be at least partially combined.
The CWRFileDecoder class only decodes the filename into a FileTag if it follows the new naming format.
It should try to decode the new format, but if it fails then it should try again with the old format. A second failure means the filename can't be decoded.
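The proposed fallback could be sketched as below; both regular expressions are simplified stand-ins (a new-format name with a four-digit sequence number, an old-format name with a two-digit one), not the exact CWR filename grammar:

```python
import re

# Simplified stand-ins for the new (4-digit sequence) and old (2-digit
# sequence) CWR filename formats; the real patterns are more detailed.
NEW_FORMAT = re.compile(r'^CW(\d{2})(\d{4})([A-Z0-9]{2,3})_([A-Z0-9]{2,3})\.V(\d{2})$')
OLD_FORMAT = re.compile(r'^CW(\d{2})(\d{2})([A-Z0-9]{2,3})_([A-Z0-9]{2,3})\.V(\d{2})$')


def decode_filename(filename):
    """Try the new naming format first, then fall back to the old one."""
    for pattern in (NEW_FORMAT, OLD_FORMAT):
        match = pattern.match(filename)
        if match:
            return match.groups()
    # Both formats failed: the filename cannot be decoded.
    raise ValueError('Unparseable CWR filename: %s' % filename)
```

A second failure raises, matching the rule that the filename can't be decoded at all.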
Create an encoder for JSON.
Can use the basic dictionary encoder for help.
There are many record constraints missing which should be added.
The CWR model classes should override the data model methods, such as the special methods __str__ or __repr__, where needed.
More information about this can be found at:
https://docs.python.org/2/reference/datamodel.html
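As an example of the kind of override meant here; the class and its fields are illustrative, not the actual CWR model:

```python
# Illustrative model class overriding the data-model special methods.
class FileTag(object):
    def __init__(self, year, sequence_n, sender, receiver, version):
        self.year = year
        self.sequence_n = sequence_n
        self.sender = sender
        self.receiver = receiver
        self.version = version

    def __str__(self):
        # Human-readable form, useful for logs and console output.
        return '%s (%s, %s) [v%s]' % (
            self.sender, self.year, self.sequence_n, self.version)

    def __repr__(self):
        # Unambiguous form, useful while debugging.
        return 'FileTag(%r, %r, %r, %r, %r)' % (
            self.year, self.sequence_n, self.sender,
            self.receiver, self.version)
```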
EntireWorkTitle and OriginalWorkTitle seem to be equivalent. Check, and remove one of them if so.
There seem to be several variations of the ISRC field. There must be a way to accept them all.
Otherwise, a text field should be used.
While there are classes in the model to represent transactions, right now these are grouped into a plain collection.
It is necessary to find out which of the two options would be better, checking both the simplest and the most complex cases of each type of transaction.
Some of the exceptions used in the grammar, mostly for field values, seem to be incorrectly initialized, and they will cause an 'unprintable exception' error when raised.
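A hedged sketch of the fix: the usual cause of an 'unprintable exception' is a subclass whose __init__ never calls Exception.__init__, leaving args empty, or whose __str__ itself raises. The exception name below is hypothetical:

```python
# Illustrative fix: always call the base constructor with the message,
# so args stays populated and str() on the exception never fails.
class FieldValidationError(Exception):
    """Hypothetical exception for invalid field values."""

    def __init__(self, message, field=None):
        super(FieldValidationError, self).__init__(message)
        self.field = field
```

With this pattern, `str(FieldValidationError('bad value'))` prints the message cleanly instead of failing inside the error report.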
Create a decoder for Mongo.
Can use the basic dictionary decoder for help.
AlternateTitleRecord and NATRecord are very similar. Check if they can be combined.
Right now the validation is added manually to the nodes.
There should be a system where a node is assigned a constraint identifier, and then the validation gets configured.
It should be noted that the validation configuration is composed of, at least, the following pieces (check the CWR and error specifications for the actual requirements):
They should be configurable for two reasons:
It would be better if this configuration were set in a file which could be easily modified.
Also, it should be possible to completely deactivate the validation system for testing purposes, or to swap it for another one.
Field factories should be fully configurable. For example, the adapters are right now hardcoded, but should be set through parameters or a configuration file.
The name of some fields may change from one model class to another. They should all have the same name.
Also, the field names should be as close to the specification as possible.
Right now only the first kind of CWR file is supported: the ones sent to the receiver for processing.
A second type, the acknowledgement file, is created from the first, indicating the results of processing the file.
While this is closely related to the validation process, the parser, the console printer, and any other piece using the model classes should support reading Acknowledgement files.
Check if NOWRecord and NPRRecord can be partially combined.
Controlled publishers stored in the CWR file represent a tree which indicates the relationship between them and the territories.
The model contains classes to build this tree, but currently it is not being done.
Make sure the parser takes care of this.
There is a problem with grammar fields where sometimes the base field name is reported instead of the current field's name.
For example, after creating a numeric field and naming it 'Field 1', its name may still be 'Numeric Field'. This seems to be a problem related to fields being a combination of optional rules.
The easiest way to solve this is to add a 'name' parameter to the field creation methods.
Executable Python files, such as tests, should have a shebang like: '#!/usr/bin/env python'
While CWRTables is useful for keeping all the CWR tables info and files in a single place, the parser may be too dependent on it.
Right now it is used only for the Lookup fields in the grammar.field.table module, which reduces the coupling, but this can be improved.
It should be possible to implement a custom instance of this class, so an interface or abstract base class should be defined, and an implementation should then be accessible through a factory or a facade.
When the same Interested Party appears several times in the file, a new instance containing all its data is created.
Instead, it may be a good idea to create a single instance for the Party and reuse it each time it appears.
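The reuse could be sketched as a registry keyed by the party identifier, returning the existing instance instead of building a new one; all names here are illustrative:

```python
# Illustrative registry: one shared object per Interested Party id,
# created on first use and reused afterwards.
class InterestedPartyRegistry(object):
    def __init__(self):
        self._parties = {}

    def get_or_create(self, ip_id, **data):
        """Return the stored party for ip_id, creating it on first use."""
        if ip_id not in self._parties:
            # A real implementation would build the model class here.
            self._parties[ip_id] = dict(id=ip_id, **data)
        return self._parties[ip_id]
```

With this, every record referring to the same party shares one object, so updates to that party are visible everywhere it appears.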
NATRecord and NRARecordWork are very similar. Maybe they can be partially combined.
After reading a CWR file and creating the model, it should be possible to save it back into a file.
Mostly this is required for generating Acknowledgement files.
On the distribution file there are .pyc files on the data folder.
They should be removed.
The CWRFile and FileTag classes should be parsed into dictionaries.
The Transmission Trailer rule should indicate that this record ends on a line end.
But this causes an error when reading a file that does not end with a line break after the trailer.
I've been unable to replicate this with tests, and for now the Transmission Trailer rule lacks the end of line requirement.
The dictionary parser should be checked with a Group containing several transactions.
Classes created from ValueEntity seem to be all the same. It may be possible to remove them all, using only the base ValueEntity.
Create an encoder for Mongo.
Can use the basic dictionary encoder for help.
A Github page, giving details about the project and its background, would help a lot to make the library usable for third parties.
Maybe Sphinx would help here?
AgreementTerritory and Territory are very similar. It may be possible to swap AgreementTerritory for Territory, removing the former in the process.
The current implementation was prepared for revision 3, and the current revision is 13. Check what has changed and apply the updates.
Add support for Transactions on the Dictionary encoders.
Move most of the CWR details to another document (maybe the Github page?), and use the wiki just as a quick guide for the library.
As parsing a file takes a long time, a logger would help to know if everything is going as expected.
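A minimal sketch of such progress logging with the standard logging module; the logger name, interval, and the trivial per-line work are illustrative:

```python
import logging

# Illustrative progress logging for a long parsing run; the logger name
# and the 1000-line reporting interval are arbitrary choices.
logger = logging.getLogger('cwr.parser')


def parse_lines(lines):
    """Stand-in for the parsing loop, reporting progress periodically."""
    results = []
    for i, line in enumerate(lines, start=1):
        results.append(line.strip())  # placeholder for the real parsing
        if i % 1000 == 0:
            logger.info('Parsed %d lines', i)
    logger.info('Finished: %d lines parsed', len(results))
    return results
```

With the logger configured at INFO level, a long run then periodically confirms that the parser is still making progress.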
Check the InterestedParty class.
Is it really needed? Should it be removed? Should it be used more often?
Some related classes which don't use it:
Create a decoder for JSON.
Can use the basic dictionary decoder for help.
This is not required, but it can be nice to add Jython support.
In practice, it would mean adding Jython to the Tox test environment. I've already tried doing so, but Travis was unable to run it (a problem with Java versions).
Still, some basic configuration, including a script, remains in the project, waiting for another try.
Right now Travis runs a few hundred tests. While it is necessary to add many more to check the parser, it is also necessary to somehow simplify the cases where the same groups of tests are repeated, as happens for example with the field grammar variations.
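One way to collapse repeated variations into a single parameterized case is unittest's subTest, assuming the suite runs on Python 3.4 or later (pytest's parametrize would be the equivalent there); the tested values below stand in for the real field parser:

```python
import unittest

# Illustrative parameterized test: one test method covers all the
# variations, with subTest reporting each failing case separately.
class TestNumericField(unittest.TestCase):
    CASES = [
        ('123', 123),
        ('007', 7),
        ('000', 0),
    ]

    def test_parses_valid_values(self):
        for text, expected in self.CASES:
            with self.subTest(text=text):
                # int() stands in for the real numeric field parser.
                self.assertEqual(int(text), expected)
```

Adding a new variation then means adding one tuple to CASES instead of a whole new test method.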
The leaveWhitespace() method is used too often. It should be required only on the fields; that is, only the terminal rules of the grammar should care about keeping whitespace.