weso / cwr-dataapi
CWR-DataApi
License: MIT License
Check if ComponentRecord and AuthoredWorkRecord can be at least partially combined.
The grammar for the fields should be homogeneous: try to create all of them through a method with the same type of parameters (columns, compulsory, name, etc.).
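A minimal sketch of what such a homogeneous factory could look like; every name and parameter here is hypothetical, not the library's actual API:

```python
# Hypothetical sketch: every field kind is built through one entry point
# with the same signature (columns, name, compulsory), instead of each
# field type having its own creation method.

def build_field(field_type, columns, name, compulsory=False):
    """Create a field description from one shared set of parameters."""
    return {
        'type': field_type,
        'columns': columns,
        'name': name,
        'compulsory': compulsory,
    }

# All field kinds go through the same method:
title = build_field('alphanum', columns=60, name='Work Title', compulsory=True)
year = build_field('numeric', columns=4, name='Creation Year')
```

With a single signature like this, the grammar modules no longer need to know which concrete field class they are building.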
Right now the model instances are created in the same module that contains the rules.
Instead, a factory should be used to decouple the parser from the model.
The classes that parse into and from JSON should be redone, adding new tests for them.
At least during the tests, the parsing process seems to be slow. Try to make it as fast as possible.
In a similar way to the problem with CWRTables, the CWRConfiguration is not configurable.
While this problem is mitigated by the fact that it is meant to be set up by editing the configuration files, the class is still referenced directly from several modules.
The CWRConfiguration class should implement an interface or an abstract class, and then be accessed through a factory or a facade.
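A hedged sketch of the proposed decoupling, hiding the configuration behind an abstract interface and a factory; the class and key names below are illustrative, not the project's real API:

```python
# Illustrative sketch: modules depend on an abstract Configuration and a
# factory function instead of referring to CWRConfiguration directly.
from abc import ABC, abstractmethod


class Configuration(ABC):
    @abstractmethod
    def get(self, key):
        """Return the configuration value stored under key."""


class FileConfiguration(Configuration):
    """Concrete implementation backed by values read from a file."""

    def __init__(self, values):
        self._values = values

    def get(self, key):
        return self._values[key]


_default = None


def configuration_factory():
    """Return the shared Configuration instance, creating it lazily."""
    global _default
    if _default is None:
        # A real implementation would read the configuration files here.
        _default = FileConfiguration({'default_version': '2.1'})
    return _default
```

Swapping the implementation (for tests, or for a different configuration source) then only requires changing what the factory returns.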
The file encoding is indicated using:
'# -*- coding: utf-8 -*-'
Instead of:
'# -*- encoding: utf-8 -*-'
This should be corrected in all files.
Create a decoder which transforms a dictionary into a CWR class.
The dictionary parser should be checked with a Transmission containing several Groups.
The country code on the ISRC follows the ISO 3166-1-Alpha-2 standard, and should be validated against it.
Currently it is not doing so.
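A minimal sketch of the missing check; the small code set here is a truncated sample for the example only, and a real implementation should load the full ISO 3166-1 alpha-2 list (e.g. from the CWR tables):

```python
# Illustrative validation of the ISRC country prefix against
# ISO 3166-1 alpha-2. The set below is a truncated sample.
ISO_3166_ALPHA_2 = {'ES', 'GB', 'US', 'FR', 'DE'}


def valid_isrc_country(isrc):
    """Return True when the first two ISRC characters are a known code."""
    return len(isrc) >= 2 and isrc[:2].upper() in ISO_3166_ALPHA_2
```

Usage: `valid_isrc_country('ES-A2B-12-00001')` accepts the code, while a prefix such as 'XX' is rejected.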
Check if the Writer and Publisher records can be at least partially combined.
The CWRFileDecoder class only decodes the filename into a FileTag if it follows the new naming format.
It should try to decode the new format, but if it fails then it should try again with the old format. A second failure means the filename can't be decoded.
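The proposed fallback could be sketched as below; both regular expressions are simplified stand-ins (a new-format name with a four-digit sequence number, an old-format name with a two-digit one), not the exact CWR filename grammar:

```python
import re

# Simplified stand-ins for the new (4-digit sequence) and old (2-digit
# sequence) CWR filename formats; the real patterns are more detailed.
NEW_FORMAT = re.compile(r'^CW(\d{2})(\d{4})([A-Z0-9]{2,3})_([A-Z0-9]{2,3})\.V(\d{2})$')
OLD_FORMAT = re.compile(r'^CW(\d{2})(\d{2})([A-Z0-9]{2,3})_([A-Z0-9]{2,3})\.V(\d{2})$')


def decode_filename(filename):
    """Try the new naming format first, then fall back to the old one."""
    for pattern in (NEW_FORMAT, OLD_FORMAT):
        match = pattern.match(filename)
        if match:
            return match.groups()
    # Both formats failed: the filename cannot be decoded.
    raise ValueError('Unparseable CWR filename: %s' % filename)
```

A second failure raises, matching the rule that the filename can't be decoded at all.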
Create an encoder for JSON.
Can use the basic dictionary encoder for help.
There are many record constraints missing which should be added.
The CWR model classes should override the data model methods, such as the special methods __str__ or __repr__, where needed.
More information about this can be found at:
https://docs.python.org/2/reference/datamodel.html
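As an example of the kind of override meant here; the class and its fields are illustrative, not the actual CWR model:

```python
# Illustrative model class overriding the data-model special methods.
class FileTag(object):
    def __init__(self, year, sequence_n, sender, receiver, version):
        self.year = year
        self.sequence_n = sequence_n
        self.sender = sender
        self.receiver = receiver
        self.version = version

    def __str__(self):
        # Human-readable form, useful for logs and console output.
        return '%s (%s, %s) [v%s]' % (
            self.sender, self.year, self.sequence_n, self.version)

    def __repr__(self):
        # Unambiguous form, useful while debugging.
        return 'FileTag(%r, %r, %r, %r, %r)' % (
            self.year, self.sequence_n, self.sender,
            self.receiver, self.version)
```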
EntireWorkTitle and OriginalWorkTitle seem to be equivalent. Check, and remove one of them if so.
There seem to be several variations of the ISRC field. There must be a way to accept them all.
Otherwise, a text field should be used.
While there are classes in the model to represent transactions, right now these are grouped into a plain collection.
It is necessary to find out which of the two options would be better, checking both the simplest and the most complex cases of each type of transaction.
Some of the exceptions used in the grammar, mostly for field values, seem to be incorrectly initialized, and they will cause an 'unprintable exception' error when raised.
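A hedged sketch of the fix: the usual cause of an 'unprintable exception' is a subclass whose __init__ never calls Exception.__init__, leaving args empty, or whose __str__ itself raises. The exception name below is hypothetical:

```python
# Illustrative fix: always call the base constructor with the message,
# so args stays populated and str() on the exception never fails.
class FieldValidationError(Exception):
    """Hypothetical exception for invalid field values."""

    def __init__(self, message, field=None):
        super(FieldValidationError, self).__init__(message)
        self.field = field
```

With this pattern, `str(FieldValidationError('bad value'))` prints the message cleanly instead of failing inside the error report.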
Create a decoder for Mongo.
Can use the basic dictionary decoder for help.
AlternateTitleRecord and NATRecord are very similar. Check if they can be combined.
Right now the validation is added manually to the nodes.
There should be a system where a node is assigned a constraint identifier, and then the validation gets configured.
It should be noted that the validation configuration is composed of, at least, the following pieces (check the CWR and error specifications for the actual requirements):
They should be configurable for two reasons:
It would be better if this configuration were set in a file which could be easily modified.
Also, it should be possible to completely deactivate the validation system for testing purposes, or to swap it for another one.
Field factories should be fully configurable. For example, the adapters are right now hardcoded, but should be set through parameters or a configuration file.
The name of some fields may change from one model class to another. They should all have the same name.
Also, the field names should be as close to the specification as possible.
Right now only the first kind of CWR file is supported: the ones sent to the receiver for processing.
A second type, the acknowledgement file, is created from the first, indicating the results of processing the file.
While this is closely related to the validation process, the parser, the console printer, and any other piece using the model classes should support reading Acknowledgement files.
Check if NOWRecord and NPRRecord can be partially combined.
Controlled publishers stored in the CWR file represent a tree which indicates the relationship between them and the territories.
The model contains classes to build this tree, but currently it is not being done.
Make sure the parser takes care of this.
There is a problem with grammar fields where sometimes the base field name is reported instead of the current field's name.
For example, after creating a numeric field and naming it 'Field 1', its name may still be 'Numeric Field'. This seems to be a problem related to fields being a combination of optional rules.
The easiest way to solve this is to add a 'name' parameter to the field creation methods.
Executable Python files, such as tests, should have a shebang like: '#!/usr/bin/env python'
While CWRTables is useful for keeping all the CWR tables info and files in a single place, the parser may be too dependent on it.
Right now it is used only for the Lookup fields in the grammar.field.table module, which reduces the coupling, but this can be improved.
It should be possible to implement a custom instance of this class, so an interface or abstract base class should be defined, and an implementation should then be accessible through a factory or a facade.
When the same Interested Party appears several times in the file, a new instance containing all its data is created.
Instead, it may be a good idea to create a single instance for the Party and reuse it each time it appears.
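The reuse could be sketched as a registry keyed by the party identifier, returning the existing instance instead of building a new one; all names here are illustrative:

```python
# Illustrative registry: one shared object per Interested Party id,
# created on first use and reused afterwards.
class InterestedPartyRegistry(object):
    def __init__(self):
        self._parties = {}

    def get_or_create(self, ip_id, **data):
        """Return the stored party for ip_id, creating it on first use."""
        if ip_id not in self._parties:
            # A real implementation would build the model class here.
            self._parties[ip_id] = dict(id=ip_id, **data)
        return self._parties[ip_id]
```

With this, every record referring to the same party shares one object, so updates to that party are visible everywhere it appears.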
NATRecord and NRARecordWork are very similar. Maybe they can be partially combined.
After reading a CWR file and creating the model, it should be possible to save it back into a file.
Mostly this is required for generating Acknowledgement files.
On the distribution file there are .pyc files on the data folder.
They should be removed.
The CWRFile and FileTag classes should be parsed into dictionaries.
The Transmission Trailer rule should indicate that this record ends on a line end.
But this causes an error when reading a file that does not end with a line break after the trailer.
I've been unable to replicate this with tests, and for now the Transmission Trailer rule lacks the end of line requirement.
The dictionary parser should be checked with a Group containing several transactions.
Classes created from ValueEntity seem to be all the same. It may be possible to remove them all, using only the base ValueEntity.
Create an encoder for Mongo.
Can use the basic dictionary encoder for help.
A Github page, giving details about the project and its background, would help a lot to make the library usable for third parties.
Maybe Sphinx would help here?
AgreementTerritory and Territory are very similar. It may be possible to swap AgreementTerritory for Territory, removing the former in the process.
The current implementation was prepared for revision 3, and the current revision is 13. Check what has changed and apply the updates.
Add support for Transactions on the Dictionary encoders.
Move most of the CWR details to another document (maybe the Github page?), and use the wiki just as a quick guide for the library.
As parsing a file takes a long time, a logger would help to know if everything is going as expected.
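A minimal sketch of such progress logging with the standard logging module; the logger name, interval, and the trivial per-line work are illustrative:

```python
import logging

# Illustrative progress logging for a long parsing run; the logger name
# and the 1000-line reporting interval are arbitrary choices.
logger = logging.getLogger('cwr.parser')


def parse_lines(lines):
    """Stand-in for the parsing loop, reporting progress periodically."""
    results = []
    for i, line in enumerate(lines, start=1):
        results.append(line.strip())  # placeholder for the real parsing
        if i % 1000 == 0:
            logger.info('Parsed %d lines', i)
    logger.info('Finished: %d lines parsed', len(results))
    return results
```

With the logger configured at INFO level, a long run then periodically confirms that the parser is still making progress.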
Check the InterestedParty class.
Is it really needed? Should it be removed? Should it be used more often?
Some related classes which don't use it:
Create a decoder for JSON.
Can use the basic dictionary decoder for help.
This is not required, but it can be nice to add Jython support.
In practice, it would mean adding Jython to the Tox test environment. I've already tried doing so, but Travis was unable to run it (a problem with Java versions).
Still, some basic configuration, including a script, remains in the project, waiting for another try.
Right now Travis runs a few hundred tests. While it is necessary to add many more to check the parser, it is also necessary to somehow simplify the cases where the same groups of tests are repeated, as happens for example with the field grammar variations.
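One way to collapse repeated variations into a single parameterized case is unittest's subTest, assuming the suite runs on Python 3.4 or later (pytest's parametrize would be the equivalent there); the tested values below stand in for the real field parser:

```python
import unittest

# Illustrative parameterized test: one test method covers all the
# variations, with subTest reporting each failing case separately.
class TestNumericField(unittest.TestCase):
    CASES = [
        ('123', 123),
        ('007', 7),
        ('000', 0),
    ]

    def test_parses_valid_values(self):
        for text, expected in self.CASES:
            with self.subTest(text=text):
                # int() stands in for the real numeric field parser.
                self.assertEqual(int(text), expected)
```

Adding a new variation then means adding one tuple to CASES instead of a whole new test method.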
The leaveWhitespace() method is used too often. It should be required only on the fields; that is, only the terminal rules of the grammar should care about keeping whitespace.