
tableschema-php's People

Contributors

courtney-miles, orihoch, pwalsh, roll


tableschema-php's Issues

Getting started

Description

This issue documents the initial steps to get started with a new Frictionless Data implementation.

Tasks

  • Travis configuration
  • Coveralls configuration
  • Basic setup of README with badges
  • Basic setup of license
  • Review the whole family of specifications
  • Review the implementation notes
  • Review either the Python (Data Package Table Schema) or JavaScript (Data Package Table Schema) reference implementations (whichever language you feel most comfortable reading)
    • Note that we have high nineties test coverage on these libraries. Similar test coverage is expected here
  • Review the stack reference
  • Review the blog post that announces v1 of the specifications
  • Review the test packages that can be used to test your work (in addition to the normal and expected unit tests)
  • Review the OKFN Coding Standards
    • Parts of the coding standard are language specific, and parts are workflow specific. The workflow points are important for you. If you want to contribute language-related standards to our docs for your language, we welcome it!
  • Write a set of high-level issues for each library, on the respective issue tracker, that outline the work plan
    • Note the structure of this issue: A narrative description and a specific list of tasks. Follow a similar pattern
    • The sequence of work is important: start with the Table Schema library first, as the Data Package library has a direct dependency on it
  • Note the communication protocol for this work: all communication around the implementation must be in public. We want the way we work here to serve as an example for other implementors, and to share as much information as possible. There are two channels of communication:
  • Note the staff from OKI who are here to support you:
    • Jo Barratt - Project Manager for Frictionless Data
    • Evgeny Karev - Tech Lead for core Frictionless Data libraries
    • Serah Rono - Developer Advocate at Open Knowledge International, Dissemination Lead for the Tool Fund
    • Dan Fowler - Developer Advocate at Open Knowledge International, Pilot Lead for Frictionless Data, OKI Labs Lead
    • Adam Kariv - Engineering Lead at Open Knowledge International, Tech Lead on OpenSpending
    • Paul Walsh - Chief Product Officer at Open Knowledge International
  • Any communication around the grant agreement should be directly done by email with Jo Barratt, Frictionless Data Project Manager

Table should provide more read options

feature requests from @roll (See original request in #25)

Table class should provide more read options:

  • option to get uncast data
    • e.g. table.read(cast=false) // list of strings
    • It allows working with malformed data sources and validating them, e.g. field-based, with custom error handling.
  • I think we need to provide documentation on how, e.g., read(limit=10) could be achieved
  • Based on the README, only keyed rows are emitted. Python and JavaScript also support:
    • default rows ['value1', 'value2', ...] - especially useful with malformed data and cast=false (because a header-to-value map doesn't work in this case)
    • extended rows [1, ['header1', 'header2'], ['value1', 'value2']] - to get the row number
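To make the three row shapes concrete, here is a minimal sketch assuming a row is already available as a list of raw (uncast) strings. `toRow()` is a hypothetical helper for illustration only, not part of the tableschema-php API.

```php
<?php
// Hypothetical helper (not part of tableschema-php): turns one raw row
// into one of the three shapes discussed above.
function toRow(int $rowNumber, array $headers, array $values, string $shape)
{
    switch ($shape) {
        case 'keyed':
            // header => value map; needs exactly one value per header
            return array_combine($headers, $values);
        case 'default':
            // plain list of values; still works for malformed rows
            return $values;
        case 'extended':
            // [row number, headers, values]
            return [$rowNumber, $headers, $values];
        default:
            throw new InvalidArgumentException("Unknown row shape: $shape");
    }
}

// e.g. toRow(2, ['id', 'name'], ['1', 'english'], 'extended')
// returns [2, ['id', 'name'], ['1', 'english']]
```

The "default" and "extended" shapes never pair headers with values, which is why they remain usable when a row has too many or too few cells.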

Schema class should support casting and validating a row of data

use frictionlessdata\tableschema\Schema;

$schema = new Schema((object)[
    "fields" => [
        (object)["name" => "id", "type" => "integer"],
        (object)["name" => "height", "type" => "integer"]
    ]
]);

$row = $schema->castRow(["id" => "123", "height" => "456"]);
// $row == ["id" => 123, "height" => 456]

$rowValidationResult = $schema->validateRow(["id" => "abc", "height" => "def"]);
if ($rowValidationResult->isValid()) {
  print_r($rowValidationResult->row);
} else {
  print_r($rowValidationResult->getMessages());
}

implementation notes

  • behind the scenes - should use the Field cast/validate value function, see #11
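A minimal sketch of how castRow/validateRow could be built on per-field casting, as the implementation note suggests. The descriptors are plain arrays, `castValue()` and `validateRow()` are hypothetical stand-ins, and only the "integer" and "string" types are handled; this is not the library's implementation.

```php
<?php
// Hypothetical per-field cast: only "integer" and "string" are handled here;
// the Table Schema spec defines many more types.
function castValue(array $field, string $value)
{
    switch ($field['type']) {
        case 'integer':
            $int = filter_var($value, FILTER_VALIDATE_INT);
            if ($int === false) {
                throw new InvalidArgumentException("{$field['name']}: '$value' is not an integer");
            }
            return $int;
        case 'string':
            return $value;
        default:
            throw new InvalidArgumentException("{$field['name']}: unsupported type {$field['type']}");
    }
}

// Hypothetical row validation: cast every field and collect all error
// messages instead of stopping at the first failure.
function validateRow(array $fields, array $row): array
{
    $cast = [];
    $errors = [];
    foreach ($fields as $field) {
        try {
            $cast[$field['name']] = castValue($field, $row[$field['name']] ?? '');
        } catch (InvalidArgumentException $e) {
            $errors[] = $e->getMessage();
        }
    }
    return ['valid' => count($errors) === 0, 'row' => $cast, 'errors' => $errors];
}
```

Collecting every error rather than throwing on the first one matches the validateRow() example above, where the caller decides what to print.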

Table API feedback

Overview

Based on this README listing, I'm adding feedback based on existing implementations and the expected competencies of library users (as we target many almost non-technical users: publishers, data wranglers, etc.).

// iterate over a remote data source conforming to a table schema
$table = new tableschema\Table(
    new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv"), 
    new tableschema\Schema("http://www.example.com/data-schema.json")
);
foreach ($table as $person) {
    print($person["first_name"]." ".$person["last_name"]);
}

// infer schema of a remote data source
$dataSource = new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv");
$schema = new tableschema\InferSchema();
$table = new tableschema\Table($dataSource, $schema);
foreach ($table as $row) {
    var_dump($row); // row will be in inferred native values
    var_dump($schema->descriptor()); // will contain the inferred schema descriptor
    // the more iterations you make, the more accurate the inferred schema might be
    // once you are satisfied with the schema, lock it
    $rows = $schema->lock();
    // it returns all the rows received until the lock, cast to the final inferred schema
    // you may now continue to iterate over the rest of the rows
};

Is it possible to hide data source and schema creation inside the Table class?

As a {USER} I'd prefer to just write $table = new tableschema\Table('data-path.csv', 'schema-path.json'); instead of creating the data source and schema myself. This is especially relevant if you don't know until runtime what kind of data source you have, e.g. new tableschema\Table('data-path.csv-or-xls') (we don't support Excel here, but as an example). In this case, $table.schema should be exposed.

Infer schema if schema argument is just omitted?

As a {USER} I'd prefer to just write $table = new tableschema\Table('data.csv'); without a schema argument and have the schema inferred, instead of dealing with the additional tableschema\InferSchema(); class.

Provide headers?

As a {USER} I'd like to have a $table.headers property (it's a new but useful property in the https://github.com/frictionlessdata/implementations reference).

Provide save method?

$table.save('data.csv') is a useful method in addition to $table.schema.save('schema.json'), and it could be reused at the data package level to save a data package (e.g. as a zip).

Option to not cast data?

It has been an often-requested feature:

table.read(cast=false) // list of strings

It allows working with malformed data sources and validating them, e.g. field-based, with custom error handling.


Related to using the Iterator interface as the core of Table:

  • It seems cool, but I have some comments
  • I think we need to provide documentation on how, e.g., read(limit=10) could be achieved
  • Based on the README, only keyed rows are emitted. Python and JavaScript also support:
    • default rows ['value1', 'value2', ...] - especially useful with malformed data and cast=false (because a header-to-value map doesn't work in this case)
    • extended rows [1, ['header1', 'header2'], ['value1', 'value2']] - to get the row number
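One way read(limit=10) could be documented, assuming the table object implements PHP's \Iterator interface: wrap it in SPL's LimitIterator. The ArrayIterator below is only a stand-in for a real table.

```php
<?php
// Stand-in for a Table: any \Iterator yielding keyed rows works the same way.
$rows = new ArrayIterator([
    ['id' => 1, 'name' => 'a'],
    ['id' => 2, 'name' => 'b'],
    ['id' => 3, 'name' => 'c'],
]);

// Equivalent of read(limit=2): LimitIterator(iterator, offset, limit)
// yields at most 2 rows starting from the first one.
$firstTwo = iterator_to_array(new LimitIterator($rows, 0, 2), false);
```

With a table object, the same idea reads as `foreach (new LimitIterator($table, 0, 10) as $row) { ... }`, so no dedicated limit option is strictly needed.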

Drop support for PHP 5?

Hi @OriHoch @roll @lwinfree,

Are we prepared to drop support for PHP 5 yet?

My vote is to drop support for PHP 5 -- any software still running on PHP 5 can still use version 0.1.9.


Background:

In #38 I initially dropped support for PHP 5 to be able to upgrade to Carbon 2. However, there was a suggestion at the time that ongoing support for PHP 5 was preferred, and the PR went stale.

There is now another PR to upgrade Carbon in #43

If we don't drop support for PHP 5, then this package cannot be used in software that also uses Carbon and uses PHP 7.

There's another irksome PHP compatibility issue with #42, where PHP 7.4 now throws \Error instead of \Exception. The ideal solution would be to catch \Throwable, but that change would not be compatible with PHP 5.

Cheers
Courtney
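For context on the \Throwable point: on PHP 7+, both \Exception and \Error implement \Throwable, so one catch block covers both, whereas PHP 5 has no \Throwable. A small sketch with a hypothetical `runSafely()` helper:

```php
<?php
// Hypothetical helper: runs a callable and reports which failure occurred.
// catch (\Throwable) catches both \Exception and \Error on PHP 7+;
// a PHP 5-compatible catch (\Exception) would miss \Error (e.g. TypeError).
function runSafely(callable $fn): string
{
    try {
        $fn();
        return 'ok';
    } catch (\Throwable $e) {
        return get_class($e);
    }
}
```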

Schema API feedback

Overview

Based on this README listing, I'm adding feedback based on existing implementations and the expected competencies of library users (as we target many almost non-technical users: publishers, data wranglers, etc.).

// validate and cast a row according to schema
$schema = new Schema('{"fields": [{"name": "id", "type": "integer"}]}');
$row = $schema->castRow(["id" => "1"]);
// raise exception if row fails validation
// returns row with all native values

// EditableSchema extends the Schema object with editing capabilities
$schema = new EditableSchema();
// set fields
$schema->fields([
    "id" => FieldsFactory::field((object)["name" => "id", "type" => "integer"])
]);
// remove field
$schema->removeField("age");
// edit primaryKey
$schema->primaryKey(["id"]);

// after every change - schema is validated and will raise Exception in case of validation errors
// finally, you can save the schema to a json file
$schema->save("my-schema.json");

Schema vs EditableSchema?

Consider that, as a {DATA WRANGLER}, I create $schema = new Schema('{"fields": [{"name": "id", "type": "integer"}]}'); in a REPL and then decide to edit it. Having to deal with two types of schema may not be very user-friendly. In other languages, editable objects don't have this separation.

Accept PHP array?

This applies to all Table/Schema/Field APIs.

For now, a schema descriptor must be either an object, a JSON string, or a URL/path:

Schema objects can be constructed using any of the following:

php object
string containing json
string containing a value supported by file_get_contents

I'm not sure whether I understand PHP correctly here, but why can't we write (using the Field example):

FieldsFactory::field(["name" => "id", "type" => "integer"])

instead of casting to object first:

FieldsFactory::field((object)["name" => "id", "type" => "integer"])

I suppose it could be handled inside the class (by casting to an object), which would reduce usage errors.
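A minimal sketch of that internal conversion, using a hypothetical `normalizeDescriptor()` helper. Note that a plain `(object)` cast is shallow (nested arrays stay arrays), whereas a JSON round-trip converts associative arrays at every depth while list arrays such as "fields" remain arrays.

```php
<?php
// Hypothetical helper: accept an array or an object descriptor and
// normalise it to nested stdClass objects.
function normalizeDescriptor($descriptor)
{
    if (is_array($descriptor)) {
        // JSON round-trip: associative arrays become stdClass at any depth,
        // list arrays (e.g. "fields") stay PHP arrays
        return json_decode(json_encode($descriptor));
    }
    return $descriptor;
}
```

One caveat of this approach: an empty array `[]` encodes as a JSON list and therefore stays an array rather than becoming an empty object.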

should have a Table class that provides services for working with a data source

use frictionlessdata\tableschema\Table;

$datapackage_descriptor = json_decode(file_get_contents("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/master/datasets/country-codes/datapackage.json"));
$schema_descriptor = $datapackage_descriptor->resources[0]->schema;
$table = new Table("https://raw.github.com/datasets/country-codes/master/data/country-codes.csv", $schema_descriptor);
foreach ($table->read() as $row) {
  print($row["name"]);
};

Table class should support inferring schema from data source

use frictionlessdata\tableschema\DataSources\CsvDataSource;
use frictionlessdata\tableschema\InferSchema;
use frictionlessdata\tableschema\Table;

$dataSource = new CsvDataSource("https://raw.github.com/datasets/country-codes/master/data/country-codes.csv");
$schema = new InferSchema();
$table = new Table($dataSource, $schema);

// iterate over as many rows as you want, as you iterate the schema will become more accurate and will raise exceptions if rows invalidate the inferred schema

$numRows = 0;
foreach ($table as $row) {
    if (++$numRows > 10) break;
};

$inferredDescriptor = $schema->descriptor();

[Task] investigate possible schema inconsistency

According to the Table Schema JSON schema, foreignKeys[0].reference.resource is required and must be in uri format.

However, according to the description of resource, it may also be an empty string, which refers to the resource itself.

This does not pass schema validation.

For now, I implemented a hack that allows it to pass validation, but I need to determine the proper way to fix it.

    "foreignKeys": {
     // ...
          "reference": {
            "type": "object",
            "required": [
              "resource",
              "fields"
            ],
            "properties": {
              "resource": {
                "type": "string",
                "format": "uri"
              },

Update development packages to reflect PHP 7.1 requirements

Although PHP 7.1 is the minimum supported version for this package, the development packages are locked to versions that target older versions of PHP.

Specifically, these changes are required:

  • Upgrade phpunit/phpunit to a version compatible with PHP 7.1
  • Replace satooshi/php-coveralls with php-coveralls/php-coveralls.

Provide a Docker configuration

Provide a Docker configuration to allow developers to contribute without having to change their local environment.

Notably, it's currently a problem to run the tests with PHP 8. A Docker image will let us run the tests without having to downgrade.

Primary Key is not enforced

This package allows a primary key to be set, but does not enforce the behaviour as per the specification:

A primary key is a field or set of fields that uniquely identifies each row in
the table. Per SQL standards, the fields cannot be null, so their use in the
primary key is equivalent to adding required: true to their
constraints.

The expectation is that if a table contains a row with a duplicate primary key, an exception should be thrown.

Additionally, an exception should also be thrown if a field that is part of the primary key returns null.
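A minimal sketch of both checks, assuming rows have already been cast to native values. `assertPrimaryKey()` is a hypothetical function, not the package's API.

```php
<?php
// Hypothetical sketch: enforce a (possibly composite) primary key over cast rows.
// Throws if any key field is null or if a key value repeats.
function assertPrimaryKey(array $primaryKey, iterable $rows): void
{
    $seen = [];
    foreach ($rows as $rowNumber => $row) {
        $keyValues = [];
        foreach ($primaryKey as $fieldName) {
            $value = $row[$fieldName] ?? null;
            if ($value === null) {
                throw new UnexpectedValueException("Row $rowNumber: primary key field '$fieldName' is null");
            }
            $keyValues[] = $value;
        }
        // serialize() keeps types apart, so 1 and "1" count as different keys
        $key = serialize($keyValues);
        if (isset($seen[$key])) {
            throw new UnexpectedValueException("Row $rowNumber: duplicate primary key");
        }
        $seen[$key] = true;
    }
}
```

The trade-off of this design is memory: every key seen so far is kept, so very large tables would need a different strategy (for example, checking uniqueness in a database).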

Field object should support casting and validating values

use frictionlessdata\tableschema\Field;
use frictionlessdata\tableschema\FieldCastValueError;

$field = new Field((object)["name" => "id", "type" => "integer"]);
try {
  $value = $field->castValue("5"); // casts to integer
  // $value === 5
} catch (FieldCastValueError $e) {
  // value failed validation
}

Table class should support validation of data source

use frictionlessdata\tableschema\Table;
use frictionlessdata\tableschema\InferSchema;
use frictionlessdata\tableschema\DataSources\CsvDataSource;
use frictionlessdata\tableschema\TableValidationError;
use frictionlessdata\tableschema\TableValidator;

// will infer a schema and validate against it
// will raise exception in case of problem inferring the schema
$schema = new InferSchema();
$dataSource = new CsvDataSource("https://raw.github.com/datasets/country-codes/master/data/country-codes.csv");
$table = new Table($dataSource, $schema);
foreach ($table->read() as $row) {
  // will raise Exception if a row's data doesn't match the schema
  // $row data is cast to correct native value according to schema
};

implementation notes

  • behind the scenes - should use the Schema validateRow function, see #9

Enforce lowest compatible versions of dependencies

composer.json allows some earlier package versions that have fatal bugs or are incompatible with some versions of PHP.

The lowest compatible version for each dependency should be updated to ensure this package will function correctly.

PHP 8.1

Overview

Given that PHP 8.0 reaches end of life in November 2023, I'd like to know whether there's any ongoing effort to make this library compatible with PHP 8.1.

Below are the results of running PHPStan against the src directory. Since I'm working on a project that uses this library, I'd be more than happy to help out with this effort.

 ------ ----------------------------------------------------------------------------------- 
  Line   Fields/BaseField.php                                                               
 ------ ----------------------------------------------------------------------------------- 
  :101   Unsafe usage of new static().                                                      
         💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
  :118   Unsafe usage of new static().
         💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
 ------ ----------------------------------------------------------------------------------- 

 ------ ----------------------------------------------------------------------------------- 
  Line   Schema.php                                                                         
 ------ ----------------------------------------------------------------------------------- 
  :76    Unsafe usage of new static().                                                      
         💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
 ------ ----------------------------------------------------------------------------------- 

 ------ -------------------------------------------------------------------------------------------- 
  Line   SchemaValidator.php                                                                         
 ------ -------------------------------------------------------------------------------------------- 
  :28    Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$descriptor.  
  :29    Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$errors.      
  :37    Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$errors.      
  :52    Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$errors.      
  :59    Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$descriptor.  
  :78    Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$descriptor.  
 ------ -------------------------------------------------------------------------------------------- 

 ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------ 
  Line   Table.php                                                                                                                                                   
 ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------ 
  :57    Unsafe usage of new static().                                                                                                                               
         💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
  :163   Return type mixed of method frictionlessdata\tableschema\Table::current() is not covariant with tentative return type mixed of method Iterator::current().
         💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
  :216   Return type mixed of method frictionlessdata\tableschema\Table::rewind() is not covariant with tentative return type void of method Iterator::rewind().
         💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
  :226   Return type mixed of method frictionlessdata\tableschema\Table::key() is not covariant with tentative return type mixed of method Iterator::key().
         💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
  :231   Return type mixed of method frictionlessdata\tableschema\Table::next() is not covariant with tentative return type void of method Iterator::next().
         💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
  :238   Return type mixed of method frictionlessdata\tableschema\Table::valid() is not covariant with tentative return type bool of method Iterator::valid().
         💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
 ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------ 

EDIT: Adding deprecated warnings issued by PHP 8.1:

PHP Deprecated:  Return type of frictionlessdata\tableschema\Table::current() should either be compatible with Iterator::current(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 163

PHP Deprecated:  Return type of frictionlessdata\tableschema\Table::next() should either be compatible with Iterator::next(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 231

PHP Deprecated:  Return type of frictionlessdata\tableschema\Table::key() should either be compatible with Iterator::key(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 226

PHP Deprecated:  Return type of frictionlessdata\tableschema\Table::valid() should either be compatible with Iterator::valid(): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 238

PHP Deprecated:  Return type of frictionlessdata\tableschema\Table::rewind() should either be compatible with Iterator::rewind(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 216
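One possible way to silence these notices while keeping PHP 7.1 support, shown on a hypothetical `RowIterator` class rather than the real Table: `void` and `bool` return types are available since PHP 7.1, while `mixed` requires PHP 8.0, so current() and key() use the #[\ReturnTypeWillChange] attribute instead. Before PHP 8.0, a line starting with `#` is a comment, so the attribute is harmless there.

```php
<?php
// Hypothetical minimal iterator (not tableschema-php's Table) whose method
// signatures satisfy PHP 8.1's tentative Iterator return types while still
// parsing on PHP 7.1.
class RowIterator implements Iterator
{
    private $rows;
    private $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    // "mixed" needs PHP 8.0, so suppress the notice instead of typing it
    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->rows[$this->position];
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < count($this->rows);
    }
}
```

Once PHP 8.0 becomes the minimum, the attributes can be replaced with `: mixed` return types.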


Please preserve this line to notify @courtney-miles (lead of this repository)

Schema objects should provide functions for getting descriptor data

use frictionlessdata\tableschema\Schema;

$schema = new Schema((object)[
    "fields" => [
        (object)["name" => "id", "type" => "integer"],
        (object)["name" => "height", "type" => "integer"]
    ]
]);

foreach ($schema->fields() as $field) {
  print($field->name);  // "id"
  print($field->type->type); // "integer"
  print($field->type->castValue("456")); // 456 (native integer type)
}

$schema->hasField("foo");  // check if there is a field with that name
$schema->getField("id");  // get the field object for the field with the given name
$schema->headers(); // ["id", "height"] - list of all field names
$schema->primaryKeys();
$schema->foreignKeys();

Field API feedback

Overview

Based on this README listing, I'm adding feedback based on existing implementations and the expected competencies of library users (as we target many almost non-technical users: publishers, data wranglers, etc.).

$field = FieldsFactory::field((object)["name" => "id", "type" => "integer"]);

Factory class?

I think it's good for programmers, but non-technical users would rather not deal with advanced programming concepts. Technically it would not be different, but as a {USER} I'd prefer to just have $field = new Field($descriptor) or $field = Field.create/load($descriptor).

Schema objects should support schema creation, editing and saving

use frictionlessdata\tableschema\Schema;
use frictionlessdata\tableschema\Field;

// create new schema
$schema = Schema::create();
// must set fields first for new Schemas
$schema->fields = [
  new Field((object)["name" => "id", "type" => "integer"])
];
// after every change - schema is validated and will raise Exception in case of validation errors

// edit schema
$fields = $schema->fields;
$fields[] = new Field(...);
$schema->fields = $fields;
$schema->foreignKeys = ["id", "name"];

// will raise exception in case the edit causes invalid schema

// save schema
$schema->save($file_path);

Release version-1

Description

This issue describes the set of tasks to complete in order to finish up work on the library.

Tasks

  • Touch base with @jobarratt and @pwalsh to notify them that you consider the work complete
  • Provide a short description / link to code for how each action is implemented, with a link to unit tests that prove each action
  • Tag your candidate code as v0.1
  • Set up Travis to auto-deploy tagged versions to the package management solution for your language
  • Ensure that the OKI account on the package management platform is an administrator/maintainer of the package, along with yourself
  • Receive code review from @pwalsh and address any remaining issues
  • Publish final version

Update table-schema schema

The JSON schema for table-schema has not been updated since Nov 2017.

There have been some small changes since then.

Script generation to export datasets to SQL

SQL COPY is easy to use, but it doesn't generate a compatible CREATE TABLE statement. The SQL table structure can be obtained by translating the data types in the $table->schema() information. So one suggestion is to add, for instance, $table->SQLcopy() and $table->SQLcreate() methods that generate the SQL export scripts.
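A rough sketch of the CREATE TABLE half of that suggestion, working from field descriptors as plain arrays. `sqlCreate()` is hypothetical, and the type mapping is an assumption (PostgreSQL-flavoured) covering only a few Table Schema types, with everything else falling back to TEXT.

```php
<?php
// Hypothetical sketch: build CREATE TABLE from Table Schema field descriptors.
// Assumes table and field names contain no double quotes (no escaping is done).
function sqlCreate(string $tableName, array $fields, array $primaryKey = []): string
{
    // assumed Table Schema -> SQL type mapping; unknown types become TEXT
    $typeMap = [
        'integer' => 'BIGINT',
        'number' => 'NUMERIC',
        'boolean' => 'BOOLEAN',
        'date' => 'DATE',
        'datetime' => 'TIMESTAMP',
        'string' => 'TEXT',
    ];
    $columns = [];
    foreach ($fields as $field) {
        $sqlType = $typeMap[$field['type']] ?? 'TEXT';
        $columns[] = sprintf('  "%s" %s', $field['name'], $sqlType);
    }
    if ($primaryKey) {
        $quoted = array_map(function ($name) { return '"' . $name . '"'; }, $primaryKey);
        $columns[] = '  PRIMARY KEY (' . implode(', ', $quoted) . ')';
    }
    return sprintf("CREATE TABLE \"%s\" (\n%s\n);", $tableName, implode(",\n", $columns));
}
```

For example, `sqlCreate('countries', [['name' => 'code', 'type' => 'string']], ['code'])` produces a table with a TEXT column "code" as its primary key; the generated statement could then be paired with a COPY script for the data.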

Provide a standalone infer function

Overview

For now, there is only one mention of infer in the README, and it says:

You can instantiate a table object without schema, in this case the schema will be inferred automatically based on the data

$table = new Table("tests/fixtures/data.csv");
$table->schema()->fields();  // ["first_name" => StringField, "last_name" => StringField, "order" => IntegerField]

It's useful (and common in other languages) to ALSO have this functionality independently of the Table class, e.g. as:

  • Schema::infer
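A minimal sketch of what a standalone infer function could do, independent of any Table: guess a type per column from sample rows of raw strings. `inferFields()` is hypothetical (not Schema::infer or any existing API) and only detects integer, number, boolean and string.

```php
<?php
// Hypothetical standalone inference: returns field descriptors as arrays.
function inferFields(array $headers, array $sampleRows): array
{
    $fields = [];
    foreach ($headers as $i => $name) {
        // candidate types ordered from most to least specific
        $candidates = ['integer', 'number', 'boolean'];
        foreach ($sampleRows as $row) {
            $value = $row[$i] ?? '';
            // drop every candidate type that this value rules out
            $candidates = array_filter($candidates, function ($type) use ($value) {
                switch ($type) {
                    case 'integer': return filter_var($value, FILTER_VALIDATE_INT) !== false;
                    case 'number':  return is_numeric($value);
                    case 'boolean': return in_array($value, ['true', 'false'], true);
                }
                return false;
            });
        }
        // most specific surviving type wins; no samples or no match means string
        $type = ($sampleRows && $candidates) ? reset($candidates) : 'string';
        $fields[] = ['name' => $name, 'type' => $type];
    }
    return $fields;
}
```

Because it only needs headers and rows, the same function could back both the Table constructor and a static Schema::infer-style entry point.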
