frictionlessdata / tableschema-php
A PHP library for working with Table Schema.
License: MIT License
This issue documents the initial steps to get started with a new Frictionless Data implementation.
Feature requests from @roll (see the original request in #25)
Table class should provide more read options:
table.read(cast=false) // list of strings
read(limit=10) could be achieved
cast=false (because the header-values map doesn't work in this case)
use frictionlessdata\tableschema\Schema;
$schema = new Schema((object)[
"fields" => [
(object)["name" => "id", "type" => "integer"],
(object)["name" => "height", "type" => "integer"]
]
]);
$row = $schema->castRow(["id" => "123", "height" => "456"]);
// $row == ["id" => 123, "height" => 456]
$rowValidationResult = $schema->validateRow(["id" => "abc", "height" => "def"]);
if ($rowValidationResult->isValid()) {
print_r($rowValidationResult->row); // print_r, because print() cannot output an array
} else {
print_r($rowValidationResult->getMessages());
}
See http://specs.frictionlessdata.io/table-schema/#number (also, there is now no format: currency)
It seems she is actively using the library, and the last 2 PRs were opened by her.
I'm less involved with this library, so I suggest making her the lead of the repository (if she agrees).
Hi @OriHoch,
I have released version 0.2.0 and the new version is not appearing on Packagist.
Packagist is suggesting that it is not auto-updated.
Are you able to help with the release?
Based on this README listing, I'm adding feedback based on existing implementations and expected library-user competencies (as we target many almost non-technical users: publishers, data wranglers, etc.).
use frictionlessdata\tableschema;
// iterate over a remote data source conforming to a table schema
$table = new tableschema\Table(
new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv"),
new tableschema\Schema("http://www.example.com/data-schema.json")
);
foreach ($table as $person) {
print($person["first_name"]." ".$person["last_name"]);
}
// infer schema of a remote data source
$dataSource = new tableschema\DataSources\CsvDataSource("http://www.example.com/data.csv");
$schema = new tableschema\InferSchema();
$table = new tableschema\Table($dataSource, $schema);
foreach ($table as $row) {
var_dump($row); // row will be in inferred native values
var_dump($schema->descriptor()); // will contain the inferred schema descriptor
// the more iterations you make, the more accurate the inferred schema might be
// once you are satisfied with the schema, lock it
$rows = $schema->lock();
// it returns all the rows received until the lock, cast to the final inferred schema
// you may now continue to iterate over the rest of the rows
};
Table class data source and schema creation?
As a {USER} I'd rather just write $table = new tableschema\Table('data-path.csv', 'schema-path.json'); instead of creating the data source and schema myself. This is especially relevant if you don't know until runtime what kind of data source you have, e.g. new tableschema\Table('data-path.csv-or-xls') (we don't support Excel here, but as an example). In this case $table.schema should be exposed.
As a {USER} I'd rather just have $table = new tableschema\Table('data.csv'); without a schema argument, with the schema inferred automatically, instead of dealing with the additional tableschema\InferSchema() class.
As a {USER} I'd like to have a $table.headers property (it's a new but useful property in the https://github.com/frictionlessdata/implementations reference).
$table.save('data.csv') is a useful method in addition to $table.schema.save('schema.json'), and it could be re-used at the data package level to save a data package (e.g. as a zip).
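A minimal sketch of the data half of such a save(), assuming rows are associative arrays keyed by the schema's field names. The function saveCsv is hypothetical, not part of tableschema-php; it writes the header row first and relies on PHP's fputcsv for quoting.

```php
<?php
// Hypothetical helper: writes a header row, then one CSV line per row.
function saveCsv(string $path, array $headers, iterable $rows): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("cannot open '$path' for writing");
    }
    try {
        // escape character passed explicitly to avoid relying on the default
        fputcsv($handle, $headers, ',', '"', '\\');
        foreach ($rows as $row) {
            // order values by header so columns always match the header row
            $values = [];
            foreach ($headers as $name) {
                $values[] = $row[$name] ?? '';
            }
            fputcsv($handle, $values, ',', '"', '\\');
        }
    } finally {
        fclose($handle);
    }
}
```

Values containing the separator (for example "Smith, Bob") are quoted automatically by fputcsv.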
It was an often-requested feature:
table.read(cast=false) // list of strings
It allows working with malformed data sources and validating them, e.g. field-based with custom error handling.
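A minimal sketch of how cast=false and limit could behave, assuming rows arrive as raw lists of strings and casting is done by one callable per field. The function readRows and its parameters are hypothetical, not the library's API.

```php
<?php
/**
 * Hypothetical read helper returning up to $limit rows.
 * With $cast = false each row is returned as the raw list of strings, so
 * malformed rows (wrong number of values) can still be inspected.
 * With $cast = true each row is mapped header => cast value.
 */
function readRows(array $rawRows, array $headers, array $casters, bool $cast = true, ?int $limit = null): array
{
    $result = [];
    foreach ($rawRows as $rawRow) {
        if ($limit !== null && count($result) >= $limit) {
            break;
        }
        if (!$cast) {
            // no header-values map: keep the row as a plain list of strings
            $result[] = $rawRow;
            continue;
        }
        if (count($rawRow) !== count($headers)) {
            throw new InvalidArgumentException('row has ' . count($rawRow) . ' values, expected ' . count($headers));
        }
        $row = [];
        foreach ($headers as $i => $name) {
            $row[$name] = $casters[$name]($rawRow[$i]);
        }
        $result[] = $row;
    }
    return $result;
}

$headers = ['id', 'height'];
$casters = ['id' => 'intval', 'height' => 'intval'];
$raw = [['1', '170'], ['2', '180', 'extra'], ['3', '165']];

$strings = readRows($raw, $headers, $casters, false, 2); // first two rows, uncast
$typed = readRows([['1', '170']], $headers, $casters);   // [['id' => 1, 'height' => 170]]
```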
Related to the usage of the Iterator interface as the Table core:
read(limit=10) could be achieved
cast=false (because the header-values map doesn't work in this case)
Are we prepared to drop support for PHP 5 yet?
My vote is to drop support for PHP 5 -- any software still running on PHP 5 can still use version 0.1.9.
Background:
In #38 I initially dropped support for PHP 5 to be able to upgrade to Carbon 2. However, there was a suggestion at the time that ongoing support for PHP 5 was preferred, and the PR went stale.
There is now another PR to upgrade Carbon in #43
If we don't drop support for PHP 5, then this package cannot be used in software that also uses Carbon and uses PHP 7.
There's another irksome PHP compatibility issue with #42, where PHP 7.4 now throws \Error instead of \Exception. The ideal solution would be to catch \Throwable, but that change would not be compatible with PHP 5.
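For reference, a minimal sketch (PHP 7+) of why catching \Throwable covers both cases: \Exception and \Error both implement \Throwable, an interface that does not exist in PHP 5. The helper safeCall is hypothetical.

```php
<?php
// \Throwable exists only in PHP 7+, so this cannot run on PHP 5.
function safeCall(callable $fn): string
{
    try {
        $fn();
        return 'ok';
    } catch (\Throwable $e) {
        // catches both \Exception subclasses and \Error subclasses
        return get_class($e);
    }
}

echo safeCall(function () { throw new \RuntimeException('boom'); }), "\n"; // RuntimeException
echo safeCall(function () { return intdiv(1, 0); }), "\n";                 // DivisionByZeroError
echo safeCall(function () { return strlen('abc'); }), "\n";                // ok
```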
Cheers
Courtney
Currently, field validation and casting are very basic; we need to add more field types.
We also need to add more tests for validation and casting.
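As a starting point for more field types, a minimal sketch of spec-style casting for string, integer, number and boolean, assuming the Table Schema defaults for booleans (trueValues "true", "True", "TRUE", "1"; falseValues "false", "False", "FALSE", "0"). The function castFieldValue and its use of InvalidArgumentException are illustrative, not the library's API.

```php
<?php
// Hypothetical caster: returns the native value or throws on invalid input.
function castFieldValue(string $type, string $value)
{
    switch ($type) {
        case 'string':
            return $value;
        case 'integer':
            // optional sign followed by digits only
            if (preg_match('/^[+-]?\d+$/', $value) !== 1) {
                throw new InvalidArgumentException("invalid integer: '$value'");
            }
            return (int) $value;
        case 'number':
            // the spec also allows the special values NaN, INF and -INF
            if (in_array($value, ['NaN', 'INF', '-INF'], true)) {
                return $value === 'NaN' ? NAN : ($value === 'INF' ? INF : -INF);
            }
            if (!is_numeric($value)) {
                throw new InvalidArgumentException("invalid number: '$value'");
            }
            return (float) $value;
        case 'boolean':
            if (in_array($value, ['true', 'True', 'TRUE', '1'], true)) {
                return true;
            }
            if (in_array($value, ['false', 'False', 'FALSE', '0'], true)) {
                return false;
            }
            throw new InvalidArgumentException("invalid boolean: '$value'");
        default:
            throw new InvalidArgumentException("unsupported type: $type");
    }
}
```

Each additional type (date, datetime, etc.) would become one more case, with a matching set of valid and invalid test values.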
See PHP supported versions: PHP < 7 is not supported anymore.
This PR introduced a change which is not supported in PHP 5.6.
Should we officially drop support for PHP < 7?
Based on this README listing, I'm adding feedback based on existing implementations and expected library-user competencies (as we target many almost non-technical users: publishers, data wranglers, etc.).
// validate and cast a row according to schema
$schema = new Schema('{"fields": [{"name": "id", "type": "integer"}]}');
$row = $schema->castRow(["id" => "1"]);
// raise exception if row fails validation
// returns row with all native values
// EditableSchema extends the Schema object with editing capabilities
$schema = new EditableSchema();
// set fields
$schema->fields([
"id" => FieldsFactory::field((object)["name" => "id", "type" => "integer"])
]);
// remove field
$schema->removeField("age");
// edit primaryKey
$schema->primaryKey(["id"]);
// after every change - schema is validated and will raise Exception in case of validation errors
// finally, you can save the schema to a json file
$schema->save("my-schema.json");
Consider: as a {DATA WRANGLER} I create $schema = new Schema('{"fields": [{"name": "id", "type": "integer"}]}'); in a REPL and then decide to edit it. Having to deal with two types of schemas may not be very user-friendly; in other languages, editable objects don't have this separation.
This applies to all Table/Schema/Field APIs.
For now, the schema descriptor should be either an object, a JSON string, or a URL/path:
Schema objects can be constructed using any of the following:
PHP object
string containing JSON
string containing a value supported by file_get_contents
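A minimal sketch of how a constructor could normalize those three inputs, assuming a JSON string is recognized by its first non-whitespace character being "{" and anything else is treated as a path or URL for file_get_contents. The function loadDescriptor is hypothetical.

```php
<?php
// Hypothetical normalizer: object, JSON string, or path/URL => stdClass descriptor.
function loadDescriptor($source): object
{
    if (is_object($source)) {
        return $source; // already a PHP object
    }
    if (!is_string($source)) {
        throw new InvalidArgumentException('descriptor must be an object or a string');
    }
    $json = ltrim($source);
    if ($json === '' || $json[0] !== '{') {
        // not inline JSON: treat it as a path or URL supported by file_get_contents
        $json = @file_get_contents($source);
        if ($json === false) {
            throw new InvalidArgumentException("could not read descriptor from '$source'");
        }
    }
    $descriptor = json_decode($json);
    if (!is_object($descriptor)) {
        throw new InvalidArgumentException('descriptor is not a JSON object');
    }
    return $descriptor;
}

$descriptor = loadDescriptor('{"fields": [{"name": "id", "type": "integer"}]}');
echo $descriptor->fields[0]->name, "\n"; // id
```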
I'm not sure whether I understand PHP correctly here, but why can't we write (using the Field example):
FieldsFactory::field(["name" => "id", "type" => "integer"])
instead of casting to object first:
FieldsFactory::field((object)["name" => "id", "type" => "integer"])
I suppose it could be handled inside the class (casting to an object), which would reduce usage errors.
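One way the class could accept plain arrays, sketched under the assumption that the descriptor is JSON-compatible: a JSON round-trip converts nested associative arrays into objects as well, whereas (object) $array only converts the top level. The helper toDescriptorObject is hypothetical.

```php
<?php
// Hypothetical helper: accept an associative array or an object.
function toDescriptorObject($descriptor): object
{
    if (is_object($descriptor)) {
        return $descriptor;
    }
    // json_decode turns JSON objects into stdClass and keeps JSON lists as arrays
    $object = json_decode(json_encode($descriptor));
    if (!is_object($object)) {
        throw new InvalidArgumentException('descriptor must be an associative array or object');
    }
    return $object;
}

$shallow = (object) ['name' => 'id', 'constraints' => ['required' => true]];
$deep = toDescriptorObject(['name' => 'id', 'constraints' => ['required' => true]]);
var_dump(is_array($shallow->constraints)); // bool(true): the cast is only one level deep
var_dump($deep->constraints->required);    // bool(true)
```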
use frictionlessdata\tableschema\Table;
$datapackage_descriptor = json_decode(file_get_contents("https://raw.githubusercontent.com/frictionlessdata/testsuite-extended/master/datasets/country-codes/datapackage.json"));
// in a data package, the table schema belongs to a resource, not to the package
$schema_descriptor = $datapackage_descriptor->resources[0]->schema;
$table = new Table("https://raw.github.com/datasets/country-codes/master/data/country-codes.csv", $schema_descriptor);
foreach ($table->read() as $row) {
print($row["name"]);
};
use frictionlessdata\tableschema\DataSources\CsvDataSource;
use frictionlessdata\tableschema\InferSchema;
use frictionlessdata\tableschema\Table;
$dataSource = new CsvDataSource("https://raw.github.com/datasets/country-codes/master/data/country-codes.csv");
$schema = new InferSchema();
$table = new Table($dataSource, $schema);
// iterate over as many rows as you want, as you iterate the schema will become more accurate and will raise exceptions if rows invalidate the inferred schema
$numRows = 0;
foreach ($table as $row) {
if (++$numRows > 10) break;
};
$inferredDescriptor = $schema->descriptor();
Travis CI is "dying" so we're migrating all the Frictionless libraries to Github Actions.
@courtney-miles, I think we can just copy @DiegoPino's setup from datapackage-php. WDYT?
According to the Table Schema spec, foreignKeys[0].reference.resource should be in uri format and is required.
However, according to the description of resource, it may be an empty string, which references the table itself.
This does not pass schema validation.
For now, I implemented a hack which allows it to pass validation, but we need to find the proper way to fix it:
"foreignKeys": {
// ...
"reference": {
"type": "object",
"required": [
"resource",
"fields"
],
"properties": {
"resource": {
"type": "string",
"format": "uri"
},
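A sketch of the intended semantics only, not the schema-level fix the issue is looking for: assuming the descriptor is decoded with json_decode, an empty reference.resource means the table references itself, so a validator could special-case "" before any uri check. The function resolveReferencedResource is hypothetical.

```php
<?php
// Hypothetical resolver: returns the name of the resource a foreign key points at.
function resolveReferencedResource(object $foreignKey, string $currentResource): string
{
    // ?? also covers a missing "reference" object without a warning
    $resource = $foreignKey->reference->resource ?? null;
    if (!is_string($resource)) {
        throw new InvalidArgumentException('reference.resource is required and must be a string');
    }
    // '' is a self-reference: the foreign key points at the same resource
    return $resource === '' ? $currentResource : $resource;
}

$fk = json_decode('{"fields": ["parent_id"], "reference": {"resource": "", "fields": ["id"]}}');
echo resolveReferencedResource($fk, 'people'), "\n"; // people
```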
While PHP 7.1 is the minimum supported version for this package, the development packages are locked to versions that require older versions of PHP.
Specifically, these changes are required:
phpunit/phpunit: update to a version compatible with PHP 7.1
satooshi/php-coveralls: replace with php-coveralls/php-coveralls
Provide a Docker configuration to allow developers to contribute without having to change their local environment.
Notably, it's currently a problem to run tests if we have PHP 8. A Docker image will allow us to run tests without having to downgrade.
This package allows a primary key to be set, but does not enforce the behaviour as per the specification:
A primary key is a field or set of fields that uniquely identifies each row in
the table. Per SQL standards, the fields cannot be null, so their use in the
primary key is equivalent to adding required: true to their
constraints.
The expectation is that if a table contains a row with a duplicate primary key, then an exception should be thrown.
An exception should also be thrown if a field that is part of the primary key returns null.
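A minimal sketch of that enforcement, assuming rows are associative arrays of already-cast values and the primary key is a list of field names. The class PrimaryKeyChecker is hypothetical; in a Table it would be called once per row while iterating.

```php
<?php
// Hypothetical checker: throws on null key fields and on duplicate keys.
class PrimaryKeyChecker
{
    /** @var string[] */
    private $fields;
    /** @var array<string, int> serialized key => row number where it was first seen */
    private $seen = [];

    public function __construct(array $fields)
    {
        $this->fields = $fields;
    }

    public function check(array $row, int $rowNumber): void
    {
        $key = [];
        foreach ($this->fields as $field) {
            // isset() is false both for a missing key and for a null value
            if (!isset($row[$field])) {
                throw new UnexpectedValueException("row $rowNumber: primary key field '$field' is null");
            }
            $key[] = $row[$field];
        }
        // JSON encoding gives composite keys an unambiguous string form
        $serialized = json_encode($key);
        if (isset($this->seen[$serialized])) {
            throw new UnexpectedValueException(
                "row $rowNumber: duplicate primary key $serialized (first seen in row {$this->seen[$serialized]})"
            );
        }
        $this->seen[$serialized] = $rowNumber;
    }
}
```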
use frictionlessdata\tableschema\Field;
use frictionlessdata\tableschema\FieldCastValueError;
$field = new Field((object)["name" => "id", "type" => "integer"]);
try {
$value = $field->castValue("5"); // casts to integer
// $value === 5
} catch (FieldCastValueError $e) {
// value failed validation
}
use frictionlessdata\tableschema\Table;
use frictionlessdata\tableschema\InferSchema;
use frictionlessdata\tableschema\DataSources\CsvDataSource;
use frictionlessdata\tableschema\TableValidationError;
use frictionlessdata\tableschema\TableValidator;
// will infer a schema and validate against it
// will raise exception in case of problem inferring the schema
$schema = new InferSchema();
$dataSource = new CsvDataSource("https://raw.github.com/datasets/country-codes/master/data/country-codes.csv");
$table = new Table($dataSource, $schema);
foreach ($table->read() as $row) {
// will raise Exception if a row's data doesn't match the schema
// $row data is cast to correct native value according to schema
};
Some earlier versions of packages allowed in composer.json have fatal bugs or are incompatible with some versions of PHP.
The minimum version constraint for each dependency should be raised to ensure this package will function correctly.
Given that PHP 8.0 reaches end of life in November 2023, I'd like to know if there's any ongoing effort to make this library compatible with PHP 8.1.
Below are the results of running PHPStan against the src directory. Since I'm working on a project that uses this library, I'd be more than happy to help out with this effort.
------ -----------------------------------------------------------------------------------
Line Fields/BaseField.php
------ -----------------------------------------------------------------------------------
:101 Unsafe usage of new static().
💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
:118 Unsafe usage of new static().
💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
------ -----------------------------------------------------------------------------------
------ -----------------------------------------------------------------------------------
Line Schema.php
------ -----------------------------------------------------------------------------------
:76 Unsafe usage of new static().
💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
------ -----------------------------------------------------------------------------------
------ --------------------------------------------------------------------------------------------
Line SchemaValidator.php
------ --------------------------------------------------------------------------------------------
:28 Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$descriptor.
:29 Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$errors.
:37 Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$errors.
:52 Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$errors.
:59 Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$descriptor.
:78 Access to an undefined property frictionlessdata\tableschema\SchemaValidator::$descriptor.
------ --------------------------------------------------------------------------------------------
------ ------------------------------------------------------------------------------------------------------------------------------------------------------------
Line Table.php
------ ------------------------------------------------------------------------------------------------------------------------------------------------------------
:57 Unsafe usage of new static().
💡 See: https://phpstan.org/blog/solving-phpstan-error-unsafe-usage-of-new-static
:163 Return type mixed of method frictionlessdata\tableschema\Table::current() is not covariant with tentative return type mixed of method Iterator::current().
💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
:216 Return type mixed of method frictionlessdata\tableschema\Table::rewind() is not covariant with tentative return type void of method Iterator::rewind().
💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
:226 Return type mixed of method frictionlessdata\tableschema\Table::key() is not covariant with tentative return type mixed of method Iterator::key().
💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
:231 Return type mixed of method frictionlessdata\tableschema\Table::next() is not covariant with tentative return type void of method Iterator::next().
💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
:238 Return type mixed of method frictionlessdata\tableschema\Table::valid() is not covariant with tentative return type bool of method Iterator::valid().
💡 Make it covariant, or use the #[\ReturnTypeWillChange] attribute to temporarily suppress the error.
------ ------------------------------------------------------------------------------------------------------------------------------------------------------------
EDIT: Adding deprecation warnings issued by PHP 8.1:
PHP Deprecated: Return type of frictionlessdata\tableschema\Table::current() should either be compatible with Iterator::current(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 163
PHP Deprecated: Return type of frictionlessdata\tableschema\Table::next() should either be compatible with Iterator::next(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 231
PHP Deprecated: Return type of frictionlessdata\tableschema\Table::key() should either be compatible with Iterator::key(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 226
PHP Deprecated: Return type of frictionlessdata\tableschema\Table::valid() should either be compatible with Iterator::valid(): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 238
PHP Deprecated: Return type of frictionlessdata\tableschema\Table::rewind() should either be compatible with Iterator::rewind(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in /Users/.../vendor/frictionlessdata/tableschema/src/Table.php on line 216
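A minimal sketch of an Iterator that avoids these PHP 8.1 notices while still running on PHP 7.1+: next() and rewind() can declare void and valid() can declare bool directly, while current() and key() keep no return type plus #[\ReturnTypeWillChange], because the mixed type only exists from PHP 8.0 (on PHP 7 the attribute line is parsed as a comment). The class RowIterator is illustrative, not the library's Table.

```php
<?php
class RowIterator implements \Iterator
{
    private $rows;
    private $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->rows[$this->position];
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->position;
    }

    public function next(): void
    {
        ++$this->position;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < count($this->rows);
    }
}

foreach (new RowIterator([['id' => 1], ['id' => 2]]) as $i => $row) {
    echo $i, ': ', $row['id'], "\n"; // "0: 1", then "1: 2"
}
```

Once PHP 7 support is dropped, current() and key() can declare mixed and the attributes can be removed.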
Please preserve this line to notify @courtney-miles (lead of this repository)
Hi @courtney-miles,
I need to change the default branch in tableschema/datapackage to main as a part of frictionlessdata/frictionlessdata.io#537. OK?
use frictionlessdata\tableschema\Schema;
$schema = new Schema((object)[
"fields" => [
(object)["name" => "id", "type" => "integer"],
(object)["name" => "height", "type" => "integer"]
]
]);
foreach ($schema->fields() as $field) {
print($field->name); // "id"
print($field->type->type); // "integer"
print($field->type->castValue("456")); // 456 (native integer type)
}
$schema->hasField("foo"); // check if there is a field with that name
$schema->getField("id"); // get the field object for the field with the given name
$schema->headers(); // ["id", "height"] - list of all field names
$schema->primaryKeys();
$schema->foreignKeys();
Based on this README listing, I'm adding feedback based on existing implementations and expected library-user competencies (as we target many almost non-technical users: publishers, data wranglers, etc.).
$field = FieldsFactory::field((object)["name" => "id", "type" => "integer"]);
I think it's fine for programmers, but non-technical users prefer not to deal with advanced programming concepts. Technically it should make no difference, but as a {USER} I'd rather just have $field = new Field($descriptor) or $field = Field.create/load($descriptor).
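In PHP a constructor cannot return a subclass, so one way to offer both spellings is composition: new Field($descriptor) picks a type-specific caster internally, and a static create() simply wraps the constructor. A minimal sketch; the class, the create() method and the caster map are hypothetical, and only two types are shown.

```php
<?php
class Field
{
    /** @var string */
    public $name;
    /** @var string */
    public $type;
    /** @var callable */
    private $caster;

    public function __construct($descriptor)
    {
        $descriptor = (object) $descriptor; // accept arrays as well as objects
        // hypothetical registry of type handlers (no validation, for brevity)
        $casters = [
            'string' => function (string $v) { return $v; },
            'integer' => function (string $v) { return (int) $v; },
        ];
        $type = $descriptor->type ?? 'string'; // Table Schema default type
        if (!isset($casters[$type])) {
            throw new InvalidArgumentException("unsupported field type: $type");
        }
        $this->name = $descriptor->name;
        $this->type = $type;
        $this->caster = $casters[$type];
    }

    public static function create($descriptor): self
    {
        return new self($descriptor);
    }

    public function castValue(string $value)
    {
        return ($this->caster)($value);
    }
}

$field = new Field(['name' => 'id', 'type' => 'integer']);
var_dump($field->castValue('456')); // int(456)
```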
Need to integrate with Scrutinizer CI; I need admin access to the repo to do that.
use frictionlessdata\tableschema\Schema;
use frictionlessdata\tableschema\Field;
// create new schema
$schema = Schema::create();
// must set fields first for new Schemas
$schema->fields = [
new Field((object)["name" => "id", "type" => "integer"])
];
// after every change - schema is validated and will raise Exception in case of validation errors
// edit schema
$fields = $schema->fields;
$fields[] = new Field(...);
$schema->fields = $fields;
$schema->foreignKeys = ["id", "name"];
// will raise exception in case the edit causes invalid schema
// save schema
$schema->save($file_path);
Travis fails when running on PHP versions older than 7.
This is due to the phpunit and behat dependencies (used for BDD / unit tests).
See this Travis log for details:
https://travis-ci.org/frictionlessdata/tableschema-php/builds/219181670
This issue describes the set of tasks to complete in order to finish up work on the library.
The JSON schema for table-schema has not been updated since Nov 2017.
There have been some small changes since then.
SQL COPY is easy to use, but it does not generate a compatible CREATE TABLE. The SQL table structure can be obtained by translating the datatypes in the $table->schema() information. So a suggestion is to add, for instance, $table->SQLcopy() and $table->SQLcreate() methods to generate the SQL export scripts.
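A minimal sketch of the datatype translation behind such a SQLcreate() method, assuming a PostgreSQL target and a schema descriptor decoded with json_decode. The function name, the type mapping and the fallback to TEXT are assumptions; it also turns primaryKey into a PRIMARY KEY clause and constraints.required into NOT NULL.

```php
<?php
function sqlCreateTable(string $table, object $schema): string
{
    // Table Schema type => PostgreSQL column type (assumed mapping)
    $typeMap = [
        'string' => 'TEXT',
        'integer' => 'BIGINT',
        'number' => 'NUMERIC',
        'boolean' => 'BOOLEAN',
        'date' => 'DATE',
        'datetime' => 'TIMESTAMP',
    ];
    // quote identifiers, doubling any embedded double quotes
    $quote = function (string $name): string {
        return '"' . str_replace('"', '""', $name) . '"';
    };
    $columns = [];
    foreach ($schema->fields as $field) {
        $sqlType = $typeMap[$field->type ?? 'string'] ?? 'TEXT';
        $notNull = !empty($field->constraints->required) ? ' NOT NULL' : '';
        $columns[] = '  ' . $quote($field->name) . ' ' . $sqlType . $notNull;
    }
    if (!empty($schema->primaryKey)) {
        // primaryKey may be a single field name or a list of names
        $keys = array_map($quote, (array) $schema->primaryKey);
        $columns[] = '  PRIMARY KEY (' . implode(', ', $keys) . ')';
    }
    return 'CREATE TABLE ' . $quote($table) . " (\n" . implode(",\n", $columns) . "\n);";
}

$schema = json_decode('{"fields": [{"name": "id", "type": "integer", "constraints": {"required": true}}, {"name": "name", "type": "string"}], "primaryKey": "id"}');
echo sqlCreateTable('people', $schema), "\n";
```

For this schema the output is a CREATE TABLE "people" statement with "id" BIGINT NOT NULL, "name" TEXT and PRIMARY KEY ("id").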
In PR #54 the PHP CS Fixer had to be updated and the enforced code style was modified to get the new github action to pass.
We now need to restore the enforcement of Symfony Coding Standard and update the code to pass.
We create some temporary files during tests; we need to delete them at the end of each scenario.
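A minimal sketch of tracking and removing those files, assuming each test creates its files through a helper. The class TempFiles is hypothetical; in a Behat context, its cleanup() would be called from an @AfterScenario hook.

```php
<?php
// Hypothetical tracker: remembers every temp file it creates so they can be removed.
class TempFiles
{
    /** @var string[] */
    private $paths = [];

    public function create(string $contents = ''): string
    {
        $path = tempnam(sys_get_temp_dir(), 'tableschema-test-');
        if ($path === false) {
            throw new RuntimeException('could not create temporary file');
        }
        file_put_contents($path, $contents);
        $this->paths[] = $path;
        return $path;
    }

    public function cleanup(): void
    {
        foreach ($this->paths as $path) {
            if (file_exists($path)) {
                unlink($path);
            }
        }
        $this->paths = [];
    }
}
```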
The GitHub CI Action (https://github.com/frictionlessdata/tableschema-php/blob/main/.github/workflows/ci.yml) is missing a step to submit the generated code coverage to Coveralls.
For now there is only one mention of infer in the README, and it says:
You can instantiate a table object without schema, in this case the schema will be inferred automatically based on the data
$table = new Table("tests/fixtures/data.csv");
$table->schema()->fields(); // ["first_name" => StringField, "last_name" => StringField, "order" => IntegerField]
It would be useful (and it is common in other languages) to ALSO have this functionality independent of the Table class, e.g. as:
Schema::infer
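A minimal sketch of Table-independent inference, assuming the input is a header list plus rows of strings, and picking for each column the first of integer, number and boolean that fits every sampled value, else string. The function inferSchema is hypothetical and much simpler than a real implementation (no dates, no formats, no constraints).

```php
<?php
// Hypothetical inference: returns a descriptor object with one field per header.
function inferSchema(array $headers, array $rows): object
{
    // checked in order, so "1"/"0" become integer rather than boolean
    $checks = [
        'integer' => function (string $v): bool { return preg_match('/^[+-]?\d+$/', $v) === 1; },
        'number' => function (string $v): bool { return is_numeric($v); },
        'boolean' => function (string $v): bool {
            return in_array($v, ['true', 'True', 'TRUE', 'false', 'False', 'FALSE'], true);
        },
    ];
    $fields = [];
    foreach ($headers as $i => $name) {
        $values = array_column($rows, $i);
        $type = 'string'; // fallback when no narrower type fits every value
        foreach ($checks as $candidate => $check) {
            if ($values !== [] && count(array_filter($values, $check)) === count($values)) {
                $type = $candidate;
                break;
            }
        }
        $fields[] = (object) ['name' => $name, 'type' => $type];
    }
    return (object) ['fields' => $fields];
}

$schema = inferSchema(['first_name', 'order'], [['Ann', '1'], ['Bob', '2']]);
// first_name => string, order => integer
```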