semsol / arc2
ARC RDF Classes for PHP
License: Other
This test query -
select ?s (COUNT(distinct ?s) AS ?count)
where {
?s a ?Concept .
FILTER (?s = <http://dbpedia.org/ontology/deathDate>)
}
GROUP BY ?s
LIMIT 1
should work at the dbpedia endpoint.
With arc2 it works only without 'distinct', i.e. as -
select ?s (COUNT(?s) AS ?count)
EDIT by @k00ni: Formatted the SPARQL queries.
The mysql extension is deprecated and will be removed in the future: use mysqli or PDO instead
Would you?
Hi,
I am using ARC2 to attempt a query against a remote endpoint which makes use of SPARQL 1.1 "BIND". I've tried the query directly on the endpoint and it works. But when I try to invoke it through ARC, I get an error message because ARC2 tries to parse the query and cannot recognize BIND. How can I tweak ARC to just issue the query against the remote endpoint without trying to parse it?
Thanks!
Lena
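One possible workaround (a sketch, not a confirmed ARC2 feature; the helper name and endpoint URL are placeholders) is to speak the SPARQL protocol to the endpoint directly over HTTP, so ARC2's query parser is never involved:

```php
<?php
// Build a SPARQL-protocol GET URL; http_build_query() handles the encoding.
// buildSparqlUrl() and the endpoint are hypothetical; any SPARQL 1.1 endpoint
// should accept a query submitted this way.
function buildSparqlUrl($endpoint, $query) {
    return $endpoint . '?' . http_build_query(array('query' => $query));
}

$endpoint = 'http://example.org/sparql'; // placeholder endpoint
$query = 'SELECT ?label WHERE { <http://example.org/x> ?p ?o . BIND(str(?o) AS ?label) }';
$url = buildSparqlUrl($endpoint, $query);

// Ask for JSON results; many endpoints honor the Accept header.
$ctx = stream_context_create(array('http' => array(
    'header' => "Accept: application/sparql-results+json\r\n",
)));
// $json = file_get_contents($url, false, $ctx); // uncomment to actually run
// $rows = json_decode($json, true);
```

The endpoint then does all the SPARQL 1.1 parsing; ARC2 (or plain PHP, as here) only transports the string.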
Dear ARC2 community,
I am trying to INSERT DATA INTO a graph with ARC2
It runs quite well for most values, as long as the entities I insert do not contain the following characters:
( ) parentheses
A SPARQL query used in my application is thus:
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix Ontology1312540720780: <http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#>
INSERT INTO <http://localhost>
{
Ontology1312540720780:'charge_carriers(kind-of-carriers)' rdfs:seeAlso 'stripe' .
}
results in an error:
Construct Template not found in ARC2_SPARQLPlusParser
-> This should not happen, since the requirements for an XML name (see http://www.w3.org/TR/REC-xml/#NT-Nmtoken) allow for parentheses inside a name
-> Could anybody tell me how to fix that?
Thanks
Cheers
Redskate
Hi,
this is a great project, thanks a lot! But I don't understand whether there is any support for Microdata. On "https://github.com/semsol/arc2/wiki/Extracting-RDF-from-HTML" I don't find any "Available Extractors" for Microdata, but on "https://github.com/semsol/arc2/wiki/Old-Release-Notes" I find "new Microdata methods getMicrodataAttrs and mdAttrs shortcut". So is there any support for Microdata? Best regards
Paolo Pancaldi
While trying to parse http://semweb.csdb.cn/csdb/data/database/10017567
I obtain zero triples
$parser = ARC2::getRDFParser();
$parser->parse("http://semweb.csdb.cn/csdb/data/database/10017567");
$triples = $parser->getTriples();
When asking for the content type, it returns correctly though.
$ curl -I http://semweb.csdb.cn/csdb/data/database/10017567
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Vary: Accept
Content-Type: text/turtle;charset=utf-8
Content-Length: 5583
Date: Wed, 12 Sep 2012 16:51:42 GMT
It seems the problem is trying to identify the format of the document.
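Since the server does return Content-Type: text/turtle, one possible workaround (a sketch, not a confirmed fix) is to skip ARC2's format sniffing and use the Turtle parser directly; ARC2 parsers also accept in-memory data as a second argument:

```php
<?php
require_once 'ARC2.php'; // path depends on your installation

// Bypass ARC2's format detection by choosing the Turtle parser explicitly.
$parser = ARC2::getTurtleParser();

// Either let the parser fetch the URL itself...
// $parser->parse('http://semweb.csdb.cn/csdb/data/database/10017567');

// ...or fetch the document yourself and hand over the raw Turtle,
// so no content sniffing is involved at all:
$ttl = '<http://example.org/s> <http://example.org/p> "o" .';
$parser->parse('http://example.org/', $ttl); // base URI + data
$triples = $parser->getTriples();
```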
When loading rdf/xml like:
...
<documents:jurisdiction rdf:resource="http://rdf.muninn-project.org/ontologies/documents#US_Copyright_Act_of_1909"/>
<documents:date_published>
<time:before rdf:parseType="http://www.w3.org/2001/XMLSchema#gYear">1920</time:before>
</documents:date_published>
...
with
$parser->parse('the_file.rdf');
generates
...
<ns0:jurisdiction rdf:resource="http://rdf.muninn-project.org/ontologies/documents#US_Copyright_Act_of_1909"/>
<ns0:date_published rdf:nodeID="arce720b1"/>
</rdf:Description>
<rdf:Description rdf:nodeID="arce720b1">
<rdf:type rdf:resource="http://www.w3.org/2006/time#before"/>
</rdf:Description>
The '1920' and the gYear datatype are lost.
Interestingly, parsing of "rdf:parseType" is case-sensitive and the parser drops information unless 'parsetype' is in lower case:
<documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/Image/e73684a85eda5e96189b2cff8c85d814">
<documents:date_retrieved rdf:parseType="http://www.w3.org/2001/XMLSchema#date">20120215</documents:date_retrieved>
</documents:Image>
gets munged to
<rdf:Description rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/Image/e73684a85eda5e96189b2cff8c85d814">
<rdf:type rdf:resource="http://rdf.muninn-project.org/ontologies/documents#Image"/>
</rdf:Description>
but
<documents:Image rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/Image/e73684a85eda5e96189b2cff8c85d814">
<documents:date_retrieved rdf:parsetype="http://www.w3.org/2001/XMLSchema#date">20120215</documents:date_retrieved>
</documents:Image>
gives
<rdf:Description rdf:about="http://rdf.muninn-project.org/ww1/2011/11/11/Image/e73684a85eda5e96189b2cff8c85d814">
<rdf:type rdf:resource="http://rdf.muninn-project.org/ontologies/documents#Image"/>
<ns0:date_retrieved>20120215</ns0:date_retrieved>
</rdf:Description>
I haven't tracked down what makes the date datatype disappear but that seems to be another issue.
Hi everyone,
I ran into an interesting issue when I needed to use an XSD conversion in a SELECT clause. My app communicates with a Virtuoso graph database at localhost for now. When I run this query using the SPARQL web interface of my Virtuoso conductor, it gives me the correct result. Unfortunately, the exact same query returns an empty result using ARC2 in my PHP app.
My first problem is that SPARQL does not recognize the XSD datatype specified in my data objects, so e.g. 5142.21^^xsd:decimal is still handled as a string. But that is not the point; I have always solved that with an xsd:decimal(?value) conversion. This is the first time I have used this conversion in a SELECT clause, and that is when the problem came up.
The query looks like this (the prefixes are omitted):
SELECT ?region (sum(xsd:decimal(?value)) as ?sum)
FROM <http://mytest.com>
WHERE {
?region a cm:Region ;
gn:officialName ?regionName .
?entity a cm:Municipality ;
cm:inRegion ?region .
?statement a <IncomeStatement> ;
<reportedBy> ?entity ;
<profitBeforeTaxation> [ gr:hasCurrencyValue ?value]
}
GROUP BY(?region)
The critical part is the xsd:decimal(?value) conversion inside the aggregate. When I switch the SUM clause for COUNT and remove the XSD conversion, everything is OK. But how come the query works well in the Virtuoso conductor SPARQL interface and not when using ARC?
Ondra
Hi.
I've tried to run the test suite, and even though the tests are fine, the code coverage isn't.
Any suggestions?
$ phpunit --verbose --coverage-html /home/olivier/Developpement/git/php-arc/tests/coverage --filter Test .
PHPUnit 3.6.10 by Sebastian Bergmann.
Configuration read from /home/olivier/Developpement/git/php-arc/tests/unit/phpunit.xml
........................................
Time: 0 seconds, Memory: 6.50Mb
OK (40 tests, 114 assertions)
Generating code coverage report, this may take a moment.
PHP Fatal error: Cannot redeclare class ARC2_SPARQLParser in /home/olivier/Developpement/git/php-arc/parsers/ARC2_SPARQLParser.php on line 777
PHP Stack trace:
PHP 1. {main}() /usr/bin/phpunit:0
PHP 2. PHPUnit_TextUI_Command::main() /usr/bin/phpunit:46
PHP 3. PHPUnit_TextUI_Command->run() /usr/share/php/PHPUnit/TextUI/Command.php:130
PHP 4. PHPUnit_TextUI_TestRunner->doRun() /usr/share/php/PHPUnit/TextUI/Command.php:192
PHP 5. PHP_CodeCoverage_Report_HTML->process() /usr/share/php/PHPUnit/TextUI/TestRunner.php:373
PHP 6. PHP_CodeCoverage->getReport() /usr/share/php/PHP/CodeCoverage/Report/HTML.php:133
PHP 7. PHP_CodeCoverage_Report_Factory->create() /usr/share/php/PHP/CodeCoverage.php:141
PHP 8. PHP_CodeCoverage->getData() /usr/share/php/PHP/CodeCoverage/Report/Factory.php:65
PHP 9. PHP_CodeCoverage->processUncoveredFilesFromWhitelist() /usr/share/php/PHP/CodeCoverage.php:173
PHP 10. include_once() /usr/share/php/PHP/CodeCoverage.php:520
PHP 11. ARC2::inc() /home/olivier/Developpement/git/php-arc/debian/libarc-php/usr/share/php/arc/parsers/ARC2_SPARQLPlusParser.php:11
Thanks in advance.
Hello Benjamin / dear ARC community
I am using ARC to import OWL files into an ARC local store.
This worked well until I found an ontology for which, after importing, the triples in the store looked very odd: instead of the values for s and o you see expressions like _:b809455593_arcb6bfb
For instance a triple looks like:
_:b809455593_arcb6bfb1 http://www.w3.org/2002/07/owl#Annotation _:b928156658_arcb6bfb5
This is not what I expected to see in the triple store; something is going wrong here.
The OWL file loads perfectly into Protégé 4.1 …
Some snippets of the OWL file are listed below:
<Declaration>
<Class IRI="#2D_Graphics_(Cairo,_QPainter)"/>
</Declaration>
<ClassAssertion>
<Class IRI="http://sw.nokia.com/DP-1/Feature"/>
<NamedIndividual IRI="http://sw.nokia.com/DP-1/Feature/mobile_tv"/>
</ClassAssertion>
What can I do to get ARC2 to parse this OWL file properly?
Thank you for any information on it.
Regards
Fabio
PS: I found some code in ARC2_StoreLoadQueryHandler.php storing expressions like the one above (code snippet follows), but I still have no idea what it does and why.
function addT($s, $p, $o, $s_type, $o_type, $o_dt = '', $o_lang = '') {
if (!$this->has_lock) return 0;
$type_ids = array ('uri' => '0', 'bnode' => '1' , 'literal' => '2');
$g = $this->getStoredTermID($this->target_graph, '0', 'id');
$s = (($s_type == 'bnode') && !$this->keep_bnode_ids) ? ':b' . abs(crc32($g . $s)) . '' . (strlen($s) > 12 ? substr(substr($s, 2) , -10) : substr($s, 2)) : $s;
$o = (($o_type == 'bnode') && !$this->keep_bnode_ids) ? ':b' . abs(crc32($g . $o)) . '' . (strlen($o) > 12 ? substr(substr($o, 2), -10) : substr($o, 2)) : $o;
/* triple */
$t = array(
's' => $this->getStoredTermID($s, $type_ids[$s_type], 's'),
'p' => $this->getStoredTermID($p, '0', 'id'),
'o' => $this->getStoredTermID($o, $type_ids[$o_type], 'o'),
'o_lang_dt' => $this->getStoredTermID($o_dt . $o_lang, $o_dt ? '0' : '2', 'id'),
'o_comp' => $this->getOComp($o),
's_type' => $type_ids[$s_type],
'o_type' => $type_ids[$o_type],
);
Is there a means to get some useful string functions like CONTAINS, LCASE...?
Currently $mthd is used to decide both the method and whether a read or a write is occurring. This should probably be separated. It might also be better if a variable (or config option) could be set to override this behavior.
The reason I bring this up is that my hosting service restricts query string lengths to 512 characters by default, and my SPARQL queries were running ~800 characters. I'm fixing my account, but it got me thinking... maybe it would be better to have a config option for using GET or POST.
Also, the current implementation uses $mthd (the method) to decide whether to use 'store_read_key' or 'store_write_key'. Instead, a variable called $is_read_operation could be used.
I have been testing calcUri() against the examples in RFC 3986, Section 5.4.1, and most pass, but there are a few that fail:
See also the tests at the bottom of:
https://github.com/dajobe/raptor/blob/master/src/raptor_rfc2396.c
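For reference, these are some of the expected resolutions from RFC 3986, Section 5.4.1, all against the base http://a/b/c/d;p?q. How calcUri() is invoked (argument order, receiver object) is left as a commented placeholder because it is not shown in this report:

```php
<?php
// Expected results from RFC 3986, Section 5.4.1 (normal examples),
// all resolved against the base URI "http://a/b/c/d;p?q".
$base = 'http://a/b/c/d;p?q';
$cases = array(
    'g'       => 'http://a/b/c/g',
    './g'     => 'http://a/b/c/g',
    'g/'      => 'http://a/b/c/g/',
    '/g'      => 'http://a/g',
    '../g'    => 'http://a/b/g',
    '../../g' => 'http://a/g',
);

// Hypothetical harness: swap in calcUri() (or any resolver under test) here.
// foreach ($cases as $ref => $expected) {
//     assert($resolver->calcUri(/* base, ref in whatever order it expects */) === $expected);
// }
```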
A string serializer would be a very useful addition to the library.
Are there any plans to add it in the near future?
I have tried to test the following with a remote store:
SELECT *
WHERE {
<http://page/resource/product123> ?p ?o
}
var_dump($this->store->query($q,'rows'));
die();
The following returns an empty array when using arc2, but when using another semantic web tool it returns the expected results.
What am i doing wrong?
(PS: I have included < and > with my subject URL.)
If the XML has a comment at the beginning of the document larger than 1024 bytes:
....
in ARC2_Reader->getFormat (...readStream( )
-> ARC2::getFormat($v...
-> ARC2_getFormat() detects the format (from the comment content) as "ntriples" (it needs "rdfxml")
example:
LOAD <http://www.ebusiness-unibw.org/ontologies/eclass/5.1.4/eclass_514en.owl>
getErrors() returns:
Array (
[0] => too many loops: 501. Could not parse "
)
Currently there is only support for SPARQL 1.0; are there any plans to support SPARQL 1.1?
I can't make this query work:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT
?p
?genre
MAX(?hometown) AS ?hometown
MAX(?birthPlace) AS ?birthPlace
?birthDate
?activeYearsStartYear
WHERE
{
{
?p rdf:type <http://schema.org/MusicGroup> .
?p rdfs:label "Francesco Guccini"@en .
?p dbpedia-owl:genre ?genre .
OPTIONAL {?p <http://dbpedia.org/ontology/hometown> ?hometown }
OPTIONAL {?p <http://dbpedia.org/ontology/birthPlace> ?birthPlace }
OPTIONAL {?p <http://dbpedia.org/ontology/birthDate> ?birthDate }
OPTIONAL {?p <http://dbpedia.org/ontology/activeYearsStartYear> ?activeYearsStartYear}
}
UNION
{
?p rdf:type <http://schema.org/MusicGroup> .
?p foaf:name "Francesco Guccini"@en .
?p dbpedia-owl:genre ?genre .
OPTIONAL {?p <http://dbpedia.org/ontology/hometown> ?hometown }
OPTIONAL {?p <http://dbpedia.org/ontology/birthPlace> ?birthPlace }
OPTIONAL {?p <http://dbpedia.org/ontology/birthDate> ?birthDate }
OPTIONAL {?p <http://dbpedia.org/ontology/activeYearsStartYear> ?activeYearsStartYear}
}
} LIMIT 1
The endpoint is: http://dbpedia.org/sparql/
As you can see here the query is working.
Am I doing something wrong?
Thanks for your help.
We just spotted an issue with the RDFa parser in ARC. It looks like it is prone to forgetting some triples if the document does not end with a \n.
The following gist shows the issue: https://gist.github.com/725644 - this gives 5 triples. However, 10 are embedded in the HTML document. Adding a \n at the end of that document gives 10 triples.
This behavior has been tested with the latest version of ARC.
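Until the parser is fixed, a workaround sketch (assuming parse() accepts in-memory data as its second argument, as elsewhere in ARC2) is to append the newline yourself:

```php
<?php
require_once 'ARC2.php'; // path depends on your installation

// Minimal in-memory RDFa snippet standing in for the affected document.
$html = '<div about="http://example.org/x"><span property="rdfs:label">X</span></div>';

$parser = ARC2::getRDFParser();
// Appending "\n" works around the parser dropping triples when the
// document does not end with a newline.
$parser->parse('http://example.org/', $html . "\n");
$triples = $parser->getTriples();
```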
There seems to be an assumption in the TTL parser that the @base term can appear only once, and before any triples are specified. This breaks parsing http://www.w3.org/ns/prov# (need to accept: text/turtle to see that URI as turtle)
Even if I set the minimum time between requests to half a second, after 10 consecutive requests or so I get this error.
Anything that can be done either to fix this or get around it?
It looks like ARC2's parsers don't support RDFa 1.1:
Test script:
require_once "arc2/ARC2.php";
$parser = ARC2::getRDFParser();
$parser->parse( 'test.html' );
var_dump( $parser->getTriples() );
This being test.html
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/elements/1.1/" lang="en" dir="ltr">
<head>
<title>...</title>
<base href="http://example.org/">
</head>
<body>
<div about="http://example.org/FooBar">
<span property="dc:title">FooBar</span>
</div>
</body>
</html>
In the resulting triples, the 'p' for http://example.org/FooBar is dc:title instead of the http://purl.org/dc/elements/1.1/title you'd get using RDFa 1.0's xmlns.
Hi,
I'm trying to run a query like this:
http://www.cambridgesemantics.com/2008/09/sparql-by-example/#%2845%29
PREFIX gr: <http://purl.org/goodrelations/v1#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#>
SELECT * WHERE {
{
SELECT DISTINCT ?grL WHERE {
?grL a gr:LocationOfSalesOrServiceProvisioning .
} LIMIT 5
}
?grL rdfs:label ?name .
OPTIONAL { ?grL rdfs:comment ?comment } .
OPTIONAL { ?grL vcard:label ?address_label } .
OPTIONAL { ?grL vcard:tel ?telefone } .
}
It works on the SPARQL-Endpoint's Web Interface but not if I use ARC. Any solution or workaround?
I have a Sesame 2.7.11 store I can access fine with Workbench and the Win2 client. The Win2 client respects permissions set in web.xml, so I can secure it. Can't run queries with arc2 though if any security is set. arc2 does work if I leave the store wide open.
How can I send BASIC authentication credentials? I see an arc_reader_credentials array() in ARC2_Reader.php and tried un-commenting it and supplying a 'host' => 'user:pass' for the endpoint, but I get 500 errors in the browser. I tried hard but failed to locate any other documentation.
any suggestions appreciated, thanks
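Based on the array name visible in ARC2_Reader.php, a configuration sketch might look like the following. The endpoint URL and the exact credential key format (host vs. host:port) are assumptions, not verified against the source:

```php
<?php
require_once 'ARC2.php'; // path depends on your installation

$config = array(
    // hypothetical Sesame repository endpoint
    'remote_store_endpoint' => 'http://example.org:8080/openrdf-sesame/repositories/test',
    // Assumption: keyed by host (possibly 'host:port'), value 'user:pass',
    // mirroring the commented-out array seen in ARC2_Reader.php.
    'arc_reader_credentials' => array(
        'example.org:8080' => 'user:pass',
    ),
);
$store = ARC2::getRemoteStore($config);
// $rows = $store->query('SELECT * WHERE { ?s ?p ?o } LIMIT 1', 'rows');
```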
Hi,
I get some DB errors ("Could not get lock in 'runQuery' ...") when I delete a triple and then insert a new one.
I think a release/unlock of the tables is missing after the delete,
in class ARC2_StoreDeleteQueryHandler, function cleanValueTables().
I'll add $this->store->releaseLock(); at the end of that function, near line 229.
The error is gone.
Hi,
This query seems to be compliant with SPARQL 1.0:
SELECT DISTINCT lang(?label) as ?lang WHERE { ?data skos:prefLabel ?label } limit 1
but I obtain "No result bindings specified. in ARC2_SPARQLPlusParser"
What's wrong? I tried this query on http://dbpedia.org/sparql and it runs.
Is there any other method to get the different languages of the labels?
Thanks in advance,
Juan
http://arc.semsol.org/download links to what is now a 404 error.
Hello everybody
I need to traverse a subgraph in my ARC2 RDF store. Some SPARQL engines like Virtuoso use transitive closures to get all data inside a subgraph starting from a given node. I would like to do the same with ARC: starting from a node (a subject) inside an RDF graph in the ARC triple store, get all the outgoing triples from that node onwards.
Is there an elegant solution without using SPARQLSCRIPT?
Thank you very much in advance for any suggestion / answer
Cheers
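Without SPARQLSCRIPT, one option is a plain breadth-first traversal in PHP that issues one SELECT per node. The sketch below hides the store behind a $fetchRows callable; in ARC2 it could wrap $store->query("SELECT ?p ?o WHERE { <...> ?p ?o }", 'rows'). The 'o type' row key mirrors ARC2's result format, which is an assumption here:

```php
<?php
// Breadth-first traversal of outgoing triples starting from $start.
// $fetchRows($s) must return rows with keys 'p', 'o' and 'o type'.
function traverseOutgoing($fetchRows, $start) {
    $seen = array();
    $queue = array($start);
    $triples = array();
    while ($queue) {
        $s = array_shift($queue);
        if (isset($seen[$s])) continue; // avoid cycles
        $seen[$s] = true;
        foreach ($fetchRows($s) as $row) {
            $triples[] = array($s, $row['p'], $row['o']);
            if ($row['o type'] !== 'literal') {
                $queue[] = $row['o']; // follow URIs and bnodes only
            }
        }
    }
    return $triples;
}

// Toy in-memory graph standing in for the store:
$graph = array(
    'a' => array(
        array('p' => 'p1', 'o' => 'b',   'o type' => 'uri'),
        array('p' => 'p2', 'o' => 'lit', 'o type' => 'literal'),
    ),
    'b' => array(
        array('p' => 'p3', 'o' => 'a', 'o type' => 'uri'), // cycle back to 'a'
    ),
);
$fetch = function ($s) use ($graph) {
    return isset($graph[$s]) ? $graph[$s] : array();
};
$triples = traverseOutgoing($fetch, 'a');
```

For large graphs this issues many small queries, so some batching or caching may be needed, but it stays within plain SPARQL SELECTs.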
serializers/ARC2_RDFJSONSerializer.php->jsonEscape($v) begins by escaping the passed variable using json_encode(). If the variable is a string, the result will be a string wrapped in quotes. The function then removes these quotes using trim():
if (function_exists('json_encode')) return trim(json_encode($v), '"');
However, if a string passed into the function ends with a quote, json_encode() will produce:
"This is a string \"ending in a quote\""
trim() will then remove both trailing quotes, resulting in:
This is a string \"ending in a quote\
When this string is passed to the JSON output, the result is:
{
"example": "This is a string \"ending in a quote\"
}
... where the errant backslash at the end escapes the JSON closing quote, resulting in invalid JSON.
This is the update I've made locally, which explicitly removes a single quote from the beginning and end if they exist.
if (function_exists('json_encode')) { // Updated by Craig
$v = json_encode($v);
if ('"'==substr($v,0,1)&&'"'==substr($v,-1,1)) $v = substr($v, 1, -1);
return $v;
}
(Should probably use regex instead.)
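The failure mode can be reproduced in a few lines; the substr() approach shown above keeps the escaped quote intact (a standalone demo, not ARC2 code):

```php
<?php
// json_encode() wraps the value in quotes; a trailing '"' in the value is
// escaped as \" and ends up right before the closing quote.
$raw = json_encode('ends with a quote "');

// trim(..., '"') strips ALL leading/trailing quote characters, eating the
// escaped one as well and leaving a dangling backslash:
$trimmed = trim($raw, '"');

// Removing exactly one character from each end preserves the escape sequence:
$fixed = substr($raw, 1, -1);
```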
Thanks!
Hello,
when running arc2 on a Windows system (for instance, testing on localhost with Windows & XAMPP), relative filenames are not handled properly, so files are not found.
For instance, the following statements return errors
$parser->parse('a/file1.ttl')
$parser->parse('file2.ttl')
although the files exist.
According to a first investigation, the problem seems to be in
ARC2_Class::calcURI or ARC2_Class::calcBase
In particular, the PHP function realpath($r) returns paths like c:\xampp\xxx etc. on Windows.
A workaround is to provide the absolute path and use the "file:" scheme.
The examples below work properly:
$parser->parse('file:C:\xampp\dir\a/file1.ttl')
$parser->parse('file:' . dirname(__FILE__) . '/a/file1.ttl')
The SERVICE clause is specified in the SPARQL 1.1 documentation [1]. However, when parsing any query specifying this clause, the parser fails.
Code:
include_once('arc2-master/ARC2.php');
$parser = ARC2::getSPARQLParser();
$q= '
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
FROM <http://example.org/myfoaf.rdf>
WHERE
{
<http://example.org/myfoaf/I> foaf:knows ?person .
SERVICE <http://people.example.org/sparql> {
?person foaf:name ?name . }
}
';
$parser->parse($q);
yields the error Incomplete or invalid Group Graph pattern. Could not handle " SERVICE <http://people.exam"
[1] http://www.w3.org/TR/2012/PR-sparql11-federated-query-20121108/
Hi all,
Upon performing the following query:
select (count(*) AS ?count) WHERE { ?s ?p ?o }
The SPARQL+ parser breaks and throws an error that it cannot find a column value with key 'value'. After further investigation, I've come to the line where this is thrown, which is line 51 (ish) in the corresponding PHP file of ARC2_SPARQLPlusParser.
After some debugging, I couldn't find what went wrong, but the 'value' key wasn't present. This made me try the following workaround, inspired by the comment "* or var" that stands above the if:
/* * or var */
if ((list($sub_r, $sub_v) = $this->x('\*', $v)) && $sub_r) {
//[START WORKAROUND]
// The regex fails to parse * from the count(*) as a selector
$val = '*';
if (!empty($sub_r['value'])) {
$val = $sub_r['value'];
}
// [END WORKAROUND]
return array(array('var' => $val, 'aggregate' => $aggregate, 'alias' => $aggregate ? $result_var : ''), $sub_v);
}
Does anyone have an idea whether this might break other things? I'm going to try some queries with this workaround, including non-* selectors. In any case, I'll do a fork and run my queries through the fork. If this is a valid bug and this (sorta) solves it, I'd be happy to provide a pull request.
Hi,
I tried to execute a SPARQL query containing a FROM clause; the result was empty, while when I executed the same query here: http://demo.openlinksw.com/sparql/
there was a set of results.
Hi, I use the latest version of ARC (July 2011)
and I am adding triples in ARC2
with the statement
INSERT INTO <http://localhost>
{
Ontology1312540720780:charge_carriers rdfs:seeAlso 'stripe' .
}
After that, I list all the triples in the store and I see that all my triples containing rdfs:seeAlso have a qualifier with a double hash ('##'), which is an error; the RDF serializer makes ns0:seeAlso out of it, and Protégé does not read it inside the same element (charge_carriers).
The bug is probably in the namespace parsing. Correct namespaces may contain a # at the end of the URI representation. See below - and see a correct OWL output of Protégé 4.1 further down in the appendix section.
TODO: analyze the namespace processing and prevent '##' from being produced when inserting triples.
THANKS
Cheers
redskate
Appendix: Correct representation of an OWL file made by protege 4.1:
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
<!ENTITY Ontology13125407207802 "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#6" >
<!ENTITY Ontology13125407207803 "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#4" >
<!ENTITY Ontology13125407207804 "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#2" >
<!ENTITY Ontology13125407207806 "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#15" >
<!ENTITY Ontology13125407207805 "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#0-10" >
<!ENTITY Dimming "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#Dimming/" >
<!ENTITY Boost "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#Step-up_(Boost)" >
<!ENTITY EMI_filtering_ "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#EMI_filtering_&" >
<!ENTITY Buck "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#Step-down_(Buck)" >
<!ENTITY charge_pump "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#Inductorless_(charge_pump)" >
<!ENTITY Networking_with_existing_wiring "http://www.semanticweb.org/ontologies/2011/7/Ontology1312540720780.owl#Networking_with_existing_wiring/" >
]>
Steps to reproduce:
<?php
require_once('ARC2.php');
// Init parser
$parser = ARC2::getTurtleParser();
$turtle = <<<EOD
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix cd: <http://www.recshop.fake/cd#> .
@prefix countries: <http://www.countries.org/onto/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://www.recshop.fake/cd/Empire Burlesque>
cd:artist "Bob Dylan" ;
cd:country countries:USA ;
cd:company "Columbia" ;
cd:price "10.90" ;
cd:year "1985" .
<http://www.recshop.fake/cd/Hide your heart>
cd:artist "Bonnie Tyler" ;
cd:country "UK" ;
cd:company "CBS Records" ;
cd:price "9.90" ;
cd:year "1988" .
countries:USA
rdfs:label "USA" .
countries:Albums
rdfs:subClassOf countries:MediaCollections .
EOD;
$parser->parse( $turtle );
// Print out the name spaces array
print_r( $parser->nsp );
php test.php
Expected result:
Array
(
[http://www.w3.org/1999/02/22-rdf-syntax-ns#] => rdf
[http://www.recshop.fake/cd#] => cd
[http://www.countries.org/onto/] => countries
[http://www.w3.org/2000/01/rdf-schema#] => rdfs
)
Actual result:
Array
(
[http://www.w3.org/XML/1998/namespace] => xml
[http://www.w3.org/1999/02/22-rdf-syntax-ns#] => rdf
[http://www.w3.org/2001/XMLSchema#] => xsd
)
Hi,
I've found a problem with the escape function.
Problem:
André -> the é is nicely escaped with \u00E9 (iirc)
Andréé -> the éé is replaced with \uAAA9 ( a square character)
Now, what I found through debugging is that when the characters go through preg_replace_callback, the "éé" sequence is seen as one character, even with the mb_strlen functionality. If, however, I comment out the line that utf8_decode()s the string (the second line of the escape function), this "éé" sequence is handled properly as two \u00E9 sequences.
My guess is that utf8_decode() unwittingly decodes an already-valid UTF-8 string (why is this necessary in the first place?) and this messes up mb_strlen, which is given UTF-8 as the character encoding even though the string is now ISO-8859-1 after utf8_decode()...
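The encoding mix-up described above can be demonstrated in isolation (note that utf8_decode() is deprecated as of PHP 8.2):

```php
<?php
// "é" is U+00E9; encoded in UTF-8 it is the byte pair C3 A9.
$utf8 = "\xC3\xA9\xC3\xA9"; // "éé" as UTF-8

// Counted as UTF-8, this is two characters, as expected:
$len = mb_strlen($utf8, 'UTF-8');

// utf8_decode() converts to ISO-8859-1: each "é" becomes the single byte E9.
$latin1 = utf8_decode($utf8);

// If the code keeps treating the string as UTF-8 after this point,
// "\xE9\xE9" is not valid UTF-8, so character counts and offsets break.
```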
While parsing using ARC2_RDFXMLParser, I am able to predictably change the location of a thrown XML_WAR_NS_URI (XML error code 99) exception by editing the maximum fread() $d_size in ARC2_Reader.
My document appears to be well-formed, declares all required namespaces, and contains only valid UTF-8 characters.
I am trying to find out how ARC2 supports default graph operations. Writing to default graphs seems to work:
INSERT DATA { <http://example.org/a> <http://example.org/b> <http://example.org/c> }
But afterwards the tables are still empty, nothing was inserted.
Same goes for the following SPARQL SELECT query:
SELECT * { <http://example.org/a> ?p ?o }
It gets parsed, but returns nothing.
Any ideas?
I currently inject triples into named graphs, which works just fine. However I seem to run into problems when I perform a "DROP GRAPH <name_of_the_graph>" query. Is this type of query supported out of the box (I don't see the drop statement in the store documentation), and if not supported is there a way to manage these named graphs, because if you can't delete them, it seems like a small mem-leak to me?
Thanks in advance.
Hello, I have encountered an issue whereby if I insert two sets of triples with blank nodes attached, the blank node IDs are the same for both sets.
For example, running the two insert statements below in separate requests results in each of the recipes having both ingredient lists attached.
INSERT INTO <...>
myRecipe:Pizza a recipe:Recipe ;
rdfs:label "Pizza"
recipe:ingredients [ a recipe:IngredientList ;
rdf:_1 [ a recipe:Ingredient ;
recipe:food myFood:Chicken ;
]
]
INSERT INTO <...>
myRecipe:Carbonara a recipe:Recipe ;
rdfs:label "Spaghetti Carbonara"
recipe:ingredients [ a recipe:IngredientList ;
rdf:_1 [ a recipe:Ingredient ;
recipe:food myFood:Bacon ;
]
]
Therefore running a query such as the following will return both of the previously inserted recipes
SELECT ?recipe
WHERE {
?recipe a recipe:Recipe ;
recipe:ingredients ?ingredients .
?ingredients ?p ?s .
?s a recipe:Ingredient ;
recipe:food myFood:Chicken
}
Note: this is only an issue when using the INSERT mechanism; the bnodes are parsed correctly when importing from a file in TTL format using the LOAD mechanism.
Hi everyone:
Today I installed ARC2 from the git source on a small server and loaded around 200K triples into it. Then I tested the recommended endpoint, as it appears on the wiki page.
But the JSON serializer has a problem: I get a parsing error on the closing quote of some strings. I checked it and it's true: some strings look like this:
"title": {
"type": "literal",
"value": "The new wave "A book of the test"
},
And the parser doesn't like this string :(.
I have implemented the solution from issue #10, but the problem still exists. Any ideas?
Thanks in advance
As JSON-LD has been standardized for a while now, it would be nice to support it as a serialization format.
There are plenty of real-world use cases that are not covered by the documentation at all.
BACKGROUND:
I'm trying to export the ARC2 database data to a new server, but a straight MySQL table copy might cause problems when using ARC2 on the new data.
So instead of a table copy, I am exporting and importing using createBackup() and then query('LOAD ...'), which is also VERY slow:
// To export
$store->createBackup('backup_file.spog');
// To import
$store->query('LOAD <file://FULL_PATH_TO_FILE/backup_file.spog>');
ISSUE:
ARC2 uses PHP's crc32() hash function (http://php.net/crc32) to look up a subject URI. The resulting integer is not always 32 bits: even though crc32() prepares a checksum using the "input string in 32-bit lengths at a time", the resulting integer value is platform-dependent, as can be seen from the PHP_INT_SIZE and PHP_INT_MAX constants.
SOLUTION:
One possible solution is:
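The body of the proposed solution is missing above. One commonly used normalization (documented in the php.net crc32() notes, though not necessarily what the reporter had in mind) is to format the checksum as an unsigned integer, so 32-bit and 64-bit platforms produce the same value:

```php
<?php
// crc32() returns a signed int on 32-bit platforms and a non-negative int on
// 64-bit ones; formatting with %u yields the same unsigned decimal string on both.
function crc32Unsigned($str) {
    return sprintf('%u', crc32($str));
}
```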
If there is a " in a string, ARC2 will enclose the whole string with ' instead of escaping it the right way.
Code:
Escaping in TurtleSerializer::getTerm() in serializers/ARC2_TurtleSerializer.php ln. 52ff
Right solution:
As you can see in http://www.w3.org/TeamSubmission/turtle/#sec-grammar-grammar strings have to be enclosed with "
[14] literal ::= quotedString ( '@' language )? | datatypeString | integer | double | decimal | boolean
[35] quotedString ::= string | longString
[36] string ::= #x22 scharacter* #x22
Furthermore, in http://www.w3.org/TeamSubmission/turtle/#sec-strings you can see that the quote character must be escaped as \" (inside string and longString)
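A minimal escaper that follows the cited grammar (a sketch, not ARC2's actual implementation; it handles the backslash, the quote, and the common control characters):

```php
<?php
// Escape a plain string for use inside a double-quoted Turtle literal.
// The backslash must be escaped first so later replacements are not re-escaped.
function turtleEscapeString($s) {
    return str_replace(
        array("\\",   "\"",   "\n",  "\r",  "\t"),
        array("\\\\", "\\\"", "\\n", "\\r", "\\t"),
        $s
    );
}
```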
As far as I can tell, queries like DELETE { ?s ?p ?o } will fail with a MySQL syntax error. The problem seems to be an empty MySQL WHERE clause caused by the lack of any SPARQL WHERE patterns.
If I understood the arguments of the ARC2_Store::query() function correctly, the only thing passing through to ARC2_StoreDeleteQueryHandler is the query itself. Assuming that, here is a possible fix: zhangtaihao/arc2@7a73320. It's working in my test environment.
If this is all that's needed, I would like the fix to be done on your end so I can rebase on yours.
I believe it might be useful for some to know about the test suite.
AFAICT, ant + phpunit is needed.
Maybe a wiki page could help?
Thanks in advance.
All ID columns in arc2 tables are defined as mediumint(8). As soon as the store reaches 16M (= mediumint(8)) triples, it should extend the tables to int(10). This is done when the id of the s2val, o2val or id2val table hits 16M. However, the source code of StoreLoadQueryHandler does not consider the case where the id of the triple or g2t table hits this limit first. If that happens, those tables must be extended, too.
The patch can be found at:
Knurg@c0bc1e6
Many thanks to G. Hohmann ([email protected]) for the files and the testing support. This patch is provided by the WissKI-Project (http://www.wiss-ki.eu) - Have a look. :)
Hi all,
I'm using arc2 with mysql 5.5.22. But when I use a UNION in my sparql
query it says: "Error: You have an error in your SQL syntax; check the
manual that corresponds to your MySQL server version for the right
syntax to use near 'JOIN arc2store_triple T_0_0_1 ON ( (T_0_0_1.s =
T_0_0_0.o) AND (T_0_0_1.p = ' at line 77 via
ARC2_StoreSelectQueryHandler"
With a previous MySQL version I didn't have this error; it appeared after I upgraded to the new version.
My operating system is Ubuntu 12.04 LTS.
Best regards,
When I do a request from my development machine to my remote endpoint, all is OK.
When I do the same request from the deployed server to my remote endpoint, I get the following message:
Query errorsArray ( [0] => Socket error: Could not connect to "http://137.194.54.188:8890/sparql?query=%0A%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aselect+distinct+%3Fp+%3Fo+where+%7B%0AGRAPH+%3Chttp%3A%2F%2Fgivingsense.eu%2Ffrscol%2FFrSchoolSystem%2F%3E+%7B%0A%3Chttp%3A%2F%2Fgivingsense.eu%2Ffrscol%2FFrSchoolSystem%2FPeriod1%3E+%3Fp+%3Fo+.%0A%7D%0A%7D%0A++" (proxy: 0): Connexion terminée par expiration du délai d'attente in ARC2_Reader [1] => missing stream in "getFormat" via ARC2_Reader [2] => missing stream in "readStream" http://137.194.54.188:8890/sparql?query=%0A%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aselect+distinct+%3Fp+%3Fo+where+%7B%0AGRAPH+%3Chttp%3A%2F%2Fgivingsense.eu%2Ffrscol%2FFrSchoolSystem%2F%3E+%7B%0A%3Chttp%3A%2F%2Fgivingsense.eu%2Ffrscol%2FFrSchoolSystem%2FPeriod1%3E+%3Fp+%3Fo+.%0A%7D%0A%7D%0A++ via ARC2_Reader )
If I copy/paste in a browser the request included in the message, it works:
http://137.194.54.188:8890/sparql?query=%0A%0APREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+owl%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aselect+distinct+%3Fp+%3Fo+where+%7B%0AGRAPH+%3Chttp%3A%2F%2Fgivingsense.eu%2Ffrscol%2FFrSchoolSystem%2F%3E+%7B%0A%3Chttp%3A%2F%2Fgivingsense.eu%2Ffrscol%2FFrSchoolSystem%2FPeriod1%3E+%3Fp+%3Fo+.%0A%7D%0A%7D%0A++
I've been working on this issue for a while without finding my way.
Help will be appreciated.