fcrepo-message-consumer's Introduction

This project is no longer maintained

Users are encouraged to make use of the fcrepo-camel-toolbox for similar functionality.

fcrepo-message-consumer

This is an fcrepo 4.x indexer. It listens to the Fedora JMS topic, receives messages that include a pid and an eventType, looks up the object's properties, and passes them, transformed or untransformed, on to any number of registered handlers. It relies heavily on Spring machinery, including:

  • spring-lang
  • spring-jms
  • activemq-spring (?)

Running the indexer

In the simplest case, the indexer can be configured in the same container as the repository. See kitchen-sink/fuseki for an example of this configuration.

For production deployment, it is more typical to run the indexer on a separate machine. So we also have a stand-alone mode where the indexer is run as its own webapp:

$ git clone https://github.com/futures/fcrepo-jms-indexer-pluggable.git
$ cd fcrepo-jms-indexer-pluggable/fcrepo-jms-indexer-webapp
$ mvn -D jetty.port=9999 install jetty:run

Configuring the indexer

Test Spring Configuration

Production Spring Configuration

indexer-core.xml

  <!-- sparql-update indexer -->
  <bean id="sparqlUpdate" class="org.fcrepo.indexer.SparqlIndexer">
    <!-- base URL for triplestore subjects, PID will be appended -->
    <property name="prefix" value="http://localhost:${fcrepo.dynamic.test.port:8080}/rest/objects/"/>

    <!-- fuseki (used by tests) -->
    <property name="queryBase" value="http://localhost:3030/test/query"/>
    <property name="updateBase" value="http://localhost:3030/test/update"/>
    <property name="formUpdates">
      <value type="java.lang.Boolean">false</value>
    </property>

    <!-- sesame -->
    <!--
    <property name="queryBase" value="http://localhost:8080/openrdf-sesame/repositories/test"/>
    <property name="updateBase" value="http://localhost:8080/openrdf-sesame/repositories/test/statements"/>
    <property name="formUpdates">
      <value type="java.lang.Boolean">true</value>
    </property>
    -->
  </bean>
  
  <!--Embedded Server used in spring-test -->
  <!--
  
  <bean id="multiCore" class="org.apache.solr.core.CoreContainer"
    factory-method="createAndLoad" c:solrHome="target/test-classes/solr"
    c:configFile-ref="solrConfig"/>
    
  <bean class="java.io.File" id="solrConfig">
    <constructor-arg type="String">
      <value>target/test-classes/solr/solr.xml</value>
    </constructor-arg>
  </bean>

  <bean id="solrServer"
    class="org.apache.solr.client.solrj.embedded.EmbeddedSolrServer"
    c:coreContainer-ref="multiCore" c:coreName="testCore"/>
    -->
  <!-- end Embedded Server-->
  
  <!-- Standalone Solr server -->
  <bean id="solrServer" class="org.apache.solr.client.solrj.impl.HttpSolrServer">
    <constructor-arg index="0" value="http://${fcrepo.host:localhost}:${solrIndexer.port:8983}/solr/" />
  </bean>
  
  <!-- Solr indexer -->
  <bean id="solrIndexer" class="org.fcrepo.indexer.solr.SolrIndexer">
    <constructor-arg ref="solrServer" />
  </bean>

  <!-- file serializer -->
  <bean id="fileSerializer" class="org.fcrepo.indexer.FileSerializer">
    <property name="path" value="./target/test-classes/fileSerializer/"/>
  </bean>

  <!-- Message Driven POJO (MDP) that manages individual indexers -->
  <bean id="indexerGroup" class="org.fcrepo.indexer.IndexerGroup">
    <property name="repositoryURL" value="http://localhost:${fcrepo.dynamic.test.port:8080}/rest/objects/" />
    <property name="indexers">
      <set>
        <ref bean="sparqlUpdate"/>
        <ref bean="solrIndexer"/>
        <ref bean="fileSerializer"/>
      </set>
    </property>
  </bean>
  <!--end indexer-core.xml-->

Three indexers are configured here: sparqlUpdate, which writes to the configured Fuseki triplestore; solrIndexer, which writes to the configured standalone Solr instance; and fileSerializer, which writes to the configured filesystem path.

indexer-events.xml

  <bean id="connectionFactory"
    class="org.apache.activemq.ActiveMQConnectionFactory">
    <property name="brokerURL" value="vm://localhost"/>
  </bean>

  <bean id="pooledConnectionFactory"
    class="org.apache.activemq.pool.PooledConnectionFactory"
    depends-on="connectionFactory">
    <property name="connectionFactory" ref="connectionFactory"/>
    <property name="maxConnections" value="1"/>
    <property name="idleTimeout" value="0"/>
  </bean>
  
  <!-- ActiveMQ topic to listen for events -->
  <bean id="destination" class="org.apache.activemq.command.ActiveMQTopic">
    <constructor-arg value="fedora" />
  </bean>

  <!-- and this is the message listener container -->
  <bean id="jmsContainer" class="org.springframework.jms.listener.DefaultMessageListenerContainer"
    depends-on="destination, pooledConnectionFactory">
    <property name="connectionFactory" ref="connectionFactory"/>
    <property name="destination" ref="destination"/>
    <property name="messageListener" ref="indexerGroup" />
    <property name="sessionTransacted" value="true"/>
  </bean>

The magic is in the jmsContainer bean. It listens to the destination for messages and passes them on to our messageListener. The messageListener retrieves the Fedora object from the repo (for adds/updates) and passes the pid and content to each indexer class defined in the indexers set.
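
The fan-out described above can be sketched in plain Java. This is a minimal illustration, not the project's IndexerGroup: the Indexer interface, its update and remove methods, and the IndexerGroupSketch class are hypothetical names, and the JMS message is reduced to a pid, a removal flag, and content that has already been retrieved.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the message listener: it fans each event out
// to every registered indexer, mirroring the "indexers" set in the XML.
class IndexerGroupSketch {
    private final Set<Indexer> indexers = new LinkedHashSet<>();

    void register(Indexer indexer) {
        indexers.add(indexer);
    }

    // For removals only the pid is passed; for adds/updates the content too.
    void onEvent(String pid, boolean removal, String content) {
        for (Indexer indexer : indexers) {
            if (removal) {
                indexer.remove(pid);
            } else {
                indexer.update(pid, content);
            }
        }
    }

    // Runs a small scenario and returns the calls each handler observed.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        IndexerGroupSketch group = new IndexerGroupSketch();
        for (String name : new String[] {"sparqlUpdate", "solrIndexer"}) {
            group.register(new Indexer() {
                public void update(String pid, String content) {
                    log.add(name + " update " + pid);
                }
                public void remove(String pid) {
                    log.add(name + " remove " + pid);
                }
            });
        }
        group.onEvent("/1", false, "<> dc:title \"Test Title\" .");
        group.onEvent("/1", true, null);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}

// Hypothetical handler interface; the real project's indexer API may differ.
interface Indexer {
    void update(String pid, String content);
    void remove(String pid);
}
```

Each registered handler sees every event in registration order, so adding another target only means registering another Indexer.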

Dependencies

Currently, the tests work with either Jena Fuseki or Sesame triplestores/SPARQL servers. To switch between them, edit src/test/resources/spring-test/indexer-core.xml.
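
The Fuseki and Sesame settings in indexer-core.xml differ in formUpdates (false for Fuseki, true for Sesame). One plausible reading, and it is an assumption rather than something this README states, is that the flag selects between the two update styles of the SPARQL 1.1 Protocol: posting the update text directly as application/sparql-update, or posting it URL-encoded in an update form parameter. The sketch below only builds the request pieces for each style; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; it only builds the request pieces, it sends nothing.
class SparqlUpdateRequestSketch {

    // Content-Type header for the chosen update style.
    static String contentType(boolean formUpdates) {
        return formUpdates
                ? "application/x-www-form-urlencoded"
                : "application/sparql-update";
    }

    // Request body: either "update=<url-encoded text>" or the raw text.
    static String body(boolean formUpdates, String sparql) {
        if (!formUpdates) {
            return sparql;
        }
        try {
            return "update=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sparql = "INSERT DATA { <a> <b> <c> }";
        System.out.println(contentType(false) + " -> " + body(false, sparql));
        System.out.println(contentType(true) + " -> " + body(true, sparql));
    }
}
```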

Fuseki

Fuseki is the easiest to set up: just download it from http://www.apache.org/dist/jena/binaries/, unpack it, and start fuseki-server:

curl -O http://www.apache.org/dist/jena/binaries/jena-fuseki-0.2.7-distribution.tar.gz
tar xvfz jena-fuseki-0.2.7-distribution.tar.gz
cd jena-fuseki-0.2.7
./fuseki-server --update --mem /test

Sesame

Sesame requires a little more setup to run with the tests, since by default it uses the same port as Fedora. To set up Sesame with Tomcat running on an alternate port:

  • Download Sesame from http://sourceforge.net/projects/sesame/files/Sesame%202/

  • Download Tomcat from http://tomcat.apache.org/download-70.cgi

  • Unpack Sesame and Tomcat, and move the Sesame WAR file into the Tomcat webapps directory

  • Change the Tomcat port to something other than 8080 to avoid conflict with Fedora, and then start Tomcat.

  • Use the Sesame console to create a repository

    curl -L -O http://downloads.sourceforge.net/project/sesame/Sesame%202/2.7.5/openrdf-sesame-2.7.5-sdk.tar.gz
    curl -O http://www.apache.org/dist/tomcat/tomcat-7/v7.0.42/bin/apache-tomcat-7.0.42.tar.gz
    tar xvfz apache-tomcat-7.0.42.tar.gz
    tar xvfz openrdf-sesame-2.7.5-sdk.tar.gz
    cp openrdf-sesame-2.7.5/war/openrdf-sesame.war apache-tomcat-7.0.42/webapps/
    cat apache-tomcat-7.0.42/conf/server.xml | sed -e's/8080/${tomcat.port}/' > tmp.xml
    mv tmp.xml apache-tomcat-7.0.42/conf/server.xml
    export CATALINA_HOME=`pwd`/apache-tomcat-7.0.42
    export JAVA_OPTS="$JAVA_OPTS -Dtomcat.port=8081"
    apache-tomcat-7.0.42/bin/startup.sh
    openrdf-sesame-2.7.5/bin/console.sh
    > connect http://localhost:8081/openrdf-sesame.
    > create native.
    Repository ID [native]: test
    Repository title [Native store]: test
    Triple indexes [spoc,posc]: spoc,posc
    > quit.

Solr

Solr can be installed embedded in a Jetty server (recommended for testing) or in a Tomcat container (recommended for production). Download, installation, and configuration instructions are here: https://cwiki.apache.org/confluence/display/solr/Getting+Started

Maven Build

Use the following MAVEN_OPTS when building:

MAVEN_OPTS="-Xmx750M -XX:MaxPermSize=300M" mvn clean install

Caveat: Blank Nodes

Fedora doesn't currently support blank nodes.

Authenticated repo

If REST calls to your Fedora repository require BASIC authentication, you'll need to set two system properties in your servlet container: fcrepo.username and fcrepo.password. With Jetty under Maven 3, you can define these values in your settings.xml file; they are later passed through as the two system properties:

<profiles>
  <profile>
    <id>fcrepo</id>
    <activation>
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <fcrepo.username>example</fcrepo.username>
      <fcrepo.password>xxxxxxxx</fcrepo.password>
    </properties>
  </profile>
</profiles>

In Tomcat 7 you can set the following command line options in your bin/setenv.sh file:

JAVA_OPTS="$JAVA_OPTS -Dfcrepo.username=example -Dfcrepo.password=xxxxxxxx"
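
On the consuming side, these two system properties would typically end up in an HTTP Basic Authorization header. The following is a minimal sketch that assumes only the property names given above; the class and method names are hypothetical, and the project's own HTTP client setup may handle credentials differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of fcrepo-message-consumer.
class BasicAuthSketch {

    // Builds the value of the Authorization header for BASIC authentication.
    static String header(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Reads fcrepo.username and fcrepo.password; returns null when either
    // property is unset, meaning requests would go out unauthenticated.
    static String headerFromSystemProperties() {
        String username = System.getProperty("fcrepo.username");
        String password = System.getProperty("fcrepo.password");
        if (username == null || password == null) {
            return null;
        }
        return header(username, password);
    }

    public static void main(String[] args) {
        System.out.println(headerFromSystemProperties());
    }
}
```

Running it with -Dfcrepo.username=example -Dfcrepo.password=xxxxxxxx prints the header value; without the flags it prints null.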

fcrepo-message-consumer's People

Contributors

acoburn, ajs6f, cbeer, claussni, daines, escowles, giuliah, ksclarke, lsitu, mikedurbin, nianma, nikhiltri, osmandin, ruebot, yulgit1

fcrepo-message-consumer's Issues

Root element is not indexed.

Even if the root element has rdf:type → http://fedora.info/definitions/v4/indexing#indexable, it is not indexed by the SPARQL indexer.

To reproduce:

indexing.rdf

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX indexing: <http://fedora.info/definitions/v4/indexing#>

DELETE { }
INSERT {
  <> rdf:type indexing:indexable .
}
WHERE { }

title.rdf

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX indexing: <http://fedora.info/definitions/v4/indexing#>

DELETE { }
INSERT {
  <> dc:title "Test Title" . 
}
WHERE { }

On the command line:

curl -X PATCH -H "Content-Type: application/sparql-update" --data-binary "@indexing.rdf" "http://localhost:8080/fcrepo/rest"
curl -X PATCH -H "Content-Type: application/sparql-update" --data-binary "@title.rdf" "http://localhost:8080/fcrepo/rest"

The root element is not added to the external indexer.

Thank you!

"Connection refused" between Message Consumer and Sesame

Message Consumer is throwing "Connection Refused" errors in catalina.out

Here's a snippet of indexer-core.xml

<!-- sparql-update indexer -->
  <bean id="sparqlUpdate" class="org.fcrepo.indexer.sparql.SparqlIndexer">
    <!-- sesame -->
    <property name="queryBase" value="http://localhost:8080/openrdf-sesame/repositories/test"/>
    <property name="updateBase" value="http://localhost:8080/openrdf-sesame/repositories/test/statements"/>
    <property name="formUpdates">
      <value type="java.lang.Boolean">true</value>
    </property>
  </bean>

http://localhost:8080/openrdf-sesame/repositories/test is correct in my setup. I can use OpenRDF Workbench to add statements to Sesame.

Here's the error:

INFO 23:05:53.261 (FedoraLdp) GET resource '1'
INFO 23:06:05.232 (FedoraLdp) PATCH for '1'
DEBUG 23:06:05.296 (IndexerGroup) Received message: ID:d8-f4-dev-50727-1418251743674-3:1:1:1:5
DEBUG 23:06:05.296 (IndexerGroup) Discovered id: /1 in message.
DEBUG 23:06:05.296 (IndexerGroup) Discovered event type: http://fedora.info/definitions/v4/repository#PROPERTY_ADDED,http://fedora.info/definitions/v4/repository#PROPERTY_CHANGED in message.
DEBUG 23:06:05.296 (IndexerGroup) Discovered baseURL: http://localhost:38080/fcrepo/rest/ in message.
DEBUG 23:06:05.296 (IndexerGroup) Discovered properties: http://www.jcp.org/jcr/1.0lastModifiedBy,http://www.jcp.org/jcr/1.0lastModified,http://purl.org/dc/elements/1.1/creator in message.
DEBUG 23:06:05.296 (IndexerGroup) It is false that this is a removal operation.
WARN 23:06:05.299 (DefaultMessageListenerContainer) Execution of JMS message listener failed, and no ErrorHandler has been set.
java.lang.RuntimeException: java.net.ConnectException: Connection refused
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-18.0.jar:na]
    at org.fcrepo.indexer.RdfRetriever.get(RdfRetriever.java:107) ~[fcrepo-message-consumer-core-4.0.0.jar:na]
    at org.fcrepo.indexer.RdfRetriever.get(RdfRetriever.java:51) ~[fcrepo-message-consumer-core-4.0.0.jar:na]
    at com.google.common.base.Suppliers$MemoizingSupplier.get(Suppliers.java:125) ~[guava-18.0.jar:na]
    at org.fcrepo.indexer.IndexerGroup.index(IndexerGroup.java:279) ~[fcrepo-message-consumer-core-4.0.0.jar:na]
    at org.fcrepo.indexer.IndexerGroup.onMessage(IndexerGroup.java:255) ~[fcrepo-message-consumer-core-4.0.0.jar:na]
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:685) ~[spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:623) ~[spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:591) ~[spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:308) [spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:246) [spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1142) [spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1134) [spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1031) [spring-jms-4.1.1.RELEASE.jar:4.1.1.RELEASE]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.7.0_71]
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) ~[na:1.7.0_71]
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) ~[na:1.7.0_71]
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) ~[na:1.7.0_71]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.7.0_71]
    at java.net.Socket.connect(Socket.java:579) ~[na:1.7.0_71]
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:117) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) ~[httpclient-4.3.3.jar:4.3.3]
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) ~[httpclient-4.3.3.jar:4.3.3]
    at org.fcrepo.indexer.RdfRetriever.get(RdfRetriever.java:76) ~[fcrepo-message-consumer-core-4.0.0.jar:na]
    ... 13 common frames omitted

connections not closing?

We're investigating the use of fedora with this service to index in solr and/or a triplestore. For now just solr.

For my development set up I have fedora, solr, and fcrepo-message-consumer all running in different instances of jetty on OSX 10.10.

I do a small ingest of 5000 items into fedora including the rdf to have them indexed.

Prior to running the ingest lsof | grep CLOSE_WAIT | wc shows about 10-20 lines.

I run the ingest. It completes successfully and the slower solr indexing continues for some time afterward. Repeating the above shows the CLOSE_WAIT count creeping up a little behind but in step with the number of items indexed in solr.

When the solr indexing is completed there are 5020 lines in CLOSE_WAIT, suggesting that there is one for each item that has been solr indexed.

This does not go down with time.

Killing fedora and solr it remains the same.

Killing fcrepo-message-consumer immediately drops it back down to the 20 range or so.

Initially I had tried a larger (10000 item) ingest, but this failed (on too many open files).

Another test with the solr connector off but the file system connectors still on showed the same behavior; the ingest into fedora completes and the number of connections open continues to rise as the messages are picked up and the rdf is serialized to the filesystem. I had desired to run another test with no serializers, but for some reason I don't seem to be able to get mvn to build the project anymore. Not sure what that might be and don't want to figure it out right now.

I don't know if there is something I am doing wrong here or if it is an issue with fcrepo-message-consumer, but best to report it in case there is something amiss.

This is with git commit 5d6d9dc

First paragraph of readme should have example

Something like "For example, writing Fedora Documents into the Solr search engine or a Fuseki Triple store". The line that says it is built on Spring components is largely irrelevant to a user and can be relegated to a less prominent place in the readme.
