borndigital's Issues

VIAA MAM schema validation fails

Schema validation fails and logs the messages below.

The validation should either be disabled or updated so that it does not throw these errors as often.

2018-12-14 10:40:54,438 [amqpReceiver.01] WARN  org.mule.util.xmlsecurity.DefaultXMLSecureFactories - Can't configure XML entity expansion for Validator (com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl), this could introduce XXE and BL vulnerabilities
2018-12-14 10:40:54,438 [amqpReceiver.01] WARN  org.mule.util.xmlsecurity.DefaultXMLSecureFactories - org.xml.sax.SAXNotRecognizedException: Property 'http://javax.xml.XMLConstants/property/accessExternalStylesheet' is not recognized.
2018-12-14 10:40:54,440 [amqpReceiver.01] INFO  org.mule.api.processor.LoggerMessageProcessor - ERROR: MESSAGE PAYLOAD:

Find CP name for fxp request via organisation_api

In PI_SIP_DELIVERY_GENERIC_ESSENCE, the flowVar destinationPath (for the FXP request) is set as follows: #["/" + flowVars.cp.toLowerCase() + "/TAPE-SHARE-EVENTS"] (see the line referenced below), where flowVars.cp is the value posted in the AMQP message that triggered borndigital.

The CP key in the AMQP message, in turn, holds the value of the command-line argument of the node-watchfolder process that sent the message to the borndigital.input queue (see the documentation here: https://github.com/viaacode/node_watchfolder#arguments).

In anticipation of the switch to a folder structure based on OR-id instead of CP name on MediaHaven's transport servers (as is already the case on the FTP servers), borndigital should look up the cp_name via the OR-id using the organisations_api, instead of relying on the CP key in the incoming message. The problem is that a different process posting a message on the borndigital.input queue might put a different value in the message's CP key, which is the case for batch intake (see here).

<set-variable variableName="destinationPath" value="#[&quot;/&quot; + flowVars.cp.toLowerCase() + &quot;/TAPE-SHARE-EVENTS&quot;]" doc:name="Set destinationPath to /CP(lowercase)/TAPE-SHARE-EVENTS"/>
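A minimal sketch of the proposed lookup, written in Python for illustration rather than as Mule config: resolve the CP name from the OR-id first, and only then build the path in the same shape as the current MEL expression. The `lookup_cp_name` callable stands in for a call to organisations_api; its name, the function name and the example OR-id/CP values are hypothetical.

```python
from typing import Callable, Optional


def build_destination_path(or_id: str, lookup_cp_name: Callable[[str], Optional[str]]) -> str:
    """Build the FXP destination path from the OR-id instead of the message's CP key.

    `lookup_cp_name` is a hypothetical stand-in for a call to organisations_api
    that returns the CP name for an OR-id, or None if the OR-id is unknown.
    """
    cp_name = lookup_cp_name(or_id)
    if not cp_name:
        # Fail loudly instead of silently building a path for the wrong CP.
        raise LookupError(f"No CP name found for OR-id {or_id!r}")
    # Same shape as the MEL expression: "/" + cp.toLowerCase() + "/TAPE-SHARE-EVENTS"
    return "/" + cp_name.lower() + "/TAPE-SHARE-EVENTS"


# Hypothetical lookup table standing in for organisations_api.
example_orgs = {"OR-abc123": "ExampleCP"}
path = build_destination_path("OR-abc123", example_orgs.get)
```

Raising on an unknown OR-id is a design choice: a wrong destination path is harder to notice than a failed lookup.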

Borndigital should also be able to create non-existent destination paths.

When BD transfers a sidecar to the destination host (tra-server), the destination path must already exist. If it doesn't, the transfer fails (Failed to change working directory to <destination_path>. Ftp error: 550. Type: class java.io.IOException).

The risk of this happening is currently mitigated by the fact that the FXP service can create the required destination paths. However, this only works if FXP is not busy.

As an illustration: BD couldn't create the directory //atv/TAPE-SHARE-EVENTS and kept retrying. The first retry was on 2019-08-13 14:26, the last one (no. 34705!) on 2019-08-14 09:55 (after a manual shutdown): retrying for over 19 hours.

The current "solution" is sub-optimal, to say the least.

A ticket for this issue exists: https://www.mulesoft.org/jira/browse/MULE-5192.
According to that ticket, newer versions of Mule's transports should be able to create directories on the fly. Note that the API linked there belongs to the SFTP transport (SftpClient.createSftpDirIfNotExists), not to the FTP connector BD uses: http://www.mulesoft.org/docs/site/current3/apidocs/org/mule/transport/sftp/SftpClient.html#createSftpDirIfNotExists(org.mule.api.endpoint.ImmutableEndpoint,%20java.lang.String)

Other solutions exist as well: http://www.javaroots.com/2014/09/mule-ftp-create-directory-if-not-exist.html.
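A minimal sketch of the "create it if it doesn't exist" approach, using Python's standard ftplib for illustration (in BD this logic would have to live in a Java component or a newer connector). It walks the destination path one level at a time and issues MKD for each missing directory before changing into it. `ensure_ftp_dirs` and the `FakeFTP` test double are hypothetical names, and the sketch assumes that a 550 reply (`ftplib.error_perm`) on CWD means the directory is missing.

```python
import ftplib


def ensure_ftp_dirs(ftp, path: str) -> list:
    """Create every missing component of `path` on the server, one level at a time.

    `ftp` must behave like ftplib.FTP (cwd/mkd raising ftplib.error_perm).
    Returns the names of the directories that had to be created.
    """
    created = []
    ftp.cwd("/")  # start at the root so the path is resolved absolutely
    for part in [p for p in path.split("/") if p]:
        try:
            ftp.cwd(part)
        except ftplib.error_perm:
            # 550 on CWD: assume the directory is missing and create it.
            # If MKD is refused as well, its error_perm propagates to the caller.
            ftp.mkd(part)
            ftp.cwd(part)
            created.append(part)
    return created


class FakeFTP:
    """Tiny in-memory stand-in for ftplib.FTP, just enough to exercise ensure_ftp_dirs."""

    def __init__(self, existing):
        self.dirs = {"/"} | set(existing)
        self.cwd_path = "/"

    def _resolve(self, name):
        return "/" if name == "/" else self.cwd_path.rstrip("/") + "/" + name

    def cwd(self, name):
        target = self._resolve(name)
        if target not in self.dirs:
            raise ftplib.error_perm("550 Failed to change directory.")
        self.cwd_path = target

    def mkd(self, name):
        target = self._resolve(name)
        self.dirs.add(target)
        return target


fake = FakeFTP(existing={"/atv"})
created = ensure_ftp_dirs(fake, "//atv/TAPE-SHARE-EVENTS")
```

Running this before the upload turns a missing directory into a one-off MKD instead of an endless retry loop.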

Relevant code:

<flow name="SidecarFTPGeneric" processingStrategy="synchronous">
    <ftp:outbound-endpoint host="${ftpDestinationSidecar.host}" port="21" path="#[flowVars.destinationPath]" user="${ftpDestinationSidecar.username}" password="${ftpDestinationSidecar.password}" outputPattern="#[flowVars.pid + &quot;.xml&quot;]" responseTimeout="1000000" doc:name="FTP" connector-ref="FTP"/>
    <catch-exception-strategy doc:name="Catch Exception Strategy">
        <logger message="FTP Timeout for sidecar! Retrying" level="INFO" doc:name="Logger"/>
        <expression-component doc:name="Sleep"><![CDATA[Thread.sleep(2000);]]></expression-component>
        <flow-ref name="SidecarFTPGeneric" doc:name="SidecarFTPGeneric"/>
    </catch-exception-strategy>
</flow>
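The catch-exception-strategy above retries by calling its own flow again after a 2-second sleep, with no upper bound, which is how BD ended up retrying 34705 times. A sketch of a bounded alternative with exponential backoff, in Python for illustration; `with_bounded_retry`, the attempt limit of 5 and the 2-second base delay are assumptions, not values taken from BD.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_bounded_retry(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` at most `max_attempts` times, backing off exponentially.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once the attempts are used up, so the failure can
    be alerted on instead of being retried indefinitely.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable: the loop always returns or raises")


# Usage: a hypothetical upload that fails twice before succeeding.
delays = []
calls = {"n": 0}


def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("550 Failed to change working directory")
    return "uploaded"


result = with_bounded_retry(flaky_upload, sleep=delays.append)
```

Injecting `sleep` keeps the backoff schedule testable without actually waiting.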

Implement solution for empty sidecars/essences on FTP

BD chokes on empty sidecars. The error is raised on this line:

<set-variable variableName="local_id" value="#[xpath3(&quot;//dc_identifier_localid/text()&quot;)]" doc:name="Set local_id from dc_identifier_localid"/>

However, it could (and should) be caught earlier, right after the sidecar is fetched from FTP, here:
<set-variable variableName="incomingXml" value="#[payload]" doc:name="Set incomingXml"/>

--> check for empty payload and alert. (Full stacktrace below)
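A minimal sketch of the early check, in Python for illustration: reject an empty or whitespace-only payload before any XPath lookup runs, with an error that names the file so it can be alerted on. `EmptySidecarError` and `parse_sidecar` are hypothetical names.

```python
import xml.etree.ElementTree as ET


class EmptySidecarError(ValueError):
    """Raised when a sidecar fetched from FTP has no content (hypothetical error type)."""


def parse_sidecar(payload: bytes, filename: str = "<sidecar>") -> ET.Element:
    """Reject empty sidecars up front instead of failing later on an XPath lookup."""
    if payload is None or not payload.strip():
        raise EmptySidecarError(f"Sidecar {filename} is empty; not processing it")
    return ET.fromstring(payload)


root = parse_sidecar(b"<root><dc_identifier_localid>abc</dc_identifier_localid></root>", "abc.xml")
```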

Update logging: change DefaultRolloverStrategy, SizeBasedTriggeringPolicy and filename

Currently:

  • SizeBasedTriggeringPolicy is set to 10MB
  • the DefaultRolloverStrategy for log rotation in borndigital is set to 10
  • filename pattern is: borndigital-%i.log.

Longer log retention would be helpful for such an important flow, so update these to, say:

  • 20MB
  • 20 files, and,
  • borndigital.log-%i

Lines:

<RollingFile name="file" fileName="${sys:mule.home}${sys:file.separator}logs${sys:file.separator}borndigital.log"

and further.
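To put "longer retention" in numbers, a back-of-the-envelope sketch assuming that DefaultRolloverStrategy keeps up to `max` rolled files next to the active one, each roughly the SizeBasedTriggeringPolicy size (a file can grow slightly past the trigger size before it rolls over, and compression is ignored). The function name is hypothetical.

```python
def max_log_footprint_mb(size_mb: int, max_archives: int) -> int:
    """Approximate upper bound on disk used by one rolling log: active file + rolled files."""
    return size_mb * (max_archives + 1)


current = max_log_footprint_mb(10, 10)   # 10MB, 10 files
proposed = max_log_footprint_mb(20, 20)  # 20MB, 20 files
```

Under these assumptions the proposal raises the ceiling from about 110 MB to about 420 MB of logs, roughly four times the history.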

XML is autoDeleted

The sidecar file is removed as soon as it is read. If the borndigital flow stops, the XML is gone before the package has been transferred to the transport server.

Fix: delete the file only after the flow has finished.
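A minimal sketch of the intended ordering, in Python with a local file for illustration (in BD the sidecar lives on FTP, but the principle is the same): run the whole flow first and delete the file only if it completed. `process_then_delete` and the `handle` callback are hypothetical.

```python
import tempfile
from pathlib import Path


def process_then_delete(sidecar: Path, handle) -> None:
    """Run the flow on the sidecar first; remove the file only if that succeeds.

    If `handle` raises (e.g. the flow is stopped or the transfer fails), the XML
    stays in place so the package can be picked up again.
    """
    data = sidecar.read_bytes()
    handle(data)  # any exception propagates and skips the delete below
    sidecar.unlink()


with tempfile.TemporaryDirectory() as tmp:
    ok = Path(tmp) / "ok.xml"
    ok.write_bytes(b"<sidecar/>")
    process_then_delete(ok, lambda data: None)
    ok_deleted = not ok.exists()

    failing = Path(tmp) / "failing.xml"
    failing.write_bytes(b"<sidecar/>")
    try:
        process_then_delete(failing, lambda data: 1 / 0)  # simulated flow failure
    except ZeroDivisionError:
        pass
    failing_kept = failing.exists()
```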

Remove file get_cp_id and move outside of BD

Remove the abomination that is get_cp_id.xml: CP_id is set based on CP_name in an if-else structure with 178 choices...

Replace it with a call to organisation_api? Or pull in a "snapshot" of all CPs via organisations_api?

https://github.com/viaacode/borndigital/blob/1ee3d7cfaad35e3a2763061f4604152d1c1919a3/src/main/app/get_cp_id.xml

Moreover, this sub-flow is, of course, not environment-aware, i.e., all CP ids are PRD values.

However, this sub-flow apparently is only called for "custom CPs" (by flow_id):

  • vrt (vrt.video.1)
  • plantentuinmeise
  • vlaamsparlement
  • Medialaan

<when expression="#[flowVars.input.flow_id.equals(&quot;vrt.video.1&quot;) || flowVars.input.flow_id.equals(&quot;plantentuinmeise&quot;) || flowVars.input.flow_id.equals(&quot;vlaamsparlement&quot;) || flowVars.input.flow_id.equals(&quot;Medialaan&quot;)]">
<flow-ref name="get_cp_id" doc:name="get_cp_id"/>
</when>
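A sketch of the "snapshot" option, in Python for illustration: build a cp_name to cp_id dictionary once from an organisations snapshot (loaded per environment, which also fixes the PRD-only values) and replace the 178-branch if-else with a single lookup. The record keys `cp_name`/`cp_id` and all example values are assumptions about what organisations_api returns; the function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional

# flow_id values that currently trigger get_cp_id (from the <when> expression).
CUSTOM_FLOW_IDS = frozenset({"vrt.video.1", "plantentuinmeise", "vlaamsparlement", "Medialaan"})


def build_cp_id_index(orgs: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Turn an organisations snapshot into a cp_name -> cp_id dict.

    The keys 'cp_name' and 'cp_id' are assumptions; adapt them to the real response.
    """
    return {org["cp_name"]: org["cp_id"] for org in orgs}


def resolve_cp_id(flow_id: str, cp_name: str, index: Mapping[str, str]) -> Optional[str]:
    """One dictionary lookup instead of an if-else chain."""
    if flow_id not in CUSTOM_FLOW_IDS:
        return None  # get_cp_id is not called for other flows today
    try:
        return index[cp_name]
    except KeyError:
        raise LookupError(f"Unknown CP name {cp_name!r}") from None


# Hypothetical snapshot for one environment.
index = build_cp_id_index([{"cp_name": "ExampleCP", "cp_id": "OR-example1"}])
```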

Rewrite string-replace functions in XML

In the sub-flow 'metadata_corrections', XML tags are fixed by string replacement, for example:

#[flowVars.mappedXml.replaceAll('<trefwoord>', '<Trefwoord>').replaceAll('</trefwoord>', '</Trefwoord>')]

This is bad practice. To do:

  • find the reason,
  • rewrite it, preferably in a transformation processor.

The sub-flow:

<sub-flow name="metadata_corrections">
    <set-variable variableName="mappedXml" value="#[flowVars.mappedXml.replaceAll('&lt;trefwoord&gt;', '&lt;Trefwoord&gt;').replaceAll('&lt;/trefwoord&gt;', '&lt;/Trefwoord&gt;')]" doc:name="Fix trefwoord -&gt; Trefwoord"/>
    <set-variable variableName="mappedXml" value="#[flowVars.mappedXml.replaceAll('&lt;auteursrechthouder&gt;', '&lt;Auteursrechthouder&gt;').replaceAll('&lt;/auteursrechthouder&gt;','&lt;/Auteursrechthouder&gt;')]" doc:name="Fix auteursrechthouder -&gt; Auteursrechthouder"/>
    <set-variable variableName="mappedXml" value="#[flowVars.mappedXml.replaceAll('&lt;licentiehouder&gt;', '&lt;Licentiehouder&gt;').replaceAll('&lt;/licentiehouder&gt;','&lt;/Licentiehouder&gt;')]" doc:name="Fix licentiehouder -&gt; Licentiehouder"/>
</sub-flow>
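A sketch of doing the same correction on the parsed tree instead of on the raw string, in Python's xml.etree for illustration (in Mule this would more likely be a transformation such as XSLT). Unlike `replaceAll('<trefwoord>', ...)`, renaming elements also catches tags that carry attributes or are self-closing. `fix_tag_case` is a hypothetical name, and the sketch assumes the tags are not namespaced.

```python
import xml.etree.ElementTree as ET

# Lower-case tag -> expected tag, as in the metadata_corrections sub-flow.
TAG_FIXES = {
    "trefwoord": "Trefwoord",
    "auteursrechthouder": "Auteursrechthouder",
    "licentiehouder": "Licentiehouder",
}


def fix_tag_case(xml_text: str) -> str:
    """Rename elements on the parsed tree (assumes no XML namespaces on these tags)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() includes the root element itself
        if elem.tag in TAG_FIXES:
            elem.tag = TAG_FIXES[elem.tag]
    return ET.tostring(root, encoding="unicode")


fixed = fix_tag_case('<md><trefwoord>a</trefwoord><trefwoord lang="nl"/><licentiehouder>b</licentiehouder></md>')
```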

"CreationDate" has ':' instead of '-' in MH-metadata

This issue was already fixed in vrt_dailies: https://github.com/viaacode/vrt_dailies/commit/c3cb1129e2df49e952272b3281eae0bc51e42ad6.

Offending lines:

<set-variable variableName="CreationDate" value="#[xpath3(&quot;//*[local-name() = 'dcterms_issued']/text()&quot;) != &quot;&quot; ? (xpath3(&quot;//*[local-name() = 'dcterms_issued']/text()&quot;).replaceAll('-', ':').substring(0,10) + &quot; 00:00:00&quot;) : null]" doc:name="Set CreationDate from dcterms_issued (from EDTF to EXIF)"/>

and
<set-variable variableName="CreationDate" value="#[payload.get(&quot;vergadering&quot;).get(&quot;datumbegin&quot;).toString().substring(0,10).replaceAll('-', ':') + &quot; &quot; + payload.get(&quot;vergadering&quot;).get(&quot;datumbegin&quot;).toString().substring(11,19).replaceAll('-',':')]" doc:name="Set CreationDate (MH)"/>
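A sketch of both conversions without the `'-'` to `':'` replacement in the date part, in Python for illustration. It assumes `dcterms_issued` starts with a `YYYY-MM-DD` date and `datumbegin` looks like `YYYY-MM-DDTHH:MM:SS...` (consistent with the `substring(0,10)`/`substring(11,19)` calls above). Under that assumption, the `replaceAll('-',':')` on the time part was a no-op anyway. Function names are hypothetical.

```python
from typing import Optional


def edtf_to_mh_creation_date(dcterms_issued: str) -> Optional[str]:
    """'2018-12-14' -> '2018-12-14 00:00:00', keeping '-' in the date part."""
    if not dcterms_issued:
        return None  # mirrors the `!= "" ? ... : null` branch
    return dcterms_issued[:10] + " 00:00:00"


def iso_to_mh_creation_date(datumbegin: str) -> str:
    """'2018-12-14T10:40:54+01:00' -> '2018-12-14 10:40:54' (date part left untouched)."""
    return datumbegin[:10] + " " + datumbegin[11:19]


issued = edtf_to_mh_creation_date("2018-12-14")
begin = iso_to_mh_creation_date("2018-12-14T10:40:54+01:00")
```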

Change log level to DEBUG, OR, change what is being logged (+ change logger description)

Logger:

<logger message="FILE OK: #[message.payload]" level="INFO" doc:name="Logger" />

This logger logs the entire (multi-line) XML payload: useful when debugging (i.e., on TST or QAS), but not on PRD.

So:

  • change the level to DEBUG, or,
  • keep the level at INFO, but log a few useful values as a one-line string

Anypoint pointer: part of the sub-flow "validateMetadataFromVIAAtoMAM".

Also, update the logger description.
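A sketch of the second option, in Python's logging module for illustration: a one-line INFO summary plus the full payload at DEBUG only. Which values count as "useful" is a judgement call; `pid` and `dc_identifier_localid` are only examples (the `pid` element name is an assumption), and `log_file_ok` is a hypothetical name.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("borndigital")


def log_file_ok(payload: str) -> str:
    """Log a one-line INFO summary; keep the full multi-line payload for DEBUG only."""
    root = ET.fromstring(payload)
    # Example fields only; "?" marks a value that is not present in the payload.
    pid = root.findtext(".//pid", default="?")
    local_id = root.findtext(".//dc_identifier_localid", default="?")
    summary = f"FILE OK: pid={pid} local_id={local_id}"
    log.info(summary)
    log.debug("FILE OK payload: %s", payload)
    return summary


line = log_file_ok("<sidecar><pid>abc123</pid><dc_identifier_localid>L-1</dc_identifier_localid></sidecar>")
```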

Research and fix error "Execution of the expression "payload.pid" failed."

The following error shows up frequently in the borndigital log files:

2018-11-05 14:52:49,781 [amqpReceiver.02] ERROR org.mule.exception.CatchMessagingExceptionStrategy - 
********************************************************************************
Message               : Execution of the expression "payload.pid" failed. (org.mule.api.expression.ExpressionRuntimeException).
Element               : /pollerFlow/processors/2/1/1/0/0/2 @ borndigital-v0.4.6:poller.xml:50
--------------------------------------------------------------------------------
Exception stack is:
Execution of the expression "payload.pid" failed. (org.mule.api.expression.ExpressionRuntimeException). (org.mule.api.MessagingException)
  org.mule.mvel2.integration.impl.ClassImportResolverFactory.getVariableResolver(ClassImportResolverFactory.java:112)
  org.mule.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:40)
  org.mule.mvel2.optimizers.impl.refl.nodes.NullSafe$1.getValue(NullSafe.java:47)
  org.mule.mvel2.optimizers.impl.refl.nodes.NullSafe.getValue(NullSafe.java:62)
  org.mule.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)
  org.mule.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:109)
  org.mule.mvel2.MVELRuntime.execute(MVELRuntime.java:86)
  org.mule.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)
  org.mule.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)
  org.mule.mvel2.MVEL.executeExpression(MVEL.java:953)
  org.mule.el.mvel.MVELExpressionExecutor.execute(MVELExpressionExecutor.java:87)
  org.mule.el.mvel.MVELExpressionLanguage.evaluateInternal(MVELExpressionLanguage.java:228)
  org.mule.el.mvel.MVELExpressionLanguage.evaluate(MVELExpressionLanguage.java:163)
  org.mule.el.mvel.MVELExpressionLanguage.evaluate(MVELExpressionLanguage.java:142)
  org.mule.expression.DefaultExpressionManager.evaluate(DefaultExpressionManager.java:216)
  org.mule.expression.DefaultExpressionManager.evaluate(DefaultExpressionManager.java:187)
  org.mule.module.db.internal.resolver.param.DynamicParamValueResolver.resolveParams(DynamicParamValueResolver.java:42)
  (156 more...)

  (set debug level logging or '-Dmule.verbose.exceptions=true' for everything)
********************************************************************************

2018-11-05 14:52:49,781 [amqpReceiver.02] INFO  org.mule.api.processor.LoggerMessageProcessor - Catch error

Root cause: it seems that payload.pid is somehow missing.

The lines where this error surfaces (they are, of course, not the actual cause) are:

<db:select config-ref="mediahaven" doc:name="Check pid with Mediahaven monitoring">
    <db:parameterized-query><![CDATA[select *
from sips
where external_id = #[payload.pid]]]></db:parameterized-query>
</db:select>

--> Research cause and implement solution.
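A sketch of a guard in front of the query, in Python with an in-memory sqlite3 table for illustration. One possible cause (unconfirmed) is that the payload is not the map the expression expects, for example still a raw string; the guard therefore logs the payload's type and a truncated copy, which is the information needed to find the real cause. `check_pid` and the table layout are hypothetical.

```python
import logging
import sqlite3
from collections.abc import Mapping
from typing import Any, Optional

log = logging.getLogger("borndigital")


def check_pid(conn: sqlite3.Connection, payload: Any) -> Optional[list]:
    """Return matching `sips` rows, or None (after logging) when the payload has no pid."""
    pid = payload.get("pid") if isinstance(payload, Mapping) else None
    if not pid:
        # Record what actually arrived instead of failing inside the query.
        log.warning("No pid in payload of type %s: %.200r", type(payload).__name__, payload)
        return None
    # Parameterised query, like the db:parameterized-query in the flow.
    return conn.execute("select * from sips where external_id = ?", (pid,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table sips (external_id text, status text)")
conn.execute("insert into sips values ('abc123', 'done')")
rows = check_pid(conn, {"pid": "abc123"})
missing_from_string = check_pid(conn, '{"pid": "abc123"}')  # unparsed JSON string
missing_from_empty = check_pid(conn, {})
```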
