mokko / mpapi
Unofficial Open-Source Client for MuseumPlus MpRIA Webservice in Python
License: GNU General Public License v3.0
I have included the tests in the pre-commit hook in the meantime; the next step would be to have more fast, non-HTTP tests.
Frank would like to have ISILs for relatedWorks.
I could look these up in RIA during the Python step and write them straight into the LIDO XML.
Alternatively, I would have to download the relatedWorks in MpApi and define a mechanism for distinguishing them from other object data, since they are not supposed to be published automatically. That would slow down the MpApi process.
Try out async IO to speed up big, slow requests.
Should probably be a new project or at least a branch.
Needs a replacement for requests, e.g. aiohttp.
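A rough sketch of what the async idea could look like, using only asyncio from the standard library. `fetch_record` is a hypothetical stand-in for one slow HTTP request (it just sleeps); a real port would replace the sleep with an aiohttp call. Nothing here is existing MpApi code.

```python
import asyncio
import time


async def fetch_record(record_id: int, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one slow request; no network access."""
    await asyncio.sleep(delay)  # a real client would await the HTTP response here
    return f"record-{record_id}"


async def fetch_all(record_ids: list[int]) -> list[str]:
    """Start all requests concurrently; results keep the order of record_ids."""
    return await asyncio.gather(*(fetch_record(i) for i in record_ids))


def timed_fetch(record_ids: list[int]) -> tuple[list[str], float]:
    """Run the concurrent fetch and report how long it took in seconds."""
    start = time.perf_counter()
    results = asyncio.run(fetch_all(record_ids))
    return results, time.perf_counter() - start
```

With five simulated requests of 0.1 s each, the total should stay close to 0.1 s rather than 0.5 s, which is the kind of gain I would hope for with many slow requests.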
todo
Many records have no public photos.
Try poetry
Issues with multiple nodes. I don't know why I get this. Is it possible that I have the same records multiple times?
Perhaps test it with specially prepared data and then write a select statement that loops only through distinct records.
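To check the duplicate-record suspicion before touching the XSLT, something like the following could be run on the LIDO file. It is only a sketch: it assumes the records sit directly under the root as lido:lido elements and that the split file names are derived from lido:lidoRecID, which I have not verified against splitLido.xsl. `duplicate_rec_ids` is a made-up helper name.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# LIDO namespace; the prefix "lido" is only used for the lookups below
NS = {"lido": "http://www.lido-schema.org"}


def duplicate_rec_ids(lido_xml: str) -> dict[str, int]:
    """Return every lidoRecID that occurs more than once, with its count.

    Assumption: records are lido:lido children of the root element.
    """
    root = ET.fromstring(lido_xml)
    ids = [
        el.text.strip()
        for el in root.iterfind("lido:lido/lido:lidoRecID", NS)
        if el.text
    ]
    return {rec_id: n for rec_id, n in Counter(ids).items() if n > 1}
```

An empty result would mean the duplicate URI has another cause, e.g. several nodes within a single record.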
split/DE-MUS-019118/211835.lido.xml
split/DE-MUS-019118/211836.lido.xml
Error in xsl:result-document/@href on line 20 column 75 of splitLido.xsl:
XTDE1490 Cannot write more than one result document to the same URI:
file:/C:/m3/zml2lido/sdata/3Wege/split/DE-MUS-019118/211836.lido.xml
In template rule with match="/lido:lidoWrap/lido:lido" on line 15 of splitLido.xsl
invoked by xsl:apply-templates at file:/C:/m3/zml2lido/xsl/splitLido.xsl#12
In template rule with match="/" on line 11 of splitLido.xsl
Cannot write more than one result document to the same URI: file:/C:/m3/zml2lido/sdata/3Wege/split/DE-MUS-019118/211836.lido.xml
Traceback (most recent call last):
File "C:\m3\zml2lido\src\lido.py", line 328, in <module>
getattr(m, args.job)()
File "C:\m3\zml2lido\src\lido.py", line 142, in smb
self.splitLido(input=linklido_fn) # individual records as files
File "C:\m3\zml2lido\src\lido.py", line 235, in splitLido
self._saxon(input=input, xsl=xsl["splitLido"], output="o.xml")
File "C:\m3\zml2lido\src\lido.py", line 303, in _saxon
subprocess.run(
File "subprocess.py", line 528, in run
subprocess.CalledProcessError: Command 'java -Xmx1450m -jar C:\m3\SaxonHE10-5J\saxon-he-10.5.jar -s:C:\m3\zml2lido\sdata\3Wege\3Wege20211129-links.onlyPub.lido.xml -xsl:C:\m3\zml2lido\xsl\splitLido.xsl -o:o.xml' returned non-zero exit status 2.
Add a new method getByApprovalGrp(id=id, module=module)
instance is what I call a vocabulary. The instance has many nodes. Nodes include one or several terms
instance (= Vokabular): type
  node
    terms
Todo: Find out more about termClasses
Thesis: instance mainly provides type and termClass
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<instance xmlns="http://www.zetcom.com/ria/ws/vocabulary"
logicalName="ObjIconographyKeywordProjectVgr"
id="61671">
<uuid>3951a563-351d-46c1-b20a-bd565d8e00b3</uuid>
<version>2020-06-28T15:44:46.099Z</version>
<lastModified>2020-06-28T15:44:46.098Z</lastModified>
<lastModifiedUser>zetLS</lastModifiedUser>
<orgUnit logicalName="VocabularyAdministration"/>
<type logicalName="simpleThesaurus"/>
<termClasses>
<termClass logicalName="Untereintrag"/>
<termClass logicalName="Haupteintrag"/>
</termClasses>
</instance>
Thesis: node mainly provides id, hierarchy (parents) and terms
<node xmlns="http://www.zetcom.com/ria/ws/vocabulary"
logicalName="Europeana-Fashion##Modeobjekte##Visuelle und verbale Kommunikation##analoge Medien##Zeichnung"
id="4254998">
<uuid>a356c4d0-b24a-5a6a-4eaf-e1305df2b6c0</uuid>
<version>2021-12-01T11:59:23.128Z</version>
<lastModified>2021-12-01T11:59:23.124Z</lastModified>
<lastModifiedUser>IFM_FvH</lastModifiedUser>
<orgUnit logicalName="VocabularyAdministration"/>
<status logicalName="valid"/>
<parents>
<parent nodeId="4254966"/>
</parents>
<instance logicalName="ObjIconographyKeywordProjectVgr"
id="61671"/>
<terms>
<term id="4745128">
<uuid>9f1e3769-8bdf-ad28-b039-452caaa25916</uuid>
<version>2020-11-30T16:03:28.682Z</version>
<lastModified>2020-11-30T16:03:28.682Z</lastModified>
<lastModifiedUser>MPRiaImporter</lastModifiedUser>
<isoLanguageCode>de</isoLanguageCode>
<content>Zeichnung</content>
<order>1</order>
<status logicalName="valid"/>
<category logicalName="preferred"/>
</term>
</terms>
</node>
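To test the thesis, the three things a node is supposed to provide can be pulled out with the standard library. This is a sketch only, based on the element names visible in the sample above; `summarize_node` is a hypothetical helper, not part of MpApi.

```python
import xml.etree.ElementTree as ET

# namespace taken from the sample XML above
NS = {"v": "http://www.zetcom.com/ria/ws/vocabulary"}


def summarize_node(node_xml: str) -> dict:
    """Extract id, instance, parent node ids and (language, content) of each term."""
    node = ET.fromstring(node_xml)
    instance = node.find("v:instance", NS)
    return {
        "id": node.get("id"),
        "instance": instance.get("logicalName") if instance is not None else None,
        "parents": [p.get("nodeId") for p in node.iterfind("v:parents/v:parent", NS)],
        "terms": [
            (
                t.findtext("v:isoLanguageCode", default="", namespaces=NS),
                t.findtext("v:content", default="", namespaces=NS),
            )
            for t in node.iterfind("v:terms/v:term", NS)
        ],
    }
```

For the node above this should give the id 4254998, the single parent 4254966 and one German term, "Zeichnung".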
Thoughts on a possible solution
1st request: ask how many results
2nd request: get the first package
Nth request: repeat until all results have been retrieved
E.g. size=3333
For now package size will be a global variable in SAR.
New method: PaginatedSearch with same interface as search
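The request sequence above could be sketched roughly like this. `count_hits` and `fetch_page` are hypothetical callables standing in for the real requests (how the count and the offset/limit are actually passed to RIA is not shown here), and `paginated_search` is just a working name for the planned method.

```python
from typing import Callable, Iterator


def page_offsets(total: int, size: int) -> Iterator[tuple[int, int]]:
    """Yield (offset, limit) pairs that cover `total` results in pages of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    for offset in range(0, total, size):
        yield offset, min(size, total - offset)


def paginated_search(
    count_hits: Callable[[], int],
    fetch_page: Callable[[int, int], list],
    size: int = 3333,
) -> Iterator[list]:
    """Ask once how many results there are, then fetch them page by page."""
    total = count_hits()  # 1st request: how many results?
    for offset, limit in page_offsets(total, size):
        yield fetch_page(offset, limit)  # 2nd to Nth request: one package each
```

With 10000 hits and size=3333 this makes four page requests; the last one only asks for a single record.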
Currently we save all info from one export in one big xml file. That doesn't scale well. At about 1GB I can't process xml anymore on a regular laptop. So there is currently a limit to how big a package can be and it is about 10000 records.
I could change the output format to a split format where every record is saved in its own file obj123456.xml, pk1234567.xml etc.
While currently we have a multi-entity file as output, the new system would save atomic records as files.
Then all consumers have to be rewritten. Currently only two: SHF/npx, Lido.
Consumers will then need to read multiple xml files in xslt. Lido and npx are more compact, so we could package way more records in one GB, but in principle there is still a size limit.
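For the atomic variant, splitting an existing package into one file per record might look like the sketch below. It assumes the package has the layout application/modules/module/moduleItem in the zetcom module namespace. The file naming (module name plus id) is made up for illustration, since the mapping to prefixes like obj or pk is not defined yet.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# assumed namespace of the module XML
NS = {"m": "http://www.zetcom.com/ria/ws/module"}


def split_records(package_xml: str, out_dir: Path) -> list[Path]:
    """Write every moduleItem of a package to its own file; return the paths."""
    root = ET.fromstring(package_xml)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for module in root.iterfind("m:modules/m:module", NS):
        for item in module.iterfind("m:moduleItem", NS):
            # hypothetical naming scheme: <module name><record id>.xml
            path = out_dir / f"{module.get('name')}{item.get('id')}.xml"
            path.write_bytes(ET.tostring(item, encoding="utf-8"))
            written.append(path)
    return written
```

Note that this still parses the whole package in memory, so it only illustrates the target format, not a fix for the 1 GB limit.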
Another alternative is that we (simply) introduce pagination, i.e. download 1000 or so records at a time. This is also required by the atomic file system format, so independent of the choice between atomic and package solution.
We would then continue to have multi entity files. Download might get considerably slower if MpApi has to request individual files. But that might be avoidable.
So in reality we want a configurable size for the packages, but as a first attempt I will assume the package size doesn't exceed 1000.
Transparently try out existing code points. Goal is to find fastest way of getting terms associated with a set of objects
To this end, I will write a new test.
For longer processes it would be helpful if the status were displayed, e.g. 15% of 100, no matter what or how.
The main thing is that it is clear that something is happening.
With a longer wait (more than 5 minutes), the uncertainty grows as to whether the program has hung.
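A minimal way to get such a status line, as a sketch: wrap whatever is being looped over and report after each item. `progress_line` and `with_progress` are hypothetical names, not existing MpApi functions.

```python
def progress_line(done: int, total: int) -> str:
    """Format a status such as '15% (15 of 100)'."""
    if total <= 0:
        return "0 of 0"
    percent = done * 100 // total
    return f"{percent}% ({done} of {total})"


def with_progress(items, report=print):
    """Yield each item, reporting progress once the caller has processed it."""
    items = list(items)  # the total must be known up front
    for n, item in enumerate(items, start=1):
        yield item
        report(progress_line(n, len(items)))
```

Usage would be `for record in with_progress(records): ...`; passing a different `report` callable would allow logging instead of printing.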
<moduleReference name="ObjOwnerRef" targetModule="Address" multiplicity="N:1" size="1">
<moduleReferenceItem moduleItemId="165950">
<formattedValue language="en">Museum für Asiatische Kunst, Staatliche Museen zu Berlin</formattedValue>
</moduleReferenceItem>
</moduleReference>
Command line utility that runs tests against zml(=mpx) data files. It tests if there is Sachbegriff, Objekttyp, verwaltende Institution etc.
Better support of type hints
I suspect short requests like a getItem profit from a session more than long requests, so it won't help me much at the moment. That's why I didn't do it earlier.
In practice, recherche.smb only shows approved (freigegebene) multimedia resources if the object was last changed after the approval (Freigabe) was granted at resource level. So let's write a replacer plugin that finds exactly those records and changes them.
Todo: implement a new method in Module that expects downloaded xml and returns a "cleaned version" suitable for upload.
Background: sometimes (or always) RIA doesn't accept the XML that it spits out when it is uploaded again.
Clean_for_upload() works by side effect, i.e. it modifies the internal XML.