I'm not sure whether questions belong here, but I got stuck while trying to download a TMX from the OPUS JW300 set. I don't think the problem has anything to do with the set itself.
Alignment file /proj/nlpl/data/OPUS/JW300/latest/xml/en-ta.xml.gz not found. The following files are available for downloading:
8 MB https://object.pouta.csc.fi/OPUS-JW300/v1/xml/en-ta.xml.gz
263 MB https://object.pouta.csc.fi/OPUS-JW300/v1/xml/en.zip
94 MB https://object.pouta.csc.fi/OPUS-JW300/v1/xml/ta.zip
365 MB Total size
Downloading 3 file(s) with the total size of 365 MB. Continue? (y/n) y
JW300_latest_xml_en-ta.xml.gz ... 100% of 8 MB
JW300_latest_xml_en.zip ... 100% of 263 MB
JW300_latest_xml_ta.zip ... 100% of 94 MB
Traceback (most recent call last):
  File "your_script.py", line 3, in <module>
    opus_reader.printPairs()
  File "C:\Users\gertv\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\opustools_pkg\opus_read.py", line 350, in printPairs
    lastline = self.readAlignment(gzipAlign)
  File "C:\Users\gertv\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\opustools_pkg\opus_read.py", line 308, in readAlignment
    lastline = self.outputPair(self.par, line)[1]
  File "C:\Users\gertv\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\opustools_pkg\opus_read.py", line 251, in outputPair
    self.sendPairOutput(wpair)
  File "C:\Users\gertv\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\opustools_pkg\opus_read.py", line 210, in sendPairOutput
    self.resultfile.write(wpair[0])
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1264.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 83-89: character maps to <undefined>
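In case it helps narrow things down, here is a minimal sketch of what I suspect is happening, independent of opustools. The last frame of the traceback is in cp1252.py, so my guess is that the result file ends up using the Windows default encoding (cp1252), which has no mapping for Tamil characters. The sample string and the helper name `can_encode` are my own and purely illustrative; they are not part of opustools.

```python
# Hypothetical sample text (assumption): any Tamil string should behave the same.
sample = "வணக்கம்"


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`, else False."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # Same exception type as in the traceback above.
        return False


# cp1252 is the codec named in the last traceback frame.
print("cp1252:", can_encode(sample, "cp1252"))
# UTF-8 can represent every Unicode character, including Tamil.
print("utf-8:", can_encode(sample, "utf-8"))
```

If this guess is right, the failure would come from the encoding of the output file rather than from the JW300 data.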
I have no idea how to fix this. Any help would be much appreciated.