Comments (9)
I got something done.
The "'float' object does not have attribute 'replace'" was, as I suspected, somewhere in Python (because that's a Python error message, but of course you can never be 100% sure...), and it wasn't caused by the unimplemented #coordinates:
local maplink = mw.getCurrentFrame():extensionTag{
    name = 'maplink',
    content = mw.text.jsonEncode( jsonParams ),
    args = {
        text = displaycoords[1] .. ", " .. displaycoords[2],
        zoom = zoom( extraparams ) or default_zoom,
        latitude = decLat,
        longitude = decLong,
    }
}
This calls our implementation of frame.extensionTag, which creates something that looks like an HTML tag but isn't (in this case <maplink>, which the parser will complain about too), and it passes a Lua table of args that has floats (and an int) in it, which crashes when html.escape tries to handle it. That's where the .replace() call was!!
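To make the failure concrete, here is a minimal sketch of the crash and one possible fix. The helper name render_extension_tag is made up for illustration and is not the actual wikitextprocessor code; the only real API used is html.escape from the standard library, which calls .replace() on its argument and therefore fails on anything that isn't a string.

```python
import html


def render_extension_tag(name, content, args):
    """Hypothetical stand-in for an extensionTag implementation.

    Values coming from a Lua table may be floats or ints, so each one is
    converted with str() before being passed to html.escape().
    """
    attrs = "".join(
        f' {key}="{html.escape(str(value), quote=True)}"'
        for key, value in args.items()
    )
    return f"<{name}{attrs}>{html.escape(str(content))}</{name}>"


# Without the str() conversion, a float argument reproduces the crash:
try:
    html.escape(15.8134)
except AttributeError as exc:
    print("crash reproduced:", exc)

print(render_extension_tag("maplink", "{}", {"zoom": 9, "latitude": 15.8134}))
```

Converting at the point of escaping keeps the fix in one place, instead of requiring every Lua module to pass strings.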
This is a bug fix, but it doesn't fix this issue; #coordinates is still unimplemented (I just had it return an error message for my testing), <maplink> (and <mapframe>) from https://www.mediawiki.org/wiki/Help:Extension:Kartographer/fi are not allowed (or implemented), etc.
from wikitextprocessor.
Yeah, this is just something that was never implemented because Wiktionary parsing didn't really need it. Looking at all the parameters in {{coord|15.8134|47.7905|type:country_region:YE_dim:1150000_source:dewiki|format=dms|display=title}}, it's no wonder Tatu didn't want to do it; that seems like an annoying bit to get correct.
Lower on the page we also see {{coord|15|21|18|N|44|12|25.2|E|type:city}}, which looks like a completely different kind of signature.
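As a rough sketch of how the two signatures could be told apart, something like the following might work. The function name parse_coord_args and the rule "anything containing a colon is a coordinate parameter" are assumptions made for this example, not how the Coord template or the Coordinates module actually decides.

```python
HEMISPHERES = {"N", "S", "E", "W"}


def parse_coord_args(args):
    """Return (latitude, longitude) in decimal degrees.

    Hypothetical helper: accepts either two decimal values or
    degrees/minutes/seconds groups that each end in a hemisphere letter.
    """
    # Drop empty arguments and coordinate parameters such as "type:city".
    positional = [a.strip() for a in args if a.strip() and ":" not in a]

    # Decimal signature: no hemisphere letters, just latitude and longitude.
    if not any(a.upper() in HEMISPHERES for a in positional):
        return float(positional[0]), float(positional[1])

    # DMS signature: collect numbers until a hemisphere letter closes the group.
    coords = []
    parts = []
    for arg in positional:
        letter = arg.upper()
        if letter in HEMISPHERES:
            degrees = sum(float(p) / 60**i for i, p in enumerate(parts))
            coords.append(-degrees if letter in ("S", "W") else degrees)
            parts = []
        else:
            parts.append(arg)
    if len(coords) != 2:
        raise ValueError(f"cannot parse coordinates from {args!r}")
    return coords[0], coords[1]


print(parse_coord_args(["15.8134", "47.7905", "type:country_region:YE"]))
print(parse_coord_args(["15", "21", "18", "N", "44", "12", "25.2", "E", "type:city"]))
```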
from wikitextprocessor.
After looking at this, "coord" can have a bunch of parameters, but it's not insurmountable. Should check if Python has some standard library coordinate conversion stuff...
https://en.wikipedia.org/wiki/Template:Coord
- Read the coordinates and make conversions (if the conversion flag is used).
- Implement display=inline and display=inline,title: the coord template either creates some text in the main text ('inline'), or adds a coordinate on the top of the page ('title'), or both, and in this case we will just ignore the title. Getting coordinate data for a Wikipedia page is an implementation that should be left to a Wikipedia-equivalent to Wiktextract, using a template parsing middle-man function passed into the code, like how we handle specific templates in Wiktextract.
- Implement format= to switch between which format to display (converting the input if needed).
- Implement notes=, that's just text added after the coordinates.
- Ignore name= and qid=, this is stuff related to infoboxes and maps, etc. that should be implemented in *Wikipedia-extract.
- Ignore coordinate parameters (not template parameters like above). This seems like a bunch of metadata meant to WikiMaps etc., which should be implemented by *Wikipedia-extract.
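For the format= item, the conversion itself is small. A minimal sketch, assuming only the two values "dec" and "dms" matter and that whole arcseconds are precise enough; the names to_dms and format_coord are invented here, and the exact output layout of the real template may differ.

```python
def to_dms(value, positive, negative):
    """Format a decimal degree value as degrees, minutes and seconds."""
    hemisphere = positive if value >= 0 else negative
    # Work in whole arcseconds so rounding can never produce "60" seconds.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes:02d}′{seconds:02d}″{hemisphere}"


def format_coord(lat, lon, fmt="dec"):
    """Render a coordinate pair according to a format= style flag."""
    if fmt == "dms":
        return f"{to_dms(lat, 'N', 'S')} {to_dms(lon, 'E', 'W')}"
    return f"{lat}, {lon}"


print(format_coord(15.8134, 47.7905, fmt="dms"))
print(format_coord(15.8134, 47.7905))
```

Rounding to an integer number of arcseconds before splitting avoids the carry problem you get when minutes and seconds are rounded separately.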
from wikitextprocessor.
Are you sure MediaWiki has the #coordinates parser function? The parser function doc page doesn't have it: https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions
The Modèle:Coord template is trying to call the Coordinates Lua module.
from wikitextprocessor.
#coordinates is from the extensions Maps or GeoData. I didn't even realize we might have to handle extensions... But this helps, if the parser function #coordinates is simpler than implementing the whole Coord module.
from wikitextprocessor.
I should read the code more carefully... the #coordinates parser function is called in the "Coordinates" Lua module.
And here is the #coordinates parser function documentation: https://www.mediawiki.org/wiki/Extension:GeoData#Parser_function and the GeoHack parameters documentation: https://www.mediawiki.org/wiki/GeoHack
The GeoData extension is not installed on Wiktionary sites, and it also doesn't seem to be used to create page text on Wikipedia, so we had better return an empty string or an error message.
from wikitextprocessor.
At this moment, I don't think the fault lies with #coordinates being called. Yeah, the Unimplemented error is caused by #coordinates, but the Python error about a float lacking a replace attribute is something else, and I can't figure out what. Everywhere in our code that we use .replace(), the value should logically be a string; other string methods are called on it before replace, etc. Currently I think I might just need to bisect where the miscalled .replace() is by... Well, prints don't work, so asserts??
We need to implement #coordinates just to get some data out of it (or it might be that it just returns "succeeded" or "failed", but it does return a string... but the stated main purpose of #coordinates is to store GeoData somewhere else).
from wikitextprocessor.
I am the dumbest.
I left #coordinates languishing as a function that returns an error or an empty string, just to look at the things around it first... Sure, it has to return something, but we can handle that later, right?
Well, the Coordinates module I was testing it with (and the parameters given to it) wasn't actually using the path that the #coordinates call was in, so for the longest time it wasn't even being called.
It seemed like the code wasn't actually doing anything with the #coordinates return value, except to check for 'error' in the string, but that's probably nothing, right? Surely the return value is otherwise meaningful.
So I finally, finally decided to look deeper into it...
AND IT REALLY JUST WAS A DATABASE SAVING FUNCTION. It takes coordinate data and saves it somewhere else. It returns an empty string or a MediaWiki error string!! Our parser function implementation is complete with just one line:
return ""
https://en.wiktionary.org/wiki/LOL#English
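Spelled out a bit more, a sketch of that one-liner in context. The (ctx, fn_name, args, expander) signature, the PARSER_FUNCTIONS table and call_parser_function are assumptions written for this example so that it runs on its own; they are not copied from wikitextprocessor.

```python
def coordinates_fn(ctx, fn_name, args, expander):
    """#coordinates only stores GeoData elsewhere; nothing is rendered."""
    return ""


# Hypothetical dispatch table mapping parser function names to handlers.
PARSER_FUNCTIONS = {"#coordinates": coordinates_fn}


def call_parser_function(name, args):
    """Look up a parser function, or return an error string if it is missing."""
    handler = PARSER_FUNCTIONS.get(name.lower())
    if handler is None:
        return f'<strong class="error">unimplemented parserfn {name}</strong>'
    return handler(None, name, args, lambda text: text)


result = call_parser_function("#coordinates", ["15.8134", "47.7905"])
# The calling Lua module only checks whether the result contains "error".
print("error" in result)
```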
from wikitextprocessor.
from typing import Optional
from unittest import TestCase

from wikitextprocessor import Wtp, WikiNode
from wikitextprocessor.parser import NodeKind, print_tree
from wiktextract.config import WiktionaryConfig
from wiktextract.page import clean_node
from wiktextract.wxr_context import WiktextractContext


class TestCoord(TestCase):
    def setUp(self):
        # Register <maplink> so the parser accepts it as an extension tag.
        extension_tags = {
            "maplink": {"parents": ["phrasing"], "content": ["phrasing"]},
        }
        self.wxr = WiktextractContext(
            wtp=Wtp(
                db_path="/home/kristian/Data/htmlgen/frwp/frwp-wikt.db",
                lang_code="fr",
                project="wikipedia",
                extension_tags=extension_tags,
            ),
            config=WiktionaryConfig(),
        )

    def tearDown(self):
        self.wxr.wtp.close_db_conn()

    def test_coord(self):
        # Replace every <maplink> node with a fixed string; returning None
        # means "not handled, process the node as usual".
        def maplink_handler_fn(node: WikiNode) -> Optional[str]:
            if node.kind == NodeKind.HTML and node.sarg == "maplink":
                return "MAPLINK RETURN"
            return None

        self.wxr.wtp.add_page(
            "Modèle:testcoord",
            10,
            body="""<includeonly>{{#Invoke:Coordinates |coord|{{{1|}}}|{{{2|}}}|{{{3|}}}|{{{4|}}}|{{{5|}}}|{{{6|}}}|{{{7|}}}|{{{8|}}}|{{{9|}}}| format = {{{format|}}} | name = {{{name|}}} | display = {{{display|}}} }}<nowiki /></includeonly><noinclude>{{Documentation}}</noinclude>""",
        )
        self.wxr.wtp.start_page("Test")
        tree = self.wxr.wtp.parse(
            text="{{testcoord|15.8134|47.7905|type:country_region:YE_dim:1150000_source:dewiki|format=dms|display=inline}}",
            expand_all=True,
        )
        print(tree)
        print_tree(tree)
        text = clean_node(
            wxr=self.wxr,
            sense_data={},
            wikinode=tree,
            node_handler_fn=maplink_handler_fn,
        )
        self.assertEqual(text, "MAPLINK RETURN")
This should now work with a recent commit: extension tags are now supported with extension_tags= for Wtp(), and we already could handle arbitrary nodes in clean_node with node_handler_fn. In this case, just for the purposes of demonstration, I've handled all <maplink> elements so that they return a string; the return could be a list of..., or a string or a WikiNode, or None to signal that there was no handling done and to process the node as usual. Usually, a maplink would not return anything inside the text itself, but would be transformed into its own map box on the side.
#coordinates does nothing. 👍
from wikitextprocessor.