
Comments (9)

kristian-clausal commented on June 13, 2024

I got something done.

The "'float' object does not have attribute 'replace'" was, as I suspected, somewhere in Python (because that's a Python error message, but of course you can never be 100% sure...), and it wasn't caused by the unimplemented #coordinates:

	local maplink = mw.getCurrentFrame():extensionTag{
		name = 'maplink',
		content = mw.text.jsonEncode( jsonParams ),
		args = {
			text = displaycoords[1] .. ", " .. displaycoords[2],
			zoom = zoom( extraparams ) or default_zoom,
			latitude = decLat,
			longitude = decLong,
		}
	}

This calls our implementation of frame.extensionTag, which creates something that looks like an HTML tag but isn't (in this case <maplink>, which the parser will complain about too), and it passes a Lua table of args that has floats (and an int) in it that crashes when html.escape tries to handle it. That's where the .replace() call was!!
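To illustrate the failure mode described above, here is a minimal sketch (not the actual wikitextprocessor code). `render_extension_tag` is a hypothetical helper invented for this example; the only real API used is the standard library's `html.escape`, which calls `.replace()` on its argument and therefore fails on floats and ints. Converting each value with `str()` first is one way to avoid the crash, assuming the args table can hold any Lua scalar.

```python
import html


def render_extension_tag(name: str, args: dict, content: str = "") -> str:
    """Hypothetical helper: build '<name key="value" ...>content</name>'.

    Values are converted with str() before escaping, because html.escape()
    only works on strings.
    """
    attrs = "".join(
        f' {key}="{html.escape(str(value), quote=True)}"'
        for key, value in args.items()
    )
    return f"<{name}{attrs}>{html.escape(content)}</{name}>"


# Reproduce the original crash: html.escape() calls .replace() on its input.
try:
    html.escape(47.7905)  # a float, as passed in from the Lua args table
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

# With str() coercion, numeric args are rendered without errors.
tag = render_extension_tag("maplink", {"zoom": 9, "latitude": 15.8134}, "{}")
print(error_message)
print(tag)
```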

This is a bug fix, but doesn't fix this issue; #coordinates is still unimplemented (I just had it return an error message for my testing), <maplink> (and <mapframe>) from https://www.mediawiki.org/wiki/Help:Extension:Kartographer/fi are not allowed (or implemented), etc.

from wikitextprocessor.

kristian-clausal commented on June 13, 2024

Yeah, this is just something that was never implemented because Wiktionary parsing didn't really need it. Looking at all the parameters in {{coord|15.8134|47.7905|type:country_region:YE_dim:1150000_source:dewiki|format=dms|display=title}}, it's no wonder Tatu didn't want to do it; that seems like an annoying bit to get correct.

Lower on the page we also see {{coord|15|21|18|N|44|12|25.2|E|type:city}}, which looks like a completely different kind of signature.
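As a rough sketch of how both signatures could be read into one representation, the hypothetical function below (`parse_coord_args` is not part of any library) accepts the positional arguments of either call and returns decimal degrees. It assumes latitude comes first, that hemisphere letters mark the end of a degrees/minutes/seconds group, and that the first non-numeric argument (such as `type:city`) ends the coordinate part.

```python
def parse_coord_args(args: list[str]) -> tuple[float, float]:
    """Hypothetical parser: return (latitude, longitude) in decimal degrees.

    Accepts ["15.8134", "47.7905", ...] as well as
    ["15", "21", "18", "N", "44", "12", "25.2", "E", ...].
    """
    hemispheres = {"N": 1, "S": -1, "E": 1, "W": -1}
    values: list[float] = []  # finished coordinates (hemisphere form)
    parts: list[float] = []   # numbers collected since the last hemisphere
    for arg in args:
        arg = arg.strip()
        if arg.upper() in hemispheres:
            # degrees + minutes/60 + seconds/3600, signed by hemisphere
            degrees = sum(p / 60**i for i, p in enumerate(parts))
            values.append(hemispheres[arg.upper()] * degrees)
            parts = []
        else:
            try:
                parts.append(float(arg))
            except ValueError:
                break  # coordinate parameters such as "type:city"
    if not values:
        # No hemisphere letters: plain signed decimal degrees.
        values = parts[:2]
    if len(values) != 2:
        raise ValueError(f"cannot read coordinates from {args!r}")
    return values[0], values[1]


print(parse_coord_args(["15.8134", "47.7905", "type:country_region:YE"]))
print(parse_coord_args(["15", "21", "18", "N", "44", "12", "25.2", "E", "type:city"]))
```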


kristian-clausal commented on June 13, 2024

After looking at this, "coord" can have a bunch of parameters, but it's not insurmountable. Should check if Python has some standard library coordinate conversion stuff...

https://en.wikipedia.org/wiki/Template:Coord

  • Read the coordinates and make conversions (if the conversion flag is used).
  • Implement display=inline and display=inline,title: the coord template either creates some text in the main text ('inline'), or adds a coordinate on the top of the page ('title'), or both, and in this case we will just ignore the title. Getting coordinate data for a Wikipedia page is an implementation that should be left to a Wikipedia-equivalent to Wiktextract, using a template parsing middle-man function passed into the code, like how we handle specific templates in Wiktextract.
  • Implement format= to switch between which format to display (converting the input if needed)
  • Implement notes=, that's just text added after the coordinates.
  • Ignore name= and qid=, this is stuff related to infoboxes and maps, etc. that should be implemented in *Wikipedia-extract
  • Ignore coordinate parameters (not template parameters like above). This seems like a bunch of metadata meant for WikiMaps etc., which should be implemented by *Wikipedia-extract.
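The format= and notes= items in the list above could be sketched roughly like this. `dec_to_dms` and `format_coord` are hypothetical names, and rounding to whole seconds is an assumption made to keep the example short; the real template chooses precision differently.

```python
def dec_to_dms(value: float, is_latitude: bool) -> str:
    """Hypothetical helper: decimal degrees to a degrees/minutes/seconds string."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Work in whole seconds so that rounding cannot produce 60 seconds.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes:02d}′{seconds:02d}″{hemisphere}"


def format_coord(lat: float, lon: float, fmt: str = "dec", notes: str = "") -> str:
    """Hypothetical formatter for format=dms / format=dec, with notes= appended."""
    if fmt == "dms":
        text = f"{dec_to_dms(lat, True)} {dec_to_dms(lon, False)}"
    else:
        ns = "N" if lat >= 0 else "S"
        ew = "E" if lon >= 0 else "W"
        text = f"{abs(lat)}°{ns} {abs(lon)}°{ew}"
    # notes= is just text added directly after the coordinates.
    return text + notes


print(format_coord(15.355, 44.207, fmt="dms"))
print(format_coord(15.8134, 47.7905, fmt="dec"))
```

Only the inline text is produced here; display=title and the coordinate parameters are ignored, as proposed in the list.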


xxyzz commented on June 13, 2024

Are you sure MediaWiki has the #coordinates parser function? The parser function doc page doesn't have it: https://www.mediawiki.org/wiki/Help:Extension:ParserFunctions

The Modèle:Coord template is trying to call the Coordinates Lua module.


kristian-clausal commented on June 13, 2024

#coordinates is from the extensions Maps or GeoData. I didn't even realize we might have to handle extensions... But this helps, if the parser function #coordinates is simpler than implementing the whole Coord module.


xxyzz commented on June 13, 2024

I should read the code more carefully... the #coordinates parser function is called in the "Coordinates" Lua module.

And here is the #coordinates parser function document: https://www.mediawiki.org/wiki/Extension:GeoData#Parser_function and the GeoHack parameters document: https://www.mediawiki.org/wiki/GeoHack

The GeoData extension is not installed on Wiktionary sites, and it also doesn't seem to be used to create page text on Wikipedia, so we might do better to return an empty string or an error message.


kristian-clausal commented on June 13, 2024

At this moment, I don't think the fault lies with #coordinates being called. Yeah, the Unimplemented error is caused by #coordinates, but the Python error about a float lacking a replace attribute is something else, and I can't figure out what. Everywhere in our code that we use .replace(), it should logically be a string; other string methods are called on it before replace, etc. Currently I think I might just need to bisect the location of the miscalled .replace() by... Well, prints don't work, so asserts??

We need to implement #coordinates just to get some data out of it (or it might be that it just returns "succeeded" or "failed", but it does return a string... but the stated main purpose of #coordinates is to store GeoData somewhere else).


kristian-clausal commented on June 13, 2024

I am the dumbest.

I left #coordinates languishing as a function that returns an error or an empty string, just to look at the things around it first... Sure, it has to return something, but we can handle that later, right?

Well, the Coordinates module I was testing it with (and the parameters given to it) wasn't actually using the path that the #coordinates call was in, so for the longest time it wasn't even working.

It seemed like the code wasn't actually doing anything with the #coordinates return value, except to check for 'error' in the string, but that's probably nothing, right? Surely the return value is otherwise meaningful.

So I finally, finally decide to look deeper into it...

AND IT REALLY JUST WAS A DATABASE SAVING FUNCTION. It takes coordinate data and saves it somewhere else. It returns an empty string or a MediaWiki error string!! Our parser function implementation is complete with just one line:

    return ""

https://en.wiktionary.org/wiki/LOL#English
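For context, a sketch of how such a no-op parser function might sit in a dispatch table. Everything here (`PARSER_FUNCTIONS`, `call_parser_function`, the two-argument handler signature, the error text) is hypothetical and simplified for illustration; it is not wikitextprocessor's actual registry or signature.

```python
from typing import Callable

# Hypothetical handler type: (function name, expanded arguments) -> page text.
ParserFn = Callable[[str, list[str]], str]


def coordinates_fn(fn_name: str, args: list[str]) -> str:
    # GeoData's #coordinates only stores data elsewhere; it adds no page text.
    # Storing the coordinates is out of scope here, so nothing is returned.
    return ""


PARSER_FUNCTIONS: dict[str, ParserFn] = {
    "#coordinates": coordinates_fn,
}


def call_parser_function(fn_name: str, args: list[str]) -> str:
    """Dispatch to a registered handler, or report an unimplemented function."""
    handler = PARSER_FUNCTIONS.get(fn_name.lower())
    if handler is None:
        return f'<strong class="error">Unimplemented parser function {fn_name}</strong>'
    return handler(fn_name, args)


print(repr(call_parser_function("#coordinates", ["15.8134", "47.7905"])))
```

Since the Coordinates module only checks the return value for 'error', an empty string takes the success path.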


kristian-clausal commented on June 13, 2024

from typing import Optional
from unittest import TestCase

from wikitextprocessor import Wtp, WikiNode
from wiktextract.page import clean_node
from wikitextprocessor.parser import NodeKind, print_tree
from wiktextract.wxr_context import WiktextractContext
from wiktextract.config import WiktionaryConfig


class TestCoord(TestCase):
    def setUp(self):
        extension_tags = {
            "maplink": {"parents": ["phrasing"], "content": ["phrasing"]},
        }
        self.wxr = WiktextractContext(
            wtp=Wtp(
                db_path="/home/kristian/Data/htmlgen/frwp/frwp-wikt.db",
                lang_code="fr",
                project="wikipedia",
                extension_tags=extension_tags,
            ),
            config=WiktionaryConfig(),
        )

    def tearDown(self):
        self.wxr.wtp.close_db_conn()

    def test_coord(self):
        def maplink_handler_fn(node: WikiNode) -> Optional[str]:
            # Replace every <maplink> node with a fixed marker string.
            if node.kind == NodeKind.HTML and node.sarg == "maplink":
                return "MAPLINK RETURN"
            # None means "not handled": the node is processed as usual.
            return None

        self.wxr.wtp.add_page(
            "Modèle:testcoord",
            10,
            body="""<includeonly>{{#Invoke:Coordinates |coord|{{{1|}}}|{{{2|}}}|{{{3|}}}|{{{4|}}}|{{{5|}}}|{{{6|}}}|{{{7|}}}|{{{8|}}}|{{{9|}}}| format = {{{format|}}} | name = {{{name|}}} | display = {{{display|}}} }}<nowiki /></includeonly><noinclude>{{Documentation}}</noinclude>""",
        )
        self.wxr.wtp.start_page("Test")

        tree = self.wxr.wtp.parse(
            text="{{testcoord|15.8134|47.7905|type:country_region:YE_dim:1150000_source:dewiki|format=dms|display=inline}}",
            expand_all=True,
        )
        print(tree)
        print_tree(tree)
        text = clean_node(
            wxr=self.wxr,
            sense_data={},
            wikinode=tree,
            node_handler_fn=maplink_handler_fn,
        )
        self.assertEqual(text, "MAPLINK RETURN")

This should now work with a recent commit: extension tags are now supported with extension_tags= for Wtp(), and we could already handle arbitrary nodes in clean_node with node_handler_fn. In this case, just for the purposes of demonstration, I've handled all <maplink> elements so that they return a string; the return value could be a list of..., a string, or a WikiNode, or None to signal that no handling was done and the node should be processed as usual. Usually, a maplink would not return anything inside the text itself, but would be transformed into its own map box on the side.

#coordinates does nothing. 👍

