controversial / wikipedia-map Goto Github PK

View Code? Open in Web Editor NEW

426.0 21.0 77.0 1.9 MB

A web app for visualizing the connections between Wikipedia pages.

Home Page: https://wikipedia.luk.ke

License: MIT License

Python 5.02% HTML 10.78% JavaScript 71.31% CSS 12.88%

wikipedia flask visjs visualization web website

wikipedia-map's Issues

Hovering article name should highlight the corresponding node

Show All Links of full Article as Nodes

Hallo,
My colleagues and I regularly play a game where the goal is to find the shortest path between two Wikipedia articles. I found this software to enable a checking function. Therefore, I have two questions:

Is it possible to extract all links within an article using this script?
Is it possible to display all links as nodes?

Start flask server

Hi, I am running flask on python2.7. Is the code for the localserver gone? I cant find the api/api.py anymore :(

Regards

Unicode problem: "Zürich"

https://en.wikipedia.org/wiki/Z%C3%BCrich

Single quote problem: "League of God's House"

https://en.wikipedia.org/wiki/League_of_God%27s_House

Node shows edge to self

When you enter facebook as one of your search terms, the main topic node will show an edge to itself

CORS

Hi Luke, may want to check:

XMLHttpRequest cannot load https://wikipedia-map.now.sh/pagename?page=dna. Redirect from 'https://wikipedia-map.now.sh/pagename?page=dna' to 'https://zeit.co/controversial/api/eokchmmhgl?redirect=1' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://luke.deentaylor.com' is therefore not allowed access.

api backend result seems broken

Hi,

http://luke.deentaylor.com/wikipedia/api/links?page=Cats

results in a "list index out of range" error.

Doesn't work

Hi, I'd like to see your page in action but unfortunately it doesn't seem to work.
I type an article then press go but it remains in the "tour" section.

Unicode problems with random button

I've got the random button working for special characters like , and and., but it still doesn't work well for very strange characters. For example, there's a severe display bug with wikipedia page titles like "Dąbrowice, Gmina Maków." In the text box, it displays with a HTML character encoding (the second character displays asą, with the full text ofDąbrowice, Gmina Mak�w), and on the node it displays like

It is likely that this is in part Python's fault, and also JavaScript's fault as well. I'll try to fix it.

remove all nodes with only one connection

It would be nice if there was a way to simplify a graph by removing all nodes with only one connection. This would allow you to build a complex, meaningful network fairly quickly (explore a few things, find some connections, then delete the guff).

It would also be nice to be able to manually delete selected nodes somehow.

executing `api/api.py`

Hi,
i have the error below when i try to execute >python api.py to run the flask server.

Traceback (most recent call last):
File "api.py", line 10, in
from wikipedia_parse import *
File "C:\WikiMap\wikipedia-map-master\api\wikipedia_parse.py", line 164
print is_article(":Cows"), is_article("WP:UA") # Test if it's an article
^
SyntaxError: invalid syntax

Thank you in advance for your help

Can't figure out how to collapse nodes

Tried on Safari and Android (chrome) but don't see how to collapse nodes

Not showing all wikipedia article links

When searching for https://en.wikipedia.org/wiki/San_Diego_Comic-Con only one node is created, but there are over 800 wikipedia article links in that page.

Feature: Add next terms

It would be great if it was possible to add a term after initial search without losing connections - e.g. for brainstorming purposes.

Great tool, btw - really appreciate

A bit more helpful text

"Geographic Coordinates System" marked as only link for many pages about places

Recently it seems the structure of many pages has changed such that the box indicating the coordinates of a place is contained within the first direct p descendant of .mw-parser-output. This causes "Geographic Coordinate System" to be marked as the only link from all of these articles.

In this image, highlighted p node contains the coordinates information, but is structurally the first p node that is a direct child of .mw-parser-output

Failed to load resource: net::ERR_FILE_NOT_FOUND

Hi,
When i test with the following url http://localhost:12345/links?page=cats , i have the ansewer below.

["Homo sapiens", "Carnivore", "Fur", "Feral cat", "Felidae", "Pet", "Vermin", "Latin language", "Felinae", "Mammal"]

but when i try to start index.html i have an error in the console of my browser

thank you in advance

Direct linking to graphs

Coming from erabug/wikigraph#2 and fedwiki/wiki#63 we know that linking to certain states of the graph would be interesting.

Similar to what CoGraph allows, but by using URL fragments known from @fedwiki lineups.

If I searched for Space and Time, they'd automatically be added to the URL and therefore create a stable view onto the data. Those nodes should be expanded by default on load.

Better help

Use a tool like shepherd or tourist for an interactive tour of the app.

Duplicate Nodes.

Problem

Sometimes, in a graph, the same node is shown twice. In this graph, you can see that J.K. Rowling is shown twice, once with a space between initials, and once without; both "J. K. Rowling" and "J.K. Rowling".

Cause

This is because one page links to https://en.m.wikipedia.org/wiki/J._K._Rowling, while the other links to https://en.m.wikipedia.org/wiki/J.K._Rowling. These both redirect to the same page, but are different URLs. Therefore, wikipedia_parse.py, which only looks at the last segment of the URL, interprets them differently.

Possible solutions

Look at the actual title of pages, after following the link, call get_page_name on each node that is added. This would be very slow, better to pursue a faster method.

some cases could be solved by simply storing a lowercased version of page titles with spaces removed as node IDs, and using the full thing for node labels. However, this still would not resolve things like Cat vs Cats, which go to the same page but might be linked differently.

The "about" dialog has quite a few problems.

You can't close it or scroll it on mobile. Also, it's not really sized right.

Data Source for the Map

Hi,

Can you please let me know the data source which you are using in the Flask Server API?

Thanks,

Multiple start nodes

Compare start nodes by entering a comma separated list. Idea from https://www.reddit.com/r/dataisbeautiful/comments/4alqq0/wikipedia_mapper_oc/

All edges directly from a highlighted node is highlighted in path

When you highlight a path, certain nodes are highlighted. All edges that directly originate from these nodes are also highlighted, not just the edges that are in the path.

Traceback function is not working

@controversial please guide me how traceback functions works

License?

Hello.

I want to reuse parts of your projects in our open source project https://github.com/dchaplinsky/pep.org.ua.

Do you have any license restrictions/objections?

Lag in traceback

In very large networks, the traceback can sometimes be very slow. The whole network pauses during slow tracebacks.

A hackish solution could be to call traceBack asynchronously from a setTimeout call, which might not freeze the network in the same way. However, this would address the symptoms rather than the problem, and not actually improve the speed.

A much better solution would be to increase speed by reducing the number of iterations that are made through the traceback nodes. Right now, 6 iterations are made:

Iterate through parents to identify traceback nodes
Iterate through identified nodes to adjust color
Iterate through identified nodes again inside vis.DataSet.update
Iterate through parents to identify traceback edges
Iterate through identified edges to adjust color
Iterate through the edges again inside vis.DataSet.update

Looking into the code for vis.DataSet.update, it appears that commit dfc633e was made in error. This added two more iterations to the list, further slowing down the traceback, rather than speeding it up.

To bring this down to one loop, traceBack, getTraceBackNodes, and getTraceBackEdges could be merged into a single function with one iteration. nodes.update() and edges.update() could be called once for each item as they are identified and modified. This could bring the total loops made through the same data during a traceBack down to one.

Child node created although it is the same as the parent node

When I query for "The Smashing Pumpkins," and click on "James Iha," a child node is created for "The Smashing Pumpkins." Since the child IS the same as the parent, the node should not be created or at least filtered out after the list comes back.

Get more links for selected nodes

The current script gets the links from the first paragraph, but this is sometimes not particularly useful. For example, Dog only returns "Carl Linnaeus" (this might be a bug though, because the first paragraph of https://en.wikipedia.org/wiki/Dog has more links than that..).

It would be good to be able to (optionally) use more paragraphs to rip links from, so that nodes with weak first paragraphs can be expanded..

Also, I wonder if it wouldn't be better to use the first 3 paragraphs by default. I have a local copy that gets the first three, and it seems to capture a much more representative set of links..

controversial / wikipedia-map Goto Github PK

wikipedia-map's Issues

Problem

Cause

Possible solutions

Recommend Projects

Recommend Topics

Recommend Org