
wiki-link's Introduction

This software package contains code that computes basic statistics on the Google WikiLinks dataset, and downloads and processes webpages to construct a version of the dataset that contains the context around each mention.

Note: This file assumes knowledge of the Wikilinks dataset; see http://iesl.cs.umass.edu/data/wiki-links

For documentation of how to use this library, see the Wiki here: https://code.google.com/p/wiki-link/w/

For any questions or issues, contact us at [email protected].

wiki-link's People

Contributors

sameersingh, brianmartin

wiki-link's Issues

rawTextOffset for Expanded data set

What is rawTextOffset supposed to be?

I just want to extract the text from the HTMLs together with the exact offset 
of each of the Wikipedia hyperlinks.

See comments on the Wiki page about this issue.
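
The goal described above (plain text plus the offset of each Wikipedia hyperlink) can be sketched as follows. This is a minimal illustration that uses only Python's standard html.parser module, not code from this repository. The names WikiLinkExtractor and extract are hypothetical, and the offsets it reports are character positions in the extracted text, which may or may not match the dataset's rawTextOffset field.

```python
from html.parser import HTMLParser


class WikiLinkExtractor(HTMLParser):
    """Collects visible text and the text offset of each Wikipedia link.

    Hypothetical helper for illustration; not part of the wiki-link code.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # text fragments collected so far
        self.length = 0      # total length of the collected text
        self.mentions = []   # (offset, anchor_text, href) tuples
        self._open = None    # (start_offset, href) of the link being read
        self._skip = 0       # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            # Assumption: a mention is any link to a Wikipedia article page.
            if "wikipedia.org/wiki/" in href:
                self._open = (self.length, href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "a" and self._open is not None:
            start, href = self._open
            anchor_text = "".join(self.parts)[start:]
            self.mentions.append((start, anchor_text, href))
            self._open = None

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        self.parts.append(data)
        self.length += len(data)


def extract(html):
    """Return (text, mentions) for one HTML document."""
    parser = WikiLinkExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts), parser.mentions
```

Each reported offset indexes into the returned text, so text[offset:offset + len(anchor_text)] gives back the anchor text. Offsets into the raw HTML bytes would need a different approach.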

Original issue reported on code.google.com by [email protected] on 30 Apr 2015 at 12:52

Contributing: solving some issues + Documenting some of the steps needed to generate the dataset.

I've been playing with the code provided in the repository to generate the
dataset, and I wonder whether it is possible to contribute. I found the
following issues:

1. I had problems calling retrieve.sh. Part of the problem was that I was using
OSX and the script uses GNU terminal commands. I changed some of those parts
of the scripts so that they work across operating systems.

2. There was a bug in retrieve_chunk.sh: the script was reading the wrong
parameter as the URL. I solved this by changing the index used by awk.

3. I added a 'how to' with steps on how to generate the dataset.

Here is my fork:
https://code.google.com/r/davalejandro-wikilink/source/list

Original issue reported on code.google.com by dav.alejandro on 9 Jul 2013 at 11:32

wikilink compilation problem

Hi Sameer,

Awesome work! A very basic question: we are trying to use this dataset for our own project (our first use of Scala). It seems like the WikiLinkItemIterator should be sufficient for our needs, yet we still struggle to compile this project.
Up to now we have downloaded the wikilink project (wiki-link/source-archive.zip) and its dependencies (wikilink-0.1-SNAPSHOT-jar-with-dependencies.jar) and tried to build it in IntelliJ 14.1.7 with Scala.
We get the following compilation error: Error:scalac: bad option: '-make:transitive'
We also get a version error in the pom.xml file for the following dependency (and likewise for coref and cmdots):

      <groupId>org.sameersingh.utils</groupId>
      <artifactId>timing</artifactId>
      <version>0.1.1-SNAPSHOT</version>

We use a Windows 10 64-bit machine.

Are there any steps that we are missing? It does seem like a problem in the Scala library; however, we use the most current distribution...

Thanks,
