wikipedia's Introduction

WIKIPEDIA

Wikipedia for node. Works in the browser as well.

Implements legacy wiki endpoints and also the newer REST API.

Try out the new summary() REST endpoint for an introduction to your page and the main images optimized for browsers and mobile!

You can also now get the events which happened on a particular day using the onThisDay() API, which supports filtering by event types as well.

Built with latest ES6 and native support for async/await and promises.

Built with TypeScript - exports all the types used.

INSTALLATION

$ npm install wikipedia

Highlights

For detailed documentation of the methods available on wiki and page, see the project documentation.

What can it do?

  • Get a summary for a page, containing the intro and main image optimized for web and mobile, with the new Wikipedia REST APIs
  • Fetch article content
  • Find all links/images/categories in a page
  • Get all the relevant events that happened on a particular day. You can further filter these by event type
  • Find related articles from the given article
  • Find articles by geographical location
  • Get a Wikipedia page as a PDF document
  • Supports switching languages
  • Parses infoboxes using infobox-parser

Usage

const wiki = require('wikipedia');

(async () => {
	try {
		const page = await wiki.page('Batman');
		console.log(page);
		//Response of type @Page object
		const summary = await page.summary();
		console.log(summary);
		//Response of type @wikiSummary - contains the intro and the main image
	} catch (error) {
		console.log(error);
		//=> Typeof wikiError
	}
})();

The page method returns a Page class object which has fields like pageid, title, parentid, revisionid and methods like summary(), intro(), images(), html() and more.

All the page methods can take a title parameter or a pageId. Read up on the Page documentation here to see a detailed overview of the methods available in page.

You can also call methods like summary() on the wiki object directly. Read up here to see when you should use the page object and when you should call summary() directly. There's a performance difference! Long story short, use the method directly if you are using only the summary of the page and are not expecting to use any of the other page attributes.

const wiki = require('wikipedia');

(async () => {
	try {
		const summary = await wiki.summary('Batman');
		console.log(summary);
		//Response of type @wikiSummary - contains the intro and the main image
	} catch (error) {
		console.log(error);
		//=> Typeof wikiError
	}
})();

You can now get the events which happened on a particular day using the new onThisDay() api on the wiki object.

const wiki = require('wikipedia');

(async () => {
	try {
		const events = await wiki.onThisDay();
		const deaths = await wiki.onThisDay({type:'deaths', month:'2', day:'28'});
		console.log(events); // returns all the events which happened today
		console.log(deaths); // returns all deaths which happened on Feb 28
	} catch (error) {
		console.log(error);
		//=> Typeof wikiError
	}
})();

There are other methods like search(), geoSearch(), suggest(), setLang(), setUserAgent() which should be called on the wiki object directly. Read up on the wiki documentation to see a complete list of methods available on the wiki default object.

const wiki = require('wikipedia');

(async () => {
	try {
		const searchResults = await wiki.search('Batma');
		console.log(searchResults);
		//Response of type @wikiSearchResult - contains results and optionally a suggestion
		const newUrl = await wiki.setLang('fr');
		console.log(newUrl);
		//Returns the api url with language changed - use `languages()` method to see a list of available langs
	} catch (error) {
		console.log(error);
		//=> Typeof wikiError
	}
})();

You can import types or even specific methods if you are using modern ES6 JS or TypeScript.

import wiki from 'wikipedia';
import { wikiSummary, summaryError } from 'wikipedia';
import { summary } from 'wikipedia';

(async () => {
	try {
		//named batmanSummary so it does not shadow the imported summary()
		let batmanSummary: wikiSummary; //sets the object as type wikiSummary
		batmanSummary = await wiki.summary('Batman');
		console.log(batmanSummary);
		let summary2 = await summary('Batman'); //using summary directly
		console.log(summary2);
	} catch (error) {
		console.log(error);
		//=> Typeof summaryError, helpful in case you want to handle this error separately
	}
})();

Options

All methods accept options. You can find them in the optionTypes documentation.

Result Types

All the returned result types are documented as well. You can find them here.

Contributing

Before opening a pull request please make sure your changes follow the contribution guidelines.

Contributors

The project would not be the way it is without these rockstars.

  • dopecodez (Govind S)
  • friendofdog (Kevin Kee)
  • bumbummen99 (Patrick)
  • gtibrett (Brett)
  • 0xflotus
  • Greeshmareji (Greeshma R)
  • zactopus (Zac [they/them])
  • bigmistqke (Bigmistqke)
  • yg-i (Null)

wikipedia's Issues

Implement media list REST API

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /page/media-list/{title} endpoint which lists the media files used in the page. This is something I would love to have in wikipedia.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.

Implement events on this day API

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /feed/onthisday/{type}/{mm}/{dd} endpoint which provides events that historically happened on the provided day and month. We should support month and day in string format and also support the types of events.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.
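
As a rough sketch of the path-building step: the helper below accepts month and day as strings or numbers and pads them to two digits. It assumes the endpoint expects zero-padded values and that `all` is a valid default type; the name `onThisDayPath` is hypothetical and not part of the library.

```javascript
// Hypothetical helper: builds the REST path for /feed/onthisday/{type}/{mm}/{dd}.
// Accepts month and day as strings or numbers and zero-pads them to two digits.
function onThisDayPath({ type = 'all', month, day } = {}) {
	const now = new Date();
	// Fall back to today's date when month or day is omitted.
	const mm = String(month ?? now.getMonth() + 1).padStart(2, '0');
	const dd = String(day ?? now.getDate()).padStart(2, '0');
	return `feed/onthisday/${type}/${mm}/${dd}`;
}

console.log(onThisDayPath({ type: 'deaths', month: '2', day: '28' }));
// prints "feed/onthisday/deaths/02/28"
```

Defaulting to the current date keeps a bare `onThisDayPath()` call meaningful, which matches the README usage where `onThisDay()` without arguments returns today's events.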

I'll be happy to help anyone who wants to pick this up.

Failing coverage check on forked Pull Requests

As seen on #17, #12, #11 and any other forked PRs to master, the coverage check fails.

The check fails due to CC_TEST_REPORTER_ID which is used in uploading test reports to codeclimate not being available to forks. The discussion at https://github.community/t/make-secrets-available-to-builds-of-forks/16166 is inconclusive, meaning we have to find our own solution or remove the check completely.

Possible solutions include:

  1. Find a way to make CC_TEST_REPORTER_ID available to forks following the above link.
  2. Remove the check from PRs. This will involve adjusting the main.yaml GitHub Actions file to get it just right.
  3. Non-ideal: make CC_TEST_REPORTER_ID public. This is something we really shouldn't do, as people reusing parts of the code might end up using this secret.

Implement pdf api

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /page/pdf/{title} endpoint which provides the page in PDF format.

The API returns a file for direct download, so my initial thought is that we'll have to stream the data to actually get the file to the user.

Any discussion on this is welcome.
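
To make the streaming idea concrete, here is a minimal Node-only sketch that writes a readable stream to disk without buffering it all in memory. It assumes the HTTP client exposes the response body as a Node readable stream; `saveStreamToFile` is a hypothetical name, and the fake body stands in for a real response so the sketch runs offline.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: writes any readable stream (for example an HTTP
// response body) to disk chunk by chunk instead of buffering it in memory.
async function saveStreamToFile(readable, filePath) {
	await pipeline(readable, fs.createWriteStream(filePath));
	return filePath;
}

// Stand-in for a real response body so the sketch runs offline.
const fakeBody = Readable.from([Buffer.from('%PDF-1.4\n'), Buffer.from('...')]);
const target = path.join(os.tmpdir(), 'wikipedia-sketch.pdf');
const done = saveStreamToFile(fakeBody, target);
```

`pipeline` rejects if either side errors and closes both streams, which is why it is used here rather than a bare `pipe()` call. A browser build would need a different approach, since `fs` is not available there.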

Move from node-fetch to got, ky or axios

Issue

Updating node-fetch is a headache because the module seems to be published in formats that are not supported by modern TypeScript compilers and Jest, both of which are very widely used. Additionally, node-fetch takes up a few more MB, as shown in https://www.npmjs.com/package/got#comparison, and is less actively maintained.

Solution

Analyze the other major HTTP modules like got, ky, or axios and transition wikipedia to one of these libraries instead of node-fetch.
My initial preference is got because of its small size and active maintenance, but other suggestions are welcome.

Other languages not fully working

Getting results in other languages has problems. Grabbing a page works, but things like summaries and On This Day do not. It seems the wrong REST URL is being used.
For example, for the Swedish site the summary URL should be sv.wikipedia.org/api/rest_v1/page/summary/Stockholm
but the library tries to use sv.wikipedia.org/v1/page/page/summary/Stockholm instead.

const wiki = require('wikipedia');
 
(async () => {
    try {
        const changedLang = await wiki.setLang('sv');
        const page = await wiki.page('Stockholm'); // Works
        const summary = await wiki.summary('Stockholm'); // Fails
        console.log(page, summary);
    } catch (error) {
        console.log(error);
    }
})();
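
For illustration, here is a sketch of deriving both base URLs from a language code so that the REST base always ends in api/rest_v1/. The helper name `apiUrlsFor` is hypothetical and this is not the library's actual implementation; the URL shapes are taken from the working Swedish URL quoted above and from the w/api.php request URLs that appear in error traces.

```javascript
// Hypothetical helper: derives both base URLs from a language code.
// The REST base follows the working URL quoted above (api/rest_v1);
// the action API path (w/api.php) matches the one seen in request URLs.
function apiUrlsFor(lang) {
	const host = `https://${lang}.wikipedia.org`;
	return {
		actionApi: `${host}/w/api.php`,
		restApi: `${host}/api/rest_v1/`,
	};
}

const { restApi } = apiUrlsFor('sv');
console.log(`${restApi}page/summary/${encodeURIComponent('Stockholm')}`);
// prints "https://sv.wikipedia.org/api/rest_v1/page/summary/Stockholm"
```

Building both URLs from the same host in one place avoids the situation described here, where one base is updated on a language switch and the other is not.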

Implement random page API

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /page/random/{format} endpoint which gives a random page in the given format. This is something I would love to have in wikipedia. Find more details here.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.

Implement mobile sections

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /page/mobile-sections/{title} endpoint which provides mobile-friendly HTML.

Implementation for this can follow #17. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

Using `instanceof` to detect a `pageError`

I want to detect when a page doesn't exist, so I'm catching exceptions and trying to work out whether the exception is a pageError.

So I'm trying to use:

      if (wikiError instanceof pageError) {

It works if I import the class using :

import { pageError } from "wikipedia/dist/errors";

..but if I use the barrelled main export types from the d.ts like so

import { pageError } from "wikipedia";

It fails with

Right-hand side of 'instanceof' is not an object

Obviously I don't want to rely on digging into the dist folder, any ideas?
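
One possible workaround, sketched under the assumption that the thrown error carries `name: 'pageError'` (the serialized errors in the multiple-requests report on this page suggest it does), is to check the name instead of the class. `isPageError` and `FakePageError` are hypothetical names used only for this illustration.

```javascript
// Hypothetical helper: identifies a page error by its `name` property,
// which avoids importing the error class at all.
function isPageError(error) {
	return Boolean(error) && error.name === 'pageError';
}

// Stand-in error so the sketch runs without the library installed.
class FakePageError extends Error {
	constructor(message) {
		super(message);
		this.name = 'pageError';
	}
}

console.log(isPageError(new FakePageError('No page found'))); // prints true
console.log(isPageError(new TypeError('something else'))); // prints false
```

A name check is weaker than `instanceof` because any object can claim the name, but it does not depend on how the package exports its classes.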

When I attempt to make multiple requests at once I get a lot of PageErrors (parallel or sequential) even on valid items

When I do these one at a time they all resolve with a page; when I do more than ten at a time they start to throw pageErrors. I made a little code sample that illustrates the issue:

const wiki = require('wikipedia');

const subjects = [ "University of Washington", "USC Gould School of Law", "Watergate", "Supreme Court", "Justice Clarence Thomas", "Harlan Crow", "resignation", "impeachment", "public trust", "code of ethics", "University of Washington", "USC Gould School of Law", "Watergate", "Supreme Court", "Justice Clarence Thomas", "Harlan Crow", "resignation", "impeachment", "public trust", "code of ethics" ];

async function GetWikiSummary(subject) {
    let result = {};

    try {
        result.subject = subject;
        const page = await wiki.page(subject);
        result.canonicalurl = page.canonicalurl;
    } catch (error) {
        result.error = error;
    }

    return result;
}

async function getWikiSummaries(subjects) {
    const results = [];
  
    for (const subject of subjects) {
      try {
        const summary = await GetWikiSummary(subject);
        results.push(summary);
      } catch (error) {
        results.push({ subject });
      }
    }
  
    return results;
}

console.log("Starting");

(async () => {
    const converted = await getWikiSummaries(subjects);
    //const converted = await GetWikiSummary('impeachment');
    console.log(JSON.stringify(converted, null, 2));
})();

console.log("Done");

The list deliberately contains duplicated items to show that the first 10 resolve and the last 10 (even though they are the same) mostly throw page errors. If I have a list of 20 items, what is the recommended way to get all 20?

the result of the above code looks like this:

Done
[
  {
    "subject": "University of Washington",
    "canonicalurl": "https://en.wikipedia.org/wiki/University_of_Washington"
  },
  {
    "subject": "USC Gould School of Law",
    "canonicalurl": "https://en.wikipedia.org/wiki/USC_Gould_School_of_Law"
  },
  {
    "subject": "Watergate",
    "canonicalurl": "https://en.wikipedia.org/wiki/Watergate_scandal"
  },
  {
    "subject": "Supreme Court",
    "canonicalurl": "https://en.wikipedia.org/wiki/Supreme_court"
  },
  {
    "subject": "Justice Clarence Thomas",
    "canonicalurl": "https://en.wikipedia.org/wiki/Clarence_Thomas"
  },
  {
    "subject": "Harlan Crow",
    "canonicalurl": "https://en.wikipedia.org/wiki/Harlan_Crow"
  },
  {
    "subject": "resignation",
    "canonicalurl": "https://en.wikipedia.org/wiki/Resignation"
  },
  {
    "subject": "impeachment",
    "canonicalurl": "https://en.wikipedia.org/wiki/Impeachment"
  },
  {
    "subject": "public trust",
    "canonicalurl": "https://en.wikipedia.org/wiki/Public_trust"
  },
  {
    "subject": "code of ethics",
    "canonicalurl": "https://en.wikipedia.org/wiki/Ethical_code"
  },
  {
    "subject": "University of Washington",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "USC Gould School of Law",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Watergate",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Supreme Court",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Justice Clarence Thomas",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "Harlan Crow",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "resignation",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "impeachment",
    "error": {
      "name": "pageError"
    }
  },
  {
    "subject": "public trust",
    "canonicalurl": "https://en.wikipedia.org/wiki/Public_trust"
  },
  {
    "subject": "code of ethics",
    "error": {
      "name": "pageError"
    }
  }
]

Thanks for the help!
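
The cause is not established in this report. If the failures turn out to be transient (throttling is one possibility, but that is an assumption), a generic mitigation is to retry each lookup with a growing delay. The sketch below is library-agnostic: `withRetry` and `fetchAll` are hypothetical helpers, and `fetchOne` stands for something like `(subject) => wiki.page(subject)`.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `task()` up to `attempts` times, doubling the
// wait between tries. `task` would be something like () => wiki.page(subject).
async function withRetry(task, { attempts = 3, delayMs = 500 } = {}) {
	let lastError;
	for (let i = 0; i < attempts; i++) {
		try {
			return await task();
		} catch (error) {
			lastError = error;
			// Wait before the next attempt, but not after the final one.
			if (i < attempts - 1) await sleep(delayMs * 2 ** i);
		}
	}
	throw lastError;
}

// Runs the lookups one at a time, retrying each failure before moving on.
async function fetchAll(subjects, fetchOne, options) {
	const results = [];
	for (const subject of subjects) {
		try {
			results.push({ subject, value: await withRetry(() => fetchOne(subject), options) });
		} catch (error) {
			results.push({ subject, error: error.name });
		}
	}
	return results;
}
```

With the sample above, this could be called as `fetchAll(subjects, (s) => wiki.page(s))`. If the errors persist after several spaced retries, the problem is probably not transient and retrying will not help.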

Implement mobile html

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /page/mobile-html/{title} endpoint which provides mobile-friendly HTML.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

I'll be happy to help anyone who wants to pick this up.

How do I pass page url instead of page text?

There are no clear instructions on page input parameters. There is just a singleton example with text input 'Batman'.
How do I account for passing a page url as data?

As an example, use page('Oliver Ellsworth') and then retrieve the infobox()
And then retrieve page('1st United States Congress') and then retrieve infobox()
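
The README states that page methods take a title or a pageId, so a URL would presumably need converting to a title first. A minimal sketch of that conversion, assuming article URLs of the form https://<lang>.wikipedia.org/wiki/<Title>; `titleFromWikiUrl` is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: extracts the article title from a URL of the form
// https://<lang>.wikipedia.org/wiki/<Title>. Returns null for other URLs.
function titleFromWikiUrl(pageUrl) {
	const { pathname } = new URL(pageUrl);
	const prefix = '/wiki/';
	if (!pathname.startsWith(prefix)) return null;
	// Decode percent-escapes and turn underscores back into spaces.
	return decodeURIComponent(pathname.slice(prefix.length)).replace(/_/g, ' ');
}

console.log(titleFromWikiUrl('https://en.wikipedia.org/wiki/1st_United_States_Congress'));
// prints "1st United States Congress"
```

The resulting string could then be passed to `wiki.page()` in the same way as the 'Batman' example.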

geoSearchError: wikiError: TypeError: url_1.URLSearchParams is not a constructor

I'm currently working on a Vue application that has the following method

    async getLocations() {
      this.pages = []
      try {
        const geoResult = await wiki.geoSearch(2.088, 4.023, {
          radius: 5000,
          limit: 20,
        })
        console.log(geoResult[0]) // the closest page to given coordinates
      } catch (error) {
        console.log(error)
      }
}

Unfortunately it returns this exception:

geoSearchError: wikiError: TypeError: url_1.URLSearchParams is not a constructor
    at AsyncFunction.wiki.geoSearch (webpack-internal:///./node_modules/wikipedia/dist/index.js:469:15)    
wiki.geoSearch = async (latitude, longitude, geoOptions) => {
    try {
        const geoSearchParams = {
            'list': 'geosearch',
            'gsradius': (geoOptions === null || geoOptions === void 0 ? void 0 : geoOptions.radius) || 1000,
            'gscoord': `${latitude}|${longitude}`,
            'gslimit': (geoOptions === null || geoOptions === void 0 ? void 0 : geoOptions.limit) || 10,
            'gsprop': 'type'
        };
        const results = await request_1.default(geoSearchParams);
        const searchPages = results.query.geosearch;
        return searchPages;
    }
    catch (error) {
        throw new errors_1.geoSearchError(error);
    }
};
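
The `url_1.URLSearchParams` in the trace suggests the compiled code takes URLSearchParams from Node's `url` module, which a browser bundle may replace with a shim that lacks that export. This is an inference from the stack trace, not a confirmed diagnosis. Below is a sketch of the same query built with the global URLSearchParams, which exists in both modern browsers and Node; `buildGeoSearchQuery` is a hypothetical name.

```javascript
// Hypothetical helper mirroring the parameters in the compiled geoSearch above,
// but relying on the global URLSearchParams rather than require('url').
function buildGeoSearchQuery(latitude, longitude, geoOptions = {}) {
	const params = new URLSearchParams({
		list: 'geosearch',
		gsradius: String(geoOptions.radius || 1000),
		gscoord: `${latitude}|${longitude}`,
		gslimit: String(geoOptions.limit || 10),
		gsprop: 'type',
	});
	return params.toString();
}

console.log(buildGeoSearchQuery(2.088, 4.023, { radius: 5000, limit: 20 }));
// prints "list=geosearch&gsradius=5000&gscoord=2.088%7C4.023&gslimit=20&gsprop=type"
```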

CORS error fetching summary in Firefox

I created a Vue app in which I want to show information about a specific location.
This is my code for fetching the summary:
const summary = await wiki.summary(pageName);

In Chrome everything works perfectly, but in Firefox I'm getting this error:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://de.wikipedia.org/api/rest_v1/page/summary/Stuttgart. (Reason: header ‘user-agent’ is not allowed according to header ‘Access-Control-Allow-Headers’ from CORS preflight response).
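
The message says the preflight response does not allow a custom user-agent header. Below is a sketch of choosing the header name by environment, assuming Wikimedia's Api-User-Agent header is the accepted substitute for browser clients. `buildRequestHeaders` is a hypothetical helper, and whether the library does anything like this is not stated here.

```javascript
// Hypothetical helper: picks a header name the environment can send.
// The CORS preflight quoted above rejects User-Agent, so the sketch
// falls back to Api-User-Agent in browsers (assumption).
function buildRequestHeaders(userAgent, isBrowser = typeof window !== 'undefined') {
	return isBrowser
		? { 'Api-User-Agent': userAgent }
		: { 'User-Agent': userAgent };
}

// Force the browser branch to show the header that would be sent there.
console.log(buildRequestHeaders('my-app/1.0 (me@example.com)', true));
```

Passing `isBrowser` explicitly keeps the function testable outside a browser; by default it detects the environment through the presence of `window`.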

Incorrect data parsed from Infobox

https://en.wikipedia.org/wiki/All_Around_the_World_(Lisa_Stansfield_song)

const page = await wiki.page(pageTitle);
return page.infobox({ redirect: false });

returns

{
  name: 'All Around the World',
  cover: 'Lisa Stansfield - All Around the World.jpg',
  border: true,
  caption: 'Artwork for releases outside North America',
  type: 'Singles',
  artist: '2003',
  album: 'Affection (Lisa Stansfield album)',
  bSide: '"Wake Up Baby" (7"),"The Way You Want It" (12")',
  released: '16 October 1989',
  recorded: '1989',
  length: 'Duration',
  label: 'Arista Records',
  writer: [ 'Lisa Stansfield', 'Ian Devaney', 'Andy Morris' ],
  producer: [ 'Ian Devaney', 'Andy Morris' ],
  prevTitle: '8-3-1',
  prevYear: '2001',
  nextTitle: 'Too Hot (Kool & the Gang song)',
  nextYear: 'External music video',
  misc: 'Extra chronology',
  title: 'All Around the World (Norty Cotto Mixes)',
  year: '2003'
}

artist: '2003' is off

Error using page() to get infobox()

Do you have any thoughts on what Invalid attempt to destructure non-iterable instance is referring to in this context?

/PROJECTS/research/node_modules/wikipedia/dist/page.js:256
                throw new errors_1.infoboxError(error);
                      ^

infoboxError: infoboxError: TypeError: Invalid attempt to destructure non-iterable instance
    at Page.infobox (/PROJECTS/research/node_modules/wikipedia/dist/page.js:256:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: undefined
}

Node.js v18.16.0

The code:

const wiki = require('wikipedia')
let page, infobox
async function getPage(input) {
  try {
    page = await wiki.page(input)
    infobox = await page.infobox()
    console.log(infobox)
  } catch (error) {
    console.log(error)
  }
  return infobox
}

getPage('John M. Vining')

Dropping support for node 10, 12

We are planning to drop support for node versions:

10.x.x
12.x.x
Our minimum version will be node 14.x.x.

It would be great to hear whether the community sees any issue with dropping these versions.

Clarity on browser support

The README claims that the package can be used in browsers, but I couldn't find any documentation on it.

Is this actually feasible? If so, how?

Implement featured content api

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /feed/featured/{year}/{mm}/{dd} endpoint which provides featured content for that particular day. Implementation for this can follow #8.

Implementation for this should follow the summary or related method flow. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

Implement generate citation data

There are a lot of new REST APIs for wikipedia present in the REST API docs.

We'll look through them one by one and implement the same. Anyone who wants to pick up on any other REST API should ideally open a new issue.

The REST API has a /data/citation/{format}/{query} endpoint which provides citation data for a given URL.

Implementation for this can follow #17. Remember to write unit tests for all possible scenarios in your new functions, and try to use types as far as possible.

Error with proxy

How can I use it with a proxy? I get this error:

searchError: wikiError: FetchError: request to https://en.wikipedia.org/w/api.php?list=search&srprop=&srlimit=3&srsearch=Who%20is%20Harry%20Potter?&srinfo=suggestion&format=json&redirects=&action=query&origin=*& failed, reason: connect ECONNREFUSED 185.15.58.224:443
    at AsyncFunction.wiki.search (D:\developpement\Nodejs\wikipedia\node_modules\wikipedia\dist\index.js:55:15)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async D:\developpement\Nodejs\wikipedia\index.js:5:31 {
  code: undefined
}
