

Cite.py

A Python program for analyzing bibliographic data in your browser. The current version supports data exported from the Thomson Reuters Web of Science.

The long-term goal of this program is to offer functionality similar to the Histcite package, but as free software rather than proprietary software. This should make it easier for researchers to modify specific functions to pursue their own research questions.

Note that this software is better understood as a suggestion or proof of concept than as a user-friendly, ready-made application.

Features include:

  • Sort publications by Author, Years, Keywords, Source, Times Cited.
  • Analyze Cited References.
  • Export word co-occurrence networks and co-author networks as .gexf files for further analysis in Gephi.
  • Easy-to-share output (a few static html files).
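
To illustrate the network export listed above, here is a minimal sketch of how word co-occurrences could be counted and written out as GEXF. It is not taken from Cite.py: the function names `cooccurrence_counts` and `to_gexf` are hypothetical, the input documents are made up, and the XML is built with the standard library's `xml.etree.ElementTree` instead of pygexf so that the example runs without extra packages.

```python
import collections
import itertools
import xml.etree.ElementTree as ET


def cooccurrence_counts(documents):
    """Count how often each unordered pair of words shares a document."""
    counts = collections.Counter()
    for words in documents:
        # Deduplicate and sort so ("a", "b") and ("b", "a") become one pair.
        for pair in itertools.combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def to_gexf(counts):
    """Serialise pair counts as a minimal undirected GEXF 1.2 document."""
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", mode="static", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    # One node per distinct word, with a stable numeric id.
    labels = sorted({word for pair in counts for word in pair})
    ids = {label: str(i) for i, label in enumerate(labels)}
    for label in labels:
        ET.SubElement(nodes, "node", id=ids[label], label=label)

    # One weighted edge per co-occurring pair.
    for edge_id, ((a, b), weight) in enumerate(sorted(counts.items())):
        ET.SubElement(edges, "edge", id=str(edge_id),
                      source=ids[a], target=ids[b], weight=str(weight))
    return ET.tostring(gexf, encoding="unicode")


# Made-up keyword lists, one per publication.
documents = [
    ["citation", "network", "analysis"],
    ["citation", "analysis"],
    ["network", "gephi"],
]
print(to_gexf(cooccurrence_counts(documents)))
```

Writing the returned string to a file with a .gexf extension should give something Gephi can open. A co-author network can be sketched the same way by passing author lists instead of word lists.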

For a live demo, please visit this page.

Requirements

This program is written in Python 3. The following external packages are required:

  • pandas
  • bokeh
  • lxml
  • pygexf

Depending on your platform, lxml and pygexf may have to be compiled from source rather than installed through the standard package managers. Installing lxml can be especially complicated on Windows, so expect to search for platform-specific instructions.

External software bundled with this repository

This program uses Tsorter by Terrill Dent, which is licensed under the MIT licence.

Usage

Input data: Use with data exported as .tsv files from Web of Science. I have only tried with the "Mac OS/UTF-8" output option.

python3 Cite.py YOURDATA.tsv
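
For orientation, the sketch below shows roughly what such an export looks like and how its records can be sorted by Times Cited. It is an illustration, not Cite.py's own code: the helper names `read_records` and `sort_by_times_cited` are hypothetical, the two sample records are made up, and the standard `csv` module is used in place of pandas so the example runs on its own. AU, TI, PY and TC are the Web of Science field tags for authors, title, publication year and times cited.

```python
import csv
import io

# Made-up two-record sample in the shape of a Web of Science
# tab-delimited export: a header row of field tags, then one row per record.
SAMPLE = (
    "AU\tTI\tPY\tTC\n"
    "Doe, J; Roe, R\tAn example title\t2012\t7\n"
    "Poe, E\tAnother example title\t2015\t31\n"
)


def read_records(handle):
    """Return one dict per record, keyed by the field tags in the header row."""
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def sort_by_times_cited(records):
    """Most cited first; a missing or empty TC value counts as 0."""
    return sorted(records, key=lambda r: int(r.get("TC") or 0), reverse=True)


records = sort_by_times_cited(read_records(io.StringIO(SAMPLE)))
for record in records:
    print(record["PY"], record["TC"], record["TI"])
```

To try this on a real export, replace `io.StringIO(SAMPLE)` with an opened file, for example `open("YOURDATA.tsv", encoding="utf-8-sig", newline="")`; the `utf-8-sig` codec is an assumption that tolerates a byte order mark if the file starts with one.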

The script automatically starts a web server on port 8000. Open localhost:8000 in your favourite browser (JavaScript must be enabled). You can use the included example_data.tsv as a test dataset.
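
Serving static files on a local port can be done with Python's standard library alone. The sketch below is one possible way to do it and may differ from what Cite.py actually does; `make_server` is a hypothetical helper name, and the choice to bind only to localhost is an assumption.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build (but do not start) an HTTP server for the files in `directory`."""
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer(("localhost", port), handler)


# To serve the current directory until interrupted with Ctrl+C:
#     make_server(".").serve_forever()
```

Because the output consists of static files, any other static file server pointed at the output directory should work equally well.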

Output

The script will create a handful of html files and gexf network files.

Known bugs

  • Large datasets (tens of thousands of articles or more) make the JavaScript sorting function very slow. The software currently limits the output to the 500 most cited records. To raise this limit, change the value in for record in dict(collections.Counter(indexbodydict).most_common(500)).
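
The expression quoted above relies on `collections.Counter.most_common`, which returns the entries with the highest counts in descending order. A minimal sketch of the same idea with the limit exposed as a parameter follows. The helper `most_cited` and the sample mapping are hypothetical, and it is an assumption that `indexbodydict` maps each record to a numeric citation count.

```python
import collections


def most_cited(times_cited, limit=500):
    """Keep only the `limit` entries with the highest counts."""
    return dict(collections.Counter(times_cited).most_common(limit))


# Hypothetical record-to-citation-count mapping.
example = {"record A": 12, "record B": 90, "record C": 3, "record D": 41}
print(most_cited(example, limit=2))  # {'record B': 90, 'record D': 41}
```

Passing `limit=None` makes `most_common` return every entry, which removes the cap entirely at the cost of the slow sorting described above.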

Todo / Future updates

  • Enable larger data sets (see above).
  • Enable more input files, such as Scopus, SwePub etc.
  • Add more useful functionality.
  • ??? (please send me a message to suggest further improvements).

Contributors

  • christopherkullenberg
  • intensifier
