ASReview Semantic Clustering

This repository contains the Semantic Clustering plugin for ASReview. It applies multiple techniques (SciBERT, PCA, t-SNE, KMeans, and a custom Cluster Optimizer) to an ASReview data object in order to cluster records based on semantic differences. The end result is an interactive dashboard.

Installation

The package is called semantic_clustering and can be installed from the download folder with:

pip install .

or from the command line directly with:

python -m pip install git+https://github.com/asreview/semantic-clusters.git

Commands

For help use:

asreview semantic_clustering -h
asreview semantic_clustering --help

Other options are:

asreview semantic_clustering -f <input> -o <output.csv>
asreview semantic_clustering --filepath <input> --output <output.csv>
asreview semantic_clustering -a <output.csv>
asreview semantic_clustering --app <output.csv>
asreview semantic_clustering -v
asreview semantic_clustering --version
asreview semantic_clustering --transformer <model>

Usage

The semantic clustering extension is implemented as a subcommand of asreview. The following commands can be run:

Processing

In the processing phase, a dataset is processed and clustered for use in the interactive interface. The following options are available:

asreview semantic_clustering -f <input.csv or url> -o <output_file.csv>

Using -f will process a file and store the results in the file specified in -o.

Semantic_clustering uses an ASReviewData object and can handle files, URLs, and benchmark sets:

asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o output.csv
asreview semantic_clustering -f van_de_Schoot_2017.csv -o output.csv

If no output file is specified, output.csv is used as the output file name.

Transformer

Semantic Clustering uses the allenai/scibert_scivocab_uncased transformer model by default. With the --transformer <model> option, another model can be selected instead:

asreview semantic_clustering -f benchmark:van_de_schoot_2017 -o <output_file.csv> --transformer bert-base-uncased

Any pretrained model will work. The models named above are two examples; many more exist.
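The processing and transformer options above can also be scripted. The following is a minimal sketch, not part of the plugin: it only assembles the command line documented in this README and hands it to subprocess. The helper names build_command and run, and the output_<model>.csv naming scheme, are assumptions made for illustration.

```python
import shlex
import subprocess
from typing import List, Optional

DEFAULT_OUTPUT = "output.csv"  # default output name stated in this README


def build_command(source: str,
                  output: str = DEFAULT_OUTPUT,
                  transformer: Optional[str] = None) -> List[str]:
    """Return the documented CLI call as an argument list (nothing is run)."""
    cmd = ["asreview", "semantic_clustering", "-f", source, "-o", output]
    if transformer:
        # Only pass --transformer when overriding the default model.
        cmd += ["--transformer", transformer]
    return cmd


def run(source: str,
        output: str = DEFAULT_OUTPUT,
        transformer: Optional[str] = None) -> None:
    """Run the processing step; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_command(source, output, transformer), check=True)


if __name__ == "__main__":
    # Dry run: print the commands for the default model and one alternative.
    # The output file naming here is hypothetical.
    for model in (None, "bert-base-uncased"):
        out = f"output_{(model or 'default').replace('/', '_')}.csv"
        print(shlex.join(build_command("benchmark:van_de_schoot_2017", out, model)))
```

Keeping the command construction separate from execution makes it possible to inspect or log the exact call before a long-running embedding job starts.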

Dashboard

Running the dashboard server is also done from the command line. This command will start a Dash server in the console and visualize the processed file.

asreview semantic_clustering -a output.csv
asreview semantic_clustering --app output.csv

When the server has been started with the command above, it can be found at http://127.0.0.1:8050/ in your browser.
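To check from a script whether the dashboard is reachable, something like the sketch below may help. It assumes only the address given above; the function names dashboard_is_up and wait_for_dashboard are hypothetical, and the check says nothing about whether the page content is correct.

```python
import time
import urllib.request

DASHBOARD_URL = "http://127.0.0.1:8050/"  # address given in this README


def dashboard_is_up(url: str = DASHBOARD_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP request to the URL succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts, and HTTP error statuses end up here
        # (urllib.error.URLError is a subclass of OSError).
        return False


def wait_for_dashboard(url: str = DASHBOARD_URL,
                       attempts: int = 10,
                       delay: float = 1.0) -> bool:
    """Poll the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if dashboard_is_up(url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("dashboard reachable:", wait_for_dashboard())
```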

License

MIT license

Contact

Got ideas for improvement? For any questions or remarks, please send an email to [email protected].

semantic-clusters's Issues

Change name of repository

ASReview extensions start with the asreview- prefix. This helps users to understand the content of the repo. This might also be the moment to reconsider the name of the entry point. I think a name without hyphen or underscore is preferred, although I'm also fine with keeping it the way it is.
