vspandan / queryexpansion

Query Expansion is a project developed at IIIT Hyderabad as part of coursework. We surveyed existing applications, proposed an approach that combines N-grams with a Markov model, implemented it, and tested it on a data set comprising "The Telegraph - Calcutta" news stories.
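
To illustrate how N-grams and a Markov model can work together for query expansion, here is a minimal sketch. It assumes a first-order (bigram) model that suggests the words most often seen right after the last query term; the class name BigramExpander and its methods are hypothetical and are not taken from this repository, whose actual approach may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: first-order Markov (bigram) model over word sequences. */
final class BigramExpander {
    // counts.get(w1).get(w2) = number of times w2 directly followed w1
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Adds the bigram counts of one lower-cased, whitespace-tokenized sentence. */
    void train(String sentence) {
        String[] tokens = sentence.toLowerCase().trim().split("\\s+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            counts.computeIfAbsent(tokens[i], k -> new HashMap<>())
                  .merge(tokens[i + 1], 1, Integer::sum);
        }
    }

    /** Returns up to k expanded queries: the query plus one likely next word each. */
    List<String> expand(String query, int k) {
        String[] tokens = query.toLowerCase().trim().split("\\s+");
        String last = tokens[tokens.length - 1];
        Map<String, Integer> followers = counts.getOrDefault(last, new HashMap<>());
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(followers.entrySet());
        // Highest count first; ties broken alphabetically so the order is deterministic.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> expansions = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            expansions.add(String.join(" ", tokens) + " " + ranked.get(i).getKey());
        }
        return expansions;
    }
}
```

Ranking followers by raw count is equivalent to ranking by the conditional probability P(next | last), because all candidates share the same preceding word.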

Java 100.00%

queryexpansion's Introduction

Steps to run

1. Set the path variables in Properties.java according to where the input data set is located
   and where the results should be stored.
   
2. Run Doc2MatInput.java: this converts the data set to the format required by the doc2mat Perl script.

3. Run the doc2mat Perl script: this generates the matrix-file representation of the data set.
	Ex: perl E:\doc2mat-1.0\doc2mat -nostem E:\QueryEpansion_ver1_results\Output_ver1\Doc2MatInput\File E:\QueryEpansion_ver1_results\Output_ver1\Doc2MatInput\File.mat
		perl "location of perl script" -nostem "Inputfile" "Outputfile"
4. Run Cluto to cluster the documents. The last argument is the number of clusters to form.
	Ex: vcluster.exe -clmethod="rb" E:\QueryEpansion_ver1_results\Output_ver1\Doc2MatInput\File.mat 150
		vcluster.exe -clmethod="rb" "Input matrix file" "No of clusters"

5. Update Properties.java if needed.

6. Run Cluster.java, which puts the documents into clusters.

7. Run IndexClusters.java, which indexes the clusters using Lucene.

8. Run tagclusters.java, which associates tags with each cluster.

9. Run SearchIndex.java: enter your initial query here. The expanded queries are displayed.
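
Between steps 4 and 6, the Cluto output has to be mapped back to documents. The sketch below shows one way to do that. It assumes vcluster writes a clustering file next to the matrix (typically named like File.mat.clustering.150) in which line i holds the cluster id of document i, with a negative id meaning the document was left unassigned. The class ClusterAssignments is hypothetical and is not the repository's Cluster.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: groups document row indices by Cluto cluster id. */
final class ClusterAssignments {
    /**
     * @param lines contents of the clustering file, one cluster id per line
     * @return cluster id mapped to the 0-based row indices assigned to it
     */
    static Map<Integer, List<Integer>> group(List<String> lines) {
        Map<Integer, List<Integer>> clusters = new TreeMap<>();
        for (int row = 0; row < lines.size(); row++) {
            String line = lines.get(row).trim();
            if (line.isEmpty()) {
                continue; // tolerate blank lines such as a trailing newline
            }
            int clusterId = Integer.parseInt(line);
            if (clusterId < 0) {
                continue; // assumption: negative id marks an unassigned document
            }
            clusters.computeIfAbsent(clusterId, k -> new ArrayList<>()).add(row);
        }
        return clusters;
    }
}
```

The lines can be read with java.nio.file.Files.readAllLines and passed straight to group; the row indices then refer to documents in the order they were written to the doc2mat input file.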

For Evaluation:


1. Save all the input in a text file. Start with the string "Enter Query", followed by the input query and then the augmented queries on the following lines.

2. Run IndexDocuments.java, which indexes the clusters using Lucene.

3. Run QueryDocuments.java, which outputs an Excel sheet of precision and recall values.
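
For reference, precision and recall for a single query can be computed as in the sketch below. It assumes a set of document ids judged relevant for the query is available; how QueryDocuments.java determines relevance is not described here, and the class PrecisionRecall is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: precision and recall for one query's result list. */
final class PrecisionRecall {
    final double precision;
    final double recall;

    private PrecisionRecall(double precision, double recall) {
        this.precision = precision;
        this.recall = recall;
    }

    /**
     * precision = relevant retrieved / retrieved
     * recall    = relevant retrieved / relevant
     * Either value is reported as 0.0 when its denominator is zero.
     */
    static PrecisionRecall of(List<String> retrieved, Set<String> relevant) {
        Set<String> distinctRetrieved = new HashSet<>(retrieved); // ignore duplicate hits
        long hits = distinctRetrieved.stream().filter(relevant::contains).count();
        double precision = distinctRetrieved.isEmpty()
                ? 0.0 : (double) hits / distinctRetrieved.size();
        double recall = relevant.isEmpty()
                ? 0.0 : (double) hits / relevant.size();
        return new PrecisionRecall(precision, recall);
    }
}
```

Computing these values once for the initial query and once for each augmented query makes it possible to compare them side by side.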

Note: Change the Properties.java interface accordingly.
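
Since Properties.java is described as an interface holding path variables, it might look roughly like the sketch below. Every constant name is a hypothetical placeholder and not the repository's actual field; the output paths reuse the example commands from the steps above, and DATASET_DIR is a dummy value to be replaced.

```java
/** Hypothetical layout; constant names are illustrative, not the repository's. */
interface Properties {
    // Root folder for generated results (taken from the example commands above).
    String OUTPUT_DIR = "E:\\QueryEpansion_ver1_results\\Output_ver1";

    // File written by Doc2MatInput.java and read by the doc2mat Perl script.
    String DOC2MAT_INPUT_FILE = OUTPUT_DIR + "\\Doc2MatInput\\File";

    // Matrix file produced by doc2mat and passed to vcluster.
    String MATRIX_FILE = DOC2MAT_INPUT_FILE + ".mat";

    // Placeholder: location of the input news stories on the local machine.
    String DATASET_DIR = "path\\to\\dataset";
}
```

Fields declared in a Java interface are implicitly public, static, and final, so changing a path means editing the file and recompiling.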

queryexpansion's People

Contributors

vspandan