nzlul03 / indexing_and_querying_bm25_dlm

This repository contains my work for an assignment in the Advanced Information Retrieval course at the University of Indonesia. Assignment: Indexing and Querying using BM25 and Dirichlet Language Modelling.


Indexing and Querying using BM25 and Dirichlet LM

This repository contains my work for the Advanced Information Retrieval course at the University of Indonesia.

Requirements

  • Python 3.7 or above
  • Libraries:
    • PyTerrier
    • pandas

Definition

  • BM25 is a ranking function that computes a score representing a document's relevance to a query.
  • Dirichlet Language Model (DLM) is a retrieval model that ranks documents by query likelihood under a Dirichlet-smoothed document language model; it tends to return longer texts than BM25.
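
To make the two definitions concrete, here is a minimal pure-Python sketch of both scoring functions on a toy corpus. It is an illustration under assumptions, not the code PyTerrier runs: the BM25 form shown is one common variant, the parameter values (`k1`, `b`, `mu`) are illustrative defaults, and the toy `docs` and `query` are made up for the example.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query (one common BM25 variant)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # number of documents containing the term
        if df == 0 or tf[term] == 0:
            continue  # the term contributes nothing to this document
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


def dirichlet_score(query, doc, docs, mu=2500):
    """Log query likelihood with Dirichlet-prior smoothing."""
    collection = Counter(t for d in docs for t in d)
    coll_len = sum(collection.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # assumption of this sketch: skip terms unseen in the collection
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["bm25", "ranking", "function"],
    ["language", "model", "smoothing"],
    ["ranking", "with", "language", "model"],
]
query = ["ranking", "function"]

bm25_scores = [bm25_score(query, d, docs) for d in docs]
dlm_scores = [dirichlet_score(query, d, docs) for d in docs]
```

BM25 gives a score of zero to a document that shares no term with the query, while the Dirichlet model still assigns it a (low) log-probability because every document is smoothed with collection statistics.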

Task

  • Run the queries with the BM25 scoring function, retrieving only the top 10 documents for each query
  • Evaluate the BM25 retrieval results over all queries using precision@10, recall@10, and MRR
  • Evaluate the BM25 retrieval results per query using precision@10, recall@10, and MRR
  • Run the queries with the Dirichlet Language Model (DLM) scoring function, retrieving only the top 10 documents for each query
  • Evaluate the DLM retrieval results over all queries using precision@10, recall@10, and MRR
  • Evaluate the DLM retrieval results per query using precision@10, recall@10, and MRR
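
The evaluation steps above can be sketched without any retrieval library. The following is a minimal illustration, assuming a run is a dict from query ID to a ranked list of document IDs and the relevance judgments are a dict from query ID to a set of relevant document IDs; the function names and the toy `run` and `qrels` are hypothetical. Precision@k divides by `k` even when fewer than `k` documents were retrieved, which is a convention chosen for this sketch.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k positions that hold a relevant document."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(run, qrels, k=10):
    """Return (per-query scores, mean scores over all queries)."""
    per_query = {}
    for qid, ranked in run.items():
        relevant = qrels.get(qid, set())
        per_query[qid] = {
            "precision": precision_at_k(ranked, relevant, k),
            "recall": recall_at_k(ranked, relevant, k),
            "rr": reciprocal_rank(ranked[:k], relevant),
        }
    n = len(per_query)
    mean = {
        metric: sum(scores[metric] for scores in per_query.values()) / n
        for metric in ("precision", "recall", "rr")
    }
    return per_query, mean


# Hypothetical toy run and relevance judgments.
run = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
qrels = {"q1": {"d1", "d4"}, "q2": {"d5"}}

per_query, mean = evaluate(run, qrels)
```

The mean of the per-query `rr` values is the MRR, so the same function covers both the "all queries" and the "per query" tasks.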

Result Analysis Task

  1. Which method is more effective, BM25 or LM? Is the difference in the scores statistically significant?
  2. On which queries does BM25 perform better, and on which queries does LM perform better? Give your analysis of why this happens.
  3. Which query ID has the best evaluation score with BM25? Which query ID has the best evaluation score with LM? Give your analysis of why this happens.
  4. Which query ID has the worst evaluation score with BM25? Which query ID has the worst evaluation score with LM? Give your analysis of why this happens.
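
One possible way to approach questions 1 to 4 is sketched below: a per-query win/tie/loss comparison, best and worst query lookup, and an exact paired randomization (sign-flip) test on the per-query differences. This is an assumption about how the analysis could be done, not the method used in the notebook; the score values are invented for illustration, and the exhaustive enumeration is only practical for a small number of queries.

```python
from itertools import product


def compare_per_query(scores_a, scores_b):
    """Split query IDs into those where A wins, B wins, or both tie."""
    qids = sorted(set(scores_a) & set(scores_b))
    a_wins = [q for q in qids if scores_a[q] > scores_b[q]]
    b_wins = [q for q in qids if scores_a[q] < scores_b[q]]
    ties = [q for q in qids if scores_a[q] == scores_b[q]]
    return a_wins, b_wins, ties


def best_and_worst(scores):
    """Return the query IDs with the highest and lowest score."""
    return max(scores, key=scores.get), min(scores, key=scores.get)


def paired_randomization_test(scores_a, scores_b):
    """Exact two-sided sign-flip test; cost grows as 2**n in the number of queries."""
    qids = sorted(set(scores_a) & set(scores_b))
    diffs = [scores_a[q] - scores_b[q] for q in qids]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # tolerance for floating-point rounding
            extreme += 1
    return extreme / total


# Hypothetical per-query reciprocal ranks for the two systems.
bm25_rr = {"q1": 1.0, "q2": 0.5, "q3": 0.25, "q4": 1.0}
dlm_rr = {"q1": 0.5, "q2": 0.5, "q3": 0.5, "q4": 0.2}

bm25_wins, dlm_wins, ties = compare_per_query(bm25_rr, dlm_rr)
bm25_best, bm25_worst = best_and_worst(bm25_rr)
p_value = paired_randomization_test(bm25_rr, dlm_rr)
```

A small p-value would suggest that the difference between the two systems is unlikely to be due to chance; with only a handful of queries, as in this toy data, the test has little power.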
