price_detector_fa

price_detector_fa extracts product/price/amount tuples from Persian text using rule-based methods.

Contributors

  • Feraidoon Mehri
  • Fahime Hosseini
  • Soroush Vafaie Tabar

Installation

This library does not work on Windows.

  1. Run the following in this project’s directory:

     pip install -e .
     bash install.sh

  2. Install graphviz using your OS package manager.

Usage

from price_detector_fa.samples import *
from price_detector_fa.utils import *
from price_detector_fa.extractors import *
from price_detector_fa.preprocessing import *
from price_detector_fa.hardcoded import *


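def matching_extract(sample):
    """Return one result dict per product/price matching found in `sample`."""
    output = []
    # Process the text one sentence at a time.
    for s in sentence_tokenizer.tokenize(sample):
        s_tokens, s_spans = preprocess(s)

        s_parsed = parser.parse(s_tokens)
        s_spans = find_spans(s_parsed, s_spans)

        # Run every extractor on the parsed sentence and format each matching.
        matchings = all_extract(s_parsed)
        output.extend(matching_show(matching, s_spans) for matching in matchings)
    return output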


import pprint 
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(matching_extract("عباس‌آقا ده فروند شتر را به بهای پنجاه قران خریداری نموده و و خوشال شدند"))
[ { 'price_amount': ['مقدار:  پنجاه'],
    'price_unit': ['مقدار:  قران'],
    'product_amount': ['مقدار:  ده'],
    'product_name': 'مقدار:  شتر',
    'product_name_span': (18, 21),
    'product_unit': ['مقدار:  فروند']}]
pp.pprint(matching_extract("با سه هزار تومان میشود یک عدد بادکنک خرید."))
[ { 'price_amount': ['مقدار:  سه هزار'],
    'price_unit': ['مقدار:  تومان'],
    'product_amount': ['مقدار:  یک'],
    'product_name': 'مقدار:  بادکنک خرید .',
    'product_name_span': (30, 42),
    'product_unit': ['مقدار:  عدد']}]
print(sample_16_2)
pp.pprint(matching_extract(sample_16_2))
قیمت هندوانه ارزان شد و قیمت  هر گرم طلا هزار تومان است
[ { 'price_amount': ['مقدار:  هزار'],
    'price_unit': ['مقدار:  تومان'],
    'product_amount': ['مقدار:  یک'],
    'product_name': 'مقدار:  طلا',
    'product_name_span': (37, 40),
    'product_unit': ['مقدار:  گرم']}]
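
Every value in the results above starts with the same label, and product_name_span appears to hold half-open character offsets into the input text. The following is a minimal sketch of post-processing one result under those two assumptions. strip_label and clean_matching are hypothetical helpers written for this illustration, not part of price_detector_fa, and the example dict is copied from the last output above so that the snippet runs without the library installed.

```python
# Hypothetical post-processing helpers; NOT part of price_detector_fa.
LABEL_PREFIX = "مقدار:"  # label observed at the start of each value in the README output


def strip_label(value):
    """Drop the leading label (and surrounding whitespace) from a string value."""
    if isinstance(value, str) and value.startswith(LABEL_PREFIX):
        return value[len(LABEL_PREFIX):].strip()
    return value  # non-strings (e.g. span tuples) pass through unchanged


def clean_matching(matching):
    """Return a copy of one result dict with labels stripped from its values."""
    cleaned = {}
    for key, value in matching.items():
        if isinstance(value, list):
            cleaned[key] = [strip_label(v) for v in value]
        else:
            cleaned[key] = strip_label(value)
    return cleaned


# One result dict, copied from the README output for sample_16_2.
example = {
    "price_amount": ["مقدار:  هزار"],
    "price_unit": ["مقدار:  تومان"],
    "product_amount": ["مقدار:  یک"],
    "product_name": "مقدار:  طلا",
    "product_name_span": (37, 40),
    "product_unit": ["مقدار:  گرم"],
}

# The sample text as printed in the README; it contains two consecutive spaces.
text = "قیمت هندوانه ارزان شد و قیمت" + "  " + "هر گرم طلا هزار تومان است"

cleaned = clean_matching(example)

# Assumption: the span is a half-open (start, end) pair of character offsets.
start, end = cleaned["product_name_span"]
name_from_span = text[start:end]

print(cleaned["product_name"], cleaned["price_amount"], cleaned["product_unit"])
print(name_from_span == cleaned["product_name"])
```

Slicing the input with the span is useful when the original surface form is needed, because the product_name string is built from tokens and can differ from the input in spacing (as in the second example above, where a space precedes the final period).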
