dict_voikko

PostgreSQL full-text search dictionary extension that uses the Finnish language library Voikko.

dict_voikko uses the base forms of Finnish words as lexemes. For a compound word, it returns the base forms of the simple words the compound is built from. If dict_voikko doesn't recognise a word, it returns NULL, so dict_voikko should always be chained with another dictionary (for example a stemmer) that handles the words it skips.
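Both behaviours can be observed by calling `ts_lexize` directly. This is a minimal sketch, assuming the extension is already installed as described under Installation and provides the `voikko` dictionary used in the test query there; `xyzzyqwerty` is an arbitrary non-word chosen only for illustration.

```sql
-- Compound word: dict_voikko should return the base forms of its parts
-- (the README's own test expects {kerros,talo} for this input).
SELECT ts_lexize('voikko', 'kerrostalollekohan');

-- Unrecognised token (arbitrary non-word, chosen for illustration):
-- per the description above, dict_voikko should return NULL here,
-- which is why a fallback dictionary must follow it in the mapping.
SELECT ts_lexize('voikko', 'xyzzyqwerty');
```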

Dependencies

The dictionary needs libvoikko and its dependencies. It also needs a suomi-malaga dictionary with morphological analysis for Voikko (e.g. dict-morpho from http://www.puimula.org/htp/testing/voikko-snapshot/), and at the moment that dictionary must be named mor-morpho.

Installation

Copy the dict_voikko directory into [POSTGRES]/contrib/ of your PostgreSQL source tree and compile it there.

In PostgreSQL

Run something like:

CREATE EXTENSION dict_voikko;

CREATE TEXT SEARCH DICTIONARY voikko_stopwords (
    TEMPLATE = voikko_template, StopWords = finnish
);

CREATE TEXT SEARCH CONFIGURATION voikko (COPY = public.finnish);

ALTER TEXT SEARCH CONFIGURATION voikko 
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part 
    WITH voikko_stopwords, finnish_stem;
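To check which dictionary in the `voikko_stopwords, finnish_stem` chain actually handled each token, PostgreSQL's built-in `ts_debug` function can help. A minimal sketch, assuming the `voikko` configuration created above; the sample text (including the non-word `xyzzyqwerty`) is arbitrary.

```sql
-- For every token, ts_debug shows the dictionaries it was offered to
-- (dictionaries), the one that produced the result (dictionary), and the
-- resulting lexemes. Words dict_voikko does not recognise are expected
-- to fall through to finnish_stem.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('voikko', 'kerrostalollekohan xyzzyqwerty');
```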

Test with

select ts_lexize('voikko', 'kerrostalollekohan');

The result should be:

   ts_lexize   
---------------
 {kerros,talo}
(1 row)
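Once the test passes, the `voikko` configuration can be used like any other text search configuration. The sketch below assumes a hypothetical table `documents(id, body)`; the table, column and index names are illustrative only and not part of dict_voikko.

```sql
-- Hypothetical table used only for this example.
CREATE TABLE documents (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

-- Expression index; queries must use the same to_tsvector('voikko', body)
-- expression for the planner to consider it.
CREATE INDEX documents_body_voikko_idx
    ON documents USING GIN (to_tsvector('voikko', body));

INSERT INTO documents (body) VALUES ('kerrostalollekohan');

-- Because dict_voikko reduces compounds to the base forms of their parts,
-- a query for "talo" is expected to match the compound inserted above.
SELECT id, body
FROM documents
WHERE to_tsvector('voikko', body) @@ to_tsquery('voikko', 'talo');
```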

Contributors

joux3, ljungqvist, tlaitinen


dict_voikko's Issues

Segmentation fault when running select ts_lexize('voikko', 'kerrostalollekohan');

Tested with:

  • PostgreSQL 9.3.11 (Ubuntu 14.04) and 9.5.2 (built from source)
  • the libvoikko that ships with Ubuntu 14.04, and libvoikko 4.0.2 (built from source)

(lldb) process attach --pid 29958
Process 29958 stopped
* thread #1: tid = 29958, 0x00007efcc5311110 libc.so.6`__poll + 16, name = 'postgres', stop reason = trace
    frame #0: 0x00007efcc5311110 libc.so.6`__poll + 16
-> 0x7efcc5311110 <__poll+16>: cmpq $-0xfff, %rax
   0x7efcc5311116 <__poll+22>: jae 0x7efcc5311149 ; __poll + 73
   0x7efcc5311118 <__poll+24>: retq
   0x7efcc5311119 <__poll+25>: subq $0x8, %rsp

Executable module set to "/usr/local/pgsql/bin/postgres".
Architecture set to: x86_64-pc-linux.
(lldb) c
Process 29958 resuming
Process 29958 stopped
* thread #1: tid = 29958, 0x00007efcbbc16374 libvoikko.so.1`voikkoAnalyzeWordUcs4 + 4, name = 'postgres', stop reason = invalid address (fault address: 0x38)
    frame #0: 0x00007efcbbc16374 libvoikko.so.1`voikkoAnalyzeWordUcs4 + 4
-> 0x7efcbbc16374 <voikkoAnalyzeWordUcs4+4>: movq 0x38(%rdi), %rdi
   0x7efcbbc16378 <voikkoAnalyzeWordUcs4+8>: movq (%rdi), %rax
   0x7efcbbc1637b <voikkoAnalyzeWordUcs4+11>: callq *(%rax)
   0x7efcbbc1637d <voikkoAnalyzeWordUcs4+13>: movq %rax, %rbx
(lldb) bt
* thread #1: tid = 29958, 0x00007efcbbc16374 libvoikko.so.1`voikkoAnalyzeWordUcs4 + 4, name = 'postgres', stop reason = invalid address (fault address: 0x38)
  * frame #0: 0x00007efcbbc16374 libvoikko.so.1`voikkoAnalyzeWordUcs4 + 4
    frame #1: 0x00007efcbbc164f4 libvoikko.so.1`voikkoAnalyzeWordCstr + 68
    frame #2: 0x00007efcbbe624e6 dict_voikko.so`dvoikko_lexize(fcinfo=0x00007ffcffde9ec0) + 302 at dict_voikko.c:137
    frame #3: 0x000000000094d5e3 postgres`FunctionCall4Coll(flinfo=0x000000000196a538, collation=0, arg1=26661176, arg2=26059604, arg3=18, arg4=140724601266912) + 226 at fmgr.c:1375
    frame #4: 0x0000000000807c69 postgres`ts_lexize(fcinfo=0x00000000019669b0) + 141 at dict.c:39
    frame #5: 0x000000000067d936 postgres`ExecMakeFunctionResultNoSets(fcache=0x0000000001966940, econtext=0x0000000001967250, isNull=0x00007ffcffdea45c, isDone=0x0000000000000000) + 305 at execQual.c:2019
    frame #6: 0x000000000067e34d postgres`ExecEvalFunc(fcache=0x0000000001966940, econtext=0x0000000001967250, isNull=0x00007ffcffdea45c, isDone=0x0000000000000000) + 181 at execQual.c:2410
    frame #7: 0x0000000000682351 postgres`ExecEvalExprSwitchContext(expression=0x0000000001966940, econtext=0x0000000001967250, isNull=0x00007ffcffdea45c, isDone=0x0000000000000000) + 70 at execQual.c:4391
    frame #8: 0x0000000000750920 postgres`evaluate_expr(expr=0x00000000018dace0, result_type=1009, result_typmod=-1, result_collation=100) + 148 at clauses.c:4643
    frame #9: 0x000000000074fd4c postgres`evaluate_function(funcid=3723, result_type=1009, result_typmod=-1, result_collid=100, input_collid=100, args=0x00000000018dac28, funcvariadic='\0', func_tuple=0x00007efcc5d560e0, context=0x00007ffcffdeb670) + 452 at clauses.c:4203
    frame #10: 0x000000000074f258 postgres`simplify_function(funcid=3723, result_type=1009, result_typmod=-1, result_collid=100, input_collid=100, args_p=0x00007ffcffdea5d8, funcvariadic='\0', process_args='\x01', allow_non_const='\x01', context=0x00007ffcffdeb670) + 305 at clauses.c:3842
    frame #11: 0x000000000074cd14 postgres`eval_const_expressions_mutator(node=0x00000000018da758, context=0x00007ffcffdeb670) + 1524 at clauses.c:2520
    frame #12: 0x00000000006ce920 postgres`expression_tree_mutator(node=0x00000000018da7b0, mutator=0x000000000074c720, context=0x00007ffcffdeb670) + 5956 at nodeFuncs.c:2635
    frame #13: 0x000000000074ec9d postgres`eval_const_expressions_mutator(node=0x00000000018da7b0, context=0x00007ffcffdeb670) + 9597 at clauses.c:3492
    frame #14: 0x00000000006ceb12 postgres`expression_tree_mutator(node=0x00000000018da808, mutator=0x000000000074c720, context=0x00007ffcffdeb670) + 6454 at nodeFuncs.c:2684
    frame #15: 0x000000000074ec9d postgres`eval_const_expressions_mutator(node=0x00000000018da808, context=0x00007ffcffdeb670) + 9597 at clauses.c:3492
    frame #16: 0x000000000074c6cd postgres`eval_const_expressions(root=0x00000000018da990, node=0x00000000018da808) + 96 at clauses.c:2362
    frame #17: 0x00000000007300b2 postgres`preprocess_expression(root=0x00000000018da990, expr=0x00000000018da808, kind=1) + 111 at planner.c:727
    frame #18: 0x000000000072f86a postgres`subquery_planner(glob=0x00000000018da1c0, parse=0x00000000018d9e00, parent_root=0x0000000000000000, hasRecursion='\0', tuple_fraction=0, subroot=0x00007ffcffdeb818) + 801 at planner.c:444
    frame #19: 0x000000000072f292 postgres`standard_planner(parse=0x00000000018d9e00, cursorOptions=0, boundParams=0x0000000000000000) + 410 at planner.c:229
    frame #20: 0x000000000072f0ee postgres`planner(parse=0x00000000018d9e00, cursorOptions=0, boundParams=0x0000000000000000) + 81 at planner.c:157
    frame #21: 0x00000000007f3d42 postgres`pg_plan_query(querytree=0x00000000018d9e00, cursorOptions=0, boundParams=0x0000000000000000) + 112 at postgres.c:809
    frame #22: 0x00000000007f3df5 postgres`pg_plan_queries(querytrees=0x00000000018da958, cursorOptions=0, boundParams=0x0000000000000000) + 103 at postgres.c:868
    frame #23: 0x00000000007f4100 postgres`exec_simple_query(query_string=0x00000000018d8e08) + 727 at postgres.c:1033
    frame #24: 0x00000000007f8534 postgres`PostgresMain(argc=1, argv=0x00000000018c01a8, dbname=0x00000000018c0008, username=0x00000000018bffe0) + 1829 at postgres.c:4030
    frame #25: 0x0000000000775703 postgres`BackendRun(port=0x00000000018df480) + 588 at postmaster.c:4239
    frame #26: 0x0000000000774e18 postgres`BackendStartup(port=0x00000000018df480) + 334 at postmaster.c:3913
    frame #27: 0x00000000007715be postgres`ServerLoop + 719 at postmaster.c:1684
    frame #28: 0x0000000000770baa postgres`PostmasterMain(argc=3, argv=0x00000000018be970) + 4330 at postmaster.c:1292
    frame #29: 0x00000000006c909f postgres`main(argc=3, argv=0x00000000018be970) + 686 at main.c:228
    frame #30: 0x00007efcc5245ec5 libc.so.6`__libc_start_main + 245
    frame #31: 0x0000000000463b79 postgres
(lldb) ^C
(lldb)
