Git Product home page Git Product logo

trie4j's People

Contributors

dependabot[bot] avatar eliaslevy avatar miurahr avatar takawitter avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

trie4j's Issues

feature request: Support of Java Platform Module System (JPMS)

Java 9 has new feature JPMS that was also known as Project jigsaw.
Because trie4j library has no dependency, it can be a leaf of software book of materials, it is valuable to support JPMS for me, at dependeents projects such as mdict4j, stardict4j.

see.
https://central.sonatype.com/artifact/com.github.takawitter/trie4j/0.9.8/dependents

You can support JPMS with Multi-Release jar, even when target Java is not 9.
see https://nipafx.dev/multi-release-jars-multiple-java-versions/

ArrayIndexOutOfBoundsException on lookup in MapDoubleArray

Hi. I am using trie4j to store natural-language dictionary entries. With a particular dictionary index and a particular lookup word I am getting an ArrayIndexOutOfBoundsException on MapDoubleArray.get(). Sample code:

MapTrie<Object> mpt = new MapPatriciaTrie<>();
Files.lines(Paths.get("dicttest-Harrap_s_Business_fran_ais_ang.idx.txt"))
        .forEach(word -> mpt.insert(word, null));
MapDoubleArray<Object> da = new MapDoubleArray<>(mpt);
System.out.println(mpt.get("you")); // OK
System.out.println(da.get("you")); // Throws exception

The test file loaded above is available here: https://gist.github.com/amake/ba10166a7bf59ffac180defcecbec8d4

The lookup word "you" is not contained in the index, so I expect both get() calls to return null; however the second one results in this exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 128198
    at org.trie4j.doublearray.DoubleArray.getNodeId(DoubleArray.java:212)
    at org.trie4j.doublearray.DoubleArray.getTermId(DoubleArray.java:220)
    at org.trie4j.AbstractTermIdMapTrie.get(AbstractTermIdMapTrie.java:154)
    at test.main(Test.java:13)

This occurs with version 0.9.4.

ArrayIndexOutOfBoundsException in DoubleArray

A DoubleArray trie with input strings of

php.a
php.e
php.o
e
php.elu
php.s
php.x

results in

java.lang.ArrayIndexOutOfBoundsException: -1
	at org.trie4j.doublearray.DoubleArray.commonPrefixSearchWithTermId(DoubleArray.java:313)

when you call commonPrefixSearchWithTermId with the value php.ele.

use bytebuffer abstraction instead of byte[]

Hey,
I would love it if trie4j could use ByteBuffer instead of byte[]. This could be useful for avoiding heap space for large static tries, by using DirectByteBuffer, MappedByteBuffer etc..

I only looked at the code briefly, and found that it could be quite simple to modify BytesSuccinctBitVector.java to use ByteBuffer. Wanted to know if there are more places I should be looking at?

WDYT?

Thanks!

org.trie4j.util.TrieMap#entrySet fails with class-cast-exception

TrieMap.entrySet() fails as it tries to use a TreeSet with incomparable elements. i fail to successfully build the project since i'm missing test data, but this would fix it:

public class TrieMap<T> extends AbstractMap<String, T> {

...

    @Override
    public Set<Entry<String, T>> entrySet() {
        Set<StringEntry<T>> ret = new TreeSet<StringEntry<T>>();
        for (final String s : trie.predictiveSearch("")) {
            ret.add(new StringEntry<>(s, trie.get(s)));
        }
        return Collections.unmodifiableSet(ret);
    }

    private static final class StringEntry<T> implements Map.Entry<String, T>, Comparable<StringEntry<T>> {
        private final String k;
        private final T v;

        StringEntry(String k, T v) {
            this.k = k;
            this.v = v;
        }

        @Override
        public int compareTo(StringEntry<T> o) {
            return k.compareTo(o.getKey());
        }

        @Override
        public String getKey() {
            return k;
        }

        @Override
        public T getValue() {
            return v;
        }

        @Override
        public T setValue(T value) {
            throw new UnsupportedOperationException();
        }
    }

...

}

ArrayIndexOutOfBoundsException when traversing nodes of DoubleArray

When traversing nodes of DoubleArray which is built from PatriciaTrie that contains wikipedia japanese titles, the ArrayIndexOutOfBoundsException occurred at the line 73 of Rank1OnlySuccinctBitVector (inside "isOne" method).

        Algorithms.traverseByBreadth(trie.getRoot(), new NodeVisitor() {
            @Override
            public boolean visit(Node node, int nest) {
                return true;
            }
        });

Refactor Test code

Test codes contain a lot of duplications and inefficient inheritance structure.

Small size improvement for DoubleArray.

Currently, DoubleArray tries use base and check array as is created during build.
So arrays may contain unused area because that were extended by 1.5 each time that is extended.
Resizing arrays at the end of build process can reduce memory footprint.

`o.t.io.TrieWriter#writeSuccinctBitVector` don't have enough `if` conditions

I found that there are not enough if conditions for o.t.io.TrieWriter#writeSuccinctBitVector method.

Argument SuccinctBitVector sbv is checked by instanceof

  • BytesSuccinctBitVector
  • BytesRank0OnlySuccinctBitVector
  • BytesRank0OnlySuccinctBitVector
  • BytesRank1OnlySuccinctBitVector
  • LongsSuccinctBitVector

but it also can be

  • LongsRank0OnlySuccinctBitVector
  • LongsRank1OnlySuccinctBitVector

that become IOException.

I observed it when running TrieWriterTest with a Japanese wikipedia title data at 20230901.

Target Java version

Why not update target Java version to Java 8?
We know Java 21 is just released in last September, and Java 1.7 has been out of support.
Contributors can use Java 8 features such as NIO2 when update.

Unnecessary check of negative

Hi, I found unnecessary check code in o.t.doublearray.DoubleArray and o.t.doublearray.TailDoubleArray``#getChild method.

   public DoubleArrayNode getChild(char c) {
            int code = charToCode[c];
            if (code == -1) return null;
            int nid = base[nodeId] + code;

Here charToCode is defined char[] and char is ranging from 0x0000 to 0xffff in language specification and never be negative.

I think it can be like

        public DoubleArrayNode getChild(char c) {
            int nid = base[nodeId] + charToCode[c];

I found it when I check the project by SpotBugs, the above code produce an error.

Question: unclear procedure in method

I found a strange condition in bv.LongsRank0OnlySussinctBitVector.java#select0 Line 189

    @Override
    public int select0(int count) {
        for (int i = 0; i < longs.length; i++) {
            if (i * BITS_IN_BLOCK >= size) return -1;
            long v = longs[i];
            int c = BITS_IN_BLOCK - Long.bitCount(v);
            if (count <= c) {
                for (int j = 0; j < BITS_IN_BLOCK; j++) {
                    if (i * BITS_IN_BLOCK + j >= size) return -1;
                    if ((v & 0x8000000000000000L) != 1) {
                        count--;
                        if (count == 0) {
                            return i * BITS_IN_BLOCK + j;
                        }
                    }
                    v <<= 1;
                }
                return -1;
            }
            count -= c;
        }
        return -1;
    }

A if condition if ((v & 0x8000000000000000L) != 1) always true so a for loop will run until j * BIT_IN_BLOCK exceed size or exhausted count to 0. Is this intended?

I think it may be if ((v & 0x8000000000000000L) == 0).

Similar thing is also in org.trie4j.bv.LongsRank1OnlySuccinctBitVector.select0 line 189.

Values are lost when using MapPatriciaTrie

After initializing a MapPatriciaTrie in the following way:

MapTrie<Integer> mapTrie = new MapPatriciaTrie<Integer>();

mapTrie.insert("Test", 1);
mapTrie.insert("Tes", 4);

the following code returns null:

mapTrie.get("Test")

I believe it's because you are not setting the value of newChild on lines 164-166 of MapPatriciaTrie.

question

trie must be loaded all in the memory? is not possible to load a part of the trie in memory and a way to flush the trie motification in the file without to write all the trie ?

MapPatriciaTrie.predictiveSearchEntries returns wrong value

MapPatriciaTrie.predictiveSearchEntries seem to returns wrong value.
Please check below example.

    public static void main(String[] args) {
        MapPatriciaTrie<String> trie = new MapPatriciaTrie<>();
        System.out.println(trie.insert("A", "valueA"));
        System.out.println(trie.insert("AB", "valueAB"));
        System.out.println(trie.insert("ABC", "valueABC"));
        trie.freeze();

        System.out.println("=====");
        for (Map.Entry<String, String> e: trie.predictiveSearchEntries("A")) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }

    }
null
null
null
=====
A   valueA
AB  valueA    # Wrong value
ABC valueAB   # Wrong value

Exception when looking up particular strings in MapDoubleArray

The following produces an ArrayIndexOutOfBoundsException:

MapPatriciaTrie<Object> mpt = new MapPatriciaTrie<>();
mpt.insert("FOO");
mpt.get("F");
mpt.get("f");

MapDoubleArray<Object> mda = new MapDoubleArray<>(mpt);
mda.get("F");
mda.get("f"); // exception here
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at org.trie4j.bv.BytesRank1OnlySuccinctBitVector.isOne(BytesRank1OnlySuccinctBitVector.java:90)
    at org.trie4j.bv.BytesRank1OnlySuccinctBitVector.get(BytesRank1OnlySuccinctBitVector.java:80)
    at org.trie4j.doublearray.DoubleArray.getTermId(DoubleArray.java:243)
    at org.trie4j.AbstractTermIdMapTrie.get(AbstractTermIdMapTrie.java:141)

I would expect this to simply return null rather than throw an exception. My use case is (natural-language) dictionary lookup with mixed-case keys.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.