Git Product home page Git Product logo

Comments (10)

cowtowncoder avatar cowtowncoder commented on July 26, 2024

True, existing way is very simple and perhaps simplistic; I only tested it with simplest of sorts (byte array based one), where output speed was bounded by i/o speed.

from java-merge-sort.

cowtowncoder avatar cowtowncoder commented on July 26, 2024

Hmmh. Actually, checking complexity again (re-reading code I wrote), am I missing something if I suggested that current implementation has same complexity as a binary heap, i.e. log2n? (with respect to number of comparisons). While various heap impls do have constant (at least amortized) access time for top entry, it is followed by insertion (remove + insert) which typically is log N isnt it?

from java-merge-sort.

whoschek avatar whoschek commented on July 26, 2024

Yes, indeed.

Wolfgang.

On Dec 15, 2011, at 1:25 PM, Tatu Saloranta wrote:

Hmmh. Actually, checking complexity again (re-reading code I wrote), am I missing something if I suggested that current implementation has same complexity as a binary heap, i.e. log2n? (with respect to number of comparisons). While various heap impls do have constant (at least amortized) access time for top entry, it is followed by insertion (remove + insert) which typically is log N isnt it?


Reply to this email directly or view it on GitHub:
#3 (comment)

from java-merge-sort.

cowtowncoder avatar cowtowncoder commented on July 26, 2024

Ok. So current impl might work well enough, with respect to number of comparisons. Although one could reduce number of calls being made (due to chaining). So maybe I'll still consider binary heap, like adapting https://github.com/simonwrafter/BinaryHeap

from java-merge-sort.

whoschek avatar whoschek commented on July 26, 2024

I think a java.util.PriorityQueue would work just fine, no? At least that's what we ended up using for our external merge sort impl a few years ago.

BTW, every time you take an entry object from that queue you can reuse it and put it right back into the queue, with a new value from the next item in the input stream. This is to avoid gc issues.

Wolfgang.

On Dec 15, 2011, at 2:38 PM, Tatu Saloranta wrote:

Ok. So current impl might work well enough, with respect to number of comparisons. Although one could reduce number of calls being made (due to chaining). So maybe I'll still consider binary heap, like adapting https://github.com/simonwrafter/BinaryHeap


Reply to this email directly or view it on GitHub:
#3 (comment)

from java-merge-sort.

cowtowncoder avatar cowtowncoder commented on July 26, 2024

My question is basically would PriorityQueue work better? Or would it just be for reducing amount of code? (which is nice thing of course, assuming it'd work as well)
I guess I better have a look at that impl; I only learnt about it very recently :)

Although if one can recycle entries, that's good (to know) -- I wasn't sure, and if not, it would indeed produce tons of unnecessary garbage.

from java-merge-sort.

whoschek avatar whoschek commented on July 26, 2024

Just for reducing code I guess. I think they're all binary heap impls with the same performance characteristics.

Here is an outline of how we reuse the entries:

    /**
     * Returns the next smallest element from the queue and refills the
     * queue if needed. This is a performance sensitive routine.
     * 
     * Instead of an intuitive LinkedList we use a less intuitive ArrayList
     * in combination with reverse() to reduce memory footprint.
     */
    protected Object read() throws IOException {
        PriorityQueueEntry entry = (PriorityQueueEntry) queue.poll();
        if (entry == null) {
            return null; // EOS
        }
        Object minElement = entry.elem;
        int min = entry.blockIndex;
        ArrayList block = blocks[min];
        if (block.size() == 0) { // fetch more data from run
            readers[min].read(block, blockSize);
            Collections.reverse(block);
        }
        if (block.size() > 0) { // refill queue
            entry.elem = block.remove(block.size()-1);
            queue.add(entry);
        } else {
            blocks[min] = null; // help gc
            readers[min].close(); // free resources as early as possible
        }
        return minElement;
    }

private static final class PriorityQueueEntry {
    public PriorityQueueEntry(Object elem, int index) { this.elem = elem; this.blockIndex = index; }
    public Object elem;
    public int blockIndex; 
}

Wolfgang.

On Dec 15, 2011, at 3:15 PM, Tatu Saloranta wrote:

My question is basically would PriorityQueue work better? Or would it just be for reducing amount of code? (which is nice thing of course, assuming it'd work as well)
I guess I better have a look at that impl; I only learnt about it very recently :)

Although if one can recycle entries, that's good (to know) -- I wasn't sure, and if not, it would indeed produce tons of unnecessary garbage.


Reply to this email directly or view it on GitHub:
#3 (comment)

from java-merge-sort.

cowtowncoder avatar cowtowncoder commented on July 26, 2024

Ok, gotcha. Just wanted to be sure I understood -- for some reason I first thought I had used equivalent of linear lookup, plus, didnt realize JDK had this nice class available. :-)

from java-merge-sort.

cowtowncoder avatar cowtowncoder commented on July 26, 2024

Just read through the code: indeed it is using the binary heap on array so it should work just fine.
I still want to verify timing, just to make sure (this might even be slightly faster). Thanks for suggestion!

from java-merge-sort.

dsachs avatar dsachs commented on July 26, 2024

I am not familiar enough with these issues to completely follow the discussion here, but I thought I'd share a link to a similar library that uses a PriorityQueue (though perhaps not in the manner implied in the discussion above). Written by Daniel Lemire (https://github.com/lemire) and in the public domain.

http://code.google.com/p/externalsortinginjava/source/browse/trunk/src/main/java/com/google/code/externalsorting/ExternalSort.java

PS Wolfgang: huge fan of Nux, btw!

from java-merge-sort.

Related Issues (13)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.