quniq's Introduction

quniq

Accelerates uniq by using multiple goroutines and a large amount of memory.

Installation

$ go get github.com/syucream/quniq

Usage

Usage of ./quniq:
  -c    print with count
  -d    output only duplicated lines
  -i    enable case insensitive comparison
  -inbuf-weight int
        number of input buffer items(used specified value * 1024 * 1024) (default 1)
  -max-workers int
        number of max workers (default 1)
  -u    output only unique lines
  • for example:
$ cat file | quniq -c -max-workers 2

Note

  • It doesn't require the input data to be sorted.
  • Output lines are emitted in no particular order.

Benchmarks

Environment

  • MacBook Air Early 2014
  • Core i7 4650U 1.6GHz
  • Mem 8GB

Target file

$ cat /dev/urandom | tr -dc '0-9' | fold -w 4 | head -n 100000000 > randlog_0
$ cp randlog_0 randlog_1
...

Execution time comparison

  • sort | uniq
bash-3.2$ time cat randlog_* | LANG=C gsort | guniq > /dev/null

real    15m39.761s
user    12m59.955s
sys     2m10.857s
  • sort | uniq with --parallel
bash-3.2$ time cat randlog_* | LANG=C gsort --parallel 4 | guniq > /dev/null

real    14m24.231s
user    12m32.867s
sys     2m7.031s
  • awk
bash-3.2$ time cat randlog_* | awk '!_[$0]++' > /dev/null

real    11m13.350s
user    10m59.538s
sys     0m5.868s
  • sort -u
bash-3.2$ time cat randlog_* | LANG=C gsort -u > /dev/null

real    6m4.100s
user    5m46.810s
sys     0m10.659s
  • sort -u with --parallel
bash-3.2$ time cat randlog_* | LANG=C gsort -u --parallel 4 > /dev/null

real    5m56.870s
user    5m40.977s
sys     0m10.251s
  • quniq
bash-3.2$ time cat randlog_* | ./quniq --max-workers 4 > /dev/null

real    1m45.362s
user    4m29.294s
sys     0m10.177s

quniq's Issues

Doesn't work with hashcat

I've tested the following combination with hashcat:

cat /usr/share/dict/words | hashcat --stdout -r /usr/share/hashcat/rules/rockyou-30000.rule | quniq -max-workers 2

but quniq doesn't produce any output (it's stuck).

I've tested by piping through pv -l before quniq, and it shows that over 200M lines have been read (>5M/s), but there is still no output. Increasing the maximum number of workers doesn't help either.

Any clues why the program doesn't do anything? Or maybe it's waiting till the end of the stream?
