Git Product home page Git Product logo

fuzzysearch's People

Contributors

jaredcnance avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

fuzzysearch's Issues

comment

I couldn't comment on the article(?), so here's my five cents with benchmarks.

I wondered why one would Slice a string into a span of length one and then take the first character of that span, instead of just indexing directly into the string.
Also I couldn't see a reason to slice/index str2 in every iteration of the inner loop.
That's the major differences between LevenshteinDistance and LevenshteinDistanceNewInt.

Then I came up with three more possible optimizations:
Use this which reduces the space complexity from O(nm) to O(min(m, n)).

If we always compare strings below size below size 65,535, we can create a matrix of type ushort[,] instead.

Instead of calling ToUpperInvariant() on str2, call char.ToUpperInvariant.
From my testing this only seemed to be better for str2, probably to keep the inner loop hot.

BenchmarkDotNet=v0.10.13, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.248)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical cores and 4 physical cores
Frequency=2531254 Hz, Resolution=395.0611 ns, Timer=TSC
.NET Core SDK=2.1.100
  [Host] : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT
  Core   : .NET Core 2.0.5 (CoreCLR 4.6.26020.03, CoreFX 4.6.26018.01), 64bit RyuJIT

Job=Core  Runtime=Core

                          Method |             String1 |           String2 |          Mean |       Error |      StdDev |        Median | Scaled |      Gen 0 |    Gen 1 |    Gen 2 |  Allocated |
-------------------------------- |-------------------- |------------------ |--------------:|------------:|------------:|--------------:|-------:|-----------:|---------:|---------:|-----------:|
       LevenshteinDistanceNewInt | RandomString[10000] | RandomString[100] |  6,640.195 us | 128.6897 us | 143.0383 us |  6,609.240 us |   0.18 |   992.1875 | 992.1875 | 992.1875 |  4040680 B |
      LevenshteinDistanceNewUInt | RandomString[10000] | RandomString[100] |  6,562.799 us |  99.4058 us |  88.1206 us |  6,572.998 us |   0.18 |   992.1875 | 992.1875 | 992.1875 |  4040680 B |
    LevenshteinDistanceNewUShort | RandomString[10000] | RandomString[100] |  6,104.670 us |  94.0755 us |  83.3955 us |  6,098.356 us |   0.17 |   492.1875 | 492.1875 | 492.1875 |  2020480 B |
     LevenshteinDistanceNewShort | RandomString[10000] | RandomString[100] |  6,114.543 us |  74.3120 us |  69.5115 us |  6,104.263 us |   0.17 |   492.1875 | 492.1875 | 492.1875 |  2020480 B |
    LevenshteinDistanceSingleInt | RandomString[10000] | RandomString[100] |  5,528.348 us |  76.5684 us |  67.8759 us |  5,531.672 us |   0.15 |    23.4375 |        - |        - |   100312 B |
  LevenshteinDistanceSingleShort | RandomString[10000] | RandomString[100] |  5,707.102 us | 133.1978 us | 130.8180 us |  5,660.310 us |   0.16 |    15.6250 |        - |        - |    60312 B |
   LevenshteinDistanceSingleInt2 | RandomString[10000] | RandomString[100] |  5,634.655 us | 111.5724 us | 212.2780 us |  5,587.114 us |   0.15 |    23.4375 |        - |        - |   100080 B |
 LevenshteinDistanceSingleShort2 | RandomString[10000] | RandomString[100] |  5,436.131 us |  59.5584 us |  49.7340 us |  5,442.011 us |   0.15 |    15.6250 |        - |        - |    60080 B |
             LevenshteinDistance | RandomString[10000] | RandomString[100] |  9,895.640 us | 186.1476 us | 214.3678 us |  9,891.104 us |   0.27 |   984.3750 | 984.3750 | 984.3750 |  4060712 B |
     LevenshteinDistanceBaseline | RandomString[10000] | RandomString[100] | 36,535.146 us | 296.1695 us | 277.0371 us | 36,582.362 us |   1.00 | 21062.5000 | 875.0000 | 812.5000 | 68060718 B |
                                 |                     |                   |               |             |             |               |        |            |          |          |            |
       LevenshteinDistanceNewInt |  RandomString[1000] | RandomString[100] |    720.542 us |  13.7846 us |  15.8743 us |    720.092 us |   0.21 |   124.0234 | 124.0234 | 124.0234 |   404680 B |
      LevenshteinDistanceNewUInt |  RandomString[1000] | RandomString[100] |    736.114 us |  14.8387 us |  32.2581 us |    724.210 us |   0.22 |   124.0234 | 124.0234 | 124.0234 |   404680 B |
    LevenshteinDistanceNewUShort |  RandomString[1000] | RandomString[100] |    697.644 us |  13.9099 us |  19.9492 us |    693.482 us |   0.21 |    61.5234 |  61.5234 |  61.5234 |   202480 B |
     LevenshteinDistanceNewShort |  RandomString[1000] | RandomString[100] |    684.558 us |  13.3351 us |  16.8647 us |    681.559 us |   0.20 |    61.5234 |  61.5234 |  61.5234 |   202480 B |
    LevenshteinDistanceSingleInt |  RandomString[1000] | RandomString[100] |    641.872 us |  12.7010 us |  20.1451 us |    640.576 us |   0.19 |     2.9297 |        - |        - |    10312 B |
  LevenshteinDistanceSingleShort |  RandomString[1000] | RandomString[100] |    613.955 us |   9.7007 us |   9.0740 us |    612.587 us |   0.18 |     1.9531 |        - |        - |     6312 B |
   LevenshteinDistanceSingleInt2 |  RandomString[1000] | RandomString[100] |    614.671 us |  10.6710 us |   9.4596 us |    615.176 us |   0.18 |     2.9297 |        - |        - |    10080 B |
 LevenshteinDistanceSingleShort2 |  RandomString[1000] | RandomString[100] |    618.435 us |  12.2507 us |  12.5806 us |    621.476 us |   0.18 |     0.9766 |        - |        - |     6080 B |
             LevenshteinDistance |  RandomString[1000] | RandomString[100] |  1,063.171 us |  20.6348 us |  19.3018 us |  1,061.658 us |   0.32 |   123.0469 | 123.0469 | 123.0469 |   406712 B |
     LevenshteinDistanceBaseline |  RandomString[1000] | RandomString[100] |  3,369.441 us |  42.4848 us |  39.7403 us |  3,371.681 us |   1.00 |  2035.1563 | 128.9063 | 125.0000 |  6806712 B |
                                 |                     |                   |               |             |             |               |        |            |          |          |            |
       LevenshteinDistanceNewInt |   RandomString[100] | RandomString[100] |     65.648 us |   1.2906 us |   1.4345 us |     65.295 us |   0.20 |    12.9395 |        - |        - |    41080 B |
      LevenshteinDistanceNewUInt |   RandomString[100] | RandomString[100] |     67.065 us |   1.1662 us |   1.0908 us |     67.085 us |   0.20 |    12.9395 |        - |        - |    41080 B |
    LevenshteinDistanceNewUShort |   RandomString[100] | RandomString[100] |     68.051 us |   1.2900 us |   1.2066 us |     68.172 us |   0.20 |     6.4697 |        - |        - |    20680 B |
     LevenshteinDistanceNewShort |   RandomString[100] | RandomString[100] |     66.706 us |   1.2167 us |   1.0160 us |     66.666 us |   0.20 |     6.4697 |        - |        - |    20680 B |
    LevenshteinDistanceSingleInt |   RandomString[100] | RandomString[100] |     68.534 us |   1.3096 us |   1.1609 us |     68.663 us |   0.21 |     0.3662 |        - |        - |     1312 B |
  LevenshteinDistanceSingleShort |   RandomString[100] | RandomString[100] |     68.644 us |   1.3477 us |   1.9755 us |     68.039 us |   0.21 |     0.2441 |        - |        - |      912 B |
   LevenshteinDistanceSingleInt2 |   RandomString[100] | RandomString[100] |     72.776 us |   0.9263 us |   0.8664 us |     72.919 us |   0.22 |     0.2441 |        - |        - |     1080 B |
 LevenshteinDistanceSingleShort2 |   RandomString[100] | RandomString[100] |     72.111 us |   0.8616 us |   0.8059 us |     72.230 us |   0.22 |     0.1221 |        - |        - |      680 B |
             LevenshteinDistance |   RandomString[100] | RandomString[100] |    115.759 us |   2.2043 us |   1.9540 us |    115.770 us |   0.35 |    13.0615 |        - |        - |    41312 B |
     LevenshteinDistanceBaseline |   RandomString[100] | RandomString[100] |    333.471 us |   6.5553 us |   6.4382 us |    332.743 us |   1.00 |   216.3086 |        - |        - |   681312 B |
                                 |                     |                   |               |             |             |               |        |            |          |          |            |
       LevenshteinDistanceNewInt |    RandomString[10] | RandomString[100] |      5.993 us |   0.1144 us |   0.1405 us |      6.001 us |   0.18 |     1.4954 |        - |        - |     4720 B |
      LevenshteinDistanceNewUInt |    RandomString[10] | RandomString[100] |      5.951 us |   0.0659 us |   0.0550 us |      5.934 us |   0.18 |     1.4954 |        - |        - |     4720 B |
    LevenshteinDistanceNewUShort |    RandomString[10] | RandomString[100] |      5.892 us |   0.1149 us |   0.1277 us |      5.885 us |   0.18 |     0.7858 |        - |        - |     2496 B |
     LevenshteinDistanceNewShort |    RandomString[10] | RandomString[100] |      6.017 us |   0.1200 us |   0.2038 us |      5.986 us |   0.19 |     0.7858 |        - |        - |     2496 B |
    LevenshteinDistanceSingleInt |    RandomString[10] | RandomString[100] |      6.282 us |   0.1349 us |   0.1606 us |      6.249 us |   0.19 |     0.3510 |        - |        - |     1128 B |
  LevenshteinDistanceSingleShort |    RandomString[10] | RandomString[100] |      6.236 us |   0.0586 us |   0.0549 us |      6.242 us |   0.19 |     0.2289 |        - |        - |      728 B |
   LevenshteinDistanceSingleInt2 |    RandomString[10] | RandomString[100] |      6.176 us |   0.0979 us |   0.0916 us |      6.165 us |   0.19 |     0.3357 |        - |        - |     1080 B |
 LevenshteinDistanceSingleShort2 |    RandomString[10] | RandomString[100] |      6.495 us |   0.1291 us |   0.3094 us |      6.340 us |   0.20 |     0.2136 |        - |        - |      680 B |
             LevenshteinDistance |    RandomString[10] | RandomString[100] |      9.063 us |   0.1329 us |   0.1243 us |      9.043 us |   0.28 |     1.5106 |        - |        - |     4768 B |
     LevenshteinDistanceBaseline |    RandomString[10] | RandomString[100] |     32.448 us |   0.6267 us |   0.6966 us |     32.473 us |   1.00 |    21.8506 |        - |        - |    68768 B |

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.