swift-collections-performance
Table of Contents
Mission
Answer to the question: what collection type is the best to store enormous number of data?
A comprehensive answer to the question can be found in this blogpost.
Idea
Data layer for Scrabble dictionary application based on my experience and performance tests.
Polish dictionary has almost 3 millions of words and should allow in reasonable time:
- validating words
- returning words possible to create with given letters
- simple regex
Comparison of:
- different arrays (incl. optimized), sets and tries
- compiler optimizations
- syntax optimizations
Environment
- Xcode 9.2
- Swift 4.0
- iPhone 8
- iOS 11.2.5
Optimizations
- Fast, Whole Module Optimization for the application's target (Release) and all dependencies
- Run and Test set to Release build configuration
Installation
- Clone the repository or download ZIP.
- Install CocoaPods (if you don't have):
sudo gem install cocoapods
. - Run
pod install
in the project's root directory.
Collection Types
The repository presents four main types of collection:
- Arrays:
- Standard
Array
, Array
with reserved capacity that equals a count of strings,ContiguousArray
,ContiguousArray
with reserved capacity that equals a count of strings,CleverArray
that is standard two-dimensionalArray
where arrays inside contains words with letters count that equals an index of array.
- Standard
- Sets:
- Standard
Set
, CleverSet
that is standardArray
of standardSet
s where sets inside contains words with letters count that equals an index of set.
- Standard
- Tries:
- Modified
Trie
from Buckets-Swift by Mauricio Santos
- Modified
- Realm database:
- Standard
Realm
database withList
ofStringObject
s, CleverRealm
database with multipleList
s ofStringObject
s where each list contains words with different letters count.
- Standard
Standard vs clever collection types
Measurement
All results, but RAM usage, were collected via measurement unit tests. RAM usage results were collected from Xcode / Debug navigator / Memory Report during running the sample app.
Results
Initialization time
A time needed to initialize an object.
Conclusions
- Sets are much slower than arrays, because of object uniqueness.
CleverSet
andTrie
initialization time is unacceptable.- Both
Realm
andCleverRealm
initialization time is unnoticeably, because of Realm laziness.
Word validation
A time needed to validate correctness of these words one by one: "rękodzieło", "porównywać", "rodzicielski", "się", "powierzchnia", "substancja", "jeden", "dzień", "w", "kobieta", "narzędzie", "fałszywy", "pizza", "medycyna", "niematakiegoslowa".
Conclusions
- Again, there is no big difference between the first four arrays.
- Sets and trie are the fastest.
- Non-optimized Realm database is faster than arrays, but
CleverArray
andCleverRealm
seem to be golden mean (a time needed to initialized sets andTrie
makes them unacceptable).
Words made from given letters
A time needed to find words that can be made from given letters:
- a, p, s, t for four letters test,
- a, d, p, s, t for five letters test.
Conclusions
- There is a huge difference between 4 and 5 letters for standard
Realm
database. - Sets, trie,
CleverArray
andCleverRealm
are really fast in this test, so I decided to run more tests for the fastest four of them.
Logarithmic scale, power trendline
- a, d, e, p, s, t for six letters test,
- a, d, e, p, r, s, t for seven letters test,
- a, d, e, p, r, s, t, z for eight letters test.
Conclusions
- Sets and trie growth when more letters is reasonable, but
CleverRealm
is useless for more than six letters.
Simple regex for blanks
A time time needed to find words that fit these simple blank regex one by one: "p?z?a", "???", "po??r", "pap????", "???????", "tele???".
In example: po??r should return pobór, poder, poker, pokór, polar, poler, polor, pomór, ponor, ponur, potir, power, pozer, pozór, pożar.
Conclusions
- This is possible in reasonable time only for
CleverArray
,CleverSet
,Trie
,Realm
andCleverRealm
. CleverRealm
is the fastest, whenRealm
is the slowest.Trie
is faster than standard Swift collection types.
RAM Usage
RAM usage measured by Xcode profiler on real device running example app:
- RAM Usage is RAM usage just after initialization.
- Highest RAM Usage is the highest RAM usage noticed during initialization.
Conclusions
- Trie and sets RAM usage is unacceptable high (it may cause crashes on older devices).
- Arrays RAM usage is high, but acceptable. Unfortunately their performance is not satisfactory.
- Realm RAM usage is close to null.