Hey @WardCunningham, thank you for the issue. I am really glad that you mentioned this. You did not miss anything. That number should be moved to a config file, and it will be in a future version. The choice of 10 words was arbitrary.
It is expected that you can index a document longer than 10 words. The max_phrase_length, as you correctly named it, is just that: the maximum length of a phrase that gets indexed. So it works the way you expect it to. It can take a document of any size and index every phrase in that document up to 10 words long. As you pointed out, this number is probably larger than most use cases need, and it will be configurable in the future.
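To illustrate the distinction between document length and phrase length, here is a minimal sketch in C. It is not fist's actual indexing code: the names `index_phrases`, `print_phrase`, `MAX_PHRASE_LENGTH`, `MAX_WORDS` and the `phrase_fn` callback are hypothetical, and it assumes words are separated by whitespace. Every starting word in the document is visited, and only the length of each phrase is capped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical name for the limit that is currently hard-coded to 10. */
#define MAX_PHRASE_LENGTH 10
/* Hypothetical cap for this sketch only; words past it are ignored. */
#define MAX_WORDS 1024

/* Called once per phrase: words[start] .. words[start + len - 1].
   A real indexer would hash the phrase into its hashmap here. */
typedef void (*phrase_fn)(char **words, size_t start, size_t len, void *ctx);

/* Split `doc` in place on whitespace, then visit every run of
   1..max_len consecutive words. Returns the number of phrases visited.
   The document itself may be any length (up to MAX_WORDS here). */
size_t index_phrases(char *doc, size_t max_len, phrase_fn visit, void *ctx) {
    char *words[MAX_WORDS];
    size_t n = 0;
    assert(max_len > 0);
    for (char *tok = strtok(doc, " \t\r\n"); tok != NULL && n < MAX_WORDS;
         tok = strtok(NULL, " \t\r\n"))
        words[n++] = tok;

    size_t count = 0;
    for (size_t start = 0; start < n; start++) {
        /* Near the end of the document, phrases are cut short. */
        size_t longest = (n - start < max_len) ? n - start : max_len;
        for (size_t len = 1; len <= longest; len++) {
            if (visit != NULL)
                visit(words, start, len, ctx);
            count++;
        }
    }
    return count;
}

/* Example visitor: print each phrase on its own line. */
void print_phrase(char **words, size_t start, size_t len, void *ctx) {
    (void)ctx;
    for (size_t i = 0; i < len; i++) {
        if (i > 0)
            putchar(' ');
        fputs(words[start + i], stdout);
    }
    putchar('\n');
}
```

For a 12-word document and a limit of 10, this visits 75 phrases: the phrases starting at the first three words stop at 10 words, and the later starting positions run to the end of the document. Lowering the limit shrinks the index without changing which documents can be indexed.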
Dump and load do not require the max phrase length because those functions simply serialize and deserialize the hashmap. By the time dump or load is called, the hashmap has already been built.
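A minimal sketch of why no phrase limit is involved: serialization writes out whatever keys are already stored and reads them back byte for byte. This is an illustration only, not fist's on-disk format; the `entry` struct, the `dump_entries`/`load_entries` functions and the length-prefixed layout are hypothetical, and the hashmap is assumed to have been flattened into an array first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened index entry: a phrase and a document id. */
typedef struct {
    char *phrase;
    uint32_t doc_id;
} entry;

/* Write: entry count, then (key length, key bytes, doc id) per entry.
   Returns 0 on success, -1 on a short write. */
int dump_entries(FILE *out, const entry *entries, uint32_t count) {
    assert(out != NULL);
    if (fwrite(&count, sizeof count, 1, out) != 1)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t len = (uint32_t)strlen(entries[i].phrase);
        if (fwrite(&len, sizeof len, 1, out) != 1 ||
            fwrite(entries[i].phrase, 1, len, out) != len ||
            fwrite(&entries[i].doc_id, sizeof entries[i].doc_id, 1, out) != 1)
            return -1;
    }
    return 0;
}

/* Read the entries back. Phrases are restored exactly as stored, so no
   max phrase length is needed. Returns a malloc'd array, or NULL on error. */
entry *load_entries(FILE *in, uint32_t *count_out) {
    uint32_t count;
    if (fread(&count, sizeof count, 1, in) != 1)
        return NULL;
    /* calloc zeroes the phrase pointers, so cleanup can free them all. */
    entry *entries = calloc(count ? count : 1, sizeof *entries);
    if (entries == NULL)
        return NULL;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t len;
        if (fread(&len, sizeof len, 1, in) != 1)
            goto fail;
        entries[i].phrase = malloc((size_t)len + 1);
        if (entries[i].phrase == NULL ||
            fread(entries[i].phrase, 1, len, in) != len)
            goto fail;
        entries[i].phrase[len] = '\0';
        if (fread(&entries[i].doc_id, sizeof entries[i].doc_id, 1, in) != 1)
            goto fail;
    }
    *count_out = count;
    return entries;
fail:
    for (uint32_t j = 0; j < count; j++)
        free(entries[j].phrase);
    free(entries);
    return NULL;
}
```

The phrase limit only matters when phrases are first generated from a document; once they are keys in the map, dumping and loading treat them as opaque strings.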
Closing this issue due to inactivity. Hopefully this answered your question.