Comments (2)
Yes. My assumption was that media file names are in the language defined in the MyPhotoShare config file. If that's not the case, results are undefined, since a stopword in one language can be a meaningful word in another.
I've done it in the scanner because it does not depend on user input. Stopwords, by definition, are non-meaningful words, so there's no benefit in creating indexes for them. If we filter them out at the scanner phase, we reduce the size on disk too.
Disabling them is more of a workaround to support multilingual content. Let's find a real multilingual solution, one that works without requiring tweaks from the user browsing the gallery. If my old mother is searching for a picture, she won't understand that she has to disable an option in a menu to find it. So the decision of whether a media is findable or not has to be made by the content owner, not the end user. And it must apply to all users, keeping MyPhotoShare simple to use.
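To illustrate the point about filtering at scan time, here is a minimal sketch and not the actual scanner code. The `build_index` helper and the tiny `STOPWORDS_EN` set are made up for the example; it only shows that words dropped during the scan never get an index entry at all.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption); the real lists come from a JSON file.
STOPWORDS_EN = {"on", "the", "a", "of"}

def build_index(file_names, stopwords):
    """Map each meaningful word to the set of files whose name contains it."""
    index = defaultdict(set)
    for name in file_names:
        stem = name.rsplit(".", 1)[0]  # drop the extension
        for word in re.findall(r"\w+", stem.lower()):
            if word not in stopwords:  # stopwords never reach the index
                index[word].add(name)
    return dict(index)

index = build_index(["On the beach.jpg", "Sunset on the lake.jpg"], STOPWORDS_EN)
print(sorted(index))  # ['beach', 'lake', 'sunset']
```

Without the filter, the same two file names would also produce entries for "on" and "the", each pointing at both files.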
One possible way is to specify the language of an album or media at the album level. If we define a language value in the album.ini file, one can switch the language used by the scanner. For instance:
[DEFAULT]
# If not specified, all media in this album is in Italian.
language=it
[album]
# Exception for this album name that uses French words
language=fr
[On the beach.jpg]
# Another exception. This picture is in English.
# As "on" and "the" are stopwords in English, only "beach" will be indexed.
language=en
[Je bois le thé.jpg]
language=multi
description=I'm drinking tea with my wife in a café.
If, in that example, MyPhotoShare were configured for Spanish, this particular album could still have content in Italian, French and English, and a correct index would be built for the media in each of these languages. The JavaScript application does not have to know which language is used: it only checks whether an index exists for the word entered by the user. If it does, whether the word is a stopword or not doesn't matter, and the results can be displayed.
It still works if the user searches with multiple words, even stopwords in various languages, as long as they are correctly indexed...
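The per-section lookup described above could be resolved with Python's standard `configparser`, roughly as follows. This is only a sketch of the idea: the `language_for` helper, the `site_language` parameter and the `Tramonto.jpg` entry are made up for the example, and the real scanner may read album.ini differently.

```python
import configparser

ALBUM_INI = """
[DEFAULT]
language=it
[album]
language=fr
[On the beach.jpg]
language=en
[Je bois le thé.jpg]
language=multi
[Tramonto.jpg]
"""

def language_for(parser, section, site_language):
    """Return the language for a section, falling back to [DEFAULT], then the site language."""
    if parser.has_section(section):
        # configparser already merges [DEFAULT] values into every section.
        return parser.get(section, "language", fallback=site_language)
    # Media not listed in album.ini: use [DEFAULT] if it sets a language.
    return parser.defaults().get("language", site_language)

# Interpolation is disabled so that a literal % in a value is not treated specially.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(ALBUM_INI)

for name in ("album", "On the beach.jpg", "Je bois le thé.jpg", "Tramonto.jpg"):
    print(name, "->", language_for(parser, name, "es"))
```

With the site configured for Spanish ("es"), the album name resolves to fr, the two explicit media to en and multi, and the media with no override to it.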
A problem occurs when a single media item uses multiple languages, like the last one in the example, Je bois le thé.jpg. The file name and the description use two different languages. In that case, thé in French (tea in English) is treated as the English stopword the. If we still want to make this photo findable, the solution is to disable stopwords for that media. That's what language=multi does: it means that the metadata uses multiple languages and that stopwords must be disabled.
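The collision between thé and the can be reproduced with a short sketch. It assumes that words are stripped of accents before being compared with the stopword list, which is what the collision implies; the `strip_accents` and `indexable_words` helpers and the small `STOPWORDS` table are made up for the example.

```python
import re
import unicodedata

# Illustrative lists only (assumption); the real ones are much longer.
STOPWORDS = {
    "en": {"on", "the", "a", "in", "with", "my"},
    "fr": {"je", "le", "la", "les"},
}

def strip_accents(text):
    """Remove combining marks, so that 'thé' becomes 'the'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def indexable_words(text, language):
    """Return the words to index; language 'multi' disables stopword filtering."""
    words = re.findall(r"\w+", strip_accents(text).lower())
    if language == "multi":
        return words
    stop = STOPWORDS.get(language, set())
    return [w for w in words if w not in stop]

print(indexable_words("Je bois le thé", "en"))     # ['je', 'bois', 'le']
print(indexable_words("Je bois le thé", "fr"))     # ['bois', 'the']
print(indexable_words("Je bois le thé", "multi"))  # ['je', 'bois', 'le', 'the']
```

Scanned as English, the photo loses the one word a visitor would most likely search for; with multi, every word is kept at the cost of a slightly larger index.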
The album.ini file must be seen as a per-album config file that can change the scanner behaviour. It was my intent to add a noindex directive, to prevent a photo or the whole album from being indexed by the scanner and copied to cache, but I haven't had the time to work on it. One could also use it to specify parameters for OpenCV or for thumbnail creation, like thumbnail=crop or thumbnail=center...
The code of the scanner has to be adapted, as it currently caches only the stopwords for the default language. It should now cache multiple languages and switch between them based on context. The stopwords JSON file has data for 50+ languages.
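One way to cache several languages and switch between them is a small lazy cache keyed by language code. This is a sketch under assumptions, not the scanner's current code: the `StopwordCache` class is hypothetical, and the JSON file is assumed to be an object mapping each language code to a list of words.

```python
import json
import tempfile
from pathlib import Path

class StopwordCache:
    """Lazily load per-language stopword sets from one JSON file."""

    def __init__(self, json_path):
        self._json_path = Path(json_path)
        self._raw = None   # parsed JSON, read from disk on first use
        self._sets = {}    # language code -> frozenset of stopwords

    def get(self, language):
        if language == "multi":  # stopwords disabled for mixed-language media
            return frozenset()
        if language not in self._sets:
            if self._raw is None:
                with self._json_path.open(encoding="utf-8") as f:
                    self._raw = json.load(f)
            # Unknown languages get an empty set instead of raising.
            self._sets[language] = frozenset(self._raw.get(language, []))
        return self._sets[language]

# Demo with a throwaway file standing in for the real stopwords JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stopwords.json"
    path.write_text(json.dumps({"en": ["on", "the"], "fr": ["le", "je"]}), encoding="utf-8")
    cache = StopwordCache(path)
    print("the" in cache.get("en"))  # True
    print("the" in cache.get("fr"))  # False
    print(len(cache.get("multi")))   # 0
```

The scanner would then call `cache.get(language)` with whatever language applies to the album or media being processed, and each language's set is built at most once per run.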
Does it make sense?
from myphotoshare.
Your analysis makes sense; the language=(xx|multi) lines in album.ini seem a good solution for multilingual albums.
I'd only think about adding an option to disable the stopword check; maybe it's a good solution for small album trees.
The noindex behaviour is already there: see the exclude_(files|tree)_marker options. It seems an easy task to modify the checks on those files so that they take into account noindex directives in album.ini files.