
Comments (2)

pmetras avatar pmetras commented on July 17, 2024

Yes. My hypothesis was that media content file names were in the language defined in the MyPhotoShare config file. If that's not the case, results are undefined, because a stopword in one language can be a meaningful word in another...

I've done it in the scanner because it does not depend on user input. Stopwords, by definition, are non-meaningful words, so there is no benefit in creating indexes on them. If we filter them out at the scanner phase, we also reduce the size on disk.

Disabling them is more of a workaround to support multilingual content. Let's find a real multilingual solution that works without requiring tweaks from the user browsing the gallery. If my old mother is searching for a picture, she won't understand that she has to disable an option in a menu to find it. So the decision of whether a media is findable or not has to be made by the content owner, not the end user. And it must be available for all users, keeping MyPhotoShare simple to use.

One possible way is to specify the language of the album or media at the album level. If we define a language value in the album.ini file, we can switch the language used by the scanner. For instance:

[DEFAULT]
# If not specified, all media in this album is in Italian.
language=it

[album]
# Exception for this album name that uses French words
language=fr

[On the beach.jpg]
# Another exception. This picture is in English.
# As "on" and "the" are stopwords in English, only "beach" will be indexed.
language=en

[Je bois le thé.jpg]
language=multi
description=I'm drinking tea with my wife in a coffee.

If, in that example, MyPhotoShare were configured for Spanish, this special album could still have content in Italian, French and English, and a correct index would be built for the media in each of these languages. The JavaScript application does not have to understand which language is used: it only checks whether an index exists for the word entered by the user. If it does, whether the word is a stopword or not is irrelevant and results can be displayed.
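As a rough illustration of how the scanner could resolve the language for each entry, here is a minimal Python sketch using the standard configparser module. The function name language_for and the site_language parameter are assumptions made for this example, not existing MyPhotoShare code; only the album.ini layout comes from the example above.

```python
import configparser

# Trimmed copy of the album.ini example above.
ALBUM_INI = """\
[DEFAULT]
language=it

[album]
language=fr

[On the beach.jpg]
language=en

[Je bois le thé.jpg]
language=multi
"""


def language_for(parser, name, site_language):
    """Return the language to use for an album.ini entry (hypothetical helper).

    Lookup order: the entry's own section, then [DEFAULT], then the
    site-wide language from the main config file.
    """
    if parser.has_section(name):
        # configparser already falls back to [DEFAULT] for missing options.
        return parser.get(name, "language", fallback=site_language)
    return parser.defaults().get("language", site_language)


# interpolation=None so that a '%' in a description is taken literally.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(ALBUM_INI)

for entry in ("album", "On the beach.jpg", "Je bois le thé.jpg", "sunset.jpg"):
    print(entry, "->", language_for(parser, entry, "es"))
```

Here sunset.jpg has no section of its own, so it picks up the album default (it) rather than the site language (es).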

It still works if the user searches with multiple words, even stopwords in various languages, as long as they are correctly indexed...

A problem occurs when a single media uses multiple languages, like the last one, Je bois le thé.jpg, in the example. The file name and the description use two different languages. In that case, thé in French (tea in English) is treated as the English stopword the. If we still want to make this photo findable, the solution is to disable stopwords for that media. That's what language=multi does: it means that the metadata uses multiple languages and that stopwords must be disabled.
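The thé/the collision can be sketched in a few lines of Python. This assumes the scanner lowercases words and strips accents before comparing them to the stopword list, which is what the behaviour described above implies; the tiny STOPWORDS table and the function names are made up for the illustration.

```python
import re
import unicodedata

# Tiny illustrative lists; the real scanner reads them from its stopwords JSON file.
STOPWORDS = {
    "en": {"on", "the", "a", "in", "with", "my"},
    "fr": {"je", "le", "la", "les"},
}


def normalize(word):
    """Lowercase and strip accents, so the French 'thé' becomes 'the'."""
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def index_words(text, language):
    """Return the words that would be indexed for text in the given language."""
    words = [normalize(w) for w in re.findall(r"\w+", text)]
    if language == "multi":
        # Mixed-language metadata: keep every word, no stopword filtering.
        return words
    stop = STOPWORDS.get(language, set())
    return [w for w in words if w not in stop]
```

With these lists, index_words("On the beach", "en") keeps only beach. For "Je bois le thé", language "en" drops the normalized the, so the tea is lost, while "multi" keeps all four words.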

The album.ini file should be seen as a per-album config file that can change the scanner's behaviour. I intended to add a noindex directive, to prevent a photo or a whole album from being indexed by the scanner and copied to the cache, but I haven't had time to work on it. It could also be used to specify parameters for OpenCV or for thumbnail creation, like thumbnail=crop or thumbnail=center...

The scanner code has to be adapted, as it currently caches only the stopwords for the default language. It should now cache stopwords for multiple languages and switch between them based on context. The stopwords JSON file has data for 50+ languages.
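A possible shape for such a cache, as a sketch only: it assumes the stopwords JSON maps a language code to a list of words, which may not match the actual file layout, and the StopwordCache class is hypothetical.

```python
import json

# Stand-in for the stopwords JSON file (assumed layout: code -> list of words).
STOPWORDS_JSON = '{"en": ["on", "the"], "fr": ["je", "le"], "it": ["il", "la"]}'


class StopwordCache:
    """Builds one stopword set per language, on first use (hypothetical class)."""

    def __init__(self, json_text):
        self._raw = json.loads(json_text)
        self._sets = {}

    def get(self, language):
        if language == "multi":
            # Multilingual metadata: stopword filtering is disabled.
            return frozenset()
        if language not in self._sets:
            # Unknown languages get an empty set instead of raising.
            self._sets[language] = frozenset(self._raw.get(language, []))
        return self._sets[language]


cache = StopwordCache(STOPWORDS_JSON)
```

The scanner would then call cache.get(language) with the language resolved for the current album or media, instead of holding a single global set.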

Does it make sense?

from myphotoshare.

paolobenve avatar paolobenve commented on July 17, 2024

your analysis makes sense, the language=(xx|multi) lines in album.ini seem a good solution for multilingual albums

I'd only think about adding an option to disable the stopword check; it may be a good solution for small album trees

a noindex directive is already there, see the exclude_(files|tree)_marker options; it seems an easy task to modify the checks on those files so that they take into account noindex directives in album.ini files.
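A sketch of what such a check might look like in Python. The noindex=true syntax and the is_indexed helper are assumptions for illustration; neither exists in album.ini or the scanner as discussed here.

```python
import configparser

# Hypothetical album.ini using a noindex option (syntax is an assumption).
ALBUM_INI = """\
[DEFAULT]
language=it

[private.jpg]
noindex=true
"""


def is_indexed(parser, name):
    """Return False when the entry, or the album default, sets noindex."""
    section = parser[name] if parser.has_section(name) else parser["DEFAULT"]
    return not section.getboolean("noindex", fallback=False)


parser = configparser.ConfigParser(interpolation=None)
parser.read_string(ALBUM_INI)
```

Putting noindex=true under [DEFAULT] would exclude the whole album, since every section inherits from it.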

