
trep / opentrep


Open Travel Request Parser

Home Page: https://trep.github.io/opentrep

License: GNU Lesser General Public License v2.1

Languages: CMake 28.53%, Shell 0.95%, Python 3.68%, HTML 0.71%, C 0.66%, C++ 57.88%, Makefile 0.09%, M4 7.15%, Dockerfile 0.16%, JavaScript 0.18%
Topics: xapian, por, travel-request, xapian-indexing, optd, opentraveldata, por-data, docker, transport, relational-database

opentrep's People

Contributors

da115115, frenzymadness, newusernamepls, tomspur


opentrep's Issues

distance computation

Some distances appear to be computed incorrectly. For example:

  • CGI-STL should be close to 182 km, but is reported as 165 km.
  • MAZ-SJU should be close to 122 km, but is reported as 112 km.
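
One way to cross-check the reported values is an independent great-circle computation. The sketch below uses the haversine formula on a spherical Earth. It is not opentrep's implementation; the function name and the radius constant are assumptions made for illustration. The spherical approximation differs from an ellipsoidal model by well under 1%, so on its own it should not account for gaps of the size quoted above.

```python
import math

# Mean Earth radius in kilometres (an assumption; not taken from opentrep).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Sanity check: one degree of longitude along the equator.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Feeding the function the coordinates stored in the POR data file for each airport pair would give a reference value to compare against the distance opentrep reports.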

Indexing travel types with PageRank weights

When someone searches for common travel types and/or countries, a full-text matching process first retrieves a capped number of POR (currently 30, for instance). Out of that limited list, the POR with the highest PageRank value is then returned. That works well for cities (e.g., "paris airport"), but not so well for countries and/or travel types (e.g., "france airport", "uk railway", or just "usa" or "airport").

So, when indexing at the country and/or travel-type level, it would be good to weight the index entries with the PageRank value of the POR.
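
The difference between the two approaches can be sketched as follows. This is a toy model, not the Xapian-based code: the `Candidate` type, both function names, the POR codes and the PageRank values are all hypothetical, and multiplying the match percentage by the PageRank is only one possible weighting.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    code: str         # hypothetical POR code
    match_pct: float  # full-text match percentage, 0..100
    page_rank: float  # PageRank value, 0..1


def pick_after_cutoff(candidates, limit):
    """Behaviour described above: cap the full-text matches, then take the top PageRank."""
    shortlist = sorted(candidates, key=lambda c: c.match_pct, reverse=True)[:limit]
    return max(shortlist, key=lambda c: c.page_rank)


def pick_weighted(candidates, limit):
    """Proposal: weight by PageRank before the cap is applied."""
    ranked = sorted(candidates, key=lambda c: c.match_pct * c.page_rank, reverse=True)
    return ranked[:limit][0]


# Three POR that all match a generic query such as "airport" at 100%.
candidates = [
    Candidate("AAA", 100.0, 0.01),
    Candidate("BBB", 100.0, 0.02),
    Candidate("CCC", 100.0, 0.90),
]

capped_choice = pick_after_cutoff(candidates, limit=2)
weighted_choice = pick_weighted(candidates, limit=2)
```

With a cap of 2, the first function never sees `CCC` because all three candidates tie on the match percentage, whereas the weighted variant returns it.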

Compatibility issue with Python 3.10

Hello.

I'm trying to follow your docs and build opentrep on Fedora rawhide (future 35), where Python 3.10 is the main Python. It seems to me that CMake has some trouble detecting the Python version:

-- The C compiler identification is GNU 11.1.1
-- The CXX compiler identification is GNU 11.1.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Requires Git without specifying any version
-- Current Git revision name: 15ef88fa8d7ec359134d64c2b1857d6264695d58 master
-- Requires Python with version 3.6; however just Python3 is considered here
CMake Error at config/FindPackageHandleStandardArgs.cmake:164 (message):
  Could NOT find Python3 (missing: Python3_LIBRARIES Python3_INCLUDE_DIRS
  Development)
Call Stack (most recent call first):
  config/FindPackageHandleStandardArgs.cmake:445 (_FPHSA_FAILURE_MESSAGE)
  config/FindPython/Support.cmake:2400 (find_package_handle_standard_args)
  config/FindPython3.cmake:311 (include)
  config/project_config_embeddable.cmake:501 (find_package)
  config/project_config_embeddable.cmake:342 (get_python)
  CMakeLists.txt:69 (get_external_libs)

Support for multiple UN/LOCODE codes

Support for UN/LOCODE codes has recently been added (see 7b0e503 for more details).

The Location structure currently supports a single UN/LOCODE.
However, a few POR (points of reference) have several UN/LOCODE codes. Atlantic City is one such case (referred to as both USACX and USAIY). Currently, only USAIY produces the correct result on the Search Travel web site, whereas USACX should too.

From a C++ source code perspective, it means handling UNLOCode_T structures the same way as CityDetails ones. Indeed, a given travel-related location may serve several distinct cities; in the same way, a given location may have several UN/LOCODE codes.
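
As a language-neutral illustration of the idea, here is a minimal Python sketch in which a location carries a list of UN/LOCODE codes and every code resolves to the same record. The `Location` dataclass and `build_unlocode_index` function are hypothetical stand-ins, not the actual C++ structures.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """Hypothetical stand-in for the C++ Location structure."""
    name: str
    unlocodes: list = field(default_factory=list)  # zero, one or several codes


def build_unlocode_index(locations):
    """Map every UN/LOCODE code to the location that owns it."""
    index = {}
    for location in locations:
        for code in location.unlocodes:
            index[code.upper()] = location
    return index


atlantic_city = Location("Atlantic City", ["USACX", "USAIY"])
index = build_unlocode_index([atlantic_city])
```

Both `index["USACX"]` and `index["USAIY"]` then resolve to the same Atlantic City record.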

absl and protobuf errors

Hi Denis,

I'm getting a ton of absl and protobuf errors. Before blaming anyone other than myself, I would like to ask: which versions does trep need?

Modularisation of the search algorithm

The search algorithm is currently a single block. It should become modularised, with the full-text search being just one of the features.

The search algorithm could be as follows:

  • Perform the (Xapian-based) full-text search, as currently done. That full-text search returns:
    • a list of main/idempotent matches, i.e., matches with the same match percentage, usually 100%, for instance when the search is done for a given country such as 'fr';
    • potentially a list of alternative matches, i.e., matches with lower match percentages.
    • All the matches are returned with their full details:
      • Type (airport, city, hotel, train station, etc.).
      • Geographical coordinates.
      • Administrative hierarchy (city, country, continent).
  • For every main and alternative match:
  • Choose a match
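
The first step, separating main matches from alternative ones, might look like the sketch below. It only covers that split; the function name is hypothetical and the match percentages are made up for illustration.

```python
def split_matches(matches):
    """Split (code, percentage) pairs into main matches and alternatives.

    Main matches share the highest percentage; alternatives score lower.
    """
    if not matches:
        return [], []
    top = max(pct for _, pct in matches)
    main = [m for m in matches if m[1] == top]
    alternatives = [m for m in matches if m[1] < top]
    return main, alternatives


# Illustrative result of a full-text search on a country such as 'fr'.
matches = [("CDG", 100.0), ("ORY", 100.0), ("LYS", 87.5)]
main, alternatives = split_matches(matches)
```

Keeping this split in its own function would let later steps (for instance, choosing among the main matches) be developed and tested independently of the full-text search.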

Add alternatives

Propose close-matching alternatives.
For instance, Asuncion should return ASU.

Report progress status when search indexing the POR data file

Currently, the opentrep-dbmgr utility reports progress updates. For instance:

$ opentrep-dbmgr -t mysql
[...]
opentrep> fill_from_por_file 
Indexing the POR file and filling in the SQL database may take a few minutes on some architectures (and a few seconds on fastest ones)...
Number of records inserted into the DB: 1000
[...]
Number of records inserted into the DB: 20000
20039 entries have been processed
opentrep> quit

That reporting is actually implemented in the DBManager command.
For the indexing process, the reporting should most probably be implemented in the PORParserHelper class.
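
A possible shape for that reporting is sketched below in Python, for brevity. The interface of PORParserHelper is not shown here, so the function, its parameters and the message wording are assumptions modelled on the opentrep-dbmgr output above.

```python
def index_with_progress(records, index_record, report, every=1000):
    """Index each record and emit a progress message every `every` records."""
    count = 0
    for record in records:
        index_record(record)  # hypothetical per-record indexing step
        count += 1
        if count % every == 0:
            report(f"Number of records indexed: {count}")
    report(f"{count} entries have been processed")
    return count


# Usage with a dummy indexer, collecting the messages in a list.
messages = []
total = index_with_progress(range(2500), lambda record: None, messages.append)
```

Passing the reporting function as a parameter keeps the indexing loop independent of where the messages go (console, log file or a test list).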

Airport/city codes should take the precedence over alternate names

When searching for KHI, the city of Jakarta, Indonesia (ID), is returned, whereas KHI is the code of Karachi, Pakistan (PK).

The cause is that:

  1. The Hakka Chinese translation of Jakarta is "Ngâ-kâ-tha̍t Sú-tû Thi̍t-khî", including the "khî" keyword, which is therefore part of the indexing keywords for Jakarta city.
  2. The respective PageRank values of KHI and JKT/CGK are 8% and 36.5%.
    Hence, KHI will match almost exactly (99.99%) with both Karachi and Jakarta. With the PageRank values, Jakarta comes out with an overall matching weight of ~36% (compared to the ~8% of Karachi).
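
The arithmetic above, and one way to give codes precedence, can be sketched as follows. The sketch assumes the overall weight is the product of the match percentage and the PageRank value, which is consistent with the figures quoted but is not confirmed here; both function names are hypothetical.

```python
def overall_weight(match_pct, page_rank_pct):
    """Assumed overall weight: match percentage scaled by the PageRank percentage."""
    return match_pct * page_rank_pct / 100.0


karachi_weight = overall_weight(99.99, 8.0)   # about 8%
jakarta_weight = overall_weight(99.99, 36.5)  # about 36%


def rank_key(query, por_code, match_pct, page_rank_pct):
    """Sort key putting exact code matches ahead of any name-based match."""
    is_code_match = query.upper() == por_code.upper()
    return (is_code_match, overall_weight(match_pct, page_rank_pct))


karachi_key = rank_key("KHI", "KHI", 99.99, 8.0)
jakarta_key = rank_key("KHI", "JKT", 99.99, 36.5)
```

Because tuples compare element by element, `karachi_key` sorts above `jakarta_key` even though Jakarta has the higher overall weight.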
