geoparser's Issues

Scroll or collapse files in menu

Each file in the menu shows its results underneath; if a result is long, it runs off screen.
Each file/result should be collapsible and should also scroll up and down.

Need to modify solr schema

As we are indexing millions of records, I am seeing a lot of issues with Solr.

We initially did a lot of hand-holding, editing data types and using encoding, but it is now constantly failing with OutOfMemoryError. I have tried increasing memory up to 1.5 GB, but that only buys us some extra time.

At the moment I am trying to create a new schema that addresses the limitations below:

  • Instead of one single huge document, we spread points across multiple documents
  • We must be able to produce search results for a location. The query will be a LOCATION

I am planning the following:

  • one core in the GeoParser Solr instance for every index
  • one document in the GeoParser Solr instance for each document in the index
  • one domain can cover multiple cores. The core naming syntax will be domain_[id]

Right now I am doing this only for indexed data, on a new branch, since the uploaded-files flow works with no issues.
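
To make the domain_[id] naming concrete, here is a minimal sketch. The helper names, the example domain, and the choice to number cores from 0 are assumptions for illustration; the plan above does not fix them.

```python
from typing import List


def core_name(domain: str, core_id: int) -> str:
    """Build a core name following the domain_[id] convention."""
    return "{}_{}".format(domain, core_id)


def cores_for_domain(domain: str, index_count: int) -> List[str]:
    """One core per index; numbering from 0 is an assumption."""
    return [core_name(domain, i) for i in range(index_count)]


if __name__ == "__main__":
    # "example" is a hypothetical domain name.
    print(cores_for_domain("example", 3))
```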

@MBoustani

Support Crawled indexed data

Crawled data is usually indexed into either Solr or Elasticsearch.
GeoParser should be able to take the URL of either of these indexing machines plus a domain name, scan the whole index, and geoparse it.
The results (location name and point) will be stored in Solr internally, alongside the path to the crawled data.

Server to use CherryPy instead of Flask

We have decided to use CherryPy for the server instead of Flask.
CherryPy is more stable, and since we are going to join with other Memex geo projects, we should use the same technologies they use.

Geoparsing for uploaded file

The server should start the geoparsing process as soon as files are uploaded and send the status back to the front-end.

Be able to remove uploaded files on front-end

After each file is uploaded, its name appears under the "upload file" section.
@smadha Can you please put a remove icon next to each file, or check with @lawongsta about how to remove a file and maybe send a request to the server to remove it?
@smadha Should we use a REST URL to send the remove command to the server for each file?

Progress bar for uploaded file

The progress bar should show the status of the file being uploaded as well as the file being geoparsed, with status text underneath it.

Show list of uploaded files on front-end

Uploaded files appear as soon as they are uploaded, but if the page is refreshed they are no longer shown.

Solution:
Have the server look in the uploaded folder and send the list of files to the front-end.
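
A minimal sketch of that lookup, using only the standard library. The function names and the {"files": [...]} response shape are assumptions; the issue does not specify either.

```python
import json
import os


def list_uploaded_files(upload_dir):
    """Return the sorted names of regular files directly inside upload_dir."""
    if not os.path.isdir(upload_dir):
        return []
    return sorted(
        name
        for name in os.listdir(upload_dir)
        if os.path.isfile(os.path.join(upload_dir, name))
    )


def uploaded_files_json(upload_dir):
    """Serialize the listing as JSON (response shape is hypothetical)."""
    return json.dumps({"files": list_uploaded_files(upload_dir)})
```

The front-end could request this listing on page load and rebuild the "upload file" section from it.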

Use Solr instance to store geoparsed data

Use Solr for storing and retrieving data.
One collection called "Uploaded_Files" will store all geoparsed uploaded files.
The schema could be something close to:
file_name = <file_name>
extracted_text =
location_names = [list of location names]
lat/lon = [list of location dictionaries (key: location name, value: lat/lon)]

**This schema may change as we learn more about Solr spatial queries
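
As a rough illustration of what one document under this draft schema might look like before it is sent to Solr, here is a sketch. The "points" key stands in for the "lat/lon" field, and the function name is hypothetical; how the list of dictionaries would actually be stored in Solr is left open, as noted above.

```python
def build_upload_doc(file_name, extracted_text, locations):
    """Assemble one document for the "Uploaded_Files" collection.

    locations maps a location name to a (lat, lon) tuple.
    """
    return {
        "file_name": file_name,
        "extracted_text": extracted_text,
        "location_names": list(locations),
        # "points" is a placeholder name for the draft "lat/lon" field.
        "points": [{name: [lat, lon]} for name, (lat, lon) in locations.items()],
    }


if __name__ == "__main__":
    doc = build_upload_doc(
        "5c0024-25.pdf",
        "sample text",
        {"Mexico City, Mexico": (19.4302678, -99.1373136)},
    )
    print(doc)
```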

Have Girder login when the page load

When the GeoParser app loads, log in to Girder so that the app can use it.
Username: girder
Password: girder

Base 64 encode: Z2lyZGVyOmdpcmRlcg==
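
The encoded string above is the Base64 form of "girder:girder". A minimal sketch of how it can be derived, assuming it is meant for an HTTP Basic Authorization header (the helper names are made up for this example):

```python
import base64


def basic_auth_token(username, password):
    """Base64-encode "username:password" as used by HTTP Basic auth."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def basic_auth_header(username, password):
    """Build the header dict to attach to the login request."""
    return {"Authorization": "Basic " + basic_auth_token(username, password)}


if __name__ == "__main__":
    print(basic_auth_token("girder", "girder"))  # Z2lyZGVyOmdpcmRlcg==
```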

Show more meta data on popups and link back to solr doc

Popups need to show more data so that an individual bubble can be analyzed:

  • "title" : "Title of doc fetched as configured",
  • "descr" : "Short description fetched as per configuration"
  • "url" : "Link to html/image"
  • "solr_url" : "Link to solr_doc"

GeoParser plugin for Girder

The GeoParser plugin for Girder can have multiple jobs running through Girder.
Each job can be called via a REST URL and will return its results as JSON.

Mock REST services

The APIs below need to be mocked:

[API signature], [Method],
[Sample Response]

  1. /upload POST
    Response code - 200
  2. /status/%file_id% GET
    {
      "name": "File Name",
      "status": "Message to be displayed to user",
      "stepCount": 4,
      "parsedInfo": [
        {
          "lat": -34.6037232,
          "lon": -58.3815931,
          "name": "Aires Argentina",
          "refCount": 10,
          "refContext": "Test line 4 in file uploaded",
          "refUrl": "https://geo1.ggpht.com/cbk?panoid=wkEz-Hwmc44EnMsE7SuXBw&output=thumbnail"
        },
        {
          "lat": 19.4302678,
          "lon": -99.1373136,
          "name": "Mexico City, Mexico",
          "refCount": 2,
          "refContext": "Test line 42 in file uploaded",
          "refUrl": "https://geo0.ggpht.com/cbk?output=thumbnail&thumb=2&panoid=3DKyddof6dWPw3tx5BULbQ&w=96&h=64&yaw=176"
        }
      ]
    }
  3. /search/index/%keyword% GET
    {
      "name": "File Name",
      "status": "Message to be displayed to user",
      "stepCount": 4,
      "parsedInfo": [
        {
          "lat": -34.6037232,
          "lon": -58.3815931,
          "name": "Aires Argentina",
          "refCount": 10,
          "refContext": "Test line 4 in file uploaded",
          "refUrl": "https://geo1.ggpht.com/cbk?panoid=wkEz-Hwmc44EnMsE7SuXBw&output=thumbnail"
        },
        {
          "lat": 19.4302678,
          "lon": -99.1373136,
          "name": "Mexico City, Mexico",
          "refCount": 2,
          "refContext": "Test line 42 in file uploaded",
          "refUrl": "https://geo0.ggpht.com/cbk?output=thumbnail&thumb=2&panoid=3DKyddof6dWPw3tx5BULbQ&w=96&h=64&yaw=176"
        }
      ]
    }
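
One possible way to mock these three endpoints without a running server is a small dispatcher that maps (method, path) to a status code and JSON body. This is only a sketch: the function names are hypothetical, the 404 fallback is an assumption, and the refUrl fields are left out of the sample payload for brevity.

```python
import json


def _sample_payload():
    """Sample body from the spec above (refUrl omitted for brevity)."""
    return {
        "name": "File Name",
        "status": "Message to be displayed to user",
        "stepCount": 4,
        "parsedInfo": [
            {"lat": -34.6037232, "lon": -58.3815931, "name": "Aires Argentina",
             "refCount": 10, "refContext": "Test line 4 in file uploaded"},
            {"lat": 19.4302678, "lon": -99.1373136, "name": "Mexico City, Mexico",
             "refCount": 2, "refContext": "Test line 42 in file uploaded"},
        ],
    }


def mock_response(method, path):
    """Return (status_code, body) for the mocked endpoints."""
    parts = [p for p in path.split("/") if p]
    if method == "POST" and parts == ["upload"]:
        return 200, ""
    if method == "GET" and len(parts) == 2 and parts[0] == "status":
        return 200, json.dumps(_sample_payload())
    if method == "GET" and len(parts) == 3 and parts[:2] == ["search", "index"]:
        return 200, json.dumps(_sample_payload())
    return 404, ""  # assumption: unknown routes answer 404


if __name__ == "__main__":
    print(mock_response("GET", "/status/some-file-id"))
```

The same function could later be wrapped in a CherryPy or Girder handler so the front-end can be developed against fixed responses.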

https://drive.google.com/open?id=1ASR0j0lzT8GqifZ0ep6WMBV9SaAOENPHUIqUrrR7dbo

updated return_points url

@smadha I have updated the "return_points" URL to work with both uploaded files and crawled data.
You can now call return_points to get points from both.
However, you need to update your existing code that calls return_points. Here is the change:

For uploaded files: http://localhost:8000/return_points/<file_name>/uploaded_files
Example: http://localhost:8000/return_points/5c0024-25.pdf/uploaded_files

For crawled data: http://localhost:8000/return_points/<solr_url>/<core_name>
Example: http://localhost:8000/return_points/http://crawl.dyndns.org/solr/domain

Please update both.
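
For the caller side, a small helper along these lines could build both URL forms. This is a sketch under assumptions: the function name is made up, and no URL encoding is applied because the crawled-data example above embeds the Solr URL as-is.

```python
def return_points_url(base_url, target, core_name):
    """Build a return_points URL.

    target is a file name for uploaded files, or a Solr URL for crawled data.
    core_name is "uploaded_files" for uploads, or the Solr core otherwise.
    """
    return "{}/return_points/{}/{}".format(
        base_url.rstrip("/"), target.rstrip("/"), core_name
    )


if __name__ == "__main__":
    base = "http://localhost:8000"
    print(return_points_url(base, "5c0024-25.pdf", "uploaded_files"))
    print(return_points_url(base, "http://crawl.dyndns.org/solr", "domain"))
```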

Merging CherryPy webserver with Girder

At this stage GeoParser uses two servers as its backend:
1- CherryPy as the web server
2- Girder for the file system and running jobs

Besides the fact that running two servers at the same time could increase the chance of application failure, there is a problem when CherryPy on one port calls Girder on a different port (a cross-domain issue). These two can therefore be merged into one, with Girder taking care of everything.
