oudalab / phyllo
PHilologicallY Linguistic LegwOrk.
License: Apache License 2.0
The results that app.py returns when we run a query are of type defaultdict(set).
The snippets for the query are stored in the sets, i.e. the values of the result dict.
When I pass the result to jQuery, it receives everything except the snippets. I do not understand how to fix this.
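One likely cause, which is an assumption and not confirmed in this thread: Python sets (and tuple dict keys) are not JSON-serializable, so the set values cannot be encoded as-is when the dict is sent to jQuery. A minimal sketch that converts each set to a sorted list before serializing is below. The helper name results_to_json and the (title, book) key layout are hypothetical; the real keys in app.py may differ.

```python
import json
from collections import defaultdict


def results_to_json(results):
    """Serialize a defaultdict(set) of query results to a JSON string.

    Hypothetical helper: json.dumps rejects set values and tuple keys,
    so each entry becomes a record with a list key and a sorted list
    of snippets.
    """
    payload = [
        {
            "key": list(key) if isinstance(key, tuple) else key,
            "snippets": sorted(snippets),  # set -> list so JSON can encode it
        }
        for key, snippets in results.items()
    ]
    return json.dumps(payload)


results = defaultdict(set)
# Hypothetical key: (title, book)
results[("De Bello Gallico", "1")].add("Gallia est omnis divisa")
print(results_to_json(results))
```

In a Flask view the returned string can be sent with mimetype="application/json", or the same list of records can be passed to flask.jsonify. On the jQuery side, each record's snippets field then arrives as an ordinary array.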
The screenshot below shows the query search on the web application
The screenshot below shows that the result returned after running the query does contain the snippets:
Hello @ramcharran,
Can you let me know when you have uploaded the recorded GIF of the installation and of a search showing all the advanced search operations?
A screen capture video should be good enough. Then, we will forward that over to the DLL/Dr. Huskey.
I have searched using the already created fts database:
I have done the same with Whoosh, downloading the content by following the template Dr. Grant (@cegme) gave me.
But how do I combine the two?
We need to turn phyllo into a module so that it can be installed easily in a Docker container. We need to correct the imports so that they refer to the full package.
We also need to wrap some of the automatically executed code in functions. Code that is not inside functions causes errors on import because of the order in which the data is downloaded.
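A minimal sketch of that pattern: all work lives in functions, and only the if __name__ == "__main__": guard runs it, so importing the module has no side effects. The function names download_data, build_index and main are hypothetical placeholders, not code from the repo.

```python
"""Hypothetical phyllo module layout: nothing runs at import time."""
from typing import List


def download_data(dest: str) -> List[str]:
    # Placeholder (hypothetical): the real scraper would fetch texts into dest
    return []


def build_index(paths: List[str]) -> int:
    # Placeholder (hypothetical): the real code would index each file here
    return len(paths)


def main() -> None:
    # Runs only when the file is executed directly, never on import
    paths = download_data("data/")
    count = build_index(paths)
    print(f"indexed {count} files")


if __name__ == "__main__":
    main()
```

Because the download step is an explicit call, the Dockerfile (or another module) can decide when it happens instead of relying on import order.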
app.py handles all the advanced searches except single-word searches such as SELECT title, book, author, link, snippet(text_idx) FROM text_idx WHERE text_idx MATCH 'possumus';
and OR searches such as SELECT title, book, author, link, snippet(text_idx) FROM text_idx WHERE text_idx MATCH 'quam OR Galliae';
When queries like these are run, the application exits at line 90 (commit 62e7b4e) with the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 105: invalid continuation byte
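A possible workaround, not a confirmed fix: byte 0xe2 is the lead byte of a three-byte UTF-8 sequence (curly quotes and dashes start with it), so a snippet cut in the middle of such a character would fail to decode. That cause is an assumption. Python's sqlite3 module lets a connection decode TEXT values through a custom text_factory, which can replace invalid bytes with U+FFFD instead of raising. The helper name open_connection is hypothetical.

```python
import sqlite3


def open_connection(path):
    """Open a SQLite connection that does not raise on malformed UTF-8 text."""
    conn = sqlite3.connect(path)
    # sqlite3 passes the raw bytes of every TEXT value to text_factory;
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    conn.text_factory = lambda raw: raw.decode("utf-8", errors="replace")
    return conn


conn = open_connection(":memory:")
# X'61E228' is "a", a lone 0xe2 lead byte, then "("
print(conn.execute("SELECT CAST(X'61E228' AS TEXT)").fetchone()[0])
```

This only masks bad bytes; it is still worth checking whether the stored text, or the offsets the tokenizer reports to snippet(), split multi-byte characters.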
When I tried to execute app.py, I got the following output:
Connected to pydev debugger (build 171.4249.47)
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
* Restarting with stat
connection to cursor
registering tokenizer
virtual table created
Process finished with exit code 245
It does not throw any error, neither a segmentation fault nor anything else;
it simply does not execute the INSERT statement.
Since phyllo runs as a completely Dockerized system, we can launch the image onto the Google Container Engine.
Here is an example of how this is done: Example hello world app deployment and maintenance using.
These docs are also useful for the overall setup and deployment: Pushing and pulling.
We would like to make this repo public when we launch the website. This can happen after the docs are complete.
We will create a docker service to access all the data scraped.
Below is the list of tasks needed to get the service working.
There is a segmentation fault when indexing the database.
The error can be replicated as follows:
docker build -t cegme/phyllo .
docker run -it cegme/phyllo /bin/bash
$ python3 app.py
The commands above build the image, open a shell inside it, and then try to run the app. This runs the tokenize() function inside app.py. In tokenize(), the register_tokenizer call is what causes the segmentation fault.
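To confirm exactly which Python line is executing when the crash happens, the standard-library faulthandler module can print a traceback on a segmentation fault. A minimal sketch is to enable it at the top of app.py; running python3 -X faulthandler app.py inside the container has the same effect without editing the file.

```python
import faulthandler
import sys

# On SIGSEGV (and SIGFPE, SIGABRT, SIGBUS, SIGILL), dump the Python
# traceback of every thread to stderr before the process dies.
faulthandler.enable(file=sys.stderr, all_threads=True)

# ... the rest of app.py, including the call to tokenize(), follows here.
```

If the dumped traceback ends inside register_tokenizer, the crash is in native code called from that line rather than in the surrounding Python.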
My initial guess is that the container is running out of the memory allocated to the image.
@YanLiang1102, can you take a look at this issue?
This code needs to follow PEP 8. We will add it to our coding standard and enforce it with autopep8 or a better alternative.
This issue wasn't there before. It is what was causing Flask to restart.
This is the output when I try to execute it now:
bash-4.3# python3 app.py
connection to cursor
registering tokenizer
virtual table created
Segmentation fault (core dumped)
When I remove the registration statement, everything works fine and the program runs to completion.
Paragraph numbers, sentence numbers, line numbers, and book numbers are all important for referencing text. Can we make sure this information has been properly recorded in the database?
/cc @sjhuskey
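One way to check this is to count, per column, how many rows have no value. A minimal sketch using sqlite3 follows; the table name texts and the columns paragraph_no and line_no are hypothetical, since the thread does not give phyllo's actual schema.

```python
import sqlite3
from typing import Dict, Iterable


def quote_ident(name: str) -> str:
    # Identifiers cannot be bound as parameters, so quote them (doubling any ")
    return '"' + name.replace('"', '""') + '"'


def count_missing(conn: sqlite3.Connection, table: str,
                  columns: Iterable[str]) -> Dict[str, int]:
    """Return, for each column, the number of rows that are NULL or ''."""
    counts = {}
    for col in columns:
        c = quote_ident(col)
        query = (f"SELECT COUNT(*) FROM {quote_ident(table)} "
                 f"WHERE {c} IS NULL OR {c} = ''")
        counts[col] = conn.execute(query).fetchone()[0]
    return counts


# Demo on a hypothetical in-memory table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE texts (title TEXT, book TEXT, "
             "paragraph_no INTEGER, line_no INTEGER)")
conn.executemany("INSERT INTO texts VALUES (?, ?, ?, ?)",
                 [("A", "1", 1, None), ("A", "", 2, 3)])
print(count_missing(conn, "texts", ["book", "paragraph_no", "line_no"]))
```

Any non-zero count points at a scraper that dropped a reference number for some texts.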