writecrow / crow_backend

The canonical resources to build the backend for a corpus/repository management framework for Crow, the Corpus and Repository of Writing

License: GNU General Public License v2.0

PHP 99.83% CSS 0.06% Twig 0.08% JavaScript 0.03%
corpus-linguistics corpus corpus-builder corpus-generator api backend natural-language-processing

crow_backend's Introduction

Corpus/Repository Backend

Drupal 10 site

Overview

This project contains the canonical resources to build the backend for a corpus/repository management framework which serves data over a REST API. This is built on the Drupal CMS, following conventions of Entity API, Search API, and the REST API, and its configuration/implementation should present no surprises for developers familiar with Drupal.

From a fresh installation, the database schema will provide a text entity type, which holds the corpus text data and metadata, and a repository entity type, which references materials related to the texts. These entity types and the metadata they contain can be modified or extended as needed to fit the individual corpus.

The provided configuration also includes search indices for texts and repository materials, and a REST API for performing keyword or metadata searches against the dataset.

This codebase does not make any assumptions about the way the data provided by the API is displayed (in a frontend).

Building the codebase

Developing your own version of this site assumes familiarity with, and local installation of, the Composer (https://getcomposer.org/) package manager. This repository contains only the "kernel" of the customized code and configuration. It uses Composer to build all assets required for the site, including the Drupal codebase and a handful of corpus-related PHP libraries.

Run composer install from the document root. This will build all assets required for the site. That's it!

Installing the site

The following assumes familiarity with local web development for a PHP/MySQL stack. Since Drupal is written in PHP and uses an SQL database, you'll need a web server that can run PHP and a MySQL database.

There are a number of pre-packaged solutions that simplify setup of the above. These include MAMP, Valet, and Lando.

  1. Copy the example local settings file: cp sites/example.settings.local.php sites/default/settings.local.php
  2. Create a MySQL database, then add its connection credentials to the newly created settings.local.php. Example:
$databases['default']['default'] = [
  'database' => 'MYSQL_DATABASE',
  'username' => 'MYSQL_USERNAME',
  'password' => 'MYSQL_PASSWORD',
  'host' => 'localhost',
  'port' => '3306',
  'driver' => 'mysql',
  'prefix' => '',
  'collation' => 'utf8mb4_general_ci',
];
  3. Either navigate to your local site's domain and follow the web-based installation instructions, or, if you prefer to use drush, run the drush site-install command.
  4. That's it! After signing in at /user, you should see the two available entity types at /node/add, the available metadata references at /admin/structure/taxonomy, and the search configuration at /admin/config/search/search-api.

Importing data

Properly prepared text files can be imported via a drag-and-drop interface at /admin/config/media/import

Each text file needs to include the metadata elements in the file, followed by the actual text to be indexed. A model for that file structure is below:

<ID: 11165>
<Country: BGD>
<Assignment: 1>
<Draft: A>
<Semester in School: 2>
<Gender: M>
<Term writing: Fall 2015>
<College: E>
<Program: Engineering First Year>
<TOEFL-total: NA>
<TOEFL-reading: NA>
<TOEFL-listening: NA>
<TOEFL-speaking: NA>
<TOEFL-writing: NA>
Sed ut perspiciatis unde omnis iste natus error sit.

Voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
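
The file layout above can be read with a few lines of code. The following is a minimal Python sketch, not the importer used by this project. It assumes every metadata line has the form <Key: Value> and that the metadata block ends at the first line that does not match that form. The function name parse_corpus_file is hypothetical.

```python
import re

# Matches one metadata line such as "<Country: BGD>".
# Assumption: keys contain no ":" and values contain no ">".
HEADER_LINE = re.compile(r"^<([^:>]+):\s*([^>]*)>\s*$")


def parse_corpus_file(raw):
    """Split raw file contents into (metadata dict, body text)."""
    metadata = {}
    lines = raw.splitlines()
    body_start = len(lines)
    for index, line in enumerate(lines):
        match = HEADER_LINE.match(line)
        if match is None:
            # First non-metadata line: everything from here on is the text.
            body_start = index
            break
        metadata[match.group(1).strip()] = match.group(2).strip()
    body = "\n".join(lines[body_start:]).strip()
    return metadata, body


sample = """<ID: 11165>
<Country: BGD>
<Term writing: Fall 2015>
<TOEFL-total: NA>
Sed ut perspiciatis unde omnis iste natus error sit.

Voluptatem accusantium doloremque laudantium."""

metadata, body = parse_corpus_file(sample)
print(metadata["Country"])
print(body.splitlines()[0])
```

A check like this can be run over a directory of files before importing, to catch headers that are missing or malformed.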

As an alternative to the UI import, a directory of local text files can be imported via the drush corpus-import command. Example usage:

drush corpus-import /Users/me/myfiles/

Performing search requests via the API

All API requests require HTTP Basic authentication. Contact the corpus maintainers for access.

All endpoints are served over HTTPS at writecrow.corporaproject.org and can return data in either XML or JSON format. An example request for all texts matching a given ID, in JSON format, looks like this:

https://api.writecrow.org/texts/id?id=10533&_format=json
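
A request like the one above could be prepared from a script roughly as follows. This is a sketch using only the Python standard library. The host is taken from the example URL, the credentials are placeholders, and the helper name build_request is hypothetical. The request is only constructed here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Host taken from the example request URL (an assumption about the live API).
BASE_URL = "https://api.writecrow.org"


def build_request(path, params, username, password):
    """Return a urllib Request carrying an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"{BASE_URL}{path}?{urlencode(params)}"
    return Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder credentials; real ones come from the corpus maintainers.
request = build_request(
    "/texts/id", {"id": "10533", "_format": "json"}, "USERNAME", "PASSWORD"
)
print(request.full_url)

# To send it (requires network access and valid credentials):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       texts = json.load(response)
```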

All texts matching a given ID (& Assignment)

Pattern /texts/id?id=ID&assignment=ASSIGNMENT
Example 1 /texts/id?id=10533&_format=json
Example 2 /texts/id?id=10533&assignment=2&_format=xml

Sample output

[
  {"id":"10389","filename":"2_D_KOR_3_M_10389","draft":"D"},
  {"id":"10389","filename":"2_E_KOR_3_M_10389","draft":"E"},
  {"id":"10389","filename":"2_F_KOR_3_M_10389","draft":"F"}
 ]

Single text matching a given filename

Pattern /texts/filename?filename=FILENAME
Example /texts/filename?filename=1_C_CHN_1_M_10285&_format=json

Sample output

[{
  "filename":"1_C_CHN_1_M_10285",
  "assignment":"1",
  "college":"S",
  "country":"China",
  "draft":"C",
  "gender":"M",
  "id":"10285",
  "program":"Computer Science-BS",
  "semester":"1",
  "term":"Spring 2015",
  "toefl_listening":"26",
  "toefl_reading":"23",
  "toefl_speaking":"22",
  "toefl_writing":"25",
  "toefl_total":"96",
  "text":"Lorem ipsum dolor sit amet..."
}]

Text search using regular keyword(s)

Pattern /texts/keyword?keywords=WORD+WORD
Single keyword /texts/keyword?keywords=tassets&_format=json
Multiple keywords, AND operator /texts/keyword?keywords=tassets+burnished&op=and&_format=json

Notes

  • Boolean and/or operator may be supplied when searching for multiple keywords. In the absence of a specified parameter, an "OR" search is performed.
  • Keywords are separated by a +
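
The query-string rules in these notes can be sketched in Python as below. The helper name keyword_search_path is hypothetical. The one point worth showing is that each keyword is percent-encoded separately and then joined with a literal +, because a generic form encoder would escape the + itself.

```python
from urllib.parse import quote


def keyword_search_path(keywords, op=None, fmt="json"):
    """Build the path and query string for a keyword search."""
    # Encode each keyword on its own, then join with a literal "+".
    # (urllib.parse.urlencode would turn the "+" separator into %2B.)
    joined = "+".join(quote(word, safe="") for word in keywords)
    parts = [f"keywords={joined}"]
    if op is not None:
        if op not in ("and", "or"):
            raise ValueError("op must be 'and' or 'or'")
        parts.append(f"op={op}")
    parts.append(f"_format={fmt}")
    return "/texts/keyword?" + "&".join(parts)


print(keyword_search_path(["tassets"]))
print(keyword_search_path(["tassets", "burnished"], op="and"))
```

Leaving op unset produces a URL with no op parameter, which per the notes above results in an "OR" search.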

Sample output

{"search_results":[{
  "assignment":"4",
  "college":"A",
  "country":"China",
  "draft":"L",
  "filename":"4_L_CHN_1_F_10206",
  "gender":"F",
  "program":"Agricultural Mech-BS",
  "semester_in_school":"1",
  "term_writing":"Spring 2015",
  "toefl_listening":"28",
  "toefl_reading":"22",
  "toefl_speaking":"22",
  "toefl_total":"97",
  "toefl_writing":"25",
  "search_api_excerpt":"\u2026 or colleagues have to be convincing , more specific and \u003Cstrong\u003Eprofessional\u003C\/strong\u003E. In most cases, Marketing plans are written for \u2026 by attracting their attention as well as explain some \u003Cstrong\u003Eprofessional\u003C\/strong\u003E concepts specifically. Last but not least, by \u2026 majors because they have to be focus on explaining \u003Cstrong\u003Eprofessional\u003C\/strong\u003E concepts and definition in their fields instead \u2026"
}]}
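
In the sample output, matched keywords in search_api_excerpt are wrapped in <strong> tags, which appear JSON-escaped as \u003Cstrong\u003E. A client might decode the response and separate the highlighted terms from the plain excerpt as sketched below. The response string is an abbreviated copy of the sample, and the function name summarize_results is hypothetical.

```python
import json
import re

# Abbreviated response in the shape of the sample output.
raw = (
    r'{"search_results":[{"filename":"4_L_CHN_1_F_10206",'
    r'"search_api_excerpt":"\u2026 more specific and '
    r'\u003Cstrong\u003Eprofessional\u003C\/strong\u003E. In most cases \u2026"}]}'
)

STRONG = re.compile(r"<strong>(.*?)</strong>")


def summarize_results(response_text):
    """Return (filename, highlighted terms, excerpt without tags) per result."""
    summaries = []
    for result in json.loads(response_text)["search_results"]:
        excerpt = result["search_api_excerpt"]
        terms = STRONG.findall(excerpt)
        plain = STRONG.sub(r"\1", excerpt)
        summaries.append((result["filename"], terms, plain))
    return summaries


for filename, terms, plain in summarize_results(raw):
    print(filename, terms)
    print(plain)
```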

Text search using lemmatized keyword(s)

Pattern /texts/lemma?keywords=WORD+WORD
Example /texts/lemma?op=and&keywords=professional+concepts&_format=json

Notes

  • Keywords submitted will automatically be lemmatized
  • Currently, lemma search with part-of-speech tagging is not supported
  • Boolean and/or operator may be supplied when searching for multiple keywords. In the absence of a specified parameter, an "OR" search is performed.
  • Keywords are separated by a +

Sample output

{"search_results":[{
  "assignment":"4",
  "college":"A",
  "country":"China",
  "draft":"L",
  "filename":"4_L_CHN_1_F_10206",
  "gender":"F",
  "program":"Agricultural Mech-BS",
  "semester_in_school":"1",
  "term_writing":"Spring 2015",
  "toefl_listening":"28",
  "toefl_reading":"22",
  "toefl_speaking":"22",
  "toefl_total":"97",
  "toefl_writing":"25",
  "search_api_excerpt":"\u2026 or colleagues have to be convincing , more specific and \u003Cstrong\u003Eprofessional\u003C\/strong\u003E. In most cases, Marketing plans are written for \u2026 by attracting their attention as well as explain some \u003Cstrong\u003Eprofessional\u003C\/strong\u003E concepts specifically. Last but not least, by \u2026 majors because they have to be focus on explaining \u003Cstrong\u003Eprofessional\u003C\/strong\u003E concepts and definition in their fields instead \u2026"
}]}

crow_backend's People

Contributors

jmf3658, markfullmer

crow_backend's Issues

Add form for bug reporting

Problem/motivation

https://3.basecamp.com/3129499/buckets/3403924/todos/2162321568#__recording_3067867744

Proposed change

  • Form access is limited to authenticated users
  • The form includes a pre-populated "Page on which the issue occurs" field with the URL of the originating page.
  • The form includes a generic "describe the problem" text field that is required
  • The form includes an optional file uploader, limited to image file types
  • The form includes a checkbox for "Contact me with updates about this issue"
  • Upon submission, the form will initially email the report, along with context-specific data, to an email address specified in configuration.
  • The implementation should be done in such a way as to easily support subsequent integration with Basecamp

You have requested a non-existent parameter "monolog.level.debug" and a non-existent service "theme.manager" when installing

After I cloned the repository and ran composer install, I tried to install your great work. However, the page /core/install.php shows the error below:

Additional uncaught exception thrown while handling exception.
Original
Symfony\Component\DependencyInjection\Exception\RuntimeException: You have requested a non-existent parameter "monolog.level.debug". in Symfony\Component\DependencyInjection\Compiler\DefinitionErrorExceptionPass->processValue() (line 54 of /var/www/crow/vendor/symfony/dependency-injection/Compiler/DefinitionErrorExceptionPass.php).

Symfony\Component\DependencyInjection\Compiler\DefinitionErrorExceptionPass->processValue() (Line: 83)
Symfony\Component\DependencyInjection\Compiler\AbstractRecursivePass->processValue() (Line: 32)
Symfony\Component\DependencyInjection\Compiler\DefinitionErrorExceptionPass->processValue() (Line: 47)
Symfony\Component\DependencyInjection\Compiler\AbstractRecursivePass->process() (Line: 94)
Symfony\Component\DependencyInjection\Compiler\Compiler->compile() (Line: 762)
Symfony\Component\DependencyInjection\ContainerBuilder->compile() (Line: 1344)
Drupal\Core\DrupalKernel->compileContainer() (Line: 948)
Drupal\Core\DrupalKernel->initializeContainer() (Line: 20)
Drupal\Core\Installer\InstallerKernel->initializeContainer() (Line: 487)
Drupal\Core\DrupalKernel->boot() (Line: 426)
install_begin_request() (Line: 116)
install_drupal() (Line: 48)
Additional
Symfony\Component\DependencyInjection\Exception\ServiceNotFoundException: You have requested a non-existent service "theme.manager". in Symfony\Component\DependencyInjection\ContainerBuilder->getDefinition() (line 1030 of /var/www/crow/vendor/symfony/dependency-injection/ContainerBuilder.php).

Symfony\Component\DependencyInjection\ContainerBuilder->getDefinition() (Line: 600)
Symfony\Component\DependencyInjection\ContainerBuilder->doGet() (Line: 558)
Symfony\Component\DependencyInjection\ContainerBuilder->get() (Line: 649)
Drupal::theme() (Line: 22)
_drupal_maintenance_theme() (Line: 506)
drupal_maintenance_theme() (Line: 1025)
install_display_output() (Line: 271)
_drupal_log_error() (Line: 365)
_drupal_exception_handler()

Option to display concordance numbers

Context

This web interface enhancement was conceived by the team during the Spring 2021 summit. Full notes of the meeting can be found at https://docs.google.com/document/d/18F538rmasr101araP_biwB0z2qLKLf39lSmD_hA_Du8/edit#heading=h.vr5p7wmsjb2v

User story

As a teacher, I want the ability to prepend concordance numbering to search results and to generate an embed code that will display those results with concordance numbering, so that students can more easily be directed to the specific result that we are discussing or that is referenced in pedagogical materials.

Acceptance criteria

  • The corpus search interface will display a new checkbox toggle titled "Number results" which, when checked, will display incrementing numbers, starting at 1, in the search results.
  • When this toggle is checked, if a user subsequently presses "Embed search results," the embed code they are presented will include metadata that will similarly display numbering in the embedded version of the search results.
  • By default, concordance numbers will be "off"

Update contributed code to Drupal 9 compatible versions

Description of task

Presumed updates:

  - Upgrading drupal/better_exposed_filters (4.0.0-alpha1 => 4.0.0-beta2): Extracting archive
  - Upgrading drupal/facets (1.5.0 => 1.6.0): Extracting archive
  - Upgrading drupal/features (3.8.0 => 3.11.0): Extracting archive
  - Upgrading drupal/captcha (1.0.0-beta4 => 1.1.0): Extracting archive
  - Upgrading drupal/search_api (1.17.0 => 1.18.0): Extracting archive
  - Upgrading drupal/rabbit_hole (1.0.0-beta8 => 1.0.0-beta10): Extracting archive
  - Upgrading drupal/redirect_after_login (2.6.0 => 2.7.0): Extracting archive
  - Upgrading drupal/reroute_email (1.2.0 => 1.3.0): Extracting archive
  - Upgrading drupal/token (1.7.0 => 1.9.0): Extracting archive

Update language on registration form

Add to the directions for the project description on the registration form:

"If you are a student, please write that in your description. If you want to use Crow in a course (as either a student or instructor), please describe the course, including the number of students."

Is it efficient enough for a large corpus built with Drupal?

I'm very excited to find your great crow_backend and macaws_backend, which both use Drupal. I've also recently been thinking about using Drupal as the starting point for building a multi-modal corpus with videos, images and texts. However, is Drupal efficient enough for a large corpus?

Add automated tests

Summary

As a minimal starting place, we should add a test that loads the existing site's database and verifies that the API returns the expected JSON output.
