Scripto

© 2010-2012, Center for History and New Media
License: GNU GPL v3

Scripto is an open source documentary transcription tool written in PHP. It features a lightweight library that connects MediaWiki with potentially any content management system (CMS) that serves transcribable resources, including text, still image, moving image, and audio files.

Scripto is not a content management system. Scripto is not a graphical user interface. Scripto is a software library powered by wiki technology that developers can use to integrate a custom transcription GUI into an existing CMS. You provide the CMS and GUI; Scripto provides the engine for crowdsourcing the transcription of your content.

Why MediaWiki?

MediaWiki is a good choice for the transcription database for several reasons:

  • It is the most popular wiki application and has a sizable and active developer community;
  • It offers helpful features, such as talk pages, version history, and user administration;
  • Wiki markup is easy to learn;
  • It comes with a powerful, fully-featured API.

Requirements

  • PHP 5.2.4+
  • Zend Framework 1.10+
  • MediaWiki 1.15.4+
  • Custom adapter interface to (and possibly an API for) the external CMS

Installation

  • Download and install MediaWiki;
  • Download the Zend Framework library;
  • Download the Scripto library, set the configuration, and use the API to build your documentary transcription application.

Suggested Configuration and Setup

Here's a basic configuration:

<?php
// Path to directory containing Zend Framework, from root.
define('ZEND_PATH', '/path/to/ZendFramework/library');

// Path to directory containing the Scripto library, from root.
define('SCRIPTO_PATH', '/path/to/Scripto/lib');

// URL to the MediaWiki installation API.
define('MEDIAWIKI_API_URL', 'http://example.com/mediawiki/api.php');

// Set the include path to Zend and Scripto libraries.
set_include_path(get_include_path() 
               . PATH_SEPARATOR . ZEND_PATH 
               . PATH_SEPARATOR . SCRIPTO_PATH);

// Instantiate Scripto, passing the custom adapter object and the
// MediaWiki configuration.
require_once 'Scripto.php';
require_once 'Scripto/Adapter/Example.php';
$scripto = new Scripto(new Scripto_Adapter_Example,
                       array('api_url' => MEDIAWIKI_API_URL));

// Get the current document object.
$doc = $scripto->getDocument($_REQUEST['documentId']);

// Set the current document page.
$doc->setPage($_REQUEST['pageId']);

// Render the transcription or talk page using the $scripto and $doc APIs.

See the various implementations of Scripto for more suggestions on configuration, setup, layout, and styles.

Advanced Usage

Record Client IP Address

Scripto does not record a client's IP address by default; every page modification is attributed to the IP address of the server running Scripto. To record the client's IP address, add the following to MediaWiki's LocalSettings.php:

$wgSquidServersNoPurge = array('127.0.0.1');

Replace '127.0.0.1' with the IP address of the server running Scripto.

Base64 Decoding

Scripto Base64-encodes document and page IDs so that MediaWiki page titles never contain characters MediaWiki does not allow in titles. As a result, the corresponding page titles in MediaWiki look unusual (for example, .MQ.Ng). To make them human-readable, you can place the following code in MediaWiki's LocalSettings.php:

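// Decode Scripto's Base64-encoded MediaWiki page titles for display.
// http://www.mediawiki.org/wiki/Manual:Hooks/BeforePageDisplay
$wgHooks['BeforePageDisplay'][] = 'fnScriptoDecodePageTitle';
function fnScriptoDecodePageTitle(&$out, &$sk, $prefix = '.', $delimiter = '.')
{
    // Scripto uses the URL-safe Base64 alphabet; restore the standard one.
    $title = strtr($out->getPageTitle(), '-_', '+/');
    // Leave titles that were not created by Scripto untouched.
    if ('' === $title || $prefix != $title[0]) {
        return true;
    }
    $title = array_map('base64_decode', explode($delimiter, ltrim($title, $prefix)));
    if (count($title) < 2) {
        return true;
    }
    $out->setPageTitle('Document ' . $title[0] . '; Page ' . $title[1]);
    // Return true so that other BeforePageDisplay handlers still run.
    return true;
}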
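
For reference, the title format can also be reproduced outside MediaWiki. The sketch below is inferred from the decoding hook above and from the example title .MQ.Ng that appears in the issues below. It assumes that each ID is Base64-encoded with the URL-safe alphabet (- and _ in place of + and /), that the = padding is stripped, and that the parts are joined with '.' as both prefix and delimiter. The function names are hypothetical and are not part of Scripto's API.

```php
<?php
// Hypothetical helpers mirroring the assumed title format;
// not part of Scripto's public API.
function scriptoEncodeTitle($documentId, $pageId, $prefix = '.', $delimiter = '.')
{
    $parts = array();
    foreach (array($documentId, $pageId) as $id) {
        // URL-safe Base64 without "=" padding.
        $parts[] = rtrim(strtr(base64_encode((string) $id), '+/', '-_'), '=');
    }
    return $prefix . implode($delimiter, $parts);
}

function scriptoDecodeTitle($title, $prefix = '.', $delimiter = '.')
{
    if ('' === $title || $prefix !== $title[0]) {
        return null; // Not a Scripto-style title.
    }
    $parts = explode($delimiter, substr($title, strlen($prefix)));
    // Restore the standard alphabet; base64_decode() in non-strict mode
    // accepts the missing padding.
    return array_map('base64_decode', array_map(function ($part) {
        return strtr($part, '-_', '+/');
    }, $parts));
}

// scriptoEncodeTitle(1, 6)     returns ".MQ.Ng"
// scriptoDecodeTitle(".MQ.Ng") returns array("1", "6")
```

Because Base64 output never contains '.', the delimiter cannot collide with an encoded ID.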

Changelog

  • 1.1
    • Add option to retain specified HTML attributes.
  • 1.1.1
    • Fix watch and unwatch pages.
  • 1.1.2
    • Replace the preg_replace() /e modifier, deprecated in PHP 5.5.0 and removed in 7.0.0, with preg_replace_callback().
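
If you are upgrading similar code, the 1.1.2 change follows PHP's standard migration pattern. The replacement expression that /e evaluated as code becomes a callback that receives the match array. The pattern and the function below are illustrative only and are not taken from Scripto's source.

```php
<?php
// Illustrative only. With /e (PHP < 7.0) this would have been written as:
//   preg_replace('/\d+/e', '$0 * 2', $text)
// preg_replace_callback() passes each match to a function instead of eval'ing code.
function doubleNumbers($text)
{
    return preg_replace_callback('/\d+/', function (array $matches) {
        return (string) ((int) $matches[0] * 2);
    }, $text);
}

// doubleNumbers('page 3 of 10') returns "page 6 of 20"
```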

Contributors

cliotropic, jimsafley, kalbers

Issues

Full test coverage for Scripto_Document

document_test.php does not yet have full coverage on Scripto_Document. (By full coverage I mean that SimpleTest touches all methods and validates their return values.) However, given the class is merely a bridge between MediaWiki and an external system, and that testing already includes Scripto_Service_MediaWiki and Scripto_Adapter_Interface, I wonder if full coverage is needed.

Write Test Cases for the MediaWiki API Client

This may require the developer to install a testing MediaWiki instance, separate from the production instance. The payoff would be test coverage of Scripto's interface with MediaWiki, whose reliability should currently be considered unverified.

Login broken when using MediaWiki 1.27.0+

See https://forum.omeka.org/t/unknown-login-error-failed/2993

Since MediaWiki 1.27.0 it appears that a successful login action does not return a cookieprefix. Scripto's MediaWiki service depends on this prefix to identify and set cookies needed to maintain state between requests. It's unclear exactly why it was removed, but it might be related to login being deprecated in favor of the new clientlogin action (available since 1.27.0).

Regardless of an eventual update from login to clientlogin (see #26), we'll need to find another way to extract the cookie prefix. For example, in Scripto_Service_MediaWiki::login(), we could do something like this:

$cookies = self::getHttpClient()->getCookieJar()->getAllCookies();
preg_match('/^(.+)_session$/', $cookies[0]->getName(), $matches);
$mediawikiCookiePrefix = $matches[1];
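
The first cookie in the jar is not guaranteed to be the session cookie, so a more defensive variant might scan every cookie name. The sketch below isolates that logic in a standalone function that takes plain cookie names, such as the values returned by each cookie's getName(). The function name is hypothetical.

```php
<?php
// Hypothetical helper: return the MediaWiki cookie prefix, or null if no
// cookie named "<prefix>_session" is present.
function findMediawikiCookiePrefix(array $cookieNames)
{
    foreach ($cookieNames as $name) {
        if (preg_match('/^(.+)_session$/', $name, $matches)) {
            return $matches[1];
        }
    }
    return null;
}

// findMediawikiCookiePrefix(array('wikidbUserID', 'wikidb_session')) returns "wikidb"
```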

Until this is fixed we could instruct users to use MediaWiki 1.23.15.

Uncreated pages have no page history

Scripto_Service_MediaWiki::getRevisions() returns the following array when a page is not yet created:

Array
(
  [query] => Array
      (
          [pages] => Array
              (
                  [-1] => Array
                      (
                          [ns] => 0
                          [title] => .MQ.Ng
                          [missing] => 
                      )
              )
      )
)

Check for this response in Scripto_Document::_getPageHistory() and return an empty array before the foreach(). Otherwise, calling the function for an uncreated page triggers an "Invalid argument supplied for foreach()" warning.
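
A minimal guard might look like the following. It assumes the decoded response has the structure shown above, in which MediaWiki marks a nonexistent page with a missing key under a negative page index. The function name is hypothetical.

```php
<?php
// Hypothetical guard for Scripto_Document::_getPageHistory(): returns true when
// a prop=revisions response describes a page that has not been created yet.
function isUncreatedPage(array $response)
{
    if (!isset($response['query']['pages']) || !is_array($response['query']['pages'])) {
        return true; // No page data at all; treat the page as having no history.
    }
    foreach ($response['query']['pages'] as $page) {
        // MediaWiki sets "missing" (an empty string in the legacy JSON format).
        if (array_key_exists('missing', $page)) {
            return true;
        }
    }
    return false;
}
```

Inside _getPageHistory(), a check such as `if (isUncreatedPage($response)) { return array(); }` would then run before the foreach().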

Add MediaWiki API URL validation

Include API URL validation in Scripto_Service_MediaWiki::__construct(). Check against a valid MediaWiki API URL pattern, and catch Zend_Uri_Exception (for invalid URLs), throwing Scripto_Service_Exception when encountering an error.
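
One way to approach this check without Zend_Uri is PHP's built-in filter_var(). The sketch below assumes that "a valid MediaWiki API URL pattern" means an absolute http(s) URL whose path ends in api.php. That rule is an assumption, and so is the stand-in exception class.

```php
<?php
// Stand-in for Scripto's exception class so this sketch runs on its own.
if (!class_exists('Scripto_Service_Exception')) {
    class Scripto_Service_Exception extends Exception {}
}

// Hypothetical validator: accept absolute http(s) URLs whose path ends in "api.php".
function validateMediawikiApiUrl($apiUrl)
{
    if (false === filter_var($apiUrl, FILTER_VALIDATE_URL)) {
        throw new Scripto_Service_Exception("Invalid URL: $apiUrl");
    }
    $scheme = strtolower((string) parse_url($apiUrl, PHP_URL_SCHEME));
    $path = (string) parse_url($apiUrl, PHP_URL_PATH);
    if (!in_array($scheme, array('http', 'https'), true)
        || 'api.php' !== basename($path)) {
        throw new Scripto_Service_Exception("Not a MediaWiki API URL: $apiUrl");
    }
    return $apiUrl;
}
```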

Enable transcription import

Extend the library to enable sysop and bureaucrat users to import document transcriptions into the external system. Page protection could signify a finished transcription, but it does not have to. Use Scripto_Adapter_Interface::importDocumentTranscription() and Scripto_Adapter_Interface::importDocumentPageTranscription().

Enable page protections

Extend the library to enable sysop and bureaucrat users to protect an individual document page and an entire document (i.e. all the document's pages). Protected pages are locked to prevent further editing by lower user groups. This may mean that the transcription is finished and ready to be imported to the external system.
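
At the API level, page protection maps onto MediaWiki's action=protect. That action takes a title, a pipe-separated protections list (such as edit=sysop), an expiry, and a CSRF token, which can be obtained via meta=tokens. The sketch below only builds the request parameters for every page of a document; sending the requests is left to the service class. The function name, the reason text, the infinite expiry, and the default sysop level are all assumptions.

```php
<?php
// Hypothetical helper: build action=protect parameters that lock every
// MediaWiki page belonging to a document so only the given level can edit or move it.
function buildProtectRequests(array $pageTitles, $token, $level = 'sysop')
{
    $requests = array();
    foreach ($pageTitles as $title) {
        $requests[] = array(
            'action'      => 'protect',
            'title'       => $title,
            'protections' => "edit=$level|move=$level",
            'expiry'      => 'infinite', // a single expiry applies to all listed protections
            'reason'      => 'Transcription complete',
            'token'       => $token,
        );
    }
    return $requests;
}
```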

MediaWiki's rv* parameters are deprecated

See https://lists.wikimedia.org/pipermail/mediawiki-api-announce/2017-June/000134.html

Although rvdifftotext, rvdiffto, rvexpandtemplates, and rvgeneratexml are all present in the MediaWiki service class, we only use rvdiffto (to get the difference between two revisions). Remove them all from the $_actions list, and in getRevisionDiff() use action=compare instead of revids/rvdiffto. (Note that the compare action was not added until 1.18, and the fromid/toid parameters were not added until 1.20.)
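
A sketch of the replacement call follows. It builds action=compare parameters from two revision IDs using fromrev and torev. It then reads the diff HTML from either the legacy * key or the body key returned with formatversion=2. The function names are hypothetical, and the HTTP request itself is omitted.

```php
<?php
// Hypothetical replacement for the revids/rvdiffto request in getRevisionDiff().
function buildCompareParams($fromRevisionId, $toRevisionId)
{
    return array(
        'action'  => 'compare',
        'fromrev' => (int) $fromRevisionId,
        'torev'   => (int) $toRevisionId,
    );
}

// Extract the diff HTML from a decoded action=compare response.
function extractCompareDiff(array $response)
{
    if (isset($response['compare']['*'])) {
        return $response['compare']['*'];    // legacy JSON format
    }
    if (isset($response['compare']['body'])) {
        return $response['compare']['body']; // formatversion=2
    }
    return null;
}
```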

Add a method that returns a URL to a specific MediaWiki page

Maybe add Scripto::getMediawikiInfo() using Scripto_Service_MediaWiki::getSiteInfo() and return a parsed/formatted array including a URL path prefix for article pages. Also add Scripto::getMediawikiPageUrl($title) that uses Scripto::getMediawikiInfo() and the provided title to build and return a URL.
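
The URL-building half of this idea could be as small as the function below. It assumes the general block of MediaWiki's meta=siteinfo response, whose server and articlepath values (for example, /wiki/$1) together form an article URL. Note that server may be protocol-relative on some wikis. The function name is hypothetical, and its title encoding is a simplification of MediaWiki's own rules.

```php
<?php
// Hypothetical sketch of Scripto::getMediawikiPageUrl(): combine siteinfo's
// "server" and "articlepath" with a title. MediaWiki shows spaces as
// underscores in URLs; rawurlencode() is a simplification of its encoding.
function buildMediawikiPageUrl(array $general, $title)
{
    $encoded = rawurlencode(str_replace(' ', '_', $title));
    return $general['server'] . str_replace('$1', $encoded, $general['articlepath']);
}
```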

Back-links from MediaWiki to CMS

Currently it's impossible to get from the raw MediaWiki site to the CMS hosting the page images. Until MediaWiki features like Recent Changes are hooked to the CMS, could Scripto at least point MediaWiki users to the page images?

MediaWiki's login action is deprecated

See https://www.mediawiki.org/wiki/API:Login

Since MediaWiki 1.27.0 the login action has been deprecated in favor of the new clientlogin action. We've already seen an inconsistency in the API result (see #25) but we need to prepare for login's eventual removal.

In Scripto_Service_MediaWiki::login() we'll need to replace the existing login requests with the clientlogin interactive flow:

$params = array('meta' => 'tokens', 'type' => 'login');
$response = $this->_request('query', $params);
$logintoken = $response['query']['tokens']['logintoken'];

$params = array(
    'username' => $username,
    'password' => $password,
    'logintoken' => $logintoken,
    'loginreturnurl' => 'http://example.com',
    'rememberMe' => '1',
);
$response = $this->_request('clientlogin', $params);

On top of this we'll need a way to determine when to use the login action for older installations.
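
One possible way to make that decision is to read the generator string from meta=siteinfo (for example "MediaWiki 1.27.0") and compare versions. The sketch below assumes that string format. The function name and the fallback to login for unrecognized versions are assumptions; only the 1.27.0 threshold comes from the discussion above.

```php
<?php
// Hypothetical helper: choose the login action based on siteinfo's "generator".
function chooseLoginAction($generator)
{
    if (!preg_match('/^MediaWiki (\d+\.\d+(?:\.\d+)?)/', $generator, $matches)) {
        return 'login'; // Unknown version: fall back to the legacy action.
    }
    return version_compare($matches[1], '1.27.0', '>=') ? 'clientlogin' : 'login';
}
```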

Breaking changes to MediaWiki API

A change to the MediaWiki API, described here, may eventually require a small change to Scripto's MediaWiki API client. I'm not sure why the change is described as backwards compatible since it appears to have changed a parameter name in the response. (The description of the changes is pretty confusing, but the commit may shed some light.)

For now the only change that may need to happen is in Scripto::getAllDocuments():

$from = $response['query-continue']['allpages']['apfrom'];
// changes to
$from = $response['query-continue']['allpages']['apcontinue'];

Another change, described here, will eventually require a small change to the client. Again, this change should not be described as backwards compatible.

For now the only change that will need to happen is in Scripto::getRecentChanges():

$start = $response['query-continue']['recentchanges']['rcstart'];
// changes to
$start = $response['query-continue']['recentchanges']['rccontinue'];
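
Until the exact key is settled, the client could accept either name. The helper below is a sketch that looks under query-continue for a given module and tries the newer key before the older one. It returns the key together with its value so that the next request sends the value back under the same parameter name. The function name is hypothetical.

```php
<?php
// Hypothetical helper: read a continuation value that may appear under either
// its newer or older parameter name, e.g. apcontinue/apfrom or rccontinue/rcstart.
function getQueryContinue(array $response, $module, array $keys)
{
    foreach ($keys as $key) {
        if (isset($response['query-continue'][$module][$key])) {
            return array($key => $response['query-continue'][$module][$key]);
        }
    }
    return array(); // No more results.
}

// In Scripto::getAllDocuments(), for example:
//   $params = array_merge($params,
//       getQueryContinue($response, 'allpages', array('apcontinue', 'apfrom')));
```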

Determine if the current user can edit the current page.

Scripto_Document::canEdit() should determine whether the current user can edit the MediaWiki page that corresponds to the current document page. It currently depends on Scripto_Service_MediaWiki::getEditCredentials() to return NULL if the user doesn't have edit rights. Regrettably, this doesn't account for users with edit rights who attempt to edit a protected page. There must be a way to determine if the current user can edit the current page.
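
One possible approach combines two standard query modules. prop=info with inprop=protection returns a page's protection entries (type, level, expiry), and meta=userinfo with uiprop=groups returns the current user's groups. The sketch below compares the two under a simplifying assumption: each protection level names a group the user must belong to. That holds for MediaWiki's default sysop and autoconfirmed levels but not for every configuration. The function name is hypothetical.

```php
<?php
// Hypothetical sketch: can a user who already has edit rights edit a page,
// given its "protection" entries and the user's groups? Assumes each
// protection level is also a group name.
function canEditProtectedPage(array $protection, array $userGroups)
{
    foreach ($protection as $entry) {
        if ('edit' !== $entry['type']) {
            continue; // Move or upload protection does not block editing.
        }
        if (!in_array($entry['level'], $userGroups, true)) {
            return false;
        }
    }
    return true;
}
```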

Implement account creation

Ideally we'd separate account creation (user registration) from the MediaWiki Web interface, but the API does not provide an account creation feature, though it is proposed. Unless there is some other way, all accounts will need to be created via the MediaWiki Web interface.

Remove Zend Framework dependency

The most common complaint about installing Scripto is the Zend Framework dependency, in particular, determining the path to the ZF library. Removing this dependency would streamline the installation process, but using a custom HTTP client (which is the extent of ZF needed by Scripto) may be more trouble than it's worth.

An alternative would be to package the ZF library (or only the requisite ZF components) alongside the Scripto library. However, this would greatly, and perhaps unjustifiably, increase the size of Scripto downloads.

A related change would be to update to ZF 2.

Reorganize Scripto repository

The Scripto repository should follow a more traditional software directory structure, such as:

Scripto/
  |_README
  |_lib/
    |_Scripto/
      |_[...]
  |_tests/
  |_examples/
    |_shared/
      |_images/
        |_[...]
      |_[files in OpenLayers/ and layout/]
    |_SideBySide/
      |_[...]
    |_Simple/
      |_[...]
    |_TopAndBottom/
      |_[...]
