documentcloud / document-viewer


This project forked from nytimes/document-viewer


The NYTimes Document Viewer

Home Page: http://open.blogs.nytimes.com/2010/03/27/a-new-view-introducing-doc-viewer-2-0/

License: Apache License 2.0

JavaScript 63.49% CSS 23.24% Ruby 0.10% HTML 13.16%

document-viewer's Introduction

This is the repository for the legacy DocumentCloud site. Please see the current repository here:

https://github.com/muckrock/documentcloud

______                                      _   _____ _                 _
|  _  \                                    | | /  __ \ |               | |
| | | |___   ___ _   _ _ __ ___   ___ _ __ | |_| /  \/ | ___  _   _  __| |
| | | / _ \ / __| | | | '_ ` _ \ / _ \ '_ \| __| |   | |/ _ \| | | |/ _` |
| |/ / (_) | (__| |_| | | | | | |  __/ | | | |_| \__/\ | (_) | |_| | (_| |
|___/ \___/ \___|\__,_|_| |_| |_|\___|_| |_|\__|\____/_|\___/ \__,_|\__,_|

DocumentCloud is a catalog of primary source documents and a tool for annotating, organizing and publishing them on the web. Documents are contributed by journalists, researchers and archivists.

This codebase contains the entirety of DocumentCloud.org, and pulls together the rest of our open-source projects: Docsplit is used to extract data from incoming documents; that work is parallelized across CloudCrowd; data on the client-side is modeled by Backbone.js, which depends on Underscore.js for all of its abilities; Jammit concatenates and compresses the dozens of CSS and JS files into a single asset package; the NYTimes' Document Viewer displays the documents, while Pixel Ping records the traffic.

If you find a security issue while browsing the source, please email [email protected] to inform us of the problem.

Code contributed to this project is provided under the MIT license (see the LICENSE file). Some components of the project are subject to their own licenses as indicated (see /vendor and /public/javascripts/vendor directories).

document-viewer's People


jackzheng satish gijs mikekidder mirokulket frankk00 tzuryby peopledoc boy-jer ashaw vbhv netconstructor pombredanne ealliaume d5nguyenvan wil amclean kloy goyaka rhartley76 riordan ahazelwood mikoyan mobilipia mivano nathanstitt jerryharrison coodoo myanmarlinks entea maladev jy4618272 mysrt mastereza17 jorgenio mattjones harrypotterismyname dgoutam rwalport rdmpage justinsoliz newsapps awiss shjain samanthasunne tonyschick npasulka justingiles amnestyinternational kgshv davgit alex24971 malkassem bangpound whmccoy davifiamenghi tchen0123 gollapudi etodanik homofaber17 zabrane richleenyc gerardogomez srikanthkothala dldinternet harlo croby youaani tornabene openstate martino sneelagaru kmhtoo edunext troyericg uniasha roberttdev bassculture procosgroup vijo rajeshvv kamihouse namjae provenire ablotny danersc arcodergh kimjaeyun retrography eemmanuel7 arpanroyuk01 giapdangle mr-justin dannguyen defconcepts rlugojr ssgonchar shalintripathi blambodh mrozmanith

document-viewer's Issues

Page number text input does not switch to proper page when changed

When the page number is changed in the text input at the top of the sidebar, the viewer should switch to the appropriate page. Instead, it pauses briefly, and then either the text input resets to the previous value or the viewer jumps to an apparently random page.

I've traced it to the logic in the acceptInputCallBack helper in helpers/search.js:

document-viewer/public/javascripts/DV/helpers/search.js (line 31 in 6c94dcf):

var pageIndex = parseInt(this.elements.currentPage.text(),10) - 1;

Note that it calls .text() to get the value of this.elements.currentPage. This would be fine, except that this.elements.currentPage matches two elements: the input in the sidebar and the one in the footer.

So if the viewer is on the 2nd page and the input has 3 typed in it, text() will return "32": 3 for the user-typed value and 2 for the hidden input in the footer. If the document has at least 32 pages, it'll jump to page 32; otherwise it will revert the edit and set the value of the text input back to "2".

In the past the footer was not rendered if the sidebar was present, so there would only be a single element contained in the this.elements.currentPage jQuery reference.

The check for the sidebar in footer.jst was removed as part of the responsive commit by jashkenas: 8577a01#diff-364047077bc9bf26782abf38ee33fa67

And then the options being sent to footer.jst were removed as well by knowtheory: f172e52

The quick & easy fix would be to change the selector in acceptInputCallBack so it's more specific and only reads the input that is focused. Or should we attempt to figure out why the footer is being rendered and then hidden when the sidebar is visible?
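A minimal sketch of the quick fix, with plain objects standing in for the two elements that this.elements.currentPage matches (the `value` and `focused` fields are hypothetical). In the viewer itself the equivalent would presumably be something like this.elements.currentPage.filter(':focus'), assuming the bundled jQuery supports the :focus selector.

```javascript
// Plain objects stand in for the sidebar input and the hidden footer input.
// `value` and `focused` are hypothetical stand-ins for DOM element state.

// Current behaviour: join the text of every matched element, the way
// jQuery's .text() does for a multi-element collection.
function pageIndexFromAll(elements) {
  const text = elements.map((el) => el.value).join("");
  return parseInt(text, 10) - 1;
}

// Proposed fix: read only the element the user is typing in,
// falling back to the first match if nothing has focus.
function pageIndexFromFocused(elements) {
  const active = elements.find((el) => el.focused) || elements[0];
  return parseInt(active.value, 10) - 1;
}

const matched = [
  { value: "3", focused: true },  // sidebar input: user typed "3"
  { value: "2", focused: false }, // hidden footer input: still on page 2
];
console.log(pageIndexFromAll(matched));     // 31, i.e. page "32"
console.log(pageIndexFromFocused(matched)); // 2, i.e. page 3
```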

Tab buttons display incorrectly when CMS defaults to `box-sizing: border-box;`

For better or for worse, our site has decided that everything should use box-sizing: border-box; which results in not-so-pretty tab buttons on the viewer:

Setting box-sizing: content-box; on those elements fixes it.

Here's a sample viewer: http://america.aljazeera.com/multimedia/2013/11/original-documentabuzubaydahdiariesvolumeone.html

Viewer CSS and JavaScript aren't served gzip-compressed

The DocumentCloud Viewer CSS and JavaScript aren't served gzip-compressed from the Amazon S3 servers.

Example URL:

http://s3.amazonaws.com/s3.documentcloud.org/viewer/viewer.js

Response Headers:

Date: Mon, 22 Dec 2014 17:42:35 GMT
ETag: "c461f0969585f2aaff54e405967d12be"
Last-Modified: Tue, 16 Dec 2014 18:18:14 GMT
Server: AmazonS3
x-amz-id-2: ukt2TSsLb5DM0RUlxdYsr/1pozp/gsREOVRJ9xlWIlPlniZjqA3vf8ZACmCu7mP95yWV78oKMXs=
x-amz-request-id: 16053EED6EADE536

While Amazon S3 doesn't support on-the-fly compression, you can compress the files locally and then upload them. You'll need to add the Content-Encoding: gzip header to the files.

A better solution might be to use CloudFront.
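A sketch of the compress-locally approach using Node's built-in zlib. The upload step is left out, and the file contents here are placeholders. Because S3 serves the same bytes to every client, this assumes all target browsers accept gzip.

```javascript
const zlib = require("zlib");

// Gzip an asset's contents and return the headers to attach to the S3 object
// so browsers decompress it transparently.
function gzipAsset(source, contentType = "application/javascript") {
  const body = zlib.gzipSync(Buffer.from(source), { level: 9 });
  return {
    body,
    headers: {
      "Content-Encoding": "gzip",
      "Content-Type": contentType, // e.g. "text/css" for viewer.css
    },
  };
}

// Placeholder content standing in for viewer.js.
const source = "var DV = window.DV || {};\n".repeat(200);
const asset = gzipAsset(source);
console.log(source.length, "->", asset.body.length, asset.headers["Content-Encoding"]);
```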

Document list embed styles when CMS defaults to `box-sizing: border-box;`

This is the same issue flagged in #32.

When we embed the document list in our CMS, it looks like this:

enable annotations

Hi, how can I enable annotations? I can't figure it out.

Thanks.

P.S. Is there a guide to deploying the project on my own server?

Document zoom level often too low; document would still fit with one or two notches higher zoom

This happens with many documents and browser widths, but for a concrete example: set your browser window width to 1110 (or thereabouts, check in console with window.innerWidth), then go to https://www.documentcloud.org/documents/527670-oregon-zoo-elephant-contract.html

(Also make sure your browser is at default zoom and that the sidebar appears.)

The zoom level that the document loads at is lower than it could be. On Chrome/Mac, you could up the zoom by two notches and still contain the entire document image plus the "p. 1" text. On Safari/Mac you could increase by one notch.

Getting the image as large as possible is important for Overview, because users rely in part on very rapidly scanning many documents, and we don't have any screen real estate to waste on needless padding.
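One way to frame the fix, as a sketch: choose the largest zoom level whose page width plus the actual padding still fits the available width. The zoom levels and padding below are illustrative assumptions, not the viewer's real values.

```javascript
// Hypothetical zoom levels, expressed as rendered page widths in pixels.
const ZOOM_LEVELS = [500, 700, 800, 900, 1000];

// Pick the largest level that fits once horizontal padding is subtracted.
function bestZoom(availableWidth, horizontalPadding) {
  const usable = availableWidth - horizontalPadding;
  let best = ZOOM_LEVELS[0]; // fall back to the smallest level
  for (const level of ZOOM_LEVELS) {
    if (level <= usable) best = level;
  }
  return best;
}

console.log(bestZoom(900, 60)); // 800
```

The point is to measure padding exactly rather than reserving a generous margin, which is what costs the one or two notches described above.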

Trying to set local json

I created a local JSON file and pasted the JSON from http://documents.nytimes.com/goldman-sachs-internal-emails.json into it. I'm getting the following exception:

Uncaught TypeError: Cannot read property 'schema' of undefined viewer.js:685
DV.loadJSON viewer.js:685
deferred.resolveWith viewer.js:51
done viewer.js:452
callback

don't use document.write to insert script

Chrome emits this warning:
A Parser-blocking, cross-origin script, https://assets.documentcloud.org/viewer/viewer.js, is invoked via document.write. This may be blocked by the browser if the device has poor network connectivity.

Here's the code in question:

/* Request the viewer JavaScript. */
document.write('<script type="text/javascript" src="//assets.documentcloud.org/viewer/viewer.js"></scr' + 'ipt>');

Please replace this with var s = document.createElement('script'); or similar.
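A sketch of a non-blocking replacement. The document object is passed in only so the function can be exercised outside a browser; in a page you would pass the global `document`.

```javascript
// Insert the viewer script asynchronously instead of via document.write.
function loadViewerScript(doc, src, onload) {
  const script = doc.createElement("script");
  script.src = src;
  script.async = true; // do not block HTML parsing
  if (onload) script.onload = onload;
  (doc.head || doc.getElementsByTagName("head")[0]).appendChild(script);
  return script;
}

// In a browser:
// loadViewerScript(document, "https://assets.documentcloud.org/viewer/viewer.js", function () {
//   DV.load(/* document URL and options */);
// });
```

Because the script no longer loads synchronously, any inline embed code that calls into DV right after the old document.write line would need to move into the onload callback.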

mini is not defined

Hello,
I just downloaded the code and ran it as is. I get a "mini is not defined" error in Firebug. When I replace viewer.js with the one on the live DocumentCloud site, the error is gone and everything works fine. Is the code here the latest code?

When I do a diff of viewer.js I do see differences. Thanks.

Translation support

I've asked this on IRC, but it's more visible here. Does the DocumentCloud viewer support translations? And how can we contribute?

I'm happy to help with the Spanish version, if that's possible.

Use of == when comparing to truth/falsy variables

I have noticed a fair number of == comparisons against 0 or true/false. === should be used to prevent possible logic errors.
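A quick illustration of why: loose equality coerces its operands, so several falsy-looking values compare equal to each other, while strict equality does not.

```javascript
// Each pair is equal under == but not under ===.
const pairs = [
  [0, false],
  ["", 0],
  ["0", false],
  [null, undefined],
];
for (const [a, b] of pairs) {
  console.log(JSON.stringify(a), JSON.stringify(b), a == b, a === b);
}
```

One common exception is the `x == null` idiom, which deliberately matches both null and undefined; a sweep to === may want to keep those or rewrite them explicitly.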

Images loaded twice

I'm running an embedded docViewer that pulls images from Rails. I put a debug output statement in the image-return controller action. On first load, it reports just one request for each of the first 3 pages. Each page loaded after that generates 2 requests and gets 2 full responses from rails. One request is coming from DV/lib/page.js#drawImage.js, but I don't know where the other comes from. The page doesn't load until the second request completes.

Is this behavior intended? Is there a way to limit it to a single request?

Create a way to remove player from page

The popcorn.js guys have written us a plugin so that popcorn users can control the viewer. As a result, we've discovered that removing the viewer from the page leaves some orphaned listeners which complain.
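A sketch of one way to make removal clean: record every listener the viewer attaches so a hypothetical destroy() can detach them all. ListenerRegistry is an illustrative name, not part of the viewer. Since the viewer uses jQuery, namespaced events (one namespace per viewer, removed in a single call) would be another option.

```javascript
// Tracks listeners so they can all be removed when the viewer goes away.
class ListenerRegistry {
  constructor() {
    this.entries = [];
  }

  // Attach a listener and remember it.
  on(target, type, handler) {
    target.addEventListener(type, handler);
    this.entries.push({ target, type, handler });
  }

  // Detach everything this registry attached.
  destroy() {
    for (const { target, type, handler } of this.entries) {
      target.removeEventListener(type, handler);
    }
    this.entries = [];
  }
}
```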

Responsive option

In addition to the full screen and fixed options, is there a way to offer a viewer option that takes the width of its parent element (calculating an appropriate height from that value)?

We’ve run into an issue on our site now where the Viewer doesn’t fit in with our current responsive grid. I wrote a tiny function to calculate the width/height and pass it to the .load() method, but I’m hoping we can drop this hack at some point.
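A sketch of the kind of helper described above: derive viewer dimensions from the parent's width. The letter-size aspect ratio and the minimum height are assumptions. The commented call assumes DV.load accepts container, width and height options.

```javascript
// Compute viewer dimensions from the parent element's width.
function viewerSizeFor(parentWidth, aspectRatio = 11 / 8.5, minHeight = 400) {
  const width = Math.floor(parentWidth);
  const height = Math.max(minHeight, Math.round(width * aspectRatio));
  return { width, height };
}

// In a page:
// var size = viewerSizeFor(container.clientWidth);
// DV.load(documentUrl, { container: "#viewer", width: size.width, height: size.height });
```

A built-in responsive option would presumably do the same calculation internally and recompute it on resize.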

Suggestion: Remove annotation padding

I've been using DocumentCloud to highlight scientific documents, and we were struggling a bit when it came to highlighting multiple cells in a small table (even with zoom). There were 'dead zones' where the highlight cursor changed to the page grab cursor, near an existing annotation and in a small page-spanning row above it.

I was able to fix this by removing the 'padding: 3px' property from '.DV-annotation'. It may be worth changing this to help users who are trying to make similarly close highlights.

Viewer w/Public Annotations: "Related Article" link uses wrong font.

Related Article »

Should match

Original Document (PDF) »
Print Notes »

Highlight Previous Match doesn't work when transitioning back to First Page

After searching, clicking Previous Match doesn't work when the viewer needs to transition from the first match on page 2 to the last match on page 1. As a result, it is not possible to go back to page 1 by clicking Previous Match once you have reached the second page.

We are preparing a pull request.
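A sketch of the expected behaviour, treating all matches as one flat list so that stepping back across a page boundary is just "index minus one". The `matchesByPage` shape (page number to match count) is hypothetical, not the viewer's internal structure.

```javascript
// Flatten per-page match counts into an ordered list of { page, index }.
function flattenMatches(matchesByPage) {
  const flat = [];
  const pages = Object.keys(matchesByPage).map(Number).sort((a, b) => a - b);
  for (const page of pages) {
    for (let i = 0; i < matchesByPage[page]; i++) flat.push({ page, index: i });
  }
  return flat;
}

// Previous match, or null when already at the very first match.
function previousMatch(flat, current) {
  const pos = flat.findIndex((m) => m.page === current.page && m.index === current.index);
  return pos > 0 ? flat[pos - 1] : null;
}

const flat = flattenMatches({ 1: 3, 2: 2 });
console.log(previousMatch(flat, { page: 2, index: 0 })); // { page: 1, index: 2 }
```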

Padding is screwed up.

Example: http://www.nytimes.com/interactive/2014/07/30/world/africa/31kidnap-docviewer2.html

Padding-left on .DV-docViewer .p0, .DV-docViewer .p1, .DV-docViewer .p2 is currently 60px. It looks like it wants to be 40.

Variables defined in block scope

I have been documenting the source and have noticed that a lot of variables are declared in a block-scoped manner, or redeclared after being passed in as a function argument. These should be fixed to reflect JavaScript's function scoping and keep my editor's linter from crying ;)
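For context: `var` is function-scoped, so a declaration inside a block is hoisted to the top of the enclosing function. Linters flag this pattern (ESLint's block-scoped-var and no-redeclare rules are examples). Both functions below behave identically; the second is the style the issue asks for.

```javascript
// Block-style declarations: `i` and `last` are still visible after the loop.
function blockScoped(items) {
  for (var i = 0; i < items.length; i++) {
    var last = items[i];
  }
  return [i, last];
}

// Function-scoped style: declare once at the top, assign inside the block.
function functionScoped(items) {
  var i, last;
  for (i = 0; i < items.length; i++) {
    last = items[i];
  }
  return [i, last];
}

console.log(blockScoped(["a", "b"]), functionScoped(["a", "b"])); // [ 2, 'b' ] [ 2, 'b' ]
```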

"e is undefined" error when using display:none on an embedded player

I'm creating a DocumentCloud plug-in for Popcorn.js, and this error is proving to be a bit of a blocker in terms of hiding and displaying the players.

The plug-in can insert multiple players into one div and hide/display them successfully with the visibility property, but this proves to be a bad choice because the hidden players "push" the visible players to the side or down. By setting display: none inline, I can make it seem like the hidden players aren't even there.

This is when I get the error. I traced it to a function called sortPages that returns undefined when called in the drawPages function.

Here's a case where the bug is present: http://www.chrisdecairos.ca/projects/documentcloudissue/plugins/documentcloud/popcorn.documentcloud.html

Let me know if I can help with anything, thanks!

Adding an annotation or redaction resets the zoom level

When the viewer is zoomed in or out, adding an annotation or redaction resets the zoom to the default size.

This happens because DV.Api.addAnnotation calls redraw(), which in turn calls this.viewer.helpers.autoZoomPage().

document-viewer/public/javascripts/DV/controllers/api.js (line 209 in 069256e):

this.redraw(true);

As a test, I removed the call to redraw() and the newly drawn annotations then failed to display. Therefore it appears that the redraw() call is necessary.

Viewer w/Public Annotations: Server error on trying to use Twitter login

If I select the "Twitter" logo as a login option I get this:

Documentation reflects old (NY Times Document Viewer) syntax

It appears the README is a bit out of date.

When following the current documentation, it seems like DocumentViewer's sections/chapter display is broken.

It's not. The documentation is wrong (well... just out of date). It reflects the syntax of the original NY Times Document Viewer, not the updated DocumentCloud syntax.

The documentation says to pass sections like so:
sections : [ { title : CHAPTER_TITLE, pages: "1-10" }, { title : CHAPTER_TITLE, pages: "11-20" } ],

The bundled viewer.html follows the same syntax.

That ain't right! Ted pointed me to a DocumentCloud JSON where sections are working correctly.

Here's how its sections code works:
"sections":[ { "title":"1. Introduction", "page":8 }, { "title":"Abstract", "page":7 }, { "title":"Table of Contents", "page":4 } ],

Aside from the section pages being passed as an int rather than a string (range) there are a few other minor changes.
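For anyone migrating old configs, a sketch of converting the NYT-style sections to the DocumentCloud form by keeping the first page of each range. It handles only the pages-to-page change; the "few other minor changes" mentioned above are not covered.

```javascript
// Convert { title, pages: "1-10" } into { title, page: 1 }.
function toDocumentCloudSections(oldSections) {
  return oldSections.map(function (section) {
    const firstPage = parseInt(String(section.pages).split("-")[0], 10);
    return { title: section.title, page: firstPage };
  });
}

const converted = toDocumentCloudSections([
  { title: "Chapter 1", pages: "1-10" },
  { title: "Chapter 2", pages: "11-20" },
]);
console.log(JSON.stringify(converted));
// [{"title":"Chapter 1","page":1},{"title":"Chapter 2","page":11}]
```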

Meanwhile the stock viewer.html still points to a NYT json rather than a DV json, which reflects this old markup.

p.s.
I had a pull request ready to go when I thought it was just about sections. I'll be amending that and will be sending a new one shortly.

Add download option on embed view

Currently you have to click the fullscreen button to reach the full-width page in order to download a PDF. Is it possible to add a button to download a document directly from the embedded viewer?

thanks

I imported the full codebase onto my local desktop. Now I want to add a few buttons to the annotations bar?

I got the annotations bar working, but without any controls like print, download or save. I want to add some controls to the annotations bar. Please help me with this issue.

Thanks.

how to convert the original source PDF into GIF Images to be viewed in this HTML5 document viewer

I was trying to make this plugin work with a plain PDF file, but I wasn't able to. Then I read somewhere that this plugin works only with GIF images created from the original PDF file, but there's no converter available or documentation on how to do that.
Could you help me?

BTW: This is an awesome plugin, but I think it lacks the documentation needed to make it usable by the open source community.
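One possible route, sketched under assumptions: Docsplit, which DocumentCloud uses to extract data from incoming documents, can render PDF pages as images. This assumes the docsplit gem and its dependencies are installed and on PATH; the 700x/1000x widths echo the sizes mentioned in another issue here, and the paths are placeholders. Check Docsplit's documentation for the output file names it produces, since the document JSON's page-image URL has to point at them.

```javascript
const { execFileSync } = require("child_process");

// Build the docsplit arguments: render every page as GIF at the given widths.
function docsplitImageArgs(pdfPath, outputDir, sizes = ["700x", "1000x"]) {
  return ["images", "--size", sizes.join(","), "--format", "gif", "--output", outputDir, pdfPath];
}

// Run docsplit (requires it to be installed).
function renderPages(pdfPath, outputDir) {
  execFileSync("docsplit", docsplitImageArgs(pdfPath, outputDir), { stdio: "inherit" });
}

// renderPages("my-document.pdf", "pages");
```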

fragment routing failing in public annotation build

Title says it all. Click on an annotation link, the fragment will append correctly. Copy & pasting the url into a new tab will open the doc, the note will flash open, and then flash closed.

Installing Document-Viewer

I copied the uncompressed source to my web server, but when I load viewer.html it says "loading" and the document never renders. Did I miss anything during installation?

Scaling Images to Widths-only and Not Height As Well

Hi,

I have noticed that when an image is not 700 or 1000 pixels wide, its height is scaled oddly. This is probably a mismatch of width/height ratios. Could someone fix this?

Albert
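If the diagnosis above is right, the fix is to derive each page's height from its own aspect ratio rather than from a fixed height per width: height = naturalHeight × (targetWidth / naturalWidth). A minimal sketch:

```javascript
// Scale a page image to a target width while preserving its aspect ratio.
function scaleToWidth(naturalWidth, naturalHeight, targetWidth) {
  const scale = targetWidth / naturalWidth;
  return { width: targetWidth, height: Math.round(naturalHeight * scale) };
}

console.log(scaleToWidth(1400, 2000, 700)); // { width: 700, height: 1000 }
```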

Support translations in viewer interface

Sprung from: documentcloud/documentcloud#121

Would like to pack and deliver (like UPS trucks) translations to the viewer chrome, which will then decide which language to offer the user based on:

Default language for viewer defined at org level
Custom language for viewer defined on document info
Custom language for viewer defined by reader and saved as cookie
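A sketch of how those three sources might be combined. It assumes the most specific setting wins (reader cookie, then document, then organization) and that unsupported languages are skipped. The three-letter codes and the "eng" fallback are illustrative.

```javascript
// Pick the first supported language, from most to least specific.
function resolveViewerLanguage({ readerCookie, documentLanguage, orgLanguage }, supported, fallback = "eng") {
  const candidates = [readerCookie, documentLanguage, orgLanguage, fallback];
  return candidates.find((lang) => lang && supported.includes(lang)) || fallback;
}

console.log(resolveViewerLanguage({ readerCookie: "spa", orgLanguage: "eng" }, ["eng", "spa"])); // spa
```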

List of file formats supported

Hi,

I find this project exciting. Is it only for PDF? Are text (.txt), Word (.doc), Excel (.xls) and PowerPoint (.ppt) files supported too? Can you list the file formats you support?

Add two-up view for simultaneously viewing image and text

On larger viewports, a two-up mode to compare original pages on the left to the OCR'ed text on the right would be quite useful. Challenges include maintaining a sensible relative scroll speed and the same foundational challenge as #5.
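One possible approach to the relative-scroll-speed problem, sketched under assumptions: scroll both panes to the same fraction of the current page rather than the same pixel offset, since a page's OCR text is usually much shorter than its image. The per-pane `pageTop`/`pageHeight` inputs are hypothetical.

```javascript
// source: { scrollTop, pageTop, pageHeight } for the pane being scrolled.
// target: { pageTop, pageHeight } for the same page in the other pane.
// Returns the scrollTop the target pane should move to.
function syncedScrollTop(source, target) {
  const raw = (source.scrollTop - source.pageTop) / source.pageHeight;
  const fraction = Math.min(1, Math.max(0, raw)); // clamp to the current page
  return target.pageTop + fraction * target.pageHeight;
}

console.log(syncedScrollTop(
  { scrollTop: 1500, pageTop: 1000, pageHeight: 1000 },
  { pageTop: 200, pageHeight: 400 }
)); // 400
```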

Edit capability?

Hello,
I was wondering if the document-viewer contains the UI for editing, i.e. redactions, annotations, etc. I see from DocumentCloud that it should, but going through the code I'm confused about whether this is only a 'viewer' or whether there's a way to turn on the editing capabilities. I can see that there are callback methods I can attach, but that's about the extent of it. Can someone point me to the source files to go through, or give an overview of how I can turn it on if it's there? Thanks.

Expose document metadata in viewer API

This isn't currently even provided by the document JSON, but when it is, we should have a generic viewer.api.getMetadata(fieldname) function to read it.
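A sketch of what such a getter could look like. It assumes a future document JSON with a top-level `metadata` object, which is hypothetical since the current JSON has no such field.

```javascript
// Build a getMetadata(fieldname) function over a document's metadata.
function makeGetMetadata(documentJson) {
  const metadata = (documentJson && documentJson.metadata) || {};
  return function getMetadata(fieldname) {
    // Only own fields count, so names like "toString" return undefined.
    return Object.prototype.hasOwnProperty.call(metadata, fieldname) ? metadata[fieldname] : undefined;
  };
}

// e.g. viewer.api.getMetadata = makeGetMetadata(json);
```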

Range of z-indexes for viewer is very large

The fellows over at the New Jersey Star-Ledger have indicated that the viewer CSS is messing with their headers, since our z-indexes range from 0 all the way up to 20000.

Someone needs to go through all the z-index declarations in our CSS files and bump them down in a manner that keeps all of our declarations proportional, but reduces the values by either an order of magnitude (20000 to 2000, 15999 to 1599, and 1 to 1) or into simple increments of 5.
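A sketch of the order-of-magnitude option as a text transform over a stylesheet. It divides each z-index by 10 and rounds down, leaves 0 and negative values alone, and never takes a positive value below 1. Small values collide (1 through 19 all become 1), so the low end would still need a manual check.

```javascript
// Rewrite every "z-index: N" in a CSS string, dividing N by 10.
function rescaleZIndexes(css) {
  return css.replace(/z-index:\s*(-?\d+)/g, function (match, value) {
    const z = parseInt(value, 10);
    const scaled = z <= 0 ? z : Math.max(1, Math.floor(z / 10));
    return "z-index: " + scaled;
  });
}

console.log(rescaleZIndexes(".a{z-index:20000}.b{z-index: 15999}.c{z-index:1}"));
// .a{z-index: 2000}.b{z-index: 1599}.c{z-index: 1}
```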

Getting stuck on certain searches

Paging through search results seems to fail at certain spots.

To reproduce:

http://www.documentcloud.org/documents/1004523-christie-document-exhibit-b.html#document/p1

Search for the word "general".
It shows matching page 1 of 60.
Advance to the next result twice using the "Next" button in the top-middle of the viewer.
It's now on matching page 3 of 60, which is page 30 in the document. It has no highlighted match. At this point, clicking either the "Previous" or "Next" button does nothing.

Searching for "jersey" causes the same problem when you hit the second matching page (page 9 in the document).

Occurred in both Chrome 33.0.1750.149 and Firefox 27.0.1.

Multi-document viewer

Jeremy had a think with one of his collaborators at the NYT about multi-document viewers, and this is what came of it (and of the discussion with me):

We could repurpose the existing document viewer and chaptering system.
Provide an additional level of hierarchy in the chaptering interface to distinguish separate documents.
Accept an array of document data, rather than just a single record.
Ideally, the viewer should respect both the absolute page number and the in-document page number.
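A sketch of the page-numbering part: mapping between an absolute page number across all documents and a (document, in-document page) pair. `pageCounts`, one page count per document in display order, is a hypothetical input. Pages are 1-based and documents are 0-based.

```javascript
// Absolute page -> { doc, page }, or null if out of range.
function toDocumentPage(pageCounts, absolutePage) {
  if (absolutePage < 1) return null;
  let remaining = absolutePage;
  for (let doc = 0; doc < pageCounts.length; doc++) {
    if (remaining <= pageCounts[doc]) return { doc, page: remaining };
    remaining -= pageCounts[doc];
  }
  return null; // past the last page
}

// { doc, page } -> absolute page.
function toAbsolutePage(pageCounts, doc, page) {
  let offset = 0;
  for (let i = 0; i < doc; i++) offset += pageCounts[i];
  return offset + page;
}

const counts = [10, 5, 20];
console.log(toDocumentPage(counts, 12), toAbsolutePage(counts, 1, 2)); // { doc: 1, page: 2 } 12
```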

Viewer w/Public Annotations: "Log In" / "Log out" cleanup

On a document with public annotations available, logging in should be via a button and should use the same case ("Log in") as the home page.

Current:

Currently, the home page's log in button reads "Log in":

"LOGOUT" button should follow the same formatting. "Log out"

Hash fragment links to pages don't jump to the correct top position

when the pages are not a standard height (e.g. with 6 landscape pages, the jump should double-check each page's height so as not to overshoot).

(As reported by WNYC)
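A sketch of the fix described: compute a page's scroll offset by summing the actual heights of the pages above it, rather than multiplying the page index by a standard height. `pageHeights` and `gap` are hypothetical inputs in pixels.

```javascript
// Top offset of page `pageIndex` (0-based), given every page's real height.
function pageTop(pageHeights, pageIndex, gap = 0) {
  let top = 0;
  for (let i = 0; i < pageIndex; i++) top += pageHeights[i] + gap;
  return top;
}

const heights = [1000, 700, 700, 1000]; // two landscape pages in the middle
console.log(pageTop(heights, 3, 20)); // 2460, versus 3060 if every page were 1000px
```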

Supported Document Types

What file types does document-viewer project support?

js error: "print_notes_url is not defined"

In the downloaded zip: documentcloud-document-viewer-0.1-555-g38f00b4zip
I get this error when I try and run the included production file viewer.html

print_notes_url is not defined

Maintain in-page scroll position when moving between viewer tabs

Steps to reproduce mild annoyance:

Open a document with a long page of text
Scroll down a bit (so the top of the current page extends beyond the top of the viewport)
Flip to the Text tab and scroll it down a bit as well
Flip back to the Document tab
Flip back to the Text tab

What happens: Every time you change tabs, scroll position is returned to the top of the current page.

What would be ideal: Unless the current page has changed, the viewer should maintain the old scroll position. This is particularly helpful when flipping between the Document (or Pages) view and Text, since you're often referencing the text view to check OCR accuracy and scrolling back down to the relevant bit becomes tiring.
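A sketch of the ideal behaviour: remember the scroll offset per tab and restore it only if the user is still on the same page. The tab names and state shape are hypothetical.

```javascript
// Per-tab scroll memory keyed by the page it was recorded on.
class TabScrollMemory {
  constructor() {
    this.saved = {}; // tab -> { page, scrollTop }
  }

  save(tab, page, scrollTop) {
    this.saved[tab] = { page, scrollTop };
  }

  // Offset to restore, or null to fall back to the top of the current page.
  restore(tab, currentPage) {
    const entry = this.saved[tab];
    return entry && entry.page === currentPage ? entry.scrollTop : null;
  }
}
```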

viewer.api.redraw(true) isn't redrawing page widths

viewer.api.redraw(true) is supposed to redraw pages, but isn't doing so (see an example like https://www.themarshallproject.org/documents/1363276-pew-charitable-trusts-prison-projections-2018 or take any page w/ a viewer on it, resize the page and run DV.viewers[_.keys(DV.viewers)[0]].api.redraw(true) in the console).

on getPageText, DocumentViewer returns undefined if text has not yet been presented in the viewer

Go to this URL: https://www.documentcloud.org/documents/1314000-sbrown-2014-2017-executed-contract.html

In console, run:

DV.viewers[_.keys(DV.viewers)[0]].api.getPageText(2)

It returns undefined.

Now go to page 2. Click the text tab. The text loads. Then in the console, run this again:

DV.viewers[_.keys(DV.viewers)[0]].api.getPageText(2)

This time, it returns page 2 of the document text.

It would be great if the API fetched the requested text whenever getPageText is called, even if the user has not clicked the text tab yet. I need this functionality in order to (cleanly) add span tags around each token in the document text. Trying to do this: datamade/parserator#16
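A sketch of on-demand loading with a cache. Because the text may have to be fetched, this version takes a callback instead of returning the text synchronously. `loadText(page, callback)` is a hypothetical loader (for example, a request to the page-text URL), injected so the caching logic stands on its own.

```javascript
// Wrap a page-text loader with a cache; load only on the first request per page.
// Note: concurrent requests for the same uncached page are not de-duplicated.
function makePageTextGetter(loadText) {
  const cache = {};
  return function getPageText(page, callback) {
    if (Object.prototype.hasOwnProperty.call(cache, page)) {
      callback(cache[page]);
      return;
    }
    loadText(page, function (text) {
      cache[page] = text;
      callback(text);
    });
  };
}
```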

Add padding to note view

Sometimes notes aren't drawn perfectly.

Opening this ticket to consider adding some default padding into note boundaries when displaying them.
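A sketch of what default padding could mean: grow each note's box by a fixed amount on every side, clamped to the page. The box shape ({ top, left, right, bottom } in page pixels) and the 5px default are assumptions.

```javascript
// Expand a note box by `padding` pixels, without leaving the page.
function padNoteBox(box, pageWidth, pageHeight, padding = 5) {
  return {
    top: Math.max(0, box.top - padding),
    left: Math.max(0, box.left - padding),
    right: Math.min(pageWidth, box.right + padding),
    bottom: Math.min(pageHeight, box.bottom + padding),
  };
}
```

Applying this only when displaying notes, not while drawing them, would avoid reintroducing the highlight dead zones reported in the annotation padding issue.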

pushState for the viewer

We're implementing oEmbed endpoints for DocumentCloud resources, and doing so requires real urls for each of our embeddable resource types (documents, notes, pages and searches).

On top of that, oEmbed discoverability (say via our wordpress plugin) is dependent on users being able to get access to those real resource urls.

So... it's time to update the viewer's router/history code (which is almost identical to Backbone's incidentally) to use pushState.

Viewer w/Public Annotations: Login box should use home page format

Current:

Better:

"Email" not "Email Address"
Field names inside the field
Forgot your password link available
Modal doesn't need a "login" title to the right of the DocumentCloud logo

An error handling endpoint

We currently provide an afterLoad callback, but no mechanism to check whether the ajax response for the requested resource returned a non-200 status.

One current workaround is to check, in afterLoad, whether viewer.api.getState() returns undefined.
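A sketch that packages that workaround: wrap the afterLoad callback so a failed load (getState() returning undefined) goes to an error handler. That afterLoad receives the viewer as its argument is an assumption here.

```javascript
// Route afterLoad to onLoad or onError depending on the viewer's state.
function withLoadErrorCheck(onLoad, onError) {
  return function afterLoad(viewer) {
    if (typeof viewer.api.getState() === "undefined") {
      onError(viewer);
    } else {
      onLoad(viewer);
    }
  };
}

// DV.load(documentUrl, { container: "#viewer", afterLoad: withLoadErrorCheck(ready, showError) });
```

A real error endpoint would still be better, since this can only tell that something failed, not which status code came back.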