sotera / datawakedepot

Loopback web application for administration of Datawake networks

License: Apache License 2.0

JavaScript 60.63% ApacheConf 3.10% HTML 23.07% CSS 13.13% Shell 0.07%

datawakedepot's Issues

Forensic view should default to the Team/Domain/Trail selected in the toolbar

The user should be able to alter the view, but having these default to the selected values would greatly ease the user experience.

Keep info panel open

Once the information panel is opened, it should stay open.

Steps:

Visit a site
Toggle the info panel
Click on a link
The panel is closed and the user must reopen the panel.

forensic graph shows lists of types

The forensic graph displays lists of entity types in the legend like person, agent, officeholder instead of just the primary, Person.
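
One way the legend label could be reduced to a single type is sketched below. It assumes the first entry in an entity's type list is the primary one, which the issue does not confirm; `primaryTypeLabel` is a hypothetical helper, not existing depot code.

```javascript
// Hypothetical helper: pick one legend label from a list of entity types.
// Assumption: the first type in the list is the primary type.
function primaryTypeLabel(types) {
  if (!Array.isArray(types) || types.length === 0) return 'Unknown';
  const primary = String(types[0]).trim();
  // Capitalize the first letter for display in the legend.
  return primary.charAt(0).toUpperCase() + primary.slice(1);
}
```

With the list from the issue, `primaryTypeLabel(['person', 'agent', 'officeholder'])` yields `Person`.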

Extractors need training to avoid garbage data

I'm seeing junk being extracted from pages that are not remotely the entities the extractor thinks it is. See screenshot

Toolbar DW Icon should link back to the website

When a user clicks on the datawake icon in the left corner of the plugin, the datawake website should be opened.

Toolbar Logout button text larger than button

The text on the logout button is larger than the button. The easiest fix is probably to remove the user name.

Recursive Delete is broken for Trails and Domains

Deleting a domain should also delete its domain entity types and domain items.

Deleting a trail should also delete trail urls and trail url extractions.

This is probably most easily accomplished by modifying the existing recursive delete in the service. However, the best design moving forward is to modify the model.js file for dw-domain and dw-trail to have an on-delete action. There is an existing loopback ticket regarding this issue (loopbackio/loopback-datasource-juggler#88 (comment)).
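
The intended delete order can be sketched against a plain in-memory store. The collection names and foreign-key names (`trailId`, `trailUrlId`, `domainId`) are assumptions for illustration and may not match the real dw-* models. In LoopBack, the same ordering would probably live in a `before delete` operation hook, which is not shown here.

```javascript
// Sketch only: `store` is a hypothetical in-memory stand-in for the depot data.
function deleteTrailCascade(store, trailId) {
  // Collect the trail's url ids first so their extractions can be located.
  const urlIds = new Set(
    store.trailUrls.filter(u => u.trailId === trailId).map(u => u.id)
  );
  // Remove children before parents: extractions, then urls, then the trail.
  store.trailUrlExtractions = store.trailUrlExtractions
    .filter(e => !urlIds.has(e.trailUrlId));
  store.trailUrls = store.trailUrls.filter(u => u.trailId !== trailId);
  store.trails = store.trails.filter(t => t.id !== trailId);
  return store;
}

function deleteDomainCascade(store, domainId) {
  // Remove the domain's entity types and items, then the domain itself.
  store.domainEntityTypes = store.domainEntityTypes
    .filter(t => t.domainId !== domainId);
  store.domainItems = store.domainItems.filter(i => i.domainId !== domainId);
  store.domains = store.domains.filter(d => d.id !== domainId);
  return store;
}
```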

Users should only have access to their trails

A non-admin user should only have access to their trails in the depot trails page or in their plugin drop down.

Refactor Domain Item and Domain Entity Types

Domain Items and Domain Entity Types should be chosen by the user. To do so we need to modify the Domain Item and Entity Type models.

This includes the method to add an item and type from an Extracted Entity in the Depot. The first step is to modify the models. The second step is to modify the Depot extracted entity page so that the user can click an Extracted Item to add it or its Type to a Domain. The third step will be a separate issue to enable this functionality from the Extracted Entity Panel Widget (#43).

Toolbar: Remove team dropdown (assume 1 team per user)

Users should only belong to 1 team. Remove team from the plugin and forensic views so that the user only has to select domain and trail.

TrailUrl Search terms should be persisted

Forensic currently calculates the searchTerms for a url each time it runs. Instead we need to determine and persist the search terms when the url is added during trailing.

I've added a searchTerms property to the dw-trail-url.json model but nothing currently populates it.
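
A minimal sketch of how search terms might be derived once, when the url is added, so they can be stored in `searchTerms`. The list of query parameter names is an assumption, and `extractSearchTerms` is a hypothetical helper rather than existing depot code.

```javascript
// Assumption: search pages carry the query in one of these parameters.
const QUERY_PARAMS = ['q', 'query', 'p', 'search'];

// Returns the distinct, lower-cased search phrases found in a url's query string.
function extractSearchTerms(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (err) {
    return []; // not a parseable absolute URL
  }
  const terms = [];
  for (const name of QUERY_PARAMS) {
    for (const value of url.searchParams.getAll(name)) {
      const term = value.trim().toLowerCase();
      if (term && !terms.includes(term)) terms.push(term);
    }
  }
  return terms;
}
```

The result would be written to the trail url record at trailing time, so Forensic only has to read it back.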

Memex Domain Export.

We need to be able to export a domain for use by the crawling teams. The domain should be the aggregate of all of the trails within a domain (At some point we may want to be able to choose specific trails.)

The format should be similar to

{
  "urls": ["http://la.backpage.com/1234", "http://la.backpage.com/1234", "http://www.wikipedia.com"], //All visited urls 
  "topLevelDomains": ["http://la.backpage.com"], //common top level domains 
  "searchTerms": ["escorts","las angeles","massage"], //all search terms
  "domainEntities": ["cherry","mimi","pasadena"], //entities added by a user.
  "domainEntityTypes": ["person","place","bitcoin address"],  //entity types added by a user.
  "commonEntities": ["massage", "parlor","las angeles","pasadena"]  //top 20 most extracted entities.
}
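
A rough sketch of how an export in this shape could be assembled from a domain's trail urls. The input shape (`url`, `searchTerms`, `extractions` on each trail url) is an assumption, and `buildDomainExport` is a hypothetical name. "Top level domain" is approximated here by the url origin.

```javascript
// Sketch: aggregate all trail urls of a domain into the export format above.
// trailUrls: [{ url, searchTerms: [...], extractions: [...] }] (assumed shape)
function buildDomainExport(trailUrls, domainItems, domainEntityTypes, topN = 20) {
  const urls = [];
  const origins = new Set();
  const searchTerms = new Set();
  const counts = new Map();

  for (const tu of trailUrls) {
    urls.push(tu.url); // every visit is kept, including repeats
    try {
      origins.add(new URL(tu.url).origin);
    } catch (err) {
      // skip urls that cannot be parsed
    }
    for (const term of tu.searchTerms || []) searchTerms.add(term);
    for (const entity of tu.extractions || []) {
      counts.set(entity, (counts.get(entity) || 0) + 1);
    }
  }

  // Most frequently extracted entities first, limited to topN.
  const commonEntities = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([value]) => value);

  return {
    urls,
    topLevelDomains: [...origins],
    searchTerms: [...searchTerms],
    domainEntities: domainItems,
    domainEntityTypes,
    commonEntities
  };
}
```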

Word cloud

Build a wordcloud in the forensic view showing the extracted entities for a trail.

Create Suggested Link Widget

Create a widget for incorporating search results from other services. This will most likely include a model change to relate suggested urls to a given TrailUrl.

Info panel doesn't open on some sites

The info panel does not open on some sites. Examples include: reuters.com, wsj.com

Forensic Entity Grid Urls

The entity grid in forensic is only displaying a single url for entities that were extracted from multiple urls

Refresh loses the user

When clicking refresh in the browser, the current user is lost throughout the app requiring the user to sign back in.

Toolbar Toggle button name

The name "toggle" is ambiguous; it should be changed to "page info".

Test the Firefox Add-on with Tor browser

We need to test that the plugin works with the Tor browser.

Adding a team to a User gives an error on save

If you try to add a user to a “Team”, you get an error on Save and the team does not show in the table. The error says something about the Amino User instance not being valid (422).

You can go to the Teams page and add Users to the Team from there as a workaround.

Create a trail from the Toolbar

The user should be able to create a trail from the plugin.

Plugin should ignore the depot app pages

The plugin currently tracks the user as they use the depot app. The plugin should know to ignore any URLs related to the depot.
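
The check could be as small as the sketch below, assuming the plugin knows the depot's base url from its configuration. `isDepotUrl` is a hypothetical helper, and the example base url in the usage note is made up.

```javascript
// Sketch: decide whether a visited page belongs to the depot app itself.
function isDepotUrl(pageUrl, depotBaseUrl) {
  try {
    const page = new URL(pageUrl);
    const depot = new URL(depotBaseUrl);
    // Normalize to a trailing slash so "/depot" does not match "/depotx".
    const base = depot.pathname.endsWith('/')
      ? depot.pathname
      : depot.pathname + '/';
    return page.origin === depot.origin &&
      (page.pathname + '/').startsWith(base);
  } catch (err) {
    return false; // unparseable urls are never treated as depot pages
  }
}
```

For a depot served at `http://localhost:3000/` (an illustrative address), the plugin would skip trailing for any page where `isDepotUrl(pageUrl, 'http://localhost:3000/')` is true.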

Info panel opens inside of ads

On some sites, the info panel opens inside of an ad instead of in the main page. Cnn.com is an example.

TrailUrls should be filtered by Trail

The Trail URLs page should have a drop down with the user's trails. Once a trail is selected, the list should populate with the URLs for that trail.

User dashboard

A normal user should only have access to Domains, Trails, and forensic.

URLExtractions list should be filtered by domain, trail, and url

The URLExtractions list view needs to be filtered by domain, trail, and URL. Only the current user's domains and trails should be available.

The current behavior causes OOM errors and can take a long time to load while not providing any value to the user.

As a work around, we could remove this page and use only the entities grid in forensic.
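
A hedged sketch of a query filter that would keep the list bounded. It uses LoopBack's `where` / `limit` / `skip` / `order` filter keys; the foreign-key name `trailUrlId` and the helper `buildExtractionFilter` are assumptions, not names taken from the depot models.

```javascript
// Sketch: build a bounded LoopBack-style filter for the extractions list.
function buildExtractionFilter(trailUrlId, page = 0, pageSize = 50) {
  // Refuse to query everything: a url must be selected first.
  if (!trailUrlId) {
    throw new Error('a trail url must be selected before querying');
  }
  return {
    where: { trailUrlId }, // hypothetical foreign-key name
    limit: pageSize,       // never return more than one page
    skip: page * pageSize,
    order: 'id ASC'
  };
}
```

Requiring a selected url before any query runs means the page never attempts to load every extraction at once.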

Add extractor source to extractions

For analysis, especially when using multiple extractors that do the same task (NER), we need to add an extractor source field to the URL extraction and have every extractor record its name there.

Configurable context menu in plugin

The plugin should have a context menu that is configurable to add integration with other memex tools such as search in dig, tellfinder, or imagespace. It should also be context specific for selected text or image.

Display ExtractedEntities for a URL in the Browser Panel

Toolbar - highlight Data Items on the page

The user should be able to highlight Data Items on the page from the toolbar.

Add new values to the User

Need to update the user account whenever an addition or change is made for that user involving Teams, Domains, or Trails. Currently the user has to log out and back in again to see the change reflected.

Domain Entity Types should be shareable across multiple domains

(Mike and I talked about this one… realize it’s a future implementation goal)

Entity Types should be shareable across multiple domains. For instance a type of ‘email’ or ‘address’ might be valid in several domains.

Page importance ("Page Rank" panel widget)

The user should be able to mark a page as either relevant or irrelevant. This could be as simple as a check/x button in the plugin or div. The default should be unspecified.

Forensic page Entities & Visited Links tabs have table issues

Two issues are immediately apparent on both of these tabs. First, if one of the columns (e.g. URL) is long, for example having multiple URLs listed, it makes the column very large. This pushes subsequent columns to the right. It easily can create a situation where the table width is then wider than your screen. This would be only a minor annoyance if there was horizontal scroll capability, but there isn't, so anything pushed beyond your window view cannot be seen. You cannot drag/resize the column widths either to attempt to resolve this.

The second issue is that the tables appear to have sorting capability at each column heading, but it does not appear to work.

Multiple pages scraped for each page visit

If you visit a page, especially one with ads, the trail will show multiple pages for that single visit. Reuters.com is a good example.

Find way to remove "get plugin" button if plugin is already installed

Need to find a way to hide the download button for the plugin if the browser already has the datawake plugin installed.

Add Domain Items and Entity Types via the Extracted Entity Panel Widget

This is Part 3 of #42, Domain Items and Entity Types should be chosen by the user. This task is to enable the functionality created there within the Extracted Entity Panel Widget.

Word Cloud doesn't render correctly

The word cloud renders collapsed if its tab isn't active when the graph button is clicked.

Add refresh button to Depot Module list pages (trail, domain)

User needs a way to refresh the contents of the page (Trail list, Trail Url list, Trail Url Extraction list, Domain list, Domain Entity Types list, Domain Items list) without having to refresh the browser page.

Depot Scrolling can break

I've noticed that if you have "scrolling" set for the depot and view the URLs visited tab when it has enough content to scroll off the browser window, you can scroll to the bottom and then be unable to get back to the top.

Not sure what triggers this because it doesn't happen all the time. I thought it was triggered by starting a trail on another tab and then coming back to the depot where you were scrolled to the bottom, but that doesn't cause it reliably either. When the error occurs, a white band for the title of the page (i.e. Forensic) extends through the side bar as well. When in that state, scrolling is broken. I suspect this is a bug with the admin template.

Adding new Teams, Domains or Trails to the Depot requires restart to see them in the toolbar

If you add any of these items via the web interface for the depot, you won't see them in the pulldowns for the plugin toolbar until you log out and then back in again.

Create Wordcloud widget for panel

The source of the wordcloud (word list) should be configurable (trail, domain, extracted items or any other word list).

Clean up Domain and Trail Import/Export and verify functionality.

The models for each have changed a bit since this code was originally written. Need to confirm and fix anything missing on these.

Stanbol Extractor not returning anything.

Trails list should only show trails for the current user

The trails list should only show trails for the current user.

Convert existing panel to Firefox Sidebar

https://developer.mozilla.org/en-US/Add-ons/SDK/Low-Level_APIs/ui_sidebar

This should allow us to close #52 and #51

Larger content pages may break extraction

Seems that if there's a lot to extract on a page, the extractor just gives up. Small pages seem to be extracting ok.

Deleting an extractor from depot list hangs depot page

This is an odd bug, but easy to repro. Branch being tested is 32-extractor-source-from-develop (but that may not be important to this issue specifically).

If you have multiple extractors in the extractors management page of the depot and click the icon to delete one of them, you get a confirmation message, to which you select yes. The list goes blank as if it deleted all the other extractors too, and the loading bar at the top of the page starts scrolling. It gets almost to the end and just hangs there. The only way to resolve this is to log out and then back in again. When you do, you'll see that the one you deleted is gone and the others did not get deleted inadvertently.

User can view all users

A user is able to view all users on the system. This is a security concern as users should not know what other users and teams are on the system.

Selecting Trail Extractions crashes node server

On branch 32-extractor-source-from-develop, after trailing and getting extractions, if I click on the Trail Extractions link in the sidebar, the node server crashes with the following error every time:

/home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/express/lib/response.js:242
var body = JSON.stringify(val, replacer, spaces);
^
RangeError: Invalid string length
at join (native)
at Object.stringify (native)
at ServerResponse.json (/home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/express/lib/response.js:242:19)
at Object.sendBodyJson as sendBody
at HttpContext.done (/home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/strong-remoting/lib/http-context.js:632:22)
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/strong-remoting/lib/rest-adapter.js:459:11
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:251:17
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:154:25
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:248:21
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:612:34