sotera / datawakedepot
Loopback web application for administration of Datawake networks
License: Apache License 2.0
The user should be able to alter the view, but having these default to the selected values would greatly ease the user experience.
Once the information panel is opened, it should stay open.
Steps:
In the legend, the forensic graph lists every entity type (for example person, agent, officeholder) instead of just the primary type, Person.
When a user clicks on the datawake icon in the left corner of the plugin, the datawake website should be opened.
Deleting a domain should also delete its domain entity types and domain items.
Deleting a trail should also delete trail urls and trail url extractions.
This is probably most easily accomplished by modifying the existing recursive delete in the service. However, the best design moving forward is to modify the model.js file for dw-domain and dw-trail to have an on-delete action. There is an existing loopback ticket regarding this issue (loopbackio/loopback-datasource-juggler#88 (comment)).
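A minimal sketch of the model-level approach, assuming LoopBack operation hooks (`observe('before delete', ...)`) are available in the installed version. The helper name `cascadeDelete` and the foreign key name passed to it (for example `dwTrailId`) are hypothetical and would need to match the real dw-domain and dw-trail relations.

```javascript
// Hypothetical helper: before parent instances are deleted, delete the
// child rows that point at them. `foreignKey` is an assumed property name.
function cascadeDelete(Parent, Child, foreignKey) {
  Parent.observe('before delete', function (ctx, next) {
    // ctx.where describes the parent instances about to be removed.
    Parent.find({ where: ctx.where, fields: { id: true } }, function (err, parents) {
      if (err) return next(err);
      var ids = parents.map(function (p) { return p.id; });
      if (ids.length === 0) return next();
      var where = {};
      where[foreignKey] = { inq: ids };
      // Remove the children first, then let the parent delete continue.
      Child.destroyAll(where, function (destroyErr) { next(destroyErr); });
    });
  });
}
```

Registering the helper once per relation (domain to entity types, domain to items, trail to trail urls, trail url to extractions) would in principle let each level clean up the one below it, although that chaining has not been verified against this codebase.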
A non-admin user should only have access to their trails in the depot trails page or in their plugin drop down.
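One way to express this restriction is to narrow every trail query to the requesting user before it reaches the datasource. The sketch below assumes a LoopBack-style filter object; the `isAdmin` flag and the `userId` owner property are hypothetical names, not taken from the actual models.

```javascript
// Returns a copy of `filter` that only matches trails owned by `user`,
// unless the user is an admin. Property names here are assumptions.
function scopeFilterToUser(filter, user) {
  var scoped = Object.assign({}, filter || {});
  if (user.isAdmin) return scoped;
  var ownerClause = { userId: user.id };
  // Combine with any existing where clause rather than overwriting it.
  scoped.where = scoped.where ? { and: [scoped.where, ownerClause] } : ownerClause;
  return scoped;
}
```

Applying the same function on the server for both the depot trails page and the plugin drop-down would keep the two views consistent.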
Domain Items and Domain Entity Types should be chosen by the user. To do so we need to modify the Domain Item and Entity Type models.
This includes the method to add an item and type from an Extracted Entity in the Depot. The first step is to modify the models. The second step is to modify the Depot extracted entity page so that the user can click an Extracted Item to add it or its Type to a Domain. The third step will be a separate issue to enable this functionality from the Extracted Entity Panel Widget (#43).
Users should only belong to 1 team. Remove team from the plugin and forensic views so that the user only has to select domain and trail.
Forensic currently calculates the searchTerms for a url each time it runs. Instead we need to determine and persist the search terms when the url is added during trailing.
I've added a searchTerms property to the dw-trail-url.json model but nothing currently populates it.
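As a starting point, the search terms could be derived from the url's query string at the moment the url is recorded, and the result stored in the `searchTerms` property. This is only a sketch: the function name and the list of query parameter names are illustrative assumptions, and the logic Forensic currently uses may differ.

```javascript
// Query-string parameters assumed to carry a search phrase (illustrative list).
var SEARCH_PARAMS = ['q', 'query', 'search', 'p'];

// Returns the distinct, lower-cased search phrases found in a url,
// or an empty array when the url cannot be parsed or has none.
function extractSearchTerms(rawUrl) {
  var parsed;
  try {
    parsed = new URL(rawUrl);
  } catch (e) {
    return [];
  }
  var terms = [];
  SEARCH_PARAMS.forEach(function (name) {
    parsed.searchParams.getAll(name).forEach(function (value) {
      var term = value.trim().toLowerCase();
      if (term && terms.indexOf(term) === -1) terms.push(term);
    });
  });
  return terms;
}
```

Computing the terms once at trailing time means Forensic would only need to read the stored array instead of re-parsing every url on each run.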
We need to be able to export a domain for use by the crawling teams. The export should be the aggregate of all of the trails within a domain. (At some point we may want to be able to choose specific trails.)
The format should be similar to:
{
"urls": ["http://la.backpage.com/1234", "http://la.backpage.com/1234", "http://www.wikipedia.com"], //All visited urls
"topLevelDomains": ["http://la.backpage.com"], //common top level domains
"searchTerms": ["escorts","los angeles","massage"], //all search terms
"domainEntities": ["cherry","mimi","pasadena"], //entities added by a user.
"domainEntityTypes": ["person","place","bitcoin address"], //entity types added by a user.
"commonEntities": ["massage", "parlor","los angeles","pasadena"] //top 20 most extracted entities.
}
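The aggregation behind this format could look roughly like the sketch below. The input shape (a domain object with `items`, `entityTypes`, and `trails`, each trail holding `urls` with optional `searchTerms` and `extractions`) is an assumption made for illustration and does not mirror the actual model definitions.

```javascript
// Builds the export object from an assumed in-memory domain structure.
// `topN` limits commonEntities and defaults to 20, as in the format above.
function buildDomainExport(domain, topN) {
  var limit = topN || 20;
  var urls = [];
  var origins = new Set();
  var searchTerms = new Set();
  var counts = new Map();

  domain.trails.forEach(function (trail) {
    trail.urls.forEach(function (entry) {
      urls.push(entry.url); // every visit is kept, including repeats
      try {
        origins.add(new URL(entry.url).origin);
      } catch (e) {
        // unparsable urls contribute no top level domain
      }
      (entry.searchTerms || []).forEach(function (term) { searchTerms.add(term); });
      (entry.extractions || []).forEach(function (x) {
        counts.set(x.value, (counts.get(x.value) || 0) + 1);
      });
    });
  });

  // Most frequently extracted entity values first.
  var commonEntities = Array.from(counts.entries())
    .sort(function (a, b) { return b[1] - a[1]; })
    .slice(0, limit)
    .map(function (pair) { return pair[0]; });

  return {
    urls: urls,
    topLevelDomains: Array.from(origins),
    searchTerms: Array.from(searchTerms),
    domainEntities: domain.items || [],
    domainEntityTypes: domain.entityTypes || [],
    commonEntities: commonEntities
  };
}
```

Whether repeated urls should be collapsed in the export is left open here; the sketch keeps them because the sample format lists the same url twice.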
Build a wordcloud in the forensic view showing the extracted entities for a trail.
for incorporating search results from other services. This will most likely include a model change to relate suggested urls to a given TrailUrl.
The info panel does not open on some sites. Examples include reuters.com and wsj.com.
The entity grid in forensic displays only a single url for entities that were extracted from multiple urls.
When clicking refresh in the browser, the current user is lost throughout the app, requiring the user to sign back in.
The name toggle is ambiguous; it should be renamed to page info.
We need to test that the plugin works with the tor browser.
If you try to add a user to a “Team”, you get an error on Save and the user doesn’t show up in the table. The error says something about the Amino User instance not being valid (422).
You can go to the Teams page and add Users to the Team from there as a workaround.
The user should be able to create a trail from the plugin.
The plugin currently tracks the user as they use the depot app. The plugin should know to ignore any urls related to the depot.
The Trail URLs page should have a drop down with the user's trails. Once a trail is selected, the list should populate with the URLs for that trail.
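A simple way for the plugin to skip depot pages is to compare the origin of each visited url with the origin of the configured depot. The function below is a sketch under that assumption; `shouldTrack` and the `depotBaseUrl` parameter are hypothetical names, and a depot hosted under a path on a shared host would need a stricter prefix check.

```javascript
// Returns false for pages served from the depot's own origin (and for
// urls that cannot be parsed), true for everything else.
function shouldTrack(pageUrl, depotBaseUrl) {
  var page;
  var depot;
  try {
    page = new URL(pageUrl);
    depot = new URL(depotBaseUrl);
  } catch (e) {
    return false;
  }
  return page.origin !== depot.origin;
}
```

For example, with a depot at `http://depot.example.com`, a visit to `http://depot.example.com/#/app/trails` would be ignored while a visit to `http://www.reuters.com/` would still be recorded.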
A normal user should only have access to Domains, Trails, and forensic.
The URLExtractions list view needs to be filtered by domain, trail, and URL. Only the current user's domains and trails should be available.
The current behavior causes OOM errors and can take a long time to load while not providing any value to the user.
As a workaround, we could remove this page and use only the entities grid in forensic.
For analysis, especially when using multiple extractors that do the same task (NER), we need to add an extractor source field to the URL extraction and have every extractor record its name.
The plugin should have a context menu that is configurable to add integration with other memex tools such as search in dig, tellfinder, or imagespace. It should also be context specific for selected text or image.
Display ExtractedEntities for a URL in the Browser Panel
Let the user highlight Data Items on the page from the toolbar
Need to update the user account whenever an addition or change is made for that user involving Teams, Domains, or Trails. Currently the user has to log out and back in again to see the change reflected.
(Mike and I talked about this one; we realize it’s a future implementation goal.)
Entity Types should be shareable across multiple domains. For instance a type of ‘email’ or ‘address’ might be valid in several domains.
The user should be able to mark a page as either relevant or irrelevant. This could be as simple as a check/x button in the plugin or div. The default should be unspecified.
Two issues are immediately apparent on both of these tabs. First, if one of the columns (e.g. URL) is long, for example having multiple URLs listed, it makes the column very large. This pushes subsequent columns to the right. It easily can create a situation where the table width is then wider than your screen. This would be only a minor annoyance if there was horizontal scroll capability, but there isn't, so anything pushed beyond your window view cannot be seen. You cannot drag/resize the column widths either to attempt to resolve this.
The second issue with the tables is that, even though they appear to have sorting capability at each column heading, sorting does not appear to work.
If you visit a page, especially one with ads, the trail will show multiple entries for that page. Reuters.com is a good example.
Need to find a way to hide the download button for the plugin if the browser already has the datawake plugin installed.
This is Part 3 of #42, Domain Items and Entity Types should be chosen by the user. This task is to enable the functionality created there within the Extracted Entity Panel Widget.
The word cloud renders collapsed if its tab isn't active when the graph button is clicked.
User needs a way to refresh the contents of the page (Trail list, Trail Url list, Trail Url Extraction list, Domain list, Domain Entity Types list, Domain Items list) without having to refresh the browser page.
I've noticed a scrolling problem in the depot. If "scrolling" is set and the URLs visited tab has enough content to run off the bottom of the browser window, scrolling to the bottom can leave you unable to get back to the top.
Not sure what triggers this because it doesn't happen all the time. I thought it was triggered by starting a trail on another tab then coming back to the depot where you were scrolled to the bottom, but that doesn't reproduce it reliably either. When the error occurs, a white band for the title of the page (i.e. Forensic) extends through the side bar as well. When in that state, scrolling is broken. I suspect this is a bug in the admin template.
If you add any of these items via the web interface for the depot, you won't see them in the pulldowns for the plugin toolbar until you log out then back in again.
The source of the wordcloud (word list) should be configurable (trail, domain, extracted items or any other word list).
The models for each have changed a bit since this code was originally written. Need to confirm and fix anything missing on these.
The trails list should only show trails for the current user.
Seems that if there's a lot to extract on a page, the extractor just gives up. Small pages seem to be extracting ok.
This is an odd bug, but easy to repro. Branch being tested is 32-extractor-source-from-develop (but that may not be important to this issue specifically).
If you have multiple extractors in the extractors management page of the depot and click the icon to delete one of them, you get the confirmation message, to which you select yes. The list then goes blank, as if it had deleted all the other extractors too, and the loading bar at the top of the page starts scrolling. It gets almost to the end and just hangs there. The only way to resolve this is to log out and then back in again. When you do, you'll see that the one you deleted is gone and the others were not deleted.
A user is able to view all users on the system. This is a security concern as users should not know what other users and teams are on the system.
On branch 32-extractor-source-from-develop, after trailing and getting extractions, if I click on the Trail Extractions link in the sidebar, the node server crashes with the following error every time:
/home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/express/lib/response.js:242
var body = JSON.stringify(val, replacer, spaces);
^
RangeError: Invalid string length
at join (native)
at Object.stringify (native)
at ServerResponse.json (/home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/express/lib/response.js:242:19)
at Object.sendBodyJson as sendBody
at HttpContext.done (/home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/strong-remoting/lib/http-context.js:632:22)
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/strong-remoting/lib/rest-adapter.js:459:11
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:251:17
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:154:25
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:248:21
at /home/ubuntu/src/DatawakeDepot/node_modules/loopback/node_modules/async/lib/async.js:612:34
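The RangeError is raised inside `JSON.stringify`, which suggests the serialized response exceeds the maximum string length the JavaScript engine allows. One possible mitigation is to cap how many extractions a single request can return. The sketch below normalizes a LoopBack-style filter so that `limit` never exceeds a maximum page size; the function name and the value of `MAX_PAGE_SIZE` are assumptions chosen for illustration.

```javascript
// Assumed upper bound on rows per request; tune to the real data size.
var MAX_PAGE_SIZE = 100;

// Returns a copy of `filter` with a bounded `limit` and a non-negative `skip`.
function clampFilter(filter) {
  var out = Object.assign({}, filter || {});
  var limit = parseInt(out.limit, 10);
  out.limit = (limit > 0 && limit <= MAX_PAGE_SIZE) ? limit : MAX_PAGE_SIZE;
  var skip = parseInt(out.skip, 10);
  out.skip = skip > 0 ? skip : 0;
  return out;
}
```

If the installed LoopBack version supports remote hooks, a function like this could be applied to the incoming filter before the find runs, so that the Trail Extractions view pages through results instead of loading them all at once.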