newman's Introduction

Newman

Email is a challenge to visualize. The Newman tool and accompanying analytics contain a processing component, email analytics, visualization and discovery tools.

Newman can quickly analyze and explore email using analytics and visualization techniques - things not possible with traditional email applications.

Check out the Quick Start at http://sotera.github.io/newman/

Applied Technology

MITIE: MIT Information Extraction - MIT-LL
Topic Clustering - MIT-LL
ActiveSearch - CMU
Tangelo - Kitware
Tika - Apache

newman's People

Contributors

cramsay3, eickovic, jakobzlee, justinlueders

newman's Issues

Change default sorting of emails

By default, have the Email View pane display emails by Date from most recent to oldest. Currently, the Email View pane list is sorted by ID which has no meaning to the analyst.

Location: Email View pane
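
A minimal sketch of the requested ordering, assuming each email record carries an ISO-style date string. The record layout and the `sort_most_recent_first` helper are hypothetical and are not taken from the Newman code base.

```python
from datetime import datetime

# Hypothetical email records; the field names are assumptions for illustration.
emails = [
    {"id": "a1", "date": "2015-06-10T09:30:00"},
    {"id": "a2", "date": "2015-06-11T14:05:00"},
    {"id": "a3", "date": "2015-06-09T18:45:00"},
]


def sort_most_recent_first(records):
    """Return the records ordered by date, newest first."""
    return sorted(
        records,
        key=lambda r: datetime.strptime(r["date"], "%Y-%m-%dT%H:%M:%S"),
        reverse=True,
    )


sorted_ids = [r["id"] for r in sort_most_recent_first(emails)]
```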

Change order of Newman tab listings

Change order of tab listings to Rank/Entities/Topics/Email/Attachments. Contents of Rank, Entities, and Topics do not change after account is ingested. Email and Attachments change based on selections from the graph and the Email View pane.

Network graph has disconnected components

This is because a BCC email doesn't explicitly list any recipients. Need to change this so that the individual whose email it is gets explicitly added as a BCC recipient. This should eliminate the disconnected components.
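
The proposed fix could look roughly like the sketch below. It assumes a message is a dictionary with `from`, `to`, `cc` and `bcc` keys and that the mailbox owner's address is known at ingest time; `build_edges` and these field names are hypothetical, not Newman's actual ingest code.

```python
def build_edges(message, owner):
    """Return (sender, recipient) graph edges for one message.

    If the message lists no recipients at all (typical for a received BCC),
    the mailbox owner is added as the explicit recipient so the sender
    stays connected to the rest of the graph.
    """
    recipients = (
        message.get("to", []) + message.get("cc", []) + message.get("bcc", [])
    )
    if not recipients:
        recipients = [owner]
    return [(message["from"], r) for r in recipients]


# A received BCC message with no visible recipients.
bcc_edges = build_edges({"from": "sender@example.com"}, "owner@example.com")
```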

Attachments can't be displayed

Many attachments can’t be displayed.
Steps to Reproduce:

  • Select show attachments for [email protected]
  • Sort by type
  • Scroll down and select link for Brett Signature.jpg
    Result: new tab displays a 404 Not Found error message.

Email View pane covers email text

The Email View pane covers the email text, which makes it hard to scroll through an email while the pane is displayed. One option is to pin the vertical scrollbar based on whether the Email View pane is displayed.
Steps to reproduce:

  • Select the arrow at the bottom middle of the application to display the Email View pane.
  • Sort by Body size
  • Select an email with a large body
  • Select the Email tab at the top
  • Use vertical scroll bar to scroll to the bottom of the email.

Issue: the analyst can’t see the bottom part of the email without selecting the arrow to minimize the Email View pane.

ingest error

vagrant@vagrant-ubuntu-trusty-32:/srv/software/newman/work_dir$ cat ingest_20150611003032.err.log
Traceback (most recent call last):
File "./ingest/mitie/mitie_entity_ingest_file.py", line 65, in
r = extract(email_id, body)
File "./ingest/mitie/mitie_entity_ingest_file.py", line 29, in extract_entities
for rng, tag in entities ]

ValueError: too many values to unpack
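
One possible cause, not confirmed by the log, is that each entity tuple carries more than two elements (for example a trailing confidence score), so the two-name unpacking `for rng, tag in entities` fails. The sketch below shows a tolerant way to read such tuples; `entity_pairs` is a hypothetical helper, and the sample data is made up.

```python
def entity_pairs(entities):
    """Yield (range, tag) pairs from entity tuples of length two or more.

    Indexing the first two positions avoids "too many values to unpack"
    when a tuple has extra trailing fields such as a score.
    """
    for entity in entities:
        yield entity[0], entity[1]


# Made-up sample: one three-element tuple and one two-element tuple.
sample = [(range(0, 2), "PERSON", 0.93), (range(5, 6), "LOCATION")]
pairs = list(entity_pairs(sample))
```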

Topics and their relevance scores?

Not sure how Topics are generated. I selected the first Topic (jobs state governor candidate rail), which kicked off a search and returned a list of 541 Documents. The subject of the very first document contained none of the topic terms. When I read the document, governor had one hit and rail had two. However, the document listed many URLs; I followed them, and the linked pages did contain these topic terms. How did this document get listed as the most relevant?

ingest issue

ingest err.log
Traceback (most recent call last):
File "./ingest/mitie/mitie_entity_ingest_file.py", line 65, in
r = extract(email_id, body)
File "./ingest/mitie/mitie_entity_ingest_file.py", line 29, in extract_entities
for rng, tag in entities ]
ValueError: too many values to unpack

ingest tee.log
ingesting - [email protected]
working dir /srv/software/newman
ingest data
tx: 1
ingested count - 0
entity extraction
loading NER model...

Tags output by this NER model: ['PERSON', 'LOCATION', 'ORGANIZATION', 'MISC']
class 'mysql.connector.errors.InternalError

Am I doing something wrong?

Add ability to hide/show arrows

Add ability to hide/show arrows. Currently, the graph shows some arrows on the initial display, but when you drill down to a small subset of nodes the arrows go away. A toggle like the one provided for labels would be nice.

Firefox does not prompt for credentials

Firefox browser does not prompt for credentials. Steps to Reproduce: Clicked on URL for public facing newman, default user was already logged in (kmrindfleisch) using Firefox browser (have never provided credentials through this browser). Followed same URL on Chrome browser, was prompted for credentials.

Add Combine Entities feature

Allow user to combine entities. Walker, Scott Walker, and Scott were all obviously the same Entity. Same with Kelly Rindfleisch and ‘Kelly Rindfleisch.

Allow user to specify topics

Allow user to specify topics. For example, ‘fraud money billing payoff spy’ would be an excellent set of topics for this analysis. These user specified topics (i.e. search terms) could then be used to score the emails.
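
As a rough illustration of scoring emails against user-specified terms, the sketch below counts term occurrences in an email body. This is a plain term count chosen for simplicity, not the topic model Newman uses; `score_email` is a hypothetical name.

```python
import re


def score_email(body, topic_terms):
    """Count how often any of the space-separated topic terms occurs in body."""
    words = re.findall(r"[a-z0-9']+", body.lower())
    terms = set(t.lower() for t in topic_terms.split())
    return sum(1 for w in words if w in terms)


score = score_email(
    "The billing fraud involved money. Money was moved.",
    "fraud money billing payoff spy",
)
```

Emails could then be ranked by this score, highest first.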

Graph highlighting for Rank

Issue with highlighting nodes by Rank.
Steps to Reproduce:

  • Select options to display graph by Rank

Result: the highlighting for Rank covers multiple nodes. Need to limit the size.

Email counts in dashboard not as expected

There are a few issues with the counts shown in the dashboard.

1st Issue: Sent + Received does not equal Total Email. This is actually a non-issue. Total Email count is being filtered by the date range picked. So, Total Email will be equal to or less than Sent plus Received. A tooltip or help documentation would help explain this to new users.

Select a dataset, initiate a search for "test". The left hand pane updates but incorrectly.
[email protected] 5917, 6498, 11770
[email protected] 1218, 823, 1709
[email protected] 0, 0, 4
test 0, 0, 1007

2nd issue: Total Email count not updating after a search
-- The 11770 and 1709 values did not update. All the values (not including the 1007 for the test search) should be less than 1007.

3rd issue: Sent plus Received should be greater than or equal to Total Email
-- How can you have 0 Sent and 0 Received and get 4 Total Email? I looked at the four emails for [email protected] and Sent and Received should be 2.

4th issue: Confusing counts on Search term.
-- The record for search term (in this example test) should probably show null (or n/a) for Sent and Received.
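
The expectation in the 3rd issue (Sent plus Received should be greater than or equal to Total Email) can be written as a simple consistency check. The sketch uses the counts reported above with placeholder account names, since the real addresses are not shown; `inconsistent_rows` is a hypothetical helper.

```python
# (sent, received, total) per account; names are placeholders.
dashboard_rows = {
    "account_a": (5917, 6498, 11770),
    "account_b": (1218, 823, 1709),
    "account_c": (0, 0, 4),
}


def inconsistent_rows(rows):
    """Return the accounts where sent + received is less than total."""
    return [
        name
        for name, (sent, received, total) in rows.items()
        if sent + received < total
    ]


flagged = inconsistent_rows(dashboard_rows)
```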

Firefox issue copying text

Unable to highlight/copy text from displayed emails in Firefox browser. Works in Chrome.

Location: Email tab

Exclude grey as a node color

Exclude grey as a node color. Reading grey text on the grey background of the Email View pane is challenging.

Location: Graph and Entity View panes

Allow analyst ability to merge entities

Enhancement Request:

Allow the analyst to merge entities. Jeb Bush and Jeb are obviously the same entity, let the user merge them and adjust all the statistics.
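
A minimal sketch of what a merge could do to the statistics, assuming entity statistics are simple mention counts and the analyst supplies an alias-to-canonical mapping. Both the data layout and `merge_entities` are hypothetical.

```python
from collections import Counter


def merge_entities(counts, aliases):
    """Fold the counts of aliased entities into their canonical names.

    counts:  entity name -> mention count
    aliases: alias name  -> canonical name chosen by the analyst
    """
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return dict(merged)


merged_counts = merge_entities(
    {"Jeb Bush": 10, "Jeb": 4, "Florida": 3},
    {"Jeb": "Jeb Bush"},
)
```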

Change Topic Scores mouseover

In Topic Scores, change mouse over selection to anywhere in column. Currently, many topics have a low score. Trying to mouse over low scoring columns to get the topic popup list is challenging.

Search by Community isn't working.

Search by Community isn’t working.
Steps to Reproduce:
• Select a node
• Select Search by Community
Result: nothing happens; no search is initiated.
Expected Result: Display only those nodes in that community; list all the emails from that community in the Email View pane. ???

Improve panel layout after adjusting Zoom

System: Windows 7 OS, Chrome Version 47.0.2526.111 m
Steps to reproduce:

  1. Open Chrome browser and navigate to 10.1.70.162:8787. See attachment.
    Issue: Dashboard does not use entire real-estate. Layout looks good in that the left hand widget (list of dataset, search term and top 10 email addresses) does not overlap with the right hand pane (sent/received/attached, Entities/Topics, Ranks/Domains/Communities/Attach Types widgets).
  2. Enter a search term (i.e. test) and then select search term. See attachment.
    Issue: Left hand pane (graph) overlaps the right hand pane (documents list)
  3. Select any email. See attachment.
    Issue: Left hand pane does not display where expected. Overlaps with right hand pane and is displayed more at the bottom of the browser.
  4. Select and use mouse wheel to adjust zoom level to 125%. See attachment.
    Issue: too many to list. Most annoying though is the forward/back feature button is hidden under ID: and can’t be selected.
    Attachments: defaultzoom, graphoverlaplist, emailoverlaplist, zoom125

Visually indicate which email/row is selected

Highlight or otherwise visually indicate which email (row) has been selected. Currently, highlighting follows the mouse: the user hovers over a row, selects it, and the email is displayed. As soon as the mouse moves off the row, the highlighting moves with it, so there is no indication of which row corresponds to the email currently displayed.

Location: Email View pane

Replies and Forwards are not pulled out correctly

Currently the entire email, regardless of what it is, is run through the various NLP processes, which causes the email header (to, from, cc, etc.) and sometimes previous bodies to be used. This throws off topics and entities badly.

Need to talk about fixing this. Probably is part of a larger discussion about email threading and email structure.
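
One simple heuristic, offered only as a starting point for that discussion, is to cut the text at the first reply or forward marker before running NLP. The marker patterns below are assumptions about common mail client output and will not cover every format; `newest_body` is a hypothetical helper.

```python
import re

# Assumed markers that commonly introduce quoted or forwarded content.
REPLY_MARKERS = [
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.IGNORECASE),
    re.compile(r"^-{2,}\s*Forwarded message\s*-{2,}\s*$", re.IGNORECASE),
    re.compile(r"^On .+ wrote:\s*$"),
]


def newest_body(text):
    """Return only the text that precedes the first reply/forward marker."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if any(p.match(line.strip()) for p in REPLY_MARKERS):
            return "\n".join(lines[:i]).strip()
    return text.strip()


raw = "Sounds good.\n\n-----Original Message-----\nFrom: A\nTo: B\nOld text"
body_for_nlp = newest_body(raw)
```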

Collapse email chains with latest displayed by default

Collapse email chains with latest displayed by default. Currently, emails are listed by ID (separate issue requesting to sort by date) which means email chains are often displayed with the oldest listed first with other emails listed in between follow on replies/forwards.

Would be nice to have the email chains collapsed and the latest displayed in list for selection. Even without collapsing, sorting by date will make the listing more organized.
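
A rough sketch of collapsing chains, assuming that emails in a chain share a subject once reply/forward prefixes are removed and that dates are ISO-style strings (which compare correctly as text). Real threading would more likely use message headers; `thread_key` and `latest_per_thread` are hypothetical names.

```python
import re


def thread_key(subject):
    """Normalize a subject by stripping leading RE:/FW:/FWD: prefixes."""
    stripped = re.sub(r"^(\s*(re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def latest_per_thread(emails):
    """Keep the newest email of each chain, listed newest first."""
    latest = {}
    for email in emails:
        key = thread_key(email["subject"])
        if key not in latest or email["date"] > latest[key]["date"]:
            latest[key] = email
    return sorted(latest.values(), key=lambda e: e["date"], reverse=True)


chain_sample = [
    {"id": 1, "subject": "Budget", "date": "2015-06-01T09:00:00"},
    {"id": 2, "subject": "Lunch", "date": "2015-06-02T12:00:00"},
    {"id": 3, "subject": "RE: Budget", "date": "2015-06-03T10:00:00"},
    {"id": 4, "subject": "FW: RE: Budget", "date": "2015-06-04T08:30:00"},
]
collapsed_ids = [e["id"] for e in latest_per_thread(chain_sample)]
```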

Provide a timeline playback feature

Enhancement request:

Provide a playback feature where the analyst can display the sequence of events. The example talked about was showing Person A sending an email to Person B, who then sends a response back to Person A and forwards it to Person C. The Timeline/Playback could step through sequential events or step through time and show events happening. There are many ways to provide this capability.

Move Entities Legend

Move Entities Legend to line up with bar graph (move right). Initially, I thought the Entities Legend was associated with the Graph.

Location: Entities tab

Add Help link

Add a Help link. This link could initially just go to sotera.github.io/newman/features.
  • Provide better explanation on the custom algorithm used to determine important email addresses.
  • Explain how topics are generated.

Graph node label issue

Have issue with showing Labels and mouse over.
Steps to Reproduce:

  1. Select yipsusan.gmail.com dataset and then [email protected]
  2. Select option to show Labels
  3. Move mouse cursor over nodes.
    Result: labels disappear as cursor is moved over each node.

Improve data ingest workflow

Enhancement request:

Improve data ingest workflow. Often this request was mentioned as a ‘One button Ingest’ capability. The difficulties explained in accomplishing this task all seemed to revolve around the ultra-complex ingestion of the sod data which was an often mislabeled multi-format set of datasets. I think this ingestion should be tackled incrementally. First step is a One button Ingest .pst file. Provide an Ingest PST button (or menu choice) that lets the user navigate to a .pst file. Provide the user with a progress bar and handle the current 12 step process in the background. If they pick a word document, csv file or anything other than a .pst file then display an error dialog. Second step could be to add .mbox file. Third step is a help file with an example of a complex (i.e. multiple .pst, .mbox file) ingestion scenario.
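
The first step described above (accept only a .pst file, otherwise show an error) could start from a check like this sketch. `check_ingest_file` and `SUPPORTED_EXTENSIONS` are hypothetical names; a UI would turn the raised error into the error dialog.

```python
import os

# Only .pst for the first step; .mbox could be added in the second step.
SUPPORTED_EXTENSIONS = {".pst"}


def check_ingest_file(path):
    """Return the lowercase extension if supported, else raise ValueError."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError("Unsupported file type for ingest: %r" % ext)
    return ext


accepted = check_ingest_file("mail/archive.PST")
```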

Handle quotes within quotes

Looks like quotes are excluded when determining Entities. Saw an issue where an entity is made with quotes (‘Kelly Rindfleisch). Document ID scottwalker/00079__00080 has ‘Kelly Rindfleisch identified as an Entity. It’s basically quotes inside double-quotes.
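
One way to avoid entities such as ‘Kelly Rindfleisch is to strip surrounding quote characters before an entity is stored. This is a sketch of that normalization only; where it would hook into the extraction pipeline is not shown, and `clean_entity` is a hypothetical helper.

```python
# Straight and curly single/double quote characters.
QUOTE_CHARS = "'\"\u2018\u2019\u201c\u201d"


def clean_entity(text):
    """Remove whitespace and quote characters from both ends of an entity."""
    return text.strip().strip(QUOTE_CHARS).strip()


cleaned = clean_entity("\u2018Kelly Rindfleisch")
```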

Make the notifications more prominent

Make the notification 'Searching on ' more prominent. One option is to make it larger and possibly move it more to the middle of the display. System seems to be frozen but it’s just processing. Eventually, I've learned to look down and see if a search is in progress. Another option is to provide a progress bar.

Use a standard Date format

Change Date format to something more standard. Might be as simple as replacing the ‘T’ with space. I’m not sure it really matters but the format doesn’t indicate AM/PM or time zone either.
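
If the stored dates are ISO-style strings without a time zone (an assumption based on the ‘T’ separator mentioned above), the simplest form of the change is a parse and reformat. `display_date` is a hypothetical helper.

```python
from datetime import datetime


def display_date(iso_text):
    """Render 'YYYY-MM-DDTHH:MM:SS' with a space instead of the 'T'."""
    parsed = datetime.strptime(iso_text, "%Y-%m-%dT%H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


shown = display_date("2015-06-11T00:30:32")
```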

Provide 'No attachments found' status

Add ‘No attachments found’ for those email addresses that have no attachments. Needed for when searching for attachments and none are found.

Location: Attachments pane

Provide a workspace ability

Enhancement Request:

Provide a workspace concept. This doesn’t necessarily have to be called a workspace; the FBI often refers to them as cases. The goal is to allow the analyst to save his investigation work/progress. This could include the searches performed, the graph generated, the email and attachments of most interest to the analyst, and an input text form that explains the goals, purpose, steps taken, status, and notes for the investigation. One major requirement is that what is saved today is available tomorrow.

Make Email tab active after a Search by Email operation

When the analyst selects a node and then Search by Email, open the Email tab and display the first email in the list. This would make it more consistent with node selection/Show Attachments (i.e. the right-hand pane displays the Attachments pane with a list of all the attachments for that node/email address).

No License

Please include a license for the project.
