Introduction

The facets project contains two visualizations for understanding and analyzing machine learning datasets: Facets Overview and Facets Dive.

The visualizations are implemented as Polymer web components, backed by TypeScript code, and can be easily embedded into Jupyter notebooks or webpages.

Live demos of the visualizations can be found on the Facets project description page.

Facets Overview

Overview visualization of UCI census data

Overview gives a high-level view of one or more data sets. It produces a visual feature-by-feature statistical analysis, and can also be used to compare statistics across two or more data sets. The tool can process both numeric and string features, including multiple instances of a number or string per feature.

Overview can help uncover issues with datasets, including the following:

  • Unexpected feature values
  • Missing feature values for a large number of examples
  • Training/serving skew
  • Training/test/validation set skew

Key aspects of the visualization are outlier detection and distribution comparison across multiple datasets. Interesting values (such as a high proportion of missing data, or very different distributions of a feature across multiple datasets) are highlighted in red. Features can be sorted by values of interest such as the number of missing values or the skew between the different datasets.

The Python code to generate the statistics for visualization can be installed with pip install facets-overview. As of version 1.1.0, the facets-overview package requires protobuf version 3.20.0 or later.

Details about Overview usage can be found in its README.
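
Once the statistics have been generated, they are handed to the visualization as a base64 string. The following is a minimal sketch of that step, assuming the serialized proto bytes are already available (for example from proto.SerializeToString(), as seen in a traceback further down this page). The helper name build_overview_html is hypothetical, and the protoInput property name should be checked against the Overview README.

```python
import base64

# Assumed template: the import path matches the nbextension location used
# elsewhere on this page; "protoInput" is taken to be the Overview property.
OVERVIEW_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
<facets-overview id="elem"></facets-overview>
<script>
  document.querySelector("#elem").protoInput = "{protostr}";
</script>"""


def build_overview_html(proto_bytes: bytes) -> str:
    """Return notebook HTML that hands base64-encoded statistics to Overview."""
    # Base64 output uses only [A-Za-z0-9+/=], so it is safe inside the
    # double-quoted JavaScript string in the template.
    protostr = base64.b64encode(proto_bytes).decode("utf-8")
    return OVERVIEW_TEMPLATE.format(protostr=protostr)
```

In a notebook the result would typically be passed to display(HTML(...)) from IPython.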

Facets Dive

Dive visualization of UCI census data

Dive is a tool for interactively exploring up to tens of thousands of multidimensional data points, allowing users to seamlessly switch between a high-level overview and low-level details. Each example is represented as a single item in the visualization, and the points can be positioned by faceting/bucketing in multiple dimensions by their feature values. Combining smooth animation and zooming with faceting and filtering, Dive makes it easy to spot patterns and outliers in complex data sets.

Details about Dive usage can be found in its README.

Setup

Usage in Google Colaboratory/Jupyter Notebooks

Usage of Facets in Google Colaboratory and Jupyter notebooks is demonstrated in this notebook. These notebooks work without the need to first download/install this repository.

Both Facets visualizations make use of HTML imports. So in order to use them, you must first load the appropriate polyfill, through <script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>, as shown in the demo notebooks in this repo.

Note that for using Facets Overview in a Jupyter notebook, there are two considerations:

  1. In the notebook, you will need to change the path that the Facets Overview python code is loaded from to the correct path given where your notebook kernel is run from.
  2. You must also have the Protocol Buffers python runtime library installed: https://github.com/google/protobuf/tree/master/python. If you used pip or anaconda to install Jupyter, you can use the same tool to install the runtime library.

When visualizing a large amount of data in Dive in a Jupyter notebook, as is done in the Dive demo Jupyter notebook, you will need to start the notebook server with an increased IOPub data rate. This can be done with the command jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000.
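
To get a feel for whether a dataset is large enough to need the raised limit, one can measure the JSON payload before displaying it. This is a rough sketch only: the limit is a rate in bytes per second, so comparing a single payload against it is a heuristic, and the helper names are hypothetical.

```python
import json

# Rate limit (bytes per second) from the command shown above.
RAISED_IOPUB_LIMIT = 10_000_000


def dive_payload_bytes(records):
    """Size in bytes of the JSON that would be embedded in the notebook output."""
    return len(json.dumps(records).encode("utf-8"))


def exceeds_limit(records, limit=RAISED_IOPUB_LIMIT):
    """True when a single output message would be larger than the limit."""
    return dive_payload_bytes(records) > limit
```

If exceeds_limit returns True even for the raised value, sampling the data before sending it to Dive is one option to consider.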

Code Installation

git clone https://github.com/PAIR-code/facets
cd facets

Building the Visualizations

If you make code changes to the visualization and would like to rebuild them, follow these directions:

  1. Install bazel: https://bazel.build/
  2. Build the visualizations: bazel build facets:facets_jupyter (run from the facets top-level directory)

Using the rebuilt Visualizations in a Jupyter notebook

If you want to use the visualizations you built locally in a Jupyter notebook, follow these directions:

  1. Move the resulting vulcanized html file from the build step into the facets-dist directory: cp -f bazel-bin/facets/facets-jupyter.html facets-dist/
  2. Install the visualizations into Jupyter as an nbextension.
       • If jupyter was installed with pip, run jupyter nbextension install facets-dist/ if jupyter was installed system-wide, or jupyter nbextension install facets-dist/ --user if it was installed per-user (run from the facets top-level directory). You do not need to run any follow-up jupyter nbextension enable command for this extension.
       • Alternatively, you can manually install the nbextension by finding your jupyter installation's share/jupyter/nbextensions folder and copying the facets-dist directory into it.
  3. In the notebook cell's HTML link tag that loads the built facets html, load from /nbextensions/facets-dist/facets-jupyter.html, which is the locally installed facets distribution from the previous step.
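
Several of the issues below report a 404 for /nbextensions/facets-dist/facets-jupyter.html. A quick way to check whether the copy step above actually placed the file where Jupyter looks is sketched here. The function name is hypothetical, and the list of candidate directories has to be supplied by the reader (for example the share/jupyter/nbextensions folder of the installation in use).

```python
from pathlib import Path

# Relative location of the vulcanized file inside an nbextensions directory.
FACETS_HTML = Path("facets-dist") / "facets-jupyter.html"


def find_installed_facets(nbextension_dirs):
    """Return the nbextensions directories that contain facets-jupyter.html."""
    return [
        Path(d) for d in nbextension_dirs
        if (Path(d) / FACETS_HTML).is_file()
    ]
```

An empty result suggests the install or copy step did not land in any of the directories that were checked.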

Known Issues

  • The Facets visualizations currently work only in Chrome - Issue 9.

Disclaimer: This is not an official Google product

facets's People

Contributors

andylou2, brills, chihuahua, chivesrs, contractorwolf, faviovazquez, hplantec, jameswex, jart, jimbojw, jli, jonathan-lemos, jsiddique, junjaytan, lincolnfrias, montanalow, morganics, moustafaaatta, mpushkarna, pfrstg, rictic, royalliii, rrdelaney, stephanwlee, sudharakap, tobyjamesthomas, tvalentyn, vfdev-5, yannic, youbinmo-g

facets's Issues

can't find html file

Jul 24, 2017 4:01:04 PM io.bazel.rules.closure.http.filter.LoggingFilter handle
INFO: 100,565µs /0:0:0:0:0:0:0:1:45886 404 GET /
Jul 24, 2017 4:01:04 PM io.bazel.rules.closure.http.filter.LoggingFilter handle
INFO: 4,121µs /0:0:0:0:0:0:0:1:45886 404 GET /favicon.ico
Jul 24, 2017 4:02:11 PM io.bazel.rules.closure.http.filter.LoggingFilter handle
INFO: 3,755µs /0:0:0:0:0:0:0:1:45886 404 GET /facets-overview/function-tests/simple/index.html
Jul 24, 2017 4:02:38 PM io.bazel.rules.closure.http.filter.LoggingFilter handle
INFO: 3,892µs /0:0:0:0:0:0:0:1:45886 404 GET /facets-overview/function-tests/index.html
After running bazel I ran into this problem. Can you help solve it? Many thanks!

Add a way to download a selection of the data

Right now we can dice and slice the datasets but there is no way (as far as I know) to 'visually select' part of your dataset and download it as a json list file.

I can imagine two main forms of interaction:

  • Drag a rectangle around a region
  • Shift + click on the boxes

Either would then be followed by a "download selection" button.

Accept JSON data on <facets-overview> component

Would there be a way to use the <facets-overview> component without creating a protobuf? I'd love if there could be some default inference, taking data= in the same way the <facets-dive> works.

Facets Dive - Instructions for setting atlasUrl are underspecified

The instructions in the Facets Dive README.md state:

To specify the URL to an atlas to use, set the atlasUrl property of the Dive Polymer Element in JavaScript (or the atlas-url attribute in HTML).

This is insufficient. The distinction between atlasUrl and atlas-url should be made more clear by example. The README.md file should include this JS snippet to showcase the atlasUrl property:

const vis = document.querySelector('facets-dive');
vis.atlasUrl = 'your_sprite_image.png';

That file should also show this HTML snippet for the atlas-url attribute:

<facets-dive atlas-url="your_sprite_image.png"></facets-dive>

To serve the sprite atlas image from the bazel test server, you have to include it as a source in the test target of the BUILD file.

ts_web_library(
    name = "test",
    testonly = True,
    srcs = [
        "test.html",
        "test.ts",
        "your_sprite_image.png",  ### ADD THIS ###
    ],
    path = "/facets-dive/components/facets-dive",
    deps = [
        ":facets_dive",
        "//facets_dive/lib/test:externs",
        "@org_tensorflow_tensorboard//tensorboard/components/tf_imports:web_component_tester",
    ],
)

Getting 404 GET /nbextensions/facets-dist/facets-jupyter.html

After installing I'm getting a 404... is this due to missing nbextension config possibly?

✔ ~/projects/wisdot 
04:26 $ jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000
[I 04:26:39.830 NotebookApp] [nb_conda_kernels] enabled, 4 kernels found
[I 04:26:40.312 NotebookApp] ✓ nbpresent HTML export ENABLED
[W 04:26:40.313 NotebookApp] ✗ nbpresent PDF export DISABLED: No module named nbbrowserpdf.exporters.pdf
[I 04:26:40.353 NotebookApp] [jupyter_nbextensions_configurator] enabled 0.2.5
[I 04:26:40.358 NotebookApp] [nb_conda] enabled
[I 04:26:40.379 NotebookApp] [nb_anacondacloud] enabled
[I 04:26:40.386 NotebookApp] Serving notebooks from local directory: /Users/charles.hack/projects/wisdot
[I 04:26:40.386 NotebookApp] 0 active kernels 
[I 04:26:40.387 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 04:26:40.387 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
0:42: execution error: "http://localhost:8888/tree" doesn’t understand the “open location” message. (-1708)
[W 04:26:53.917 NotebookApp] 404 GET /api/kernels/8d085097-d70c-4684-b62c-c8918fe91922/channels?session_id=B9C7E2A87D1C4FD185022437200F1AA1 (::1): Kernel does not exist: 8d085097-d70c-4684-b62c-c8918fe91922
[W 04:26:53.946 NotebookApp] 404 GET /api/kernels/8d085097-d70c-4684-b62c-c8918fe91922/channels?session_id=B9C7E2A87D1C4FD185022437200F1AA1 (::1) 38.32ms referer=None
[W 04:27:10.343 NotebookApp] Replacing stale connection: 8d085097-d70c-4684-b62c-c8918fe91922:B9C7E2A87D1C4FD185022437200F1AA1
[I 04:27:38.571 NotebookApp] Saving file at /WisDOT Accidents v170620.ipynb
[I 04:27:38.896 NotebookApp] Saving file at /WisDOT Accidents v170620.ipynb
[W 04:27:43.255 NotebookApp] 404 GET /nbextensions//usr/local/share/jupyter/nbextensions/facets-jupyter.html.js?v=20170721042639 (::1) 4.62ms referer=http://localhost:8888/notebooks/WisDOT%20Accidents%20v170620.ipynb
[W 04:27:43.281 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20170721042639 (::1) 1.65ms referer=http://localhost:8888/notebooks/WisDOT%20Accidents%20v170620.ipynb
[W 04:27:44.536 NotebookApp] 404 GET /nbextensions/facets-dist/facets-jupyter.html (::1) 1.72ms referer=http://localhost:8888/notebooks/WisDOT%20Accidents%20v170620.ipynb
[I 04:27:44.591 NotebookApp] Kernel started: 1f1c3a9e-afee-430c-ac12-6820346bbcf1

Dive_demo.ipynb doesn't show the data

I'm trying to run the Dive_demo.ipynb example but the HTML interface doesn't show any points. Every setting shows NONE. I started Jupyter as described in the README file:
$ jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000

Comparing features between multiple dataset points

Hi, I've found Facets to be a very interesting visualization tool, however I'm wondering if it is possible to compare features between two selected examples (or even more)? I know that it's not an issue, but rather a question about capabilities of this software.

For example: consider a csv file with power consumption generated by different houses, and there is a pair of two houses which have similar properties, but completely different levels of power consumption. It would be nice to visually compare features of both, and try to understand why they differ so much. Having tabular view of differences between those samples would be the easiest way (in my opinion). Are you going to extend this feature somehow? Thank you in advance.

why facets dive consume so much memory

I loaded an 80MB file and found that my computer was under high load, so I had to close the Jupyter notebook. I would also like to know the maximum amount of data Facets Dive can handle.

Bazel error - can't run bazel (mac OS 10.12.6)

Hey,

When I try to run bazel build facets:facets_jupyter I get the following error message:

bazel build facets:facets_jupyter
ERROR: bazel does not currently work properly from paths containing spaces (com.google.devtools.build.lib.runtime.BlazeWorkspace@2d9005a).

Any hints on how to get it working in order for me to see the visualizations?

tried using in webpage but failed.

I was trying to start the service on a webpage rather than in a notebook, using bazel run facets:facets. The webpage at http://127.0.0.1:6006/facets/visualizations.html did not render and gave me several JS syntax errors on imports such as import CommonStatistics from 'goog:proto.featureStatistics.CommonStatistics';. Is it even possible to start the service this way?

width of dive window

It is possible to set the height of dive in jupyter as follows:

<facets-dive id="elem" height="1000">

However, a width parameter does not seem to exist. Is it possible to set the width? There is a lot of wasted space on the right.

Error when running demo

When I try running:
bazel run //facets_dive/demo --incomtible_disallow_set_constructor=false

There's the error:
every rule of type ts_web_library implicitly depends upon the target '@com_google_javascript_closure_compiler_externs//:com_google_javascript_closure_compiler_externs', but this target could not be found because of: no such package '@com_google_javascript_closure_compiler_externs//': The set constructor for depsets is deprecated and will be removed. Please use the depset constructor instead. You can temporarily enable the deprecated set constructor by passing the flag --incompatible_disallow_set_constructor=false

What's missing? How to add the package?

Using ubuntu 14.04
bazel 0.6.1
anaconda3, python 3.6

Unable to view sprite atlas in Dive

I have installed Facets to be used in a jupyter notebook and followed the instructions on the github page. I tried to load the provided notebook, Dive_demo.ipynb, but I am unable to view any data points (although clicking on different areas inside the image does display information on the right hand side).

This same visualization displays properly on the facets website https://pair-code.github.io/facets/.

Is there any additional setup I must complete to view the default sprite atlas?

Boolean features show up as errors

I have boolean features which are either 0.0 / 1.0 (numpy.floats). Because they're zero (legitimately), facets considers them to be erroneous. Is there a different datatype I should be using for booleans?

Support utf-8 csv?

Hi,
When I use the demo website, it does not display a UTF-8 CSV correctly.
Does it only support ANSI CSV?
Thanks.

Trouble with running with Bazel (OS 10.12.6)

Getting the following error from Bazel attempts:

ERROR: /private/var/tmp/_bazel_debraji/4a5c48c7124814fee87ce4b6b5ca6383/external/io_bazel_rules_closure/closure/private/defs.bzl:27:16: The set constructor for depsets is deprecated and will be removed. Please use the depset constructor instead. You can temporarily enable the deprecated set constructor by passing the flag --incompatible_disallow_set_constructor=false.
ERROR: error loading package '': Extension file 'closure/private/defs.bzl' has errors.
INFO: Elapsed time: 0.055s

Which version of Bazel should I be running to avoid this error?

Trying to find the use cases for google facets

Hi Team, great work with the both overview and dive. I have been looking at the things that this application is able to do and would appreciate it if you can share the motivations for creating this repo. I am thinking of using this application for my current datascience project and would like to understand your motivations so that I can think of cool insights that I can draw from my datasets.

Allow multi-select in facets-dive

Allow for users to easily select more than one item in facets-dive

  • Multi-select with control-click
  • Perhaps select all items in a given facet-box by clicking the background of that box?

Additionally, users have requested exposing a list of the indices of the selected data items (and not just an array of those data items themselves) as a property or event when selection changes.

Lastly, add a box/highlight around selected items to make it clear which are selected. As discussed, might be easiest to put an svg box around the element on the svg plane behind the canvas.

Jupyter server get 404

Environment: Python 2.7/ipython 5.4.1 installed in Ubuntu 16.04 without any virtual environment and Jupyter was installed by pip

I've tried both methods of installing the nbextension (the jupyter nbextension install command, and copying directly to the /usr/local/share/jupyter/nbextension path). Both result in the same output, shown below:

[I 18:33:47.582 NotebookApp] 0 active kernels
[I 18:33:47.582 NotebookApp] The Jupyter Notebook is running at: http://172.16.2.96:8080/
[I 18:33:47.582 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 18:33:54.933 NotebookApp] 404 GET /nbextensions/facets-jupyter.html.js?v=20170811183347 (172.16.3.131) 11.91ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.936 NotebookApp] 404 GET /nbextensions/facets.js?v=20170811183347 (172.16.3.131) 1.89ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.938 NotebookApp] 404 GET /nbextensions/facets-dist/main.js?v=20170811183347 (172.16.3.131) 1.88ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.940 NotebookApp] 404 GET /nbextensions/facets-dist/.js?v=20170811183347 (172.16.3.131) 1.88ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.943 NotebookApp] 404 GET /nbextensions/facet.js?v=20170811183347 (172.16.3.131) 1.90ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.945 NotebookApp] 404 GET /nbextensions/facetsmain.js?v=20170811183347 (172.16.3.131) 1.87ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.955 NotebookApp] 404 GET /nbextensions/notebook.js?v=20170811183347 (172.16.3.131) 1.76ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.958 NotebookApp] 404 GET /nbextensions/facets-dist/facets-jupyter.html.js?v=20170811183347 (172.16.3.131) 1.77ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.968 NotebookApp] 404 GET /nbextensions//usr/local/share/jupyter/nbextensions/facets-dist/main.js?v=20170811183347 (172.16.3.131) 1.77ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.971 NotebookApp] 404 GET /nbextensions/facets/main.js?v=20170811183347 (172.16.3.131) 1.74ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[W 18:33:54.979 NotebookApp] 404 GET /nbextensions//home/datalab/work/.js?v=20170811183347 (172.16.3.131) 1.80ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[I 18:33:55.127 NotebookApp] Kernel started: 363cbc00-2912-4623-bbdd-251f889af915
[W 18:33:55.193 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20170811183347 (172.16.3.131) 2.40ms referer=http://172.16.2.96:8080/notebooks/luoyan/Untitled.ipynb
[I 18:33:55.635 NotebookApp] Adapting to protocol v5.1 for kernel 363cbc00-2912-4623-bbdd-251f889af915

I also checked other issues, but they all seem to involve a virtual environment, so solutions such as re-initialization cannot be applied here.

Will keep working on it.

display(HTML(html)) doesn't display anything

Has anyone managed to make Overview work in Jupyter?
I followed all the README files, but in the end display doesn't show anything and doesn't give me any errors either. Am I missing something?

Running Jupyter notebook example for Facets Overview fails

I am receiving the following error message:

TypeError: 24720 has type <class 'numpy.int64'>, but expected one of: (<class 'float'>, <class 'int'>, <class 'int'>)

Full error log:

TypeError Traceback (most recent call last)
c:\users\shahar karny\appdata\local\programs\python\python35\lib\site-packages\google\protobuf\internal\python_message.py in init(self, **kwargs)
460 try:
--> 461 setattr(self, field_name, field_value)
462 except TypeError:

c:\users\shahar karny\appdata\local\programs\python\python35\lib\site-packages\google\protobuf\internal\python_message.py in field_setter(self, new_value)
595 # (0, 0.0, enum 0, and False).
--> 596 new_value = type_checker.CheckValue(new_value)
597 if clear_when_set_to_default and not new_value:

c:\users\shahar karny\appdata\local\programs\python\python35\lib\site-packages\google\protobuf\internal\type_checkers.py in CheckValue(self, proposed_value)
108 (proposed_value, type(proposed_value), self._acceptable_types))
--> 109 raise TypeError(message)
110 return proposed_value

TypeError: 24720 has type <class 'numpy.int64'>, but expected one of: (<class 'float'>, <class 'int'>, <class 'int'>)

During handling of the above exception, another exception occurred:

TypeError Traceback (most recent call last)
in ()
5 gfsg = GenericFeatureStatisticsGenerator()
6 proto = gfsg.ProtoFromDataFrames([{'name': 'train', 'table': train_data},
----> 7 {'name': 'test', 'table': test_data}])
8 protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

c:\GitProjects\facets\facets_overview\python\base_generic_feature_statistics_generator.py in ProtoFromDataFrames(self, dataframes)
53 'name': dataframe['name']
54 })
---> 55 return self.GetDatasetsProto(datasets)
56
57 def DtypeToType(self, dtype):

c:\GitProjects\facets\facets_overview\python\base_generic_feature_statistics_generator.py in GetDatasetsProto(self, datasets, features)
256 high_rank=val_index,
257 sample_count=val[0],
--> 258 label=printable_val)
259 if val_index < 2:
260 featstats.top_values.add(

c:\users\shahar karny\appdata\local\programs\python\python35\lib\site-packages\google\protobuf\internal\containers.py in add(self, **kwargs)
367 arguments may be used to initialize the element.
368 """
--> 369 new_element = self._message_descriptor._concrete_class(**kwargs)
370 new_element._SetListener(self._message_listener)
371 self._values.append(new_element)

c:\users\shahar karny\appdata\local\programs\python\python35\lib\site-packages\google\protobuf\internal\python_message.py in init(self, **kwargs)
461 setattr(self, field_name, field_value)
462 except TypeError:
--> 463 _ReraiseTypeErrorWithFieldName(message_descriptor.name, field_name)
464
465 init.module = None

c:\users\shahar karny\appdata\local\programs\python\python35\lib\site-packages\google\protobuf\internal\python_message.py in _ReraiseTypeErrorWithFieldName(message_name, field_name)
384
385 # re-raise possibly-amended exception with original traceback:
--> 386 raise type(exc)(exc, sys.exc_info()[2])
387
388

TypeError: (TypeError("24720 has type <class 'numpy.int64'>, but expected one of: (<class 'float'>, <class 'int'>, <class 'int'>) for field Bucket.sample_count",), <traceback object at 0x0000027750F9A708>)
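
The traceback shows protobuf rejecting a numpy.int64 for the field Bucket.sample_count. One workaround a reader might try is converting NumPy scalars to built-in Python numbers before they reach the proto. This is a sketch of that idea under the assumption that the value exposes NumPy's .item() method; it is not necessarily how the project resolved the issue, and to_native is a hypothetical helper.

```python
def to_native(value):
    """Convert a NumPy scalar to the matching built-in Python value."""
    # NumPy scalars expose .item(); plain Python numbers do not, so they
    # pass through unchanged.
    item = getattr(value, "item", None)
    return item() if callable(item) else value
```

In the frame shown above, this would correspond to passing sample_count=to_native(val[0]) rather than val[0] directly.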

How to display visualization outside of Jupyter and webserver

I would like to use the library directly in plain Python, but it isn't clear to me how to do that.
I would also appreciate it if someone could explain why we need to build with bazel and run a server to display the visualization. Why can't we just construct the HTML page through Python code and open it with a browser?
I looked into the code of https://pair-code.github.io/facets/index.html as well as the functional tests, but I couldn't work out how to do it.

I am aware of the related issues, including #15, but none of them has a clear and thorough answer.

it would be nice to have a guide of possible ways of using the library:

  • Jupyter (this is already there).
  • In a web server
  • with plain python

thanks :)

How to deploy facets as a web server and display Overview and Dive without Jupyter notebook

Facets Overview and Dive are great work. I have already deployed facets in a Docker container, and facets-overview can be displayed remotely in Chrome at “xx.xx.xx.xx:6006/facets-overview/functional-tests/simple/index.html”.
However, I want to replace the test data, which is hardcoded in simple_test.ts, with my own csv/images dataset while the devserver is running.
Must I parse the dataset in TypeScript files (such as simple_test.js)?
How can I use the Python interface to parse data on the server side and display facets_overview and Dive without a Jupyter notebook, the way TensorBoard starts a server with tensorboard --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2
(for example, parsing data and displaying it remotely in Facets Overview and Dive with tornado)?

Neither Overview nor Dive works in Firefox.

It either hangs the browser, or outputs nothing in my Jupyter notebook. Understand firefox is not supported at this moment, but would love to see them working in browsers other than chrome.

Fails to render

Demo snippet fails with following error if JSON text contains ' or \\:

Javascript error adding output!
SyntaxError: Invalid or unexpected token
See your browser Javascript console for more details.

Error Replication

from IPython.core.display import display, HTML
import pandas as pd

d = [{'text': '\\'}] # --> fails 
d = [{'text': '\''}] # --> fails
jsonstr = pd.DataFrame(d).to_json(orient='records')

# Display the Dive visualization for this data
from IPython.core.display import display, HTML

HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive id="elem" height="600"></facets-dive>
        <script>
          var data = JSON.parse('{jsonstr}');
          document.querySelector("#elem").data = data;
        </script>"""
html = HTML_TEMPLATE.format(jsonstr=jsonstr)
display(HTML(html))

Alternative
If you already have a valid JSON string you can pass the data as it is without calling parse function. It is better to add the following snippet in demo notebooks:

HTML_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
        <facets-dive id="elem" height="600"></facets-dive>
        <script>
          var data = {jsonstr};
          document.querySelector("#elem").data = data;
        </script>"""
html = HTML_TEMPLATE.format(jsonstr=jsonstr)
display(HTML(html))
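
Building on the alternative above, the JSON can be produced with the standard json module and guarded against one more edge case: a value containing "</script>" would otherwise end the script element early. The sketch below assumes the same template and nbextension path as the snippets in this issue; build_dive_html is a hypothetical helper name.

```python
import json

DIVE_TEMPLATE = """<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
<facets-dive id="elem" height="600"></facets-dive>
<script>
  var data = {jsonstr};
  document.querySelector("#elem").data = data;
</script>"""


def build_dive_html(records):
    """Embed records as a JavaScript literal rather than a quoted string."""
    jsonstr = json.dumps(records)
    # "<\/" is a valid JSON/JavaScript escape for "</" and keeps a value such
    # as "</script>" from closing the script element early.
    jsonstr = jsonstr.replace("</", "<\\/")
    return DIVE_TEMPLATE.format(jsonstr=jsonstr)
```

Because the data is never wrapped in a single-quoted string, the backslash and apostrophe cases from the replication snippet no longer need special handling.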

Atlas image too small to accommodate atlas capacity.

Keep getting this error.

I'm not sure why. Where is the capacity variable set? Does this have to do with the number of entries in the JSON string and matching that up with the number of sprites on the atlas? Please help!
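
One plausible reading of the error is that the atlas image holds fewer sprite slots than there are data entries. The sketch below shows that arithmetic under the assumption that sprites are packed in a regular grid of fixed-size cells; how Dive actually computes capacity is not stated here, and both function names are hypothetical.

```python
def atlas_capacity(atlas_width, atlas_height, sprite_width, sprite_height):
    """Number of whole sprites that fit in a grid-packed atlas image."""
    columns = atlas_width // sprite_width
    rows = atlas_height // sprite_height
    return columns * rows


def atlas_fits(num_records, atlas_width, atlas_height, sprite_width, sprite_height):
    """True when the atlas has at least one sprite slot per record."""
    return num_records <= atlas_capacity(
        atlas_width, atlas_height, sprite_width, sprite_height
    )
```

For instance, an 800x100 image with 100x100 sprites has 8 slots under this assumption, so a ninth record would not fit.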

How to make Facets Overview less laggy on Jupyter notebook?

Hey guys, I'm loving the functionality of facets overview.

However I tried using it on my company's data on a jupyter notebook and everything is very very laggy. I've set --NotebookApp.iopub_data_rate_limit=1.0e11 but it's still very slow. The training set is 150K rows by 100 columns and the test set is 25K rows by 100 columns.

Any tips on how to make things less laggy? Or are these larger datasets the current limit of facets on jupyter? I still have a lot of RAM available but facets doesn't seem to be using it.

Nothing is displayed on Jupyter in Chrome for Overview

display(HTML(html)) doesn't show anything on Facets-overview on jupyter running in Chrome. I can run the tests with bazel though. Worth mentioning, Dive works fine.
Also I don't get any errors in the notebook or in the browser console so I looked into the code a bit to see if errors are handled silently somewhere but didn't find any leads.
Any leads from where I can start debugging?

Add BytesStatistics info to categorical feature table

Features that have type BYTES and contain a BytesStatistics message display in the categorical feature table. Update the code for this table to show the # of unique and the avg length fields from BytesStatistics in the table.

How to load and view your data?

After reading through all the interesting technicalities on what the facets is I decided to try it on one of my datasets and realized that I still don't know how to load and view the data? Is there a normal tutorial that basically says: if you have a dataset D, do a, b and c, voilà ?

Make python code pip-installable

Make the facets-overview python code pip-installable and upload it to pypi. Update the documentation and examples to install using pip.

Add support for log scale in dive

For numerical features with very wide ranges, it would be useful to be able to use a different scale when binning. A contrived example can be seen in this screenshot:

screen shot 2017-08-09 at 5 04 52 pm
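
To illustrate what log-scale binning means for a single feature value, here is a small sketch. It is an illustration of the request, not code from Dive; log_bucket and its buckets_per_decade parameter are hypothetical.

```python
import math


def log_bucket(value, buckets_per_decade=1):
    """Index of the base-10 logarithmic bucket that a positive value falls in."""
    if value <= 0:
        raise ValueError("log-scale bucketing needs positive values")
    return math.floor(math.log10(value) * buckets_per_decade)
```

With one bucket per decade, values between 1 and 10 share a bucket, values between 10 and 100 share the next, and so on, which keeps very wide ranges from collapsing into a single linear bin.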

Features with high cardinality crashes chrome (Overview)

I find this library super helpful for data exploration! However, I faced a problem with datasets that have categorical columns with high cardinality since each value is included in the generated proto stats object as a histogram bucket. I couldn't make it work for a dataset with 2.5M records with some columns have +1M unique values, naturally, chrome crashes.

I see that the idea of features whitelist is already implemented, however, there is still no way to pass the list. Anyways, a whitelist would require us checking the features beforehand to get rid of codes/IDs and would prevent us from checking the useful stats of those features like the rate of missing values.

I added the possibility for categorical features to display only the top K values in the histograms.
The idea is that for features with that much cardinality, a histogram wouldn't be very useful. K can be large enough to make sure to capture all values in low cardinality features.
This makes the generated stats object size nearly constant regardless of the sample size taken from the dataset.
Additionally, even in cases where it's still possible to display the visualization ( up to 500K records, maybe more) my browser became a little bit laggy and it takes some time for the visualization to appear. A smaller object speeds it up a lot.

I already have this implemented locally and working with it. If you agree with this approach, let me know and I will submit a PR.
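
The top-K idea described above can be sketched with the standard library. This is not the issue author's implementation, which is not shown here; the function name is hypothetical, and folding the remaining values into a single "other" count is an added assumption.

```python
from collections import Counter


def top_k_histogram(values, k):
    """Keep the k most frequent values and fold the rest into one count."""
    counts = Counter(values)
    top = counts.most_common(k)
    other = sum(counts.values()) - sum(count for _, count in top)
    return top, other
```

The result has at most k entries plus one number, so its size no longer grows with the number of unique values in the column.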

Specify url

Just run the test example in facets_dive/components.
When trying to add user defined images, I generated a 800x100 png with 8 symbols in it, put it into the components/facets_dive folder and set
atlas-url="test_sprite.png"
in facets-dive.html
But the file is not found:
INFO: 28,851µs /0:0:0:0:0:0:0:1:51068 404 GET /facets-dive/components/facets-dive/test_sprite.png
How do I specify the URL?

Empty cells are treated as zeros, so all statistics are wrong

I uploaded a CSV with empty cells to the demo site at https://pair-code.github.io/facets/,
but all the empty cells are treated as zeros, so the calculated statistics (mean, median, standard deviation) are wrong.

See the attached Handelsbanken_fonder_DB.xlsx file. For example, the column "10 ar" contains many (337) empty cells, but Facets wrongly treats them as 0. The median should be 59.4, but Facets reports 31.5 instead.

Handelsbanken_fonder_DB.xlsx

Expected behavior: empty cells are counted as "missing / nodata" and are not used in the statistical calculations.
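
The effect can be reproduced on a small scale with the Python standard library. The CSV content and the helper `parse_cell` below are made up for illustration; this sketch shows the expected handling of empty cells and is not how the demo site parses files.

```python
import csv
import io
import statistics

# Hypothetical CSV with two empty cells in the "score" column.
raw = "name,score\na,10\nb,\nc,20\nd,\ne,30\n"

def parse_cell(cell):
    """Return a float, or None when the cell is empty (missing)."""
    cell = cell.strip()
    return float(cell) if cell else None

rows = list(csv.DictReader(io.StringIO(raw)))
parsed = [parse_cell(row["score"]) for row in rows]

# Expected behavior: count missing cells separately and leave them out of the stats.
present = [v for v in parsed if v is not None]
num_missing = len(parsed) - len(present)
median_skipping_missing = statistics.median(present)

# Reported behavior: empty cells become 0, which drags the median down.
median_zero_filled = statistics.median(v if v is not None else 0.0 for v in parsed)
```

Here the median over the three present values is 20.0, while zero-filling the two empty cells gives 10.0, the same kind of shift as the 59.4 versus 31.5 discrepancy reported above.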

Error when creating ProtoFromDataFrames from DataFrame without column names

I receive the following error when calling ProtoFromDataFrames after creating a DataFrame without naming the columns:

gfsg = GenericFeatureStatisticsGenerator()
df = pd.DataFrame(my_samples)
proto = gfsg.ProtoFromDataFrames([{'name': 'data', 'table': df}])

0 has type <class 'numpy.int64'>, but expected one of: (<class 'bytes'>, <class 'str'>) for field FeatureNameStatistics.name
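
The message suggests that the integer column label 0 is being assigned to a string-typed name field. One possible workaround, sketched below with pandas only, is to convert the column labels to strings before generating the statistics. The sample data is hypothetical, and whether this fully resolves the error in ProtoFromDataFrames is an assumption rather than something confirmed in this issue.

```python
import pandas as pd

# Hypothetical sample data; without explicit names, pandas labels the columns 0, 1, ...
my_samples = [[1.0, "x"], [2.0, "y"], [3.0, "z"]]
df = pd.DataFrame(my_samples)

# Convert every column label to str so each feature name is a string.
df.columns = df.columns.map(str)
```

After this step the DataFrame can be passed to the generator in the same way as in the snippet above.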

Facets Dive handling of long strings

If Facets Dive deals with a feature that is a very long unique string, such as an encoded image string, it uses that feature automatically as the label, which becomes unreadable. It also prints the entire encoded string in the info card when an item is selected.

I think Dive should avoid using huge strings as the automatically determined label feature, and should not display them in full by default in the info card.

Can't build with Bazel on Linux

I ran the following build command and got the errors below:
bazel build facets:facets_jupyter

ERROR: /home/rum/.cache/bazel/_bazel_rum/145145386566f2ed261e1410aff2d253/external/io_bazel_rules_closure/closure/private/defs.bzl:27:16: The set constructor for depsets is deprecated and will be removed. Please use the depset constructor instead. You can temporarily enable the deprecated set constructor by passing the flag --incompatible_disallow_set_constructor=false
ERROR: error loading package '': Extension file 'closure/private/defs.bzl' has errors
ERROR: error loading package '': Extension file 'closure/private/defs.bzl' has errors
INFO: Elapsed time: 0.125s
FAILED: Build did NOT complete successfully (0 packages loaded)

Then I changed the command to the one below, which produced different errors:
bazel build facets:facets_jupyter --incompatible_disallow_set_constructor=false

ERROR: /work/facets/facets/BUILD:8:1: every rule of type ts_web_library implicitly depends upon the target '@com_google_javascript_closure_compiler_externs//:com_google_javascript_closure_compiler_externs', but this target could not be found because of: no such package '@com_google_javascript_closure_compiler_externs//': The set constructor for depsets is deprecated and will be removed. Please use the depset constructor instead. You can temporarily enable the deprecated set constructor by passing the flag --incompatible_disallow_set_constructor=false
ERROR: Analysis of target '//facets:facets_jupyter' failed; build aborted
INFO: Elapsed time: 0.414s
FAILED: Build did NOT complete successfully (7 packages loaded)

Please help!
