man-group / dtale

Visualizer for pandas data structures

Home Page: http://alphatechadmin.pythonanywhere.com

License: GNU Lesser General Public License v2.1

JavaScript 1.07% Dockerfile 0.21% Python 41.78% CSS 6.87% HTML 0.61% SCSS 0.18% TypeScript 49.27%
data-analysis data-science data-visualization flask ipython jupyter-notebook pandas plotly-dash python27 python3 react react-virtualized visualization xarray

dtale's People

Contributors

0442a403, aflag, anthrax1, appznoix, ariffrahimin, aschonfeld, astrojuanlu, cdeil, colin-jensen, dependabot[bot], dsblank, gidim, goodship1, jasonkholden, matrach, msabramo, mswastik, nriver, ogabrielluiz, phillipdupuis, poodlewars, rileymshea, thekchang, vlarcta

dtale's Issues

Add support for display of timestamps in date columns

Currently, by default, pandas timestamps only show the date portion. We need to add additional handling for timestamps:

  • an additional formatter popup when the selected column is a datetime
  • handling of multiple column selections across different datatypes

unpin jsonschema & flasgger

Currently flasgger is pinned to version 0.9.3, which in turn forces jsonschema to a version <3.0.0.

Possible options moving forward:

  • find a way to unpin flasgger & jsonschema
  • drop flasgger support
  • make flasgger available as an extra requirement (see the sketch below)

FYI, other packages which currently have collisions with the version of jsonschema are:

  • sphinx (strictly for documentation purposes)
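
For the "extra requirement" option mentioned above, a minimal sketch of how it could look in setup.py (the extra's name and version handling are assumptions, not the project's actual packaging):

# setup.py (sketch) -- expose flasgger as an optional extra so the core
# install no longer pins jsonschema. The extra name is illustrative.
from setuptools import setup

setup(
    name="dtale",
    install_requires=[
        # ...core dependencies, without flasgger...
    ],
    extras_require={
        "swagger": ["flasgger"],  # installed via: pip install dtale[swagger]
    },
)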

Ability to use rolling computations in charts

Yes, it would be great to have a feature to perform rolling computations. If it's not asking too much, it would also be nice to be able to specify the computation method.

Please let me know if you need help implementing or specifying it.

Originally posted by @mindlessbrain in #43 (comment)
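
Outside of dtale, a minimal pandas sketch of the kind of rolling computation being requested, with the window size and aggregation method user-selectable (the data and column name are made up):

import pandas as pd
import numpy as np

# Hypothetical time-indexed data.
idx = pd.date_range("2020-01-01", periods=100, freq="D")
df = pd.DataFrame({"price": np.random.random(100)}, index=idx)

# Rolling computation with a user-selectable method, as requested.
window, method = 7, "mean"          # e.g. "mean", "sum", "std", "median"
df["price_rolling"] = df["price"].rolling(window).agg(method)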

Inplace format

Currently, when I set a format for a column it only changes the view and does not apply the format in place. It would be helpful to apply it in place on the underlying dataframe.

Allow truncation of columns

Since columns are sized to fit the largest value, they can become extremely wide for columns of string/text data (making it very difficult to view the dataframe).

Adding an option to the 'Format' pop-up for "Truncate at [number of] characters" would be helpful. It would also be great if hovering over the truncated value or ellipsis resulted in the entire value being displayed.

I also think it would be neat if this were automatically applied to columns past a certain length, e.g. if df['column_one'].map(lambda x: len(x)).max() > 50, truncate it, since I'd guess 99% of people would not want ridiculously wide columns by default. But that isn't too important.
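
A minimal pandas sketch of the proposed rule, reusing the 50-character threshold from the example above (column names are hypothetical, and this is display-only truncation):

import pandas as pd

df = pd.DataFrame({"column_one": ["short", "x" * 200]})

TRUNCATE_AT = 50  # threshold taken from the example in this issue
if df["column_one"].map(lambda x: len(str(x))).max() > TRUNCATE_AT:
    # Build a display-only version; the underlying data stays untouched.
    df["column_one_display"] = df["column_one"].map(
        lambda x: x if len(str(x)) <= TRUNCATE_AT else str(x)[:TRUNCATE_AT] + "..."
    )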

Value Error

I have no idea what happened, but this will not run...


ValueError Traceback (most recent call last)
in
----> 1 dtale.show(df)

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/app.py in show(data, host, port, name, debug, subprocess, data_loader, reaper_on, open_browser, notebook, force, context_vars, ignore_duplicate, **kwargs)
465 url = build_url(ACTIVE_PORT, ACTIVE_HOST)
466 instance = startup(url, data=data, data_loader=data_loader, name=name, context_vars=context_vars,
--> 467 ignore_duplicate=ignore_duplicate)
468 is_active = not running_with_flask_debug() and is_up(url)
469 if is_active:

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/views.py in startup(url, data, data_loader, name, data_id, context_vars, ignore_duplicate)
445 global_state.set_settings(data_id, dict(locked=curr_locked))
446 global_state.set_data(data_id, data)
--> 447 global_state.set_dtypes(data_id, build_dtypes_state(data, global_state.get_dtypes(data_id) or []))
448 global_state.set_context_variables(data_id, build_context_variables(data_id, context_vars))
449 return DtaleData(data_id, url)

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/views.py in build_dtypes_state(data, prev_state)
340 """
341 prev_dtypes = {c['name']: c for c in prev_state or []}
--> 342 ranges = data.agg([min, max]).to_dict()
343 dtype_f = dtype_formatter(data, get_dtypes(data), ranges, prev_dtypes)
344 return [dtype_f(i, c) for i, c in enumerate(data.columns)]

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
6710 result = None
6711 try:
-> 6712 result, how = self._aggregate(func, axis=axis, *args, **kwargs)
6713 except TypeError:
6714 pass

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/frame.py in _aggregate(self, arg, axis, *args, **kwargs)
6724 result = result.T if result is not None else result
6725 return result, how
-> 6726 return super()._aggregate(arg, *args, **kwargs)
6727
6728 agg = aggregate

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
562 elif is_list_like(arg):
563 # we require a list, but not an 'str'
--> 564 return self._aggregate_multiple_funcs(arg, _level=_level, _axis=_axis), None
565 else:
566 result = None

~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
619 # if we are empty
620 if not len(results):
--> 621 raise ValueError("no results")
622
623 try:

ValueError: no results

Split Date and text columns

It would be helpful to split a date column (for example into Year/Month/Day) and to split a text column based on separators.
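
A rough pandas sketch of both splits (column names and the separator are assumptions):

import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-15", "2021-06-30"]),
    "text": ["a,b,c", "x,y,z"],
})

# Split a date column into Year/Month/Day.
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

# Split a text column on a separator into new columns.
df[["part1", "part2", "part3"]] = df["text"].str.split(",", expand=True)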

IP Address Could not be found

Hi,

I am trying to use DTale on a JupyterHub running on a remote CentOS machine, however it is giving the error below. My guess is that I should somehow manually specify the IP address of the machine (or localhost) so that it can link to itself properly, but I am not sure how.

[screenshot]

Thanks in advance,
Hristo
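
For what it's worth, dtale.show() accepts host and port arguments (they appear in its signature), so one thing worth trying on a remote machine is passing them explicitly; whether that resolves the JupyterHub case depends on the network setup. A sketch with placeholder values:

import dtale
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical data

# Placeholder values: bind to an address/port reachable from your browser.
d = dtale.show(df, host="0.0.0.0", port=40000)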

Timeseries chart viewing

If you have a date column with multiple values, give users the ability to view columns as a timeseries chart.

  • Would need to add some sort of grouping functionality
  • Limit how many groups can be displayed; if there are too many, force the user to filter down to a specific group before viewing the timeseries chart

Export python code

Is it possible to add a feature to export the pandas or matplotlib code behind explored charts, filtering, or groupby operations?
It could be a popup text box from which the user can simply copy and paste the code to reuse it in a Jupyter notebook.

Coverage Analysis

In the demo video, there is a Coverage Analysis feature. I installed version 1.7.6 and it doesn't have it. Where can I find or activate it?

Column filtering

This is a feature request. Right now, filtering can be done using the df.query method from the main menu. That's great, but often you only need to filter by certain values in a column.

My suggestion would be to add a filtering option in the context menu of the columns, so that the filtering expression would be applied only to that column, saving you from having to type the full name of the column.

Another addition would be that for columns that include discrete values (categories), you could select the values you want to filter on instead of having to write them down.

Both options are implemented in qgrid.
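
For context, here is roughly what the current query-based approach looks like in plain pandas, next to the per-column value selection being requested (column names are made up):

import pandas as pd

df = pd.DataFrame({"country": ["US", "DE", "US", "FR"], "value": [1, 2, 3, 4]})

# Today: a full df.query expression, typed by hand with the column name.
filtered = df.query("country == 'US' and value > 1")

# Requested: pick discrete values for one column, closer to qgrid's behaviour.
selected = ["US", "FR"]
filtered_by_column = df[df["country"].isin(selected)]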

Highlight columns based on the dtype

It would be great to have a setting the user can select to change a column's background color based on its dtype, for example:

  • Text is white
  • Int is light blue
  • Date is light pink
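
Outside of dtale, a small sketch of the requested behaviour using the pandas Styler (the dataframe and column names are made up; the palette follows the list above):

import pandas as pd

def color_by_dtype(col):
    # Palette from the request: ints light blue, dates light pink, everything else white.
    if pd.api.types.is_integer_dtype(col):
        color = "lightblue"
    elif pd.api.types.is_datetime64_any_dtype(col):
        color = "lightpink"
    else:
        color = "white"
    return ["background-color: %s" % color] * len(col)

df = pd.DataFrame({
    "name": ["a", "b"],
    "count": [1, 2],
    "when": pd.to_datetime(["2020-01-01", "2020-02-01"]),
})
styled = df.style.apply(color_by_dtype, axis=0)  # render `styled` in a notebook cell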

Periods in column names cause page to go blank

If a data frame column has a period in the name, as in 'meta.site_id', the rendered page goes blank as soon as you click on that column. No errors are logged, just a blank screen.

dtale 1.7.3
pandas 0.25.1

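Until this is fixed, one possible workaround (a sketch, not an official recommendation) is to strip the periods from the column names before handing the dataframe to dtale:

import pandas as pd
import dtale

df = pd.DataFrame({"meta.site_id": [1, 2], "value": [10, 20]})

# Replace periods so the grid and query-based features can reference the column.
df = df.rename(columns=lambda c: c.replace(".", "_"))
dtale.show(df)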

Add ability to build new basic columns

It would be nice if the menu had a "Build Column" option so that users could quickly & easily modify the underlying data.

The basic idea would be that the 'Build column' interface would prompt users to enter:

  1. The new column's name (freetext w/ verification that it's valid)
  2. The type of column (picklist)
  3. The source column(s), i.e. the ones used to drive the calculations (picklist)
  4. Any additional options dependent on column type

The types of columns could be:

  1. Sum
    Prompt the user to pick two numeric columns

  2. Difference
    Prompt the user to pick two numeric columns

  3. Bins (evenly spaced)
    Prompt the user to pick one column.
    Additional options: Number of bins, bin labels.
    Use pd.cut

  4. Bins (evenly sized)
    Prompt the user to pick one column.
    Additional options: number of bins, bin labels.
    Use pd.qcut

  5. Datetime property
    Prompt the user to pick one datetime column
    Additional options: property (one of the Series.dt.__ properties like time, year, month, etc.)

  6. (Maybe?) String function
    Prompt the user to pick one string column
    Additional options: function (Series.str.__ such as endswith, startswith, len, etc.) and value

And possibly more, will update this over time...
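
A rough pandas sketch of how a few of these column types could translate into pandas calls (column names, bin counts, and labels are all placeholders, not dtale's implementation):

import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [10, 20, 30, 40],
    "when": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
    "label": ["foo", "bar", "baz", "qux"],
})

df["sum"] = df["a"] + df["b"]                                        # 1. Sum
df["diff"] = df["b"] - df["a"]                                       # 2. Difference
df["bins_spaced"] = pd.cut(df["b"], bins=2, labels=["lo", "hi"])     # 3. Bins (evenly spaced)
df["bins_sized"] = pd.qcut(df["b"], q=2, labels=["lo", "hi"])        # 4. Bins (evenly sized)
df["month"] = df["when"].dt.month                                    # 5. Datetime property
df["starts_with_b"] = df["label"].str.startswith("b")                # 6. String function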

[DOC] Reduce the size of documentation gifs

Great work on this project. Cloning this repository is fairly time consuming, as there are 100 MB in the /docs folder. Of this 100 MB, 40 MB come from the gifs. It might be nice to try to reduce the size of these gifs, if possible, for people cloning with throttled/limited download speeds 😄

pivot table

It would be helpful to select three columns and create a pivot table (in a new tab) based on col1, col2, and col3.
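
Outside of dtale, the requested behaviour maps roughly onto pandas.pivot_table (col1/col2/col3 and the aggregation are placeholders):

import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "y", "x", "y"],
    "col3": [1, 2, 3, 4],
})

# col1 as the rows, col2 as the columns, col3 aggregated into the cells.
pivot = pd.pivot_table(df, index="col1", columns="col2", values="col3", aggfunc="mean")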

Install to bare virtualenv fails

Installing into a bare virtualenv fails:

% virtualenv -p python3 virt-dtale
virt-dtale% pip install six
virt-dtale% python setup.py develop

which yields the following error:

Moving flask_ngrok-0.0.25-py3.6.egg to /home/jason/virt-dtale/lib/python3.6/site-packages
flask-ngrok 0.0.25 is already the active version in easy-install.pth

Installed /home/jason/virt-dtale/lib/python3.6/site-packages/flask_ngrok-0.0.25-py3.6.egg
error: The 'Flask' distribution was not found and is required by dtale

A quick Google search suggests we should maybe set the Flask version in setup.py (it's pretty old):
pallets/flask#1106
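
A hedged sketch of the kind of change being suggested, declaring Flask explicitly in setup.py's install_requires (the version bound is illustrative, not the project's actual requirement):

# setup.py (sketch) -- pin a minimum Flask version so a bare virtualenv install
# pulls it in, as suggested by the linked Flask issue. The bound is an assumption.
from setuptools import setup

setup(
    name="dtale",
    install_requires=[
        "Flask>=1.0",
        # ...remaining dependencies...
    ],
)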

Jupyter Notebook - showing table not working

Example behavior:
[screenshot]

When I run the 2nd cell with d.open_browser() it opens (as expected) a new browser tab with the table.
After that, when I re-run the 1st cell, the table appears as expected in the notebook.

I did this several times and it got progressively worse. Running the command
d = dtale.show(df, notebook=True)
takes a long time to display the table (over 2 minutes).

I restarted the kernel and tested several combinations of commands. Only the following works:
[screenshot]
The last 2 cells take a very long time to run.

And after several runs it stopped working altogether:
[screenshot]
[screenshot]

I restarted the kernel each time. I even restarted Jupyter Notebook several times.

Sorry for so many screenshots; I wanted to explain the issue in detail.
I hope it helps.

Scatter charts in chart builder

If a user selects a numeric (int/float) column for both the x-axis and y-axis, then they should have the ability to select "Scatter" as a chart type. This will be limited to 15,000 points, or else the client will crash.

You should also be able to group by a column, but in this case the user will be presented with scatters for each group since it is impossible to display them all in one scatter chart.
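
Outside of dtale, a minimal sketch of the kind of point cap described above, using the 15,000-point limit from this issue (the dataframe and column names are hypothetical):

import pandas as pd

MAX_POINTS = 15_000  # cap mentioned in the issue to keep the client responsive

def capped_scatter_data(df: pd.DataFrame, x: str, y: str, max_points: int = MAX_POINTS) -> pd.DataFrame:
    """Return at most max_points rows for an x/y scatter, sampling when necessary."""
    subset = df[[x, y]].dropna()
    if len(subset) > max_points:
        subset = subset.sample(n=max_points, random_state=0)
    return subset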

Linked Chart Builder

Additionally, I loved the linked behaviour of the "Correlations" popup.

Do you think it's possible to have a tool for building linked charts?
(perhaps this should be on a new issue)

Originally posted by @mindlessbrain in #45 (comment)

My initial proposal for this behavior would be to add an additional button to the tooltips of line charts in the chart builder with the following conditions:

  • the button would only be available when using an 'aggregation', otherwise there isn't really much data to show
  • by default it will show a modal window with a bar chart (might get a little confusing if there are duplicate data points)
  • you can toggle between the different styles of charts (line, bar, pie, wordcloud)

Let me know if I'm going down the right path...

Replicate dtale Correlations tab live demo example

Hi,

I am trying to replicate the live demo, specifically the behaviour of the Correlations tab when you have a time series and display its rolling correlation followed by the scatter plots after clicking on the line plot.

I tested with the following code and got the following output:

import pandas as pd
import numpy as np
import dtale

ii = pd.date_range(start='2018-01-01', end='2019-12-01', freq='D')
ii = pd.Index(ii, name='date') 

n = ii.shape[0]
c = 5
data = np.random.random((n, c))

df = pd.DataFrame(data, index=ii)

d = dtale.show(df)

image
image

I also tried resetting the df index in order to put the date as a column as in the demo but got the same behaviour:
dtale.show(df.reset_index())

image
image

I installed dtale via pip (pip install --upgrade dtale) inside an Anaconda environment on Windows.

Does anyone know if I should preformat the data or turn some flag on?

Mis-renamed columns crash dtale under Jupyter

The following minimal example will cause the Jupyter kernel to crash with all sorts of amazing looking IO errors:

import dtale
import pandas as pd

df = pd.DataFrame([[1,2],[3,4]], columns=['a','b'])

dtale.show(df.rename(columns={'a':'b'}))

The subtle error here is where I rename a column over the top of another one. This new dataframe seems to look OK, and can be printed to a terminal, but dtale blows up when rendering it.

Ideally, pandas should not allow me to do this, and this might just be dtale not being able to handle corrupted internal memory in pandas (in which case I'll go file this in pandas instead).
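
A small sketch of the kind of guard that could catch this before rendering, simply by checking for duplicate column names (this is an illustration, not dtale's actual validation):

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
renamed = df.rename(columns={'a': 'b'})  # produces two columns named 'b'

# Detect duplicate column names before handing the frame to a renderer.
duplicates = renamed.columns[renamed.columns.duplicated()].tolist()
if duplicates:
    raise ValueError("Duplicate column names would break rendering: %s" % duplicates)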

Apply a custom function

It would be helpful to define custom functions or replacement dictionaries and apply/map them to different columns.
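
A minimal pandas sketch of what is being asked for: defining a custom function and a replacement dictionary and applying/mapping them to different columns (the dataframe is made up):

import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0], "status": ["y", "n"]})

# Apply a custom function to one column...
df["price_with_tax"] = df["price"].apply(lambda p: round(p * 1.2, 2))

# ...and map a replacement dictionary onto another.
df["status"] = df["status"].map({"y": "yes", "n": "no"})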

Failed to connect D-Tale process in hosted notebook

Hi there,

I installed dtale v1.7.11 in my notebook, which is hosted in Google K8S engine as a pod. The DTale process can be assigned properly to a reference:

import dtale
import pandas as pd

df = pd.DataFrame([dict(a=1,b=2,c=3)])
d = dtale.show(df)

However, I failed to connect to it, as the kernel stays busy when running print(d).

I checked the pod logs and realized this has something to do with the port:

textPayload: 
"- - -" 0 - "-" 158 0 129 - "-" "-" "-" "-" "<POD_IP>:40000" 
PassthroughCluster <POD_IP>:52390 <POD_IP>:40000 <POD_IP>:52388 -

I've read that for hosted notebooks port 40000 has to be allowed, but I'm not quite sure where I should start configuring this. My notebook container is running inside a pod, together with an istio sidecar proxy. The pod is managed by a StatefulSet which exposes a Service with a cluster IP.

I'm not quite sure if this is the right place to file this issue; please let me know if I should rather head to the k8s side. Many thanks in advance!
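
As a hedged sketch only: since dtale.show() accepts host and port arguments, one experiment is to pin D-Tale to the port the pod logs mention and bind on all interfaces; whether the istio sidecar/Service then routes to it depends entirely on the mesh configuration:

import dtale
import pandas as pd

df = pd.DataFrame([dict(a=1, b=2, c=3)])

# Assumed values: bind explicitly to port 40000 (seen in the pod logs) on all
# interfaces so the Service/sidecar has a fixed target to route to.
d = dtale.show(df, host="0.0.0.0", port=40000)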

KeyError: '1' when plotting MultiIndex data.

First of all, I absolutely love this package, thank you so much for your effort! I am however having some issues:

I have the following dataframe containing fundamental data (columns) for different stocks (ticker) at different times (asof):
[screenshot]

So let's say I want to plot the evolution of a feature (EQ_EARN_SHARE) for Apple (ticker == 'AAPL') over time (asof). I would expect I should do this:
[screenshot]

Sadly this results in the error:

Traceback (most recent call last):
  File "C:\Users\Peter\Anaconda3\lib\site-packages\dtale\dash_application\charts.py", line 723, in build_figure_data
    data = run_query(DATA[data_id], query, CONTEXT_VARIABLES[data_id])
KeyError: '1'

What am I missing?

Behind Reverse Proxy

I tried many possibilities using an Apache reverse proxy to expose dtale widgets outside my JupyterLab/JupyterHub platform, but I didn't find any solution.

Here are some rules I tested with the Apache web server, but they don't work:

RewriteRule /datalab/mynode/dtale/5100/(.*) ws://127.0.0.1:41000/dtale/main/$1 [P,L]
ProxyPassMatch /datalab/mynode/dtale/5100/(.*) http://127.0.0.1:41000/dtale/main/$1
ProxyPassReverse /datalab/mynode/dtale/5100/(.*) http://127.0.0.1:41000/dtale/main/$1

You should specify in your docs where to configure the "base_url" in order to expose dtale externally.

Right now I consider dtale a "localhost" component, impossible to open up for teams who want to develop and collaborate using JupyterLab instances on a remote server.

It seems that dtale has been developed only for selfish developers who are working on their laptops!

Please be more open-minded.

Dtale doesn't work when using Google Colab

Using Google Colab, I installed dtale first by running !pip install -U dtale, and it installed successfully.
After installing dtale, I read my data using pandas, df = pd.read_csv("credit_card.csv"), then used dtale by running dtale.show(df), which gives me a link. When I open it, it says that the site can't be reached. I tried again to get a new link, but still faced the same problem.
