man-group / dtale
Visualizer for pandas data structures
Home Page: http://alphatechadmin.pythonanywhere.com
License: GNU Lesser General Public License v2.1
Currently by default pandas timestamps only show the date portion. Need to add additional handling for timestamps:
import pandas as pd
import dtale
for i in range(5):
    df = pd.DataFrame([dict(a=i, b=i+1)])
    print(df)
    dtale.show(df, port=50000 + i)
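For illustration, a minimal sketch (hypothetical data) of a column carrying a time-of-day component; the formatted strings below are what the grid would ideally display rather than just the date:

```python
import pandas as pd

# hypothetical frame whose column carries a time-of-day component
df = pd.DataFrame({'ts': pd.to_datetime(['2019-01-01 12:34:56',
                                         '2019-01-02 01:02:03'])})

# the full values include the time portion, which is currently dropped in the view
full = df['ts'].dt.strftime('%Y-%m-%d %H:%M:%S').tolist()
print(full)
```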
Currently flasgger is pinned to version 0.9.3 which in turn forces jsonschema to a version <3.0.0.
Possible options moving forward:
FYI, other packages which currently have collisions with the version of jsonschema are:
Yes, it would be great to have a feature to perform rolling computations. If it's not asking much, it would be nice to be able to specify the computation method.
Please, let me know if you need help implementing or specifying it.
Originally posted by @mindlessbrain in #43 (comment)
Currently when I set a format for a column it only changes the view and doesn't apply the format to the underlying data. It would be helpful to apply it in place on the df.
Really, really nice library. :)
There is a dataset limitation of 15,000. Any progress on this?
Since columns are sized to fit the largest value, they can become extremely wide for columns of string/text data (making it very difficult to view the dataframe).
Adding an option to the 'Format' pop-up for "Truncate at [number of] characters" would be helpful. It would also be great if hovering over the truncated value or ellipsis resulted in the entire value being displayed.
I also think it would be neat if this were automatically applied to columns past a certain length, e.g. if df['column_one'].map(lambda x: len(x)).max() > 50, truncate it, since I'd guess 99% of people would not want ridiculously wide columns by default. But that isn't too important.
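The length heuristic mentioned above, as a runnable sketch on a hypothetical frame:

```python
import pandas as pd

# hypothetical frame with one short and one very wide string value
df = pd.DataFrame({'column_one': ['short', 'x' * 80]})

# flag the column for truncation if its longest value exceeds 50 characters
too_wide = df['column_one'].map(lambda x: len(x)).max() > 50
print(too_wide)
```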
I have no idea what happened but this cannot run...
ValueError Traceback (most recent call last)
in
----> 1 dtale.show(df)
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/app.py in show(data, host, port, name, debug, subprocess, data_loader, reaper_on, open_browser, notebook, force, context_vars, ignore_duplicate, **kwargs)
465 url = build_url(ACTIVE_PORT, ACTIVE_HOST)
466 instance = startup(url, data=data, data_loader=data_loader, name=name, context_vars=context_vars,
--> 467 ignore_duplicate=ignore_duplicate)
468 is_active = not running_with_flask_debug() and is_up(url)
469 if is_active:
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/views.py in startup(url, data, data_loader, name, data_id, context_vars, ignore_duplicate)
445 global_state.set_settings(data_id, dict(locked=curr_locked))
446 global_state.set_data(data_id, data)
--> 447 global_state.set_dtypes(data_id, build_dtypes_state(data, global_state.get_dtypes(data_id) or []))
448 global_state.set_context_variables(data_id, build_context_variables(data_id, context_vars))
449 return DtaleData(data_id, url)
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/views.py in build_dtypes_state(data, prev_state)
340 """
341 prev_dtypes = {c['name']: c for c in prev_state or []}
--> 342 ranges = data.agg([min, max]).to_dict()
343 dtype_f = dtype_formatter(data, get_dtypes(data), ranges, prev_dtypes)
344 return [dtype_f(i, c) for i, c in enumerate(data.columns)]
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
6710 result = None
6711 try:
-> 6712 result, how = self._aggregate(func, axis=axis, *args, **kwargs)
6713 except TypeError:
6714 pass
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/frame.py in _aggregate(self, arg, axis, *args, **kwargs)
6724 result = result.T if result is not None else result
6725 return result, how
-> 6726 return super()._aggregate(arg, *args, **kwargs)
6727
6728 agg = aggregate
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
562 elif is_list_like(arg):
563 # we require a list, but not an 'str'
--> 564 return self._aggregate_multiple_funcs(arg, _level=_level, _axis=_axis), None
565 else:
566 result = None
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
619 # if we are empty
620 if not len(results):
--> 621 raise ValueError("no results")
622
623 try:
ValueError: no results
It would be helpful to split a date column (for example into Year/Month/Day) and to split a text column based on separators.
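A sketch of both splits in plain pandas (hypothetical column names and separator):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2020-01-15']),
                   'name': ['foo_bar']})

# split a date column into its parts
df['Year'], df['Month'], df['Day'] = (df['date'].dt.year,
                                      df['date'].dt.month,
                                      df['date'].dt.day)

# split a text column on a separator into new columns
df[['left', 'right']] = df['name'].str.split('_', expand=True)
print(df[['Year', 'Month', 'Day', 'left', 'right']].iloc[0].tolist())
```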
It would be helpful to have hotkeys
This should be possible using Rpy2
Hi,
I am trying to use DTale on a jupyterhub running on a remote centos machine, however it is giving the below error. My guess is that I should somehow manually specify the IP address of the machine or localhost so that it can link to itself properly, but I am not sure how.
Thanks in advance,
Hristo
If you have a date column with multiple values, give the users the ability to view columns as a timeseries chart.
Is it possible to add a feature to export the pandas code or matplotlib code for explored charts, filtering, or groupby?
It can be like a popup text box in which the user can simply copy and paste the code to reuse it in the Jupyter notebook.
In the demo video, there is Coverage Analysis. I installed version 1.7.6 and it doesn't have it. Where can I find or activate it?
We should be able to select multiple columns to plot in the chart builder and have a y-axis associated with each column selected. This way we can plot multiple items against each other.
This is a feature request. Right now, filtering can be done using the df.query method from the main menu. That's great, but often you only need to filter by certain values in a column.
My suggestion would be to add a filtering option in the context menu of the columns, so that the filtering expression would be applied only to that column, saving you from having to type the full name of the column.
Another addition would be that for columns that include discrete values (categories), you could select the values you want to filter on instead of having to write them down.
Both options are implemented at qgrid.
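For reference, both requests map to one-line pandas expressions (hypothetical column name and values):

```python
import pandas as pd

df = pd.DataFrame({'category': ['a', 'b', 'c', 'a']})

# what a per-column filter could generate behind the scenes via df.query
by_query = df.query("category in ['a', 'b']")

# discrete-value selection, as qgrid offers
by_mask = df[df['category'].isin(['a', 'b'])]
print(len(by_query), len(by_mask))
```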
It would be perfect to have a setting the user can select to change a column's background color based on the column dtype, for example.
If a data frame column has a period in the name, as in 'meta.site_id' the rendered page goes blank as soon as you click on that column. No errors logged, just blank screen.
dtale 1.7.3
pandas 0.25.1
When you have a dataframe that extends all the way to the end of your browser window it will be cut in half.
Maybe add another one for the timeseries chart that will let users know about the scatter chart
It would be nice if the menu had a "Build Column" option so that users could quickly & easily modify the underlying data.
The basic idea would be that the 'Build column' interface would prompt users to enter:
The types of columns could be:
Sum
Prompt the user to pick two numeric columns
Difference
Prompt the user to pick two numeric columns
Bins (evenly spaced)
Prompt the user to pick one column.
Additional options: Number of bins, bin labels.
Use pd.cut
Bins (evenly sized)
Prompt the user to pick one column.
Additional options: number of bins, and bin labels.
Use pd.qcut
Datetime property
Prompt the user to pick one datetime column
Additional options: property (one of the Series.dt.__ properties like time, year, month, etc.)
(Maybe?) String function
Prompt the user to pick one string column
Additional options: function (Series.str.__ such as endswith, startswith, len, etc.) and value
And possibly more, will update this over time...
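A minimal sketch of the pandas calls named above (hypothetical series):

```python
import pandas as pd

s = pd.Series(range(10))

# Bins (evenly spaced) -> pd.cut
spaced = pd.cut(s, bins=2, labels=['low', 'high'])
# Bins (evenly sized, i.e. quantile) -> pd.qcut
sized = pd.qcut(s, q=2, labels=['low', 'high'])

# Datetime property -> one of the Series.dt.* properties
dates = pd.to_datetime(pd.Series(['2020-01-15', '2021-06-30']))
years = dates.dt.year

# String function -> one of the Series.str.* functions
names = pd.Series(['apple', 'apricot', 'banana'])
starts = names.str.startswith('ap')
```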
Great work on this project. Cloning this repository is fairly time consuming, as there are 100 MB in the /docs folder. Of this 100 MB, 40 MB come from the gifs. It might be nice to try to reduce the size of these gifs, if possible, for people cloning with throttled/limited download speeds.
Currently it only displays if you have more than one instance open in a terminal. Maybe update the text to display the number of instances in a little Bootstrap badge.
It would be helpful to select 3 columns and create a pivot table (in a new tab) based on col1, col2, and col3
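A sketch of that pivot in plain pandas, with col1/col2/col3 standing in for the three selected columns:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'x', 'y'],
                   'col2': ['a', 'b', 'a'],
                   'col3': [1, 2, 3]})

# rows from col1, columns from col2, aggregated values from col3
pivot = df.pivot_table(index='col1', columns='col2', values='col3',
                       aggfunc='sum')
print(pivot)
```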
Install to a bare virtualenv fails:
% virtualenv -p python3 virt-dtale
virt-dtale% pip install six
virt-dtale% python setup.py develop
yields the following error:
Moving flask_ngrok-0.0.25-py3.6.egg to /home/jason/virt-dtale/lib/python3.6/site-packages
flask-ngrok 0.0.25 is already the active version in easy-install.pth
Installed /home/jason/virt-dtale/lib/python3.6/site-packages/flask_ngrok-0.0.25-py3.6.egg
error: The 'Flask' distribution was not found and is required by dtale
A quick google shows maybe we should set the version in setup.py (it's pretty old)
pallets/flask#1106
It would be helpful to create a new column with random numbers in a range or random characters
If you try to change the chart type from "line" in the chart builder before initially loading data it will cause the page to break.
@mindlessbrain v1.6.5 is available which includes the rolling correlation functionality. Will keep you posted on the other features I mentioned earlier.
Originally posted by @aschonfeld in #45 (comment)
When you go to open the Filter or Formats popups they no longer smoothly transition out of the top of the browser
If you do a docker-compose up from 4b32689 and then navigate to 127.0.0.1:8081 (dtale 3.6 pulling dated data from mongo), the formatting of the tabular data isn't scaling to column width.
Installation seems to miss a dependency. Works after a pip install future.
Add logic to ignore successive clicks...
Currently D-Tale always creates an integer string for data_id when adding data to a session. For example:
http://localhost:40000/dtale/main/1
If when adding the data the user has set the name parameter, we should allow them to hit D-Tale using that name value. For example:
http://localhost:40000/dtale/main/[name]
This will make links a little more clear about what data you're viewing
Now with the format, we can only highlight numbers < 0. It would be helpful to have a range like
x>5 and x<10
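In pandas terms such a range is just a combined boolean mask (hypothetical column x):

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 6, 9, 12]})

# cells to highlight: x > 5 and x < 10
mask = (df['x'] > 5) & (df['x'] < 10)
print(mask.tolist())
```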
When I run 2nd cell with d.open_browser()
it opens (as expected) a new browser tab with the table.
After that when I re-run the 1st cell the table appeared as expected in the notebook.
I did it several times and it got worse. Running the command d = dtale.show(df, notebook=True) takes a long time to display the table (over 2 min).
I restarted kernel and tested several combinations of commands. Only following works:
The last 2 cells take a very long time to run.
And after several runs it stopped working altogether:
I was restarting kernel each time. I even restarted jupyter notebook several times.
Sorry for so many screenshots. I wanted to explain the issue in details.
I hope it helps.
If a user selects a numeric (int/float) column for the x-axis & y-axis then you should have the ability to select "Scatter" as a chart type. This will be limited to 15,000 points or else the client will crash.
You should also be able to group by a column, but in this case the user will be presented with scatters for each group since it is impossible to display them all in one scatter chart.
Additionally, I loved the linked behaviour of the "Correlations" popup.
Do you think it's possible to have a tool for building linked charts?
(perhaps this should be on a new issue)
Originally posted by @mindlessbrain in #45 (comment)
My initial proposal for this behavior would be to add an additional button to the tooltips of line charts in the chart builder with the following conditions:
Let me know if I'm going down the right path...
Hi,
I am trying to replicate the live demo, specifically the behaviour of the Correlations tab when you have a time series and display its rolling correlation followed by the scatter plots after clicking on the line plot.
I tested with the following code and got the following output:
import pandas as pd
import numpy as np
import dtale
ii = pd.date_range(start='2018-01-01', end='2019-12-01', freq='D')
ii = pd.Index(ii, name='date')
n = ii.shape[0]
c = 5
data = np.random.random((n, c))
df = pd.DataFrame(data, index=ii)
d = dtale.show(df)
I also tried resetting the df index in order to put the date as a column as in the demo but got the same behaviour:
dtale.show(df.reset_index())
I installed dtale via pip (pip install --upgrade dtale) inside an Anaconda environment on Windows.
Does anyone know if I should preformat the data or turn some flag on?
The following minimal example will cause the Jupyter kernel to crash with all sorts of amazing looking IO errors:
import dtale
import pandas as pd
df = pd.DataFrame([[1,2],[3,4]], columns=['a','b'])
dtale.show(df.rename(columns={'a':'b'}))
The subtle error here is where I rename a column over the top of another one. This new dataframe seems to look OK, and can be printed to a terminal, but dtale blows up when rendering it.
Ideally, pandas should not allow me to do this, and this might just be dtale not being able to handle corrupted internal memory in pandas (in which case I'll go file this in pandas instead).
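For reference, pandas itself happily produces the duplicate labels, so any guard would have to live in dtale; a minimal check:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# renaming 'a' over the top of 'b' yields two columns both named 'b'
dup = df.rename(columns={'a': 'b'})
print(list(dup.columns))

# a caller could detect this before rendering
has_dupes = dup.columns.duplicated().any()
print(has_dupes)
```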
@aschonfeld, it could also be useful to be able to add color bars for date or value ranges.
Here's a matplotlib example from PyMC3.
Originally posted by @mindlessbrain in #48 (comment)
It would be helpful to define functions or replacement dictionaries and apply/map them to the different columns
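As a sketch of what applying a replacement dictionary and a function could look like in plain pandas (hypothetical columns):

```python
import pandas as pd

df = pd.DataFrame({'grade': ['a', 'b'], 'score': [10, 20]})

# map a column through a replacement dictionary
df['grade'] = df['grade'].map({'a': 1, 'b': 2})

# apply a user-defined function to another column
df['score'] = df['score'].apply(lambda v: v * 2)
print(df['grade'].tolist(), df['score'].tolist())
```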
Hi there,
I installed dtale v1.7.11 in my notebook, which is hosted in Google K8S engine as a pod. The DTale process can be assigned properly to a reference:
import dtale
import pandas as pd
df = pd.DataFrame([dict(a=1,b=2,c=3)])
d = dtale.show(df)
However I failed to connect to it, as the kernel keeps busy when running print(d).
I checked the pod logs and realized this has something to do with the port:
textPayload:
"- - -" 0 - "-" 158 0 129 - "-" "-" "-" "-" "<POD_IP>:40000"
PassthroughCluster <POD_IP>:52390 <POD_IP>:40000 <POD_IP>:52388 -
I've read that for hosted notebooks the port 40000 has to be allowed, but I'm not quite sure where I should start to configure this. My notebook container is running inside a pod, together with an istio sidecar proxy. The pod is managed by a StatefulSet which exposes a Service with a cluster ip.
I'm not quite sure if this is the right place to file this issue; please let me know if I should rather head to the k8s side. And many thanks in advance!
First of all, I absolutely love this package, thank you so much for your effort! I am however having some issues:
I have the following dataframe containing fundamental data (columns) for different stocks (ticker) at different times (asof):
So let's say I want to plot the evolution of a feature (EQ_EARN_SHARE) from Apple (ticker == 'AAPL') over time (asof), I would suspect I should do this:
Sadly this results in the error:
Traceback (most recent call last):
File "C:\Users\Peter\Anaconda3\lib\site-packages\dtale\dash_application\charts.py", line 723, in build_figure_data
data = run_query(DATA[data_id], query, CONTEXT_VARIABLES[data_id])
KeyError: '1'
What am I missing?
Currently does nothing when you're within a jupyter cell being run from PyCharm's jupyter plugin
Jump to 9:12 in this video to see what I'm talking about
I tried many possibilities by using Apache Reverse Proxy in order to expose dtale widgets outside my JupyterLab/JupyterHub Platform, but I didn't find any solution.
Here are some rules I tested with the Apache web server, but they don't work:
RewriteRule /datalab/mynode/dtale/5100/(.*) ws://127.0.0.1:41000/dtale/main/$1 [P,L]
ProxyPassMatch /datalab/mynode/dtale/5100/(.*) http://127.0.0.1:41000/dtale/main/$1
ProxyPassReverse /datalab/mynode/dtale/5100/(.*) http://127.0.0.1:41000/dtale/main/$1
You should specify in your doc where to configure the "base_url" in order to expose your dtale outside.
Nowadays I consider dtale a "localhost"-only component, impossible to open up for teams who want to develop and collaborate using JupyterLab instances on a remote server.
It seems that dtale has been developed only for selfish developers working on their laptops!
Please be more open-minded.
It would be helpful to replace a column's contents like in Excel
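A minimal sketch of what that could map to in pandas (hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({'city': ['New York', 'Newark']})

# Excel-style find & replace on a column (literal, not regex)
df['city'] = df['city'].str.replace('New', 'Old', regex=False)
print(df['city'].tolist())
```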
Using Google Colab, I installed Dtale first by doing: !pip install -U dtale
and it installed successfully.
After installing dtale, I read my data using pandas: df = pd.read_csv("credit_card.csv"), then used dtale by calling dtale.show(df), and it gave me a link. When opening it, it says the site can't be reached. I tried again to get a new link, but still faced the same problem.
This may be a better way to support arctic as an optional dependency:
setup(
name="Project-A",
...
extras_require={
'PDF': ["ReportLab>=1.2", "RXP"],
'reST': ["docutils>=0.3"],
}
)