man-group / dtale
Visualizer for pandas data structures
Home Page: http://alphatechadmin.pythonanywhere.com
License: GNU Lesser General Public License v2.1
Currently by default pandas timestamps only show the date portion. Need to add additional handling for timestamps:
import pandas as pd
import dtale
for i in range(5):
    df = pd.DataFrame([dict(a=i, b=i+1)])
    print(df)
    dtale.show(df, port=50000 + i)
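For illustration, a minimal sketch (hypothetical data) of a column carrying a time-of-day component; the formatted strings below are what the grid would ideally display rather than just the date:

```python
import pandas as pd

# hypothetical frame whose column carries a time-of-day component
df = pd.DataFrame({'ts': pd.to_datetime(['2019-01-01 12:34:56',
                                         '2019-01-02 01:02:03'])})

# the full values include the time portion, which is currently dropped in the view
full = df['ts'].dt.strftime('%Y-%m-%d %H:%M:%S').tolist()
print(full)
```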
Currently flasgger is pinned to version 0.9.3 which in turn forces jsonschema to a version <3.0.0.
Possible options moving forward:
FYI, other packages which currently have collisions with the version of jsonschema are:
Yes, it would be great to have a feature to perform rolling computations. If it's not asking much, it would be nice to be able to specify the computation method.
Please, let me know if you need help implementing or specifying it.
Originally posted by @mindlessbrain in #43 (comment)
Currently when I set a format for a column it only changes the view and doesn't apply the format to the underlying data. It would be helpful to apply it in place on the df.
Really, really nice library. :)
There is a dataset limitation of 15,000. Any progress on this?
Since columns are sized to fit the largest value, they can become extremely wide for columns of string/text data (making it very difficult to view the dataframe).
Adding an option to the 'Format' pop-up for "Truncate at [number of] characters" would be helpful. It would also be great if hovering over the truncated value or ellipsis resulted in the entire value being displayed.
I also think it would be neat if this were automatically applied to columns past a certain length, e.g. if df['column_one'].map(lambda x: len(x)).max() > 50, truncate it, since I'd guess 99% of people would not want ridiculously wide columns by default. But that isn't too important.
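The length heuristic mentioned above, as a runnable sketch on a hypothetical frame:

```python
import pandas as pd

# hypothetical frame with one short and one very wide string value
df = pd.DataFrame({'column_one': ['short', 'x' * 80]})

# flag the column for truncation if its longest value exceeds 50 characters
too_wide = df['column_one'].map(lambda x: len(x)).max() > 50
print(too_wide)
```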
I have no idea what happened but this cannot run...
ValueError Traceback (most recent call last)
in
----> 1 dtale.show(df)
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/app.py in show(data, host, port, name, debug, subprocess, data_loader, reaper_on, open_browser, notebook, force, context_vars, ignore_duplicate, **kwargs)
465 url = build_url(ACTIVE_PORT, ACTIVE_HOST)
466 instance = startup(url, data=data, data_loader=data_loader, name=name, context_vars=context_vars,
--> 467 ignore_duplicate=ignore_duplicate)
468 is_active = not running_with_flask_debug() and is_up(url)
469 if is_active:
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/views.py in startup(url, data, data_loader, name, data_id, context_vars, ignore_duplicate)
445 global_state.set_settings(data_id, dict(locked=curr_locked))
446 global_state.set_data(data_id, data)
--> 447 global_state.set_dtypes(data_id, build_dtypes_state(data, global_state.get_dtypes(data_id) or []))
448 global_state.set_context_variables(data_id, build_context_variables(data_id, context_vars))
449 return DtaleData(data_id, url)
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/dtale/views.py in build_dtypes_state(data, prev_state)
340 """
341 prev_dtypes = {c['name']: c for c in prev_state or []}
--> 342 ranges = data.agg([min, max]).to_dict()
343 dtype_f = dtype_formatter(data, get_dtypes(data), ranges, prev_dtypes)
344 return [dtype_f(i, c) for i, c in enumerate(data.columns)]
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/frame.py in aggregate(self, func, axis, *args, **kwargs)
6710 result = None
6711 try:
-> 6712 result, how = self._aggregate(func, axis=axis, *args, **kwargs)
6713 except TypeError:
6714 pass
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/frame.py in _aggregate(self, arg, axis, *args, **kwargs)
6724 result = result.T if result is not None else result
6725 return result, how
-> 6726 return super()._aggregate(arg, *args, **kwargs)
6727
6728 agg = aggregate
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/base.py in _aggregate(self, arg, *args, **kwargs)
562 elif is_list_like(arg):
563 # we require a list, but not an 'str'
--> 564 return self._aggregate_multiple_funcs(arg, _level=_level, _axis=_axis), None
565 else:
566 result = None
~/opt/anaconda3/envs/ThinkBayes2/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
619 # if we are empty
620 if not len(results):
--> 621 raise ValueError("no results")
622
623 try:
ValueError: no results
It would be helpful to split a date column (for example into Year/Month/Day) and to split a text column based on separators.
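A sketch of both splits in plain pandas (hypothetical column names and separator):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2020-01-15']),
                   'name': ['foo_bar']})

# split a date column into its parts
df['Year'], df['Month'], df['Day'] = (df['date'].dt.year,
                                      df['date'].dt.month,
                                      df['date'].dt.day)

# split a text column on a separator into new columns
df[['left', 'right']] = df['name'].str.split('_', expand=True)
print(df[['Year', 'Month', 'Day', 'left', 'right']].iloc[0].tolist())
```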
It would be helpful to have hotkeys
This should be possible using Rpy2
Hi,
I am trying to use DTale on a jupyterhub running on a remote centos machine, however it is giving the below error. My guess is that I should somehow manually specify the IP address of the machine or localhost so that it can link to itself properly, but I am not sure how.
Thanks in advance,
Hristo
If you have a date column with multiple values, give the users the ability to view columns as a timeseries chart.
Is it possible to add a feature to export the pandas code or matplotlib code for explored charts, filtering, or groupby?
It can be like a popup text box in which the user can simply copy and paste the code to reuse it in the Jupyter notebook.
In the demo video, there is Coverage Analysis. I installed version 1.7.6 and it doesn't have it. Where can I find or activate it?
We should be able to select multiple columns to plot in the chart builder and have a y-axis associated with each column selected. This way we can plot multiple items against each other.
This is a feature request. Right now, filtering can be done using the df.query method from the main menu. That's great, but often you only need to filter by certain values in a column.
My suggestion would be to add a filtering option in the context menu of the columns, so that the filtering expression would be applied only to that column, saving you from having to type the full name of the column.
Another addition would be that for columns that include discrete values (categories), you could select the values you want to filter on instead of having to write them down.
Both options are implemented at qgrid.
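For reference, both requests map to one-line pandas expressions (hypothetical column name and values):

```python
import pandas as pd

df = pd.DataFrame({'category': ['a', 'b', 'c', 'a']})

# what a per-column filter could generate behind the scenes via df.query
by_query = df.query("category in ['a', 'b']")

# discrete-value selection, as qgrid offers
by_mask = df[df['category'].isin(['a', 'b'])]
print(len(by_query), len(by_mask))
```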
It would be perfect to have a setting the user can select to change a column's background color based on the column dtype, for example.
If a data frame column has a period in the name, as in 'meta.site_id' the rendered page goes blank as soon as you click on that column. No errors logged, just blank screen.
dtale 1.7.3
pandas 0.25.1
When you have a dataframe that extends all the way to the end of your browser window it will be cut in half.
Maybe add another one for the timeseries chart that will let users know about the scatter chart
It would be nice if the menu had a "Build Column" option so that users could quickly & easily modify the underlying data.
The basic idea would be that the 'Build column' interface would prompt users to enter:
The types of columns could be:
Sum
Prompt the user to pick two numeric columns
Difference
Prompt the user to pick two numeric columns
Bins (evenly spaced)
Prompt the user to pick one column.
Additional options: Number of bins, bin labels.
Use pd.cut
Bins (evenly sized)
Prompt the user to pick one column.
Additional options: number of bins, and bin labels.
Use pd.qcut
Datetime property
Prompt the user to pick one datetime column
Additional options: property (one of the Series.dt.__ properties like time, year, month, etc.)
(Maybe?) String function
Prompt the user to pick one string column
Additional options: function (Series.str.__ such as endswith, startswith, len, etc.) and value
And possibly more, will update this over time...
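A minimal sketch of the pandas calls named above (hypothetical series):

```python
import pandas as pd

s = pd.Series(range(10))

# Bins (evenly spaced) -> pd.cut
spaced = pd.cut(s, bins=2, labels=['low', 'high'])
# Bins (evenly sized, i.e. quantile) -> pd.qcut
sized = pd.qcut(s, q=2, labels=['low', 'high'])

# Datetime property -> one of the Series.dt.* properties
dates = pd.to_datetime(pd.Series(['2020-01-15', '2021-06-30']))
years = dates.dt.year

# String function -> one of the Series.str.* functions
names = pd.Series(['apple', 'apricot', 'banana'])
starts = names.str.startswith('ap')
```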
Great work on this project. Cloning this repository is fairly time consuming, as there are 100 MB in the /docs folder. Of this 100 MB, 40 MB come from the gifs. It might be nice to try to reduce the size of these gifs, if possible, for people cloning with throttled/limited download speeds.
Currently it only displays if you have more than one instance open in a terminal. Maybe update the text to display the number of instances in a little Bootstrap badge.
It would be helpful to select 3 columns and create a pivot table (in a new tab) based on col1, col2, and col3
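A sketch of that pivot in plain pandas, with col1/col2/col3 standing in for the three selected columns:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['x', 'x', 'y'],
                   'col2': ['a', 'b', 'a'],
                   'col3': [1, 2, 3]})

# rows from col1, columns from col2, aggregated values from col3
pivot = df.pivot_table(index='col1', columns='col2', values='col3',
                       aggfunc='sum')
print(pivot)
```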
Install to a bare virtualenv fails:
% virtualenv -p python3 virt-dtale
virt-dtale% pip install six
virt-dtale% python setup.py develop
yields the following error:
Moving flask_ngrok-0.0.25-py3.6.egg to /home/jason/virt-dtale/lib/python3.6/site-packages
flask-ngrok 0.0.25 is already the active version in easy-install.pth
Installed /home/jason/virt-dtale/lib/python3.6/site-packages/flask_ngrok-0.0.25-py3.6.egg
error: The 'Flask' distribution was not found and is required by dtale
A quick google shows maybe we should set the version in setup.py (it's pretty old)
pallets/flask#1106
It would be helpful to create a new column with random numbers in a range or random characters
If you try to change the chart type from "line" in the chart builder before initially loading data it will cause the page to break.
@mindlessbrain v1.6.5 is available which includes the rolling correlation functionality. Will keep you posted on the other features I mentioned earlier.
Originally posted by @aschonfeld in #45 (comment)
When you go to open the Filter or Formats popups they no longer smoothly transition out of the top of the browser
If you do a docker-compose up from 4b32689 and then navigate to 127.0.0.1:8081 (dtale 3.6 pulling dated data from mongo), the formatting of the tabular data isn't scaling to column width.
Installation seems to miss a dependency. Works after a pip install future.
Add logic to ignore successive clicks...
Currently D-Tale always creates an integer string for data_id when adding data to a session. For example:
http://localhost:40000/dtale/main/1
If when adding the data the user has set the name parameter, we should allow them to hit D-Tale using that name value. For example:
http://localhost:40000/dtale/main/[name]
This will make links a little more clear about what data you're viewing
Now with the format, we can only highlight numbers < 0. It would be helpful to have a range like
x>5 and x<10
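In pandas terms such a range is just a combined boolean mask (hypothetical column x):

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 6, 9, 12]})

# cells to highlight: x > 5 and x < 10
mask = (df['x'] > 5) & (df['x'] < 10)
print(mask.tolist())
```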
When I run 2nd cell with d.open_browser()
it opens (as expected) a new browser tab with the table.
After that when I re-run the 1st cell the table appeared as expected in the notebook.
I did it several times and it got worse. Running the command d = dtale.show(df, notebook=True) takes a long time to display the table (over 2 min).
I restarted kernel and tested several combinations of commands. Only following works:
The last 2 cells take a very long time to run.
And after several runs it stopped working altogether:
I was restarting kernel each time. I even restarted jupyter notebook several times.
Sorry for so many screenshots. I wanted to explain the issue in details.
I hope it helps.
If a user selects a numeric (int/float) column for the x-axis & y-axis then you should have the ability to select "Scatter" as a chart type. This will be limited to 15,000 points or else the client will crash.
You should also be able to group by a column, but in this case the user will be presented with scatters for each group since it is impossible to display them all in one scatter chart.
Additionally, I loved the linked behaviour of the "Correlations" popup.
Do you think it's possible to have a tool for building linked charts?
(perhaps this should be on a new issue)
Originally posted by @mindlessbrain in #45 (comment)
My initial proposal for this behavior would be to add an additional button to the tooltips of line charts in the chart builder with the following conditions:
Let me know if I'm going down the right path...
Hi,
I am trying to replicate the live demo, specifically the behaviour of the Correlations tab when you have a time series and display its rolling correlation followed by the scatter plots after clicking on the line plot.
I tested with the following code and got the following output:
import pandas as pd
import numpy as np
import dtale
ii = pd.date_range(start='2018-01-01', end='2019-12-01', freq='D')
ii = pd.Index(ii, name='date')
n = ii.shape[0]
c = 5
data = np.random.random((n, c))
df = pd.DataFrame(data, index=ii)
d = dtale.show(df)
I also tried resetting the df index in order to put the date as a column as in the demo but got the same behaviour:
dtale.show(df.reset_index())
I installed dtale via pip (pip install --upgrade dtale) inside an Anaconda environment on Windows.
Does anyone know if I should preformat the data or turn some flag on?
The following minimal example will cause the Jupyter kernel to crash with all sorts of amazing looking IO errors:
import dtale
import pandas as pd
df = pd.DataFrame([[1,2],[3,4]], columns=['a','b'])
dtale.show(df.rename(columns={'a':'b'}))
The subtle error here is where I rename a column over the top of another one. This new dataframe seems to look OK, and can be printed to a terminal, but dtale blows up when rendering it.
Ideally, pandas should not allow me to do this, and this might just be dtale not being able to handle corrupted internal memory in pandas (in which case I'll go file this in pandas instead).
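For reference, pandas itself happily produces the duplicate labels, so any guard would have to live in dtale; a minimal check:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# renaming 'a' over the top of 'b' yields two columns both named 'b'
dup = df.rename(columns={'a': 'b'})
print(list(dup.columns))

# a caller could detect this before rendering
has_dupes = dup.columns.duplicated().any()
print(has_dupes)
```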
@aschonfeld, it could also be useful to be able to add color bars for date or value ranges.
Here's a matplotlib example from PyMC3.
Originally posted by @mindlessbrain in #48 (comment)
It would be helpful to define functions or replacement dictionaries and apply/map them to the different columns
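As a sketch of what applying a replacement dictionary and a function could look like in plain pandas (hypothetical columns):

```python
import pandas as pd

df = pd.DataFrame({'grade': ['a', 'b'], 'score': [10, 20]})

# map a column through a replacement dictionary
df['grade'] = df['grade'].map({'a': 1, 'b': 2})

# apply a user-defined function to another column
df['score'] = df['score'].apply(lambda v: v * 2)
print(df['grade'].tolist(), df['score'].tolist())
```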
Hi there,
I installed dtale v1.7.11 in my notebook, which is hosted in Google K8S engine as a pod. The DTale process can be assigned properly to a reference:
import dtale
import pandas as pd
df = pd.DataFrame([dict(a=1,b=2,c=3)])
d = dtale.show(df)
However I failed to connect to it, as the kernel keeps busy when running print(d).
I checked the pod logs and realized this has something to do with the port:
textPayload:
"- - -" 0 - "-" 158 0 129 - "-" "-" "-" "-" "<POD_IP>:40000"
PassthroughCluster <POD_IP>:52390 <POD_IP>:40000 <POD_IP>:52388 -
I've read that for hosted notebooks the port 40000 has to be allowed, but I'm not quite sure where I should start to configure this. My notebook container is running inside a pod, together with an istio sidecar proxy. The pod is managed by a StatefulSet which exposes a Service with a cluster ip.
I'm not quite sure if this is the right place to file this issue; please let me know if I should rather head to the k8s side. And many thanks in advance!
First of all, I absolutely love this package, thank you so much for your effort! I am however having some issues:
I have the following dataframe containing fundamental data (columns) for different stocks (ticker) at different times (asof):
So let's say I want to plot the evolution of a feature (EQ_EARN_SHARE) from Apple (ticker == 'AAPL') over time (asof), I would suspect I should do this:
Sadly this results in the error:
Traceback (most recent call last):
File "C:\Users\Peter\Anaconda3\lib\site-packages\dtale\dash_application\charts.py", line 723, in build_figure_data
data = run_query(DATA[data_id], query, CONTEXT_VARIABLES[data_id])
KeyError: '1'
What am I missing?
Currently does nothing when you're within a jupyter cell being run from PyCharm's jupyter plugin
Jump to 9:12 in this video to see what I'm talking about
I tried many possibilities by using Apache Reverse Proxy in order to expose dtale widgets outside my JupyterLab/JupyterHub Platform, but I didn't find any solution.
Here are some rules I tested with the Apache web server, but they don't work:
RewriteRule /datalab/mynode/dtale/5100/(.*) ws://127.0.0.1:41000/dtale/main/$1 [P,L]
ProxyPassMatch /datalab/mynode/dtale/5100/(.*) http://127.0.0.1:41000/dtale/main/$1
ProxyPassReverse /datalab/mynode/dtale/5100/(.*) http://127.0.0.1:41000/dtale/main/$1
You should specify in your doc where to configure the "base_url" in order to expose your dtale outside.
Nowadays I consider dtale a "localhost"-only component, impossible to open up for teams who want to develop and collaborate using JupyterLab instances on a remote server.
It seems that dtale has been developed only for selfish developers working on their laptops!
Please be more open-minded.
It would be helpful to replace a column's contents like in Excel
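A minimal sketch of what that could map to in pandas (hypothetical data):

```python
import pandas as pd

df = pd.DataFrame({'city': ['New York', 'Newark']})

# Excel-style find & replace on a column (literal, not regex)
df['city'] = df['city'].str.replace('New', 'Old', regex=False)
print(df['city'].tolist())
```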
Using Google Colab, I installed Dtale first by doing: !pip install -U dtale
and it installed successfully.
After installing dtale, I read my data using pandas: df = pd.read_csv("credit_card.csv"), then used dtale by calling dtale.show(df), and it gave me a link. When opening it, it says the site can't be reached. I tried again to get a new link, but still faced the same problem.
This may be a better way to support arctic as an optional dependency:
setup(
name="Project-A",
...
extras_require={
'PDF': ["ReportLab>=1.2", "RXP"],
'reST': ["docutils>=0.3"],
}
)