
pysankey's People

Contributors

anazalea, marcomanz, pierre-sassoulas

pysankey's Issues

Thanks

Just wanted to say thanks for making this. It's really nice to use right out of the box.

Weighted sankey example

Hi @marcomanz,

I tried to integrate your changes into the common pysankey repository, and now I want to test them, but I can't reproduce your weighted example. This is what I tried:

import pandas as pd
from pysankey import sankey

df = pd.read_csv(
    'pysankey/customers-goods.csv', sep=',',
    names=['id', 'customer', 'good', 'revenue']
)
sankey(
    left=df['customer'], right=df['good'], rightWeight=df['revenue'], aspect=20,
    fontsize=20, figureName="customer-good"
)

But I get this error:

    myD['bottom'] = rightWidths[rightLabels[i - 1]]['top'] + 0.02 * dataFrame.rightWeight.sum()
TypeError: can't multiply sequence by non-int of type 'float'

Do you have any idea what is wrong with my call? What call did you use to produce customer-goods.png, and do you still have it?

Cheers,
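
A note on the message itself: Python raises exactly this TypeError whenever a str or list is multiplied by a float. One possible cause (an assumption, not confirmed in this thread) is that the revenue column was read as text rather than numbers, in which case summing it would give a string. The sketch below reproduces the message with plain Python values and shows a conversion; `raw_weights` and `scaled_gap` are made-up names for illustration, not part of pysankey.

```python
# Hypothetical stand-in for a weight column that was parsed as text.
raw_weights = ["5.5", "11.0", "7.0"]


def scaled_gap(total, factor=0.02):
    """Mimic the failing expression: a float times the column total."""
    return factor * total


# Joining strings gives another string, and float * str is not defined.
try:
    scaled_gap("".join(raw_weights))
except TypeError as exc:
    message = str(exc)
    print(message)

# Converting to float first makes the same expression work.
weights = [float(w) for w in raw_weights]
gap = scaled_gap(sum(weights))
print(gap)
```

With pandas, `pd.to_numeric(df['revenue'])` would be the analogous conversion, and `df.dtypes` shows whether the column was read as numbers in the first place.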

TypeError: 'module' object is not callable

Can you help me, please?

Trying to run one of the examples, I got this error:

import pandas as pd
from pySankey import sankey

df = pd.read_csv(
... 'goods.csv', sep=',',
... names=['id', 'customer', 'good', 'revenue']
... )

df
   id customer    good  revenue
0   0     John   fruit      5.5
1   1     Mike    meat     11.0
2   2    Betty  drinks      7.0
3   3      Ben   fruit      4.0
4   4    Betty   bread      2.0
5   5     John   bread      2.5
6   6     John  drinks      8.0
7   7      Ben   bread      2.0
8   8     Mike   bread      3.5
9   9     John    meat     13.0

sankey(left=df['customer'], right=df['good'], rightWeight=df['revenue'], aspect=20,fontsize=20, figureName="customer-good")
Traceback (most recent call last):
File "", line 1, in
TypeError: 'module' object is not callable
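
This error usually means the name being called refers to a module rather than a function. Another issue on this page reports that, with the pip-installed package, `from pySankey import sankey` imports a module and the function is reached as `sankey.sankey(...)`. The sketch below shows the same mechanism using the standard library's `os.path` module as a stand-in, so it runs without pySankey installed; it is an analogy, not pySankey's own code.

```python
# `os.path` stands in for the `sankey` module: both are modules that
# contain a function, so calling the module itself fails.
from os import path

try:
    path("a", "b")  # calling a module, like sankey(...) above
except TypeError as exc:
    message = str(exc)
    print(message)

# Calling a function inside the module works, like sankey.sankey(...).
joined = path.join("a", "b")

# Importing the function directly also works.
from os.path import join

assert join("a", "b") == joined
```

By the same logic, either `sankey.sankey(...)` or an import of the function itself from the `sankey` module should avoid the error, assuming the installed package is laid out as that other issue describes.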

Citing the code

Hi,
I have been using your code and finding it very useful, thank you! I am working on a paper that uses aspects of your code and was wondering how best to cite your work, so that you are properly acknowledged.
Many thanks,
Mark

Error when using leftLabels/rightLabels

if len(labels > 0):

pretty sure this should be len(labels) > 0...

       df_compare['pred_RF'].values,
       colorDict=colorDict,
       figure_name='visualization/output/sankeytest',
       rightLabels=['C','I','R','P','AUX'])

yields

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/home/georg/.local/share/virtualenvs/pvpotential-aZgESz7z/lib/python3.8/site-packages/pySankey/sankey.py", line 119, in sankey
    check_data_matches_labels(leftLabels, df['right'], 'right')
  File "/home/georg/.local/share/virtualenvs/pvpotential-aZgESz7z/lib/python3.8/site-packages/pySankey/sankey.py", line 34, in check_data_matches_labels
    if len(labels >0):
TypeError: '>' not supported between instances of 'str' and 'int'

Possibility of joining plots

Hi,

I spent a long time looking for a way to make a plot like the one this package offers, so thank you! However, what I need is a single plot that combines 3 diagrams. The names on both sides correspond one to one, so it should be fairly straightforward, but I couldn't make it work.
Is this actually possible with your package?

Thank you in advance,

Ana Marta

Use matplotlib more transparently

Matplotlib gives us a lot of flexibility in terms of plot sizes, aspect ratios, adding annotations, styling, and so on: all of which are valuable to people creating sankey diagrams. Currently, the sankey function uses a number of extra keyword arguments to ape a tiny portion of that flexibility, making it harder to read and maintain, while taking a lot of valuable power away from the user (see #22, #19, #7).

sankey could instead take a matplotlib Axes object, which the user can customise to their liking, along with the Figure it belongs to. They can then choose what to do with it, such as showing it or saving it, with much more flexibility.

pySankey's maintainability would improve as it's doing less; users would have a lot more power.

P.S. pySankey is great! So much more usable than floweaver.

Creating a package on PyPI

Hi,

I would like to use this package as a dependency so that I can update it more easily with pip. Right now I have added the file directly to my repository, so the whole process of changing something in pySankey, opening a pull request, and updating my project is complicated. Publishing the package on PyPI would allow a direct "pip install pysankey" to get the latest version.

Are you familiar with PyPI, @anazalea? The name "pysankey" is available for now. I can help you create the package (writing the setup.py) and upload it to PyPI. I could also do it myself and add you as a package manager if you have an account. But I don't want to steal your thunder :)

Regards,

Can't plot the example on my laptop

I'm trying to reproduce your fruits example plot and there seems to be a problem with how the DataFrame cases are defined. I got this error:

File "..\anaconda3/lib/python3.6/site-packages/pySankey-1.0.0-py3.6.egg/pysankey/sankey.py", line 107, in sankey
    raise NullsInFrame('Sankey graph does not support null values.')
  File "..\anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1121, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I just tried to plot the first diagram, without any change, using the fruits.txt file.

Using leftLabels / rightLabels

This is a great tool you have here. I'd been trying to make a Sankey using Plotly all morning to no avail, but this was relatively straightforward and looks good. However, I'm struggling to order the labels for presentation. I have 9 string labels on the left and 7 on the right, but I'm not sure what to pass to leftLabels and rightLabels to define the order. I've passed each argument a list of the labels in the order I want, but I get a LabelMismatch error.

I tried sorting my data beforehand using .sort_values in pandas; however, that resulted in incorrect output, with the wrong values assigned to the wrong links (despite the values being correctly ordered in the data frame). I don't know whether this is the result of a misordering of the labels or whether the entirety of the visualised data is incorrect.

I also noted what I think is a mistake in the check_data_matches_labels function. I believe it should be if len(labels) > 0:

Essentially I can get an accurate visual now, but it would be great to be able to reorder the data so that both left and right sets of labels are in alphabetical order, and the links adjust to match.

Trying to specify the order of nodes

Hi,
I saw several comments on determining the order of the right and left nodes. Can someone provide some example code showing how to do this successfully? I am struggling with it.
Much appreciated,
Amrit
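
A traceback quoted elsewhere on this page shows the package comparing `set(labels)` with the set of unique values in the column, which suggests that each label list must contain exactly the values present on its side, no more and no fewer. As a hedged sketch under that assumption, the helper below (`label_mismatch` is a made-up name, not part of pySankey) reports what differs before the lists are passed on. Another traceback on this page shows `check_data_matches_labels(leftLabels, df['right'], 'right')`; if that is accurate for the installed version, the left labels are being compared against the right column, which would produce a mismatch even for correct lists.

```python
def label_mismatch(labels, values):
    """Return (missing, extra) between an ordered label list and the data.

    missing: values present in the data but absent from `labels`
    extra:   entries in `labels` that never occur in the data
    """
    label_set = set(labels)
    value_set = set(values)
    missing = sorted(value_set - label_set)
    extra = sorted(label_set - value_set)
    return missing, extra


# Made-up data for illustration.
left = ["John", "Mike", "Betty", "Ben", "Betty", "John"]
right = ["fruit", "meat", "drinks", "fruit", "bread", "bread"]

# Alphabetical order on both sides, built from the data itself so the
# label sets match by construction.
left_order = sorted(set(left))
right_order = sorted(set(right))

print(label_mismatch(left_order, left))   # no differences
print(label_mismatch(left_order, right))  # left labels vs. right data
```

The resulting lists would then be passed as `leftLabels=left_order` and `rightLabels=right_order`. Whether the first label is drawn at the top or the bottom is not stated in these issues, so the lists may need reversing.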

The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

After I fixed the len() error, there was another one:

/usr/local/lib/python3.6/dist-packages/pysankey/sankey.py in check_data_matches_labels(labels, data, side)
     56             data = set(data.unique().tolist())
     57         if isinstance(labels, list):
---> 58             labels = set(labels)
     59         if labels != data:
     60             msg = "\n"

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Do you know why?

Code still up to date?

I installed the package via pip install and had to change a lot to make it work, so I wonder whether this is the most up-to-date manual.

Issues -> fixes I had to make:
from pysankey import sankey -> from pySankey import sankey (otherwise the module can't be found)
sankey(df... -> sankey.sankey(.... (since the function is not imported directly)
figureName -> figure_name (otherwise it doesn't work)

So you may want to update the example?
Thank you

Possibility to get plot only

Is it possible to make the plot part of a matplotlib figure instead of saving it directly?

I would like to add some labels and headlines.

Thank you

"aspect" not working

I tried to change the aspect of the graphic by changing the value of the 'aspect' argument, but the graphic does not change at all. I would also like to customise the graph, but this does not seem possible (the sankey function has no **kwargs input, so there is no way to pass other options before the plot is built).
By the way, thank you very much for this piece of code. The sankey diagram in matplotlib is really ugly, and yours is, imho, the best free alternative.

Left side is squeezed

Hi,

I'm trying the example but the end result has the left side items squeezed together. Is there something wrong with my code?

Syntax error in sankey.py

Currently, line 34 of sankey.py in the pySankey package reads: if len(labels > 0):
When the leftLabels or rightLabels arguments are provided to the sankey function, this generates an error.
Line 34 should be updated to if len(labels) > 0:
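
Strictly speaking, the line parses fine and fails only at run time: `labels > 0` is evaluated first, and a list cannot be compared with an int. A minimal sketch of the difference, using two made-up helper names rather than the package's actual function:

```python
def has_labels_buggy(labels):
    # Parenthesis in the wrong place: compares the list to 0 first.
    return len(labels > 0)


def has_labels_fixed(labels):
    # Take the length first, then compare it with 0.
    return len(labels) > 0


labels = ["C", "I", "R", "P", "AUX"]

try:
    has_labels_buggy(labels)
except TypeError as exc:
    error = str(exc)
    print(error)

print(has_labels_fixed(labels))
print(has_labels_fixed([]))
```

The corrected check returns a plain boolean for both empty and non-empty lists, so it behaves the same whether or not labels were supplied.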

ModuleNotFoundError

ModuleNotFoundError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_29264/4168780170.py in
1 import pandas as pd
----> 2 from pysankey import Sankey

ModuleNotFoundError: No module named 'pysankey'

Output of pip freeze:

pySankeyBeta==1.4.0

UnboundLocalError: local variable 'topEdge' referenced before assignment

My dataframe:

Cerrada | Text Analytics | Cantidad | Correctas | Erroneas | Asertividad
-- | -- | -- | -- | -- | --
Sin Anomalias | CR | 103 | 71 | 32 | 68.932039
Sin Anomalias | FR | 223 | 81 | 142 | 36.322870
Sin Anomalias | NC | 295 | 257 | 38 | 87.118644
Sin Anomalias | SVP | 2186 | 2053 | 133 | 93.915828
Sin Anomalias | SA | 5946 | 5578 | 368 | 93.810965
Sin Anomalias | SD | 455 | 217 | 238 | 47.692308
Sin Anomalias | TC | 77 | 8 | 69 | 10.389610
Sin Anomalias | YR | 49 | 46 | 3 | 93.877551

My code:

import matplotlib.pyplot as plt
from pysankey import sankey

weight = df['Asertividad'].values[1:].astype(float)

sankey(left=df['Cerrada'].values[1:], right=df['Text Analytics'].values[1:], rightWeight=weight, leftWeight=weight, aspect=20, fontsize=20)

plt.show() # to display

The error thrown:

UnboundLocalError Traceback (most recent call last)
in
4 weight = df['Asertividad'].values[1:].astype(float)
5
----> 6 sankey(left=df['Cerrada'].values[1:], right=df['Text Analytics'].values[1:], rightWeight=weight, leftWeight=weight, aspect=20, fontsize=20)
7
8 plt.show() # to display

~\AppData\Roaming\Python\Python37\site-packages\pysankey\sankey.py in sankey(left, right, leftWeight, rightWeight, colorDict, leftLabels, rightLabels, aspect, rightColor, fontsize, ax)
192 # Determine positions of left label patches and total widths
193 leftWidths, topEdge = _get_positions_and_total_widths(
--> 194 dataFrame, leftLabels, 'left', aspect)
195
196 # Determine positions of right label patches and total widths

~\AppData\Roaming\Python\Python37\site-packages\pysankey\sankey.py in _get_positions_and_total_widths(df, labels, side, aspect)
296 LOGGER.debug("%s position of '%s' : %s", side, label, labelWidths)
297
--> 298 return widths, topEdge

UnboundLocalError: local variable 'topEdge' referenced before assignment

Thanks in advance !!!
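
One observation about the data above: the 'Cerrada' column holds a single value ("Sin Anomalias"), so the left side has only one label. The traceback does not show the loop inside `_get_positions_and_total_widths`, so what follows is an assumption: if `topEdge` is assigned only on a branch that runs for the second and later labels, a single label would leave it unset. The sketch below reproduces that pattern with made-up functions; it is not the package's code.

```python
def top_edge_buggy(widths):
    """Hypothetical reduction: top_edge is set only from the second label on."""
    bottom = 0.0
    for i, width in enumerate(widths):
        top = bottom + width
        if i > 0:
            top_edge = top  # never reached when there is one label
        bottom = top
    return top_edge


def top_edge_fixed(widths):
    """Same stacking, but top_edge is initialised and updated every time."""
    top_edge = 0.0
    bottom = 0.0
    for width in widths:
        top_edge = bottom + width
        bottom = top_edge
    return top_edge


try:
    top_edge_buggy([1.0])  # one label, as with a single 'Cerrada' value
except UnboundLocalError as exc:
    print(type(exc).__name__)

print(top_edge_buggy([1.0, 2.0]))
print(top_edge_fixed([1.0]))
```

If this is indeed the cause, the error would depend on the number of distinct labels on a side rather than on the weights passed in.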

Feature Requests: Multi Variable diagrams and Target Flows

It would be nice if these diagrams could be used for multiple variables.
Also, it would be cool if you could color all flows based on the color of one of the variables from which they flow, as you can with the alluvial package in R. Check this example with the Titanic Survivors dataset -> https://github.com/mbojan/alluvial
I find this the most important feature of the alluvial package, and it has not been implemented in any Python package for Sankey diagrams, such as Plotly. This feature would be very useful for visually analyzing categorical variables.

Another fork

I understand that this repo is no longer active. I started building from this code to improve my Python skills, with the results available here:

https://github.com/AUMAG/ausankey/

The extensions in this work allow multiple columns and many configuration options around automatic sorting of nodes, layout/formatting, and labelling.

When I started this work I didn’t realise there were other active forks so I thought it would be a good idea to comment on it here.
