I have a pandas DataFrame. When I try to generate a report from …

ValueError when generating profile from dataframe about ydata-profiling HOT 6 CLOSED

ydataai commented on May 12, 2024

ValueError when generating profile from dataframe

from ydata-profiling.

Comments (6)

yevhen-m commented on May 12, 2024

@conradoqg unfortunately I don't have it anymore. Context of this error has been lost.

sbrugman commented on May 12, 2024

Should be resolved in v2.1.0.

conradoqg commented on May 12, 2024

Could you provide a dataset with which I can reproduce this error?

Thanks

aagostini-tada commented on May 12, 2024

I have the same error. Unfortunately I cannot provide the data set :(

adamrossnelson commented on May 12, 2024

I have been able to reproduce this error. And I can post data that reproduces it.

Describe the bug
Error output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-d4e9ca79f757> in <module>()
----> 1 df3.profile_report()

/anaconda3/lib/python3.6/site-packages/pandas_profiling/controller/pandas_decorator.py in profile_report(df, **kwargs)
     14         A ProfileReport of the DataFrame.
     15     """
---> 16     p = ProfileReport(df, **kwargs)
     17     return p
     18 

/anaconda3/lib/python3.6/site-packages/pandas_profiling/__init__.py in __init__(self, df, **kwargs)
     56 
     57         # Get dataset statistics
---> 58         description_set = describe_df(df)
     59 
     60         # Get sample

/anaconda3/lib/python3.6/site-packages/pandas_profiling/model/describe.py in describe(df)
    454 
    455     # Get correlations
--> 456     correlations = calculate_correlations(df, variables)
    457 
    458     # Check correlations between numerical variables

/anaconda3/lib/python3.6/site-packages/pandas_profiling/model/correlations.py in calculate_correlations(df, variables)
    187     for correlation_name, get_matrix in categorical_correlations.items():
    188         if config["correlations"][correlation_name].get(bool):
--> 189             correlation = get_matrix(df, variables)
    190             if len(correlation) > 0:
    191                 correlations[correlation_name] = correlation

/anaconda3/lib/python3.6/site-packages/pandas_profiling/model/correlations.py in recoded_matrix(df, variables)
     72         A recoded matrix for categorical variables.
     73     """
---> 74     return categorical_matrix(df, variables, partial(check_recoded, count=len(df)))
     75 
     76 

/anaconda3/lib/python3.6/site-packages/pandas_profiling/model/correlations.py in categorical_matrix(df, variables, correlation_function)
    108         correlation_matrix.loc[name2, name1] = correlation_matrix.loc[
    109             name1, name2
--> 110         ] = correlation_function(confusion_matrix)
    111 
    112     return correlation_matrix

/anaconda3/lib/python3.6/site-packages/pandas_profiling/model/correlations.py in check_recoded(confusion_matrix, count)
     44         Whether the variables are recoded.
     45     """
---> 46     return int(confusion_matrix.values.diagonal().sum() == count)
     47 
     48 

ValueError: diag requires an array of at least two dimensions
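
For reference, the final frame of the traceback can be reproduced in isolation with NumPy alone. This is only a minimal sketch of the failure mode, not the package's code: the arrays below are made up, and it does not explain why the confusion matrix ended up with fewer than two dimensions for this particular DataFrame.

```python
import numpy as np

# Hypothetical stand-ins for confusion_matrix.values
two_dim = np.array([[3, 0], [0, 2]])  # a proper 2-D confusion matrix
one_dim = np.array([3, 1, 2])         # what .values looks like if it collapses to 1-D

# With a 2-D array the diagonal exists and can be summed
diag_sum = int(two_dim.diagonal().sum())

# With a 1-D array, .diagonal() raises ValueError
try:
    one_dim.diagonal()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(diag_sum)
print(error_message)
```

If this is what happens inside check_recoded, a dimensionality check on confusion_matrix.values before calling .diagonal() would be one place to guard against it.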

To Reproduce

import pandas as pd
import pandas_profiling  # registers DataFrame.profile_report()
import requests

# Download the pickled DataFrame that triggers the error
response = requests.get(
    'https://raw.githubusercontent.com/adamrossnelson/HelloWorld/master/sparefiles/buggy1.pkl'
)
pkl_name = 'buggy1.pkl'
with open(pkl_name, 'wb') as pkl_file:
    pkl_file.write(response.content)

df = pd.read_pickle(pkl_name)
df.profile_report()

Version information:
See my version output on previous issue:
#194

Additional context
No error occurs if you save the df to a CSV, read it back from the CSV, and then attempt a profile report.
Otherwise I would have just shared the example data as a CSV. I was only able to reproduce the error with the pickled data.

Would be happy to resubmit as a new issue if that would be helpful.

sbrugman commented on May 12, 2024

@adamrossnelson: The CSV conversion in your case effectively produces the same output as:

import numpy as np

# Normalise empty strings and missing values to NaN
df.replace('', np.nan, inplace=True)
df.fillna(np.nan, inplace=True)

What were the values before converting the DataFrame to pickle? If the '' values are intentional, we will need a way for the package to work with them.

The use of None and '' is problematic at the least. I'll have a look.
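
To illustrate the point about the CSV round trip, here is a small sketch using a made-up DataFrame (not the reporter's data): writing to CSV and reading back with the default pandas settings turns both '' and None into NaN, which would explain why the error disappears after the conversion.

```python
import io

import pandas as pd

# Hypothetical data containing both an empty string and None
df = pd.DataFrame({"a": ["x", "", None], "b": [1, 2, 3]})

# Round-trip through CSV in memory, using default to_csv/read_csv settings
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
roundtrip = pd.read_csv(buf)

# Before: only None counts as missing. After: '' is missing as well.
before = df["a"].isna().tolist()
after = roundtrip["a"].isna().tolist()
print(before)
print(after)
```

Pickling, by contrast, preserves the original values, so the '' entries survive and reach the correlation code unchanged.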
