fbdesignpro / sweetviz

Visualize and compare datasets, target values and associations, with one line of code.

License: MIT License

Python 70.09% HTML 23.21% JavaScript 6.71%

pandas-dataframe eda pandas data-exploration data-analysis data-science data-visualization machine-learning data-profiling exploratory-data-analysis

sweetviz's Issues

More in depth analysis of nulls

I was just reading over this article and thought that while having the null count on each feature is really nice, having something that allows us to see the nulls over time as well might be beneficial in some cases.
https://towardsdatascience.com/visualizing-the-nothing-ae6daccc9197
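
For anyone who wants this today, a rough sketch of "nulls over time" with plain pandas, outside of Sweetviz. The function name and the column names in the usage are made up for illustration; nothing here is part of the Sweetviz API.

```python
import pandas as pd


def null_rate_over_time(df: pd.DataFrame, time_col: str, value_col: str,
                        freq: str = "D") -> pd.Series:
    """Share of missing values in value_col per time bucket (hypothetical helper)."""
    stamps = pd.to_datetime(df[time_col])
    # isna() gives booleans; the mean of booleans per bucket is the null rate
    return df[value_col].isna().groupby(stamps.dt.to_period(freq)).mean()


sample = pd.DataFrame({
    "ts": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "x": [1.0, None, 3.0],
})
rates = null_rate_over_time(sample, "ts", "x")
```

Plotting `rates` as a line chart gives a quick view of when a feature started or stopped being populated.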

Incorrect report

The Sweetviz data analysis report is incorrect in my case.

The report in PDF format: Analyst.html.pdf

Code to reproduce the error:

# In a notebook, install first with: !pip install sweetviz
from pandas import read_csv
import sweetviz as sv

df = read_csv('https://raw.githubusercontent.com/wanderloop/WanderlustAI/master/assumed_pha_thousand.csv', low_memory=True)
# Keep only the three columns of interest and shorten one name
df = df.loc[:, ['long', 'lat', 'MID_POINT_X']]
df = df.rename(columns={'MID_POINT_X': 'MPX'})
data_analyst_report = sv.analyze(df)
data_analyst_report.show_html('Analyst.html')

Unable to compare the loan prediction dataset

I get the following error when trying to compare the train and test datasets for loan prediction:
TypeError: cannot do label indexing on <class 'pandas.core.indexes.numeric.Int64Index'> with these indexers [16.12000084] of <class 'float'>

The code I have written is:
my_report = sv.compare([df_train,"Train"],[df_test,"Test"],"Loan_Status")

Mixed Data Type issue

@fbdesignpro

I was working on the online retail dataset.

In this dataset, the variable 'Invoice Number' is a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a Cancellation.

And once I run the following code I get the error below:

library installing issues

I cannot use the command "pip install sweetviz" since it gives me back an error message ("ERROR: Could not find a version that satisfies the requirement sweetviz (from versions: none)
ERROR: No matching distribution found for sweetviz").

So I tried to use the command "git clone https://github.com/fbdesignpro/sweetviz.git ", and it worked out, but when I import the library in python interpreter it seems sweetviz does not have any module I can use. Am I doing anything wrong?
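
One possible explanation, offered as a guess: a fresh clone creates a `sweetviz/` repository folder that contains the real `sweetviz/` package inside it. Starting Python from the parent directory can make `import sweetviz` pick up the outer folder as an empty namespace package. The small helper below (its name is made up) shows where an import would resolve, using only the standard library.

```python
import importlib.util


def where_is(module_name: str) -> str:
    """Describe where `import module_name` would load from (hypothetical helper)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return "not found"
    # A bare directory without __init__.py is imported as a namespace package
    if spec.origin in (None, "namespace"):
        return "namespace package (directory without __init__.py)"
    return spec.origin


location = where_is("json")  # a stdlib package, used here only as a demo
```

If `where_is("sweetviz")` reports a namespace package or an unexpected path, running `pip install .` from inside the cloned repository is the usual way to install from source.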

Sweetviz closes the Tkinter Window

When I try to use Sweetviz in conjunction with Tkinter, it closes the Tkinter window for some reason. It is very strange.

Report gets cut-off after feature #502

I am running sweetviz to analyze a data set containing 638 features and 259,000 rows. The report building process takes quite some time but it is ok considering the dataset size. However, once I browse the resulting graph the report gets cut off after feature 502. No output after that. I can see the tabs but no information is displayed.
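
Until the cause is known, one stopgap might be to split the columns into smaller groups and build one report per group. This is only a sketch: `chunk_columns` is a hypothetical helper, and associations between features that land in different groups would be lost.

```python
from typing import Iterator, List, Sequence


def chunk_columns(columns: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` column names."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(columns), size):
        yield list(columns[start:start + size])


groups = list(chunk_columns(["a", "b", "c", "d", "e"], 2))
```

Each group could then be passed along as `sv.analyze(df[group]).show_html(f"report_{i}.html")`, using the same calls that appear elsewhere on this page.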

It worked in Jupyter but it is not generating report in Google Colab:(

Allow generation of HTML without opening browser, and partial HTML for inclusion in other projects

I am trying to use sweetviz within a streamlit workflow. Sweetviz automatically opens up an external browser, which is nice but not appropriate within streamlit which is already producing output in the browser. Instead, I would like to include the output of sweetviz as a "control" within streamlit similar to:
st.write(sweetviz.analyze(df))
much like I can already embed plotly or other frameworks that produce web output:
st.plotly_chart(fig)

Here are a couple thoughts about how this could work. As a first attempt, I thought about simply putting an if statement around the code that opens an external browser:

        if load_browser:
            # Not sure how to work around this: not fatal but annoying... 
            # https://bugs.python.org/issue5993 
            webbrowser.open('file://' + os.path.realpath(filepath))

This disables automatically opening a browser. I then attempted to embed the HTML output inside the Streamlit page, but Sweetviz emits an entire HTML document, including head and body, which I can't directly include inside the body of another HTML page.

Therefore, there should be another function I can call which produces HTML such that I can embed it in an existing html page.
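
In the meantime, a complete HTML document can be embedded in another page by wrapping it in an iframe with a `srcdoc` attribute. The sketch below uses only the standard library; `as_iframe` is a made-up name, not something Sweetviz provides.

```python
import html


def as_iframe(full_html: str, height: int = 600) -> str:
    """Wrap a complete HTML document in an <iframe srcdoc=...> fragment."""
    # Escape quotes and angle brackets so the document survives as an attribute value
    escaped = html.escape(full_html, quote=True)
    return (f'<iframe srcdoc="{escaped}" width="100%" height="{height}" '
            'style="border:none"></iframe>')


fragment = as_iframe('<html><body class="x">hi</body></html>', height=400)
```

The report file written by `show_html` could be read back as a string and passed through this function. The iframe keeps the report's styles and scripts isolated from the host page, which also sidesteps id and class clashes.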

UI glitch

In the generated HTML report, once I click on a column's card to select it, there is no way to deselect it, i.e. I can't select any other card.

API: add method to get the HTML (parts) instead of always saving and showing it

This would be useful in all contexts where the library is used in a context which already shows HTML, e.g. Jupyter notebooks, dash, etc.

What is really needed is to retrieve the generated div for the report plus the necessary javascript and styling. Depending on how the HTML output is designed, it may be sufficient to include javascript and styling only once in each page.

I did not look at the generated HTML in detail, but for this to work it would also be useful if all HTML classes would have a choosable prefix and any ids that are used for one report are generated so that they do not clash with any other report embedded in the same HTML.

Installation not working

Installation via pip on linux 16 does not seem to work:

sudo pip3 install sweetviz
ERROR: Could not find a version that satisfies the requirement sweetviz (from versions: none)

FloatingPointError: divide by zero encountered in true_divide

I ran into a "FloatingPointError: divide by zero encountered in true_divide" in the pairwise feature portion of the code. Apparently there was a divide by zero issue in the cov part of the underlying code.

The trace of the error is as follows:
file: sv_public.py, line 13, in analyze, pairwise_analysis, feat_cfg)
file: dataframe_report.py, line 243, in init, self.process_associations(features_to_process, source_target_series, compare_target series
file: dataframe_report.py, line 423, in process_associations, feature.source.corr(other.source, method='pearson')
file: series.py line 2254, in corr, this.values, other.values, method=method, min_periods=min_periods
file: nanops.py, line 69, in _f, return f(*args,*kwargs)
file: nanops.py, line 1240, in nancorr, return f(a,b)
file: nanops.py, line 1256, in _pearson, return np.corrcoef(a,b)[0,1]
file: <array_function internals>, line 6, in corrcoef
file: function_base.py,line 2526 in corrcoef, c=cov(x,y,rowvar)
file: <array_function internals>, line 6, in cov
file: function_base.py, line 2455, in cov, c=np.true_divide(1,fact)

My dataframe had some empty strings where nulls should have been, but there were other columns that had similar features, but they never threw this error.
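
A plausible cause, judging from the trace: the covariance step divides by the number of paired observations minus one, so two columns that share only a single non-null row would hit a division by zero. The guard below is a sketch of how a caller could avoid that; `safe_pearson` is a hypothetical helper and not the library's code.

```python
import math

import pandas as pd


def safe_pearson(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation that returns NaN instead of failing on degenerate input."""
    paired = pd.concat([a, b], axis=1).dropna()
    if len(paired) < 2:
        return math.nan  # covariance needs at least two paired observations
    x, y = paired.iloc[:, 0], paired.iloc[:, 1]
    if x.nunique() < 2 or y.nunique() < 2:
        return math.nan  # a constant column has zero variance
    return float(x.corr(y, method="pearson"))


sparse = safe_pearson(pd.Series([1.0, None, None]), pd.Series([2.0, 3.0, None]))
```

Replacing empty strings with real nulls before the analysis (for example `df.replace("", pd.NA)`) may also change which columns end up in this situation.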

Not able to generate html report

I'm trying to generate an HTML report by calling show_html(), but I do not see an output file.

IDE: Google colab.

Data type issue

TypeError: Column InvoiceNo has a 'mixed' inferred_type (as determined by Pandas). This is is not currently supported; column types should not contain mixed data. e.g. only float, or strings, but not a combination.
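
A common user-side workaround is to cast the whole column to strings before the analysis, so that plain invoice numbers and the "C"-prefixed cancellations share one type. The data below is invented to mirror the description of the online retail dataset.

```python
import pandas as pd
from pandas.api.types import infer_dtype

invoices = pd.DataFrame({"InvoiceNo": [536365, 536366, "C536379"]})
before = infer_dtype(invoices["InvoiceNo"])  # integers mixed with strings

# Cast everything to str so the column holds a single type
invoices["InvoiceNo"] = invoices["InvoiceNo"].astype(str)
after = infer_dtype(invoices["InvoiceNo"])

# The cancellation flag can be kept as its own feature
invoices["IsCancelled"] = invoices["InvoiceNo"].str.startswith("C")
```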

import sweetviz doesn't work

I ran the command pip install sweetviz. The library was installed successfully, but when I run import sweetviz as sv, I get the error: AttributeError: module 'sweetviz' has no attribute 'from_dython'

Charset utf-8

First of all it’s awesome! Many thanks for your effort on data visualization! There is a small issue maybe, the html report lacks a meta tag showing the charset as “utf-8”; by adding it, the report can correctly show the MBCS characters and will catch eyes of more global analysts.
Thanks again! Hope this project goes better!

Documentation analyze vs compare

Hi, first of all, congrats on the project. Haven’t used extensively yet but will do so soon.

Just dropping this note because I noticed some inconsistencies in the documentation and the Medium article. In the Medium article the analyze() function is mentioned but not used. Here on GitHub compare() is mentioned but not used.

Just wanted to let you know....

AttributeError: module 'matplotlib' has no attribute 'style'

File "C:...\Python\Python37\site-packages\sweetviz\graph.py", line 58, in set_style
matplotlib.style.use(styles_in_final_location)

AttributeError: module 'matplotlib' has no attribute 'style'

ZeroDivisionError: float division by zero

Hi, the execution throws the "Zero Division Error".
I'm new to Jupyter Notebook and Python.

C:\Anaconda3\lib\site-packages\sweetviz\graph_associations.py in heatmap(y, x, figure_size, **kwargs)
224
225 # Scale with num squares
--> 226 size_scale = size_scale / len(x)
227 def value_to_size(val):
228 if size_min == size_max:

ZeroDivisionError: float division by zero

Thanks a lot!
Luciano.-

issue on profile on date field

File "c:\users\appdata\local\programs\python\python37\lib\site-packages\pandas\core\base.py", line 1220, in _reduce
klass=self.__class__.__name__, op=name

TypeError: DatetimeIndex cannot perform the operation sum

Column names can only be string type

Columns names need to be strings : I've got an error when using integers (happens when calling sweetviz.DataframeReport). This could be easily changed by modifying the line 215 in \sweetviz\dataframe_report.py:

From
self.progress_bar.set_description(':' + f.source.name + '')

To
self.progress_bar.set_description(':' + str(f.source.name) + '')

Complete error code

----------------------------------------------------------
TypeError                Traceback (most recent call last)
<ipython-input-59-d8e5410a0f22> in <module>
----> 1 sweet_report = sweetviz.analyze(source=[df, "Train"], target_feat="positive")

~\anaconda3\envs\datascience\lib\site-packages\sweetviz\sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
     10             feat_cfg: FeatureConfig = None,
     11             pairwise_analysis: str = 'auto'):
---> 12     report = sweetviz.DataframeReport(source, target_feat, None,
     13                                       pairwise_analysis, feat_cfg)
     14     return report

~\anaconda3\envs\datascience\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    213         for f in features_to_process:
    214             # start = time.perf_counter()
--> 215             self.progress_bar.set_description(':' + f.source.name + '')
    216             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    217             self.progress_bar.update(1)

TypeError: can only concatenate str (not "int") to str
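
Until the library handles this, a user-side workaround is to convert the column labels to strings before calling analyze. The frame below is a toy example.

```python
import pandas as pd

df = pd.DataFrame([[1, 0], [2, 1]], columns=[0, "positive"])

# Convert every column label to str; the data itself is untouched
df.columns = df.columns.map(str)
labels = list(df.columns)
```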

compare_intra fails with KeyError: 'cannot use a single bool to index into setitem'

This seems to be a possible bug:

Date:  Jul 22, 2020
platform: Macos
environment: conda custom environment
sweetviz version:  sweetviz==1.0b3

np.__version__ # 1.18.4
pd.__version__ # 1.0.3

MWE

import numpy as np
import pandas as pd
import seaborn as sns

import sweetviz

df = sns.load_dataset('titanic')
display(df.head(2))
feat_cfg = sweetviz.FeatureConfig(skip="deck")
my_report = sweetviz.compare_intra(df,
                                   df["sex"] == "male",
                                   ["Male", "Female"],
                                   'survived',
                                   feat_cfg)
my_report.show_html('compare_male_vs_female.html')

Error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
   1138             else:
-> 1139                 self.index._engine.set_value(self._values, label, value)
   1140         except (KeyError, TypeError):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-2-6f7575a96558> in <module>
     12                                    ["Male", "Female"],
     13                                    'survived',
---> 14                                    feat_cfg)
     15 my_report.show_html('compare_male_vs_female.html')

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/sv_public.py in compare_intra(source_df, condition_series, names, target_feat, feat_cfg, pairwise_analysis)
     42     report = sweetviz.DataframeReport([data_true, names[0]], target_feat,
     43                                       [data_false, names[1]],
---> 44                                       pairwise_analysis, feat_cfg)
     45     return report
     46 

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    215             # start = time.perf_counter()
    216             self.progress_bar.set_description(':' + f.source.name + '')
--> 217             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    218             self.progress_bar.update(1)
    219             # print(f"DONE FEATURE------> {f.source.name}"

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
     97             # Explicitly show missing categories on each set
     98             if compare_type == FeatureType.TYPE_CAT or compare_type == FeatureType.TYPE_BOOL:
---> 99                 fill_out_missing_counts_in_other_series(to_process.compare_counts, to_process.source_counts)
    100                 fill_out_missing_counts_in_other_series(to_process.source_counts, to_process.compare_counts)
    101         returned_feature_dict["compare"] = dict()

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in fill_out_missing_counts_in_other_series(my_counts, other_counts)
     43                 if my_counts[to_fill].index.dtype.name == 'category':
     44                     my_counts[to_fill] = my_counts[to_fill].reindex(my_counts[to_fill].index.add_categories(key))
---> 45                 my_counts[to_fill].at[key] = 0
     46 
     47 def add_series_base_stats_to_dict(series: pd.Series, counts: dict, updated_dict: dict) -> dict:

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
   2192         key = list(self._convert_key(key, is_setter=True))
   2193         key.append(value)
-> 2194         self.obj._set_value(*key, takeable=self._takeable)
   2195 
   2196 

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
   1140         except (KeyError, TypeError):
   1141             # set using a non-recursive method
-> 1142             self.loc[label] = value
   1143 
   1144         return self

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    669             key = com.apply_if_callable(key, self.obj)
    670         indexer = self._get_setitem_indexer(key)
--> 671         self._setitem_with_indexer(indexer, value)
    672 
    673     def _validate_key(self, key, axis: int):

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    870         else:
    871 
--> 872             indexer, missing = convert_missing_indexer(indexer)
    873 
    874             if missing:

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in convert_missing_indexer(indexer)
   2342 
   2343         if isinstance(indexer, bool):
-> 2344             raise KeyError("cannot use a single bool to index into setitem")
   2345         return indexer, True
   2346 

KeyError: 'cannot use a single bool to index into setitem'
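
The failing key in the trace is the boolean `True`, so a workaround worth trying (untested against Sweetviz, and only a guess at the cause) is to turn boolean columns into string labels before calling compare_intra. `bools_to_labels` is a hypothetical helper.

```python
import pandas as pd


def bools_to_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where bool columns hold the strings 'True'/'False'."""
    out = df.copy()
    for col in out.select_dtypes(include="bool").columns:
        out[col] = out[col].astype(str)
    return out


toy = pd.DataFrame({"alone": [True, False], "age": [22, 38]})
converted = bools_to_labels(toy)
```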

TypeError: DatetimeIndex cannot perform the operation sum

TypeError Traceback (most recent call last)
in
1 #analyzing the dataset
----> 2 report = sv.analyze(channelmemberhistory)
3 #display the report
4 report.show_html('MemberHistory.html')

~/.local/lib/python3.6/site-packages/sweetviz/sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
11 pairwise_analysis: str = 'auto'):
12 report = sweetviz.DataframeReport(source, target_feat, None,
---> 13 pairwise_analysis, feat_cfg)
14 return report
15

~/.local/lib/python3.6/site-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
214 # start = time.perf_counter()
215 self.progress_bar.set_description(':' + f.source.name + '')
--> 216 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
217 self.progress_bar.update(1)
218 # print(f"DONE FEATURE------> {f.source.name}"

~/.local/lib/python3.6/site-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
90
91 # Establish base stats
---> 92 add_series_base_stats_to_dict(to_process.source, to_process.source_counts, returned_feature_dict)
93 if to_process.compare is not None:
94 add_series_base_stats_to_dict(to_process.compare, to_process.compare_counts, compare_dict)

~/.local/lib/python3.6/site-packages/sweetviz/series_analyzer.py in add_series_base_stats_to_dict(series, counts, updated_dict)
42 base_stats = updated_dict["base_stats"]
43 num_total = counts["num_rows_total"]
---> 44 num_zeros = series[series == 0].sum()
45 non_nan = counts["num_rows_with_data"]
46 base_stats["total_rows"] = num_total

~/.local/lib/python3.6/site-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, min_count, **kwargs)
11180 skipna=skipna,
11181 numeric_only=numeric_only,
> 11182 min_count=min_count,
11183 )
11184

~/.local/lib/python3.6/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
3901 numeric_only=numeric_only,
3902 filter_type=filter_type,
-> 3903 **kwds,
3904 )
3905

~/.local/lib/python3.6/site-packages/pandas/core/base.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
1058 if func is None:
1059 raise TypeError(
-> 1060 f"{type(self).__name__} cannot perform the operation {name}"
1061 )
1062 return func(skipna=skipna, **kwds)
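
As a stopgap, the datetime columns can be identified up front and either skipped or converted. The helper name below is made up; the idea of passing the result to `FeatureConfig(skip=...)` assumes that `skip` accepts a list the same way `force_num` does in other examples on this page.

```python
import pandas as pd


def datetime_columns(df: pd.DataFrame) -> list:
    """Names of columns holding timezone-naive or timezone-aware datetimes."""
    return list(df.select_dtypes(include=["datetime", "datetimetz"]).columns)


history = pd.DataFrame({
    "when": pd.to_datetime(["2020-07-12 11:37:25", "2020-07-13 08:00:00"]),
    "visits": [3, 0],
})
to_skip = datetime_columns(history)
```

Separately, the line the trace points at, `series[series == 0].sum()`, adds up the selected values (which are all zero) rather than counting them; `(series == 0).sum()` is the expression that counts zeros.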

font error?

First off, love this! I installed it and ran it on an 88K+ line dataset, and loved how it came out.
Now, I got about 15 errors saying the same thing...
c:\users\shawn\python38\lib\site-packages\matplotlib\backends\backend_agg.py:214: RuntimeWarning: Glyph 37239 missing from current font.
font.set_text(s, 0.0, flags=flags)
This might be my computer/matplotlib version (3.2.1) or who knows.
thanks for a great lib

Add more descriptive statistics

Some suggestions:

There is already distinct count.

More features:

number of ZEROS and its percentage
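
For reference, the requested statistic is cheap to compute with pandas. A minimal sketch, with a made-up function name:

```python
import pandas as pd


def zero_stats(series: pd.Series) -> tuple:
    """Return (number of zeros, percentage of rows that are zero)."""
    n_zeros = int((series == 0).sum())
    pct = 100.0 * n_zeros / len(series) if len(series) else 0.0
    return n_zeros, pct


count, percent = zero_stats(pd.Series([0, 1, 0, 5]))
```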

target type is int64 still raising error

     63             if predetermined_type_target not in (FeatureType.TYPE_BOOL,
     64                                                  FeatureType.TYPE_NUM):
---> 65                 raise ValueError
     66             self.predetermined_type_target = predetermined_type_target
     67         else:

ValueError:

This error is raised even though the target has values 1, 2, 3 and its type is int64.

python3.7.1

module 'sweetviz' has no attribute 'analyze'

Make font color of "missing" red

Suggestion:

The font color of "missing" in the HTML report would look nicer (IMHO) if it were red.

Add box plots

version: 1.0.3
date: Jul 22, 2020

Currently "sweetviz" only has bar charts for visualizations. For medium-size data analysis (such as Titanic or Boston housing) it is not very costly to show box plots as well as bar plots. For a larger dataset, it could be made optional in the config.ini file, with the default chosen based on file size.

For example:

if file size < 50MB:
   show boxplots and bar plots
else:
    show only bar plots
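
The rule above could be sketched as follows. Note one swap: the sketch measures the DataFrame's in-memory size rather than the file size, since a report is built from a DataFrame. The function name, the constant, and the plot labels are all invented for illustration.

```python
import pandas as pd

SIZE_LIMIT_BYTES = 50 * 1024 ** 2  # the 50 MB threshold from the suggestion


def plots_to_show(df: pd.DataFrame, limit: int = SIZE_LIMIT_BYTES) -> list:
    """Pick plot kinds based on how much memory the frame occupies."""
    in_memory = int(df.memory_usage(deep=True).sum())
    return ["bar", "box"] if in_memory < limit else ["bar"]


small = pd.DataFrame({"a": [1, 2, 3]})
choice = plots_to_show(small)
```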

Bug with screen resolution

Hi,

I'm having an issue with the generated HTML report. The plots and data are overlapping and there's no horizontal scrollbar.
I guess it's a problem with the resolution of the screen, but I can't figure out how to get a proper visualization.

The code was run in a Jupyter notebook with pandas 1.0.4.

Integer feature with values 1 and 2 cannot be handled as categorical?

Hey guys, I'm getting an error when handling integer columns but the error message is not very clear for me to understand what is going on. So far it looks like a bug to me. Here it goes.

We start by importing basic stuff and generate a pandas dataframe with 4 columns containing random real numbers, plus an integer column named 'target' with values 1 and 2.

import sweetviz as sv
import pandas as pd
import numpy as np

np.random.seed(42)
np_data = np.random.randn(10, 4)
df = pd.DataFrame(np_data, columns=['col1', 'col2', 'col3', 'col4'])
df['target'] = 1.0
df.loc[5:, 'target'] = 2.  # .loc avoids chained assignment
df['target'] = df['target'].astype(int)

Taking a look at the original types of the dataframe (df.dtypes), we have as a result:
col1 float64
col2 float64
col3 float64
col4 float64
target int32
dtype: object

Error: TypeError

compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])
compareReport.show_html()

gives this message:

TypeError                                 Traceback (most recent call last)
<ipython-input-54-8e3e89553904> in <module>
      1 #feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
----> 2 compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])#, feat_cfg=feature_config, target_feat='target')
      3 compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\sv_public.py in compare_intra(source_df, condition_series, names, target_feat, feat_cfg, pairwise_analysis)
     42     report = sweetviz.DataframeReport([data_true, names[0]], target_feat,
     43                                       [data_false, names[1]],
---> 44                                       pairwise_analysis, feat_cfg)
     45     return report
     46 

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    215             # start = time.perf_counter()
    216             self.progress_bar.set_description(':' + f.source.name + '')
--> 217             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    218             self.progress_bar.update(1)
    219             # print(f"DONE FEATURE------> {f.source.name}"

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\series_analyzer.py in analyze_feature_to_dictionary(to_process)
     92         compare_type = determine_feature_type(to_process.compare,
     93                                               to_process.compare_counts,
---> 94                                               returned_feature_dict["type"], "COMPARED")
     95         if compare_type != FeatureType.TYPE_ALL_NAN and \
     96             source_type != FeatureType.TYPE_ALL_NAN:

~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\type_detection.py in determine_feature_type(series, counts, must_be_this_type, which_dataframe)
     73             var_type = FeatureType.TYPE_TEXT
     74         else:
---> 75             raise TypeError(f"Cannot force series '{series.name}' in {which_dataframe} to be from its type {var_type} to\n"
     76                             f"DESIRED type {must_be_this_type}. Check documentation for the possible coercion possibilities.\n"
     77                             f"This can be solved by changing the source data or is sometimes caused by\n"

TypeError: Cannot force series 'target' in COMPARED to be from its type FeatureType.TYPE_CAT to
DESIRED type FeatureType.TYPE_BOOL. Check documentation for the possible coercion possibilities.
This can be solved by changing the source data or is sometimes caused by
a feature type mismatch between source and compare dataframes.

If I explicitly supply the feat_cfg argument the result is the same.

feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"], feat_cfg=feature_config)
compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"

However, if I add 10 to the 'target' column (it will now have 11 and 12 as values), the report is generated without errors.
Am I missing something, or is it indeed a bug?

error in graph_associations.py line 210, ValueError: cannot convert float NaN to integer

Error thrown up during analyze(dataframe), right after :PAIRWISE DONE: and Creating Associations graph...
Traceback (most recent call last):

File "", line 1, in
myreport = sv.analyze(df)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\sv_public.py", line 13, in analyze
pairwise_analysis, feat_cfg)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\dataframe_report.py", line 246, in __init__
self._association_graphs["all"] = GraphAssoc(self, "all", self._associations)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 165, in __init__
f = corrplot(graph_data, dataframe_report)

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 410, in corrplot
dataframe_report = dataframe_report

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 318, in heatmap
cur_size[1] / 2, facecolor=value_to_color(color[index]),

File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 210, in value_to_color
ind = int(val_position * (n_colors - 1)) # target index in the color palette

ValueError: cannot convert float NaN to integer

sweetviz

sv.analyze works well in DOS mode but fails in the Python 3.7.7 shell with AttributeError: module 'sweetviz' has no attribute 'analyze'

ModuleNotFoundError: No module named 'sweetviz'

Installation is successful, but I am still getting an error.
ModuleNotFoundError: No module named 'sweetviz'
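
This often means pip installed the package into a different Python than the one running the code. A small sketch that builds the install command for the interpreter that is actually running (the helper name is made up):

```python
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command bound to the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


command = pip_install_command("sweetviz")
```

Passing `command` to `subprocess.check_call` from inside the notebook or shell that raises the error installs into that same environment. Printing `sys.executable` in both places is a quick way to confirm whether they differ.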

ValueError: Duplicate column names detected in "source"; this is not supported

While passing dataframe to get a sweetviz report it throws the following error:

ValueError Traceback (most recent call last)
in
----> 1 da_report=sv.analyze(features)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sweetviz\sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
11 pairwise_analysis: str = 'auto'):
12 report = sweetviz.DataframeReport(source, target_feat, None,
---> 13 pairwise_analysis, feat_cfg)
14 return report
15

~\AppData\Local\Continuum\anaconda3\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
58 raise ValueError('"source" parameter should either be a string or a list of 2 elements: [dataframe, "Name"].')
59 if len(su.get_duplicate_cols(source_df)) > 0:
---> 60 raise ValueError('Duplicate column names detected in "source"; this is not supported.')
61
62 all_source_names = [cur_name for cur_name, cur_series in source_df.iteritems()]

ValueError: Duplicate column names detected in "source"; this is not supported.

NOTES-
I have no two columns with same names.
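
Duplicate labels are easy to miss after a concat or merge, so it may be worth listing them explicitly. A minimal check with pandas, using a made-up helper name and toy data:

```python
import pandas as pd


def duplicate_columns(df: pd.DataFrame) -> list:
    """Column labels that occur more than once (each repeat listed once per extra copy)."""
    return list(df.columns[df.columns.duplicated()])


features = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "a"])
repeated = duplicate_columns(features)
```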

Add version attribute __version__

This is just a suggestion to the developer of this awesome module:

Generally to get the version number of a python module we do:

import numpy

numpy.__version__ 

The dunder attribute __version__ gives the version number.

I suggest to add this attribute to this library so that this gives the version number:

import sweetviz

print(sweetviz.__version__)
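
Until such an attribute exists, the installed version of any package can be read from its metadata with the standard library (Python 3.8+). The wrapper name below is made up.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Version string of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


absent = installed_version("surely-not-a-real-package-xyz-123")
```

Calling `installed_version("sweetviz")` returns the version pip recorded at install time.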

TypeError: conversion from numpy.float32 to Decimal is not supported

Hi, was running compare_intra() on my dataframe and encountered this error:

TypeError: conversion from numpy.float32 to Decimal is not supported

lib/python3.7/site-packages/sweetviz/sv_html_formatters.py in fmt_smart(value)
47 return "0.00"
48 elif absolute < 0.001:
---> 49 return f"{Decimal(value):.2e}"
50 elif absolute < 0.1:
51 return f"{value:.3f}"

Should the value be converted into a Python item first?
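
Converting to a plain Python number first does avoid the error, since `Decimal` accepts a Python float. A sketch of that conversion, with a hypothetical function name; it is not the library's `fmt_smart`.

```python
from decimal import Decimal

import numpy as np


def fmt_tiny(value) -> str:
    """Format a small number in scientific notation, accepting NumPy scalars."""
    # .item() turns a NumPy scalar such as float32 into a plain Python number
    plain = value.item() if isinstance(value, np.generic) else value
    return f"{Decimal(plain):.2e}"


text = fmt_tiny(np.float32(0.0005))
```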

Config File Not Applying Changes

This is a great package and I love the idea of a one-stop data profile approach. My only note is that I cannot seem to get the config file (sweet_viz.ini) to apply my changes. I tried pointing to my own .ini file with minor changes to sweet_viz.ini and the changes did not take. I then tried changing the setting within sweet_viz.ini directly (by setting logo = 0) and it still did not take. I used the code from the front page: sv.config_parser.read("Override.ini")

Perhaps there is an example notebook that has used this successfully that could be shared?
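
One thing worth checking, assuming `sv.config_parser` is a standard `configparser.ConfigParser` (the `.read(...)` call suggests so, but that is an assumption): `read()` silently ignores files it cannot find and returns the list of files it actually loaded. The section and key names below are illustrative only.

```python
import configparser
import os
import tempfile

parser = configparser.ConfigParser()

# A missing file raises no error; the returned list is simply empty
missing = parser.read("no_such_override.ini")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Override.ini")
    with open(path, "w") as fh:
        fh.write("[Layout]\nshow_logo = 0\n")  # illustrative section and key
    loaded = parser.read(path)

show_logo = parser.get("Layout", "show_logo")
```

Printing the return value of `sv.config_parser.read("Override.ini")` shows whether the file was found relative to the current working directory; passing an absolute path removes that ambiguity.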

show_html() doesn't show the output in jupyter notebook / lab

Hi there,

I am trying to use sweetviz locally:

Ubuntu 20.04

And in anaconda enterprise:

K8s with centOS

Both lead to the same issue: the output is not displayed in Jupyter Lab or Jupyter Notebook.

Local:

Jupyter_lab=2.0
AE:
jupyter=1.0.0
jupyter_client=5.3.3
jupyter_console=6.0.0
jupyter_core=4.5.0
Jupyter_lab=1.1.3
ipython=7.8.0

The report has been generated but is not displayed.

How to fix it?

Best

Larger feature visualization on right is hidden

I appreciate the effort put into this library and I see the potential! I tried it out at work, and when the HTML was displayed, the charts for features were partially hidden. I thought that I could scroll over to see them, but no horizontal scroll bar was available.

AttributeError: module 'sweetviz' has no attribute 'analyze' in office network

When I tried to analyze my data frame, I got this error.

---->1 my_report = sv.analyze(df)
2 my_report.show_html()

Error message in pip install sweetviz

Hi, attempting to install sweetviz using pip install sweetviz, but kept encountering following error message (reproduced below)
Am using pandas version 1.0.1. Kindly advise, thanks.

Installing collected packages: importlib-resources, pandas, tqdm, sweetviz Attempting uninstall: pandas Found existing installation: pandas 1.0.1 Uninstalling pandas-1.0.1: ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\users\\65943\\anaconda3\\lib\\site-packages\\~andas\\_libs\\algos.cp37-win_amd64.pyd' Consider using the --user option or check the permissions.

sweetviz library

Hi,
I am not able to install the sweetviz library. It installed in the Anaconda Prompt but is not working in the Jupyter notebook.
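
A package that installs from one prompt but cannot be imported in a notebook is often a sign that the notebook kernel runs a different Python interpreter than the one pip installed into. That is an assumption here, since the issue gives no further details. The sketch below, run inside the notebook, prints the kernel's interpreter and builds an install command bound to that same interpreter; the helper name is hypothetical.

```python
import sys


def pip_command_for_this_kernel(package):
    """Build a pip invocation tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


print("kernel interpreter:", sys.executable)
print("install with:", " ".join(pip_command_for_this_kernel("sweetviz")))
```

Running the printed command in a terminal should install the package into the environment the kernel actually uses.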

Sweetviz needs to handle None values in the missing values section

show_html() doesn't show any output in databricks notebooks

Hi
I have used the 'Sweetviz' library in Azure Databricks. The reports are not generated in HTML.

ValueError: index must be monotonic increasing or decreasing

I am able to generate the same report on the Titanic data as in the Medium articles. However, when I try the Boston housing data, I get the errors below:

ValueError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in get_slice_bound(self, label, side, kind)
5166 try:
-> 5167 return self._searchsorted_monotonic(label, side)
5168 except ValueError:

~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
5127
-> 5128 raise ValueError("index must be monotonic increasing or decreasing")
5129

ValueError: index must be monotonic increasing or decreasing

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in
----> 1 my_report = sv.analyze(dfx)

Any ideas on the error?

Thanks.

Time series plotting

TypeError: DatetimeIndex cannot perform the operation sum

I have a dataset with a date_time column in the format: 2020-07-12 11:37:25

I get the following error:

:date_time:                        |███                  | [ 14%]   00:00  -> (00:03 left)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-77-cbd387f7f43e> in <module>()
      1 #analyzing the dataset
----> 2 techglares_report = sv.analyze(df)

6 frames
/usr/local/lib/python3.6/dist-packages/sweetviz/sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
     11             pairwise_analysis: str = 'auto'):
     12     report = sweetviz.DataframeReport(source, target_feat, None,
---> 13                                       pairwise_analysis, feat_cfg)
     14     return report
     15 

/usr/local/lib/python3.6/dist-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    214             # start = time.perf_counter()
    215             self.progress_bar.set_description(':' + f.source.name + '')
--> 216             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    217             self.progress_bar.update(1)
    218             # print(f"DONE FEATURE------> {f.source.name}"

/usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
     90 
     91     # Establish base stats
---> 92     add_series_base_stats_to_dict(to_process.source, to_process.source_counts, returned_feature_dict)
     93     if to_process.compare is not None:
     94         add_series_base_stats_to_dict(to_process.compare, to_process.compare_counts, compare_dict)

/usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in add_series_base_stats_to_dict(series, counts, updated_dict)
     42     base_stats = updated_dict["base_stats"]
     43     num_total = counts["num_rows_total"]
---> 44     num_zeros = series[series == 0].sum()
     45     non_nan = counts["num_rows_with_data"]
     46     base_stats["total_rows"] = num_total

/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, min_count, **kwargs)
  11180             skipna=skipna,
  11181             numeric_only=numeric_only,
> 11182             min_count=min_count,
  11183         )
  11184 

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   3901             numeric_only=numeric_only,
   3902             filter_type=filter_type,
-> 3903             **kwds,
   3904         )
   3905 

/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   1058         if func is None:
   1059             raise TypeError(
-> 1060                 f"{type(self).__name__} cannot perform the operation {name}"
   1061             )
   1062         return func(skipna=skipna, **kwds)

TypeError: DatetimeIndex cannot perform the operation sum

I'm running sweetviz on Google Colab.

Is there any way to solve this error?
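
One possible workaround, not verified against the sweetviz version in the traceback: convert datetime-typed columns to strings before calling sv.analyze, on the assumption that the failure comes from the column being parsed as a datetime. The helper name and the sample values are illustrative only.

```python
import pandas as pd


def stringify_datetime_columns(df, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of df with datetime columns rendered as strings."""
    out = df.copy()
    datetime_cols = out.select_dtypes(include=["datetime", "datetimetz"]).columns
    for col in datetime_cols:
        out[col] = out[col].dt.strftime(fmt)
    return out


# Small stand-in frame using the column name and format from the report.
df = pd.DataFrame({
    "date_time": pd.to_datetime(["2020-07-12 11:37:25", "2020-07-13 08:00:00"]),
    "value": [1, 2],
})
clean = stringify_datetime_columns(df)
print(clean.dtypes)
```

The converted frame (clean) could then be passed to sv.analyze in place of the original. Dropping the column before analysis is a simpler alternative if it is not needed in the report.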

compare fails with Key Error

This seems to be a possible bug:

Date:  Jul 22, 2020
platform: macOS
environment: conda custom environment
sweetviz version:  sweetviz==1.0b3

np.__version__ # 1.18.4
pd.__version__ # 1.0.3

MWE

import numpy as np
import pandas as pd
import seaborn as sns

import sweetviz

df = sns.load_dataset('titanic')
my_report = sweetviz.compare(
    [df.query("sex == 'male'"), "Male"],
    [df.query("sex == 'female'"), "Female"],
    "survived"
    )

Error

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
   1138             else:
-> 1139                 self.index._engine.set_value(self._values, label, value)
   1140         except (KeyError, TypeError):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: True

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-62-d04e4b8c2811> in <module>
      2     [df.query("sex == 'male'"), "Male"],
      3     [df.query("sex == 'female'"), "Female"],
----> 4     "survived"
      5     )
      6 

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/sv_public.py in compare(source, compare, target_feat, feat_cfg, pairwise_analysis)
     21             pairwise_analysis: str = 'auto'):
     22     report = sweetviz.DataframeReport(source, target_feat, compare,
---> 23                                       pairwise_analysis, feat_cfg)
     24     return report
     25 

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
    215             # start = time.perf_counter()
    216             self.progress_bar.set_description(':' + f.source.name + '')
--> 217             self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
    218             self.progress_bar.update(1)
    219             # print(f"DONE FEATURE------> {f.source.name}"

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
     97             # Explicitly show missing categories on each set
     98             if compare_type == FeatureType.TYPE_CAT or compare_type == FeatureType.TYPE_BOOL:
---> 99                 fill_out_missing_counts_in_other_series(to_process.compare_counts, to_process.source_counts)
    100                 fill_out_missing_counts_in_other_series(to_process.source_counts, to_process.compare_counts)
    101         returned_feature_dict["compare"] = dict()

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in fill_out_missing_counts_in_other_series(my_counts, other_counts)
     43                 if my_counts[to_fill].index.dtype.name == 'category':
     44                     my_counts[to_fill] = my_counts[to_fill].reindex(my_counts[to_fill].index.add_categories(key))
---> 45                 my_counts[to_fill].at[key] = 0
     46 
     47 def add_series_base_stats_to_dict(series: pd.Series, counts: dict, updated_dict: dict) -> dict:

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
   2192         key = list(self._convert_key(key, is_setter=True))
   2193         key.append(value)
-> 2194         self.obj._set_value(*key, takeable=self._takeable)
   2195 
   2196 

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
   1140         except (KeyError, TypeError):
   1141             # set using a non-recursive method
-> 1142             self.loc[label] = value
   1143 
   1144         return self

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    669             key = com.apply_if_callable(key, self.obj)
    670         indexer = self._get_setitem_indexer(key)
--> 671         self._setitem_with_indexer(indexer, value)
    672 
    673     def _validate_key(self, key, axis: int):

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
    870         else:
    871 
--> 872             indexer, missing = convert_missing_indexer(indexer)
    873 
    874             if missing:

~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in convert_missing_indexer(indexer)
   2342 
   2343         if isinstance(indexer, bool):
-> 2344             raise KeyError("cannot use a single bool to index into setitem")
   2345         return indexer, True
   2346 

KeyError: 'cannot use a single bool to index into setitem'
