fbdesignpro / sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
License: MIT License
I was just reading over this article and thought that while having the null count on each feature is really nice, having something that allows us to see the nulls over time as well might be beneficial in some cases.
https://towardsdatascience.com/visualizing-the-nothing-ae6daccc9197
The Sweetviz data analysis report is incorrect in my case.
The report in PDF format: Analyst.html.pdf
Code to reproduce the error:
!pip install sweetviz
from pandas import read_csv
import sweetviz as sv
df = read_csv('https://raw.githubusercontent.com/wanderloop/WanderlustAI/master/assumed_pha_thousand.csv', low_memory=True)
# Keep only the three columns of interest, then shorten one name
keep_cols = 'long lat MID_POINT_X'.split()
df = df.loc[:, keep_cols]
df.rename(columns={'MID_POINT_X': 'MPX'}, inplace=True)
data_analyst_report = sv.analyze(df)
data_analyst_report.show_html('Analyst.html')
I get the following error when trying to compare the train and test datasets for loan prediction:
TypeError: cannot do label indexing on <class 'pandas.core.indexes.numeric.Int64Index'> with these indexers [16.12000084] of <class 'float'>
The code I have written is:
my_report = sv.compare([df_train,"Train"],[df_test,"Test"],"Loan_Status")
I was working on the online retail dataset.
In this dataset, the variable 'Invoice Number' is a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'C', it indicates a Cancellation.
And once I run the following code I get the error below:
I cannot use the command "pip install sweetviz" since it gives me back an error message ("ERROR: Could not find a version that satisfies the requirement sweetviz (from versions: none)
ERROR: No matching distribution found for sweetviz").
So I tried to use the command "git clone https://github.com/fbdesignpro/sweetviz.git ", and it worked out, but when I import the library in python interpreter it seems sweetviz does not have any module I can use. Am I doing anything wrong?
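One possible explanation, offered as an assumption rather than a diagnosis: after a git clone, starting Python in the directory that contains the cloned "sweetviz" folder can make "import sweetviz" pick up the repository root (a plain directory) instead of the installed package, which leaves the module with no usable attributes. The sketch below uses only the standard library to show where Python would load a name from; describe_import is a hypothetical helper name, not part of sweetviz.

```python
import importlib.util


def describe_import(module_name):
    """Report where Python would load `module_name` from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return {"found": False, "origin": None, "namespace_package": False}
    # A namespace package (for example a plain directory without __init__.py,
    # such as the root of a cloned repository) has no single file as origin.
    is_namespace = spec.origin is None and spec.submodule_search_locations is not None
    return {"found": True, "origin": spec.origin, "namespace_package": is_namespace}


if __name__ == "__main__":
    # If "namespace_package" is True, or "origin" points into the current
    # directory, the installed package is probably being shadowed.
    print(describe_import("sweetviz"))
```

If the result points at a local folder or a local sweetviz.py file, running Python from a different directory (or renaming the local file) is worth trying before reinstalling.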
When I use Sweetviz in conjunction with Tkinter, it closes Tkinter for some reason. It is very strange.
I am running sweetviz to analyze a data set containing 638 features and 259,000 rows. The report building process takes quite some time but it is ok considering the dataset size. However, once I browse the resulting graph the report gets cut off after feature 502. No output after that. I can see the tabs but no information is displayed.
I am trying to use sweetviz within a streamlit workflow. Sweetviz automatically opens up an external browser, which is nice but not appropriate within streamlit which is already producing output in the browser. Instead, I would like to include the output of sweetviz as a "control" within streamlit similar to:
st.write(sweetviz.analyze(df))
much like I can already embed plotly or other frameworks that produce web output:
st.plotly_chart(fig)
Here are a couple thoughts about how this could work. As a first attempt, I thought about simply putting an if statement around the code that opens an external browser:
if load_browser:
    # Not sure how to work around this: not fatal but annoying...
    # https://bugs.python.org/issue5993
    webbrowser.open('file://' + os.path.realpath(filepath))
This disables automatically opening a browser. Then I attempted to embed the html output inside the streamlit page, but the output from sweetviz includes an entire HTML including head and body, which I can't then directly include inside the body of another html page.
Therefore, there should be another function I can call which produces HTML such that I can embed it in an existing html page.
In the generated HTML report, when I click on the card of a column to select it, there is no way to deselect it, i.e. I can't select any other card.
This would be useful in all contexts where the library is used in a context which already shows HTML, e.g. Jupyter notebooks, dash, etc.
What is really needed is to retrieve the generated div for the report plus the necessary javascript and styling. Depending on how the HTML output is designed, it may be sufficient to include javascript and styling only once in each page.
I did not look at the generated HTML in detail, but for this to work it would also be useful if all HTML classes would have a choosable prefix and any ids that are used for one report are generated so that they do not clash with any other report embedded in the same HTML.
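Until such a function exists, one way to place a complete HTML document (with its own head and body) inside another page is an iframe with a srcdoc attribute. The sketch below is standard-library only; wrap_in_iframe is a hypothetical helper, and it assumes the report file written by show_html() is self-contained, which has not been verified here.

```python
import html


def wrap_in_iframe(full_html, width="100%", height=600):
    """Wrap a complete HTML document in an <iframe srcdoc=...> element."""
    # The document is escaped so it can live safely inside an attribute value.
    escaped = html.escape(full_html, quote=True)
    return (
        f'<iframe srcdoc="{escaped}" width="{width}" height="{height}" '
        'style="border:none;"></iframe>'
    )


if __name__ == "__main__":
    page = "<html><head></head><body><p>report</p></body></html>"
    print(wrap_in_iframe(page))
```

The returned string is a single element that can be placed in the body of a host page. In Streamlit, the components API can also render a raw HTML string directly, which may make the wrapper unnecessary.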
Installation via pip on linux 16 does not seem to work:
sudo pip3 install sweetviz
ERROR: Could not find a version that satisfies the requirement sweetviz (from versions: none)
I ran into a "FloatingPointError: divide by zero encountered in true_divide" in the pairwise feature portion of the code. Apparently there was a divide by zero issue in the cov part of the underlying code.
The trace of the error is as follows:
file: sv_public.py, line 13, in analyze, pairwise_analysis, feat_cfg)
file: dataframe_report.py, line 243, in init, self.process_associations(features_to_process, source_target_series, compare_target series
file: dataframe_report.py, line 423, in process_associations, feature.source.corr(other.source, method='pearson')
file: series.py line 2254, in corr, this.values, other.values, method=method, min_periods=min_periods
file: nanops.py, line 69, in _f, return f(*args,*kwargs)
file: nanops.py, line 1240, in nancorr, return f(a,b)
file: nanops.py, line 1256, in _pearson, return np.corrcoef(a,b)[0,1]
file: <array_function internals>, line 6, in corrcoef
file: function_base.py,line 2526 in corrcoef, c=cov(x,y,rowvar)
file: <array_function internals>, line 6, in cov
file: function_base.py, line 2455, in cov, c=np.true_divide(1,fact)
My dataframe had some empty strings where nulls should have been, but there were other columns that had similar features, but they never threw this error.
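The last frame of the trace divides by a factor derived from the number of observations, so one plausible trigger (an assumption, not confirmed against this dataset) is a pair of columns that share only a single row where both have data. The sketch below lists such pairs with pandas; thin_overlap_pairs is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def thin_overlap_pairs(df, min_overlap=2):
    """List numeric column pairs sharing fewer than `min_overlap` non-null rows."""
    numeric = df.select_dtypes(include="number")
    present = numeric.notna().astype(int)
    # overlap.loc[a, b] = number of rows where both a and b are non-null
    overlap = present.T @ present
    cols = list(numeric.columns)
    return [
        (a, b, int(overlap.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if overlap.loc[a, b] < min_overlap
    ]


if __name__ == "__main__":
    demo = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [1.0, 2.0, np.nan]})
    print(thin_overlap_pairs(demo))
```

Columns that show up in the result are candidates for skipping or cleaning before the pairwise analysis. Empty strings that should have been nulls are worth converting to NaN first, since they change which rows count as present.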
I'm trying to generate an HTML report by calling show_html(), but I do not see an output file.
IDE: Google colab.
TypeError: Column InvoiceNo has a 'mixed' inferred_type (as determined by Pandas). This is is not currently supported; column types should not contain mixed data. e.g. only float, or strings, but not a combination.
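Since the message asks for columns that are not mixed, a possible workaround is to cast such columns to strings before calling analyze(). This is a minimal sketch under the assumption that treating InvoiceNo as text is acceptable for the analysis; stringify_mixed_columns is a hypothetical helper, and it relies on pandas.api.types.infer_dtype to spot columns whose inferred type starts with "mixed".

```python
import pandas as pd


def stringify_mixed_columns(df):
    """Return a copy of `df` where columns pandas infers as 'mixed*' are cast to str."""
    out = df.copy()
    for col in out.columns:
        kind = pd.api.types.infer_dtype(out[col], skipna=True)
        if kind.startswith("mixed"):
            # Missing values are kept as-is; only real values become strings.
            out[col] = out[col].where(out[col].isna(), out[col].astype(str))
    return out


if __name__ == "__main__":
    retail = pd.DataFrame({"InvoiceNo": [536365, "C536379"], "Quantity": [6, -1]})
    print(stringify_mixed_columns(retail).dtypes)
```

Cancellations can still be recovered afterwards by testing whether the string starts with 'C'.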
I ran the command pip install sweetviz. The library was installed successfully, but when I run import sweetviz as sv, I get the error: AttributeError: module 'sweetviz' has no attribute 'from_dython'
First of all it’s awesome! Many thanks for your effort on data visualization! There is a small issue maybe, the html report lacks a meta tag showing the charset as “utf-8”; by adding it, the report can correctly show the MBCS characters and will catch eyes of more global analysts.
Thanks again! Hope this project goes better!
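As a stopgap on the user side, the saved report can be post-processed to declare its charset. The sketch below is a plain string operation with a hypothetical helper name, ensure_utf8_meta; it assumes the report contains a plain <head> tag without attributes and leaves the text untouched otherwise.

```python
def ensure_utf8_meta(html_text):
    """Insert <meta charset="utf-8"> right after <head> if no charset is declared."""
    lower = html_text.lower()
    if "charset" in lower:
        return html_text  # some charset is already declared
    pos = lower.find("<head>")
    if pos == -1:
        return html_text  # no plain <head> tag; leave the document untouched
    insert_at = pos + len("<head>")
    return html_text[:insert_at] + '\n<meta charset="utf-8">' + html_text[insert_at:]


if __name__ == "__main__":
    print(ensure_utf8_meta("<html><head><title>t</title></head></html>"))
```

When rewriting the file, opening it with encoding="utf-8" for both reading and writing keeps the bytes consistent with the declared charset.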
Hi, first of all, congrats on the project. Haven’t used extensively yet but will do so soon.
Just dropping this note because I noticed some inconsistencies between the documentation and the Medium article. In the Medium article the analyze() function is mentioned but not used. Here on GitHub, compare() is mentioned but not used.
Just wanted to let you know....
File "C:...\Python\Python37\site-packages\sweetviz\graph.py", line 58, in set_style
matplotlib.style.use(styles_in_final_location)
AttributeError: module 'matplotlib' has no attribute 'style'
Hi, the execution throws the "Zero Division Error".
I'm new to Jupyter Notebook and Python.
C:\Anaconda3\lib\site-packages\sweetviz\graph_associations.py in heatmap(y, x, figure_size, **kwargs)
224
225 # Scale with num squares
--> 226 size_scale = size_scale / len(x)
227 def value_to_size(val):
228 if size_min == size_max:
ZeroDivisionError: float division by zero
Thanks a lot!
Luciano.-
File "c:\users\appdata\local\programs\python\python37\lib\site-packages\pandas\core\base.py", line 1220, in _reduce
klass=self.__class__.__name__, op=name
TypeError: DatetimeIndex cannot perform the operation sum
Column names need to be strings: I got an error when using integers (it happens when calling sweetviz.DataframeReport). This could easily be fixed by modifying line 215 in \sweetviz\dataframe_report.py:
From
self.progress_bar.set_description(':' + f.source.name + '')
To
self.progress_bar.set_description(':' + str(f.source.name) + '')
----------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-59-d8e5410a0f22> in <module>
----> 1 sweet_report = sweetviz.analyze(source=[df, "Train"], target_feat="positive")
~\anaconda3\envs\datascience\lib\site-packages\sweetviz\sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
10 feat_cfg: FeatureConfig = None,
11 pairwise_analysis: str = 'auto'):
---> 12 report = sweetviz.DataframeReport(source, target_feat, None,
13 pairwise_analysis, feat_cfg)
14 return report
~\anaconda3\envs\datascience\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
213 for f in features_to_process:
214 # start = time.perf_counter()
--> 215 self.progress_bar.set_description(':' + f.source.name + '')
216 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
217 self.progress_bar.update(1)
TypeError: can only concatenate str (not "int") to str
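Until the library converts names itself, the same effect can be had on the caller's side by renaming the columns to strings before the report is built. A minimal sketch, with with_string_column_names as a hypothetical helper name:

```python
import pandas as pd


def with_string_column_names(df):
    """Return a copy of `df` whose column labels are all strings."""
    out = df.copy()
    out.columns = [str(c) for c in out.columns]
    return out


if __name__ == "__main__":
    frame = pd.DataFrame([[1, 2], [3, 4]], columns=[0, 1])
    print(list(with_string_column_names(frame).columns))
```

The original dataframe is left unchanged, so the integer labels remain available for the rest of the workflow.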
This seems to be a possible bug:
Date: Jul 22, 2020
platform: Macos
environment: conda custom environment
sweetviz version: sweetviz==1.0b3
np.__version__ # 1.18.4
pd.__version__ # 1.0.3
import numpy as np
import pandas as pd
import seaborn as sns
import sweetviz
df = sns.load_dataset('titanic')
display(df.head(2))
feat_cfg = sweetviz.FeatureConfig(skip="deck")
my_report = sweetviz.compare_intra(df,
df["sex"] == "male",
["Male", "Female"],
'survived',
feat_cfg)
my_report.show_html('compare_male_vs_female.html')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
1138 else:
-> 1139 self.index._engine.set_value(self._values, label, value)
1140 except (KeyError, TypeError):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: True
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-2-6f7575a96558> in <module>
12 ["Male", "Female"],
13 'survived',
---> 14 feat_cfg)
15 my_report.show_html('compare_male_vs_female.html')
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/sv_public.py in compare_intra(source_df, condition_series, names, target_feat, feat_cfg, pairwise_analysis)
42 report = sweetviz.DataframeReport([data_true, names[0]], target_feat,
43 [data_false, names[1]],
---> 44 pairwise_analysis, feat_cfg)
45 return report
46
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
215 # start = time.perf_counter()
216 self.progress_bar.set_description(':' + f.source.name + '')
--> 217 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
218 self.progress_bar.update(1)
219 # print(f"DONE FEATURE------> {f.source.name}"
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
97 # Explicitly show missing categories on each set
98 if compare_type == FeatureType.TYPE_CAT or compare_type == FeatureType.TYPE_BOOL:
---> 99 fill_out_missing_counts_in_other_series(to_process.compare_counts, to_process.source_counts)
100 fill_out_missing_counts_in_other_series(to_process.source_counts, to_process.compare_counts)
101 returned_feature_dict["compare"] = dict()
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in fill_out_missing_counts_in_other_series(my_counts, other_counts)
43 if my_counts[to_fill].index.dtype.name == 'category':
44 my_counts[to_fill] = my_counts[to_fill].reindex(my_counts[to_fill].index.add_categories(key))
---> 45 my_counts[to_fill].at[key] = 0
46
47 def add_series_base_stats_to_dict(series: pd.Series, counts: dict, updated_dict: dict) -> dict:
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
2192 key = list(self._convert_key(key, is_setter=True))
2193 key.append(value)
-> 2194 self.obj._set_value(*key, takeable=self._takeable)
2195
2196
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
1140 except (KeyError, TypeError):
1141 # set using a non-recursive method
-> 1142 self.loc[label] = value
1143
1144 return self
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
669 key = com.apply_if_callable(key, self.obj)
670 indexer = self._get_setitem_indexer(key)
--> 671 self._setitem_with_indexer(indexer, value)
672
673 def _validate_key(self, key, axis: int):
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
870 else:
871
--> 872 indexer, missing = convert_missing_indexer(indexer)
873
874 if missing:
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in convert_missing_indexer(indexer)
2342
2343 if isinstance(indexer, bool):
-> 2344 raise KeyError("cannot use a single bool to index into setitem")
2345 return indexer, True
2346
KeyError: 'cannot use a single bool to index into setitem'
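The failing frame fills in a category that exists in one subset but not the other, and the key involved is the boolean True. A small diagnostic can show which boolean columns are in that situation for a given split. This is a sketch for narrowing down the problem, not a fix; one_sided_bool_columns is a hypothetical helper name.

```python
import pandas as pd


def one_sided_bool_columns(df, condition):
    """Find boolean columns whose observed values differ between the two subsets."""
    flagged = {}
    for col in df.select_dtypes(include="bool").columns:
        in_true = set(map(bool, df.loc[condition, col].unique()))
        in_false = set(map(bool, df.loc[~condition, col].unique()))
        if in_true != in_false:
            flagged[col] = {"condition_true": in_true, "condition_false": in_false}
    return flagged


if __name__ == "__main__":
    people = pd.DataFrame({
        "sex": ["male", "male", "female", "female"],
        "adult_male": [True, False, False, False],
    })
    print(one_sided_bool_columns(people, people["sex"] == "male"))
```

Columns reported here could be passed to FeatureConfig(skip=...), as done for "deck" in the report above, to check whether they are what triggers the error.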
TypeError Traceback (most recent call last)
in
1 #analyzing the dataset
----> 2 report = sv.analyze(channelmemberhistory)
3 #display the report
4 report.show_html('MemberHistory.html')
~/.local/lib/python3.6/site-packages/sweetviz/sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
11 pairwise_analysis: str = 'auto'):
12 report = sweetviz.DataframeReport(source, target_feat, None,
---> 13 pairwise_analysis, feat_cfg)
14 return report
15
~/.local/lib/python3.6/site-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
214 # start = time.perf_counter()
215 self.progress_bar.set_description(':' + f.source.name + '')
--> 216 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
217 self.progress_bar.update(1)
218 # print(f"DONE FEATURE------> {f.source.name}"
~/.local/lib/python3.6/site-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
90
91 # Establish base stats
---> 92 add_series_base_stats_to_dict(to_process.source, to_process.source_counts, returned_feature_dict)
93 if to_process.compare is not None:
94 add_series_base_stats_to_dict(to_process.compare, to_process.compare_counts, compare_dict)
~/.local/lib/python3.6/site-packages/sweetviz/series_analyzer.py in add_series_base_stats_to_dict(series, counts, updated_dict)
42 base_stats = updated_dict["base_stats"]
43 num_total = counts["num_rows_total"]
---> 44 num_zeros = series[series == 0].sum()
45 non_nan = counts["num_rows_with_data"]
46 base_stats["total_rows"] = num_total
~/.local/lib/python3.6/site-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, min_count, **kwargs)
11180 skipna=skipna,
11181 numeric_only=numeric_only,
11182 min_count=min_count,
11183 )
11184
~/.local/lib/python3.6/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
3901 numeric_only=numeric_only,
3902 filter_type=filter_type,
-> 3903 **kwds,
3904 )
3905
~/.local/lib/python3.6/site-packages/pandas/core/base.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
1058 if func is None:
1059 raise TypeError(
-> 1060 f"{type(self).__name__} cannot perform the operation {name}"
1061 )
1062 return func(skipna=skipna, **kwds)
First off, love this! I installed it and ran it on an 88K+ line dataset, and loved how it came out.
Now, I got about 15 warnings all saying the same thing...
c:\users\shawn\python38\lib\site-packages\matplotlib\backends\backend_agg.py:214: RuntimeWarning: Glyph 37239 missing from current font.
font.set_text(s, 0.0, flags=flags)
This might be my computer/matplotlib version (3.2.1) or who knows.
thanks for a great lib
Some suggestions:
More features:
63 if predetermined_type_target not in (FeatureType.TYPE_BOOL,
64 FeatureType.TYPE_NUM):
---> 65 raise ValueError
66 self.predetermined_type_target = predetermined_type_target
67 else:
ValueError:
This error is raised even though the target takes the values 1, 2, 3 and its type is int64.
module 'sweetviz' has no attribute 'analyze'
Suggestion:
This is an open suggestion: the "missing" text in the HTML report would look nicer (IMHO) if its font color were red.
version: 1.0.3
date: Jul 22, 2020
Currently "sweetviz" only has bar charts for visualizations. For medium-size data analysis (such as Titanic or Boston housing) it would not be very costly to show box plots as well as bar plots. For larger datasets, this could be made optional in the config.ini file, or the file size could be used to decide whether to enable it.
For example:
if file size < 50MB:
    show boxplots and bar plots
else:
    show only bar plots
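The rule above could be sketched in Python as follows. The function names, the returned plot labels and the exact threshold constant are illustrative assumptions; nothing here is an existing sweetviz option.

```python
import os

SIZE_LIMIT_BYTES = 50 * 1024 * 1024  # the 50MB threshold from the suggestion


def plots_for_size(size_bytes, limit=SIZE_LIMIT_BYTES):
    """Pick which plot kinds to draw based on the size of the data."""
    if size_bytes < limit:
        return ["bar", "box"]
    return ["bar"]


def plots_for_file(path, limit=SIZE_LIMIT_BYTES):
    """Same decision, using the size of a file on disk."""
    return plots_for_size(os.path.getsize(path), limit)


if __name__ == "__main__":
    print(plots_for_size(10 * 1024 * 1024))
```

For data that is already in memory, the dataframe's memory usage could be used in place of the file size.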
Hey guys, I'm getting an error when handling integer columns but the error message is not very clear for me to understand what is going on. So far it looks like a bug to me. Here it goes.
We start by importing basic stuff and generate a pandas dataframe with 4 columns containing random real numbers, plus an integer column named 'target' with values 1 and 2.
import sweetviz as sv
import pandas as pd
import numpy as np
np.random.seed(42)
np_data = np.random.randn(10, 4)
df = pd.DataFrame(np_data, columns=['col1', 'col2', 'col3', 'col4'])
df['target'] = 1.0
df['target'].iloc[5:] = 2.
df['target'] = df['target'].astype(int)
Taking a look at the original types of the dataframe (df.dtypes), we have as a result:
col1 float64
col2 float64
col3 float64
col4 float64
target int32
dtype: object
Error: TypeError
compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])
compareReport.show_html()
gives this message:
TypeError Traceback (most recent call last)
<ipython-input-54-8e3e89553904> in <module>
1 #feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
----> 2 compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"])#, feat_cfg=feature_config, target_feat='target')
3 compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"
~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\sv_public.py in compare_intra(source_df, condition_series, names, target_feat, feat_cfg, pairwise_analysis)
42 report = sweetviz.DataframeReport([data_true, names[0]], target_feat,
43 [data_false, names[1]],
---> 44 pairwise_analysis, feat_cfg)
45 return report
46
~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
215 # start = time.perf_counter()
216 self.progress_bar.set_description(':' + f.source.name + '')
--> 217 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
218 self.progress_bar.update(1)
219 # print(f"DONE FEATURE------> {f.source.name}"
~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\series_analyzer.py in analyze_feature_to_dictionary(to_process)
92 compare_type = determine_feature_type(to_process.compare,
93 to_process.compare_counts,
---> 94 returned_feature_dict["type"], "COMPARED")
95 if compare_type != FeatureType.TYPE_ALL_NAN and \
96 source_type != FeatureType.TYPE_ALL_NAN:
~\AppData\Local\Continuum\anaconda3\envs\sweetbug\lib\site-packages\sweetviz\type_detection.py in determine_feature_type(series, counts, must_be_this_type, which_dataframe)
73 var_type = FeatureType.TYPE_TEXT
74 else:
---> 75 raise TypeError(f"Cannot force series '{series.name}' in {which_dataframe} to be from its type {var_type} to\n"
76 f"DESIRED type {must_be_this_type}. Check documentation for the possible coercion possibilities.\n"
77 f"This can be solved by changing the source data or is sometimes caused by\n"
TypeError: Cannot force series 'target' in COMPARED to be from its type FeatureType.TYPE_CAT to
DESIRED type FeatureType.TYPE_BOOL. Check documentation for the possible coercion possibilities.
This can be solved by changing the source data or is sometimes caused by
a feature type mismatch between source and compare dataframes.
If I explicitly supply the feat_cfg argument the result is the same.
feature_config = sv.FeatureConfig(force_num=['col1', 'col2', 'col3', 'col4'], force_cat='target')
compareReport = sv.compare_intra(df, df['target'] == 1, ["Complete", "Incomplete"], feat_cfg=feature_config)
compareReport.show_html() # Default arguments will generate to "SWEETVIZ_REPORT.html"
However, if I add 10 to the 'target' column (it will now have 11 and 12 as values), the report is generated without errors.
Am I missing something, or is it indeed a bug?
Error thrown up during analyze(dataframe), right after :PAIRWISE DONE: and Creating Associations graph...
Traceback (most recent call last):
File "", line 1, in
myreport = sv.analyze(df)
File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\sv_public.py", line 13, in analyze
pairwise_analysis, feat_cfg)
File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\dataframe_report.py", line 246, in __init__
self._association_graphs["all"] = GraphAssoc(self, "all", self._associations)
File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 165, in __init__
f = corrplot(graph_data, dataframe_report)
File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 410, in corrplot
dataframe_report = dataframe_report
File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 318, in heatmap
cur_size[1] / 2, facecolor=value_to_color(color[index]),
File "C:\Users\cnble\anaconda37\lib\site-packages\sweetviz\graph_associations.py", line 210, in value_to_color
ind = int(val_position * (n_colors - 1)) # target index in the color palette
ValueError: cannot convert float NaN to integer
sv.analyze works well in DOS mode but fails in the Python 3.7.7 shell with AttributeError: module 'sweetviz' has no attribute 'analyze'
The installation is successful, but I am still getting an error.
ModuleNotFoundError: No module named 'sweetviz'
While passing dataframe to get a sweetviz report it throws the following error:
ValueError Traceback (most recent call last)
in
----> 1 da_report=sv.analyze(features)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sweetviz\sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
11 pairwise_analysis: str = 'auto'):
12 report = sweetviz.DataframeReport(source, target_feat, None,
---> 13 pairwise_analysis, feat_cfg)
14 return report
15
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sweetviz\dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
58 raise ValueError('"source" parameter should either be a string or a list of 2 elements: [dataframe, "Name"].')
59 if len(su.get_duplicate_cols(source_df)) > 0:
---> 60 raise ValueError('Duplicate column names detected in "source"; this is not supported.')
61
62 all_source_names = [cur_name for cur_name, cur_series in source_df.iteritems()]
ValueError: Duplicate column names detected in "source"; this is not supported.
NOTES-
I have no two columns with same names.
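Names that look distinct can still collide, for instance when they differ only by case or surrounding whitespace, or when a merge produced two identical labels. How sweetviz itself detects duplicates is not visible in the traceback, so the following is only a diagnostic sketch; colliding_labels is a hypothetical helper that accepts any iterable of labels, such as df.columns.

```python
def colliding_labels(labels, normalize=lambda s: str(s).strip().lower()):
    """Group labels that become identical after normalization."""
    groups = {}
    for label in labels:
        groups.setdefault(normalize(label), []).append(label)
    # Keep only the groups where more than one original label collided.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    print(colliding_labels(["Age", "age ", "Fare"]))
```

Passing normalize=str restricts the check to exact duplicates.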
This is just a suggestion to the developer of this awesome module:
Generally to get the version number of a python module we do:
import numpy
numpy.__version__
The dunder attribute __version__ gives the version number.
I suggest adding this attribute to this library so that the following gives the version number:
import sweetviz
print(sweetviz.__version__)
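In the meantime, the installed version can be read from the package metadata without relying on a __version__ attribute. A minimal sketch using importlib.metadata from the standard library (Python 3.8 and later); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("sweetviz"))
```

A result of None from the same interpreter that fails on import is also a quick way to confirm that the package was installed into a different environment.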
Hi, I was running compare_intra() on my dataframe and encountered this error:
TypeError: conversion from numpy.float32 to Decimal is not supported
lib/python3.7/site-packages/sweetviz/sv_html_formatters.py in fmt_smart(value)
47 return "0.00"
48 elif absolute < 0.001:
---> 49 return f"{Decimal(value):.2e}"
50 elif absolute < 0.1:
51 return f"{value:.3f}"
Should the value be converted into a Python item first?
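Converting to a built-in float first does avoid the TypeError, because Decimal accepts float but rejects other numeric types. The sketch below mirrors the failing line in a standalone function (fmt_tiny is a hypothetical name, not the library's fmt_smart) and uses fractions.Fraction as a stand-in for numpy.float32 so that it runs without NumPy.

```python
from decimal import Decimal
from fractions import Fraction


def fmt_tiny(value):
    """Format a small number in scientific notation, accepting any real-number type."""
    # float() turns NumPy scalars (and other numeric types) into a built-in
    # float, which Decimal accepts.
    return f"{Decimal(float(value)):.2e}"


if __name__ == "__main__":
    print(fmt_tiny(0.0005))
    print(fmt_tiny(Fraction(1, 2000)))  # Decimal(Fraction(...)) alone raises TypeError
```

On the caller's side, casting float32 columns with astype("float64") before building the report might sidestep the issue as well, though that has not been checked against the library.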
This is a great package and I love the idea of a one-stop data profile approach. My only note is that I cannot seem to get the config file (sweet_viz.ini) to apply my changes. I tried pointing to my own .ini file with minor changes to sweet_viz.ini and the changes did not take. I then tried changing the setting within sweet_viz.ini directly (by setting logo = 0) and it still did not take. I used the code from the front page: sv.config_parser.read("Override.ini")
Perhaps there is an example notebook that has used this successfully that could be shared?
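One thing worth ruling out: Python's configparser silently ignores files it cannot find, and a relative name such as "Override.ini" is resolved against the current working directory. Assuming sv.config_parser is a standard configparser.ConfigParser, checking the return value of read() shows whether the override was loaded at all. In this sketch read_override is a hypothetical helper, and the [Layout] section and show_logo key are placeholders that should be copied from the library's defaults file.

```python
import configparser
import os


def read_override(parser, path):
    """Apply an override .ini file, failing loudly if it was not actually read."""
    loaded = parser.read(path)  # returns the list of files successfully parsed
    if not loaded:
        raise FileNotFoundError(
            f"{path!r} was not read; resolved to {os.path.abspath(path)!r}"
        )
    return loaded


if __name__ == "__main__":
    parser = configparser.ConfigParser()
    parser.read_string("[Layout]\nshow_logo = 1\n")  # placeholder defaults
    try:
        read_override(parser, "Override.ini")
    except FileNotFoundError as err:
        print(err)
```

With sweetviz this would be called as read_override(sv.config_parser, "Override.ini"), presumably before the report is created. An override only takes effect if its section and key names match the defaults exactly.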
Hi there,
I tried to use sweetviz both locally and in Anaconda Enterprise.
Both lead to the same issue: the output is not visible in JupyterLab or Jupyter Notebook.
The report is generated but not displayed.
How can I fix it?
Best
I appreciate the effort that went into this library and I see the potential! I tried it out at work, and when the HTML was displayed the charts for the features were partially hidden. I thought that I could scroll over to see them, but no horizontal scroll bar was available.
When I tried to analyze my data frame, I got this error:
---->1 my_report = sv.analyze(df)
2 my_report.show_html()
Hi, I am attempting to install sweetviz using pip install sweetviz, but keep encountering the following error message (reproduced below).
I am using pandas version 1.0.1. Kindly advise, thanks.
Installing collected packages: importlib-resources, pandas, tqdm, sweetviz
Attempting uninstall: pandas
Found existing installation: pandas 1.0.1
Uninstalling pandas-1.0.1:
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\users\\65943\\anaconda3\\lib\\site-packages\\~andas\\_libs\\algos.cp37-win_amd64.pyd'
Consider using the --user option or check the permissions.
Hi,
I am not able to use the sweetviz library. It installed from the Anaconda prompt but does not work in the Jupyter notebook.
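A common reason for this pattern is that the notebook kernel runs a different Python interpreter than the one the prompt installed into; that is an assumption about this particular setup. Building the pip command from sys.executable ties the install to whichever interpreter runs the code. pip_install_command is a hypothetical helper name.

```python
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter that is running this code."""
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    # Shows which interpreter the notebook or shell is actually using.
    print(" ".join(pip_install_command("sweetviz")))
```

Inside a notebook cell the list can be passed to subprocess.check_call to install into the kernel's own environment. Restarting the kernel afterwards is usually needed before the import succeeds.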
I am able to generate the same report on Titanic data as in the Medium articles. However, when I try to test the Boston housing data, I get the errors as below:
ValueError Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in get_slice_bound(self, label, side, kind)
5166 try:
-> 5167 return self._searchsorted_monotonic(label, side)
5168 except ValueError:
~\AppData\Local\Continuum\anaconda3\envs\envSDS\lib\site-packages\pandas\core\indexes\base.py in _searchsorted_monotonic(self, label, side)
5127
-> 5128 raise ValueError("index must be monotonic increasing or decreasing")
5129
ValueError: index must be monotonic increasing or decreasing
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
in
----> 1 my_report = sv.analyze(dfx)
Any ideas on the error?
Thanks.
I have a dataset which has a date_time column in the format: 2020-07-12 11:37:25
I get the following error:
:date_time: |███ | [ 14%] 00:00 -> (00:03 left)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-77-cbd387f7f43e> in <module>()
1 #analyzing the dataset
----> 2 techglares_report = sv.analyze(df)
6 frames
/usr/local/lib/python3.6/dist-packages/sweetviz/sv_public.py in analyze(source, target_feat, feat_cfg, pairwise_analysis)
11 pairwise_analysis: str = 'auto'):
12 report = sweetviz.DataframeReport(source, target_feat, None,
---> 13 pairwise_analysis, feat_cfg)
14 return report
15
/usr/local/lib/python3.6/dist-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
214 # start = time.perf_counter()
215 self.progress_bar.set_description(':' + f.source.name + '')
--> 216 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
217 self.progress_bar.update(1)
218 # print(f"DONE FEATURE------> {f.source.name}"
/usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
90
91 # Establish base stats
---> 92 add_series_base_stats_to_dict(to_process.source, to_process.source_counts, returned_feature_dict)
93 if to_process.compare is not None:
94 add_series_base_stats_to_dict(to_process.compare, to_process.compare_counts, compare_dict)
/usr/local/lib/python3.6/dist-packages/sweetviz/series_analyzer.py in add_series_base_stats_to_dict(series, counts, updated_dict)
42 base_stats = updated_dict["base_stats"]
43 num_total = counts["num_rows_total"]
---> 44 num_zeros = series[series == 0].sum()
45 non_nan = counts["num_rows_with_data"]
46 base_stats["total_rows"] = num_total
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, min_count, **kwargs)
11180 skipna=skipna,
11181 numeric_only=numeric_only,
> 11182 min_count=min_count,
11183 )
11184
/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
3901 numeric_only=numeric_only,
3902 filter_type=filter_type,
-> 3903 **kwds,
3904 )
3905
/usr/local/lib/python3.6/dist-packages/pandas/core/base.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
1058 if func is None:
1059 raise TypeError(
-> 1060 f"{type(self).__name__} cannot perform the operation {name}"
1061 )
1062 return func(skipna=skipna, **kwds)
TypeError: DatetimeIndex cannot perform the operation sum
I'm running sweetviz on Google Colab.
Is there any way to solve this error?
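The traceback shows a sum being taken over the date_time column, which pandas does not allow for datetimes. A possible workaround, untested against this sweetviz version, is to turn datetime columns into text before the analysis, or to leave them out. Both helper names below are hypothetical.

```python
import pandas as pd


def datetime_columns(df):
    """Names of the columns that hold datetimes (naive or timezone-aware)."""
    return list(df.select_dtypes(include=["datetime", "datetimetz"]).columns)


def datetimes_as_text(df, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of `df` with datetime columns rendered as strings."""
    out = df.copy()
    for col in datetime_columns(out):
        out[col] = out[col].dt.strftime(fmt)
    return out


if __name__ == "__main__":
    events = pd.DataFrame({
        "date_time": pd.to_datetime(["2020-07-12 11:37:25"]),
        "clicks": [3],
    })
    print(datetimes_as_text(events).dtypes)
```

Alternatively, the names returned by datetime_columns(df) could be passed to FeatureConfig(skip=...) so that the report ignores those columns entirely.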
This seems to be a possible bug:
Date: Jul 22, 2020
platform: Macos
environment: conda custom environment
sweetviz version: sweetviz==1.0b3
np.__version__ # 1.18.4
pd.__version__ # 1.0.3
import numpy as np
import pandas as pd
import seaborn as sns
import sweetviz
df = sns.load_dataset('titanic')
my_report = sweetviz.compare(
[df.query("sex == 'male'"), "Male"],
[df.query("sex == 'female'"), "Female"],
"survived"
)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
1138 else:
-> 1139 self.index._engine.set_value(self._values, label, value)
1140 except (KeyError, TypeError):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.set_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: True
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-62-d04e4b8c2811> in <module>
2 [df.query("sex == 'male'"), "Male"],
3 [df.query("sex == 'female'"), "Female"],
----> 4 "survived"
5 )
6
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/sv_public.py in compare(source, compare, target_feat, feat_cfg, pairwise_analysis)
21 pairwise_analysis: str = 'auto'):
22 report = sweetviz.DataframeReport(source, target_feat, compare,
---> 23 pairwise_analysis, feat_cfg)
24 return report
25
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/dataframe_report.py in __init__(self, source, target_feature_name, compare, pairwise_analysis, fc)
215 # start = time.perf_counter()
216 self.progress_bar.set_description(':' + f.source.name + '')
--> 217 self._features[f.source.name] = sa.analyze_feature_to_dictionary(f)
218 self.progress_bar.update(1)
219 # print(f"DONE FEATURE------> {f.source.name}"
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in analyze_feature_to_dictionary(to_process)
97 # Explicitly show missing categories on each set
98 if compare_type == FeatureType.TYPE_CAT or compare_type == FeatureType.TYPE_BOOL:
---> 99 fill_out_missing_counts_in_other_series(to_process.compare_counts, to_process.source_counts)
100 fill_out_missing_counts_in_other_series(to_process.source_counts, to_process.compare_counts)
101 returned_feature_dict["compare"] = dict()
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/sweetviz/series_analyzer.py in fill_out_missing_counts_in_other_series(my_counts, other_counts)
43 if my_counts[to_fill].index.dtype.name == 'category':
44 my_counts[to_fill] = my_counts[to_fill].reindex(my_counts[to_fill].index.add_categories(key))
---> 45 my_counts[to_fill].at[key] = 0
46
47 def add_series_base_stats_to_dict(series: pd.Series, counts: dict, updated_dict: dict) -> dict:
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
2192 key = list(self._convert_key(key, is_setter=True))
2193 key.append(value)
-> 2194 self.obj._set_value(*key, takeable=self._takeable)
2195
2196
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/series.py in _set_value(self, label, value, takeable)
1140 except (KeyError, TypeError):
1141 # set using a non-recursive method
-> 1142 self.loc[label] = value
1143
1144 return self
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
669 key = com.apply_if_callable(key, self.obj)
670 indexer = self._get_setitem_indexer(key)
--> 671 self._setitem_with_indexer(indexer, value)
672
673 def _validate_key(self, key, axis: int):
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
870 else:
871
--> 872 indexer, missing = convert_missing_indexer(indexer)
873
874 if missing:
~/opt/miniconda3/envs/dataSc/lib/python3.7/site-packages/pandas/core/indexing.py in convert_missing_indexer(indexer)
2342
2343 if isinstance(indexer, bool):
-> 2344 raise KeyError("cannot use a single bool to index into setitem")
2345 return indexer, True
2346
KeyError: 'cannot use a single bool to index into setitem'
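Reading the traceback, the failure happens when sweetviz tries to add a missing category with the key `True` to one subset's counts. That would be consistent with a boolean column that takes only one value in one of the two subsets (in the titanic data, `adult_male` is always `False` for females). A possible workaround, sketched below under that assumption, is to convert boolean columns to string labels before splitting the data. The helper `bools_to_labels` is hypothetical and not part of sweetviz, and the small DataFrame is a stand-in for the real dataset.

```python
import pandas as pd


def bools_to_labels(df: pd.DataFrame, skip=()) -> pd.DataFrame:
    """Return a copy of df with bool columns mapped to 'True'/'False' strings.

    Hypothetical helper (not a sweetviz function). Columns named in `skip`
    are left unchanged, e.g. a boolean target column.
    """
    out = df.copy()
    bool_cols = [c for c in out.select_dtypes(include="bool").columns
                 if c not in skip]
    for col in bool_cols:
        out[col] = out[col].map({True: "True", False: "False"})
    return out


# Stand-in data: 'adult_male' is always False in the female subset, so the
# two subsets do not contain the same set of boolean values.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "adult_male": [True, False, True, False],
    "survived": [0, 1, 0, 1],
})

clean = bools_to_labels(df)
male = clean[clean["sex"] == "male"]
female = clean[clean["sex"] == "female"]
# male and female could then be passed to
# sweetviz.compare([male, "Male"], [female, "Female"], "survived")
```

Whether this avoids the error in sweetviz 1.0b3 has not been verified here. It only removes the boolean keys that the traceback points to.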