
pandas-log's Introduction

Hi there, I'm Eyal 👋


Enthusiastic software engineer 👷
Who appreciates good software engineering 🙏
I have a big passion for Python 🐍, Machine Learning 🤖, Databases 🛢️, Scale and Performance Optimisations 🦸, and making all of these easy to use.

Wanna chat? 👉 GitHub


pandas-log's People

Contributors

cmdavis4, dmil, eyaltrabelsi, gtfuhr


pandas-log's Issues

Ease up finding bottlenecks

Brief Description

I would like to propose showing each operation's percentage of the total execution time, to make bottlenecks easier to find.
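A minimal sketch of the proposed percentage column, assuming per-step timings are already collected as (step name, seconds) pairs. The helper `share_of_total` is hypothetical and not part of pandas-log; the sample timings are the ones logged in the TypeError issue further down this page.

```python
from typing import List, Tuple


def share_of_total(step_times: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (step, seconds, percent_of_total) for every step."""
    total = sum(seconds for _, seconds in step_times)
    return [
        (name, seconds, 100.0 * seconds / total if total > 0 else 0.0)
        for name, seconds in step_times
    ]


# Timings taken from the logs in the TypeError issue on this page.
steps = [("fillna", 0.001512), ("replace", 0.001215), ("groupby", 0.006409)]
report = share_of_total(steps)
for name, seconds, pct in report:
    print(f"{name}: {seconds:.6f}s ({pct:.1f}% of total)")

# The step with the largest share is the first bottleneck candidate.
bottleneck = max(report, key=lambda row: row[2])
print(f"Slowest step: {bottleneck[0]}")
```

Percentages are relative to the steps that were logged, so unlogged work between steps is not counted.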

A way to enable globally?

Brief Description

I'm looking for a way to enable pandas-log globally without using the context manager. Is that possible right now? If not, what do you think about this feature? Thanks.
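The module's `__all__` list, visible in the ModuleNotFoundError traceback further down this page, exports `auto_enable` and `auto_disable` alongside `enable`, which suggests a global switch may already exist; the pandas-log README is the place to confirm their exact behavior. As a general illustration of the pattern, the sketch below pairs a module-level on/off switch with a context manager built on the same flag. All names here (`enable_globally`, `disable_globally`, `is_enabled`, `enabled`) are hypothetical.

```python
from contextlib import contextmanager

_ENABLED = False  # hypothetical module-level switch


def enable_globally():
    """Turn logging on until disable_globally() is called."""
    global _ENABLED
    _ENABLED = True


def disable_globally():
    global _ENABLED
    _ENABLED = False


def is_enabled():
    return _ENABLED


@contextmanager
def enabled():
    """Scoped variant on the same flag; restores the previous state on exit."""
    global _ENABLED
    previous = _ENABLED
    _ENABLED = True
    try:
        yield
    finally:
        _ENABLED = previous
```

Because the context manager restores the previous value instead of forcing it off, a `with enabled():` block inside globally enabled code leaves logging on afterwards.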

Add tips/watch out section

Brief Description

I would like to propose a section with various tips, such as:

  • warn if using iterrows
  • use resample for group by on timestamp

This could use dovpanda once it gets stabilized.
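A minimal sketch of such a tips section, assuming tips are keyed by the name of the pandas method being logged. The `TIPS` registry and `warn_with_tip` helper are hypothetical; a `UserWarning` is raised only for methods that have a registered tip.

```python
import warnings

# Hypothetical registry: pandas method name -> advice shown when that method is logged.
TIPS = {
    "iterrows": "iterrows is slow; prefer vectorized operations or itertuples.",
    # Note: this fires on every groupby; a real check would first confirm
    # that the grouping key is a datetime column.
    "groupby": "when grouping by a timestamp, consider resample instead.",
}


def warn_with_tip(method_name):
    """Emit a UserWarning with the registered tip, if any, and return the tip text."""
    tip = TIPS.get(method_name)
    if tip is not None:
        warnings.warn(f"{method_name}: {tip}", UserWarning, stacklevel=2)
    return tip
```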

pd.merge nonetype object has no attribute 'memory_usage'

Brief Description

System Information

  • Operating system: Windows
  • OS details (optional):
  • Python version (required): Python 3.6

installed via pip

Minimally Reproducible Code

import pandas as pd
import pandas_log
df_a = pd.DataFrame({'a':[1,2,3],'b':['a','b','c']})
df_b = pd.DataFrame({'c':[11,12,13],'b':['a','b','c']})

with pandas_log.enable():
    df = (
        pd.merge(df_a, df_b, on='b')
    )

Error Messages

--------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-c5cf824082a8> in <module>
      1 with pandas_log.enable():
      2     df = (
----> 3         pd.merge(df_a,df_b,on='b')
      4     )

~\.conda\envs\empl\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     79         copy=copy,
     80         indicator=indicator,
---> 81         validate=validate,
     82     )
     83     return op.get_result()

~\.conda\envs\empl\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    624             self.right_join_keys,
    625             self.join_names,
--> 626         ) = self._get_merge_keys()
    627 
    628         # validate the merge keys dtypes. We may need to coerce

~\.conda\envs\empl\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
   1031 
   1032         if right_drop:
-> 1033             self.right = self.right._drop_labels_or_levels(right_drop)
   1034 
   1035         return left_keys, right_keys, join_names

~\.conda\envs\empl\lib\site-packages\pandas\core\generic.py in _drop_labels_or_levels(self, keys, axis)
   1860             # Handle dropping columns labels
   1861             if labels_to_drop:
-> 1862                 dropped.drop(labels_to_drop, axis=1, inplace=True)
   1863         else:
   1864             # Handle dropping column levels

~\.conda\envs\empl\lib\site-packages\pandas_flavor\register.py in __call__(self, *args, **kwargs)
     27             @wraps(method)
     28             def __call__(self, *args, **kwargs):
---> 29                 return method(self._obj, *args, **kwargs)
     30 
     31         register_dataframe_accessor(method.__name__)(AccessorMethod)

~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_log.py in wrapped(*args, **fn_kwargs)
    134                 full_signature,
    135                 silent,
--> 136                 verbose,
    137             )
    138             return output_df

~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_log.py in _run_method_and_calc_stats(fn, fn_args, fn_kwargs, input_df, full_signature, silent, verbose)
    104 
    105         output_df, execution_stats = get_execution_stats(
--> 106             fn, input_df, fn_args, fn_kwargs
    107         )
    108 

~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_execution_stats.py in get_execution_stats(fn, input_df, fn_args, fn_kwargs)
     35 
     36     input_memory_size = StepStats.calc_df_series_memory(input_df)
---> 37     output_memory_size = StepStats.calc_df_series_memory(output_df)
     38 
     39     ExecutionStats = namedtuple(

~\.conda\envs\empl\lib\site-packages\pandas_log\pandas_execution_stats.py in calc_df_series_memory(df_or_series)
     78     @staticmethod
     79     def calc_df_series_memory(df_or_series):
---> 80         memory_size = df_or_series.memory_usage(index=True, deep=True)
     81         return (
     82             humanize.naturalsize(memory_size.sum())

AttributeError: 'NoneType' object has no attribute 'memory_usage'
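The traceback shows the chain of events: inside pd.merge, pandas calls `dropped.drop(labels_to_drop, axis=1, inplace=True)`, that call goes through pandas-log's patched `drop`, and an in-place `drop` returns `None`, so `calc_df_series_memory` ends up calling `memory_usage` on `None`. A minimal sketch of a guard, assuming the stats code can simply skip the memory figure when the output is not a DataFrame or Series; `calc_memory_bytes` is a hypothetical helper, not pandas-log's code.

```python
import pandas as pd


def calc_memory_bytes(obj):
    """Deep memory usage in bytes, or None when obj is not a DataFrame/Series.

    In-place pandas calls such as drop(..., inplace=True) return None, so the
    stats code cannot assume it always receives a DataFrame.
    """
    if isinstance(obj, pd.DataFrame):
        return int(obj.memory_usage(index=True, deep=True).sum())
    if isinstance(obj, pd.Series):
        return int(obj.memory_usage(index=True, deep=True))
    return None


df_a = pd.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]})
result = df_a.drop(columns=["b"], inplace=True)  # in-place drop returns None
print(calc_memory_bytes(result))  # None instead of AttributeError
print(calc_memory_bytes(df_a))    # size of the remaining frame, in bytes
```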

No module humanize or pandas-flavor

Brief Description

Are the required dependencies for this package up to date? I ran pip install pandas-log, then got module import errors when I tried importing pandas-log into my notebook:

--> 114             import pandas_log
    115 
    116             with pandas_logs.enable():

~/GitHub/simple-tech-challenges/venv/lib/python3.8/site-packages/pandas_log/__init__.py in <module>
      2 
      3 """Top-level package for pandas-log."""
----> 4 from .pandas_log import *
      5 
      6 __author__ = """Eyal Trabelsi"""

~/GitHub/simple-tech-challenges/venv/lib/python3.8/site-packages/pandas_log/pandas_log.py in <module>
     15     restore_pandas_func_copy,
     16 )
---> 17 from pandas_log.pandas_execution_stats import StepStats, get_execution_stats
     18 
     19 __all__ = ["auto_enable", "auto_disable", "enable"]

~/GitHub/simple-tech-challenges/venv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in <module>
     22 with warnings.catch_warnings():
     23     warnings.simplefilter("ignore")
---> 24     import humanize
     25 
     26 

ModuleNotFoundError: No module named 'humanize'
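The traceback shows pandas_execution_stats.py importing humanize at load time, so the dependency was not installed alongside pandas-log in this environment. If the dependencies are missing from the package metadata, installing them by hand (pip install humanize pandas-flavor) should work around it. A small stdlib-only check, assuming these two module names are the ones pandas-log needs (per the traceback and the issue title); `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Top-level modules this issue reports as missing (assumed list).
REQUIRED = ["humanize", "pandas_flavor"]


def missing_modules(names):
    """Return the names in `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    # pandas_flavor is published on PyPI as pandas-flavor
    print("pip install " + " ".join(name.replace("_", "-") for name in missing))
else:
    print("All pandas-log dependencies found.")
```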

Integrate with Python logging module

Integrate with Python logging

I would love a way to integrate this with the standard Python logging module, or a drop-in replacement thereof, such as loguru.

Such an integration would make it more useful when running production data-science code and would ease adoption of this library, which I think is a really interesting idea!
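One way such an integration could look, as a sketch: route each step's stats through a `logging.Logger` instead of printing them, so applications control handlers, levels and formatting. The logger name `"pandas_log"` and the `log_step` helper are assumptions, not existing pandas-log API.

```python
import logging

# Assumed logger name; applications configure handlers/levels for it as usual.
logger = logging.getLogger("pandas_log")


def log_step(step_name, seconds, level=logging.INFO):
    """Send one step's stats through the standard logging module instead of print()."""
    logger.log(level, "%s took %.6f seconds", step_name, seconds)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(name)s %(levelname)s: %(message)s")
    log_step("groupby", 0.006409)
```

Using lazy `%`-style arguments means the message is only formatted when a handler actually emits it.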

All logs should be suppressed after disable being called

Brief Description

Currently, after disabling, some methods still produce logs although they shouldn't, because the reference to the pandas method is different from the instance method.

Minimally Reproducible Code

import pandas as pd
from pandas_log import enable

with enable():
    df = pd.read_csv("../examples/pokemon.csv")
    (
        df.query("legendary==0")
        .query("type_1=='fire' or type_2=='fire'")
    )
df.query("legendary==1")
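A minimal sketch of patch-and-restore that avoids this class of bug, using a plain stand-in class rather than pandas: the original is read from the class's own `__dict__` (the exact object stored on the class) and put back in a `finally` block, so calls made after the context exits, on any instance, hit the unpatched method. `patched` and `Frame` are hypothetical names; this is not pandas-log's implementation, and it assumes the method is a plain function defined directly on the class.

```python
import functools
from contextlib import contextmanager


@contextmanager
def patched(cls, method_name, log):
    """Wrap cls.<method_name> so calls are recorded in `log`; restore it on exit.

    Assumes method_name is a plain function defined directly on cls.
    """
    original = cls.__dict__[method_name]  # the exact object stored on the class

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        log.append(method_name)
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)
    try:
        yield
    finally:
        # Restore the same object, even if the body raised.
        setattr(cls, method_name, original)


class Frame:  # stand-in for pd.DataFrame
    def query(self, expr):
        return expr


calls = []
with patched(Frame, "query", calls):
    Frame().query("legendary==0")
Frame().query("legendary==1")  # outside the block: not recorded
print(calls)  # ['query']
```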

Add CI/CD

Brief Description

Add Travis CI for CI/CD.

TypeError: data type not understood

Brief Description

I'm trying to run pandas-log on my method chain and it fails with the error:

TypeError: data type not understood

System Information

  • Python version (required): Python 3.8.5
  • Pandas version: 1.3.2

Minimally Reproducible Code

import pandas as pd
autos = pd.read_csv('https://github.com/mattharrison/datasets/raw/master/data/vehicles.csv.zip')
def to_tz(df_, time_col, tz_offset, tz_name):
    return (df_
             .groupby(tz_offset)
             [time_col]
             .transform(lambda s: pd.to_datetime(s)
                 .dt.tz_localize(s.name, ambiguous=True)
                 .dt.tz_convert(tz_name))
            )


def tweak_autos(autos):
    cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', 
        'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']
    return (autos
     [cols]
     .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),
             displ=autos.displ.fillna(0).astype('float16'),
             drive=autos.drive.fillna('Other').astype('category'),
             automatic=autos.trany.str.contains('Auto'),
             speeds=autos.trany.str.extract(r'(\d)+').fillna('20').astype('int8'),
             tz=autos.createdOn.str.extract(r'\d\d:\d\d ([A-Z]{3}?)').replace('EDT', 'EST5EDT'),
             str_date=(autos.createdOn.str.slice(4,19) + ' ' + autos.createdOn.str.slice(-4)),
             createdOn=lambda df_: to_tz(df_, 'str_date', 'tz', 'US/Eastern'),
             ffs=autos.eng_dscr.str.contains('FFS')
            )
     .pipe(lambda df_: df_)  # stand-in for the reporter's show() display helper, which this snippet does not define
     .astype({'highway08': 'int8', 'city08': 'int16', 'comb08': 'int16', 'fuelCost08': 'int16',
              'range': 'int16',  'year': 'int16', 'make': 'category'})
     .drop(columns=['trany', 'eng_dscr'])
    )
import pandas_log
with pandas_log.enable():
    tweak_autos(autos)

Error Messages

1) fillna(value: 'object | ArrayLike | None' ="20", method: 'FillnaOptions | None' = None, axis: 'Axis | None' = None, inplace: 'bool' = False, limit=None, downcast=None):
	Metadata:
	* Filled 837 with 20.
	Execution Stats:
	* Execution time: Step Took 0.001512 seconds.

1) replace(to_replace="EDT", value="EST5EDT", inplace: 'bool' = False, limit=None, regex: 'bool' = False, method: 'str' = 'pad'):
	Execution Stats:
	* Execution time: Step Took 0.001215 seconds.

1) groupby(by="tz", axis: 'Axis' = 0, level: 'Level | None' = None, as_index: 'bool' = True, sort: 'bool' = True, group_keys: 'bool' = True, squeeze: 'bool | lib.NoDefault' = <no_default>, observed: 'bool' = False, dropna: 'bool' = True):
	Metadata:
	* Grouping by tz resulted in 2 groups like 
		EST,
		EST5EDT,
	  and more.
	Execution Stats:
	* Execution time: Step Took 0.006409 seconds.
/home/matt/envs/menv/lib/python3.8/site-packages/pandas_log/patched_logs_functions.py:249: UserWarning: Some pandas logging may involve copying dataframes, which can be time-/memory-intensive. Consider passing copy_ok=False to the enable/auto_enable functions in pandas_log if issues arise.
  warnings.warn(COPY_WARNING_MSG)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-f6bfc55c635b> in <module>
     33 import pandas_log
     34 with pandas_log.enable():
---> 35     tweak_autos(autos)

<ipython-input-1-f6bfc55c635b> in tweak_autos(autos)
     14     cols = ['city08', 'comb08', 'highway08', 'cylinders', 'displ', 'drive', 'eng_dscr', 
     15         'fuelCost08', 'make', 'model', 'trany', 'range', 'createdOn', 'year']
---> 16     return (autos
     17      [cols]
     18      .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),

~/envs/menv/lib/python3.8/site-packages/pandas_flavor/register.py in __call__(self, *args, **kwargs)
     27             @wraps(method)
     28             def __call__(self, *args, **kwargs):
---> 29                 return method(self._obj, *args, **kwargs)
     30 
     31         register_dataframe_accessor(method.__name__)(AccessorMethod)

~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_log.py in wrapped(*args, **fn_kwargs)
    184 
    185             input_df, fn_args = args[0], args[1:]
--> 186             output_df = _run_method_and_calc_stats(
    187                 fn,
    188                 fn_args,

~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_log.py in _run_method_and_calc_stats(fn, fn_args, fn_kwargs, input_df, full_signature, silent, verbose, copy_ok, calculate_memory)
    168             output_df,
    169         )
--> 170         step_stats.log_stats_if_needed(silent, verbose, copy_ok)
    171         if isinstance(output_df, pd.DataFrame) or isinstance(output_df, pd.Series):
    172             step_stats.persist_execution_stats()

~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in log_stats_if_needed(self, silent, verbose, copy_ok)
    106 
    107         if verbose or self.fn.__name__ not in DATAFRAME_ADDITIONAL_METHODS_TO_OVERIDE:
--> 108             s = self.__repr__(verbose, copy_ok)
    109             if s:
    110                 # If this method isn't patched and verbose is False, __repr__ will give an empty string, which

~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in __repr__(self, verbose, copy_ok)
    147 
    148         # Step Metadata stats
--> 149         logs, tips = self.get_logs_for_specifc_method(verbose, copy_ok)
    150         metadata_stats = f"\033[4mMetadata\033[0m:\n{logs}" if logs else ""
    151         metadata_tips = f"\033[4mTips\033[0m:\n{tips}" if tips else ""

~/envs/menv/lib/python3.8/site-packages/pandas_log/pandas_execution_stats.py in get_logs_for_specifc_method(self, verbose, copy_ok)
    128 
    129         log_method = partial(log_method, self.output_df, self.input_df)
--> 130         logs, tips = log_method(*self.fn_args, **self.fn_kwargs)
    131         return logs, tips
    132 

~/envs/menv/lib/python3.8/site-packages/pandas_log/patched_logs_functions.py in log_assign(output_df, input_df, **kwargs)
    250             # If copying is ok, we can check how many values actually changed
    251             for col in changed_cols:
--> 252                 values_changed, values_unchanged = num_values_changed(
    253                     input_df[col], output_df[col]
    254                 )

~/envs/menv/lib/python3.8/site-packages/pandas_log/patched_logs_functions.py in num_values_changed(input_obj, output_obj)
    127         isinstance(input_obj, pd.Series)
    128         and isinstance(output_obj, pd.Series)
--> 129         and input_obj.dtype != output_obj.dtype
    130     ):
    131         # Comparing values for equality across dtypes wouldn't be well-defined so we just say they all changed

TypeError: Cannot interpret 'datetime64[ns, US/Eastern]' as a data type
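The failing line compares `input_obj.dtype != output_obj.dtype`, where the input column holds strings (numpy `object` dtype) and the output has the pandas extension dtype `datetime64[ns, US/Eastern]`; on the reporter's numpy/pandas combination that mixed comparison raised instead of returning `True`. A hedged sketch of a tolerant comparison, assuming a fallback to string representations is acceptable when the direct comparison raises; `dtypes_differ` is a hypothetical helper, and the example uses UTC to avoid depending on time-zone data.

```python
import pandas as pd


def dtypes_differ(left, right):
    """True if two dtypes differ; tolerant of numpy vs pandas extension dtypes.

    Some numpy/pandas combinations raise TypeError ("Cannot interpret ... as a
    data type") when a numpy dtype is compared with an extension dtype, so fall
    back to comparing their string forms.
    """
    try:
        return bool(left != right)
    except TypeError:
        return str(left) != str(right)


object_dtype = pd.Series(["2021-01-01 12:00"], dtype=object).dtype  # object
tz_dtype = pd.DatetimeTZDtype(tz="UTC")                              # datetime64[ns, UTC]
print(dtypes_differ(object_dtype, tz_dtype))                         # True
```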

Allow pretty html output

Brief Description

I would like to propose the ability to generate HTML from the execution history.
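A minimal sketch of HTML export, assuming the execution history is available as (step name, seconds) pairs; `history_to_html` is hypothetical. Step names are escaped with `html.escape`, so method signatures containing `<` or `>` (such as `<no_default>` in the logs above) render safely. If the history were kept in a DataFrame instead, pandas' built-in `DataFrame.to_html` would be an alternative.

```python
from html import escape


def history_to_html(history):
    """Render (step, seconds) pairs as a minimal HTML table."""
    rows = "".join(
        f"<tr><td>{escape(step)}</td><td>{seconds:.6f}</td></tr>"
        for step, seconds in history
    )
    return (
        "<table>"
        "<thead><tr><th>Step</th><th>Seconds</th></tr></thead>"
        f"<tbody>{rows}</tbody>"
        "</table>"
    )


print(history_to_html([("fillna", 0.001512), ("groupby(squeeze=<no_default>)", 0.006409)]))
```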
