
Comments (2)

ardunn commented on September 28, 2024

We should also profile the code for slow or memory-hungry spots, in case interpolation isn't the main culprit.

from beep.

ardunn commented on September 28, 2024

I did some memory profiling of the structuring methods. Take the numbers with a grain of salt: according to a Stack Overflow answer, memory_profiler can be inaccurate inside loops because the OS allocates and releases memory in chunks.
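
Since the per-line numbers can be noisy inside loops, one way to cross-check them is the standard library's tracemalloc, which counts allocations made through Python's allocators instead of reading process memory from the OS. The following is only a minimal sketch: `peak_mib` and `build_list` are hypothetical names, not part of beep, and `build_list` is a stand-in for a real structuring call.

```python
import tracemalloc


def peak_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced allocation in MiB).

    Hypothetical helper, not part of beep.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def build_list(n):
    # Stand-in workload; replace with the call being investigated.
    return [float(i) for i in range(n)]


result, peak = peak_mib(build_list, 200_000)
print(f"peak traced allocation: {peak:.1f} MiB")
```

Because tracemalloc only sees allocations it can trace, its peak will not match the memory_profiler "Mem usage" column exactly. It is more useful for comparing two versions of the same function than as an absolute number.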

Problem areas

  • interpolate_step
  • interpolate_cycles
  • interpolate_diagnostic_cycles
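
One thing that may be worth checking, offered as an observation and not a confirmed cause: in the interpolate_diagnostic_cycles listing (source lines 949-957), the return value of new_df.astype({...}) is never assigned. pandas' DataFrame.astype returns a new object and does not cast in place, so the narrower dtypes would not apply to the frames collected in all_dfs until _cast_dtypes runs on the concatenated result. A minimal sketch with a toy frame (the column values are made up for illustration):

```python
import pandas as pd

# Toy stand-in for one interpolated group; values are illustrative only.
df = pd.DataFrame({"step_index": [1, 2, 3], "cycle_index": [0, 0, 1]})

# astype returns a new DataFrame; without assignment the cast is discarded.
df.astype({"step_index": "uint8"})
dtype_without_assign = str(df["step_index"].dtype)

# Assigning the result keeps the narrower dtypes.
df = df.astype({"step_index": "uint8", "cycle_index": "int32"})
dtype_with_assign = str(df["step_index"].dtype)

print(dtype_without_assign, "->", dtype_with_assign)
```

Whether assigning the result noticeably lowers peak memory here is untested; it would only shrink the intermediate frames held in all_dfs.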

Full output of memory profiling while structuring:

Raw file size on disk is 140MB. Size of raw loaded dataframe is 179MB.

SIZEOF: raw_data (): 179.946804 MB
MEM: GETTING STRUCTURING PARAMETERS
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   123    434.6 MiB    434.6 MiB           1               def wrapper(*args, **kwargs):
   124    434.6 MiB      0.0 MiB           1                   if args[0]._is_legacy:
   125                                                             raise ValueError(
   126                                                                 f"{args[0].__class__.__name__} is deserialized from a legacy file! Operation not allowed."
   127                                                             )
   128                                                         else:
   129    299.1 MiB   -135.5 MiB           1                       return func(*args, **kwargs)


MEM: Structuring with parameters
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   974    299.1 MiB    299.1 MiB           1       @profile
   975                                             def summarize_diagnostic(self, diagnostic_available):
   976                                                 """
   977                                                 Gets summary statistics for data according to location of
   978                                                 diagnostic cycles in the data
   979                                         
   980                                                 Args:
   981                                                     diagnostic_available (dict): dictionary with diagnostic_types
   982                                                         as list, 'length' of the diagnostic in cycles and location
   983                                                         of the diagnostic by cycle index
   984                                         
   985                                                 Returns:
   986                                                     (DataFrame) of summary statistics by cycle
   987                                         
   988                                                 """
   989                                         
   990    299.1 MiB      0.0 MiB           1           max_cycle = self.raw_data.cycle_index.max()
   991                                                 starts_at = [
   992    299.1 MiB      0.0 MiB           7               i for i in diagnostic_available["diagnostic_starts_at"] if i <= max_cycle
   993                                                 ]
   994    299.1 MiB      0.0 MiB           1           diag_cycles_at = list(
   995    299.1 MiB      0.0 MiB           1               itertools.chain.from_iterable(
   996    299.1 MiB      0.0 MiB           7                   [list(range(i, i + diagnostic_available["length"])) for i in starts_at]
   997                                                     )
   998                                                 )
   999    305.4 MiB      6.3 MiB           1           diag_summary = self.raw_data.groupby("cycle_index").agg(self._diag_aggregation)
  1000                                         
  1001    305.4 MiB      0.0 MiB           1           diag_summary.columns = self._diag_summary_cols
  1002                                         
  1003    305.4 MiB      0.0 MiB           1           diag_summary = diag_summary[diag_summary.index.isin(diag_cycles_at)]
  1004                                         
  1005                                                 diag_summary["coulombic_efficiency"] = (
  1006    305.4 MiB      0.0 MiB           1               diag_summary["discharge_capacity"] / diag_summary["charge_capacity"]
  1007                                                 )
  1008    305.4 MiB      0.0 MiB           1           diag_summary["paused"] = self.raw_data.groupby("cycle_index").apply(
  1009    427.6 MiB    122.2 MiB           1               get_max_paused_over_threshold
  1010                                                 )
  1011                                         
  1012    427.6 MiB      0.0 MiB           1           diag_summary.reset_index(drop=True, inplace=True)
  1013                                         
  1014    427.6 MiB      0.0 MiB           1           diag_summary["cycle_type"] = pd.Series(
  1015    427.6 MiB      0.0 MiB           1               diagnostic_available["cycle_type"] * len(starts_at)
  1016                                                 )
  1017                                         
  1018    427.6 MiB      0.0 MiB           1           diag_summary = self._cast_dtypes(diag_summary, "diagnostic_summary")
  1019                                         
  1020    427.6 MiB      0.0 MiB           1           return diag_summary


SIZEOF: diagnostic_summary (): 0.003069 MB
██████████| 168/168 [00:45<00:00,  3.73it/s]
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   842    427.6 MiB    427.6 MiB           1       @profile
   843                                             # equivalent of get_interpolated_diagnostic_cycles
   844                                             def interpolate_diagnostic_cycles(
   845                                                     self, diagnostic_available, resolution=1000, v_resolution=0.0005
   846                                             ):
   847                                                 """
   848                                                 Interpolates data according to location and type of diagnostic
   849                                                 cycles in the data
   850                                         
   851                                                 Args:
   852                                                     diagnostic_available (dict): dictionary with diagnostic_types
   853                                                         as list, 'length' of the diagnostic in cycles and location
   854                                                         of the diagnostic
   855                                                     resolution (int): resolution of interpolation
   856                                                     v_resolution (float): voltage delta to set for range based interpolation
   857                                         
   858                                                 Returns:
   859                                                     (pd.DataFrame) of interpolated diagnostic steps by step and cycle
   860                                         
   861                                                 """
   862                                                 # Get the project name and the parameter file for the diagnostic
   863    427.6 MiB      0.0 MiB           1           project_name_list = parameters_lookup.get_project_sequence(self.paths["raw"])
   864    427.6 MiB      0.0 MiB           1           diag_path = os.path.join(MODULE_DIR, "procedure_templates")
   865    427.6 MiB      0.0 MiB           1           v_range = parameters_lookup.get_diagnostic_parameters(
   866    427.6 MiB      0.0 MiB           1               diagnostic_available, diag_path, project_name_list[0]
   867                                                 )
   868                                         
   869                                                 # Determine the cycles and types of the diagnostic cycles
   870    427.6 MiB      0.0 MiB           1           max_cycle = self.raw_data.cycle_index.max()
   871                                                 starts_at = [
   872    427.6 MiB      0.0 MiB           7               i for i in diagnostic_available["diagnostic_starts_at"] if i <= max_cycle
   873                                                 ]
   874    427.6 MiB      0.0 MiB           1           diag_cycles_at = list(
   875    427.6 MiB      0.0 MiB           1               itertools.chain.from_iterable(
   876    427.6 MiB      0.0 MiB           7                   [range(i, i + diagnostic_available["length"]) for i in starts_at]
   877                                                     )
   878                                                 )
   879                                                 # Duplicate cycle type list end to end for each starting index
   880    427.6 MiB      0.0 MiB           1           diag_cycle_type = diagnostic_available["cycle_type"] * len(starts_at)
   881    427.6 MiB      0.0 MiB           1           if not len(diag_cycles_at) == len(diag_cycle_type):
   882                                                     errmsg = (
   883                                                         "Diagnostic cycles, {}, and diagnostic cycle types, "
   884                                                         "{}, are unequal lengths".format(diag_cycles_at, diag_cycle_type)
   885                                                     )
   886                                                     raise ValueError(errmsg)
   887                                         
   888    435.4 MiB      7.8 MiB           1           diag_data = self.raw_data[self.raw_data["cycle_index"].isin(diag_cycles_at)]
   889                                         
   890                                                 # Counter to ensure non-contiguous repeats of step_index
   891                                                 # within same cycle_index are grouped separately
   892    439.2 MiB      3.7 MiB           1           diag_data.loc[:, "step_index_counter"] = 0
   893                                         
   894    469.9 MiB  -1064.2 MiB          21           for cycle_index in diag_cycles_at:
   895    458.1 MiB   -990.3 MiB          20               indices = diag_data.loc[diag_data.cycle_index == cycle_index].index
   896    469.9 MiB   -821.1 MiB          20               step_index_list = diag_data.step_index.loc[indices]
   897    469.9 MiB  -1057.6 MiB          20               diag_data.loc[indices, "step_index_counter"] = step_index_list.ne(
   898    469.9 MiB  -1064.0 MiB          20                   step_index_list.shift()
   899                                                     ).cumsum()
   900                                         
   901    406.6 MiB    -63.3 MiB           1           group = diag_data.groupby(["cycle_index", "step_index", "step_index_counter"])
   902                                                 incl_columns = [
   903    406.6 MiB      0.0 MiB           1               "current",
   904    406.6 MiB      0.0 MiB           1               "charge_capacity",
   905    406.6 MiB      0.0 MiB           1               "discharge_capacity",
   906    406.6 MiB      0.0 MiB           1               "charge_energy",
   907    406.6 MiB      0.0 MiB           1               "discharge_energy",
   908    406.6 MiB      0.0 MiB           1               "internal_resistance",
   909    406.6 MiB      0.0 MiB           1               "temperature",
   910    406.6 MiB      0.0 MiB           1               "test_time",
   911                                                 ]
   912                                         
   913    406.6 MiB      0.0 MiB           1           diag_dict = {}
   914    424.0 MiB    -67.3 MiB          18           for cycle in diag_data.cycle_index.unique():
   915    424.0 MiB    -52.6 MiB          17               diag_dict.update({cycle: None})
   916    424.0 MiB    -47.7 MiB          17               steps = diag_data[diag_data.cycle_index == cycle].step_index.unique()
   917    424.0 MiB    -66.2 MiB          17               diag_dict[cycle] = list(steps)
   918                                         
   919    410.4 MiB    -13.6 MiB           1           all_dfs = []
   920    551.9 MiB     75.3 MiB         169           for (cycle_index, step_index, step_index_counter), df in tqdm(group):
   921    551.9 MiB    -59.7 MiB         168               if len(df) < 2:
   922                                                         continue
   923    551.9 MiB    -59.7 MiB         168               if diag_cycle_type[diag_cycles_at.index(cycle_index)] == "hppc":
   924    551.9 MiB    -52.3 MiB         142                   v_hppc_step = [df.voltage.min(), df.voltage.max()]
   925    551.9 MiB    -50.6 MiB         142                   hppc_resolution = int(
   926    551.9 MiB    -50.6 MiB         142                       (df.voltage.max() - df.voltage.min()) / v_resolution
   927                                                         )
   928    551.9 MiB    -50.6 MiB         142                   new_df = interpolate_df(
   929    551.9 MiB    -50.6 MiB         142                       df,
   930    551.9 MiB    -50.6 MiB         142                       field_name="voltage",
   931    551.9 MiB    -50.6 MiB         142                       field_range=v_hppc_step,
   932    551.9 MiB    -50.6 MiB         142                       columns=incl_columns,
   933    551.9 MiB    -47.5 MiB         142                       resolution=hppc_resolution,
   934                                                         )
   935                                                     else:
   936    551.7 MiB     -7.4 MiB          26                   new_df = interpolate_df(
   937    551.7 MiB     -0.4 MiB          26                       df,
   938    551.7 MiB     -0.4 MiB          26                       field_name="voltage",
   939    551.7 MiB     -0.4 MiB          26                       field_range=v_range,
   940    551.7 MiB     -0.4 MiB          26                       columns=incl_columns,
   941    551.8 MiB      2.2 MiB          26                       resolution=resolution,
   942                                                         )
   943                                         
   944    551.9 MiB    -49.6 MiB         168               new_df["cycle_index"] = cycle_index
   945    551.9 MiB    -59.4 MiB         168               new_df["cycle_type"] = diag_cycle_type[diag_cycles_at.index(cycle_index)]
   946    551.9 MiB    -59.4 MiB         168               new_df["step_index"] = step_index
   947    551.9 MiB    -59.4 MiB         168               new_df["step_index_counter"] = step_index_counter
   948    551.9 MiB    -59.4 MiB         168               new_df["step_type"] = diag_dict[cycle_index].index(step_index)
   949    551.9 MiB    -59.4 MiB         168               new_df.astype(
   950                                                         {
   951    551.9 MiB    -59.4 MiB         168                       "cycle_index": "int32",
   952    551.9 MiB    -59.4 MiB         168                       "cycle_type": "category",
   953    551.9 MiB    -59.4 MiB         168                       "step_index": "uint8",
   954    551.9 MiB    -59.4 MiB         168                       "step_index_counter": "int16",
   955    551.9 MiB    -58.8 MiB         168                       "step_type": "uint8",
   956                                                         }
   957                                                     )
   958                                                     new_df["discharge_dQdV"] = (
   959    551.9 MiB    -59.5 MiB         168                   new_df.discharge_capacity.diff() / new_df.voltage.diff()
   960                                                     )
   961                                                     new_df["charge_dQdV"] = (
   962    551.9 MiB    -59.5 MiB         168                   new_df.charge_capacity.diff() / new_df.voltage.diff()
   963                                                     )
   964    551.9 MiB    -59.5 MiB         168               all_dfs.append(new_df)
   965                                         
   966                                                 # Ignore the index to avoid issues with overlapping voltages
   967    557.7 MiB      5.8 MiB           1           result = pd.concat(all_dfs, ignore_index=True)
   968                                                 # Cycle_index gets a little weird about typing, so round it here
   969    558.1 MiB      0.3 MiB           1           result.cycle_index = result.cycle_index.round()
   970    555.1 MiB     -3.0 MiB           1           result = self._cast_dtypes(result, "diagnostic_interpolated")
   971                                         
   972    555.1 MiB      0.0 MiB           1           return result


SIZEOF: diagnostic_summary (): 3.289344 MB
SIZEOF: cycle_indices in interpolate_step (): 0.002184 MB
Interpolating discharge (2.5 - 4.2)V (1000 points): 100%|██████████| 231/231 [01:15<00:00,  3.08it/s]
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   512    463.8 MiB    463.8 MiB           1       @profile
   513                                             def interpolate_step(
   514                                                     self,
   515                                                     v_range,
   516                                                     resolution,
   517                                                     step_type="discharge",
   518                                                     reg_cycles=None,
   519                                                     axis="voltage",
   520                                                     desc=None
   521                                             ):
   522                                                 """
   523                                                 Gets interpolated cycles for the step specified, charge or discharge.
   524                                         
   525                                                 Args:
   526                                                     v_range ([Float, Float]): list of two floats that define
   527                                                         the voltage interpolation range endpoints.
   528                                                     resolution (int): resolution of interpolated data.
   529                                                     step_type (str): which step to interpolate i.e. 'charge' or 'discharge'
   530                                                     reg_cycles (list): list containing cycle indicies of regular cycles
   531                                                     axis (str): which column to use for interpolation
   532                                                     desc (str): Description to print to tqdm column.
   533                                         
   534                                                 Returns:
   535                                                     pandas.DataFrame: DataFrame corresponding to interpolated values.
   536                                                 """
   537                                         
   538    463.8 MiB      0.0 MiB           1           if not desc:
   539                                                     desc = \
   540    463.8 MiB      0.0 MiB           1                   f"Interpolating {step_type} ({v_range[0]} - {v_range[1]})V " \
   541                                                         f"({resolution} points)"
   542                                         
   543    463.8 MiB      0.0 MiB           1           if step_type == "discharge":
   544    463.8 MiB      0.0 MiB           1               step_filter = step_is_dchg
   545                                                 elif step_type == "charge":
   546                                                     step_filter = step_is_chg
   547                                                 else:
   548                                                     raise ValueError("{} is not a recognized step type")
   549                                                 incl_columns = [
   550    463.8 MiB      0.0 MiB           1               "test_time",
   551    463.8 MiB      0.0 MiB           1               "voltage",
   552    463.8 MiB      0.0 MiB           1               "current",
   553    463.8 MiB      0.0 MiB           1               "charge_capacity",
   554    463.8 MiB      0.0 MiB           1               "discharge_capacity",
   555    463.8 MiB      0.0 MiB           1               "charge_energy",
   556    463.8 MiB      0.0 MiB           1               "discharge_energy",
   557    463.8 MiB      0.0 MiB           1               "internal_resistance",
   558    463.8 MiB      0.0 MiB           1               "temperature",
   559                                                 ]
   560    463.8 MiB      0.0 MiB           1           all_dfs = []
   561    465.0 MiB      1.2 MiB           1           cycle_indices = self.raw_data.cycle_index.unique()
   562    465.0 MiB      0.0 MiB         251           cycle_indices = sorted([c for c in cycle_indices if c in reg_cycles])
   563                                         
   564                                         
   565    465.0 MiB      0.0 MiB           1           pm(cycle_indices, "cycle_indices in interpolate_step")
   566                                         
   567    465.0 MiB      0.0 MiB         232           for cycle_index in tqdm(cycle_indices, desc=desc):
   568                                                     # Use a cycle_index mask instead of a global groupby to save memory
   569                                                     new_df = (
   570    465.3 MiB  -2537.3 MiB         231                   self.raw_data.loc[self.raw_data["cycle_index"] == cycle_index]
   571    465.3 MiB  -2608.6 MiB         231                       .groupby("step_index")
   572    459.9 MiB  -2619.5 MiB         231                       .filter(step_filter)
   573                                                     )
   574    459.9 MiB   -372.6 MiB         231               if new_df.size == 0:
   575                                                         continue
   576                                         
   577    459.9 MiB   -372.6 MiB         231               if axis in ["charge_capacity", "discharge_capacity"]:
   578                                                         axis_range = [self.raw_data[axis].min(),
   579                                                                       self.raw_data[axis].max()]
   580                                                         new_df = interpolate_df(
   581                                                             new_df,
   582                                                             axis,
   583                                                             field_range=axis_range,
   584                                                             columns=incl_columns,
   585                                                             resolution=resolution,
   586                                                         )
   587    459.9 MiB   -372.6 MiB         231               elif axis == "test_time":
   588                                                         axis_range = [new_df[axis].min(), new_df[axis].max()]
   589                                                         new_df = interpolate_df(
   590                                                             new_df,
   591                                                             axis,
   592                                                             field_range=axis_range,
   593                                                             columns=incl_columns,
   594                                                             resolution=resolution,
   595                                                         )
   596    459.9 MiB   -372.6 MiB         231               elif axis == "voltage":
   597    459.9 MiB   -372.6 MiB         231                   new_df = interpolate_df(
   598    459.9 MiB   -372.6 MiB         231                       new_df,
   599    459.9 MiB   -372.6 MiB         231                       axis,
   600    459.9 MiB   -372.6 MiB         231                       field_range=v_range,
   601    459.9 MiB   -372.6 MiB         231                       columns=incl_columns,
   602    460.0 MiB   -366.9 MiB         231                       resolution=resolution,
   603                                                         )
   604                                                     else:
   605                                                         raise ValueError(f"Axis {axis} not a valid step interpolation axis.")
   606    460.0 MiB      0.0 MiB         231               new_df["cycle_index"] = cycle_index
   607    460.0 MiB      0.0 MiB         231               new_df["step_type"] = step_type
   608    460.0 MiB      0.0 MiB         231               new_df["step_type"] = new_df["step_type"].astype("category")
   609    460.0 MiB      0.0 MiB         231               all_dfs.append(new_df)
   610                                         
   611                                                 # Ignore the index to avoid issues with overlapping voltages
   612    474.5 MiB      9.5 MiB           1           result = pd.concat(all_dfs, ignore_index=True)
   613                                         
   614                                                 # Cycle_index gets a little weird about typing, so round it here
   615    475.3 MiB      0.8 MiB           1           result.cycle_index = result.cycle_index.round()
   616                                         
   617    475.3 MiB      0.0 MiB           1           return result


SIZEOF: cycle_indices in interpolate_step (): 0.002184 MB
Interpolating charge (2.5 - 4.2)V (1000 points): 100%|██████████| 231/231 [01:00<00:00,  3.81it/s]
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   512    463.7 MiB    463.7 MiB           1       @profile
   513                                             def interpolate_step(
   514                                                     self,
   515                                                     v_range,
   516                                                     resolution,
   517                                                     step_type="discharge",
   518                                                     reg_cycles=None,
   519                                                     axis="voltage",
   520                                                     desc=None
   521                                             ):
   522                                                 """
   523                                                 Gets interpolated cycles for the step specified, charge or discharge.
   524                                         
   525                                                 Args:
   526                                                     v_range ([Float, Float]): list of two floats that define
   527                                                         the voltage interpolation range endpoints.
   528                                                     resolution (int): resolution of interpolated data.
   529                                                     step_type (str): which step to interpolate i.e. 'charge' or 'discharge'
   530                                                     reg_cycles (list): list containing cycle indicies of regular cycles
   531                                                     axis (str): which column to use for interpolation
   532                                                     desc (str): Description to print to tqdm column.
   533                                         
   534                                                 Returns:
   535                                                     pandas.DataFrame: DataFrame corresponding to interpolated values.
   536                                                 """
   537                                         
   538    463.7 MiB      0.0 MiB           1           if not desc:
   539                                                     desc = \
   540    463.7 MiB      0.0 MiB           1                   f"Interpolating {step_type} ({v_range[0]} - {v_range[1]})V " \
   541                                                         f"({resolution} points)"
   542                                         
   543    463.7 MiB      0.0 MiB           1           if step_type == "discharge":
   544                                                     step_filter = step_is_dchg
   545    463.7 MiB      0.0 MiB           1           elif step_type == "charge":
   546    463.7 MiB      0.0 MiB           1               step_filter = step_is_chg
   547                                                 else:
   548                                                     raise ValueError("{} is not a recognized step type")
   549                                                 incl_columns = [
   550    463.7 MiB      0.0 MiB           1               "test_time",
   551    463.7 MiB      0.0 MiB           1               "voltage",
   552    463.7 MiB      0.0 MiB           1               "current",
   553    463.7 MiB      0.0 MiB           1               "charge_capacity",
   554    463.7 MiB      0.0 MiB           1               "discharge_capacity",
   555    463.7 MiB      0.0 MiB           1               "charge_energy",
   556    463.7 MiB      0.0 MiB           1               "discharge_energy",
   557    463.7 MiB      0.0 MiB           1               "internal_resistance",
   558    463.7 MiB      0.0 MiB           1               "temperature",
   559                                                 ]
   560    463.7 MiB      0.0 MiB           1           all_dfs = []
   561    468.8 MiB      5.1 MiB           1           cycle_indices = self.raw_data.cycle_index.unique()
   562    468.8 MiB      0.0 MiB         251           cycle_indices = sorted([c for c in cycle_indices if c in reg_cycles])
   563                                         
   564                                         
   565    468.8 MiB      0.0 MiB           1           pm(cycle_indices, "cycle_indices in interpolate_step")
   566                                         
   567    475.2 MiB      0.0 MiB         232           for cycle_index in tqdm(cycle_indices, desc=desc):
   568                                                     # Use a cycle_index mask instead of a global groupby to save memory
   569                                                     new_df = (
   570    475.2 MiB   -153.4 MiB         231                   self.raw_data.loc[self.raw_data["cycle_index"] == cycle_index]
   571    475.2 MiB   -218.9 MiB         231                       .groupby("step_index")
   572    475.2 MiB   -218.8 MiB         231                       .filter(step_filter)
   573                                                     )
   574    475.2 MiB   -229.4 MiB         231               if new_df.size == 0:
   575                                                         continue
   576                                         
   577    475.2 MiB   -229.4 MiB         231               if axis in ["charge_capacity", "discharge_capacity"]:
   578    475.2 MiB   -229.4 MiB         231                   axis_range = [self.raw_data[axis].min(),
   579    475.2 MiB   -229.4 MiB         231                                 self.raw_data[axis].max()]
   580    475.2 MiB   -229.4 MiB         231                   new_df = interpolate_df(
   581    475.2 MiB   -229.4 MiB         231                       new_df,
   582    475.2 MiB   -229.4 MiB         231                       axis,
   583    475.2 MiB   -229.4 MiB         231                       field_range=axis_range,
   584    475.2 MiB   -229.4 MiB         231                       columns=incl_columns,
   585    475.2 MiB   -223.8 MiB         231                       resolution=resolution,
   586                                                         )
   587                                                     elif axis == "test_time":
   588                                                         axis_range = [new_df[axis].min(), new_df[axis].max()]
   589                                                         new_df = interpolate_df(
   590                                                             new_df,
   591                                                             axis,
   592                                                             field_range=axis_range,
   593                                                             columns=incl_columns,
   594                                                             resolution=resolution,
   595                                                         )
   596                                                     elif axis == "voltage":
   597                                                         new_df = interpolate_df(
   598                                                             new_df,
   599                                                             axis,
   600                                                             field_range=v_range,
   601                                                             columns=incl_columns,
   602                                                             resolution=resolution,
   603                                                         )
   604                                                     else:
   605                                                         raise ValueError(f"Axis {axis} not a valid step interpolation axis.")
   606    475.2 MiB      0.0 MiB         231               new_df["cycle_index"] = cycle_index
   607    475.2 MiB      0.0 MiB         231               new_df["step_type"] = step_type
   608    475.2 MiB      0.0 MiB         231               new_df["step_type"] = new_df["step_type"].astype("category")
   609    475.2 MiB      0.0 MiB         231               all_dfs.append(new_df)
   610                                         
   611                                                 # Ignore the index to avoid issues with overlapping voltages
   612    494.4 MiB     19.2 MiB           1           result = pd.concat(all_dfs, ignore_index=True)
   613                                         
   614                                                 # Cycle_index gets a little weird about typing, so round it here
   615    495.2 MiB      0.9 MiB           1           result.cycle_index = result.cycle_index.round()
   616                                         
   617    495.2 MiB      0.0 MiB           1           return result


Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   619    546.6 MiB    546.6 MiB           1       @profile
   620                                             def interpolate_cycles(
   621                                                     self,
   622                                                     v_range=None,
   623                                                     resolution=1000,
   624                                                     diagnostic_available=None,
   625                                                     charge_axis='charge_capacity',
   626                                                     discharge_axis='voltage'
   627                                             ):
   628                                                 """
   629                                                 Gets interpolated cycles for both charge and discharge steps.
   630                                         
   631                                                 Args:
   632                                                     v_range ([float, float]): list of two floats that define
   633                                                         the voltage interpolation range endpoints.
   634                                                     resolution (int): resolution of interpolated data.
   635                                                     diagnostic_available (dict): dictionary containing information about
   636                                                         location of diagnostic cycles
   637                                                     charge_axis (str): column to use for interpolation for charge
   638                                                     discharge_axis (str): column to use for interpolation for discharge
   639                                         
   640                                                 Returns:
   641                                                     (pandas.DataFrame): DataFrame corresponding to interpolated values.
   642                                                 """
   643    546.6 MiB      0.0 MiB           1           if diagnostic_available:
   644    546.6 MiB      0.0 MiB           1               diag_cycles = list(
   645    546.6 MiB      0.0 MiB           1                   itertools.chain.from_iterable(
   646                                                             [
   647    546.6 MiB      0.0 MiB           7                           list(range(i, i + diagnostic_available["length"]))
   648    546.6 MiB      0.0 MiB           5                           for i in diagnostic_available["diagnostic_starts_at"]
   649    546.6 MiB      0.0 MiB           4                           if i <= self.raw_data.cycle_index.max()
   650                                                             ]
   651                                                         )
   652                                                     )
   653                                                     reg_cycles = [
   654    550.7 MiB      4.1 MiB         251                   i for i in self.raw_data.cycle_index.unique() if
   655    550.7 MiB      0.0 MiB         248                   i not in diag_cycles
   656                                                     ]
   657                                                 else:
   658                                                     reg_cycles = [i for i in self.raw_data.cycle_index.unique()]
   659                                         
   660    550.7 MiB      0.0 MiB           1           v_range = v_range or [2.8, 3.5]
   661                                         
   662                                                 # If any regular cycle contains a waveform step, interpolate on test_time.
   663                                         
   664    555.8 MiB      5.1 MiB           1           if self.raw_data[self.raw_data.cycle_index.isin(reg_cycles)]. \
   665    555.8 MiB      0.0 MiB           1                   groupby(["cycle_index", "step_index"]). \
   666    468.2 MiB    -87.6 MiB           1                   apply(step_is_waveform_dchg).any():
   667                                                     discharge_axis = 'test_time'
   668                                         
   669    469.3 MiB      1.1 MiB           1           if self.raw_data[self.raw_data.cycle_index.isin(reg_cycles)]. \
   670    469.3 MiB      0.0 MiB           1                   groupby(["cycle_index", "step_index"]). \
   671    463.8 MiB     -5.5 MiB           1                   apply(step_is_waveform_chg).any():
   672                                                     charge_axis = 'test_time'
   673                                         
   674    463.8 MiB      0.0 MiB           1           interpolated_discharge = self.interpolate_step(
   675    463.8 MiB      0.0 MiB           1               v_range,
   676    463.8 MiB      0.0 MiB           1               resolution,
   677    463.8 MiB      0.0 MiB           1               step_type="discharge",
   678    463.8 MiB      0.0 MiB           1               reg_cycles=reg_cycles,
   679    463.7 MiB     -0.0 MiB           1               axis=discharge_axis,
   680                                                 )
   681    463.7 MiB      0.0 MiB           1           interpolated_charge = self.interpolate_step(
   682    463.7 MiB      0.0 MiB           1               v_range,
   683    463.7 MiB      0.0 MiB           1               resolution,
   684    463.7 MiB      0.0 MiB           1               step_type="charge",
   685    463.7 MiB      0.0 MiB           1               reg_cycles=reg_cycles,
   686    483.4 MiB     19.7 MiB           1               axis=charge_axis,
   687                                                 )
   688    483.4 MiB      0.0 MiB           1           result = pd.concat(
   689    534.4 MiB     50.9 MiB           1               [interpolated_discharge, interpolated_charge], ignore_index=True
   690                                                 )
   691                                         
   692    589.9 MiB     55.6 MiB           1           return self._cast_dtypes(result, "cycles_interpolated")
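
The regular/diagnostic cycle split at lines 643-658 above (repeated in `summarize_cycles`) tests membership against a list. Below is a minimal sketch of the same split using a set. The function name `split_regular_cycles` is hypothetical and not part of beep; the dictionary keys `length` and `diagnostic_starts_at` are taken from the profiled code. With only ~250 cycles this is unlikely to change memory use, so treat it as a small cleanup rather than a fix for the numbers above.

```python
import itertools


def split_regular_cycles(cycle_indices, diagnostic_available):
    """Return the sorted cycle indices that are not part of a diagnostic block.

    Hypothetical helper, not part of beep.
    """
    cycle_indices = list(cycle_indices)
    if not diagnostic_available or not cycle_indices:
        return sorted(cycle_indices)

    last_cycle = max(cycle_indices)
    # Same expansion as in the profiled code, but collected into a set so the
    # membership test below is a hash lookup instead of a list scan.
    diag_cycles = set(
        itertools.chain.from_iterable(
            range(start, start + diagnostic_available["length"])
            for start in diagnostic_available["diagnostic_starts_at"]
            if start <= last_cycle
        )
    )
    return sorted(c for c in cycle_indices if c not in diag_cycles)


example = split_regular_cycles(
    range(10), {"length": 2, "diagnostic_starts_at": [0, 5, 20]}
)
print(example)  # [2, 3, 4, 7, 8, 9]
```

Unlike the profiled code, this sketch sorts the result immediately; `interpolate_step` sorts its own filtered list anyway.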


SIZEOF: structured_data (): 20.790389 MB
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   695    546.7 MiB    546.7 MiB           1       @profile
   696                                             # equivalent of legacy get_summary
   697                                             def summarize_cycles(
   698                                                     self,
   699                                                     diagnostic_available=False,
   700                                                     nominal_capacity=1.1,
   701                                                     full_fast_charge=0.8,
   702                                                     cycle_complete_discharge_ratio=0.97,
   703                                                     cycle_complete_vmin=3.3,
   704                                                     cycle_complete_vmax=3.3,
   705                                                     error_threshold=1e6
   706                                             ):
   707                                                 """
   708                                                 Gets summary statistics for data according to cycle number. Summary data
   709                                                 must be float or int type for compatibility with other methods
   710                                         
   711                                                 Args:
   712                                                     diagnostic_available (dict): dictionary with diagnostic_types
   713                                                     nominal_capacity (float): nominal capacity for summary stats
   714                                                     full_fast_charge (float): full fast charge for summary stats
   715                                                     cycle_complete_discharge_ratio (float): expected ratio
   716                                                         discharge/charge at the end of any complete cycle
   717                                                     cycle_complete_vmin (float): expected voltage minimum achieved
   718                                                         in any complete cycle
   719                                                     cycle_complete_vmax (float): expected voltage maximum achieved
   720                                                         in any complete cycle
   721                                                     error_threshold (float): threshold to consider the summary value
   722                                                         an error (applied only to specific columns that should reset
   723                                                         each cycle)
   724                                         
   725                                                 Returns:
   726                                                     (pandas.DataFrame): summary statistics by cycle.
   727                                         
   728                                                 """
   729                                                 # Filter out only regular cycles for summary stats. Diagnostic summary computed separately
   730    546.7 MiB      0.0 MiB           1           if diagnostic_available:
   731    546.7 MiB      0.0 MiB           1               diag_cycles = list(
   732    546.7 MiB      0.0 MiB           1                   itertools.chain.from_iterable(
   733                                                             [
   734    546.7 MiB      0.0 MiB           7                           list(range(i, i + diagnostic_available["length"]))
   735    546.7 MiB      0.0 MiB           5                           for i in diagnostic_available["diagnostic_starts_at"]
   736    546.7 MiB      0.0 MiB           4                           if i <= self.raw_data.cycle_index.max()
   737                                                             ]
   738                                                         )
   739                                                     )
   740                                                     reg_cycles_at = [
   741    546.7 MiB      0.0 MiB         251                   i for i in self.raw_data.cycle_index.unique() if
   742    546.7 MiB      0.0 MiB         248                   i not in diag_cycles
   743                                                     ]
   744                                                 else:
   745                                                     reg_cycles_at = [i for i in self.raw_data.cycle_index.unique()]
   746                                         
   747    551.3 MiB      4.7 MiB           1           summary = self.raw_data.groupby("cycle_index").agg(self._aggregation)
   748                                         
   749                                                 # pd.set_option('display.max_rows', 500)
   750                                                 # pd.set_option('display.max_columns', 500)
   751                                                 # pd.set_option('display.width', 1000)
   752                                         
   753    551.3 MiB      0.0 MiB           1           summary.columns = self._summary_cols
   754                                         
   755    551.3 MiB      0.0 MiB           1           summary = summary[summary.index.isin(reg_cycles_at)]
   756                                                 summary["energy_efficiency"] = (
   757    551.3 MiB      0.0 MiB           1                   summary["discharge_energy"] / summary["charge_energy"]
   758                                                 )
   759                                                 summary.loc[
   760                                                     ~np.isfinite(summary["energy_efficiency"]), "energy_efficiency"
   761    551.3 MiB      0.0 MiB           1           ] = np.NaN
   762                                                 # This code is designed to remove erroneous energy values
   763    551.3 MiB      0.0 MiB           3           for col in ["discharge_energy", "charge_energy"]:
   764    551.3 MiB      0.0 MiB           2               summary.loc[summary[col].abs() > error_threshold, col] = np.NaN
   765    551.3 MiB      0.0 MiB           1           summary["charge_throughput"] = summary.charge_capacity.cumsum()
   766    551.3 MiB      0.0 MiB           1           summary["energy_throughput"] = summary.charge_energy.cumsum()
   767                                         
   768                                                 # This method for computing charge start and end times implicitly
   769                                                 # assumes that a cycle starts with a charge step and is then followed
   770                                                 # by discharge step.
   771                                                 charge_start_time = \
   772    551.3 MiB      0.0 MiB           1               self.raw_data.groupby("cycle_index", as_index=False)[
   773    551.3 MiB      0.0 MiB           1                   "date_time_iso"
   774    552.9 MiB      1.5 MiB           1               ].agg("first")
   775                                         
   776                                                 charge_finish_time = (
   777    552.9 MiB      0.0 MiB           1               self.raw_data[
   778    625.1 MiB     72.2 MiB           1                   self.raw_data.charge_capacity >= nominal_capacity * full_fast_charge]
   779    625.1 MiB      0.0 MiB           1               .groupby("cycle_index", as_index=False)["date_time_iso"]
   780    583.2 MiB    -41.9 MiB           1               .agg("first")
   781                                                 )
   782                                         
   783                                                 # Left merge, since some cells might not reach desired levels of
   784                                                 # charge_capacity and will have NaN for charge duration
   785    583.2 MiB      0.0 MiB           1           merged = charge_start_time.merge(
   786    583.3 MiB      0.1 MiB           1               charge_finish_time, on="cycle_index", how="left"
   787                                                 )
   788                                         
   789                                                 # Charge duration stored in seconds - note that date_time_iso is only ~1sec resolution
   790    583.3 MiB      0.0 MiB           1           time_diff = np.subtract(
   791    583.3 MiB      0.0 MiB           1               pd.to_datetime(merged.date_time_iso_y, utc=True, errors="coerce"),
   792    583.3 MiB      0.0 MiB           1               pd.to_datetime(merged.date_time_iso_x, errors="coerce"),
   793                                                 )
   794    583.3 MiB      0.0 MiB           1           summary["charge_duration"] = np.round(
   795    583.4 MiB      0.0 MiB           1               time_diff / np.timedelta64(1, "s"), 2)
   796                                         
   797                                                 # Compute time since start of cycle in minutes. This comes handy
   798                                                 # for featurizing time-temperature integral
   799    583.4 MiB      0.0 MiB           1           self.raw_data["time_since_cycle_start"] = pd.to_datetime(
   800    583.4 MiB      0.0 MiB           1               self.raw_data["date_time_iso"]
   801    583.4 MiB      0.0 MiB           1           ) - pd.to_datetime(
   802    583.4 MiB      0.0 MiB           1               self.raw_data.groupby("cycle_index")["date_time_iso"].transform(
   803    529.2 MiB    -54.2 MiB           1                   "first")
   804                                                 )
   805    529.2 MiB      0.0 MiB           1           self.raw_data["time_since_cycle_start"] = (self.raw_data[
   806    529.2 MiB      0.0 MiB           1                                                          "time_since_cycle_start"] / np.timedelta64(
   807    528.8 MiB     -0.4 MiB           1               1, "s")) / 60
   808                                         
   809                                                 # Group by cycle index and integrate time-temperature
   810                                                 # using a lambda function.
   811    528.8 MiB      0.0 MiB           1           if "temperature" in self.raw_data.columns:
   812    528.8 MiB      0.0 MiB           1               summary["time_temperature_integrated"] = self.raw_data.groupby(
   813    528.8 MiB      0.0 MiB           1                   "cycle_index").apply(
   814    761.7 MiB    136.2 MiB         497                   lambda g: integrate.trapz(g.temperature, x=g.time_since_cycle_start)
   815                                                     )
   816                                         
   817                                                 # Drop the time since cycle start column
   818    647.2 MiB   -114.6 MiB           1           self.raw_data.drop(columns=["time_since_cycle_start"])
   819                                         
   820                                                 # Determine if any of the cycles has been paused
   821    647.2 MiB      0.0 MiB           1           summary["paused"] = self.raw_data.groupby("cycle_index").apply(
   822    490.8 MiB   -156.4 MiB           1               get_max_paused_over_threshold)
   823                                         
   824    490.8 MiB      0.0 MiB           1           summary = self._cast_dtypes(summary, "summary")
   825                                         
   826    490.8 MiB      0.0 MiB           1           last_voltage = self.raw_data.loc[
   827    490.8 MiB      0.0 MiB           1               self.raw_data["cycle_index"] == self.raw_data["cycle_index"].max()
   828    490.8 MiB      0.0 MiB           1               ]["voltage"]
   829                                                 if (
   830    490.8 MiB      0.0 MiB           1                   (last_voltage.min() < cycle_complete_vmin)
   831    490.8 MiB      0.0 MiB           1                   and (last_voltage.max() > cycle_complete_vmax)
   832                                                         and (
   833                                                         (summary.iloc[[-1]])["discharge_capacity"].iloc[0]
   834                                                         > cycle_complete_discharge_ratio
   835                                                         * (summary.iloc[[-1]])["charge_capacity"].iloc[0]
   836                                                             )
   837                                                 ):
   838                                                     return summary
   839                                                 else:
   840    490.8 MiB      0.0 MiB           1               return summary.iloc[:-1]
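
One detail visible in the profile above: line 818 calls `self.raw_data.drop(columns=["time_since_cycle_start"])` without using the return value. `DataFrame.drop` returns a new frame by default, so the temporary column most likely stays on `raw_data` after `summarize_cycles` returns. The negative increment reported on that line probably reflects temporaries from the preceding `groupby().apply()` being freed, not the column being removed. A minimal sketch with toy data (the column names mirror the profiled code, the values are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"cycle_index": [1, 1, 2], "time_since_cycle_start": [0.0, 1.5, 0.0]}
)

# drop() returns a new DataFrame; without assignment the original is unchanged.
df.drop(columns=["time_since_cycle_start"])
still_present = "time_since_cycle_start" in df.columns

# Reassigning (or passing inplace=True) actually removes the column.
df = df.drop(columns=["time_since_cycle_start"])
removed = "time_since_cycle_start" not in df.columns

print(still_present, removed)  # True True
```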


SIZEOF: structured_summary (): 0.039124 MB
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   123    299.1 MiB    299.1 MiB           1               def wrapper(*args, **kwargs):
   124    299.1 MiB      0.0 MiB           1                   if args[0]._is_legacy:
   125                                                             raise ValueError(
   126                                                                 f"{args[0].__class__.__name__} is deserialized from a legacy file! Operation not allowed."
   127                                                             )
   128                                                         else:
   129    490.8 MiB    191.6 MiB           1                       return func(*args, **kwargs)

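Source code used to produce the profile above (the `@profile` decorators were added to the methods in `beep/structure/base.py`):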

from beep.structure.maccor import MaccorDatapath
import pandas as pd
import os
from beep.tests.constants import TEST_FILE_DIR

os.environ["BEEP_PROCESSING_DIR"] = TEST_FILE_DIR

maccor_file_w_parameters = os.path.join(
    TEST_FILE_DIR, "PreDiag_000287_000128.092"
)

md = MaccorDatapath.from_file(maccor_file_w_parameters)


print("MEM: GETTING STRUCTURING PARAMETERS")
(
    v_range,
    resolution,
    nominal_capacity,
    full_fast_charge,
    diagnostic_available,
) = md.determine_structuring_parameters()

print("MEM: Structuring with parameters")
structured_data = md.structure(
    v_range=v_range,
    resolution=resolution,
    nominal_capacity=nominal_capacity,
    full_fast_charge=full_fast_charge,
    diagnostic_available=diagnostic_available,
)
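
Since memory_profiler's per-line numbers can be unreliable around loops, a second measurement from the standard library's `tracemalloc` may be a useful cross-check. The sketch below is an assumption about how one could do that, not something run for this issue; `measure_peak` and `build_list` are hypothetical names, and `build_list` is only a stand-in workload. Note that `tracemalloc` reports Python-level allocations rather than process RSS, so its numbers will not match the `Mem usage` column directly.

```python
import tracemalloc


def measure_peak(func, *args, **kwargs):
    """Run func and return (result, peak_mib) of traced allocations."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def build_list(n):
    # Stand-in workload; swap in a structuring call to profile it instead.
    return [float(i) for i in range(n)]


data, peak_mib = measure_peak(build_list, 200_000)
print(f"peak traced allocation: {peak_mib:.1f} MiB")
```

To apply it to the script above, one could call something like `measure_peak(md.structure, v_range=v_range, ...)` in place of the direct `md.structure(...)` call.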
