Comments (2)
We should also look into profiling the code for slow/memory-hogging spots, in case interpolation isn't the main culprit
I did some memory profiling of the structuring methods. Take the numbers with a grain of salt: according to a Stack Overflow answer, memory_profiler can be inaccurate inside loops because of how the OS allocates memory in chunks.
Problem areas
- interpolate_step
- interpolate_cycles
- interpolate_diagnostic_cycles
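Since the line-by-line numbers may be unreliable inside loops, a per-call peak measurement can serve as a rough cross-check. The following is a minimal sketch that swaps in the standard-library `tracemalloc` module instead of memory_profiler; `peak_allocated_mib` and `build_buffers` are hypothetical names used only for illustration, and the workload is a stand-in, not a beep method.

```python
import tracemalloc


def peak_allocated_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB allocated while it ran).

    Hypothetical helper: wraps a single call in tracemalloc tracing.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def build_buffers(n_chunks, chunk_bytes):
    # Stand-in workload: allocate n_chunks byte buffers in a loop.
    return [bytearray(chunk_bytes) for _ in range(n_chunks)]


buffers, peak_mib = peak_allocated_mib(build_buffers, 8, 1_000_000)
print(f"peak during call: {peak_mib:.2f} MiB")
```

Note that `tracemalloc` only sees allocations made through Python's memory allocators, so its figures will not match the process-level numbers in the memory_profiler tables; it is useful mainly for ranking calls by how much they allocate.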
Full output of memory profiling while structuring:
Raw file size on disk is 140MB. Size of raw loaded dataframe is 179MB.
SIZEOF: raw_data (): 179.946804 MB
MEM: GETTING STRUCTURING PARAMETERS
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
123 434.6 MiB 434.6 MiB 1 def wrapper(*args, **kwargs):
124 434.6 MiB 0.0 MiB 1 if args[0]._is_legacy:
125 raise ValueError(
126 f"{args[0].__class__.__name__} is deserialized from a legacy file! Operation not allowed."
127 )
128 else:
129 299.1 MiB -135.5 MiB 1 return func(*args, **kwargs)
MEM: Structuring with parameters
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
974 299.1 MiB 299.1 MiB 1 @profile
975 def summarize_diagnostic(self, diagnostic_available):
976 """
977 Gets summary statistics for data according to location of
978 diagnostic cycles in the data
979
980 Args:
981 diagnostic_available (dict): dictionary with diagnostic_types
982 as list, 'length' of the diagnostic in cycles and location
983 of the diagnostic by cycle index
984
985 Returns:
986 (DataFrame) of summary statistics by cycle
987
988 """
989
990 299.1 MiB 0.0 MiB 1 max_cycle = self.raw_data.cycle_index.max()
991 starts_at = [
992 299.1 MiB 0.0 MiB 7 i for i in diagnostic_available["diagnostic_starts_at"] if i <= max_cycle
993 ]
994 299.1 MiB 0.0 MiB 1 diag_cycles_at = list(
995 299.1 MiB 0.0 MiB 1 itertools.chain.from_iterable(
996 299.1 MiB 0.0 MiB 7 [list(range(i, i + diagnostic_available["length"])) for i in starts_at]
997 )
998 )
999 305.4 MiB 6.3 MiB 1 diag_summary = self.raw_data.groupby("cycle_index").agg(self._diag_aggregation)
1000
1001 305.4 MiB 0.0 MiB 1 diag_summary.columns = self._diag_summary_cols
1002
1003 305.4 MiB 0.0 MiB 1 diag_summary = diag_summary[diag_summary.index.isin(diag_cycles_at)]
1004
1005 diag_summary["coulombic_efficiency"] = (
1006 305.4 MiB 0.0 MiB 1 diag_summary["discharge_capacity"] / diag_summary["charge_capacity"]
1007 )
1008 305.4 MiB 0.0 MiB 1 diag_summary["paused"] = self.raw_data.groupby("cycle_index").apply(
1009 427.6 MiB 122.2 MiB 1 get_max_paused_over_threshold
1010 )
1011
1012 427.6 MiB 0.0 MiB 1 diag_summary.reset_index(drop=True, inplace=True)
1013
1014 427.6 MiB 0.0 MiB 1 diag_summary["cycle_type"] = pd.Series(
1015 427.6 MiB 0.0 MiB 1 diagnostic_available["cycle_type"] * len(starts_at)
1016 )
1017
1018 427.6 MiB 0.0 MiB 1 diag_summary = self._cast_dtypes(diag_summary, "diagnostic_summary")
1019
1020 427.6 MiB 0.0 MiB 1 return diag_summary
SIZEOF: diagnostic_summary (): 0.003069 MB
██████████| 168/168 [00:45<00:00, 3.73it/s]
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
842 427.6 MiB 427.6 MiB 1 @profile
843 # equivalent of get_interpolated_diagnostic_cycles
844 def interpolate_diagnostic_cycles(
845 self, diagnostic_available, resolution=1000, v_resolution=0.0005
846 ):
847 """
848 Interpolates data according to location and type of diagnostic
849 cycles in the data
850
851 Args:
852 diagnostic_available (dict): dictionary with diagnostic_types
853 as list, 'length' of the diagnostic in cycles and location
854 of the diagnostic
855 resolution (int): resolution of interpolation
856 v_resolution (float): voltage delta to set for range based interpolation
857
858 Returns:
859 (pd.DataFrame) of interpolated diagnostic steps by step and cycle
860
861 """
862 # Get the project name and the parameter file for the diagnostic
863 427.6 MiB 0.0 MiB 1 project_name_list = parameters_lookup.get_project_sequence(self.paths["raw"])
864 427.6 MiB 0.0 MiB 1 diag_path = os.path.join(MODULE_DIR, "procedure_templates")
865 427.6 MiB 0.0 MiB 1 v_range = parameters_lookup.get_diagnostic_parameters(
866 427.6 MiB 0.0 MiB 1 diagnostic_available, diag_path, project_name_list[0]
867 )
868
869 # Determine the cycles and types of the diagnostic cycles
870 427.6 MiB 0.0 MiB 1 max_cycle = self.raw_data.cycle_index.max()
871 starts_at = [
872 427.6 MiB 0.0 MiB 7 i for i in diagnostic_available["diagnostic_starts_at"] if i <= max_cycle
873 ]
874 427.6 MiB 0.0 MiB 1 diag_cycles_at = list(
875 427.6 MiB 0.0 MiB 1 itertools.chain.from_iterable(
876 427.6 MiB 0.0 MiB 7 [range(i, i + diagnostic_available["length"]) for i in starts_at]
877 )
878 )
879 # Duplicate cycle type list end to end for each starting index
880 427.6 MiB 0.0 MiB 1 diag_cycle_type = diagnostic_available["cycle_type"] * len(starts_at)
881 427.6 MiB 0.0 MiB 1 if not len(diag_cycles_at) == len(diag_cycle_type):
882 errmsg = (
883 "Diagnostic cycles, {}, and diagnostic cycle types, "
884 "{}, are unequal lengths".format(diag_cycles_at, diag_cycle_type)
885 )
886 raise ValueError(errmsg)
887
888 435.4 MiB 7.8 MiB 1 diag_data = self.raw_data[self.raw_data["cycle_index"].isin(diag_cycles_at)]
889
890 # Counter to ensure non-contiguous repeats of step_index
891 # within same cycle_index are grouped separately
892 439.2 MiB 3.7 MiB 1 diag_data.loc[:, "step_index_counter"] = 0
893
894 469.9 MiB -1064.2 MiB 21 for cycle_index in diag_cycles_at:
895 458.1 MiB -990.3 MiB 20 indices = diag_data.loc[diag_data.cycle_index == cycle_index].index
896 469.9 MiB -821.1 MiB 20 step_index_list = diag_data.step_index.loc[indices]
897 469.9 MiB -1057.6 MiB 20 diag_data.loc[indices, "step_index_counter"] = step_index_list.ne(
898 469.9 MiB -1064.0 MiB 20 step_index_list.shift()
899 ).cumsum()
900
901 406.6 MiB -63.3 MiB 1 group = diag_data.groupby(["cycle_index", "step_index", "step_index_counter"])
902 incl_columns = [
903 406.6 MiB 0.0 MiB 1 "current",
904 406.6 MiB 0.0 MiB 1 "charge_capacity",
905 406.6 MiB 0.0 MiB 1 "discharge_capacity",
906 406.6 MiB 0.0 MiB 1 "charge_energy",
907 406.6 MiB 0.0 MiB 1 "discharge_energy",
908 406.6 MiB 0.0 MiB 1 "internal_resistance",
909 406.6 MiB 0.0 MiB 1 "temperature",
910 406.6 MiB 0.0 MiB 1 "test_time",
911 ]
912
913 406.6 MiB 0.0 MiB 1 diag_dict = {}
914 424.0 MiB -67.3 MiB 18 for cycle in diag_data.cycle_index.unique():
915 424.0 MiB -52.6 MiB 17 diag_dict.update({cycle: None})
916 424.0 MiB -47.7 MiB 17 steps = diag_data[diag_data.cycle_index == cycle].step_index.unique()
917 424.0 MiB -66.2 MiB 17 diag_dict[cycle] = list(steps)
918
919 410.4 MiB -13.6 MiB 1 all_dfs = []
920 551.9 MiB 75.3 MiB 169 for (cycle_index, step_index, step_index_counter), df in tqdm(group):
921 551.9 MiB -59.7 MiB 168 if len(df) < 2:
922 continue
923 551.9 MiB -59.7 MiB 168 if diag_cycle_type[diag_cycles_at.index(cycle_index)] == "hppc":
924 551.9 MiB -52.3 MiB 142 v_hppc_step = [df.voltage.min(), df.voltage.max()]
925 551.9 MiB -50.6 MiB 142 hppc_resolution = int(
926 551.9 MiB -50.6 MiB 142 (df.voltage.max() - df.voltage.min()) / v_resolution
927 )
928 551.9 MiB -50.6 MiB 142 new_df = interpolate_df(
929 551.9 MiB -50.6 MiB 142 df,
930 551.9 MiB -50.6 MiB 142 field_name="voltage",
931 551.9 MiB -50.6 MiB 142 field_range=v_hppc_step,
932 551.9 MiB -50.6 MiB 142 columns=incl_columns,
933 551.9 MiB -47.5 MiB 142 resolution=hppc_resolution,
934 )
935 else:
936 551.7 MiB -7.4 MiB 26 new_df = interpolate_df(
937 551.7 MiB -0.4 MiB 26 df,
938 551.7 MiB -0.4 MiB 26 field_name="voltage",
939 551.7 MiB -0.4 MiB 26 field_range=v_range,
940 551.7 MiB -0.4 MiB 26 columns=incl_columns,
941 551.8 MiB 2.2 MiB 26 resolution=resolution,
942 )
943
944 551.9 MiB -49.6 MiB 168 new_df["cycle_index"] = cycle_index
945 551.9 MiB -59.4 MiB 168 new_df["cycle_type"] = diag_cycle_type[diag_cycles_at.index(cycle_index)]
946 551.9 MiB -59.4 MiB 168 new_df["step_index"] = step_index
947 551.9 MiB -59.4 MiB 168 new_df["step_index_counter"] = step_index_counter
948 551.9 MiB -59.4 MiB 168 new_df["step_type"] = diag_dict[cycle_index].index(step_index)
949 551.9 MiB -59.4 MiB 168 new_df.astype(
950 {
951 551.9 MiB -59.4 MiB 168 "cycle_index": "int32",
952 551.9 MiB -59.4 MiB 168 "cycle_type": "category",
953 551.9 MiB -59.4 MiB 168 "step_index": "uint8",
954 551.9 MiB -59.4 MiB 168 "step_index_counter": "int16",
955 551.9 MiB -58.8 MiB 168 "step_type": "uint8",
956 }
957 )
958 new_df["discharge_dQdV"] = (
959 551.9 MiB -59.5 MiB 168 new_df.discharge_capacity.diff() / new_df.voltage.diff()
960 )
961 new_df["charge_dQdV"] = (
962 551.9 MiB -59.5 MiB 168 new_df.charge_capacity.diff() / new_df.voltage.diff()
963 )
964 551.9 MiB -59.5 MiB 168 all_dfs.append(new_df)
965
966 # Ignore the index to avoid issues with overlapping voltages
967 557.7 MiB 5.8 MiB 1 result = pd.concat(all_dfs, ignore_index=True)
968 # Cycle_index gets a little weird about typing, so round it here
969 558.1 MiB 0.3 MiB 1 result.cycle_index = result.cycle_index.round()
970 555.1 MiB -3.0 MiB 1 result = self._cast_dtypes(result, "diagnostic_interpolated")
971
972 555.1 MiB 0.0 MiB 1 return result
SIZEOF: diagnostic_summary (): 3.289344 MB
SIZEOF: cycle_indices in interpolate_step (): 0.002184 MB
Interpolating discharge (2.5 - 4.2)V (1000 points): 100%|██████████| 231/231 [01:15<00:00, 3.08it/s]
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
512 463.8 MiB 463.8 MiB 1 @profile
513 def interpolate_step(
514 self,
515 v_range,
516 resolution,
517 step_type="discharge",
518 reg_cycles=None,
519 axis="voltage",
520 desc=None
521 ):
522 """
523 Gets interpolated cycles for the step specified, charge or discharge.
524
525 Args:
526 v_range ([Float, Float]): list of two floats that define
527 the voltage interpolation range endpoints.
528 resolution (int): resolution of interpolated data.
529 step_type (str): which step to interpolate i.e. 'charge' or 'discharge'
530 reg_cycles (list): list containing cycle indicies of regular cycles
531 axis (str): which column to use for interpolation
532 desc (str): Description to print to tqdm column.
533
534 Returns:
535 pandas.DataFrame: DataFrame corresponding to interpolated values.
536 """
537
538 463.8 MiB 0.0 MiB 1 if not desc:
539 desc = \
540 463.8 MiB 0.0 MiB 1 f"Interpolating {step_type} ({v_range[0]} - {v_range[1]})V " \
541 f"({resolution} points)"
542
543 463.8 MiB 0.0 MiB 1 if step_type == "discharge":
544 463.8 MiB 0.0 MiB 1 step_filter = step_is_dchg
545 elif step_type == "charge":
546 step_filter = step_is_chg
547 else:
548 raise ValueError("{} is not a recognized step type")
549 incl_columns = [
550 463.8 MiB 0.0 MiB 1 "test_time",
551 463.8 MiB 0.0 MiB 1 "voltage",
552 463.8 MiB 0.0 MiB 1 "current",
553 463.8 MiB 0.0 MiB 1 "charge_capacity",
554 463.8 MiB 0.0 MiB 1 "discharge_capacity",
555 463.8 MiB 0.0 MiB 1 "charge_energy",
556 463.8 MiB 0.0 MiB 1 "discharge_energy",
557 463.8 MiB 0.0 MiB 1 "internal_resistance",
558 463.8 MiB 0.0 MiB 1 "temperature",
559 ]
560 463.8 MiB 0.0 MiB 1 all_dfs = []
561 465.0 MiB 1.2 MiB 1 cycle_indices = self.raw_data.cycle_index.unique()
562 465.0 MiB 0.0 MiB 251 cycle_indices = sorted([c for c in cycle_indices if c in reg_cycles])
563
564
565 465.0 MiB 0.0 MiB 1 pm(cycle_indices, "cycle_indices in interpolate_step")
566
567 465.0 MiB 0.0 MiB 232 for cycle_index in tqdm(cycle_indices, desc=desc):
568 # Use a cycle_index mask instead of a global groupby to save memory
569 new_df = (
570 465.3 MiB -2537.3 MiB 231 self.raw_data.loc[self.raw_data["cycle_index"] == cycle_index]
571 465.3 MiB -2608.6 MiB 231 .groupby("step_index")
572 459.9 MiB -2619.5 MiB 231 .filter(step_filter)
573 )
574 459.9 MiB -372.6 MiB 231 if new_df.size == 0:
575 continue
576
577 459.9 MiB -372.6 MiB 231 if axis in ["charge_capacity", "discharge_capacity"]:
578 axis_range = [self.raw_data[axis].min(),
579 self.raw_data[axis].max()]
580 new_df = interpolate_df(
581 new_df,
582 axis,
583 field_range=axis_range,
584 columns=incl_columns,
585 resolution=resolution,
586 )
587 459.9 MiB -372.6 MiB 231 elif axis == "test_time":
588 axis_range = [new_df[axis].min(), new_df[axis].max()]
589 new_df = interpolate_df(
590 new_df,
591 axis,
592 field_range=axis_range,
593 columns=incl_columns,
594 resolution=resolution,
595 )
596 459.9 MiB -372.6 MiB 231 elif axis == "voltage":
597 459.9 MiB -372.6 MiB 231 new_df = interpolate_df(
598 459.9 MiB -372.6 MiB 231 new_df,
599 459.9 MiB -372.6 MiB 231 axis,
600 459.9 MiB -372.6 MiB 231 field_range=v_range,
601 459.9 MiB -372.6 MiB 231 columns=incl_columns,
602 460.0 MiB -366.9 MiB 231 resolution=resolution,
603 )
604 else:
605 raise ValueError(f"Axis {axis} not a valid step interpolation axis.")
606 460.0 MiB 0.0 MiB 231 new_df["cycle_index"] = cycle_index
607 460.0 MiB 0.0 MiB 231 new_df["step_type"] = step_type
608 460.0 MiB 0.0 MiB 231 new_df["step_type"] = new_df["step_type"].astype("category")
609 460.0 MiB 0.0 MiB 231 all_dfs.append(new_df)
610
611 # Ignore the index to avoid issues with overlapping voltages
612 474.5 MiB 9.5 MiB 1 result = pd.concat(all_dfs, ignore_index=True)
613
614 # Cycle_index gets a little weird about typing, so round it here
615 475.3 MiB 0.8 MiB 1 result.cycle_index = result.cycle_index.round()
616
617 475.3 MiB 0.0 MiB 1 return result
SIZEOF: cycle_indices in interpolate_step (): 0.002184 MB
Interpolating charge (2.5 - 4.2)V (1000 points): 100%|██████████| 231/231 [01:00<00:00, 3.81it/s]
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
512 463.7 MiB 463.7 MiB 1 @profile
513 def interpolate_step(
514 self,
515 v_range,
516 resolution,
517 step_type="discharge",
518 reg_cycles=None,
519 axis="voltage",
520 desc=None
521 ):
522 """
523 Gets interpolated cycles for the step specified, charge or discharge.
524
525 Args:
526 v_range ([Float, Float]): list of two floats that define
527 the voltage interpolation range endpoints.
528 resolution (int): resolution of interpolated data.
529 step_type (str): which step to interpolate i.e. 'charge' or 'discharge'
530 reg_cycles (list): list containing cycle indicies of regular cycles
531 axis (str): which column to use for interpolation
532 desc (str): Description to print to tqdm column.
533
534 Returns:
535 pandas.DataFrame: DataFrame corresponding to interpolated values.
536 """
537
538 463.7 MiB 0.0 MiB 1 if not desc:
539 desc = \
540 463.7 MiB 0.0 MiB 1 f"Interpolating {step_type} ({v_range[0]} - {v_range[1]})V " \
541 f"({resolution} points)"
542
543 463.7 MiB 0.0 MiB 1 if step_type == "discharge":
544 step_filter = step_is_dchg
545 463.7 MiB 0.0 MiB 1 elif step_type == "charge":
546 463.7 MiB 0.0 MiB 1 step_filter = step_is_chg
547 else:
548 raise ValueError("{} is not a recognized step type")
549 incl_columns = [
550 463.7 MiB 0.0 MiB 1 "test_time",
551 463.7 MiB 0.0 MiB 1 "voltage",
552 463.7 MiB 0.0 MiB 1 "current",
553 463.7 MiB 0.0 MiB 1 "charge_capacity",
554 463.7 MiB 0.0 MiB 1 "discharge_capacity",
555 463.7 MiB 0.0 MiB 1 "charge_energy",
556 463.7 MiB 0.0 MiB 1 "discharge_energy",
557 463.7 MiB 0.0 MiB 1 "internal_resistance",
558 463.7 MiB 0.0 MiB 1 "temperature",
559 ]
560 463.7 MiB 0.0 MiB 1 all_dfs = []
561 468.8 MiB 5.1 MiB 1 cycle_indices = self.raw_data.cycle_index.unique()
562 468.8 MiB 0.0 MiB 251 cycle_indices = sorted([c for c in cycle_indices if c in reg_cycles])
563
564
565 468.8 MiB 0.0 MiB 1 pm(cycle_indices, "cycle_indices in interpolate_step")
566
567 475.2 MiB 0.0 MiB 232 for cycle_index in tqdm(cycle_indices, desc=desc):
568 # Use a cycle_index mask instead of a global groupby to save memory
569 new_df = (
570 475.2 MiB -153.4 MiB 231 self.raw_data.loc[self.raw_data["cycle_index"] == cycle_index]
571 475.2 MiB -218.9 MiB 231 .groupby("step_index")
572 475.2 MiB -218.8 MiB 231 .filter(step_filter)
573 )
574 475.2 MiB -229.4 MiB 231 if new_df.size == 0:
575 continue
576
577 475.2 MiB -229.4 MiB 231 if axis in ["charge_capacity", "discharge_capacity"]:
578 475.2 MiB -229.4 MiB 231 axis_range = [self.raw_data[axis].min(),
579 475.2 MiB -229.4 MiB 231 self.raw_data[axis].max()]
580 475.2 MiB -229.4 MiB 231 new_df = interpolate_df(
581 475.2 MiB -229.4 MiB 231 new_df,
582 475.2 MiB -229.4 MiB 231 axis,
583 475.2 MiB -229.4 MiB 231 field_range=axis_range,
584 475.2 MiB -229.4 MiB 231 columns=incl_columns,
585 475.2 MiB -223.8 MiB 231 resolution=resolution,
586 )
587 elif axis == "test_time":
588 axis_range = [new_df[axis].min(), new_df[axis].max()]
589 new_df = interpolate_df(
590 new_df,
591 axis,
592 field_range=axis_range,
593 columns=incl_columns,
594 resolution=resolution,
595 )
596 elif axis == "voltage":
597 new_df = interpolate_df(
598 new_df,
599 axis,
600 field_range=v_range,
601 columns=incl_columns,
602 resolution=resolution,
603 )
604 else:
605 raise ValueError(f"Axis {axis} not a valid step interpolation axis.")
606 475.2 MiB 0.0 MiB 231 new_df["cycle_index"] = cycle_index
607 475.2 MiB 0.0 MiB 231 new_df["step_type"] = step_type
608 475.2 MiB 0.0 MiB 231 new_df["step_type"] = new_df["step_type"].astype("category")
609 475.2 MiB 0.0 MiB 231 all_dfs.append(new_df)
610
611 # Ignore the index to avoid issues with overlapping voltages
612 494.4 MiB 19.2 MiB 1 result = pd.concat(all_dfs, ignore_index=True)
613
614 # Cycle_index gets a little weird about typing, so round it here
615 495.2 MiB 0.9 MiB 1 result.cycle_index = result.cycle_index.round()
616
617 495.2 MiB 0.0 MiB 1 return result
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
619 546.6 MiB 546.6 MiB 1 @profile
620 def interpolate_cycles(
621 self,
622 v_range=None,
623 resolution=1000,
624 diagnostic_available=None,
625 charge_axis='charge_capacity',
626 discharge_axis='voltage'
627 ):
628 """
629 Gets interpolated cycles for both charge and discharge steps.
630
631 Args:
632 v_range ([float, float]): list of two floats that define
633 the voltage interpolation range endpoints.
634 resolution (int): resolution of interpolated data.
635 diagnostic_available (dict): dictionary containing information about
636 location of diagnostic cycles
637 charge_axis (str): column to use for interpolation for charge
638 discharge_axis (str): column to use for interpolation for discharge
639
640 Returns:
641 (pandas.DataFrame): DataFrame corresponding to interpolated values.
642 """
643 546.6 MiB 0.0 MiB 1 if diagnostic_available:
644 546.6 MiB 0.0 MiB 1 diag_cycles = list(
645 546.6 MiB 0.0 MiB 1 itertools.chain.from_iterable(
646 [
647 546.6 MiB 0.0 MiB 7 list(range(i, i + diagnostic_available["length"]))
648 546.6 MiB 0.0 MiB 5 for i in diagnostic_available["diagnostic_starts_at"]
649 546.6 MiB 0.0 MiB 4 if i <= self.raw_data.cycle_index.max()
650 ]
651 )
652 )
653 reg_cycles = [
654 550.7 MiB 4.1 MiB 251 i for i in self.raw_data.cycle_index.unique() if
655 550.7 MiB 0.0 MiB 248 i not in diag_cycles
656 ]
657 else:
658 reg_cycles = [i for i in self.raw_data.cycle_index.unique()]
659
660 550.7 MiB 0.0 MiB 1 v_range = v_range or [2.8, 3.5]
661
662 # If any regular cycle contains a waveform step, interpolate on test_time.
663
664 555.8 MiB 5.1 MiB 1 if self.raw_data[self.raw_data.cycle_index.isin(reg_cycles)]. \
665 555.8 MiB 0.0 MiB 1 groupby(["cycle_index", "step_index"]). \
666 468.2 MiB -87.6 MiB 1 apply(step_is_waveform_dchg).any():
667 discharge_axis = 'test_time'
668
669 469.3 MiB 1.1 MiB 1 if self.raw_data[self.raw_data.cycle_index.isin(reg_cycles)]. \
670 469.3 MiB 0.0 MiB 1 groupby(["cycle_index", "step_index"]). \
671 463.8 MiB -5.5 MiB 1 apply(step_is_waveform_chg).any():
672 charge_axis = 'test_time'
673
674 463.8 MiB 0.0 MiB 1 interpolated_discharge = self.interpolate_step(
675 463.8 MiB 0.0 MiB 1 v_range,
676 463.8 MiB 0.0 MiB 1 resolution,
677 463.8 MiB 0.0 MiB 1 step_type="discharge",
678 463.8 MiB 0.0 MiB 1 reg_cycles=reg_cycles,
679 463.7 MiB -0.0 MiB 1 axis=discharge_axis,
680 )
681 463.7 MiB 0.0 MiB 1 interpolated_charge = self.interpolate_step(
682 463.7 MiB 0.0 MiB 1 v_range,
683 463.7 MiB 0.0 MiB 1 resolution,
684 463.7 MiB 0.0 MiB 1 step_type="charge",
685 463.7 MiB 0.0 MiB 1 reg_cycles=reg_cycles,
686 483.4 MiB 19.7 MiB 1 axis=charge_axis,
687 )
688 483.4 MiB 0.0 MiB 1 result = pd.concat(
689 534.4 MiB 50.9 MiB 1 [interpolated_discharge, interpolated_charge], ignore_index=True
690 )
691
692 589.9 MiB 55.6 MiB 1 return self._cast_dtypes(result, "cycles_interpolated")
SIZEOF: structured_data (): 20.790389 MB
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
695 546.7 MiB 546.7 MiB 1 @profile
696 # equivalent of legacy get_summary
697 def summarize_cycles(
698 self,
699 diagnostic_available=False,
700 nominal_capacity=1.1,
701 full_fast_charge=0.8,
702 cycle_complete_discharge_ratio=0.97,
703 cycle_complete_vmin=3.3,
704 cycle_complete_vmax=3.3,
705 error_threshold=1e6
706 ):
707 """
708 Gets summary statistics for data according to cycle number. Summary data
709 must be float or int type for compatibility with other methods
710
711 Args:
712 diagnostic_available (dict): dictionary with diagnostic_types
713 nominal_capacity (float): nominal capacity for summary stats
714 full_fast_charge (float): full fast charge for summary stats
715 cycle_complete_discharge_ratio (float): expected ratio
716 discharge/charge at the end of any complete cycle
717 cycle_complete_vmin (float): expected voltage minimum achieved
718 in any complete cycle
719 cycle_complete_vmax (float): expected voltage maximum achieved
720 in any complete cycle
721 error_threshold (float): threshold to consider the summary value
722 an error (applied only to specific columns that should reset
723 each cycle)
724
725 Returns:
726 (pandas.DataFrame): summary statistics by cycle.
727
728 """
729 # Filter out only regular cycles for summary stats. Diagnostic summary computed separately
730 546.7 MiB 0.0 MiB 1 if diagnostic_available:
731 546.7 MiB 0.0 MiB 1 diag_cycles = list(
732 546.7 MiB 0.0 MiB 1 itertools.chain.from_iterable(
733 [
734 546.7 MiB 0.0 MiB 7 list(range(i, i + diagnostic_available["length"]))
735 546.7 MiB 0.0 MiB 5 for i in diagnostic_available["diagnostic_starts_at"]
736 546.7 MiB 0.0 MiB 4 if i <= self.raw_data.cycle_index.max()
737 ]
738 )
739 )
740 reg_cycles_at = [
741 546.7 MiB 0.0 MiB 251 i for i in self.raw_data.cycle_index.unique() if
742 546.7 MiB 0.0 MiB 248 i not in diag_cycles
743 ]
744 else:
745 reg_cycles_at = [i for i in self.raw_data.cycle_index.unique()]
746
747 551.3 MiB 4.7 MiB 1 summary = self.raw_data.groupby("cycle_index").agg(self._aggregation)
748
749 # pd.set_option('display.max_rows', 500)
750 # pd.set_option('display.max_columns', 500)
751 # pd.set_option('display.width', 1000)
752
753 551.3 MiB 0.0 MiB 1 summary.columns = self._summary_cols
754
755 551.3 MiB 0.0 MiB 1 summary = summary[summary.index.isin(reg_cycles_at)]
756 summary["energy_efficiency"] = (
757 551.3 MiB 0.0 MiB 1 summary["discharge_energy"] / summary["charge_energy"]
758 )
759 summary.loc[
760 ~np.isfinite(summary["energy_efficiency"]), "energy_efficiency"
761 551.3 MiB 0.0 MiB 1 ] = np.NaN
762 # This code is designed to remove erroneous energy values
763 551.3 MiB 0.0 MiB 3 for col in ["discharge_energy", "charge_energy"]:
764 551.3 MiB 0.0 MiB 2 summary.loc[summary[col].abs() > error_threshold, col] = np.NaN
765 551.3 MiB 0.0 MiB 1 summary["charge_throughput"] = summary.charge_capacity.cumsum()
766 551.3 MiB 0.0 MiB 1 summary["energy_throughput"] = summary.charge_energy.cumsum()
767
768 # This method for computing charge start and end times implicitly
769 # assumes that a cycle starts with a charge step and is then followed
770 # by discharge step.
771 charge_start_time = \
772 551.3 MiB 0.0 MiB 1 self.raw_data.groupby("cycle_index", as_index=False)[
773 551.3 MiB 0.0 MiB 1 "date_time_iso"
774 552.9 MiB 1.5 MiB 1 ].agg("first")
775
776 charge_finish_time = (
777 552.9 MiB 0.0 MiB 1 self.raw_data[
778 625.1 MiB 72.2 MiB 1 self.raw_data.charge_capacity >= nominal_capacity * full_fast_charge]
779 625.1 MiB 0.0 MiB 1 .groupby("cycle_index", as_index=False)["date_time_iso"]
780 583.2 MiB -41.9 MiB 1 .agg("first")
781 )
782
783 # Left merge, since some cells might not reach desired levels of
784 # charge_capacity and will have NaN for charge duration
785 583.2 MiB 0.0 MiB 1 merged = charge_start_time.merge(
786 583.3 MiB 0.1 MiB 1 charge_finish_time, on="cycle_index", how="left"
787 )
788
789 # Charge duration stored in seconds - note that date_time_iso is only ~1sec resolution
790 583.3 MiB 0.0 MiB 1 time_diff = np.subtract(
791 583.3 MiB 0.0 MiB 1 pd.to_datetime(merged.date_time_iso_y, utc=True, errors="coerce"),
792 583.3 MiB 0.0 MiB 1 pd.to_datetime(merged.date_time_iso_x, errors="coerce"),
793 )
794 583.3 MiB 0.0 MiB 1 summary["charge_duration"] = np.round(
795 583.4 MiB 0.0 MiB 1 time_diff / np.timedelta64(1, "s"), 2)
796
797 # Compute time since start of cycle in minutes. This comes handy
798 # for featurizing time-temperature integral
799 583.4 MiB 0.0 MiB 1 self.raw_data["time_since_cycle_start"] = pd.to_datetime(
800 583.4 MiB 0.0 MiB 1 self.raw_data["date_time_iso"]
801 583.4 MiB 0.0 MiB 1 ) - pd.to_datetime(
802 583.4 MiB 0.0 MiB 1 self.raw_data.groupby("cycle_index")["date_time_iso"].transform(
803 529.2 MiB -54.2 MiB 1 "first")
804 )
805 529.2 MiB 0.0 MiB 1 self.raw_data["time_since_cycle_start"] = (self.raw_data[
806 529.2 MiB 0.0 MiB 1 "time_since_cycle_start"] / np.timedelta64(
807 528.8 MiB -0.4 MiB 1 1, "s")) / 60
808
809 # Group by cycle index and integrate time-temperature
810 # using a lambda function.
811 528.8 MiB 0.0 MiB 1 if "temperature" in self.raw_data.columns:
812 528.8 MiB 0.0 MiB 1 summary["time_temperature_integrated"] = self.raw_data.groupby(
813 528.8 MiB 0.0 MiB 1 "cycle_index").apply(
814 761.7 MiB 136.2 MiB 497 lambda g: integrate.trapz(g.temperature, x=g.time_since_cycle_start)
815 )
816
817 # Drop the time since cycle start column
818 647.2 MiB -114.6 MiB 1 self.raw_data.drop(columns=["time_since_cycle_start"])
819
820 # Determine if any of the cycles has been paused
821 647.2 MiB 0.0 MiB 1 summary["paused"] = self.raw_data.groupby("cycle_index").apply(
822 490.8 MiB -156.4 MiB 1 get_max_paused_over_threshold)
823
824 490.8 MiB 0.0 MiB 1 summary = self._cast_dtypes(summary, "summary")
825
826 490.8 MiB 0.0 MiB 1 last_voltage = self.raw_data.loc[
827 490.8 MiB 0.0 MiB 1 self.raw_data["cycle_index"] == self.raw_data["cycle_index"].max()
828 490.8 MiB 0.0 MiB 1 ]["voltage"]
829 if (
830 490.8 MiB 0.0 MiB 1 (last_voltage.min() < cycle_complete_vmin)
831 490.8 MiB 0.0 MiB 1 and (last_voltage.max() > cycle_complete_vmax)
832 and (
833 (summary.iloc[[-1]])["discharge_capacity"].iloc[0]
834 > cycle_complete_discharge_ratio
835 * (summary.iloc[[-1]])["charge_capacity"].iloc[0]
836 )
837 ):
838 return summary
839 else:
840 490.8 MiB 0.0 MiB 1 return summary.iloc[:-1]
SIZEOF: structured_summary (): 0.039124 MB
Filename: /Users/ardunn/alex/tri/code/beep/beep/structure/base.py
Line # Mem usage Increment Occurences Line Contents
============================================================
123 299.1 MiB 299.1 MiB 1 def wrapper(*args, **kwargs):
124 299.1 MiB 0.0 MiB 1 if args[0]._is_legacy:
125 raise ValueError(
126 f"{args[0].__class__.__name__} is deserialized from a legacy file! Operation not allowed."
127 )
128 else:
129 490.8 MiB 191.6 MiB 1 return func(*args, **kwargs)
Source code for running:
import os

import pandas as pd

from beep.structure.maccor import MaccorDatapath
from beep.tests.constants import TEST_FILE_DIR

os.environ["BEEP_PROCESSING_DIR"] = TEST_FILE_DIR

maccor_file_w_parameters = os.path.join(
    TEST_FILE_DIR, "PreDiag_000287_000128.092"
)
md = MaccorDatapath.from_file(maccor_file_w_parameters)

print("MEM: GETTING STRUCTURING PARAMETERS")
(
    v_range,
    resolution,
    nominal_capacity,
    full_fast_charge,
    diagnostic_available,
) = md.determine_structuring_parameters()

print("MEM: Structuring with parameters")
structured_data = md.structure(
    v_range=v_range,
    resolution=resolution,
    nominal_capacity=nominal_capacity,
    full_fast_charge=full_fast_charge,
    diagnostic_available=diagnostic_available,
)
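The SIZEOF lines in the output come from a `pm(obj, name)` call that appears in the profiled code but whose definition is not shown here. As a rough illustration only, a helper of that shape might look like the sketch below; the body is an assumption (it uses `sys.getsizeof`), not the code that produced the numbers above.

```python
import sys


def pm(obj, name):
    """Hypothetical stand-in for the size-printing helper.

    Prints and returns the size of obj in MB (1 MB = 1e6 bytes here).
    """
    # sys.getsizeof is shallow for plain containers such as lists:
    # it counts the container itself, not the objects it refers to.
    size_mb = sys.getsizeof(obj) / 1e6
    print(f"SIZEOF: {name}: {size_mb:.6f} MB")
    return size_mb


# Example with a list about the size of cycle_indices (231 entries).
cycle_indices = sorted(range(231))
size_mb = pm(cycle_indices, "cycle_indices in interpolate_step")
```

For a DataFrame, `df.memory_usage(deep=True).sum()` is a more explicit way to get the in-memory size, since it also counts the contents of object-dtype columns.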