openeemeter / eemeter

An open source python package for implementing and developing standard methods for calculating normalized metered energy consumption and avoided energy use.

Home Page: http://eemeter.openee.io/

License: Apache License 2.0

Python 99.81% Dockerfile 0.06% Shell 0.13%

energy efficiency energy-data energy-efficiency building-energy

eemeter's Issues

Flake 8 failures

The command pytest --flake8 currently fails because of a bunch of bare except clauses (E722). We should either ignore this type of error where it's not as relevant or fix them in cases where the error flags a real code smell.
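As an illustration of the "fix" option (not code taken from eemeter), the sketch below replaces a bare except with a clause that names the exceptions it expects. `parse_usage` is a hypothetical helper invented for this example.

```python
def parse_usage(raw):
    """Return raw as a float, or None if it cannot be parsed."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # Narrow clause: only the failures float() is expected to raise are
        # swallowed, so KeyboardInterrupt and SystemExit still propagate.
        return None


print(parse_usage("1.56"))
print(parse_usage("n/a"))
```

Where a bare except is intentional, flake8 also accepts a per-line `# noqa: E722` comment, which covers the "ignore" option.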

Constant (average) counterfactual with eemeter daily matrix on certain datasets only.

Hi,

I'm using eemeter for a research project comparing reliability of metered savings from hourly and daily gas consumption data. I've been able to generate varying hourly counterfactuals for a set of publicly available data; however, I'm having trouble generating a varying counterfactual for daily consumption with some datasets. Instead of giving me a counterfactual that varies with temperature, I'm getting the average of the baseline meter period.

Have you ever come across an issue like this? Is this just a data issue, or is there an issue with the model?

Output from Dataset 1 (LCL-June2015v2_126)

Summary statistics from baseline period included for reference.

            value
count  385.000000
mean     1.341932
std      0.924915
min      0.000000
25%      0.708000
50%      1.299000
75%      1.908000
max      7.219000


                          reporting_observed  counterfactual_usage
2013-01-21 00:00:00+00:00               1.560              1.642287
2013-01-22 00:00:00+00:00               3.207              1.628237
2013-01-23 00:00:00+00:00               1.796              1.598183
2013-01-24 00:00:00+00:00               2.400              1.610889
2013-01-25 00:00:00+00:00               1.746              1.610034
2013-01-26 00:00:00+00:00               2.336              1.497270
2013-01-27 00:00:00+00:00               2.314              1.408208
2013-01-28 00:00:00+00:00               1.914              1.459275
2013-01-29 00:00:00+00:00               1.635              1.304730
2013-01-30 00:00:00+00:00               0.000              1.352743

Output from Dataset 2 (LCL-June2015v2_0)

Summary statistics from baseline period included for reference.

            value
count  385.000000
mean     5.912592
std      2.664848
min      0.000000
25%      5.144000
50%      6.102000
75%      6.922000
max     23.399000

                          reporting_observed  counterfactual_usage
2013-01-21 00:00:00+00:00               6.083              5.912592
2013-01-22 00:00:00+00:00               5.715              5.912592
2013-01-23 00:00:00+00:00               6.080              5.912592
2013-01-24 00:00:00+00:00               6.491              5.912592
2013-01-25 00:00:00+00:00               4.954              5.912592
2013-01-26 00:00:00+00:00               8.271              5.912592
2013-01-27 00:00:00+00:00               6.022              5.912592
2013-01-28 00:00:00+00:00               5.305              5.912592
2013-01-29 00:00:00+00:00               4.802              5.912592
2013-01-30 00:00:00+00:00               0.000              5.912592
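One possible explanation, stated as an assumption rather than a diagnosis: the CalTRACK daily method considers an intercept-only candidate model alongside the degree-day candidates, and an intercept-only least-squares fit predicts the baseline mean for every day. The Dataset 2 counterfactual (5.912592) matches the baseline mean exactly, which is consistent with that. The sketch below uses made-up numbers, not the LCL data, and its selection rule is a simplification chosen for illustration.

```python
from statistics import fmean

# Hypothetical daily usage and heating degree days (not the LCL data).
usage = [6.1, 5.7, 6.0, 6.5, 5.0, 8.3, 6.0]
hdd = [12.0, 14.0, 9.0, 11.0, 13.0, 10.0, 12.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


intercept, slope = fit_line(hdd, usage)

# Illustrative rule (an assumption): keep the degree-day term only if its
# slope is positive; otherwise fall back to an intercept-only model.
if slope <= 0:
    intercept, slope = fmean(usage), 0.0

counterfactual = [intercept + slope * h for h in hdd]
print(counterfactual)
```

With these numbers usage does not rise with heating degree days, so every predicted value collapses to the mean of `usage`.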

Mock NOAA FTP responses.

No more Internet needed for the test suite! Mocked NOAA FTP Responses (and set up a utility to update the fixture data). Details on this in #127, ht @hangtwenty
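The general technique can be sketched with the standard library alone. This is not the fixture code from #127; `fetch_file`, the host name, and the path are hypothetical, and the fake response is a short byte string standing in for real fixture data.

```python
import ftplib
import io
from unittest import mock


def fetch_file(host, path):
    """Download one file over FTP and return its bytes."""
    buffer = io.BytesIO()
    ftp = ftplib.FTP(host)
    ftp.login()
    ftp.retrbinary("RETR " + path, buffer.write)
    ftp.quit()
    return buffer.getvalue()


def fake_retrbinary(command, callback):
    # Feed canned bytes to the caller instead of touching the network.
    callback(b"fixture bytes")


with mock.patch("ftplib.FTP") as fake_ftp_class:
    fake_ftp_class.return_value.retrbinary.side_effect = fake_retrbinary
    payload = fetch_file("ftp.example.invalid", "/pub/data/sample.gz")

print(payload)
```

Because `fetch_file` looks up `ftplib.FTP` at call time, patching that one name is enough to keep the test offline.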

pip install eemeter failing to install. Is statsmodels the culprit or just a symptom?

An issue that Devan from Kilowatt Analytics ran into trying to pip install eemeter:

I'm guessing this issue is due to the eemeter install not completing without errors. Having just repeated the process, I realize that 'pip install eemeter' is failing due to an issue with lxml. I had to manually apt-get install lxml, as I was previously warned that it was not present when trying to install eemeter. Maybe there's an lxml version dependency that is causing an issue. Regardless, I cannot seem to install eemeter following the instructions on either GitHub or RTD.

Problem executing tutorial hourly example

Hi,

trying to reproduce the hourly example found here, I get the following error:

Traceback (most recent call last):
  File "test_hourly.py", line 64, in <module>
    metered_savings_dataframe, error_bands = eemeter.metered_savings(
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/derivatives.py", line 226, in metered_savings
    model_prediction = baseline_model.predict(
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/caltrack/hourly.py", line 191, in predict
    return self.model.predict(prediction_index, temperature_data, **kwargs)
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/segmentation.py", line 221, in predict
    prediction = segment_model.predict(segmented_data) * segmented_data.weight
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/segmentation.py", line 97, in predict
    prediction = design_matrix_granular.dot(parameters).rename(
TypeError: rename() got an unexpected keyword argument 'columns'

The problem seems to be that rename is called with the keyword argument "columns" on a pandas Series instead of a pandas DataFrame.
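A minimal sketch of that situation, assuming pandas is installed. The column names and numbers are invented, and this is one possible way around the error, not the maintainers' patch: `DataFrame.dot` with a Series argument returns a Series, and `Series.to_frame` turns it into a one-column DataFrame with the desired name.

```python
import pandas as pd

# Hypothetical design matrix and fitted parameters.
design_matrix = pd.DataFrame(
    {"intercept": [1.0, 1.0], "temperature": [50.0, 60.0]}
)
parameters = pd.Series({"intercept": 2.0, "temperature": 0.5})

# DataFrame.dot(Series) yields a Series, so rename(columns=...) does not apply.
product = design_matrix.dot(parameters)

# Name the result while converting it to a DataFrame.
prediction = product.to_frame("predicted_usage")
print(prediction)
```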

Report installed package versions

pandas==1.1.3 
eemeter==2.10.0

Minimal example

import eemeter

meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-hourly")
)

# the dates of an analysis "blackout" period during which a project was performed.
blackout_start_date = sample_metadata["blackout_start_date"]
blackout_end_date = sample_metadata["blackout_end_date"]

# get meter data suitable for fitting a baseline model
baseline_meter_data, warnings = eemeter.get_baseline_data(
    meter_data, end=blackout_start_date, max_days=365
)

# create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data, temperature_data,
    )
)

# build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'three_month_weighted'
)

# assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# assign temperatures to bins
temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        temperature_bins,
    )
)

# build a CalTRACK hourly model
baseline_model = eemeter.fit_caltrack_hourly_model(
    segmented_design_matrices,
    occupancy_lookup,
    temperature_bins,
)

# get a year of reporting period data
reporting_meter_data, warnings = eemeter.get_reporting_data(
    meter_data, start=blackout_end_date, max_days=365
)

# compute metered savings for the year of the reporting period we've selected
metered_savings_dataframe, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data,
    temperature_data, with_disaggregated=True
)

Thank you!

"OutOfBoundsDatetime: Out of bounds nanosecond timestamp" error.

An error is produced when using eemeter with the most recent version of pandas.

>>> meter_data_daily, temperature_data_daily, metadata_daily = eemeter.load_sample('il-electricity-cdd-hdd-daily')
>>> meter_data_billing, temperature_data_billing, metadata_billing = eemeter.load_sample('il-electricity-cdd-hdd-billing_monthly')
>>> baseline_end_date = metadata_billing['blackout_start_date']
>>> baseline_meter_data_daily, baseline_warnings_daily = eemeter.get_baseline_data(meter_data_daily, end=baseline_end_date, max_days=365)

After calling eemeter.get_baseline_data(), the error is: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43.

Package versions

Python==3.9.1
eemeter==3.1.0
pandas==1.3.2

Reverting pandas version back to 1.2.1 has fixed the issue.
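The timestamp in the message is not arbitrary: it coincides with the lower bound of a signed 64-bit count of nanoseconds since the Unix epoch, which is the range pandas documents for nanosecond timestamps. The sketch below derives that bound with the standard library only. It says nothing about why eemeter reaches the bound under pandas 1.3.2.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN = -(2 ** 63)  # smallest signed 64-bit integer, read as nanoseconds

# datetime resolves microseconds only, so convert with floor division.
lower_bound = EPOCH + timedelta(microseconds=INT64_MIN // 1000)
print(lower_bound.isoformat())
```

Any value that is pushed to or below this instant cannot be represented as a nanosecond timestamp, which is the condition the error reports.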

Daily Model Overpredictions - eemeter version 4.0

In this specific zone, the heating system has reached its maximum capacity, resulting in a plateau in energy usage. However, the regression used for prediction has not been split appropriately. It’s important to note that this issue is not about the heating balance point; rather, it pertains to properly handling the regression when HVAC is operating at its maximum capacity.

Docs still link to etl/datastore

Hi.

Just exploring the eemeter project and noticed that these links are incorrect or out of date:

https://eemeter.readthedocs.io/en/latest/etl_installation.html

ETL project seems to no longer be on Github

https://eemeter.readthedocs.io/en/latest/datastore_index.html

datastore project seems to no longer be on Github

Docker build fails

Re testing: running both docker build . and docker-compose build returns an error relating to line 11 in Dockerfile: executor failed running [/bin/sh -c set -ex && pipenv install --system --deploy --dev]: exit code: 1. Has this issue been noticed before? Is there a step-by-step testing tutorial that could be shared specific to eemeter?

This occurs for me on Windows 11 when running the eemeter tests on an unchanged copy of eemeter cloned directly from this repo. Any help much appreciated.

Best way to contribute?

Hello,

I was wondering what the best way to contribute to this project might be. Most of my experience is with front-end stuff with React/Angular (although I am by no means a master of it) and I was wondering if I could perhaps contribute towards a basic GUI/charting system for the project.

Error produced when using metered_savings() on hourly data.

Hi, I am trying to run an example in the Tutorial (http://eemeter.openee.io/tutorial.html) for the hourly data (Quickstart for CalTRACK Hourly). Please advise. Thank you!

The error occurs when running the following code:
metered_savings_dataframe, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data,
    temperature_data, with_disaggregated=True
)

Traceback (most recent call last):

File "", line 3, in temperature_data, with_disaggregated=True
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\derivatives.py", line 227, in metered_savings prediction_index, temperature_data, **predict_kwargs
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\caltrack\hourly.py", line 191, in predict
return self.model.predict(prediction_index, temperature_data, **kwargs)
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\segmentation.py", line 221, in predict
prediction = segment_model.predict(segmented_data) * segmented_data.weight
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\segmentation.py", line 98, in predict
columns={0: "predicted_usage"}
TypeError: rename() got an unexpected keyword argument 'columns'

Version string of eemeter, pandas, and their dependencies:
eemeter version : 2.9.2
pandas version : 1.0.1
python : 3.7.6

get_baseline_data does not partition data (using daily data set).

Report installed package versions

eemeter==2.7.2
pandas==0.23.4
scipy==1.3.0
numpy==1.16.4

Describe the bug
The get_baseline_data function with option max_days = 365 returns the input dataframe, not a version subsetted to 365 days.

Include a short, self-contained Python snippet reproducing the problem. You can
format the code nicely by using GitHub Flavored Markdown:

In [1]: import eemeter

In [2]: import pandas as pd

In [3]: meter_data, temperature_data, metadata = \
   ...:     eemeter.load_sample('il-electricity-cdd-hdd-daily')

In [5]: data = eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)

In [6]: baseline_data, warnings = eemeter.get_baseline_data(data, max_days=365)

In [7]: baseline_data.equals(data)
Out[7]: True

In [8]: eemeter.get_version()
Out[8]: '2.7.2'

In [9]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.3.6
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

In [10]: import scipy

In [12]: scipy.__version__
Out[12]: '1.3.0'

In [13]: import numpy

In [15]: numpy.__version__
Out[15]: '1.16.4'

In [16]: len(baseline_data)
Out[16]: 810

In [17]: len(data)
Out[17]: 810

Expected behavior

Expect a dataframe of length 365 days over only the first 365 days of data.
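The expected subset can be described independently of eemeter. The sketch below works on plain (timestamp, value) pairs rather than a DataFrame; `first_n_days` is a hypothetical helper and the start date is arbitrary. The 810 daily rows mirror the length reported above.

```python
from datetime import datetime, timedelta, timezone


def first_n_days(rows, n_days):
    """Keep rows whose timestamp falls within n_days of the earliest one."""
    if not rows:
        return []
    start = min(timestamp for timestamp, _ in rows)
    cutoff = start + timedelta(days=n_days)
    return [(timestamp, value) for timestamp, value in rows if timestamp < cutoff]


start = datetime(2015, 11, 22, tzinfo=timezone.utc)  # arbitrary start date
rows = [(start + timedelta(days=i), float(i)) for i in range(810)]

baseline = first_n_days(rows, 365)
print(len(rows), len(baseline))
```

The same comparison against a cutoff timestamp can be applied to a DataFrame index as a manual workaround.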

ETL using Singer

Just a suggestion.

It might be worth looking at the https://www.singer.io/ project. It is sponsored by Stitch Data and is a nice open source, generalised ETL approach that could be adopted, or even recommended, when integrating eemeter at organisations.

Caltrack usage per day predict

I am trying to use eemeter.caltrack_usage_per_day_predict with my caltrack daily model to predict usage over a provided prediction index. I get an error:
'DatetimeIndex' object has no attribute 'index'

The error seems to be associated with the compute_temperature_features step. When I isolate that step with the same values, it runs without errors.

Here is my code with details of each argument for eemeter.caltrack_usage_per_day_predict

eemeter.caltrack_usage_per_day_predict(
    baseline_model_results_daily.model.model_type, 
    baseline_model_results_daily.model.model_params,
    normal_year_temperatures_F,
    prediction_index_hourly
    )

baseline_model_results_daily.model.model_type=

'cdd_hdd'

baseline_model_results_daily.model.model_params =

{'intercept': 15.348747394608491,
'beta_cdd': 0.9809488470884145,
'beta_hdd': 0.37002743650584663,
'cooling_balance_point': 55,
'heating_balance_point': 55}
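For reference, a cdd_hdd model with these parameters is a piecewise-linear function of temperature. The sketch below evaluates it for a single daily mean temperature, assuming the usual degree-day form (intercept plus one term per degree above or below the balance points). It is not eemeter's implementation, and `predict_usage_per_day` is a hypothetical name.

```python
model_params = {
    "intercept": 15.348747394608491,
    "beta_cdd": 0.9809488470884145,
    "beta_hdd": 0.37002743650584663,
    "cooling_balance_point": 55,
    "heating_balance_point": 55,
}


def predict_usage_per_day(params, mean_temp_f):
    """Evaluate intercept + beta_cdd * CDD + beta_hdd * HDD for one day."""
    cdd = max(mean_temp_f - params["cooling_balance_point"], 0.0)
    hdd = max(params["heating_balance_point"] - mean_temp_f, 0.0)
    return params["intercept"] + params["beta_cdd"] * cdd + params["beta_hdd"] * hdd


print(predict_usage_per_day(model_params, 45.0))  # 10 heating degree days
print(predict_usage_per_day(model_params, 65.0))  # 10 cooling degree days
```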

normal_year_temperatures_F a series (same error message received if I convert type to dataframe) =

2020-10-10 15:00:00+00:00 53.96
2020-10-10 16:00:00+00:00 55.94
2020-10-10 17:00:00+00:00 57.92
2020-10-10 18:00:00+00:00 60.08
2020-10-10 19:00:00+00:00 60.98
...
2021-10-11 10:00:00+00:00 42.98
2021-10-11 11:00:00+00:00 44.06
2021-10-11 12:00:00+00:00 44.60
2021-10-11 13:00:00+00:00 44.96
2021-10-11 14:00:00+00:00 44.96
Freq: H, Length: 8784, dtype: float64

prediction_index_hourly=

DatetimeIndex(['2020-10-10 15:00:00+00:00', '2020-10-10 16:00:00+00:00',
'2020-10-10 17:00:00+00:00', '2020-10-10 18:00:00+00:00',
'2020-10-10 19:00:00+00:00', '2020-10-10 20:00:00+00:00',
'2020-10-10 21:00:00+00:00', '2020-10-10 22:00:00+00:00',
'2020-10-10 23:00:00+00:00', '2020-10-11 00:00:00+00:00',
...
'2021-10-11 05:00:00+00:00', '2021-10-11 06:00:00+00:00',
'2021-10-11 07:00:00+00:00', '2021-10-11 08:00:00+00:00',
'2021-10-11 09:00:00+00:00', '2021-10-11 10:00:00+00:00',
'2021-10-11 11:00:00+00:00', '2021-10-11 12:00:00+00:00',
'2021-10-11 13:00:00+00:00', '2021-10-11 14:00:00+00:00'],
dtype='datetime64[ns, UTC]', length=8784, freq='H')

Here is my code for eemeter.compute_temperature_features

eemeter.compute_temperature_features(
    prediction_index_daily, 
    normal_year_temperatures_F, 
    heating_balance_points=None, 
    cooling_balance_points=None, 
    data_quality=False, 
    temperature_mean=True, 
    degree_day_method='daily', 
    percent_hourly_coverage_per_day=0.5, 
    percent_hourly_coverage_per_billing_period=0.9, 
    use_mean_daily_values=True, 
    tolerance=None, 
    keep_partial_nan_rows=True
    )

I am posting this only because the call ran successfully on its own, even though I believe it is the step causing the error in caltrack_usage_per_day_predict.
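One thing worth checking, offered as a guess rather than a confirmed diagnosis: the compute_temperature_features call above passes the index first and the temperatures second, while the caltrack_usage_per_day_predict call passes them in the opposite order. If that function also expects the index before the temperature data, the swap would make it look for `.index` on a DatetimeIndex. The sketch below reproduces that failure mode with stand-in classes; none of it is eemeter code, and `predict` is hypothetical.

```python
class StandInIndex:
    """Plays the role of a DatetimeIndex: it has no .index attribute."""


class StandInSeries:
    """Plays the role of a temperature Series: it carries an index."""

    def __init__(self, index):
        self.index = index


def predict(prediction_index, temperature_data):
    # Hypothetical function that reads temperature_data.index.
    return temperature_data.index


index = StandInIndex()
temperatures = StandInSeries(index)

try:
    predict(temperatures, index)  # positional arguments swapped
except AttributeError as error:
    message = str(error)
    print(message)

# Keyword arguments make the intended order explicit.
result = predict(prediction_index=index, temperature_data=temperatures)
```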

load_sample() error

From the basic usage docs:
http://eemeter.openee.io/basics.html#loading-sample-data

meter_data, temperature_data, metadata = eemeter.load_sample('il-electricity-cdd-hdd-daily')

I get this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a24e0ab28a69> in <module>
----> 1 meter_data, temperature_data, metadata = eemeter.load_sample('il-gas-intercept-only-hourly')

~/git/learning/openee3/src/eemeter/eemeter/samples/load.py in load_sample(sample)
     80     meter_data_filename = metadata["meter_data_filename"]
     81     with resource_stream("eemeter.samples", meter_data_filename) as f:
---> 82         meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
     83 
     84     temperature_filename = metadata["temperature_filename"]

~/git/learning/openee3/src/eemeter/eemeter/io.py in meter_data_from_csv(filepath_or_buffer, tz, start_col, value_col, gzipped, freq, **kwargs)
     81     read_csv_kwargs.update(kwargs)
     82 
---> 83     df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
     84     if tz is not None:
     85         df = df.tz_convert(tz)

~/anaconda3/envs/openee2/lib/python3.7/site-packages/pandas/core/generic.py in tz_localize(self, tz, axis, level, copy, ambiguous, nonexistent)
   9405             if level not in (None, 0, ax.name):
   9406                 raise ValueError("The level {0} is not valid".format(level))
-> 9407             ax = _tz_localize(ax, tz, ambiguous, nonexistent)
   9408 
   9409         result = self._constructor(self._data, copy=copy)

~/anaconda3/envs/openee2/lib/python3.7/site-packages/pandas/core/generic.py in _tz_localize(ax, tz, ambiguous, nonexistent)
   9385                     ax_name = self._get_axis_name(axis)
   9386                     raise TypeError('%s is not a valid DatetimeIndex or '
-> 9387                                     'PeriodIndex' % ax_name)
   9388                 else:
   9389                     ax = DatetimeIndex([], tz=tz)

TypeError: index is not a valid DatetimeIndex or PeriodIndex

I installed eemeter with pip install -e git+git://github.com/openeemeter/[email protected]#egg=eemeter as in #352, but I don't know if that is causing it.

eemeter is loaded and available:

> eemeter.samples()
['il-electricity-cdd-hdd-billing_bimonthly',
 'il-electricity-cdd-hdd-billing_monthly',
 'il-electricity-cdd-hdd-daily',
 'il-electricity-cdd-hdd-hourly',
 'il-electricity-cdd-only-billing_bimonthly',
 'il-electricity-cdd-only-billing_monthly',
 'il-electricity-cdd-only-daily',
 'il-electricity-cdd-only-hourly',
 'il-gas-hdd-only-billing_bimonthly',
 'il-gas-hdd-only-billing_monthly',
 'il-gas-hdd-only-daily',
 'il-gas-hdd-only-hourly',
 'il-gas-intercept-only-billing_bimonthly',
 'il-gas-intercept-only-billing_monthly',
 'il-gas-intercept-only-daily',
 'il-gas-intercept-only-hourly']

and see(eemeter.load_sample) shows this:

isfunction    isroutine     ()            <             <=            ==
    !=            >             >=            dir()         hash()
    help()        repr()        str()

Thanks for looking into it.

pip install installs eemeter-2.5.3.post2 instead of 2.5.4

Loading samples didn't work with 2.5.3.post2 but it worked if I explicitly installed 2.5.4.

Problems loading sample data in 2.5.4

Installing using pip install eemeter and trying to run the sample data using eemeter.samples() gives me the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/jeff/anaconda3/envs/openee/lib/python3.7/site-packages/eemeter/samples/metadata.json'

The contents of /Users/jeff/anaconda3/envs/openee/lib/python3.7/site-packages/eemeter/samples/ has:

__init__.py
__pycache__
load.py

No sample data or metadata.json.

However, based on issue #330 I installed in a new environment using:

pip install -e git+git://github.com/openeemeter/[email protected]#egg=eemeter

eemeter.samples() worked at this point.

I'm using conda to manage environments, but I suspect that isn't the problem. I would guess those assets are not being distributed in the release, but it works when cloning. Either way, here is my setup from conda info:

     active environment : openee
    active env location : /Users/jeff/anaconda3/envs/openee
            shell level : 2
       user config file : /Users/jeff/.condarc
 populated config files : /Users/jeff/.condarc
          conda version : 4.6.14
    conda-build version : 3.17.8
         python version : 3.7.3.final.0
       base environment : /Users/jeff/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/osx-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/jeff/anaconda3/pkgs
                          /Users/jeff/.conda/pkgs
       envs directories : /Users/jeff/anaconda3/envs
                          /Users/jeff/.conda/envs
               platform : osx-64
             user-agent : conda/4.6.14 requests/2.21.0 CPython/3.7.3 Darwin/18.5.0 OSX/10.14.4
                UID:GID : 501:20
             netrc file : /Users/jeff/.netrc
           offline mode : False

So I'm not blocked, but I thought you should know. Thanks.

caltrack hourly method problem

I plan to try the eemeter CalTRACK hourly method, but one line of the code doesn't work. I used the sample data and followed the tutorial exactly, but this line raises an error. Do we need a specific version of pandas to run the code? I'm really confused about it; could you please have a look at this problem?

occupancy_lookup_hourly = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix_hourly,
    segmentation=segmentation_hourly,
    # threshold=0.65 # default
)

Consolidate pytest.ini and tox.ini, rationale in commit log of #127

ht @hangtwenty

Set flake8 config to line lengths of 100

set flake8 config to line lengths of 100, rationale in #127, ht @hangtwenty

cli with sample il-electricity-cdd-hdd-daily returns no model

The CLI command with sample returns "NO MODEL" rather than a model as shown in the documentation. This is with version 2.8.1 of eemeter.

The command is: eemeter caltrack --sample=il-electricity-cdd-hdd-daily

The command and results are from the documentation, in the section "Using the CLI":
https://eemeter.readthedocs.io/basics.html

I would expect the command to return the same status as shown in the documentation.

Thank you!

Change datetime type for samples read_meter_data_from_csv

When loading the default sample data, .tz_localize('UTC') requires a different datetime type than is being passed. The parser is unable to standardize dates and is unable to convert 'start' to a datetime index.

I fixed this bug by explicitly converting the datetime column.

SOLUTION:

import numpy as np
import pandas as pd


def meter_data_from_csv(
    filepath_or_buffer,
    tz=None,
    start_col="start",
    value_col="value",
    gzipped=False,
    freq=None,
    **kwargs
):
    """Load meter data from a CSV file.

    Default format::

        start,value
        2017-01-01T00:00:00+00:00,0.31
        2017-01-02T00:00:00+00:00,0.4
        2017-01-03T00:00:00+00:00,0.58

    Parameters
    ----------
    filepath_or_buffer : :any:`str` or file-handle
        File path or object.
    tz : :any:`str`, optional
        E.g., ``'UTC'`` or ``'US/Pacific'``
    start_col : :any:`str`, optional, default ``'start'``
        Date period start column.
    value_col : :any:`str`, optional, default ``'value'``
        Value column, can be in any unit.
    gzipped : :any:`bool`, optional
        Whether file is gzipped.
    freq : :any:`str`, optional
        If given, apply frequency to data using :any:`pandas.DataFrame.resample`.
    **kwargs
        Extra keyword arguments to pass to :any:`pandas.read_csv`, such as
        ``sep='|'``.
    """
    read_csv_kwargs = {
        "usecols": [start_col, value_col],
        "dtype": {value_col: np.float64},
        "parse_dates": [start_col],
        "index_col": start_col,
    }

    if gzipped:
        read_csv_kwargs.update({"compression": "gzip"})

    # allow passing extra kwargs
    read_csv_kwargs.update(kwargs)

    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs)
    # the fix: convert the index explicitly instead of calling .tz_localize("UTC")
    df.index = pd.to_datetime(df.index, utc=True)
    if tz is not None:
        df = df.tz_convert(tz)

    if freq == "hourly":
        df = df.resample("H").sum()
    elif freq == "daily":
        df = df.resample("D").sum()

    return df

ERROR:

TypeError Traceback (most recent call last)
in
1 #Daily Billing for Caltrack
2 meter_data, temperature_data, sample_metadata = (
----> 3 eemeter.load_sample("il-electricity-cdd-hdd-daily")
4 )
5

~/anaconda3/envs/eenv/lib/python3.7/site-packages/eemeter/samples/load.py in load_sample(sample)
80 meter_data_filename = metadata["meter_data_filename"]
81 with resource_stream("eemeter.samples", meter_data_filename) as f:
---> 82 meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
83
84 temperature_filename = metadata["temperature_filename"]

~/anaconda3/envs/eenv/lib/python3.7/site-packages/eemeter/io.py in meter_data_from_csv(filepath_or_buffer, tz, start_col, value_col, gzipped, freq, **kwargs)
81 read_csv_kwargs.update(kwargs)
82
---> 83 df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
84 if tz is not None:
85 df = df.tz_convert(tz)

~/anaconda3/envs/eenv/lib/python3.7/site-packages/pandas/core/generic.py in tz_localize(self, tz, axis, level, copy, ambiguous, nonexistent)
9865 if level not in (None, 0, ax.name):
9866 raise ValueError("The level {0} is not valid".format(level))
-> 9867 ax = _tz_localize(ax, tz, ambiguous, nonexistent)
9868
9869 result = self._constructor(self._data, copy=copy)

~/anaconda3/envs/eenv/lib/python3.7/site-packages/pandas/core/generic.py in _tz_localize(ax, tz, ambiguous, nonexistent)
9848 ax_name = self._get_axis_name(axis)
9849 raise TypeError(
-> 9850 "%s is not a valid DatetimeIndex or " "PeriodIndex" % ax_name
9851 )
9852 else:

TypeError: index is not a valid DatetimeIndex or PeriodIndex

Parse and graph temperatures in degC

Can you please add functionality to receive temperature in degC (not just degF) and to plot graphs using degC, too?

(incl. .plot_time_series etc.)
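Until such an option exists, one workaround is to convert units outside eemeter: degC to degF before passing temperatures in, and degF back to degC before plotting. The helpers below are a plain sketch with hypothetical names and sample values.

```python
def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degC to degF."""
    return temp_c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degF to degC."""
    return (temp_f - 32.0) * 5.0 / 9.0


hourly_temps_c = [0.0, 12.5, 100.0]  # made-up readings in degC
hourly_temps_f = [celsius_to_fahrenheit(t) for t in hourly_temps_c]
print(hourly_temps_f)
```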

PyPI only has tarballs for version 1.5.1

https://pypi.org/project/eemeter/

Add test suite config to use pytest-xdist

Add test suite config to use pytest-xdist to parallelize across up to 4 cores, speeding up test suite by a bit better than 2x. More notes in commit logs. See #127 for potential implementation. ht @hangtwenty

Extracting Confidence Interval of fitted regression models in "CalTRACK Hourly method" with a “one_month” setting

I am using the OpenEE open source code to measure the energy efficiency of an intervention, and my client is asking for the Confidence Interval (CI) of the fitted regression model.

I am running the eemeter with hourly meter and temperature data sets using "CalTRACK Hourly method" with a “one_month” setting (one regression model for each month or 12 models in total).

Can you please show me how to extract the Confidence Interval for each of the monthly models?

This is the core code I am using to do "CalTRACK Hourly method" with a “one_month”:

# Get meter data suitable for fitting a baseline model
baseline_end_date_hr = min(meter_data.index) + pd.Timedelta(days=365)
baseline_meter_data_hr, warnings = eemeter.get_baseline_data(
    meter_data, end=baseline_end_date_hr, max_days=365
    )

# Create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data_hr, temperature_data,
        )
    )

# Build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'one_month',
    )

# Assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
    )

# Assign temperatures to bins
temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
    )

# Build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        temperature_bins,
        )
    )

# BEGIN NEW CODE for fitting baseline model - example of using SegmentedModel
# directly with modified segment type. CalTRACKHourlyModel is a very thin wrapper
# around SegmentedModel, which is why this works
segment_models = [
    eemeter.fit_caltrack_hourly_model_segment(segment_name, segment_data)
    for segment_name, segment_data in segmented_design_matrices.items()
    ]

# Fit a CalTRACK hourly model
baseline_model_hr = eemeter.SegmentedModel(
    prediction_segment_type="one_month",
    prediction_segment_name_mapping=None,
    segment_models=segment_models,
    prediction_feature_processor=eemeter.caltrack_hourly_prediction_feature_processor,
    prediction_feature_processor_kwargs={
        "occupancy_lookup": occupancy_lookup,
        "temperature_bins": temperature_bins,
    },
    )

# END NEW CODE

# Get a year of reporting period data
reporting_meter_data_hr, warnings_hr = eemeter.get_reporting_data(
    meter_data, start=baseline_end_date_hr, max_days=(455)
    )
warnings_hr

# Compute metered savings for the year of the reporting period we've selected
metered_savings_hr, error_bands_hr = eemeter.metered_savings(
    baseline_model_hr, reporting_meter_data_hr,
    temperature_data, confidence_level=0.90, with_disaggregated=True
    )
error_bands_hr
metered_savings_hr.metered_savings.plot()

During test suite, disable file logging.

See #127, ht @hangtwenty

Add more versatile tools for controlling logging from the CLI

From @hangtwenty: Add versatile tools for controlling logging from the CLI. Allows changing the whole logging config, but more typically you would let the default config get used... and optionally --log-console to turn on console log output, and/or --log-level=DEBUG to increase the verbosity. For the console logger and the DEBUG level logger, the default logging config's log format has a "debug trace" flavor to it, showing modules, function names, and line numbers. (Reusing my favorite base config from other projects.) See #127
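The two flags named above can be sketched with the standard library. This is not the implementation from #127: argparse, the program name, the logger name, and the exact format string are all choices made for this example.

```python
import argparse
import logging


def build_parser():
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument("--log-console", action="store_true",
                        help="also write log records to the console")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser


def configure_logging(args, logger_name="example_cli"):
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, args.log_level))
    if args.log_console:
        handler = logging.StreamHandler()
        # "Debug trace" flavor: module, function name, and line number.
        handler.setFormatter(logging.Formatter(
            "%(levelname)s %(module)s.%(funcName)s:%(lineno)d %(message)s"
        ))
        logger.addHandler(handler)
    return logger


args = build_parser().parse_args(["--log-console", "--log-level=DEBUG"])
logger = configure_logging(args)
logger.debug("console logging enabled")
```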

Segmentation with holidays

Energy usage in buildings typically differs on holidays compared to weekends or other weekday-hour brackets. The segmentation allows us to easily define a new map, like the following example, and segregate holiday data from the rest. This enhances the regression accuracy through more precise occupancy bins. However, one challenge is the number of data points in the holiday segment, which must be large enough to prevent overfitting given the number of independent variables (such as 168 weekday-hours, temperature bins, etc.). I recommend updating the segment_weights... and segment_time_series functions in segmentation.py to include holidays.

"three_month_weighted": {
"jan": "dec-jan-feb-weighted",
"feb": "jan-feb-mar-weighted",
"mar": "feb-mar-apr-weighted",
"apr": "mar-apr-may-weighted",
"may": "apr-may-jun-weighted",
"jun": "may-jun-jul-weighted",
"jul": "jun-jul-aug-weighted",
"aug": "jul-aug-sep-weighted",
"sep": "aug-sep-oct-weighted",
"oct": "sep-oct-nov-weighted",
"nov": "oct-nov-dec-weighted",
"dec": "nov-dec-jan-weighted",
"holiday": "holiday",
},
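
To make the proposal concrete, here is a rough sketch of how a timestamp could be routed to either the holiday segment or its month's segment. The holiday dates and the helper names (segment_key, three_month_weighted_with_holiday) are hypothetical and do not exist in segmentation.py.

```python
from datetime import date, datetime

# Assumed holiday dates for illustration only; a real list would come from a calendar.
HOLIDAYS = {date(2019, 1, 1), date(2019, 12, 25)}

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def three_month_weighted_with_holiday():
    """Rebuild the map shown above: each month points at its three-month window."""
    mapping = {}
    for i, month in enumerate(MONTHS):
        prev_month = MONTHS[i - 1]          # index -1 wraps jan back to dec
        next_month = MONTHS[(i + 1) % 12]   # wraps dec forward to jan
        mapping[month] = f"{prev_month}-{month}-{next_month}-weighted"
    mapping["holiday"] = "holiday"
    return mapping


def segment_key(timestamp, holidays=HOLIDAYS):
    """Return "holiday" for holiday timestamps, otherwise the month abbreviation."""
    if timestamp.date() in holidays:
        return "holiday"
    return MONTHS[timestamp.month - 1]


SEGMENT_MAP = three_month_weighted_with_holiday()
print(SEGMENT_MAP[segment_key(datetime(2019, 12, 25, 14))])  # holiday
print(SEGMENT_MAP[segment_key(datetime(2019, 12, 26, 14))])  # nov-dec-jan-weighted
```

Counting how many hours fall under the "holiday" key for a given baseline year is a quick way to judge whether that segment has enough observations to fit.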

LF energy data architecture presentation request

Hi openEEmeter community,

The LF Energy data architecture working group would like more insight into the current LF Energy projects and their data architecture. The goal of the data architecture is to improve interoperability between the LF Energy projects.

We would like to learn about the following topics. Could you give a 30-minute presentation on them during one of the office hours?
Project focus and introduction
Data input
Data output
Used semantics (e.g. What information standards are used?)

Please select a date and I will send an invite.
https://wiki.lfenergy.org/display/HOME/Data+Architecture+Working+Group

Data architecture working document:
https://docs.google.com/document/d/1QcHqPRSmUUJQlJnfygGDkOpDPlId6U1V22pBuvZvDYk/edit#heading=h.g0v5yhj0kiyj

Regards,
Sander

Releases have not been tracked

Github page for eemeter shows only v1.0.0 release (https://github.com/openeemeter/eemeter) but looking on pypi we can see there are many more versions that have been released (https://pypi.org/project/eemeter/#history). Ideally these would be consistent.

Bug in eemeter when doing CalTRACK Hourly method with 'one_month' adjustment

I am trying to run the eemeter with my hourly meter and temperature data sets using "CalTRACK Hourly method".

When I do it with 'three_month_weighted' setting, it works well and I can calculate metered_savings after fitting a CalTRACK hourly model.

But, when I change that setting to 'one_month', I can still fit a CalTRACK hourly model, but when I want to calculate the metered_savings, it gives me the following error:

>>> KeyError: 'dec-jan-feb-weighted'

It seems that the source of error is here:

>>>  File "eemeter-2.8.5\eemeter\caltrack\hourly.py", line 159, in predict return self.model.predict(prediction_index, temperature_data, **kwargs)

I also get the same error when the setting is 'single' or 'three_month'.

Do you know how we can fix this error?

Thanks in advance,
Ali

During test suite, set to max verbosity and logs on console.

During test suite, set to max verbosity and logs on console. This just results in more info when you want it. You won't see extra output except when tests fail. See #127 ht @hangtwenty

eemeter 2.x.x version to test tutorial

Hey there,

I cloned the repo locally and tried to run the tutorial Jupyter notebook. However, when I installed the eemeter library, the version I got was 1.5.1, which does not work with the tutorial code. Can you tell me how I can install version 2.x.x? Thanks a lot!

Best,
Doris
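
A small sketch for checking which eemeter version an environment actually has before running the tutorial. It uses only the standard library (importlib.metadata, Python 3.8+). The helper names are made up for illustration, and the "2.x" threshold is taken from the question above rather than from any official compatibility table.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version_string):
    """Extract the leading integer of a dotted version such as "1.5.1"."""
    return int(version_string.split(".")[0])


version = installed_version("eemeter")
if version is None:
    print("eemeter is not installed in this environment")
elif major_version(version) < 2:
    print(f"eemeter {version} predates the 2.x API used by the tutorial")
else:
    print(f"eemeter {version} is installed")
```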

Use Codeclimate

can't load sample data

I'm trying to load sample data as described in http://eemeter.openee.io/basics.html#loading-sample-data. I tried with both 1.5.1 (pip install) and 2.2.6 (pip install git+git://github.com/openeemeter/[email protected]), both with Python 3.6.0.

running eemeter.samples() with 1.5.1 fails with module 'eemeter' has no attribute 'samples'

(eemeter) ~/work/platform $ pip install eemeter
Installing collected packages: eemeter
Successfully installed eemeter-1.5.1
(eemeter) ~/work/platform $ python3
Python 3.6.0 (default, Sep 12 2017, 20:42:47)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
>>> import eemeter
>>> eemeter.samples()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'eemeter' has no attribute 'samples'
>>>

running eemeter.samples() with 2.2.6 fails with FileNotFoundError: [Errno 2] No such file or directory: '/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/metadata.json'

(eemeter) ~/work/platform $ pip install git+git://github.com/openeemeter/[email protected]
Successfully installed eemeter-2.2.6
(eemeter) ~/work/platform $ python3
Python 3.6.0 (default, Sep 12 2017, 20:42:47)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
>>> import eemeter
>>> eemeter.samples()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/load.py", line 45, in samples
    sample_metadata = _load_sample_metadata()
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/load.py", line 32, in _load_sample_metadata
    with resource_stream("eemeter.samples", "metadata.json") as f:
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1208, in resource_stream
    self, resource_name
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1573, in get_resource_stream
    return open(self._fn(self.module_path, resource_name), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/metadata.json'

I expected it to load sample data as described on http://eemeter.openee.io/basics.html#loading-sample-data

I'm using MacOS High Sierra 10.13.6 and python 3.6.0

Documentation is out of date

Describe the bug

The documentation at https://eemeter.openee.io/index.html doesn't reflect the latest version of the package, 4.0.

For example, functions referenced in the tutorial and the API documentation are not available:

import eemeter
eemeter.create_caltrack_billing_design_matrix

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'eemeter' has no attribute 'create_caltrack_billing_design_matrix'

Expected behavior

The documentation should reflect the current state of the software.

Additional context

None

loading sample data

Trying to follow the tutorial for loading sample data: https://eemeter.readthedocs.io/basics.html#loading-sample-data
On: Windows 10, Python 3.6.8, EEMeter 2.3.0
eemeter.samples correctly gives me the list of sample data available. When I try to load_sample per tutorial syntax I get the following error:

meter_data, temperature_data, metadata =
... eemeter.load_sample('il-electricity-cdd-hdd-daily')
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "c:\users\ben\src\eemeter\eemeter\samples\load.py", line 82, in load_sample
    meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
  File "c:\users\ben\src\eemeter\eemeter\io.py", line 83, in meter_data_from_csv
    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
  File "C:\Users\Ben\Anaconda3\envs\OpenEE\lib\site-packages\pandas\core\generic.py", line 9407, in tz_localize
    ax = _tz_localize(ax, tz, ambiguous, nonexistent)
  File "C:\Users\Ben\Anaconda3\envs\OpenEE\lib\site-packages\pandas\core\generic.py", line 9387, in _tz_localize
    'PeriodIndex' % ax_name)
TypeError: index is not a valid DatetimeIndex or PeriodIndex

Advice on how to use eemeter

Hello,

I'm interested in applying the eemeter library to data on a building's winter energy usage. I'm getting consistent results with monthly and weekly data, but I'm having difficulty applying the model to daily data.

Data

Data are for a building in Italy for years 2019, 2020 and 2021 (data.zip). The meter data report the number of instants in which the heating machine was on. Below you can find a plot of the data.

Weekly and monthly models

Aggregating the data to obtain monthly and weekly frequencies, the resulting models make sense.
This is the code that I'm using to generate the model.

import datetime

import pytz
import pandas as pd
import matplotlib.pyplot as plt

import eemeter


# Load data
meter_data_path = "./meter_data.csv"
temp_data_path = "./temperature_data.csv"

meter_data = pd.read_csv(meter_data_path, index_col=0)
meter_data.index = pd.to_datetime(meter_data.index)

temp_data = pd.read_csv(temp_data_path, index_col=0)

temp_data.index = pd.to_datetime(temp_data.index)
temp_data = temp_data.resample("1H").mean().interpolate(method="linear").value

temp_data = temp_data.loc[temp_data.index >= datetime.datetime(2019, 1, 1, tzinfo=pytz.utc)]


# Define parameters: "W" for weekly model, "M" for monthly model
time_freq = "W"
use_billing_presets = True
weights_col = "num_days"

# Aggregate meter data
meter_data_agg = meter_data.value.dropna().resample(time_freq).agg(["sum", "size"]) 
meter_data_agg["num_days"] = meter_data_agg["size"] / 24

meter_data_agg = meter_data_agg.rename(columns={"sum": "value"}) 

# Create caltrack billing design matrix and extract baseline data
data = eemeter.create_caltrack_billing_design_matrix(meter_data_agg, temp_data)
    
baseline_data = eemeter.get_baseline_data(
    data,
    start=datetime.datetime(2019, 1, 1, tzinfo=pytz.utc),
    end=datetime.datetime(2019, 12, 31, tzinfo=pytz.utc),
    max_days=None
)

# Add weights column to baseline data
baseline_df = baseline_data[0]
baseline_df[weights_col] = meter_data_agg[weights_col]

# Fit Caltrack model
model_results = eemeter.fit_caltrack_usage_per_day_model(
    baseline_data[0],
    use_billing_presets=use_billing_presets,
    weights_col=weights_col
)

# Plot resulting model
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

ax[0].set_title("Reference period")
eemeter.plot_energy_signature(
    meter_data_agg.loc[meter_data_agg.index <= datetime.datetime(2020, 1, 1, tzinfo=pytz.utc)],
    temp_data, ax=ax[0])
model_results.plot(ax=ax[0], with_candidates=False)

ax[1].set_title("Whole dataset")
eemeter.plot_energy_signature(meter_data_agg, temp_data, ax=ax[1])
model_results.plot(ax=ax[1], with_candidates=False)

fig.subplots_adjust(hspace=0.5)

plt.show()

The above code generates the following two figures (setting time_freq to "W" and "M" respectively).

Weekly

Monthly

Daily data

The daily data show a strong dependence on the day of the week with a very different pattern between weekdays and weekends (see image below).

Consequently, when I fit the CalTRACK daily model, I obtain a model that underestimates the weekday values and overestimates the weekend values.

My idea was to include a day-of-week categorical variable in the regression model features (overriding the methods get_single_*_only_candidate_model). Do you have any advice on how to improve the daily model?

Thank you!
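
Before overriding the candidate-model methods, it may help to quantify the weekday/weekend gap directly. The following is only a sketch with made-up numbers and a hypothetical helper (split_by_day_type). It does not touch eemeter and is not a day-of-week regression, just a way to see how far apart the two groups sit.

```python
from datetime import date
from statistics import mean


def split_by_day_type(daily_values):
    """Split a {date: value} mapping into weekday and weekend value lists."""
    weekday, weekend = [], []
    for day, value in sorted(daily_values.items()):
        # date.weekday(): Monday is 0, Sunday is 6.
        (weekend if day.weekday() >= 5 else weekday).append(value)
    return weekday, weekend


# Made-up values for one week starting Monday 2019-01-07.
sample = {date(2019, 1, 7 + i): v for i, v in enumerate([10, 11, 9, 10, 10, 2, 3])}
weekday, weekend = split_by_day_type(sample)
print(mean(weekday), mean(weekend))
```

If the two means differ as strongly as the plot suggests, fitting separate daily models for the two groups is one option to compare against a single model with a categorical term.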

pip does not install site-packages/eemeter/sample_data

eemeter sample fails with
IOError: [Errno 2] No such file or directory: '/usr/lib64/python2.7/site-packages/eemeter/sample_data/projects.csv'
Installing the sample_data directory by hand enabled the command to run to completion

Several tests fail

Several tests fail:
failures.txt

Version: 3.1.1
Python-3.9
pandas-1.3.5
FreeBSD 13.1

Documentation is misleading about dates & datetimes

I'm deviating from the issue reporting template here because I'm reporting a bug in the documentation.

A confusing aspect of eemeter's documentation is that in some places it refers to dates and datetimes interchangeably. For example, the documentation for get_baseline_data():

def get_baseline_data(...):
    """
    ...
    start : :any:`datetime.datetime`
        A timezone-aware datetime that represents the earliest allowable start
        date for the baseline data. The stricter of this or `max_days` is used
        to determine the earliest allowable baseline period date.
    end : :any:`datetime.datetime`
        A timezone-aware datetime that represents the latest allowable end
        date for the baseline data, i.e., the latest date for which data is
        available before the intervention begins.
    max_days : :any:`int`, default 365
        The maximum length of the period. Ignored if `end` is not set.
        The stricter of this or `start` is used to determine the earliest
        allowable baseline period date.
    ...
    """

The language here talks about both datetimes and dates, but they're different things and you'd expect different handling:

If using dates, you assume the smallest unit of processing is the date and then the fact you're passing a 'timezone-aware datetime' seems weird and leads you to question what happens if you pass a datetime in the middle of a day (is the time ignored?).
On the other hand, if you assume these values are handled as timestamps, then the talk of dates makes it hard to understand what will happen without reading the code (max_days gets added to start, so if start is 12:00 one day, does the end 'date' end up being at midday as well?)

It turns out that really everything is using timestamps/datetimes and the talk of dates is a bit misleading. It would be a lot clearer if the language of timestamps was used throughout:

def get_baseline_data(...):
    """
    ...
    start : :any:`datetime.datetime`
        A timezone-aware datetime that represents the earliest allowable moment
        for the baseline data. The stricter of this or `max_days` is used
        to determine the earliest allowable baseline period timestamp.
    end : :any:`datetime.datetime`
        A timezone-aware datetime that represents the latest allowable end
        moment for the baseline data, i.e., the latest moment for which data is
        available before the intervention begins.
    max_days : :any:`int`, default 365
        The maximum length of the period. Ignored if `end` is not set.
        The stricter of this or `start` is used to determine the earliest
        allowable baseline period timestamp.
    ...
    """

It would also be worth mentioning that start and max_days are mutually exclusive, as that's not clear in the current description for this function :)
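
To illustrate why the timestamp wording matters, here is a small standard-library sketch of the "stricter of start or max_days" rule as the docstring words it. The function earliest_allowed is hypothetical and written from the docstring text alone. It is not copied from eemeter and may not match what get_baseline_data() actually does.

```python
from datetime import datetime, timedelta, timezone


def earliest_allowed(end, start=None, max_days=365):
    """Illustrative only: return the later of `start` and `end - max_days`."""
    candidates = []
    if start is not None:
        candidates.append(start)
    if max_days is not None:
        # Timestamp arithmetic keeps the time of day of `end`.
        candidates.append(end - timedelta(days=max_days))
    return max(candidates) if candidates else None


end = datetime(2019, 12, 31, 12, 0, tzinfo=timezone.utc)  # midday, not midnight
print(earliest_allowed(end, max_days=365))
```

Because the subtraction works on timestamps, a midday `end` produces a midday boundary, which is the behavior that talk of "dates" obscures.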

Versions of eemeter > 1.5.1 on PyPI?

Hello,

It appears that none of the 2.x series have been uploaded to PyPI. Is that an oversight or by design? I noticed that several releases for 2.5.x were released today.

Thanks!
