
pvoutput's Introduction

Download historical solar photovoltaic data from PVOutput.org.

This code is a work-in-progress. The aim is to provide both a Python library for interacting with PVOutput.org's API, and a set of scripts for downloading lots of data :)

Installation

$ pip install pvoutput-ocf

Register with PVOutput.org

You need to get an API key and a system ID from PVOutput.org.

If you don't have a PV system, click the "energy consumption only" box when registering on PVOutput. If you don't include a system ID, then you'll get a "401 Unauthorized" response from the PVOutput API.

You can pass the API key and system ID into the PVOutput constructor. Or, create a ~/.pvoutput.yml file which looks like:

api_key: <API key from PVOutput.org>
system_id: <SystemID from PVOutput.org>

The default location of the .pvoutput.yml is the user's home directory, expanded from ~. This can be overridden by setting the PVOUTPUT_CONFIG environment variable.

e.g. export PVOUTPUT_CONFIG="/my/preferred/location/.pvoutput.yml"
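The lookup described above (environment variable first, then the home directory) can be sketched as follows. This is only an illustration of the documented behaviour, not the library's own code; the helper name `resolve_config_path` is hypothetical.

```python
import os
from pathlib import Path


def resolve_config_path() -> Path:
    """Return the path of the config file (hypothetical helper, not part of pvoutput)."""
    # PVOUTPUT_CONFIG, if set and non-empty, overrides the default location.
    override = os.environ.get("PVOUTPUT_CONFIG")
    if override:
        return Path(override).expanduser()
    # Default: .pvoutput.yml in the user's home directory, expanded from ~.
    return Path("~/.pvoutput.yml").expanduser()
```

Calling `resolve_config_path()` with `PVOUTPUT_CONFIG` unset returns the `.pvoutput.yml` path inside the current user's home directory.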

Alternatively, you can set API_KEY, SYSTEM_ID and DATA_SERVICE_URL (see below) as environment variables.

API quotas and paid subscriptions

Please see here for up-to-date info.

Free

PVOutput.org gives you 60 API requests per hour. Per request, you can download one day of data for one PV system. (See PVOutput's docs for more info about rate limits.)

Donate

Donating to PVOutput.org increases your quota for a year to 300 requests per hour.

Paid

To get more historical data, you can pay 600 Australian dollars for a year's 'Live System History' subscription for a single country (more info here; see also PVOutput.org's full price list). This allows you to use the get batch status API to download 900 PV-system-years per hour.
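To get a feel for what these quotas mean in practice, here is a rough back-of-the-envelope calculation. It uses only the figures quoted above (60 and 300 requests per hour, one system-day per request, 900 PV-system-years per hour for the batch API) and ignores network time and any waiting for the quota to reset. The function names are made up for this sketch.

```python
FREE_REQUESTS_PER_HOUR = 60
DONATION_REQUESTS_PER_HOUR = 300
BATCH_SYSTEM_YEARS_PER_HOUR = 900


def hours_for_daily_requests(n_systems: int, n_days: int, requests_per_hour: int) -> float:
    """Hours needed when each request returns one day of data for one PV system."""
    return n_systems * n_days / requests_per_hour


def hours_for_batch(n_systems: int, n_years: float) -> float:
    """Hours needed with the get batch status API (900 PV-system-years per hour)."""
    return n_systems * n_years / BATCH_SYSTEM_YEARS_PER_HOUR


if __name__ == "__main__":
    # One year of data for 100 systems under each tier.
    print(hours_for_daily_requests(100, 365, FREE_REQUESTS_PER_HOUR))
    print(hours_for_daily_requests(100, 365, DONATION_REQUESTS_PER_HOUR))
    print(hours_for_batch(100, 1))
```

For one year of data from 100 systems this works out to roughly 608 hours on the free tier, roughly 122 hours after donating, and under 7 minutes with the paid batch API.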

If you have subscribed to PVOutput's data service then either

  • add data_service_url to your configuration file (~/.pvoutput.yml) or
  • pass data_service_url to the PVOutput constructor.

The data_service_url should end in .org (note that this does not include the /service/r2 part of the URL). For example: data_service_url: https://pvoutput.org/
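The rule above can be sketched as a small check followed by URL construction. The `.org` check and its error message mirror the snippet quoted in the issue report further down this page, and the `service/r2/{}.jsp` pattern appears in one of the tracebacks; the function name `build_data_service_url` and the use of `urljoin` are assumptions made for illustration, not the library's confirmed implementation.

```python
from urllib.parse import urljoin


def build_data_service_url(data_service_url: str, service: str) -> str:
    """Validate the base URL and append the /service/r2 path (illustrative only)."""
    # Ignore any trailing slash before checking the suffix.
    base = data_service_url.strip("/")
    if not base.endswith(".org"):
        raise ValueError("data_service_url must end in '.org'")
    # Re-add exactly one trailing slash so urljoin appends rather than replaces.
    return urljoin(base + "/", "service/r2/{}.jsp".format(service))
```

With `https://pvoutput.org/` and the service name `getbatchstatus`, this produces a URL ending in `/service/r2/getbatchstatus.jsp`; a base URL that already contains `/service/r2` is rejected.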

Install pvoutput Python library

$ pip install -e git+https://github.com/openclimatefix/pvoutput.git@main#egg=pvoutput-ocf

Usage

See the Quick Start notebook.

Contributors ✨

Thanks goes to these wonderful people (emoji key):

  • Jack Kelly 💻
  • Sam Murphy-Sugrue 💻
  • Gabriel Tseng 💻
  • Jamie Taylor 💻
  • Peter Dudfield 🚇
  • Shanmukh Chava 💻
  • Antsthebul 💻
  • Rachit Singh 🔣 💻
  • devsjc 💻

This project follows the all-contributors specification. Contributions of any kind welcome!


pvoutput's Issues

Investigate strange batch download behaviour

  1. On 28th Aug, why did it freeze up? Waiting too long for network to timeout?

  2. On Sat 31 Aug, why didn't it skip to ~50% before downloading?

  3. What's going on here:

2019-09-01 12:18:28,949 - pvoutput - INFO - system_id 9570: Got empty timeseries back for 2012-03-31
2019-09-01 12:19:16,906 - pvoutput - INFO - system_id 9570: 143543 total rows downloaded

Include a Dockerised dev environment

Detailed Description

For easier environment configuration, add a Docker image which installs any system libs needed and installs pip requirements.

Context

  • Faster setup for developers with consistent dev envs
  • Easier for Windows users than wrestling with dependencies and missing system libs.

Possible Implementation

PR to follow.

Mis-labelled column in systems metadata: `panel_capacity_W_each`

When downloading system metadata using either search() or get_metadata(), you have labelled the System Size field from PVOutput as system_AC_capacity_W, but I think this may actually be the DC capacity of the system, since it seems to correspond to Panels multiplied by Panel Power (or num_panels multiplied by panel_capacity_W_each using OCF notation). Also, the Inverter Power (inverter_capacity_W) is recorded separately, and is typically considered the "AC capacity" of the system. The API docs don't make this clear though so I could be wrong.
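The reasoning above amounts to a simple consistency check: if the reported System Size matches the number of panels times the per-panel power, it is probably the DC capacity. A minimal sketch follows, with made-up function names, an arbitrary 1% tolerance, and invented example values (they are not taken from PVOutput data).

```python
import math


def implied_dc_capacity_w(num_panels: int, panel_capacity_w_each: float) -> float:
    """DC capacity implied by the panel count and the per-panel power."""
    return num_panels * panel_capacity_w_each


def looks_like_dc_capacity(system_size_w: float, num_panels: int,
                           panel_capacity_w_each: float, rel_tol: float = 0.01) -> bool:
    """True if the reported system size matches panels x panel power within rel_tol."""
    implied = implied_dc_capacity_w(num_panels, panel_capacity_w_each)
    return math.isclose(system_size_w, implied, rel_tol=rel_tol)


if __name__ == "__main__":
    # Invented example: 16 panels of 250 W each, reported size 4000 W, inverter 3600 W.
    print(looks_like_dc_capacity(4000, 16, 250))
    print(looks_like_dc_capacity(3600, 16, 250))
```

In the invented example, the reported size of 4000 W matches the panel total, while the 3600 W inverter rating does not.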

PR to follow...

Get pvoutput code running on AWS

The code in this repository can be used to get data from pvoutput.org.
It is not run anywhere yet though.
We should turn it into something that runs on AWS so we continually get live PV power data (that we then can plot on the UI).

Ideally we achieve this by Dockerising the whole script, because then we can still decide whether we want to run it in an ECS cluster or via Batch or...
But also very open for other suggestions.

We do have an API Key for pvoutput.org and we already have a bucket for the output that's accessed by the frontend. Those two things should probably be env vars to the docker image.

raise ValueError("data_service_url must end in '.org'")

Hello,

I want to use this code to download data from pvoutput.org. I don't have any PV system registered, but I've donated in order to be able to download the data of other PV systems. I installed the library via the pip install method. After that, importing the library does not cause any problems.

The problem occurs when I want to access a PV system. I do this by writing 'PVOutput(API_KEY, SYSTEM_ID)', with API_KEY = my API key, which I can find under settings on pvoutput.org, and SYSTEM_ID = the ID of the system whose data I want to access, i.e. the 5 digits after &sid= in the URL. This gives me the following error in Python:

File "", line 1, in
PVOutput(API_KEY, SYSTEM_ID)

File "c:\users\hazaioud\src\pvoutput\pvoutput\pvoutput.py", line 91, in __init__
raise ValueError("data_service_url must end in '.org'")

ValueError: data_service_url must end in '.org'

After that, I commented out this part of the source code:

    if self.data_service_url is not None:
        if not self.data_service_url.strip('/').endswith('.org'):
            raise ValueError("data_service_url must end in '.org'")

Then I no longer see the error, but I'm not sure whether that is intended or not. Anyway, after that I tried to access the PV systems within a radius of 5 km of the given coordinates:

pv.search(query='5km', lat=52.0668589, lon=-1.3484038)

But then Python freezes totally.

What am I doing wrong?

Implement get_batch_metadata

The function signature could look something like this:

from typing import Optional

# Proposed method of the PVOutput class.
def get_batch_metadata(
    self,
    pv_system_ids,
    output_filename: Optional[str] = None,  # if None, just return the metadata; don't write to disk
    overwrite: bool = False,  # if True, download new metadata regardless of whether it's already in
                              # the HDF file; only relevant if output_filename is set
):
    ...

ValueError: cannot match existing table structure for [missing_date_PV_localtime] on appending data

append_missing_date(
    date_to_load=date(2019, 1, 1), 
    datetime_of_api_request=pd.Timestamp.now(),
    system_id=system_id)

~/anaconda3/lib/python3.7/site-packages/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3639                         "on appending data".format(
   3640                             items=(','.join(pprint_thing(item) for
-> 3641                                             item in items))))
   3642             blocks = new_blocks
   3643             blk_items = new_blk_items

ValueError: cannot match existing table structure for [missing_date_PV_localtime] on appending data

Failing test mapscraper_test

Describe the bug
FAILED pvoutput/tests/mapscraper_test.py::test_process_output_col - TypeError: Expected unicode, got NavigableString

To Reproduce
Steps to reproduce the behavior:

  1. run pytest pvoutput/tests/mapscraper_test.py

Expected behavior
test to pass

Context
You may also see another error in test_utils.

ValueError: day is out of range for month

2019-09-01 20:31:21,991 - pvoutput - INFO - **********************
2019-09-01 20:31:21,991 - pvoutput - INFO - system_id 5766: 833 of 2556 (32.589984%)
2019-09-01 20:31:22,779 - pvoutput - INFO - system_id 5766: 0 missing dates already found
2019-09-01 20:31:22,939 - pvoutput - INFO - system_id 5766: Will download these date ranges: [DateRange(start_date=datetime.date(2011, 11, 17), end_date=datetime.date(2012, 2, 29)), DateRange(start_date=datetime.date(2015, 4, 9), end_date=datetime.date(2015, 4, 16)), DateRange(start_date=datetime.date(2018, 10, 12), end_date=datetime.date(2018, 11, 24))]
2019-09-01 20:31:22,941 - pvoutput - ERROR - Exception! day is out of range for month
Traceback (most recent call last):
  File "<ipython-input-15-6932e1778dc1>", line 8, in <module>
    output_filename=OUTPUT_TIMESERIES_FILENAME)
  File "/home/jack/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py", line 584, in batch_download
    timezone)
  File "/home/jack/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py", line 647, in _batch_download_using_get_batch_status
    years = merge_date_ranges_to_years(date_ranges_to_download)
  File "/home/jack/dev/python/openclimatefix/solar/pvoutput/pvoutput/daterange.py", line 111, in merge_date_ranges_to_years
    date_from = date_to.replace(year=date_to.year-1)
ValueError: day is out of range for month
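The traceback points at `date_to.replace(year=date_to.year-1)`, and one of the requested ranges ends on 2012-02-29. 2011 has no 29 February, so `replace` raises. One possible workaround is sketched below; the helper name `one_year_before` and the choice to fall back to 28 February (rather than, say, 1 March) are assumptions, not the fix adopted by the project.

```python
from datetime import date


def one_year_before(d: date) -> date:
    """Return the same calendar day one year earlier, clamping 29 Feb to 28 Feb."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        # Only 29 February can fail here: the previous year is not a leap year.
        return d.replace(year=d.year - 1, day=28)


if __name__ == "__main__":
    print(one_year_before(date(2012, 2, 29)))
    print(one_year_before(date(2018, 11, 24)))
```

For any date other than 29 February the result is identical to the original `replace` call.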

Scrape all country codes

At present, consts.py only includes a few PVOutput.org country codes.

PVOutput.org's country codes seem to be custom for PVOutput.org. (They don't appear to be ISO 3166 codes)

Write a script to iterate through pvoutput.org/map.jsp?country=c where c is between 1 and 257 inclusive, and scrape the name from the text just above the table.
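As a starting point, the URL iteration could look like the sketch below. It only builds the list of pages to visit; fetching and parsing the country name is left out because the page layout is not described here. The `https://` scheme and the function name `country_map_urls` are assumptions.

```python
from typing import List

# Assumption: the map page is served over HTTPS at this address.
MAP_URL_TEMPLATE = "https://pvoutput.org/map.jsp?country={}"


def country_map_urls(first: int = 1, last: int = 257) -> List[str]:
    """Build one map URL per country code, from first to last inclusive."""
    return [MAP_URL_TEMPLATE.format(code) for code in range(first, last + 1)]


if __name__ == "__main__":
    urls = country_map_urls()
    print(len(urls), urls[0], urls[-1])
```

A real scraper would also need to stay within the API and site rate limits described earlier.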

Use appropriate dtypes

PVOutput functions (like get_status and get_batch_status and get_statistics) should all return the correct dtypes.

Re-process existing HDF files to convert them to the correct dtypes.

Include a production Docker image

Detailed Description

To make environment configuration easier and to streamline setup, add a Docker image which can serve as both a playground and a production environment if we later add CLI.

Context

  • Makes quick start even quicker for Docker users
  • Avoids nasty dependency (pyproj, pyshp, shapely, cython, cartopy, geopandas)* and system lib (Proj, libgeos, GDAL) issues on Windows machines.
  • Improve deployability

* N.B. These dependencies are not yet needed, but will be included in a future PR

Possible Implementation

PR to follow.

Use config file(s) instead of environment variables

e.g. have a config_defaults.yaml which is checked into git; and a custom config.yaml which people set up themselves, and overrides defaults and sets custom config settings (like the directory for data, etc.)
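The override behaviour proposed here can be sketched as a recursive dictionary merge, assuming both YAML files have already been loaded into plain dicts. The function name `merge_config` and the example keys are hypothetical.

```python
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], custom: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from custom override those in defaults."""
    merged = dict(defaults)
    for key, value in custom.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    defaults = {"data_directory": "./data", "api": {"timeout_s": 30, "retries": 3}}
    custom = {"data_directory": "/mnt/pv", "api": {"retries": 5}}
    print(merge_config(defaults, custom))
```

Neither input dict is modified, so the checked-in defaults stay intact even if the merged result is changed later.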

ConnectionError

--------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/connection.py in _new_conn(self)
    159             conn = connection.create_connection(
--> 160                 (self._dns_host, self.port), self.timeout, **extra_kw)
    161 

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/util/connection.py in create_connection(address, timeout, source_address, socket_options)
     56 
---> 57     for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     58         af, socktype, proto, canonname, sa = res

~/miniconda3/envs/pvoutput/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    747     addrlist = []
--> 748     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    749         af, socktype, proto, canonname, sa = res

gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    602                                                   body=body, headers=headers,
--> 603                                                   chunked=chunked)
    604 

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    354         else:
--> 355             conn.request(method, url, **httplib_request_kw)
    356 

~/miniconda3/envs/pvoutput/lib/python3.7/http/client.py in request(self, method, url, body, headers, encode_chunked)
   1228         """Send a complete request to the server."""
-> 1229         self._send_request(method, url, body, headers, encode_chunked)
   1230 

~/miniconda3/envs/pvoutput/lib/python3.7/http/client.py in _send_request(self, method, url, body, headers, encode_chunked)
   1274             body = _encode(body, 'body')
-> 1275         self.endheaders(body, encode_chunked=encode_chunked)
   1276 

~/miniconda3/envs/pvoutput/lib/python3.7/http/client.py in endheaders(self, message_body, encode_chunked)
   1223             raise CannotSendHeader()
-> 1224         self._send_output(message_body, encode_chunked=encode_chunked)
   1225 

~/miniconda3/envs/pvoutput/lib/python3.7/http/client.py in _send_output(self, message_body, encode_chunked)
   1015         del self._buffer[:]
-> 1016         self.send(msg)
   1017 

~/miniconda3/envs/pvoutput/lib/python3.7/http/client.py in send(self, data)
    955             if self.auto_open:
--> 956                 self.connect()
    957             else:

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/connection.py in connect(self)
    182     def connect(self):
--> 183         conn = self._new_conn()
    184         self._prepare_conn(conn)

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/connection.py in _new_conn(self)
    168             raise NewConnectionError(
--> 169                 self, "Failed to establish a new connection: %s" % e)
    170 

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f8788ce06d8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    448                     retries=self.max_retries,
--> 449                     timeout=timeout
    450                 )

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    640             retries = retries.increment(method, url, error=e, _pool=self,
--> 641                                         _stacktrace=sys.exc_info()[2])
    642             retries.sleep()

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    398         if new_retry.is_exhausted():
--> 399             raise MaxRetryError(_pool, url, error or ResponseError(cause))
    400 

MaxRetryError: HTTPConnectionPool(host='data.pvoutput.org', port=80): Max retries exceeded with url: /service/r2/getbatchstatus.jsp?sid1=14776&dt=20160426&key=3f784ff6cfa27c44436f88da0c429b410687ad14&sid=68732 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8788ce06d8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
<ipython-input-15-4123708f25df> in <module>
      6         start_date=START_DATE,
      7         end_date=END_DATE,
----> 8         output_filename=OUTPUT_TIMESERIES_FILENAME)
      9 except Exception as e:
     10     logger.exception('Exception! %s', e)

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py in download_multiple_systems_to_disk(self, system_ids, start_date, end_date, output_filename, timezone, min_data_availability, use_get_batch_status_if_available)
    604                         pv_system_id,
    605                         date_ranges_to_download,
--> 606                         timezone)
    607                 else:
    608                     raise ValueError('data_service_url is not set!')

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py in _download_multiple_using_get_batch_status(self, output_filename, pv_system_id, date_ranges_to_download, timezone)
    675         total_rows = self._download_multiple_worker(
    676             output_filename, pv_system_id, dates_to, timezone,
--> 677             use_get_status=False)
    678 
    679         # Re-load data, sort, remove duplicate indicies, append back

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py in _download_multiple_worker(self, output_filename, pv_system_id, dates, timezone, use_get_status)
    715             else:
    716                 timeseries = self.get_batch_status(
--> 717                     pv_system_id, date_to=date_to_load)
    718             if timeseries.empty:
    719                 _LOG.info('system_id %d: Got empty timeseries back for %s',

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py in get_batch_status(self, pv_system_id, date_to, max_retries, **kwargs)
    288                 pv_system_status_text = self._api_query(
    289                     service='getbatchstatus', api_params=api_params,
--> 290                     use_data_service=True, **kwargs)
    291             except NoStatusFound:
    292                 _LOG.info(

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py in _api_query(self, service, api_params, wait_if_rate_limit_exceeded, use_data_service)
    781 
    782         try:
--> 783             response = get_response_func(service, api_params)
    784         except Exception as e:
    785             _LOG.exception(e)

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/pvoutput.py in _get_data_service_response(self, service, api_params)
    844             self.data_service_url, 'service/r2/{}.jsp'.format(service))
    845 
--> 846         return _get_response(api_url, api_params, headers)
    847 
    848     def _check_api_params(self):

~/dev/python/openclimatefix/solar/pvoutput/pvoutput/utils.py in _get_response(api_url, api_params, headers)
     83     full_api_url = '{}?{}'.format(api_url, api_params_str)
     84     session = _get_session_with_retry()
---> 85     response = session.get(full_api_url, headers=headers)
     86     _LOG.debug(
     87         'response: status_code=%d; headers=%s',

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/requests/sessions.py in get(self, url, **kwargs)
    544 
    545         kwargs.setdefault('allow_redirects', True)
--> 546         return self.request('GET', url, **kwargs)
    547 
    548     def options(self, url, **kwargs):

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    531         }
    532         send_kwargs.update(settings)
--> 533         resp = self.send(prep, **send_kwargs)
    534 
    535         return resp

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/requests/sessions.py in send(self, request, **kwargs)
    644 
    645         # Send the request
--> 646         r = adapter.send(request, **kwargs)
    647 
    648         # Total elapsed time of the request (approximately)

~/miniconda3/envs/pvoutput/lib/python3.7/site-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515 
--> 516             raise ConnectionError(e, request=request)
    517 
    518         except ClosedPoolError as e:

ConnectionError: HTTPConnectionPool(host='data.pvoutput.org', port=80): Max retries exceeded with url: /service/r2/getbatchstatus.jsp?sid1=14776&dt=20160426&key=3f784ff6cfa27c44436f88da0c429b410687ad14&sid=68732 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8788ce06d8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
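The root cause in this traceback is a temporary DNS failure, which exhausted the retries of the underlying HTTP session and aborted the whole download. One possible mitigation is an outer retry loop with exponential backoff around each API call, sketched below. It is not the library's actual behaviour: the function name and parameters are invented, and the default `retry_on` is Python's built-in `ConnectionError`, so with requests you would pass `requests.exceptions.ConnectionError` instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 5,
    initial_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions and doubling the delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the last error.
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so that the waiting can be replaced in tests; in normal use the default `time.sleep` applies.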

Unittest fails on Jenkins Build: AttributeError: 'numpy.bytes_' object has no attribute 'get'

https://travis-ci.com/github/openclimatefix/pvoutput/builds/203478127

=================================== FAILURES ===================================
_______________________ test_get_date_ranges_to_download _______________________
    def test_get_date_ranges_to_download():
        date_ranges = utils.get_date_ranges_to_download(
>           TEST_HDF, PV_SYSTEM, '2018-01-01', '2019-01-10')
pvoutput/tests/test_utils.py:36: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../virtualenv/python3.7.1/src/pvoutput/pvoutput/utils.py:123: in get_date_ranges_to_download
    store_filename, system_id)
../../../virtualenv/python3.7.1/src/pvoutput/pvoutput/utils.py:174: in get_dates_already_downloaded
    key=key, columns=['datetime', 'query_date'])
../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pandas/io/pytables.py:867: in select
    return it.get_result()
../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pandas/io/pytables.py:1921: in get_result
    results = self.func(self.start, self.stop, where)
../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pandas/io/pytables.py:851: in func
    return s.read(start=_start, stop=_stop, where=_where, columns=columns)
../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pandas/io/pytables.py:4474: in read
    result = self._read_axes(where=where, start=start, stop=stop)
../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pandas/io/pytables.py:3679: in _read_axes
    a.set_info(self.info)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = name->index,cname->index,axis->0,pos->0,kind->datetime64
info = b'(dp0\nI1\n(dp1\nVnames\np2\n(lp3\nNasVtype\np4\nVIndex\np5\nssVindex\np6\n(dp7\nVfreq\np8\nccopy_reg\n_reconstructor...\nsbsVindex_name\np19\nVdatetime\np20\nssVdatetime_of_API_request\np21\n(dp22\nVtz\np23\ncpytz\n_UTC\np24\n(tRp25\nss.'
    def set_info(self, info):
        """ set my state from the passed info """
>       idx = info.get(self.name)
E       AttributeError: 'numpy.bytes_' object has no attribute 'get'
../../../virtualenv/python3.7.1/lib/python3.7/site-packages/pandas/io/pytables.py:2179: AttributeError

add pre-commit-config.yaml

Detailed Description

add pre-commit yaml to the repo

Context

Useful to have some formatting automatically

Possible Implementation

Copy from other repo
