dgpc's Introduction

cnugteren

dgpc's People

Contributors

cnugteren, dcremonini, dependabot[bot], erikwt

Forkers

erikwt dcremonini

dgpc's Issues

Errors in tests

Running the tests using investpy==0.9.14 I get the errors below:

FAILED                            [100%]
tests/test_market.py:10 (test_etf_history)
def test_etf_history() -> None:
        """Tests querying historical ETF information using investpy, based on two open market days followed by 3 market
        closing days afterwards ('dag van de arbeid' and Saturday)."""
        dates = [datetime.date(2020, 4, 28) + datetime.timedelta(days=days) for days in range(0, 5)]
>       market_info, etf_name = market.get_data_by_isin(isin="IE00B4L5Y983", dates=tuple(dates), is_etf=True)

tests/test_market.py:15: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/market.py:79: in get_data_by_isin
    history = investpy.get_etf_historical_data(name, country=country, from_date=from_date, to_date=to_date)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

etf = 'ishares core msci world ucits', country = 'netherlands'
from_date = '28/04/2020', to_date = '09/05/2020', stock_exchange = None
as_json = False, order = 'ascending', interval = 'Daily'

    def get_etf_historical_data(etf, country, from_date, to_date, stock_exchange=None, as_json=False, order='ascending', interval='Daily'):
        """
        This function retrieves historical data from the introduced `etf` from Investing via Web Scraping on the
        introduced date range. The resulting data can it either be stored in a :obj:`pandas.DataFrame` or in a
        :obj:`json` object with `ascending` or `descending` order.
    
        Args:
            etf (:obj:`str`): name of the etf to retrieve recent historical data from.
            country (:obj:`str`): name of the country from where the etf is.
            from_date (:obj:`str`): date as `str` formatted as `dd/mm/yyyy`, from where data is going to be retrieved.
            to_date (:obj:`str`): date as `str` formatted as `dd/mm/yyyy`, until where data is going to be retrieved.
            as_json (:obj:`bool`, optional):
                to determine the format of the output data (:obj:`pandas.DataFrame` or :obj:`json`).
            order (:obj:`str`, optional):
                optional argument to define the order of the retrieved data (`ascending`, `asc` or `descending`, `desc`).
            interval (:obj:`str`, optional):
                value to define the historical data interval to retrieve, by default `Daily`, but it can also be `Weekly` or `Monthly`.
    
        Returns:
            :obj:`pandas.DataFrame` or :obj:`json`:
                The function returns either a :obj:`pandas.DataFrame` or a :obj:`json` file containing the retrieved
                recent data from the specified etf via argument. The dataset contains the open, high, low and close
                values for the selected etf on market days.
    
                The returned data is case we use default arguments will look like::
    
                    Date || Open | High | Low | Close | Currency | Exchange
                    -----||------|------|-----|-------|----------|---------
                    xxxx || xxxx | xxxx | xxx | xxxxx | xxxxxxxx | xxxxxxxx
    
                but if we define `as_json=True`, then the output will be::
    
                    {
                        name: name,
                        historical: [
                            {
                                date: dd/mm/yyyy,
                                open: x,
                                high: x,
                                low: x,
                                close: x,
                                currency: x,
                                exchange: x
                            },
                            ...
                        ]
                    }
    
        Raises:
            ValueError: raised whenever any of the arguments is not valid or errored.
            IOError: raised if etfs object/file not found or unable to retrieve.
            RuntimeError:raised if the introduced etf does not match any of the indexed ones.
            ConnectionError: raised if GET requests does not return 200 status code.
            IndexError: raised if etf information was unavailable or not found.
    
        Examples:
            >>> investpy.get_etf_historical_data(etf='bbva accion dj eurostoxx 50', country='spain', from_date='01/01/2010', to_date='01/01/2019')
                             Open   High    Low  Close Currency Exchange
                Date
                2011-12-07  23.70  23.70  23.70  23.62      EUR   Madrid
                2011-12-08  23.53  23.60  23.15  23.04      EUR   Madrid
                2011-12-09  23.36  23.60  23.36  23.62      EUR   Madrid
                2011-12-12  23.15  23.26  23.00  22.88      EUR   Madrid
                2011-12-13  22.88  22.88  22.88  22.80      EUR   Madrid
    
        """
    
        if not etf:
            raise ValueError("ERR#0031: etf parameter is mandatory and must be a valid etf name.")
    
        if not isinstance(etf, str):
            raise ValueError("ERR#0030: etf argument needs to be a str.")
    
        if country is None:
            raise ValueError("ERR#0039: country can not be None, it should be a str.")
    
        if country is not None and not isinstance(country, str):
            raise ValueError("ERR#0025: specified country value not valid.")
    
        if stock_exchange is not None and not isinstance(stock_exchange, str):
            raise ValueError("ERR#0125: specified stock_exchange value is not valid, it should be a str.")
    
        if not isinstance(as_json, bool):
            raise ValueError("ERR#0002: as_json argument can just be True or False, bool type.")
    
        if order not in ['ascending', 'asc', 'descending', 'desc']:
            raise ValueError("ERR#0003: order argument can just be ascending (asc) or descending (desc), str type.")
    
        if not interval:
            raise ValueError("ERR#0073: interval value should be a str type and it can just be either 'Daily', 'Weekly' or 'Monthly'.")
    
        if not isinstance(interval, str):
            raise ValueError("ERR#0073: interval value should be a str type and it can just be either 'Daily', 'Weekly' or 'Monthly'.")
    
        if interval not in ['Daily', 'Weekly', 'Monthly']:
            raise ValueError("ERR#0073: interval value should be a str type and it can just be either 'Daily', 'Weekly' or 'Monthly'.")
    
        try:
            datetime.strptime(from_date, '%d/%m/%Y')
        except ValueError:
            raise ValueError("ERR#0011: incorrect data format, it should be 'dd/mm/yyyy'.")
    
        try:
            datetime.strptime(to_date, '%d/%m/%Y')
        except ValueError:
            raise ValueError("ERR#0011: incorrect data format, it should be 'dd/mm/yyyy'.")
    
        start_date = datetime.strptime(from_date, '%d/%m/%Y')
        end_date = datetime.strptime(to_date, '%d/%m/%Y')
    
        if start_date >= end_date:
            raise ValueError("ERR#0032: to_date should be greater than from_date, both formatted as 'dd/mm/yyyy'.")
    
        date_interval = {
            'intervals': [],
        }
    
        flag = True
    
        while flag is True:
            diff = end_date.year - start_date.year
    
            if diff > 19:
                obj = {
                    'start': start_date.strftime('%m/%d/%Y'),
                    'end': start_date.replace(year=start_date.year + 19).strftime('%m/%d/%Y'),
                }
    
                date_interval['intervals'].append(obj)
    
                start_date = start_date.replace(year=start_date.year + 19, day=start_date.day + 1)
            else:
                obj = {
                    'start': start_date.strftime('%m/%d/%Y'),
                    'end': end_date.strftime('%m/%d/%Y'),
                }
    
                date_interval['intervals'].append(obj)
    
                flag = False
    
        interval_limit = len(date_interval['intervals'])
        interval_counter = 0
    
        data_flag = False
    
        resource_package = 'investpy'
        resource_path = '/'.join(('resources', 'etfs', 'etfs.csv'))
        if pkg_resources.resource_exists(resource_package, resource_path):
            etfs = pd.read_csv(pkg_resources.resource_filename(resource_package, resource_path))
        else:
            raise FileNotFoundError("ERR#0058: etfs file not found or errored.")
    
        if etfs is None:
            raise IOError("ERR#0009: etfs object not found or unable to retrieve.")
    
        country = unidecode.unidecode(country.strip().lower())
    
        if country not in get_etf_countries():
            raise RuntimeError("ERR#0034: country " + country + " not found, check if it is correct.")
    
        etf = unidecode.unidecode(etf.strip().lower())
    
        def_exchange = etfs.loc[((etfs['name'].str.lower() == etf) & (etfs['def_stock_exchange'] == True)).idxmax()]
    
        etfs = etfs[etfs['country'].str.lower() == country]
    
        if etf not in [value for value in etfs['name'].str.lower()]:
            raise RuntimeError("ERR#0019: etf " + etf + " not found, check if it is correct.")
    
        etfs = etfs[etfs['name'].str.lower() == etf]
    
        if def_exchange['country'] != country:
            warnings.warn(
                'Selected country does not contain the default stock exchange of the introduced ETF. ' + \
                'Default country is: \"' + def_exchange['country'] + '\" and default stock_exchange: \"' + \
                def_exchange['stock_exchange'] + '\".',
                Warning
            )
    
            if stock_exchange:
                if stock_exchange.lower() not in etfs['stock_exchange'].str.lower().tolist():
                    raise ValueError("ERR#0126: introduced stock_exchange value does not exists, leave this parameter to None to use default stock_exchange.")
    
                etf_exchange = etfs.loc[(etfs['stock_exchange'].str.lower() == stock_exchange.lower()).idxmax(), 'stock_exchange']
            else:
                found_etfs = etfs[etfs['name'].str.lower() == etf]
    
                if len(found_etfs) > 1:
                    warnings.warn(
                        'Note that the displayed information can differ depending on the stock exchange. Available stock_exchange' + \
                        ' values for \"' + country + '\" are: \"' + '\", \"'.join(found_etfs['stock_exchange']) + '\".',
                        Warning
                    )
    
                del found_etfs
    
                etf_exchange = etfs.loc[(etfs['name'].str.lower() == etf).idxmax(), 'stock_exchange']
        else:
            if stock_exchange:
                if stock_exchange.lower() not in etfs['stock_exchange'].str.lower().tolist():
                    raise ValueError("ERR#0126: introduced stock_exchange value does not exists, leave this parameter to None to use default stock_exchange.")
    
                if def_exchange['stock_exchange'].lower() != stock_exchange.lower():
                    warnings.warn(
                        'Selected stock_exchange is not the default one of the introduced ETF. ' + \
                        'Default country is: \"' + def_exchange['country'] + '\" and default stock_exchange: \"' + \
                        def_exchange['stock_exchange'].lower() + '\".',
                        Warning
                    )
    
                etf_exchange = etfs.loc[(etfs['stock_exchange'].str.lower() == stock_exchange.lower()).idxmax(), 'stock_exchange']
            else:
                etf_exchange = def_exchange['stock_exchange']
    
        symbol = etfs.loc[((etfs['name'].str.lower() == etf) & (etfs['stock_exchange'].str.lower() == etf_exchange.lower())).idxmax(), 'symbol']
        id_ = etfs.loc[((etfs['name'].str.lower() == etf) & (etfs['stock_exchange'].str.lower() == etf_exchange.lower())).idxmax(), 'id']
        name = etfs.loc[((etfs['name'].str.lower() == etf) & (etfs['stock_exchange'].str.lower() == etf_exchange.lower())).idxmax(), 'name']
    
        etf_currency = etfs.loc[((etfs['name'].str.lower() == etf) & (etfs['stock_exchange'].str.lower() == etf_exchange.lower())).idxmax(), 'currency']
    
        final = list()
    
        header = symbol + ' Historical Data'
    
        for index in range(len(date_interval['intervals'])):
            interval_counter += 1
    
            params = {
                "curr_id": id_,
                "smlID": str(randint(1000000, 99999999)),
                "header": header,
                "st_date": date_interval['intervals'][index]['start'],
                "end_date": date_interval['intervals'][index]['end'],
                "interval_sec": interval,
                "sort_col": "date",
                "sort_ord": "DESC",
                "action": "historical_data"
            }
    
            head = {
                "User-Agent": get_random(),
                "X-Requested-With": "XMLHttpRequest",
                "Accept": "text/html",
                "Accept-Encoding": "gzip, deflate, br",
                "Connection": "keep-alive",
            }
    
            url = "https://www.investing.com/instruments/HistoricalDataAjax"
    
            req = requests.post(url, headers=head, data=params)
    
            if req.status_code != 200:
                raise ConnectionError("ERR#0015: error " + str(req.status_code) + ", try again later.")
    
            if not req.text:
                continue
    
            root_ = fromstring(req.text)
            path_ = root_.xpath(".//table[@id='curr_table']/tbody/tr")
    
            result = list()
    
            if path_:
                for elements_ in path_:
                    if elements_.xpath(".//td")[0].text_content() == 'No results found':
                        if interval_counter < interval_limit:
                            data_flag = False
                        else:
                            raise IndexError("ERR#0010: etf information unavailable or not found.")
                    else:
                        data_flag = True
    
                    info = []
    
                    for nested_ in elements_.xpath(".//td"):
                        info.append(nested_.get('data-real-value'))
    
                    if data_flag is True:
                        etf_date = datetime.strptime(str(datetime.fromtimestamp(int(info[0])).date()), '%Y-%m-%d')
    
                        etf_close = float(info[1].replace(',', ''))
                        etf_open = float(info[2].replace(',', ''))
                        etf_high = float(info[3].replace(',', ''))
                        etf_low = float(info[4].replace(',', ''))
    
                        result.insert(len(result),
                                      Data(etf_date, etf_open, etf_high, etf_low, etf_close, None, etf_currency, etf_exchange))
    
                if data_flag is True:
                    if order in ['ascending', 'asc']:
                        result = result[::-1]
                    elif order in ['descending', 'desc']:
                        result = result
    
                    if as_json is True:
                        json_ = {'name': name,
                                 'historical':
                                     [value.etf_as_json() for value in result]
                                 }
    
                        final.append(json_)
                    elif as_json is False:
                        df = pd.DataFrame.from_records([value.etf_to_dict() for value in result])
                        df.set_index('Date', inplace=True)
    
                        final.append(df)
            else:
>               raise RuntimeError("ERR#0004: data retrieval error while scraping.")
E               RuntimeError: ERR#0004: data retrieval error while scraping.

venv/lib/python3.8/site-packages/investpy/etfs.py:710: RuntimeError

Stock splits not supported

Stock splits don't work.

In the csv file they look a lot like normal buy and sell rows:

STOCK SPLIT: Koop 15 @ 442,68 USD
STOCK SPLIT: Verkoop 3 @ 2.213,4 USD

However, processing them as such still won't work, since retrieving market prices from before the split seems to return the split-adjusted price.
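One possible direction, sketched here under assumptions: derive the split ratio from the paired STOCK SPLIT rows and scale the pre-split share count by it, so that holdings line up with the split-adjusted prices the market data seems to return. The helper names and the regular expression are hypothetical and not part of dgpc; the number format (dot for thousands, comma for decimals) is inferred from the two example rows.

```python
import re
from fractions import Fraction
from typing import List, Tuple

# Hypothetical pattern for rows such as "STOCK SPLIT: Koop 15 @ 442,68 USD".
SPLIT_ROW = re.compile(r"STOCK SPLIT: (Koop|Verkoop) (\d+) @ ([\d.,]+) (\w+)")


def parse_split_row(description: str) -> Tuple[str, int, float]:
    """Returns (action, quantity, price) for a single STOCK SPLIT row."""
    match = SPLIT_ROW.fullmatch(description.strip())
    if match is None:
        raise ValueError(f"Not a stock split row: {description!r}")
    action, quantity, price, _currency = match.groups()
    # Assumed number format: '.' separates thousands, ',' marks decimals.
    return action, int(quantity), float(price.replace(".", "").replace(",", "."))


def split_ratio(rows: List[str]) -> Fraction:
    """Ratio of new shares ('Koop') to old shares ('Verkoop') for one split."""
    bought = 0
    sold = 0
    for row in rows:
        action, quantity, _price = parse_split_row(row)
        if action == "Koop":
            bought += quantity
        else:
            sold += quantity
    if sold == 0:
        raise ValueError("A split needs at least one 'Verkoop' row.")
    return Fraction(bought, sold)


rows = [
    "STOCK SPLIT: Koop 15 @ 442,68 USD",
    "STOCK SPLIT: Verkoop 3 @ 2.213,4 USD",
]
ratio = split_ratio(rows)  # 15 new shares replace 3 old ones
shares_before_split = 3
adjusted_shares = shares_before_split * ratio  # share count in post-split terms
```

With the two rows above, the ratio works out to 5, which matches the prices: 3 shares at 2.213,4 USD and 15 shares at 442,68 USD represent the same total value.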

bank_cash and nominal value don't work as expected

Hey!

First off, thanks for open sourcing this work, nicely done! I was super lucky to run into this. I just started a very similar side-project and it looks like I can build a POC of what I have in mind on top of this. I had some trouble getting started because of parsing issues, and I submitted fixes through a PR. Also, some transaction types are not yet supported, like "Rente" and a few others. I think I'll add those at some point. I also plan to look into the currency conversion, because the one used now does not match what is paid through degiro, probably because of the bid/ask spread. I do believe it's possible to derive it from the data though.

The bigger problem I see is that bank_cash and nominal value don't get calculated correctly, and from the code I also don't see how it's intended to work. If you have a degiro account where money goes in and out and back again, the data gets messed up. I was able to get the right data for my use case by using this as deposit code, and not using bank_cash at all.

if description in ("iDEAL storting", "Storting"):
    cash[date_index:] += mutation
    invested[date_index:] += mutation

elif description in ("Terugstorting",):
    cash[date_index:] += mutation
    invested[date_index:] += mutation

I can submit it in a PR, but I'll hold off to see if you respond with some more info about the intention.
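To make the idea concrete, here is a minimal self-contained sketch. It assumes that `cash` and `invested` hold one value per day and that `mutation` is signed (positive for deposits, negative for withdrawals). Plain Python lists stand in for the arrays used in the snippet above, and the `apply_transfer` helper and the sample amounts are hypothetical, not dgpc code.

```python
from typing import List

DEPOSITS = ("iDEAL storting", "Storting")
WITHDRAWALS = ("Terugstorting",)


def apply_transfer(cash: List[float], invested: List[float], date_index: int,
                   description: str, mutation: float) -> None:
    """Applies a deposit or withdrawal from `date_index` onwards, in place."""
    if description in DEPOSITS or description in WITHDRAWALS:
        # The mutation is assumed to be signed, so one update covers both cases.
        for day in range(date_index, len(cash)):
            cash[day] += mutation
            invested[day] += mutation


num_days = 5
cash = [0.0] * num_days
invested = [0.0] * num_days

apply_transfer(cash, invested, 0, "iDEAL storting", 1000.0)  # money in on day 0
apply_transfer(cash, invested, 2, "Terugstorting", -400.0)   # money out on day 2
apply_transfer(cash, invested, 4, "Storting", 250.0)         # money back in on day 4
```

In this sketch the invested amount follows the money as it goes in, out and back in again, without a separate bank_cash balance.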

Before the changes
dgpc-storting-fix

After the changes (this is closer to the correct representation). Ignore the profit/loss line, that's unrelated. Also note that the performance number is not valid anymore when you pull money out of the account, because your (unrealized) gain is now relative to a smaller number.
dgpc-fixed
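The effect on the performance number can be illustrated with made-up figures. This assumes performance is computed as the unrealized gain divided by the net amount invested, which is an assumption about the calculation and not taken from the dgpc code:

```python
def performance(total_value: float, invested: float) -> float:
    """Unrealized gain relative to the net amount invested."""
    return (total_value - invested) / invested


# Deposit 1000, portfolio grows to 1100: a gain of 100 on 1000 invested.
before_withdrawal = performance(total_value=1100.0, invested=1000.0)

# Withdraw 500: the same gain of 100 is now relative to only 500 invested.
after_withdrawal = performance(total_value=600.0, invested=500.0)
```

The gain is identical in both cases, yet the ratio doubles after the withdrawal, purely because the denominator shrank.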

Again, thanks for open sourcing and let me know if you have questions or remarks about the above. Cheers, Erik
