Git Product home page Git Product logo

finance's People

Contributors

imgbotapp avatar suminb avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

finance's Issues

Manage `Asset` records

Need to provide a tool to manage Asset records. For example, when we issue a command like this,

finance request_import_stock_values AAPL

we must be able to assure that AAPL exists or create one by fetching some data from a designated data source such as Yahoo Finance.

Sample data: view-source:https://finance.yahoo.com/quote/AAPL/profile?p=AAPL (root.App.main)

동사 일관성 갖추기

Verb Source Target
fetch Remote structured/unstructured data Local structured data
parse Local structured/unstructured data CSV
import Local structured data Database
insert Local structured/unstructured data Database

Remove Flask-related stuff

This is going to be a non-trivial task, as we are going to have to replace Flask-SQLAlchemy with pure SQLAlchemy.

Numeric type

Outline

We are currently using a numeric(20, 4) type to store monetary values, but numeric(10, 4) should be enough. We would like to see such a change will decrease the amount of storage space needed and potentially increase the overall performance.

https://www.postgresql.org/docs/9.1/static/datatype-numeric.html

Counting the number of bytes per row

WITH x AS (
   SELECT count(*)               AS ct
        , sum(length(t::text))   AS txt_len  -- length in characters
        , 'asset_value'::regclass AS tbl  -- provide (qualified) table name here
   FROM  asset_value t  -- ... and here
   )
, y AS (
   SELECT ARRAY [pg_relation_size(tbl)
               , pg_relation_size(tbl, 'vm')
               , pg_relation_size(tbl, 'fsm')
               , pg_table_size(tbl)
               , pg_indexes_size(tbl)
               , pg_total_relation_size(tbl)
               , txt_len
             ] AS val
        , ARRAY ['core_relation_size'
               , 'visibility_map'
               , 'free_space_map'
               , 'table_size_incl_toast'
               , 'indexes_size'
               , 'total_size_incl_toast_and_indexes'
               , 'live_rows_in_text_representation'
             ] AS name
   FROM   x
   )
SELECT unnest(name)                AS what
     , unnest(val)                 AS "bytes/ct"
     , pg_size_pretty(unnest(val)) AS bytes_pretty
     , unnest(val) / ct            AS bytes_per_row
FROM   x, y

UNION ALL SELECT '------------------------------', NULL, NULL, NULL
UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x
UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x
UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;

See https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row for more details.

Result

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4669440 | 4560 kB      |           104
 visibility_map                    |     8192 | 8192 bytes   |             0
 free_space_map                    |    24576 | 24 kB        |             0
 table_size_incl_toast             |  4702208 | 4592 kB      |           105
 indexes_size                      |  4227072 | 4128 kB      |            94
 total_size_incl_toast_and_indexes |  8929280 | 8720 kB      |           199
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |     1186 |              |              
(11 rows)

Take 1

Plans

  1. Clone the current table (asset_value) as asset_value_cloned.
  2. Alter the numeric(20, 4) columns as numeric(10, 4).
  3. Run the same query to see if there is any sign of improvement.

Result

The entire table has been cloned as

CREATE TABLE asset_value_cloned AS SELECT * FROM asset_value

Interestingly, the table size has been shrunk. It's probably because there is no dead_tuples in the cloned table.

                           List of relations
 Schema |        Name        | Type  | Owner  |    Size    | Description 
--------+--------------------+-------+--------+------------+-------------
 public | asset_value        | table |        | 4592 kB    | 
 public | asset_value_cloned | table |        | 4408 kB    | 

As a result, the average row size got smaller. (104 -> 101 bytes)

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4513792 | 4408 kB      |           101
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |        0 | 0 bytes      |             0
 table_size_incl_toast             |  4513792 | 4408 kB      |           101
 indexes_size                      |        0 | 0 bytes      |             0
 total_size_incl_toast_and_indexes |  4513792 | 4408 kB      |           101
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |        0 |              |              
(11 rows)

We altered the type of the numeric columns as follows.

ALTER TABLE asset_value_cloned ALTER open TYPE numeric(10, 4);
ALTER TABLE asset_value_cloned ALTER high TYPE numeric(10, 4);
ALTER TABLE asset_value_cloned ALTER low TYPE numeric(10, 4);
ALTER TABLE asset_value_cloned ALTER close TYPE numeric(10, 4);

No luck. The table size remains unaffected.

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4513792 | 4408 kB      |           101
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |        0 | 0 bytes      |             0
 table_size_incl_toast             |  4513792 | 4408 kB      |           101
 indexes_size                      |        0 | 0 bytes      |             0
 total_size_incl_toast_and_indexes |  4513792 | 4408 kB      |           101
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |        0 |              |              
(11 rows)

Take 2

Plans

  1. Create a new, empty table with a new column type (numeric(10, 4)).
  2. Copy over the entire data set into the new table.

Result

CREATE TABLE asset_value_cloned (
    id bigint NULL,
    asset_id bigint NULL,
    base_asset_id bigint NULL,
    evaluated_at timestamp without time zone NULL,
    source asset_value_source NULL,
    granularity granularity NULL,
    open numeric(10,4) NULL,
    high numeric(10,4) NULL,
    low numeric(10,4) NULL,
    close numeric(10,4) NULL,
    volume integer NULL);
INSERT INTO asset_value_cloned SELECT * FROM asset_value;
               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4513792 | 4408 kB      |           101
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |    24576 | 24 kB        |             0
 table_size_incl_toast             |  4538368 | 4432 kB      |           101
 indexes_size                      |        0 | 0 bytes      |             0
 total_size_incl_toast_and_indexes |  4538368 | 4432 kB      |           101
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |        0 |              |              
(11 rows)

No changes in terms of bytes_per_row after the alteration. We will put this one on halt and re-visit in the future.

Make a datetime utility function

Instead of directly using datetime.utcnow(). In that way, if we ever decide to use a different data type (e.g., float or int), our job becomes a bit easier.

Unique key for asset_value

Currently defined as:

-- auto-generated definition
create unique index asset_value_asset_id_evaluated_at_granularity_key
    on asset_value (asset_id, evaluated_at, granularity);

We're going to need source as the fourth element in the unique key as:

(asset_id, evaluated_at, granularity, source)

Batch insertion (with psycopyg2) is way too slow

We need to insert a large quantity of AssetValue records, but it's too slow. Takes more than 60sec to insert tens of rows.

This may be helpful...

Or we could use a thread pool to make things faster. We also could send all the insertion queries in a single batch (within a single transaction) but that will greatly increase the likelihood of failures, as we will try to insert data over overlapping periods most of the time.

미래에셋대우: 해외거래내역

화면번호 9465

거래일자

yyyy/mm/dd

번호

A daily sequential number

원번

???

거래구분

  • 주식매수
  • 은행이체입금
  • 외화인지세
  • 예이용료
  • 주식매도
  • 해외주배당금
  • 해외주매수
  • 환전매수
  • 배당금입금
  • 해외주매도

통화구분

USD, KRW, ...

외화입출금액

원화입출금액

종목코드, 종목명

Looks like some internal code

KR7005380001	현대차
KR7232080002	TIGER 코스닥150
US8552441094	스타벅스

수량

Number of shares

가격

Unit price in the specified currency

원화금잔금액

외화금잔금액

외화수수료

제세금(원화)

변제금액

미수발생금액

기준환율

지점명

적요코드

서비스매체구분

의뢰인명

전자공시(DART) 데이터 가져오기

작업 개요

전자공시 데이를 가져와서 구조화된 형태로 저장하고 관리함으로써 나중에 필요한 형태로 가공하거나, 필터링 해서 알림을 줄 수 있는 시스템을 만든다.

Example Request

curl 'http://m.dart.fss.or.kr/md3002/search.st?currentPage=2&maxResultCnt=15&corporationType=&textCrpNm=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90&textCrpCik=00126380&startDate=20160907&endDate=20170307&publicType=&publicOrgType=&reportName=&textPresenterNm=&finalReport=&lastRcpNo=20170303800956&totalPage=&textTerm=&_=1488893683281' -H 'Cookie: JSESSIONID=xkpZNbiUquaV4eRUBcjdaEa2OYEYZJhDKRMWmOrKoM6ixFJogocZBG5UhVgpY2lz.dart2_servlet_engine1' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8,ko;q=0.6' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Referer: http://m.dart.fss.or.kr/html_mdart/MD1005.html?startDate=20160907&endDate=20170307&textCrpCik=00126380&screenid=md1001&stype=md1001&currentPage=1&corporationType=&textCrpNm=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90&textCrpNmAuto=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' --compressed

Example Response

{
    "rlist": [{
        "curr_page": 3,
        "row_num": 31,
        "bsn_yr": null,
        "crc_gubun": "N",
        "crc_nm": null,
        "crp_rgs_cls": "P",
        "dsm_crp_cik": "00126380",
        "dcm_id": null,
        "dcm_no": null,
        "dcm_nm": null,
        "dcm_int_tp": "N",
        "flr_cik": "00725534",
        "ifm_nm": "삼성전자",
        "ifm_nm2": "권오현",
        "rpt_id": "00634",
        "rpt_nm": "임원ㆍ주요주주특정증권등소유상황보고서",
        "rcp_no": "20161222000173",
        "rcp_dm": "2016.12.22",
        "sub_com_tp": "N",
        "rpt_title": null,
        "dcm_tp": "0",
        "rptNm": "임원ㆍ주요주주특정증권등소유상황보고서 "
    }, {

NOTE: 여기서 20161222000173와 같은 값은 일종의 식별자로 보고서로 바로 넘어갈 수 있는 키가 된다.

http://m.dart.fss.or.kr/html_mdart/MD1007.html?rcpNo=20161222000173

문서의 세부 항목은 다음과 같이 접근 가능하다.

curl 'http://m.dart.fss.or.kr/report/main.do' -H 'Cookie: JSESSIONID=xkpZNbiUquaV4eRUBcjdaEa2OYEYZJhDKRMWmOrKoM6ixFJogocZBG5UhVgpY2lz.dart2_servlet_engine1' -H 'Origin: http://m.dart.fss.or.kr' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8,ko;q=0.6' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html, */*; q=0.01' -H 'Referer: http://m.dart.fss.or.kr/html_mdart/MD1007.html?rcpNo=20161222000173' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' --data 'rcpNo=20161222000173&dcmNo=5390164&eleId=2' --compressed

여기서 eleId가 세부 항목의 아이디로 보여진다.

TODO

  • 특정 종목에 대한 전자공시 보고서 리스트 받아오기
  • 특정 종목에 대한 전자공시 리스트 모든 페이지 받아오기
  • 특정 종목에 대한 전자공시 내용 받아오기
  • 잘못된 종목 코드가 주어졌을 때의 처리
  • (미리 등록된) 모든 종목에 대한 전자공시 리스트 및 내용 받아오기 (이것도 따로 이슈 만들어서 진행하는 편이 좋을 것 같다)
  • 전자공시 데이터를 웹 인터페이스로 보여주기 (이건 따로 이슈를 만들어서 진행하는게 좋을 듯)

Date format in CSV

Current:

1514903400, 195.77999877929688, 199.5, 194.5, 199.35000610351562, 8890400, yahoo

What we want:

2018-01-01 10:00:00, 195.77999877929688, 199.5, 194.5, 199.35000610351562, 8890400, yahoo

See ISO 8601.

미래에셋대우: 거래내역조회

화면번호 0650 (거래내역조회), '한줄로 보기'

거래일자

Date format: yyyy/mm/dd
Example: 2016/12/29

번호

An one-based daily sequential number

원번

Looks like some kind of secondary index

거래구분

  • 주식매수
  • 은행이체입금
  • 외화인지세
  • 주식매도
  • 해외주매수
  • 환전매수
  • 해외주매도
  • 배당금입금
  • 해외주배당금
lines = open('transaction_types.txt').readlines()
types = set([x.strip() for x in lines])

for t in types:
    print('- {}'.format(t))

입출금액

Total transaction amount (including fees and taxes)

종목명

Some examples:

  • USD(단위-cent)
  • GS리테일
  • KODEX 200

Other fields to be analyzed

수량 가격 예수금금잔 수수료 변제금액 미수발생금액 유가잔고 지점명 매체구분 의뢰인명/메모 제세금 단가 상대기관 상대계좌명 입금메모 출금메모

Securely store DB_URL with Terraform and AWS

Option 1: Hard-code in a Terraform file

Oh, hell no. It ain't gonna happen.

Option 2: DynamoDB

We could dedicate one record in DynamoDB to store DB_URL, but is it worth to introduce additional complexity within our architecture just to accomplish this?

Option 3: S3

Pretty much the same idea as Option 2, but in S3.

Option 4: Embed it in build-time

Embed DB_URL when assembling a Lambda .zip package.

Option 5: Input variables

terraform apply -var db_url=$DB_URL

TODO

  • Detect if a ticker has been delisted (6047bf7)
  • Discover new tickers
  • Calculate MDM

Multiple base assets

Account.net_worth() should be able to aggregate all asset values and categorize them into base assets.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.