suminb / finance Goto Github PK

View Code? Open in Web Editor NEW

131.0 131.0 10.0 3.19 MB

경제적 자유로의 여정

License: BSD 4-Clause "Original" or "Old" License

Python 64.29% HTML 33.58% Mako 0.20% Shell 0.27% HCL 1.66%

bonds cagr finance investment python stock

finance's People

Contributors

Stargazers

Watchers

Forkers

freeyang lijielife magictoy wuxiaolei499390725 csyouk jungmin03 leetoo webhacking koorukuroo changhun-lee

finance's Issues

Manage `Asset` records

Need to provide a tool to manage Asset records. For example, when we issue a command like this,

finance request_import_stock_values AAPL

we must be able to assure that AAPL exists or create one by fetching some data from a designated data source such as Yahoo Finance.

Sample data: view-source:https://finance.yahoo.com/quote/AAPL/profile?p=AAPL (root.App.main)

Set up Alembic scripts

Initial model definitions
Write test cases for Alembic upgrade/downgrade

동사 일관성 갖추기

Verb	Source	Target
`fetch`	Remote structured/unstructured data	Local structured data
`parse`	Local structured/unstructured data	CSV
`import`	Local structured data	Database
`insert`	Local structured/unstructured data	Database

TODO

보유 자산 목록
비중, 금액
CAGR
PER, BPR

Remove Flask-related stuff

This is going to be a non-trivial task, as we are going to have to replace Flask-SQLAlchemy with pure SQLAlchemy.

Show dividend payout schedules

Data can be obtained from

http://www.nasdaq.com/dividend-stocks/dividend-calendar.aspx
http://www.seibro.or.kr/websquare/control.jsp?w2xPath=/IPORTAL/user/company/BIP_CNTS01022V.xml&menuNo=20 (However, this does not disclose the amount of dividend)
https://www.investing.com/dividends-calendar/

Outline

We are currently using a numeric(20, 4) type to store monetary values, but numeric(10, 4) should be enough. We would like to see such a change will decrease the amount of storage space needed and potentially increase the overall performance.

https://www.postgresql.org/docs/9.1/static/datatype-numeric.html

Counting the number of bytes per row

WITH x AS (
   SELECT count(*)               AS ct
        , sum(length(t::text))   AS txt_len  -- length in characters
        , 'asset_value'::regclass AS tbl  -- provide (qualified) table name here
   FROM  asset_value t  -- ... and here
   )
, y AS (
   SELECT ARRAY [pg_relation_size(tbl)
               , pg_relation_size(tbl, 'vm')
               , pg_relation_size(tbl, 'fsm')
               , pg_table_size(tbl)
               , pg_indexes_size(tbl)
               , pg_total_relation_size(tbl)
               , txt_len
             ] AS val
        , ARRAY ['core_relation_size'
               , 'visibility_map'
               , 'free_space_map'
               , 'table_size_incl_toast'
               , 'indexes_size'
               , 'total_size_incl_toast_and_indexes'
               , 'live_rows_in_text_representation'
             ] AS name
   FROM   x
   )
SELECT unnest(name)                AS what
     , unnest(val)                 AS "bytes/ct"
     , pg_size_pretty(unnest(val)) AS bytes_pretty
     , unnest(val) / ct            AS bytes_per_row
FROM   x, y

UNION ALL SELECT '------------------------------', NULL, NULL, NULL
UNION ALL SELECT 'row_count', ct, NULL, NULL FROM x
UNION ALL SELECT 'live_tuples', pg_stat_get_live_tuples(tbl), NULL, NULL FROM x
UNION ALL SELECT 'dead_tuples', pg_stat_get_dead_tuples(tbl), NULL, NULL FROM x;

See https://dba.stackexchange.com/questions/23879/measure-the-size-of-a-postgresql-table-row for more details.

Result

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4669440 | 4560 kB      |           104
 visibility_map                    |     8192 | 8192 bytes   |             0
 free_space_map                    |    24576 | 24 kB        |             0
 table_size_incl_toast             |  4702208 | 4592 kB      |           105
 indexes_size                      |  4227072 | 4128 kB      |            94
 total_size_incl_toast_and_indexes |  8929280 | 8720 kB      |           199
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |     1186 |              |              
(11 rows)

Take 1

Plans

Clone the current table (asset_value) as asset_value_cloned.
Alter the numeric(20, 4) columns as numeric(10, 4).
Run the same query to see if there is any sign of improvement.

Result

The entire table has been cloned as

CREATE TABLE asset_value_cloned AS SELECT * FROM asset_value

Interestingly, the table size has been shrunk. It's probably because there is no dead_tuples in the cloned table.

                           List of relations
 Schema |        Name        | Type  | Owner  |    Size    | Description 
--------+--------------------+-------+--------+------------+-------------
 public | asset_value        | table |        | 4592 kB    | 
 public | asset_value_cloned | table |        | 4408 kB    |

As a result, the average row size got smaller. (104 -> 101 bytes)

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4513792 | 4408 kB      |           101
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |        0 | 0 bytes      |             0
 table_size_incl_toast             |  4513792 | 4408 kB      |           101
 indexes_size                      |        0 | 0 bytes      |             0
 total_size_incl_toast_and_indexes |  4513792 | 4408 kB      |           101
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |        0 |              |              
(11 rows)

We altered the type of the numeric columns as follows.

ALTER TABLE asset_value_cloned ALTER open TYPE numeric(10, 4);
ALTER TABLE asset_value_cloned ALTER high TYPE numeric(10, 4);
ALTER TABLE asset_value_cloned ALTER low TYPE numeric(10, 4);
ALTER TABLE asset_value_cloned ALTER close TYPE numeric(10, 4);

No luck. The table size remains unaffected.

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4513792 | 4408 kB      |           101
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |        0 | 0 bytes      |             0
 table_size_incl_toast             |  4513792 | 4408 kB      |           101
 indexes_size                      |        0 | 0 bytes      |             0
 total_size_incl_toast_and_indexes |  4513792 | 4408 kB      |           101
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |        0 |              |              
(11 rows)

Take 2

Plans

Create a new, empty table with a new column type (numeric(10, 4)).
Copy over the entire data set into the new table.

Result

CREATE TABLE asset_value_cloned (
    id bigint NULL,
    asset_id bigint NULL,
    base_asset_id bigint NULL,
    evaluated_at timestamp without time zone NULL,
    source asset_value_source NULL,
    granularity granularity NULL,
    open numeric(10,4) NULL,
    high numeric(10,4) NULL,
    low numeric(10,4) NULL,
    close numeric(10,4) NULL,
    volume integer NULL);

INSERT INTO asset_value_cloned SELECT * FROM asset_value;

               what                | bytes/ct | bytes_pretty | bytes_per_row 
-----------------------------------+----------+--------------+---------------
 core_relation_size                |  4513792 | 4408 kB      |           101
 visibility_map                    |        0 | 0 bytes      |             0
 free_space_map                    |    24576 | 24 kB        |             0
 table_size_incl_toast             |  4538368 | 4432 kB      |           101
 indexes_size                      |        0 | 0 bytes      |             0
 total_size_incl_toast_and_indexes |  4538368 | 4432 kB      |           101
 live_rows_in_text_representation  |  5046614 | 4928 kB      |           112
 ------------------------------    |          |              |              
 row_count                         |    44668 |              |              
 live_tuples                       |    44668 |              |              
 dead_tuples                       |        0 |              |              
(11 rows)

No changes in terms of bytes_per_row after the alteration. We will put this one on halt and re-visit in the future.

Make a datetime utility function

Instead of directly using datetime.utcnow(). In that way, if we ever decide to use a different data type (e.g., float or int), our job becomes a bit easier.

Yahoo Finance daily data

Use plural forms for table names

Implement import_stock_values command

Load AssetValue from a CSV file.

Type checking with mypy

Travis CI support 'stages'. We could have a stage to run mypy for type checking.

Unique key for asset_value

Currently defined as:

-- auto-generated definition
create unique index asset_value_asset_id_evaluated_at_granularity_key
    on asset_value (asset_id, evaluated_at, granularity);

We're going to need source as the fourth element in the unique key as:

(asset_id, evaluated_at, granularity, source)

Use prefix for environment variables

We could use a prefix for environment variables in use. Perhaps SBF_ or something like that.

Batch insertion (with psycopyg2) is way too slow

We need to insert a large quantity of AssetValue records, but it's too slow. Takes more than 60sec to insert tens of rows.

This may be helpful...

Or we could use a thread pool to make things faster. We also could send all the insertion queries in a single batch (within a single transaction) but that will greatly increase the likelihood of failures, as we will try to insert data over overlapping periods most of the time.

미래에셋대우: 해외거래내역

화면번호 9465

거래일자

yyyy/mm/dd

번호

A daily sequential number

원번

???

거래구분

주식매수
은행이체입금
외화인지세
예이용료
주식매도
해외주배당금
해외주매수
환전매수
배당금입금
해외주매도

통화구분

USD, KRW, ...

외화입출금액

원화입출금액

종목코드, 종목명

Looks like some internal code

KR7005380001	현대차
KR7232080002	TIGER 코스닥150
US8552441094	스타벅스

수량

Number of shares

가격

Unit price in the specified currency

원화금잔금액

외화금잔금액

외화수수료

제세금(원화)

변제금액

미수발생금액

기준환율

지점명

적요코드

서비스매체구분

의뢰인명

전자공시(DART) 데이터 가져오기

작업 개요

전자공시 데이를 가져와서 구조화된 형태로 저장하고 관리함으로써 나중에 필요한 형태로 가공하거나, 필터링 해서 알림을 줄 수 있는 시스템을 만든다.

Example Request

curl 'http://m.dart.fss.or.kr/md3002/search.st?currentPage=2&maxResultCnt=15&corporationType=&textCrpNm=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90&textCrpCik=00126380&startDate=20160907&endDate=20170307&publicType=&publicOrgType=&reportName=&textPresenterNm=&finalReport=&lastRcpNo=20170303800956&totalPage=&textTerm=&_=1488893683281' -H 'Cookie: JSESSIONID=xkpZNbiUquaV4eRUBcjdaEa2OYEYZJhDKRMWmOrKoM6ixFJogocZBG5UhVgpY2lz.dart2_servlet_engine1' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en-US,en;q=0.8,ko;q=0.6' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Referer: http://m.dart.fss.or.kr/html_mdart/MD1005.html?startDate=20160907&endDate=20170307&textCrpCik=00126380&screenid=md1001&stype=md1001&currentPage=1&corporationType=&textCrpNm=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90&textCrpNmAuto=%EC%82%BC%EC%84%B1%EC%A0%84%EC%9E%90' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' --compressed

Example Response

{
    "rlist": [{
        "curr_page": 3,
        "row_num": 31,
        "bsn_yr": null,
        "crc_gubun": "N",
        "crc_nm": null,
        "crp_rgs_cls": "P",
        "dsm_crp_cik": "00126380",
        "dcm_id": null,
        "dcm_no": null,
        "dcm_nm": null,
        "dcm_int_tp": "N",
        "flr_cik": "00725534",
        "ifm_nm": "삼성전자",
        "ifm_nm2": "권오현",
        "rpt_id": "00634",
        "rpt_nm": "임원ㆍ주요주주특정증권등소유상황보고서",
        "rcp_no": "20161222000173",
        "rcp_dm": "2016.12.22",
        "sub_com_tp": "N",
        "rpt_title": null,
        "dcm_tp": "0",
        "rptNm": "임원ㆍ주요주주특정증권등소유상황보고서 "
    }, {

NOTE: 여기서 20161222000173와 같은 값은 일종의 식별자로 보고서로 바로 넘어갈 수 있는 키가 된다.

http://m.dart.fss.or.kr/html_mdart/MD1007.html?rcpNo=20161222000173

문서의 세부 항목은 다음과 같이 접근 가능하다.

curl 'http://m.dart.fss.or.kr/report/main.do' -H 'Cookie: JSESSIONID=xkpZNbiUquaV4eRUBcjdaEa2OYEYZJhDKRMWmOrKoM6ixFJogocZBG5UhVgpY2lz.dart2_servlet_engine1' -H 'Origin: http://m.dart.fss.or.kr' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8,ko;q=0.6' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded' -H 'Accept: text/html, */*; q=0.01' -H 'Referer: http://m.dart.fss.or.kr/html_mdart/MD1007.html?rcpNo=20161222000173' -H 'X-Requested-With: XMLHttpRequest' -H 'Connection: keep-alive' --data 'rcpNo=20161222000173&dcmNo=5390164&eleId=2' --compressed

여기서 eleId가 세부 항목의 아이디로 보여진다.

TODO

특정 종목에 대한 전자공시 보고서 리스트 받아오기
특정 종목에 대한 전자공시 리스트 모든 페이지 받아오기
특정 종목에 대한 전자공시 내용 받아오기
잘못된 종목 코드가 주어졌을 때의 처리
(미리 등록된) 모든 종목에 대한 전자공시 리스트 및 내용 받아오기 (이것도 따로 이슈 만들어서 진행하는 편이 좋을 것 같다)
전자공시 데이터를 웹 인터페이스로 보여주기 (이건 따로 이슈를 만들어서 진행하는게 좋을 듯)

Date format in CSV

Current:

1514903400, 195.77999877929688, 199.5, 194.5, 199.35000610351562, 8890400, yahoo

What we want:

2018-01-01 10:00:00, 195.77999877929688, 199.5, 194.5, 199.35000610351562, 8890400, yahoo

See ISO 8601.

Remove Asset.name field

Record data source

For each row of AssetValue, we need to record the source.

미래에셋대우: 거래내역조회

화면번호 0650 (거래내역조회), '한줄로 보기'

거래일자

Date format: yyyy/mm/dd
Example: 2016/12/29

번호

An one-based daily sequential number

원번

Looks like some kind of secondary index

거래구분

주식매수
은행이체입금
외화인지세
주식매도
해외주매수
환전매수
해외주매도
배당금입금
해외주배당금

lines = open('transaction_types.txt').readlines()
types = set([x.strip() for x in lines])

for t in types:
    print('- {}'.format(t))

입출금액

Total transaction amount (including fees and taxes)

종목명

Some examples:

USD(단위-cent)
GS리테일
KODEX 200

Other fields to be analyzed

수량 가격 예수금금잔 수수료 변제금액 미수발생금액 유가잔고 지점명 매체구분 의뢰인명/메모 제세금 단가 상대기관 상대계좌명 입금메모 출금메모

전자공시(DART) CRP 코드 받아오기

종목 이름, CPR 코드 매핑이 필요하다.

e.g., 삼성전자, 00126380

Make a dashboard to monitor some indicators

Securely store DB_URL with Terraform and AWS

Option 1: Hard-code in a Terraform file

Oh, hell no. It ain't gonna happen.

Option 2: DynamoDB

We could dedicate one record in DynamoDB to store DB_URL, but is it worth to introduce additional complexity within our architecture just to accomplish this?

Option 3: S3

Pretty much the same idea as Option 2, but in S3.

Option 4: Embed it in build-time

Embed DB_URL when assembling a Lambda .zip package.

Option 5: Input variables

terraform apply -var db_url=$DB_URL

TODO

Detect if a ticker has been delisted (6047bf7)
Discover new tickers
Calculate MDM

suminb / finance Goto Github PK

finance's People

Contributors

Stargazers

Watchers

Forkers

finance's Issues

TODO

Outline

Counting the number of bytes per row

Result

Take 1

Plans

Result

Take 2

Plans

Result

거래일자

번호

원번

거래구분

통화구분

외화입출금액

원화입출금액

종목코드, 종목명

수량

가격

원화금잔금액

외화금잔금액

외화수수료

제세금(원화)

변제금액

미수발생금액

기준환율

지점명

적요코드

서비스매체구분

의뢰인명

작업 개요

Example Request

Example Response

TODO

거래일자

번호

원번

거래구분

입출금액

종목명

Other fields to be analyzed

Option 1: Hard-code in a Terraform file

Option 2: DynamoDB

Option 3: S3

Option 4: Embed it in build-time

Option 5: Input variables

Recommend Projects

Recommend Topics

Recommend Org