ai4finance-foundation / finnlp

Democratizing Internet-scale financial data.

Home Page: https://ai4finance.org

License: MIT License

Python 37.40% Jupyter Notebook 62.60%

finnlp's Issues

Twitter cannot download: AttributeError: 'DataFrame' object has no attribute 'created_at'

Getting started with FinNLP

Hi, I'm a PhD student & a beginner.

When I run this line of code I get an error message that there's no module named 'finnlp'.
from finnlp.data_sources.sec_filings import SECFilingsLoader

Following another post, I tried the lines of code below but got error messages as well. I'm trying to get access to earnings call transcripts and SEC filings. Could you help?

`#you have to clone first

!git clone https://github.com/AI4Finance-Foundation/FinNLP

#then change the directory

!cd FinNLP

#Add Repository to Python Path

import sys
sys.path.append('/content/FinNLP')`

fatal: could not create work tree dir 'FinNLP': Read-only file system
zsh:cd:1: no such file or directory: FinNLP
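The "Read-only file system" message suggests the clone was attempted from a directory that cannot be written to, and `/content/FinNLP` is a Colab-style path that will not exist on a local machine. A minimal sketch of one way around both problems, assuming `git` is installed; the helper names `clone_repo` and `add_repo_to_path` are hypothetical and not part of FinNLP:

```python
import subprocess
import sys
from pathlib import Path

REPO_URL = "https://github.com/AI4Finance-Foundation/FinNLP"


def clone_repo(target_parent: Path, repo_url: str = REPO_URL) -> Path:
    """Clone the repository into target_parent (must be writable) and return its path."""
    repo_dir = target_parent / "FinNLP"
    if not repo_dir.exists():
        subprocess.run(["git", "clone", repo_url, str(repo_dir)], check=True)
    return repo_dir


def add_repo_to_path(repo_dir: Path) -> None:
    """Put the repository root on sys.path so that `import finnlp` can resolve."""
    if not (repo_dir / "finnlp").is_dir():
        raise FileNotFoundError(f"no 'finnlp' package found under {repo_dir}")
    resolved = str(repo_dir.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)


# Example usage (hypothetical; the home directory is normally writable):
# repo = clone_repo(Path.home())
# add_repo_to_path(repo)
```

Using an absolute path taken from the actual clone location avoids relying on `cd`, which in a notebook `!` shell command does not persist to the next cell.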

Private accounts of social networks for accurate sentiment analysis

Would it be possible to develop or connect an API (or use those already available) so that private / proprietary accounts on the main networks, such as Twitter / X, Stocktwits, and Reddit, can also be used? This would improve sentiment analysis: most retail traders tend to trade against the direction of the market, so training the models on better data should make them more reliable.

connection error in dataframe values

run:

# Finnhub (Yahoo Finance, Reuters, SeekingAlpha, CNBC...)
from finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range

start_date = "2023-01-01"
end_date = "2023-01-03"
config = {
    "use_proxy": "us_free",    # use proxies to prevent ip blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "YOUR_FINNHUB_TOKEN"  # Available at https://finnhub.io/dashboard
}

news_downloader = Finnhub_Date_Range(config)                      # init
news_downloader.download_date_range_stock(start_date,end_date)    # Download headers
news_downloader.gather_content()                                  # Download contents
df = news_downloader.dataframe
selected_columns = ["headline", "content"]
df[selected_columns].head(10)

# 	headline						content
# 0	My 26-Stock $349k Portfolio Gets A Nice Petrob...	Home\nInvesting Strategy\nPortfolio Strategy\n...
# 1	Apple’s Market Cap Slides Below $2 Trillion fo...	Error
# 2	US STOCKS-Wall St starts the year with a dip; ...	(For a Reuters live blog on U.S., UK and Europ...
# 3	Buy 4 January Dogs Of The Dow, Watch 4 More	Home\nDividends\nDividend Quick Picks\nBuy 4 J...
# 4	Apple's stock market value falls below $2 tril...	Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 5	CORRECTED-UPDATE 1-Apple's stock market value ...	Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 6	Apple Stock Falls Amid Report Of Product Order...	Apple stock got off to a slow start in 2023 as...
# 7	US STOCKS-Wall St starts the year with a dip; ...	Summary\nCompanies\nTesla shares plunge on Q4 ...
# 8	More than $1 trillion wiped off value of Apple...	apple store\nMore than $1 trillion has been wi...

====================================================================
Error:
connection error in dataframe values as shown below

How to install FinNLP?

Would you consider supporting cryptocurrency market information?

Would you consider supporting cryptocurrency market information, such as BTC and ETH?

finnlp/data_sources/news/eastmoney_streaming.py xpath bug.

The xpath of the page has changed, and the new xpath correction is as follows.

def _gather_pages(self, stock, page):
    ...
    # gather the content of the first page
    page = etree.HTML(response.text)
    trs = page.xpath('//*[@id="mainlist"]/div/ul/li[1]/table/tbody/tr')
    have_one = False
    for item in trs:
        have_one = True
        # each cell holds one field of the post listing
        read_amount = item.xpath("./td[1]//text()")[0]
        comments = item.xpath("./td[2]//text()")[0]
        title = item.xpath("./td[3]/div/a//text()")[0]
        content_link = item.xpath("./td[3]/div/a/@href")[0]
        author = item.xpath("./td[4]//text()")[0]
        time = item.xpath("./td[5]//text()")[0]
        tmp = pd.DataFrame([read_amount, comments, title, content_link, author, time]).T
        columns = ["read amount", "comments", "title", "content link", "author", "create time"]
        tmp.columns = columns
        self.dataframe = pd.concat([self.dataframe, tmp])
        # print(title)
    if not have_one:
        return "break"
    ...

('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) - Error

I ran into this error when running the same cell you provided in the README:

('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

The error is raised by each of these lines (all US data sources):
news_downloader = Finnhub_Date_Range(config)
downloader = Stocktwits_Streaming(config)
downloader = SEC_Announcement(config)

My config was:
start_date = "2023-01-01"
end_date = "2023-01-03"
config = {
    "use_proxy": "us_free",    # use proxies to prevent ip blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "YOUR_FINNHUB_TOKEN"  # Available at https://finnhub.io/dashboard
}

Any idea?

Stocktwits_Streaming demo has a typo

Hi, I think there is a typo in the demo code below:
downloader = Stocktwits_Streaming(config)
downloader.download_date_range_stock(stock, pages)
"download_date_range_stock" should be "download_streaming_stock" instead, as "download_date_range_stock" is not implemented in Stocktwits_Streaming.

US_proxy connection error

`from finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range

start_date = "2023-01-01"
end_date = "2023-01-02"
config = {
    "use_proxy": "us_free",    # use proxies to prevent ip blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "YOUR_FINNHUB_TOKEN"  # Available at https://finnhub.io/dashboard
}

news_downloader = Finnhub_Date_Range(config) # init
news_downloader.download_date_range_stock(start_date,end_date) # Download headers
news_downloader.gather_content() # Download contents
df = news_downloader.dataframe
selected_columns = ["headline", "content"]
print(df[selected_columns].head(10))
`

Fetching the US proxy list raises a connection error:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

The error occurs in the function get_us_free_proxy, at the line response = requests.get(url, headers=headers)
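A "connection reset by peer" on a single `requests.get` call is often transient, so one common mitigation is to retry the call with a growing delay. The sketch below is a generic illustration using only the standard library; `call_with_retry` is a hypothetical helper, not a FinNLP function, and it does not help if the proxy-list site blocks the client permanently:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_retry: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call func, retrying with exponential backoff on the given exception types."""
    for attempt in range(max_retry):
        try:
            return func()
        except retry_on:
            if attempt == max_retry - 1:
                raise  # out of attempts: surface the original error
            # wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retry must be at least 1")
```

The built-in `ConnectionError` covers `ConnectionResetError`, but `requests.exceptions.ConnectionError` is a separate class, so when wrapping a `requests.get` call it would need to be passed explicitly, for example `retry_on=(requests.exceptions.ConnectionError,)`.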

from finnlp.data_sources.news.finnhub import Finnhub_News

ModuleNotFoundError Traceback (most recent call last)
Cell In[17], line 9
7 from tqdm.notebook import tqdm
8 # from meta.data_processors.yahoofinance import Yahoofinance
----> 9 from finnlp.data_sources.news.finnhub import Finnhub_News
10 from finnlp.large_language_models.openai.openai_chat_agent import Openai_Chat_Agent

ModuleNotFoundError: No module named 'finnlp.data_sources.news.finnhub'

Where is the trade_with_gpt3.ipynb file?

ModuleNotFoundError: No module named 'unstructured'

from finnlp.data_sources.sec_filings import SECFilingsLoader

ModuleNotFoundError Traceback (most recent call last)
Cell In[43], line 1
----> 1 from finnlp.data_sources.sec_filings import SECFilingsLoader

File ~/FinNLP/finnlp/data_sources/sec_filings/__init__.py:1
----> 1 from finnlp.data_sources.sec_filings.main import SECFilingsLoader

File ~/FinNLP/finnlp/data_sources/sec_filings/main.py:1
----> 1 from finnlp.data_sources.sec_filings.sec_filings import SECExtractor
2 import concurrent.futures
3 import json

File ~/FinNLP/finnlp/data_sources/sec_filings/sec_filings.py:3
1 from typing import Any, Dict, List
----> 3 from finnlp.data_sources.sec_filings.prepline_sec_filings.sec_document import (
4 REPORT_TYPES,
5 VALID_FILING_TYPES,
6 SECDocument,
7 )
8 from finnlp.data_sources.sec_filings.prepline_sec_filings.sections import (
9 ALL_SECTIONS,
10 SECTIONS_10K,
(...)
14 validate_section_names,
15 )
16 from finnlp.data_sources.sec_filings.utils import get_filing_urls_to_download

File ~/FinNLP/finnlp/data_sources/sec_filings/prepline_sec_filings/sec_document.py:18
14 import numpy.typing as npt
17 from sklearn.cluster import DBSCAN
---> 18 from unstructured.cleaners.core import clean
19 from unstructured.documents.elements import (
20 Element,
21 ListItem,
(...)
24 Title,
25 )
26 from unstructured.documents.html import HTMLDocument

ModuleNotFoundError: No module named 'unstructured'
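The traceback shows that `sec_document.py` imports third-party packages that are not installed in this environment. A quick way to see every missing dependency at once, rather than one `ModuleNotFoundError` at a time, is to probe for them before importing. This is a small sketch using the standard library; the list of names is read off the traceback above and may be incomplete, and `missing_modules` is a hypothetical helper:

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


# Top-level modules imported by sec_document.py according to the traceback.
REQUIRED = ["numpy", "sklearn", "unstructured"]

# Example usage:
# print(missing_modules(REQUIRED))
```

Anything reported as missing would then need to be installed with pip. Note that the package name on PyPI can differ from the import name; `sklearn`, for example, is distributed as `scikit-learn`.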

Evaluation time-consuming on FIQA

Hi,

I hope this message finds you well. First and foremost, I would like to express my gratitude for the incredible work you have put into this project; it has been instrumental in my work.

I am reaching out to seek guidance regarding the evaluation time of the model across different test sets. In my current setup, the evaluation phase is very time-consuming: on FIQA, for example, it takes roughly two to three hours to complete, even on an A6000 with batch size 64 for LLaMA. This duration persists across various test sets, which is why I am asking for your expertise.

I am wondering if there might be any specific recommendations or strategies that could potentially help in accelerating the evaluation process.

Here are a few questions I have in mind:

Are there any known bottlenecks in the evaluation process for FIQA that I should be aware of?
Could you please suggest any best practices or settings that could help in reducing the evaluation time?
Is there any parallelization or optimization technique available that is recommended for speeding up the evaluation?
I am more than willing to provide additional information or clarify any aspects if needed. My main goal is to ensure that I am utilizing the tool to its fullest potential and in the most efficient manner.

Thank you very much for taking the time to read my inquiry. I am looking forward to any advice or suggestions you might have.

china_free proxy xpath bug

trs = res.xpath("/html/body/div[1]/div[4]/div[2]/div[2]/div[2]/table/tbody/tr")
This XPath returns an empty list.
Changing it to trs = res.xpath('//table/tbody/tr') fixes the problem.
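The reason the shorter expression is more robust is that the absolute path encodes the exact position of every wrapper `<div>`, so any layout change on the proxy site breaks it. The sketch below illustrates the difference on a made-up, simplified page. It uses the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath and needs well-formed markup) instead of lxml, so that it runs without extra dependencies; the markup and values are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page: the table sits under different wrappers
# than the ones the absolute path expects.
PAGE = """
<html><body>
  <div><div>
    <table><tbody>
      <tr><td>1.2.3.4</td><td>8080</td></tr>
      <tr><td>5.6.7.8</td><td>3128</td></tr>
    </tbody></table>
  </div></div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Position-dependent path: breaks as soon as a wrapper <div> is added or removed.
brittle = root.findall("./body/div[1]/div[4]/div[2]/div[2]/div[2]/table/tbody/tr")

# Relative path: matches the rows wherever the table sits in the page.
robust = root.findall(".//table/tbody/tr")

# Collect (ip, port) pairs from the first two cells of each row.
proxies = [(tr[0].text, tr[1].text) for tr in robust]
```

The trade-off is that `//table/tbody/tr` matches rows of every table on the page, so it is only safe while the proxy list is the only table there.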

Server error

I tried running the code below, as demonstrated in README.md:

# Finnhub (Yahoo Finance, Reuters, SeekingAlpha, CNBC...)
from finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range

start_date = "2023-01-01"
end_date = "2023-01-03"
config = {
    "use_proxy": "us_free",    # use proxies to prevent ip blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "finnhub_api_token"  # Available at https://finnhub.io/dashboard
}

news_downloader = Finnhub_Date_Range(config)                      # init
news_downloader.download_date_range_stock(start_date,end_date)    # Download headers
news_downloader.gather_content()                                  # Download contents
df = news_downloader.dataframe
selected_columns = ["headline", "content"]
df[selected_columns].head(10)

# 	headline						content
# 0	My 26-Stock $349k Portfolio Gets A Nice Petrob...	Home\nInvesting Strategy\nPortfolio Strategy\n...
# 1	Apple’s Market Cap Slides Below $2 Trillion fo...	Error
# 2	US STOCKS-Wall St starts the year with a dip; ...	(For a Reuters live blog on U.S., UK and Europ...
# 3	Buy 4 January Dogs Of The Dow, Watch 4 More	Home\nDividends\nDividend Quick Picks\nBuy 4 J...
# 4	Apple's stock market value falls below $2 tril...	Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 5	CORRECTED-UPDATE 1-Apple's stock market value ...	Jan 3 (Reuters) - Apple Inc's \n(AAPL.O)\n sto...
# 6	Apple Stock Falls Amid Report Of Product Order...	Apple stock got off to a slow start in 2023 as...
# 7	US STOCKS-Wall St starts the year with a dip; ...	Summary\nCompanies\nTesla shares plunge on Q4 ...
# 8	More than $1 trillion wiped off value of Apple...	apple store\nMore than $1 trillion has been wi...
# 9	McLean's Iridium inks agreement to put its sat...	The company hasn't named its partner, but it's...

but I got an error.

I then visited the URL https://openproxy.space/list/http and got this page

I get a module not found error when attempting to run the example in the docs

I get this error:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/media/yfprime/763D304F18FA13FA/tbot1/venv/lib/python3.10/site-packages/main.py", line 106, in <module>
stock_industry_category_cninfo_df = stock_industry_category_cninfo(
File "/media/yfprime/763D304F18FA13FA/tbot1/venv/lib/python3.10/site-packages/main.py", line 59, in stock_industry_category_cninfo
js_content = _get_file_content_ths("cninfo.js")
File "/media/yfprime/763D304F18FA13FA/tbot1/venv/lib/python3.10/site-packages/main.py", line 30, in _get_file_content_ths
setting_file_path = get_ths_js(file)
File "/media/yfprime/763D304F18FA13FA/tbot1/venv/lib/python3.10/site-packages/main.py", line 17, in get_ths_js
with resources.path(package="py_mini_racer.data", resource=file) as f:
File "/usr/lib/python3.10/importlib/resources.py", line 119, in path
reader = _common.get_resource_reader(_common.get_package(package))
File "/usr/lib/python3.10/importlib/_common.py", line 66, in get_package
resolved = resolve(package)
File "/usr/lib/python3.10/importlib/_common.py", line 57, in resolve
return cand if isinstance(cand, types.ModuleType) else importlib.import_module(cand)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'py_mini_racer.data'

I am using Python 3.10 and a relative import path.

Here is my code:

from .FinNLP.finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range

start_date = "2024-04-01"
end_date = "2024-04-03"
config = {
    # "use_proxy": "us_free",  # use proxies to prevent ip blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "TOKEN_HERE"  # Available at https://finnhub.io/dashboard
}

news_downloader = Finnhub_Date_Range(config) # init
news_downloader.download_date_range_stock(start_date,end_date) # Download headers
news_downloader.gather_content() # Download contents
df = news_downloader.dataframe
df.head(10)
selected_columns = ["headline", "content"]
df[selected_columns].head(10)

AttributeError: 'DataFrame' object has no attribute 'datetime'

When I run the code:

# Finnhub (Yahoo Finance, Reuters, SeekingAlpha, CNBC...)
from finnlp.data_sources.news.finnhub_date_range import Finnhub_Date_Range

start_date = "2023-01-01"
end_date = "2023-01-03"
config = {
    "use_proxy": "us_free",    # use proxies to prevent ip blocking
    "max_retry": 5,
    "proxy_pages": 5,
    "token": "YOUR_FINNHUB_TOKEN"  # Available at https://finnhub.io/dashboard
}

news_downloader = Finnhub_Date_Range(config)                      # init
news_downloader.download_date_range_stock(start_date,end_date)    # Download headers
news_downloader.gather_content()                                  # Download contents
df = news_downloader.dataframe
selected_columns = ["headline", "content"]
df[selected_columns].head(10)

the following error is thrown:
Checking ips: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [01:04<00:00, 1.16it/s]
Get proxy ips: 75.
Usable proxy ips: 75.
stoped--
Downloading Titles: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00, 1.45s/it]
stop
Traceback (most recent call last):
File "/home/bbbs/Videos/FINTECH/main.py", line 15, in <module>
news_downloader.download_date_range_stock(start_date,end_date) # Download headers
File "/home/bbbs/Videos/FINTECH/FinNLP/finnlp/data_sources/news/finnhub_date_range.py", line 50, in download_date_range_stock
self.dataframe.datetime = pd.to_datetime(self.dataframe.datetime,unit = "s")
File "/home/bbbs/anaconda3/envs/fin_nlp/lib/python3.8/site-packages/pandas/core/generic.py", line 5989, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'datetime'
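The traceback shows the failure at `self.dataframe.datetime`, which pandas raises when the DataFrame has no `datetime` column. That is what one would see if the download step returned no rows at all; an invalid or placeholder token is one possible cause, though the report does not confirm it. A minimal sketch of a guard, assuming the column holds epoch seconds as the `unit = "s"` argument in the traceback implies; `convert_datetime_column` is a hypothetical helper, not the library's code:

```python
import pandas as pd


def convert_datetime_column(df: pd.DataFrame) -> pd.DataFrame:
    """Convert epoch seconds in 'datetime' to timestamps, tolerating an empty result."""
    if "datetime" not in df.columns:
        # Nothing was downloaded (or the schema changed): return the frame unchanged
        return df
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"], unit="s")
    return out
```

Checking `df.empty` or `df.columns` right after the download step also makes it easier to tell an empty API response apart from a parsing problem.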

Reddit scraping doesn't work - AttributeError: 'NoneType' object has no attribute 'text'

I simply pasted the example code for Reddit and it raised an error.

Downloading by pages...: 0%| | 0/3 [00:00<?, ?it/s]
Downloading by pages...: 33%|███████████████████████████████ | 1/3 [00:02<00:04, 2.22s/it]

AttributeError Traceback (most recent call last)
Cell In[12], line 11
4 config = {
5 "use_proxy": "us_free",
6 "max_retry": 5,
7 "proxy_pages": 2,
8 }
10 downloader = Reddit_Streaming(config)
---> 11 downloader.download_streaming_all(pages)
12 selected_columns = ["created", "title"]
13 downloader.dataframe[selected_columns].head(10)

File ~/FinNLP/finnlp/data_sources/social_media/reddit_streaming.py:40, in Reddit_Streaming.download_streaming_all(self, rounds)
38 if rounds > 1:
39 for _ in range(1,rounds):
---> 40 last_id = self._fatch_other_pages(last_id, pbar)

File ~/FinNLP/finnlp/data_sources/social_media/reddit_streaming.py:82, in Reddit_Streaming._fatch_other_pages(self, last_page, pbar)
49 data = {
50 "id": "02e3b6d0d0d7",
51 "variables": {
(...)
79 }
80 }
81 response = self._request_post(url = url, headers= headers, json = data)
---> 82 data = json.loads(response.text)
83 data = data["data"]["subredditInfoByName"]["elements"]["edges"]
84 for d in data:

AttributeError: 'NoneType' object has no attribute 'text'
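The last frame shows `response` being `None` when `.text` is accessed, which suggests the request helper gave up without returning a response (for example after exhausting its retries; this is an assumption, as the helper's code is not shown here). A minimal sketch of a defensive parse step, using the key path visible in the traceback; `extract_edges` is a hypothetical helper:

```python
import json
from typing import Any, List, Optional


def extract_edges(response: Optional[Any]) -> List[dict]:
    """Return the post edges from a response, or an empty list if the request failed."""
    if response is None:
        # Assumed: the request helper returns None once it has given up.
        return []
    payload = json.loads(response.text)
    return payload["data"]["subredditInfoByName"]["elements"]["edges"]
```

Returning an empty list lets the caller stop paging cleanly instead of crashing halfway through the download.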

ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

Hello,
I am trying to use Finnhub_Date_Range to download news on the S&P 500 (^GSPC).
The call news_downloader.download_date_range_stock(start_date, end_date, stock) works well, but when I try news_downloader.gather_content(), it raises the error above. The same script worked before.
Can somebody help me with this bug, please?
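This message is what lxml raises when it is given a Python `str` that begins with an XML declaration carrying an `encoding` attribute. As the message itself says, the two ways out are to pass bytes or to remove the declaration. The sketch below shows the second option with the standard library only; `strip_xml_declaration` is a hypothetical helper, and whether `gather_content` parses `response.text` this way is an assumption:

```python
import re

# Matches a leading declaration such as <?xml version="1.0" encoding="UTF-8"?>
_XML_DECLARATION = re.compile(r"^\s*<\?xml[^>]*\?>", re.IGNORECASE)


def strip_xml_declaration(text: str) -> str:
    """Remove a leading XML declaration so the text can be parsed as a str."""
    return _XML_DECLARATION.sub("", text, count=1)
```

With `requests`, the other route is to hand the parser `response.content` (bytes) instead of `response.text`.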
