python-data's Introduction

Data processing coded with Python

This repository collects Python code for analysing data and implementing machine learning algorithms.

To Run

To initialize the environment, run the commands below:

python -m venv .env

source .env/bin/activate

To install the dependencies, run "pip install -r requirements.txt" at the root of the project. To speed up downloads in China, use a mirror by appending "-i https://pypi.tuna.tsinghua.edu.cn/simple/" to the command.
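The install step can also be driven from Python. The following is a minimal sketch, not part of the repository: the helper name pip_install_command and the constant TUNA_MIRROR are illustrative, and only the pip options shown above ("-r" and "-i") are used.

```python
import sys
from typing import List, Optional

# Mirror URL taken from the instructions above.
TUNA_MIRROR = "https://pypi.tuna.tsinghua.edu.cn/simple/"


def pip_install_command(requirements: str = "requirements.txt",
                        index_url: Optional[str] = None) -> List[str]:
    """Build the pip command line; append "-i <mirror>" only when a mirror is given."""
    # sys.executable keeps the install inside the currently active interpreter
    # (for example the one from the activated .env virtual environment).
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements]
    if index_url:
        cmd += ["-i", index_url]
    return cmd


# Show the command that would be run when the mirror is used.
print(" ".join(pip_install_command(index_url=TUNA_MIRROR)))
```

To actually run the install, the returned list can be passed to subprocess.run(..., check=True).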

Shiny application

Run "shiny run" in a terminal to start the Shiny app (defined in app.py).

Time series analysis and forecasting of exchange rates

Request an API key at https://fredaccount.stlouisfed.org/apikey; registration is required if you do not have an account yet.

Search FRED to find a series id. For example, the Chinese Yuan Renminbi to U.S. Dollar Spot Exchange Rate at https://fred.stlouisfed.org/series/DEXCHUS has the id DEXCHUS and a daily frequency, while EXCHUS (https://fred.stlouisfed.org/series/EXCHUS) has a monthly frequency.

Edit xchange_rate_analysis.py to replace the API key and series id with your own.
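As an illustration of how an API key and a series id fit together, here is a minimal sketch that builds a request URL for the FRED series/observations endpoint and parses a response. It is not the code in xchange_rate_analysis.py: the function names are hypothetical, "YOUR_API_KEY" is a placeholder, and the sample payload is hand-written with made-up values, not real exchange rates.

```python
import json
from urllib.parse import urlencode

# FRED endpoint that returns the observations of one series.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_observations_url(series_id, api_key):
    """Return the request URL for a series, asking for a JSON response."""
    query = urlencode({"series_id": series_id,
                       "api_key": api_key,
                       "file_type": "json"})
    return f"{FRED_OBSERVATIONS_URL}?{query}"


def parse_observations(payload):
    """Return (date, value) pairs, skipping observations without a numeric value."""
    rows = []
    for obs in payload.get("observations", []):
        try:
            rows.append((obs["date"], float(obs["value"])))
        except ValueError:
            # FRED reports missing values (for example on holidays) as ".".
            continue
    return rows


# Hand-written sample in the shape of a FRED response (values are made up).
sample = json.loads(
    '{"observations": ['
    '{"date": "2024-08-12", "value": "7.1"},'
    '{"date": "2024-08-13", "value": "."}]}'
)

print(build_observations_url("DEXCHUS", "YOUR_API_KEY"))
print(parse_observations(sample))
```

The URL can be fetched with urllib.request.urlopen or any HTTP client; keeping the key in an environment variable rather than in the source file avoids committing it by accident.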

Not complete yet - TODO

more code relevant to Machine Learning/Data Science

TODO

more code relevant to Modern Applied Statistics

TODO

more code relevant to Data Lake or Lake House

TODO

Use docker compose to start the MinIO, Dremio, and Nessie containers by following the code repo https://github.com/miniohq/datalake_ref_arch (note that this repo creates a data source in Dremio with Nessie as the catalog).

Run "brew install apache-spark". Once it is installed, you can optionally run spark-shell to try out scripts quickly in a terminal.

24/08/14 21:30:46 WARN FileSystem: Failed to initialize fileystem hdfs://master:8020/user/hive/warehouse: java.lang.IllegalArgumentException: java.net.UnknownHostException: master
24/08/14 21:30:46 WARN SharedState: Cannot qualify the warehouse path, leaving it unqualified.
java.lang.IllegalArgumentException: java.net.UnknownHostException: master
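The warnings above say that the host name "master" in hdfs://master:8020/user/hive/warehouse could not be resolved. A small sketch for checking this from Python with the standard library only is shown below; the function names are illustrative and nothing here changes the Spark configuration.

```python
import socket
from urllib.parse import urlparse


def warehouse_host(uri):
    """Return the (host, port) part of a warehouse URI."""
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port


def can_resolve(host):
    """Return True if the local resolver knows the host name."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


# URI copied from the warning message above.
host, port = warehouse_host("hdfs://master:8020/user/hive/warehouse")
print(host, port)
```

Calling can_resolve(host) on the machine that printed the warning shows whether "master" is known there. If it is not, the warehouse path most likely comes from a configuration written for another cluster.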

Run "python3 pyspark-iceberg-test.py". On the first run it downloads its dependencies (Maven jars) from the Maven repository while the script executes, which may be a bit slow, for example in China.
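Jars are typically downloaded at run time when a Spark session sets spark.jars.packages. The sketch below shows one possible set of Iceberg settings; it is not the content of pyspark-iceberg-test.py. The catalog name "local", the warehouse path, the Hadoop-type catalog, and the runtime coordinates are all assumptions, and the coordinates must match the installed Spark and Scala versions.

```python
def iceberg_spark_conf(
    catalog="local",                      # hypothetical catalog name
    warehouse="/tmp/iceberg-warehouse",   # hypothetical local warehouse path
    runtime="org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2",  # assumed version
):
    """Return Spark settings for an Iceberg catalog backed by a local directory."""
    return {
        # Maven coordinates; Spark resolves and downloads them on first use.
        "spark.jars.packages": runtime,
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def build_session(conf, app_name="iceberg-sketch"):
    """Create a SparkSession from the settings (requires pyspark to be installed)."""
    from pyspark.sql import SparkSession  # imported lazily so the rest runs without pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


conf = iceberg_spark_conf()
for key, value in conf.items():
    print(f"{key}={value}")
```

With pyspark installed, build_session(conf) returns a session in which tables are addressed with the catalog name as a prefix, for example local.db.table.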

After execution, you can view the tables created and the data inserted by the Spark SQL statements in the Python script mentioned above:

Tips

The pyrightconfig file points Pyright at the Python virtual environment so that code navigation works.
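As a rough sketch of what such a file can contain, Pyright's "venvPath" and "venv" settings tell it where to find a virtual environment. The values below assume the .env directory created in the "To Run" section sits at the project root; the repository's actual file may differ, and the helper name is illustrative.

```python
import json


def pyright_config(venv_path=".", venv=".env"):
    """Return a minimal Pyright configuration that selects a virtual environment."""
    return {
        "venvPath": venv_path,  # directory that contains the environment folder
        "venv": venv,           # name of the environment folder itself
    }


# Print the JSON that would go into pyrightconfig.json.
print(json.dumps(pyright_config(), indent=2))
```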

python-data's People

Contributors

jeffcai
