A Python library to read files in Azure Blob Storage as Pandas DataFrames
The latest version is available on PyPI:
pip install bluepandas
The library reads the connection string for your Blob Storage account from an
environment variable. The variable must be named AZ_<STORAGE-ACCOUNT-NAME>
and set to your connection string, which you can obtain from the Azure Portal by
navigating to your storage account and opening Access keys under Settings.
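To catch a missing variable before a read fails, a small check along these lines may help. This is a minimal sketch using only the standard library: the helper names `connection_env_var` and `require_connection_string` are hypothetical, and upper-casing the account name is an assumption taken from the capitalised placeholder above, so check how your installed version of bluepandas looks the variable up.

```python
import os


def connection_env_var(storage_account: str) -> str:
    """Return the environment variable name for a storage account.

    Assumption: the account name is upper-cased, mirroring the
    AZ_<STORAGE-ACCOUNT-NAME> placeholder. This is not confirmed
    against the library's source.
    """
    return "AZ_" + storage_account.upper()


def require_connection_string(storage_account: str) -> str:
    """Return the connection string, or raise a clear error if it is unset."""
    name = connection_env_var(storage_account)
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} to your storage account connection string")
    return value


if __name__ == "__main__":
    # "mystorageaccount" is a made-up account name used for illustration.
    print(connection_env_var("mystorageaccount"))
```

In practice you would export the variable in your shell or deployment configuration rather than set it from Python, so the connection string stays out of your source code.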
    import bluepandas

    df = bluepandas.read_csv("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/path/to/your.csv")
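The wasbs:// URL packs three pieces of information into one string: the container name (before the @), the storage account name (the first label of the host), and the blob path. The sketch below illustrates that anatomy with the standard library only. `split_wasbs_url` is a hypothetical helper and not part of bluepandas, and it makes no claim about how the library parses URLs internally.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_wasbs_url(url: str) -> Tuple[str, str, str]:
    """Split a wasb(s):// URL into (container, storage account, blob path)."""
    parts = urlsplit(url)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"Not a wasb(s) URL: {url}")

    # The container sits in the "user" position, before the @.
    container = parts.username
    # The storage account is the first label of the host name.
    host = parts.hostname or ""
    account = host.split(".", 1)[0]
    # The blob path is the URL path without its leading slash.
    blob_path = parts.path.lstrip("/")

    if not container or not account or not blob_path:
        raise ValueError(f"Incomplete wasb(s) URL: {url}")
    return container, account, blob_path


if __name__ == "__main__":
    # "data" and "myaccount" are made-up names used for illustration.
    print(split_wasbs_url("wasbs://data@myaccount.blob.core.windows.net/path/to/your.csv"))
```

The account name recovered this way is the one that the AZ_<STORAGE-ACCOUNT-NAME> environment variable must match.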
This returns a BluePandas DataFrame, which subclasses the pandas DataFrame and can additionally write to Blob Storage through its to_csv()
method.
    import bluepandas
    from sklearn import datasets

    # Load the iris dataset once and reuse it for the values and column names
    iris = datasets.load_iris()
    values = iris['data']
    columns = iris['feature_names']

    # Build a BluePandas DataFrame and write it to Blob Storage as CSV
    df = bluepandas.DataFrame(values, columns=columns)
    df.to_csv("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/path/to/iris.csv", index=False)
    import pandas as pd
    import bluepandas

    # Plain pandas DataFrame
    df = pd.DataFrame([(1,2,3),(4,5,6)], columns=['A','B','C'])

    # BluePandas DataFrame built from the pandas one
    bpd_df = bluepandas.DataFrame(df.values, columns=df.columns)