cedricfr / dataenforce
Python package to enforce column names & data types of pandas DataFrames
License: Apache License 2.0
Hi!
We pin our dependencies to specific versions; could you make a tagged release of this repo?
In the meantime, I am going to fork it and tag it from my fork.
Thanks for the awesome package.
Is it possible to add shape information?
e.g.
Dataset[(10, 3), ("a": int, "b": int, "c": int)]
This would type a DataFrame with 10 rows and 3 columns.
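Until something like this exists in dataenforce, a shape constraint can be checked at run time by hand. The following is only a minimal sketch; check_shape and the use of None as a "don't care" wildcard are hypothetical and not part of dataenforce.

```python
import pandas as pd


def check_shape(df: pd.DataFrame, shape: tuple) -> None:
    """Raise ValueError unless df.shape matches shape.

    Hypothetical helper: a None entry is a wildcard, so (None, 3) means
    "any number of rows, exactly 3 columns".
    """
    if len(shape) != 2:
        raise ValueError("shape must be a (rows, columns) pair")
    for axis_name, actual, expected in zip(("rows", "columns"), df.shape, shape):
        if expected is not None and actual != expected:
            raise ValueError(f"expected {expected} {axis_name}, got {actual}")


df = pd.DataFrame({"a": range(10), "b": range(10), "c": range(10)})
check_shape(df, (10, 3))    # passes
check_shape(df, (None, 3))  # passes: any row count is accepted
```

A fixed row count is rarely known in advance, which is why the sketch allows the wildcard.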
It would be nice to be able to define a Dataset using a dataclass as the source.
So, instead of
DUser = Dataset["id": int, "name": str]
def process1(data: DUser):
    pass
we can use a dataclass as a source for field names and types, like
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

def process1(data: Dataset[User]):
    pass
This would keep the list of fields automatically in sync with the dataclass, and might also help with refactoring.
It could then be used like this:
users = pd.DataFrame(
    [
        User(id=1, name="Sam"),
        User(id=2, name="Rhett"),
    ]
)
process1(users)
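The proposed Dataset[User] syntax does not exist in dataenforce, but the idea can be approximated with the standard library: dataclasses.fields plus typing.get_type_hints give the field names and types, which can then be compared with a DataFrame. This is a sketch under assumptions: schema_from_dataclass, check_against_dataclass and the type-to-predicate mapping are hypothetical, and fields annotated with any other type are only checked for presence.

```python
import dataclasses
import typing

import pandas as pd
from pandas.api import types as ptypes

# Assumed mapping from annotation types to pandas dtype predicates.
_DTYPE_CHECKS = {
    int: ptypes.is_integer_dtype,
    float: ptypes.is_float_dtype,
    bool: ptypes.is_bool_dtype,
    str: ptypes.is_string_dtype,
}


def schema_from_dataclass(cls) -> dict:
    """Return {field name: annotated type} for a dataclass."""
    hints = typing.get_type_hints(cls)
    return {field.name: hints[field.name] for field in dataclasses.fields(cls)}


def check_against_dataclass(df: pd.DataFrame, cls) -> None:
    """Raise TypeError if df lacks a field of cls or a column has the wrong dtype."""
    for name, annotation in schema_from_dataclass(cls).items():
        if name not in df.columns:
            raise TypeError(f"missing column {name!r}")
        check = _DTYPE_CHECKS.get(annotation)
        if check is not None and not check(df[name]):
            raise TypeError(
                f"column {name!r} has dtype {df[name].dtype}, expected {annotation.__name__}"
            )


@dataclasses.dataclass
class User:
    id: int
    name: str


users = pd.DataFrame([User(id=1, name="Sam"), User(id=2, name="Rhett")])
check_against_dataclass(users, User)  # passes
```

Because the schema is read from the dataclass each time, renaming a field there is immediately reflected in the check.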
I'd like to join in. Cool project! Are you planning to extend the functionality to pandas.Series (index type and data type)?
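For reference, the two properties mentioned (value dtype and index dtype) can already be checked by hand with the predicates in pandas.api.types. This is only a sketch of what a Series check might look like; check_series and its keyword arguments are hypothetical, not a dataenforce API.

```python
import pandas as pd
from pandas.api import types as ptypes


def check_series(s: pd.Series, values=None, index=None) -> None:
    """Check a Series' value dtype and, optionally, its index dtype.

    values and index are predicates such as ptypes.is_float_dtype.
    """
    if values is not None and not values(s.dtype):
        raise TypeError(f"values have dtype {s.dtype}")
    if index is not None and not index(s.index.dtype):
        raise TypeError(f"index has dtype {s.index.dtype}")


prices = pd.Series([1.5, 2.0], index=pd.date_range("2021-01-01", periods=2))
check_series(prices, values=ptypes.is_float_dtype, index=ptypes.is_datetime64_any_dtype)
```

Passing predicates rather than concrete dtypes avoids failing on equivalent dtypes, for example datetime indexes with different resolutions.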
Was wondering if this project is still alive. I was considering including it in my projects.
Consider the following code:
import numpy as np
import pandas as pd
from dataenforce import Dataset, validate

@validate
def myfunc(data: Dataset["a": int, "b": np.float64], data2: Dataset["x"], other_param: int):
    return data
When calling myfunc:
myfunc(pd.DataFrame([{'a': 1, 'b': 1.2}]), pd.DataFrame([{'x': 10}]), 39)
I get:
ValueError: myfunc() requires a code object with 0 free vars, not 3
Expected behaviour: ignore non-Dataset arguments, both positional and keyword, or optionally delegate type validation for non-Dataset arguments to another type-enforcing library.
Thank you!
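The expected behaviour (validate only the DataFrame arguments, whether passed positionally or by keyword, and leave everything else alone) can be sketched with inspect.signature. This is not dataenforce's implementation and does not explain the free-vars error above; Columns and validate_frames are hypothetical names used only for illustration.

```python
import functools
import inspect

import pandas as pd


class Columns:
    """Hypothetical annotation marker: the argument must be a DataFrame with these columns."""

    def __init__(self, *names: str) -> None:
        self.names = names


def validate_frames(func):
    """Validate only Columns-annotated parameters; all other parameters are ignored."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # bind() maps positional and keyword arguments to parameter names alike.
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            spec = sig.parameters[name].annotation
            if not isinstance(spec, Columns):
                continue  # e.g. other_param: int is left to other tools
            if not isinstance(value, pd.DataFrame):
                raise TypeError(f"{name} must be a DataFrame")
            missing = [col for col in spec.names if col not in value.columns]
            if missing:
                raise TypeError(f"{name} is missing columns {missing}")
        return func(*args, **kwargs)

    return wrapper


@validate_frames
def myfunc(data: Columns("a", "b"), data2: Columns("x"), other_param: int):
    return data
```

Since non-Columns parameters are skipped explicitly, another decorator that checks plain annotations such as int could be stacked on top without interfering.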
from typing import cast
df = cast(Dataset["timestamp", "volume"], df)
Gives the error:
Expected class type but received "Unknown | DatasetMeta"
"DatasetMeta" is not a class Pylance(reportGeneralTypeIssues)
Is this an issue with Pylance? I would expect to be able to cast a DataFrame to the Dataset type.
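Pylance most likely rejects the cast because Dataset[...] builds a runtime object (of metaclass DatasetMeta) that static checkers cannot interpret as a type. One possible workaround swaps in PEP 593 typing.Annotated instead of dataenforce's Dataset; the alias name TradeFrame and the tuple-of-column-names metadata are assumptions. Static checkers should treat the alias as a plain pd.DataFrame, while the column names remain available at run time. Requires Python 3.9+.

```python
from typing import Annotated, cast, get_args

import pandas as pd

# Static checkers see pd.DataFrame; the tuple is runtime-only metadata.
TradeFrame = Annotated[pd.DataFrame, ("timestamp", "volume")]


def as_trades(df: pd.DataFrame) -> TradeFrame:
    # cast() does nothing at run time; it only informs the type checker.
    return cast(TradeFrame, df)


# get_args returns (pd.DataFrame, ("timestamp", "volume")).
_, required = get_args(TradeFrame)
```

Note that this gives up dataenforce's runtime checking unless the metadata is validated separately.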
Support for checking dtypes alone, without column names, would be useful, e.g.
Dataset[int, str, str]
Cool project. Thanks.
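A positional, name-free dtype check along the lines of Dataset[int, str, str] could look like the following sketch. check_dtypes and the type-to-predicate mapping are hypothetical; the predicates come from pandas.api.types, and Python types outside the mapping raise KeyError.

```python
import pandas as pd
from pandas.api import types as ptypes

# Assumed mapping from Python types to pandas dtype predicates.
_PREDICATES = {
    int: ptypes.is_integer_dtype,
    float: ptypes.is_float_dtype,
    bool: ptypes.is_bool_dtype,
    str: ptypes.is_string_dtype,
}


def check_dtypes(df: pd.DataFrame, *types) -> None:
    """Check column dtypes by position only, ignoring column names."""
    if len(types) != df.shape[1]:
        raise TypeError(f"expected {len(types)} columns, got {df.shape[1]}")
    for (name, column), expected in zip(df.items(), types):
        if not _PREDICATES[expected](column):
            raise TypeError(f"column {name!r} ({column.dtype}) is not {expected.__name__}")


people = pd.DataFrame({"id": [1, 2], "first": ["Ada", "Alan"], "last": ["Lovelace", "Turing"]})
check_dtypes(people, int, str, str)  # passes
```

The column is passed to the predicate (not just its dtype) so that is_string_dtype can inspect object columns for actual strings.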
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-f00935f5c253> in <module>
/opt/Anaconda3/envs/basic_ml/lib/python3.8/site-packages/dataenforce/__init__.py in wrapper(*args, **kwargs)
45 dtypes = dict(value.dtypes)
46 for colname, dt in hint.dtypes.items():
---> 47 if not np.issubdtype(dtypes[colname], np.dtype(dt)):
48 raise TypeError("%s is not a subtype of %s for column %s" % (dtypes[colname], dt, colname))
49 return f(*args, **kwargs)
/opt/Anaconda3/envs/basic_ml/lib/python3.8/site-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
417 """
418 if not issubclass_(arg1, generic):
--> 419 arg1 = dtype(arg1).type
420 if not issubclass_(arg2, generic):
421 arg2 = dtype(arg2).type
TypeError: Cannot interpret 'datetime64[ns, UTC]' as a data type
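The traceback shows dataenforce calling np.issubdtype on the column dtype. A tz-aware column has the pandas extension dtype DatetimeTZDtype, which NumPy cannot interpret. A minimal sketch of a comparison that routes extension dtypes around np.issubdtype follows; dtype_matches is hypothetical and not dataenforce's code.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import pandas_dtype


def dtype_matches(actual, expected) -> bool:
    """Compare a column dtype with an expected dtype, tolerating pandas extension dtypes.

    np.issubdtype only understands NumPy dtypes; tz-aware datetimes such as
    'datetime64[ns, UTC]' are DatetimeTZDtype objects, so those are compared
    by equality instead.
    """
    expected = pandas_dtype(expected)  # accepts Python types, strings and dtype objects
    if isinstance(actual, ExtensionDtype) or isinstance(expected, ExtensionDtype):
        return actual == expected
    return np.issubdtype(actual, expected)


ts = pd.Series(pd.to_datetime(["2021-01-01", "2021-06-01"], utc=True))
ok = dtype_matches(ts.dtype, ts.dtype)
```

Equality is stricter than a subtype check (for example, a different time unit will not match), which seems acceptable for extension dtypes that have no NumPy hierarchy.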
I saw that your package has a validate decorator to check DataFrames at run time. Is there a way to integrate it with mypy for static code analysis?
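Static analysis and runtime validation need different information. One option, sketched here as an assumption rather than anything dataenforce provides, is a typing.Annotated alias: mypy checks it as pd.DataFrame, while runtime code can read the column names back with typing.get_type_hints(..., include_extras=True). UserFrame and required_columns are hypothetical names.

```python
import typing
from typing import Annotated

import pandas as pd

# mypy checks UserFrame exactly as pd.DataFrame; the tuple is runtime-only metadata.
UserFrame = Annotated[pd.DataFrame, ("id", "name")]


def process(df: UserFrame) -> UserFrame:
    return df[df["id"] > 0]


def required_columns(func) -> dict:
    """Return {parameter name: [column names]} from Annotated parameter hints."""
    hints = typing.get_type_hints(func, include_extras=True)
    required = {}
    for name, hint in hints.items():
        if name != "return" and typing.get_origin(hint) is Annotated:
            _base, *metadata = typing.get_args(hint)
            required[name] = [column for group in metadata for column in group]
    return required
```

A runtime decorator could call required_columns once at decoration time and check each bound argument against the result.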
The library is not compatible with pylance 3.9
When using Pyre check, I get the following:
Undefined or invalid type [11]: Annotation DummyDataframe is not defined as a type.
Code example:
from dataenforce import Dataset

DummyDataframe = Dataset["id": int]

def test_annotation(_df: DummyDataframe) -> DummyDataframe:
    pass
Is there anything that can be done to avoid or fix this issue?
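A common workaround for checkers that cannot evaluate runtime-built types is to give them a plain alias under typing.TYPE_CHECKING, which static checkers (Pyre, mypy and Pylance all support it) treat as true, while keeping the dataenforce alias at run time. This is a sketch, not a fix inside dataenforce; the ImportError fallback and the function name process_dummy are only there so the example runs without dataenforce installed.

```python
from typing import TYPE_CHECKING

import pandas as pd

if TYPE_CHECKING:
    # Static checkers take this branch and see a plain DataFrame.
    DummyDataframe = pd.DataFrame
else:
    try:
        from dataenforce import Dataset

        # At run time the alias keeps dataenforce's column spec.
        DummyDataframe = Dataset["id": int]
    except ImportError:  # fallback only so this sketch runs without dataenforce
        DummyDataframe = pd.DataFrame


def process_dummy(_df: DummyDataframe) -> DummyDataframe:
    return _df
```

The trade-off is that the static checker no longer knows about the required columns; only the runtime validate decorator does.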