Comments (6)
Yea, we had to remove this because it was a bad addition to the library (it didn't make sense after I thought about it deeper). Can you give me a better idea of what you're trying to accomplish, so I can see if it's possible with chispa or if the library should be modified? Thank you.
from chispa.
I am comparing two DataFrames and don't care about the column types, so I want the DataFrame assertion to pass even if the types differ. Is there another way to accomplish this?
@Hiderdk - yea this should work: `chispa.assert_basic_rows_equality(df1.collect(), df2.collect())`. Let me know if that works for you.
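To illustrate why comparing collected rows sidesteps the type check, here is a minimal sketch in plain Python. It does not use Spark or chispa: `Column`, `schemas_equal`, and `rows_equal` are hypothetical stand-ins, not chispa's implementation. It assumes that an IntegerType column and a LongType column both collect to ordinary Python `int` values, so only the schemas differ.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for a schema field: a name plus a type name."""
    name: str
    dtype: str


def schemas_equal(schema1, schema2):
    # A schema-aware comparison fails as soon as any type name differs.
    return list(schema1) == list(schema2)


def rows_equal(rows1, rows2):
    # A row-level comparison only sees the collected Python values.
    return list(rows1) == list(rows2)


# Same column names, but "id" is an int in one schema and a bigint in the other.
schema_int = [Column("id", "int"), Column("name", "string")]
schema_long = [Column("id", "bigint"), Column("name", "string")]

# Assumption: both integer types collect to plain Python ints.
rows_a = [(1, "a"), (2, "b")]
rows_b = [(1, "a"), (2, "b")]

schema_check = schemas_equal(schema_int, schema_long)
row_check = rows_equal(rows_a, rows_b)

print("schemas equal:", schema_check)
print("rows equal:", row_check)
```

In this sketch the rows are compared position by position, so the comparison is order-sensitive, and nothing about the schema (types, nullability, metadata) is checked at all.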
@MrPowers first of all, thanks for the wonderful library. Why did you decide to change the API of this assertion in the minor version bump of the package?
This broke our tests. The convention is that minor version bumps don't change the API, so package managers (like poetry) update deps to the latest minor version.
We used this option a lot because we don't really care whether the column is IntegerType or LongType, but we do want to compare Spark DataFrames and use other comparator options of assert_df_equality. Without it, we will need to add boilerplate type casting to the test code to make the tests work again. It's a petty that you decided to remove it.
@ivanychev - yea, I have the work-around that will meet your use case above.
"Why did you decide to change the API of this assertion in the minor version bump of the package?"
We're using Semantic Versioning 2.0. Per the spec: "Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable."
"It's a petty that you decided to remove it."
No, this wasn't petty. This option was causing bugs and breaking workflows. We needed to remove it. I do my best to make all changes backwards compatible. This one absolutely needed to be removed because it was causing lots of issues.
"We used this option a lot because we don't really care of whether the column is IntegerType or LongType but we do want to compare Spark DataFrames and use other comparator options of assert_df_equality"
Feel free to propose another abstraction that's not breaking, not buggy, and will be a good addition for the entire chispa community 🚀