Comments (2)
Given I can see that the default behavior may not be ideal, perhaps we can add a configuration setting that controls how non-existent paths that don't end with `/` are handled.
We previously had `single_file_output`, a statement-level config that determined whether the path should be treated as a file or a directory. We intentionally removed it in #9041 in favor of inference based on whether the path ends in `/`.
We could add a session-level config to control how a path is interpreted, as @alamb suggests. Perhaps we could also improve the inference logic by additionally checking for a valid file extension before concluding that a path is a file. For example:
- `tmp/dataset/` -> is a folder since it ends in `/`
- `tmp/dataset` -> is still a folder since, although it does not end in `/`, it has no valid file extension
- `tmp/file.parquet` -> is a file since it does not end in `/` and has a valid file extension (`.parquet`)
- `tmp/file.parquet/` -> is a folder since it ends in `/`
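The four cases above can be sketched as a small standalone function. This is only an illustration of the proposed rule, not DataFusion code: the function name `is_single_file_target` and the `KNOWN_EXTENSIONS` set are hypothetical, and the actual list of "valid" extensions would have to come from the supported file formats.

```python
from pathlib import PurePosixPath

# Assumption: an illustrative set of extensions; the real set would be
# derived from whichever file formats the engine supports.
KNOWN_EXTENSIONS = {".parquet", ".csv", ".json", ".avro", ".arrow"}


def is_single_file_target(path: str) -> bool:
    """Return True if `path` should be written as a single file.

    Proposed rule: a trailing '/' always means a folder; otherwise the
    path is a file only when it carries a recognized file extension.
    """
    if path.endswith("/"):
        return False
    suffix = PurePosixPath(path).suffix.lower()
    return suffix in KNOWN_EXTENSIONS


for candidate in ["tmp/dataset/", "tmp/dataset", "tmp/file.parquet", "tmp/file.parquet/"]:
    kind = "file" if is_single_file_target(candidate) else "folder"
    print(f"{candidate} -> {kind}")
```

One consequence of this rule is that a folder whose name happens to end in a recognized extension must be written with a trailing `/` to be treated as a folder.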
from arrow-datafusion.
Thank you for the report, @progval.
I believe this was an intentional change (cc @devinjdangelo) in order to distinguish between writing files and writing partitioned datasets.
Given I can see that the default behavior may not be ideal, perhaps we can add a configuration setting that controls how non-existent paths that don't end with `/` are handled.
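As a rough sketch of what such a setting might look like, the snippet below models it as a policy that applies only to paths without a trailing `/`. The names `NoSlashPolicy` and `writes_directory` are hypothetical and do not correspond to any existing DataFusion option. The `FILE` default mirrors the trailing-slash inference described in this thread.

```python
from enum import Enum


class NoSlashPolicy(Enum):
    """Hypothetical setting: how to treat output paths that lack a trailing '/'."""

    FILE = "file"            # treat the path as a single output file
    DIRECTORY = "directory"  # treat the path as a folder of output files


def writes_directory(path: str, policy: NoSlashPolicy = NoSlashPolicy.FILE) -> bool:
    """Return True if output at `path` should be written as a directory."""
    # A trailing '/' is unambiguous, so the policy is not consulted.
    if path.endswith("/"):
        return True
    return policy is NoSlashPolicy.DIRECTORY
```

With this shape, users who always write partitioned datasets could opt into `DIRECTORY` once per session instead of appending `/` to every path.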