Comments (7)
remove IDataReader implementation, doesn't really make sense and some methods have nonsense implementations (e.g. return 0; no matter the method arguments)
To this specific point, IDataReader is a very... comprehensive... interface that supports a lot more than the version of the dBASE format that we implement.
I do endorse exposing an IDataReader implementation, as it (in theory) allows someone to connect their shapefile data source to any of a number of external APIs that use ADO.NET to model their data access.
That said, I do not endorse having that IDataReader implementation be the primary method for reading data from shapefiles as appears to be the case right now.
from nettopologysuite.io.shapefile.
IMHO you are right that the whole shapefile-related API is opaque and redundant. A cleanup, or a new lean API, would be appreciated. What do you think, @airbreather?
I totally agree that the API today is horrendous. Not only is it hard to use, but it's also slow and can't really be made to be fast.
In one internal project I've been working on, the reader code was usable at a prototyping stage, but it very quickly became a huge bottleneck to the point that I wound up reimplementing just the parts that we needed.
Even just one feature, DBF encodings, seems to barely work when you actually try to use it.
IMO, the shapefile project is in desperate need of a rewrite, starting from just the minimum needed to implement fast and robust support for reading / writing SHP + SHX and (separately) DBF, and then add layers on top of it to support things like IEnumerable<T> and IDataReader.
I can come up with some outlines for how I think this should look, balancing performance and usability.
I agree with @airbreather except for keeping IDataReader, but I can live with that if it's not the primary method for reading data, as suggested. Also related is issue #2.
> I can come up with some outlines for how I think this should look, balancing performance and usability.
I've made a basic start over here: https://github.com/NetTopologySuite/NetTopologySuite.IO.ShapeFile/wiki/Ideas-for-new-API-(Reading)
So far, I've just jotted what's been on my mind for reading, since I've been thinking about it a lot recently.
Note that, whereas writing is much more straightforward than reading for most formats, with shapefiles, it's kind-of the opposite. The format makes quite a few tradeoffs to help make it easier to develop an efficient reader, at the expense of making it impossible to develop an efficient forward-only streaming writer unless you have significant a priori knowledge of the entire dataset's contents.
- One immediately obvious example: you need to know the exact size of the file and how many records it contains before you've even written out the first 100 bytes of the .shp file.
  - In practice, you could fix that by just writing placeholder slots and filling them in with the real values once you've gotten to the end, but that's just the beginning.
- In the .dbf files, you must specify the length of each field and (for non-integer numbers) the maximum number of digits to the right of the decimal point.
  - So, all of your IDs happen to be less than 1000? Great, the ideal width is 3... but you need to tell us that, otherwise when we see an int field in a (mostly) forward-only writing mode, we're going to have to either read through the entire input at least twice, or (more likely) we're going to triple the width that we give to that field in order to support the theoretical maximum.
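The placeholder-and-backfill idea from the first bullet can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the proposed .NET API: the function name `write_points_shp` is hypothetical, it handles only Point shapes, it needs a seekable stream, and writing an all-zero bounding box for an empty input is an assumption. The offsets follow the published shapefile layout (100-byte main header, file length stored at byte 24 in 16-bit words).

```python
import io
import struct

HEADER_SIZE = 100         # fixed size of the .shp main file header, in bytes
FILE_LENGTH_OFFSET = 24   # big-endian int32, measured in 16-bit words
SHAPE_TYPE_POINT = 1


def write_points_shp(stream, points):
    """Write (x, y) pairs as Point records in one pass, then backfill the header.

    Hypothetical helper for illustration only; returns the record count.
    """
    start = stream.tell()
    stream.write(b"\x00" * HEADER_SIZE)  # placeholder slots, filled in at the end

    xmin = ymin = float("inf")
    xmax = ymax = float("-inf")
    count = 0
    for count, (x, y) in enumerate(points, start=1):
        content = struct.pack("<idd", SHAPE_TYPE_POINT, x, y)  # 20 bytes
        # Record header: 1-based record number and content length
        # (in 16-bit words), both big-endian.
        stream.write(struct.pack(">ii", count, len(content) // 2))
        stream.write(content)
        xmin, xmax = min(xmin, x), max(xmax, x)
        ymin, ymax = min(ymin, y), max(ymax, y)

    end = stream.tell()
    if count == 0:
        xmin = ymin = xmax = ymax = 0.0  # assumption: empty file gets a zero box

    header = bytearray(HEADER_SIZE)
    struct.pack_into(">i", header, 0, 9994)                     # file code
    struct.pack_into(">i", header, FILE_LENGTH_OFFSET, (end - start) // 2)
    struct.pack_into("<ii", header, 28, 1000, SHAPE_TYPE_POINT)  # version, type
    struct.pack_into("<4d", header, 36, xmin, ymin, xmax, ymax)  # bounding box

    stream.seek(start)   # only possible on a seekable stream
    stream.write(header)
    stream.seek(end)
    return count
```

The seek back to the start is the step a purely forward-only stream cannot perform, which is why the real file length has to be known up front in that mode.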
Long story short... writing is more complicated. Even finding the right trade-offs is probably going to be a nontrivial process:
- What options do we give you for splitting up a data set that's too big for a single shapefile?
  - Maybe we always fail if it's too big, but we expose a way for you to test your dataset before this happens?
  - Maybe you can inject a Func<(Stream, Stream, Stream)> to create new .shp / .shx / .dbf files if we run out of room in the first one before we run out of records?
- What options do we give you for column details?
  - Maybe we force you to provide them, but we also expose a way for you to figure this out for your specific dataset in advance?
  - Maybe we assume the worst (255-character strings, 9-character ints) unless you tell us otherwise?
- How to handle data members whose types are completely unsupported in .dbf?
  - Ignore them?
  - Have you create new IFeature instances without them?
  - Maybe you can provide a way to remap some of them to types that are supported without recreating each IFeature?
- Perhaps those "what options" questions suggest that we should have two separate writer APIs entirely?
  - One is, "I have my entire dataset in memory as an IReadOnlyList<T>", so we can loop over it as many times as we want to figure out the perfectly optimal way to write it out (column widths, dataset split points, etc.).
  - The other is, "I'm building my dataset on-the-fly as an IEnumerable<T>", so we need the caller to tell us some minimum amount of metadata, and we may all have to live with suboptimal parts in order to only ever loop through the input once.
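To make the last question concrete, here is a rough sketch of how the two writer modes differ for a single integer column. It is written in Python purely as an illustration and is not a proposal for the .NET API: the function names are hypothetical, and the worst-case default of 11 characters (room for any signed 32-bit value) is an assumption of this sketch, not a figure from the discussion above. DBF numeric fields are stored as right-justified text, which is what `rjust` imitates.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

# Assumption for this sketch: enough room for "-2147483648".
WORST_CASE_INT_WIDTH = 11


def encode_ints_in_memory(values: Sequence[int]) -> Tuple[int, List[bytes]]:
    """In-memory mode: loop twice, once to measure and once to encode."""
    width = max((len(str(v)) for v in values), default=1)      # pass 1
    cells = [str(v).rjust(width).encode("ascii") for v in values]  # pass 2
    return width, cells


def encode_ints_streaming(
    values: Iterable[int], width: int = WORST_CASE_INT_WIDTH
) -> Iterator[bytes]:
    """Streaming mode: a single pass, so the width must be fixed up front."""
    for v in values:
        text = str(v)
        if len(text) > width:
            # The header has already been written; widening is no longer possible.
            raise ValueError(f"value {v} does not fit in {width} characters")
        yield text.rjust(width).encode("ascii")
```

With IDs below 1000, the in-memory mode settles on a width of 3, while the streaming mode either spends 11 bytes per value or relies on the caller passing `width=3` and fails if that promise is broken.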
I've sorta got a start for the writing side of things here: https://github.com/NetTopologySuite/NetTopologySuite.IO.ShapeFile/wiki/Ideas-for-new-API-(Writing)
It is a tough challenge to balance efficiency, ease of use, and correctness for this, so especially on the writer side, I'd appreciate feedback on some of the ideas over there...
I propose having only one header, ShapefileHeader, which should derive from DbaseFileHeader. This new header should have its first field assigned to the SHAPE column, as is already done internally.