dbase.net's People

Contributors

henck, mashbrno, nicholasnoise, skyyearxp


dbase.net's Issues

dbt file structure

This issue is based on the information in «Memo fields» (the «Memo file header» and «Memo block structure» sections) at http://www.independent-software.com/dbase-dbf-dbt-file-format.html, not on evaluating the code.

I have a memo file that uses the same dBASE IV structure as reported in https://www.clicketyclick.dk/databases/xbase/format/dbt.html#DBT_STRUCT

That is:

  • file uses LITTLE ENDIAN.
  • bytes 8 to 15 hold the DBF file name without extension. The name is zero-terminated and may be followed by irrelevant bytes (in my case b'ASTEN\x00\x7f\x1e', where ASTEN is the DBF name).
  • bytes 20-21 hold the block length (as a little-endian 16-bit integer).
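A minimal sketch of reading those two header fields, assuming `memoData` holds the raw .dbt contents and the offsets are exactly the ones listed above:

```csharp
using System;
using System.Text;

static class DbtHeader
{
    // Parse the dBASE IV memo header fields described above.
    public static (string DbfName, ushort BlockLength) Parse(byte[] memoData)
    {
        // Bytes 8-15: DBF file name without extension, zero-terminated;
        // any bytes after the terminator are irrelevant.
        int nameEnd = 8;
        while (nameEnd < 16 && memoData[nameEnd] != 0) nameEnd++;
        string dbfName = Encoding.ASCII.GetString(memoData, 8, nameEnd - 8);

        // Bytes 20-21: block length as a little-endian 16-bit integer.
        ushort blockLength = (ushort)(memoData[20] | (memoData[21] << 8));

        return (dbfName, blockLength);
    }
}
```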

MemoEncoder.findMemo is Very Slow

Hi,

I was able to improve the time it takes to read a big DBF file (about 200 MB) with an FPT (about 2 MB), and the difference is dramatic: Dbf.Read took 10 minutes before and now takes 4 seconds. The LINQ-based lookup appears to be extremely inefficient.

The implementation I suggest is:

private static string findMemo(int index, byte[] memoData)
{
	// The index is measured from the start of the file, even though the memo
	// file header block takes up the first few index positions.

	// Block size: big-endian uint16 at offset 6 (bytes swapped for BitConverter).
	UInt16 blockSize = BitConverter.ToUInt16(new[] { memoData[7], memoData[6] }, 0);
	int length = (int)BitConverter.ToUInt32(new[]
		{
			memoData[index * blockSize + 4 + 3],
			memoData[index * blockSize + 4 + 2],
			memoData[index * blockSize + 4 + 1],
			memoData[index * blockSize + 4 + 0],
		}, 0);

	byte[] memoBytes = new byte[length];
	int lengthToSkip = index * blockSize + 8;

	for (int i = lengthToSkip; i < lengthToSkip + length; ++i)
	{
		memoBytes[i - lengthToSkip] = memoData[i];
	}
			
	string text = Encoding.ASCII.GetString(memoBytes).Trim();
	return text;
}

Thank you!
Chris

Table reading does not skip deleted rows, missing property of row deletion

This code helps skip deleted rows (for FoxPro DBF):

        private void ReadRecords(BinaryReader reader, byte[] memoData)
        {
            Records.Clear();

            // Records are terminated by 0x1a char (officially), or EOF (also seen).
            while (reader.PeekChar() != 0x1a && reader.PeekChar() != -1)
            {
                try
                {
                    var isDeleted = reader.PeekChar() == 0x2a;
                    var rec = new DbfRecord(reader, header, Fields, memoData, Encoding);
                    if (!isDeleted)
                        Records.Add(rec);
                }
                catch (EndOfStreamException) { }
            }
        }

"Deleted" flag is ignored in the DbfRecord constructor

In the Write method, the 0x20 is correctly added, but when a record is being read, the DbfRecord can't tell if it's a deleted record or not.

It can be improved with two lines of code:

  1. Add a "Deleted" bool property to the DbfRecord class:
public bool Deleted { get; private set; }
  2. Set its value in the DbfRecord constructor, immediately after the "marker" variable declaration:
// Read record marker.
byte marker = reader.ReadByte();
Deleted = marker == 0x2A;

FloatEncoder problem

line 21 of FloatEncoder.cs is...
text.Substring(0, field.Length);

it needs to be...
text = text.Substring(0, field.Length);

Line 21 is not assigning the substring to anything.

test.dbf is not a table

static void Main(string[] args)
{
    var dbf = new dBASE.NET.Dbf();
    var field = new dBASE.NET.DbfField("TEST", dBASE.NET.DbfFieldType.Character, 12);
    dbf.Fields.Add(field);
    var record = dbf.CreateRecord();
    record.Data[0] = "HELLO";
    dbf.Write("test.dbf", dBASE.NET.DbfVersion.VisualFoxPro);
}

Opening the generated test.dbf gives "test.dbf is not a table".

System.FormatException when reading the file

Hi, I have a problem reading the file. I'm loading a map from a shapefile using DotSpatial, but the .dbf file contains some fields encoded in a code page that that library cannot read, so I wanted to use your library to read those fields. However, I get an exception when trying to read the file:

dbf.Read(); gives me:

System.FormatException: 'Input string was not in a correct format.'

The path is good; I'm using the same path with File.ReadAllText().
I can open the file correctly in OpenOffice using UTF-8.
I've tried setting the encoding in the Dbf constructor, but the result is the same.

You can download the files HERE

Dbf Memo not loading

Whatever I do, I cannot get the memo file to open.

Would it not be better to include an extra argument in the Read function that defines the path of the memo file?
(Using DBF 3 with a dbt memo file)

character field with 256 length

Hi,

I am trying to modify a DBF which has an existing character column with a length of 256.

However, when writing to the DBF it gets corrupted and won't open in ADA.

I have also tried creating a new DBF, and Visual Studio won't let me assign a length of 256.

DbfField field = new DbfField("TEST", DbfFieldType.Character, 256);

Any advice please?

DBF version while read

Why is there no implementation for reading the version of the DBF file?

It is very simple: just read the first byte and translate it to the version format to understand what has been loaded.

Byte | Version
0x02 | FoxBase 1.0
0x03 | FoxBase 2.x / dBASE III
0x83 | FoxBase 2.x / dBASE III with memo file
0x30 | Visual FoxPro
0x31 | Visual FoxPro with auto increment
0x32 | Visual FoxPro with varchar/varbinary
0x43 | dBASE IV SQL Table, no memo file
0x63 | dBASE IV SQL System, no memo file
0x8b | dBASE IV with memo file
0xcb | dBASE IV SQL Table with memo file
0xfb | FoxPro 2
0xf5 | FoxPro 2 with memo file
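A sketch of such a lookup, using the byte values from the table above (the class and method names are hypothetical, not part of the dBASE.NET API):

```csharp
using System;
using System.IO;

static class DbfVersionReader
{
    // Map the first byte of a .dbf file to a human-readable version name,
    // per the table above. Unknown bytes map to "Unknown".
    public static string Describe(byte versionByte) => versionByte switch
    {
        0x02 => "FoxBase 1.0",
        0x03 => "FoxBase 2.x / dBASE III",
        0x83 => "FoxBase 2.x / dBASE III with memo file",
        0x30 => "Visual FoxPro",
        0x31 => "Visual FoxPro with auto increment",
        0x32 => "Visual FoxPro with varchar/varbinary",
        0x43 => "dBASE IV SQL Table, no memo file",
        0x63 => "dBASE IV SQL System, no memo file",
        0x8b => "dBASE IV with memo file",
        0xcb => "dBASE IV SQL Table with memo file",
        0xfb => "FoxPro 2",
        0xf5 => "FoxPro 2 with memo file",
        _ => "Unknown",
    };

    // Read only the version byte from a file on disk.
    public static string DescribeFile(string path)
    {
        using var stream = File.OpenRead(path);
        return Describe((byte)stream.ReadByte());
    }
}
```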

Error reading and writing Numeric fields on German systems

On a German system, or any other system that uses a comma as the decimal separator, an exception is thrown when reading DBF files containing Numeric fields:

System.FormatException: 'Input string was not in a correct format.'

Numeric fields can be written, but aren't read correctly by Visual FoxPro.

Cause

Text is parsed to float with Convert.ToSingle(), which uses the localization settings of the OS to determine whether the decimal separator is a period or a comma. DBF files always use a period.

Solution

"works for me"

Tell the conversion method to ignore OS settings:

File NumericEncoder.cs, Method Encode():
string text = Convert.ToString(data, System.Globalization.CultureInfo.InvariantCulture).PadLeft(field.Length, ' ');

File NumericEncoder.cs, Method Decode():
return Convert.ToSingle(text, System.Globalization.CultureInfo.InvariantCulture);

observations

The following observations are based on CA-Clipper 5.2 Database Utility (xBase 3), Microsoft dBASE Driver (dBase 3, 4 and 5), Borland Database Engine (dBase 7 and xBase 3), Microsoft Visual FoxPro 9 (FoxPro 9) and Devart Universal Data Access Components source.

Fields
FoxPro keeps its field flags in byte 18 (bit 0x01 means the field is system, 0x02 nullable, 0x04 binary, and 0x08 autoincrement). Bytes 19 to 22 (uint32, little-endian) hold the next autoincrement value, and byte 23 is the autoincrement step. You can easily add support for dBase 7 (see below).

Records
The nullflags field ('0' type) is a system field named _NullFlags, normally placed at the end, present whenever there is a nullable (0x02 flag), varchar ('V' type) or varbinary ('Q' type) field. It is a collection of bits that can't be indexed directly and can be more than one byte long. Bits are assigned from the least to the most significant, following stream byte order (first come, first served). Each bit may represent null or, for varchar and varbinary, the size. If a field is a nullable varchar or nullable varbinary, it gets 2 bits: the first (less significant) is the size bit and the second (more significant) is the null bit. If the null bit is 1, the field is null. If the size bit is 1, the varchar or varbinary field is not full and its size is given by the last field byte (otherwise all bytes of the field are part of the data).

(c V(10) null, i I null, b Q(10) null) [_NullFlags 0(1)]
(null, null, null) 00011111
(null, 0, null) 00011011
('0', 1, null) 00011001
(null, 2, '0') 00001011
('0', null, '0') 00001101
(null, 4, '0123456789') 00000011
('0123456789', 5, null) 00011000
('0123456789', 6, '0123456789') 00000000
switch (column.Type) {
	case 'V':
	case 'Q':
		if ((null_flags[bit_index / 8] & (1 << (bit_index % 8))) != 0
		&& field[field.Length - 1] < column.Size) {
		byte[] t = new byte[field[field.Length - 1]];
			Buffer.BlockCopy(field, 0, t, 0, t.Length);
			field = t;
		}
		bit_index++;
		if (bit_index / 8 >= null_flags_size) {
			return;
		}
		break;
}
if ((column.Flags & 2) != 0) {
	if ((null_flags[bit_index / 8] & (1 << (bit_index % 8))) != 0) {
		field = null;
	}
	bit_index++;
	if (bit_index / 8 >= null_flags_size) {
		return;
	}
}

Memo
dBase 3 memos always have a block size of 512. Each data block is terminated by a 0x1a byte (usually two). Long data may be word-wrapped (Clipper's DBU, for example, may add 0x8D and 0x0A bytes when wrapping lines for MS-DOS U.S.).
dBase 4, 5 and 7 memos have a variable block size, determined by bytes 4-7 (uint32, little-endian) of the header block, where 0 means 512 for dBase 4 and 5 and 1024 for dBase 7. Each data block has its own 8-byte header, whose bytes 4-7 (uint32, little-endian) hold the size. The size includes the first 8 bytes.
FoxPro memos have a variable block size, determined by bytes 6-7 (uint16, big-endian) of the header block, where 0 means 64. Each data block has its own 8-byte header, whose bytes 4-7 (uint32, big-endian) hold the size. The size does not include the first 8 bytes.
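The block-size rules above can be sketched as follows (a stored size of 0 maps to the flavor's default; the helper names are mine, not dBASE.NET's):

```csharp
using System;

static class MemoBlockSize
{
    // dBase 4/5/7: uint32 little-endian at header offset 4.
    // 0 means 512 for dBase 4/5 and 1024 for dBase 7.
    public static uint DbaseBlockSize(byte[] header, bool isDbase7)
    {
        // BitConverter follows host byte order; .NET platforms are
        // little-endian in practice, matching the on-disk layout here.
        uint size = BitConverter.ToUInt32(header, 4);
        return size != 0 ? size : (isDbase7 ? 1024u : 512u);
    }

    // FoxPro: uint16 big-endian at header offset 6. 0 means 64.
    public static ushort FoxProBlockSize(byte[] header)
    {
        ushort size = (ushort)((header[6] << 8) | header[7]);
        return size != 0 ? size : (ushort)64;
    }
}
```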

Types
The currency field ('Y' type) in FoxPro is an int64 in little endian with 4 implied decimal digits.

long c = BitConverter.ToInt64(field, 0); // field: the raw 8-byte value
double value = c / 10000.0;

The integer field ('I') and the autoincrement field ('+') in dBase 7 are int32 in big endian, where the most significant bit is a sign bit (1 means positive).

// i: the raw value, read big-endian from the field bytes
uint i = (uint)((field[0] << 24) | (field[1] << 16) | (field[2] << 8) | field[3]);
int value = unchecked((int)(i > 0 ? i ^ 0x80000000 : 0));

The double field ('O') in dBase 7 is a double in big endian, where the most significant bit is a sign bit (1 means positive).

ulong d = 0; // the raw value, read big-endian from the field bytes
double value = BitConverter.ToDouble(
	BitConverter.GetBytes(d > 0 ? ((d & 0x8000000000000000) == 0 ? ~d : d ^ 0x8000000000000000) : 0UL), 0);

The timestamp field ('@') in dBase 7 is milliseconds since 0001-01-01 00:00:00 minus one day, stored as a double in big endian.

double m = 86400000; // the raw value, read big-endian from the field bytes
DateTime ts = new DateTime((long)((m - 86400000) * 10000));

Question on dbt file handling

Working with memo files: When a memo file accompanying the .dbf file is found (either .dbt or .fpt), with the same base name as the table file, dBASE.NET will load the memo file's contents.

Can you expand on this, please? Does it automatically attempt to read a .dbt file in the same directory with the same base file name when you read a table file? I get an "unsupported dbase version" error when attempting to read the .dbt file directly.
