exceldatareader / exceldatareader Goto Github PK

View Code? Open in Web Editor NEW

3.8K 187.0 947.0 22.17 MB

Lightweight and fast library written in C# for reading Microsoft Excel files

License: MIT License

C# 100.00%

excel xlsb xlsx parser xls csharp dotnet

exceldatareader's Introduction

ExcelDataReader

Lightweight and fast library written in C# for reading Microsoft Excel files (2.0-2021, 365).

Please feel free to fork and submit pull requests to the develop branch.

If you are reporting an issue it is really useful if you can supply an example Excel file as this makes debugging much easier and without it we may not be able to resolve any problems.

Continuous integration

Branch	Build status
develop
master

Supported file formats and versions

File Type	Container Format	File Format	Excel Version(s)
.xlsx	ZIP, CFB+ZIP	OpenXml	2007 and newer
.xlsb	ZIP, CFB	OpenXml	2007 and newer
.xls	CFB	BIFF8	97, 2000, XP, 2003 98, 2001, v.X, 2004 (Mac)
.xls	CFB	BIFF5	5.0, 95
.xls	-	BIFF4	4.0
.xls	-	BIFF3	3.0
.xls	-	BIFF2	2.0, 2.2
.csv	-	CSV	(All)

Finding the binaries

It is recommended to use NuGet through the VS Package Manager Console Install-Package <package> or using the VS "Manage NuGet Packages..." extension.

As of ExcelDataReader version 3.0, the project was split into multiple packages:

Install the ExcelDataReader base package to use the "low level" reader interface. Compatible with net462, netstandard2.0 and netstandard2.1.

Install the ExcelDataReader.DataSet extension package to use the AsDataSet() method to populate a System.Data.DataSet. This will also pull in the base package. Compatible with net462, netstandard2.0 and netstandard2.1.

How to use

using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
    // Auto-detect format, supports:
    //  - Binary Excel files (2.0-2003 format; *.xls)
    //  - OpenXml Excel files (2007 format; *.xlsx, *.xlsb)
    using (var reader = ExcelReaderFactory.CreateReader(stream))
    {
        // Choose one of either 1 or 2:

        // 1. Use the reader methods
        do
        {
            while (reader.Read())
            {
                // reader.GetDouble(0);
            }
        } while (reader.NextResult());

        // 2. Use the AsDataSet extension method
        var result = reader.AsDataSet();

        // The result of each spreadsheet is in result.Tables
    }
}

Reading .CSV files

Use ExcelReaderFactory.CreateCsvReader instead of CreateReader to parse a stream of plain text with comma separated values.

See also the configuration options FallbackEncoding and AutodetectSeparators.

The input CSV is always parsed once completely to set FieldCount, RowCount, Encoding, Separator (or twice if the CSV lacks BOM and is not UTF8), and then parsed once again while iterating the row records. Throws System.Text.DecoderFallbackException if the input cannot be parsed with the specified encoding.

The reader returns all CSV field values as strings and makes no attempts to convert the data to numbers or dates. This caller is responsible for interpreting the CSV data.

Using the reader methods

The AsDataSet() extension method is a convenient helper for quickly getting the data, but is not always available or desirable to use. IExcelDataReader extends the System.Data.IDataReader and IDataRecord interfaces to navigate and retrieve data at a lower level. The most important reader methods and properties:

Read() reads a row from the current sheet.
NextResult() advances the cursor to the next sheet.
ResultsCount returns the number of sheets in the current workbook.
Name returns the name of the current sheet.
CodeName returns the VBA code name identifier of the current sheet.
FieldCount returns the number of columns in the current sheet.
RowCount returns the number of rows in the current sheet. This includes terminal empty rows which are otherwise excluded by AsDataSet(). Throws InvalidOperationException on CSV files when used with AnalyzeInitialCsvRows.
HeaderFooter returns an object with information about the headers and footers, or null if there are none.
MergeCells returns an array of merged cell ranges in the current sheet.
RowHeight returns the visual height of the current row in points. May be 0 if the row is hidden.
GetColumnWidth() returns the width of a column in character units. May be 0 if the column is hidden.
GetFieldType() returns the type of a value in the current row. Always one of the types supported by Excel: double, int, bool, DateTime, TimeSpan, string, or null if there is no value.
IsDBNull() checks if a value in the current row is null.
GetValue() returns a value from the current row as an object, or null if there is no value.
GetDouble(), GetInt32(), GetBoolean(), GetDateTime(), GetString() return a value from the current row cast to their respective type.
GetNumberFormatString() returns a string containing the formatting codes for a value in the current row, or null if there is no value. See also the Formatting section below.
GetNumberFormatIndex() returns the number format index for a value in the current row. Index values below 164 refer to built-in number formats, otherwise indicate a custom number format.
GetCellStyle() returns an object containing style information for a cell in the current row: indent, horizontal alignment, hidden, locked.
The typed Get*() methods throw InvalidCastException unless the types match exactly.

CreateReader() configuration options

The ExcelReaderFactory.CreateReader(), CreateBinaryReader(), CreateOpenXmlReader(), CreateCsvReader() methods accept an optional configuration object to modify the behavior of the reader:

var reader = ExcelReaderFactory.CreateReader(stream, new ExcelReaderConfiguration()
{
    // Gets or sets the encoding to use when the input XLS lacks a CodePage
    // record, or when the input CSV lacks a BOM and does not parse as UTF8. 
    // Default: cp1252 (XLS BIFF2-5 and CSV only)
    FallbackEncoding = Encoding.GetEncoding(1252),

    // Gets or sets the password used to open password protected workbooks.
    Password = "password",

    // Gets or sets an array of CSV separator candidates. The reader 
    // autodetects which best fits the input data. Default: , ; TAB | # 
    // (CSV only)
    AutodetectSeparators = new char[] { ',', ';', '\t', '|', '#' },

    // Gets or sets a value indicating whether to leave the stream open after
    // the IExcelDataReader object is disposed. Default: false
    LeaveOpen = false,

    // Gets or sets a value indicating the number of rows to analyze for
    // encoding, separator and field count in a CSV. When set, this option
    // causes the IExcelDataReader.RowCount property to throw an exception.
    // Default: 0 - analyzes the entire file (CSV only, has no effect on other
    // formats)
    AnalyzeInitialCsvRows = 0,
});

AsDataSet() configuration options

The AsDataSet() method accepts an optional configuration object to modify the behavior of the DataSet conversion:

var result = reader.AsDataSet(new ExcelDataSetConfiguration()
{
    // Gets or sets a value indicating whether to set the DataColumn.DataType 
    // property in a second pass.
    UseColumnDataType = true,

    // Gets or sets a callback to determine whether to include the current sheet
    // in the DataSet. Called once per sheet before ConfigureDataTable.
    FilterSheet = (tableReader, sheetIndex) => true,

    // Gets or sets a callback to obtain configuration options for a DataTable. 
    ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
    {
        // Gets or sets a value indicating the prefix of generated column names.
        EmptyColumnNamePrefix = "Column",

        // Gets or sets a value indicating whether to use a row from the 
        // data as column names.
        UseHeaderRow = false,

        // Gets or sets a callback to determine which row is the header row. 
        // Only called when UseHeaderRow = true.
        ReadHeaderRow = (rowReader) => {
            // F.ex skip the first row and use the 2nd row as column headers:
            rowReader.Read();
        },

        // Gets or sets a callback to determine whether to include the 
        // current row in the DataTable.
        FilterRow = (rowReader) => {
            return true;
        },

        // Gets or sets a callback to determine whether to include the specific
        // column in the DataTable. Called once per column after reading the 
        // headers.
        FilterColumn = (rowReader, columnIndex) => {
            return true;
        }
    }
});

Setting up AsDataSet() configuration, use the FilterRow callback to implement a "progress indicator" while loading, e.g.:

var result = reader.AsDataSet(new ExcelDataSetConfiguration()
{
    ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
    {
        FilterRow = (rowReader) => {
            int progress = (int)Math.Ceiling((decimal)rowReader.Depth / (decimal)rowReader.RowCount * (decimal)100);
            // progress is in the range 0..100
            return true;
        }
    }
});

Formatting

ExcelDataReader does not support formatting directly. Users may retreive the number format string for a cell through IExcelDataReader.GetNumberFormatString(i) and use the third party ExcelNumberFormat library for formatting purposes.

Example helper method using ExcelDataReader and ExcelNumberFormat to format a value:

string GetFormattedValue(IExcelDataReader reader, int columnIndex, CultureInfo culture)
{
    var value = reader.GetValue(columnIndex);
    var formatString = reader.GetNumberFormatString(columnIndex);
    if (formatString != null)
    {
        var format = new NumberFormat(formatString);
        return format.Format(value, culture);
    }
    return Convert.ToString(value, culture);
}

Important note when upgrading from ExcelDataReader 2.x

ExcelDataReader 3 had some breaking changes, and older code may produce error messages similar to:

'IExcelDataReader' does not contain a definition for 'AsDataSet'...
'IExcelDataReader' does not contain a definition for 'IsFirstRowAsColumnNames'...

To fix:

Make sure to rename any Excel namespace references in the code to the new namespace ExcelDataReader
Make sure the project has a reference to the ExcelDataReader.DataSet package to use AsDataSet()
Remove the line of code with IsFirstRowAsColumnNames and change the call to AsDataSet() to something like this:

var result = reader.AsDataSet(new ExcelDataSetConfiguration()
{
    ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
    {
        UseHeaderRow = true
    }
});

Important note on .NET Core

By default, ExcelDataReader throws a NotSupportedException "No data is available for encoding 1252." on .NET Core and .NET 5.0 or later.

To fix, add a dependency to the package System.Text.Encoding.CodePages and then add code to register the code page provider during application initialization (f.ex in Startup.cs):

System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

This is required to parse strings in binary BIFF2-5 Excel documents encoded with DOS-era code pages. These encodings are registered by default in the full .NET Framework, but not on .NET Core and .NET 5.0 or later.

exceldatareader's People

Contributors

Stargazers

Watchers

Forkers

anderezekial ryukbk jeffadvgrower steveholle sashkosnail jas-on zongran2 llenroc mnsaybi pandagopal sergiocm wshhsw2011 sigmapointindiainc shadowmonk tiesont mklaber citybird dalejs giswt pfardilha david-blandon nparrish vegasoftti mattincode chyftom chiepomme fangburn mah-khalid lj520github colindcli blake2002 sergeyasryan steve1961 yuta2 vortx-inc huoxudong125 jbendson prosser johnjuuljensen rotemlevi zhangf911 ly774508966 tsinglee2009 pythoner-xu dasjack raymondwalsh rhythmsosad jamesabruiners rongarlit vino2012 uurtech christiandillems tonyyuen tokyow herch haithemaraissia harry19902002 kezunlong ozlemond gamedevtech devenlu perolesen oleksandrmeister tin43800 duanshuaimin arandra henryhwang moristar guranbog cookcoder zoulufei mvandersteen arventwei raagzcd wuyuze sovos-compliance even8595 zheimer a925 hyrio chentieqi at1993 akhibam java02014 mpbill dawsonzhang xciles luanng zccdz williamockham joelccdev flyeven foufure vassav jowillis jacksmartv thanhlai kushalvenkatesh anjumrizwi mariobot

exceldatareader's Issues

Default name for columns with no name

migrated from codeplex https://exceldatareader.codeplex.com/workitem/12857

"I have a .xlsx file where only some of the columns in row 1 has names. The columns with no name is automatically named 'Column?' (This is both Caption and ColumnName).
I have previously used Excel by opening as a database via oledb and where these columns were automatically named 'F?'. No I have a backward compatibility problem because of this.
This is giving me problems as I do not feel I can just rename Column? to F? because I cannot seem to differentiate between a column actually named Column2 and one which was internally named Column2 because no name is specified in the spreadsheet.

So basically I have two questions;
Is is possible to change the default behavior of this auto naming of columns with no name? Basically exchange 'Column' with 'F'
Or is possible to tell if column didn't actually have a name (was blank). Then I can write the logic myself.
Thanks in advance :)

Ok, after reading the documentation I created a patch with the id; 16126. Hopefully it will accepted as my issue is now resolved by it.

REVIEW patch and apply if appropriate"

"
bleze wrote Apr 4 at 10:23 AM

It turns out that this patch is not enough to get the correct behavior.

The column index starts at 0 and I need it to start at 1.

I've found 3 places in the code where the column name is handled;

Helpers.AddColumnHandleDuplicate(table, string.Concat(COLUMN, i));

I've added a backward compatible bool setting for changing this index behavior. Attached is rev. 2 of the patch.
"
DefaultColumnName-rev2.patch | uploaded via ZenHub

Newlines in cells result in additional rows in DataTables

If a cell contains multiple lines (done by using Alt+Enter) the output gets corrupted.
The newline in the cell seems to cause the ExcelDataReader to create an additional row, containing everything that follows after the newline in the same row as said cell.
Therefore the output is corrupted. It would be better to preserve the '\n' in that cell.

Columns skipped when DIMENSION record is incorrect

Originally posted by stephensjs to the codeplex site, here original post.

"A spreadsheet I've encountered in the wild doesn't conform properly to the Excel binary format spec, and in the DIMENSION record uses the zero based index of the last column, rather than the zero based index of the column after the last column, for the colMac value. As a consequence, ExcelDataReader skips reading the last column of the spreadsheet. Excel and OpenOffice both open the file without any problems, and if you save it after opening it, they will fix the bad DIMENSION record.

I've attached the spreadsheet in question. If you're curious about its provenance, you can find it here: http://www.adrbnymellon.com/dr_directory.jsp. You have to look for the "Download to Excel" link, click it, agree to some legalese, and then you can download.

I've written a patch that allows this file to be read properly. It's similar to the fix for Issue #12667; it compares the number of columns from the DIMENSION record with the number of columns from the first row record, and if the number of columns from the row record is greater, it goes with that. The fix does not cause any existing working tests to fail, and I've included a unit test to exercise the new code. The patch is attached."

dimension_columns_one_too_few.diff | uploaded via ZenHub
Test_DimensionColumnsOneTooFew.xls | uploaded via ZenHub

Attachments.zip

Only 1 column name with empty worksheets or columns in multiple worksheets in xlsx file

I've loaded a workbook with 123 worksheets with identical headers, containing 3 worksheets with only a header row. When loading as xlsx file, and using AsDataset, only the worksheets containing data rows are processed as a table, but with only 1 column. Loading the same data as xls file ('export as' from the xlsx) and using AsDataset works just fine, loading all worksheets inclusing the emppty one as tables, and all columns.

Possible null reference exception in tryConvertOADateTime

was patch submitted to codeplex (15924 )
ExcelBinaryReader.AsDataSet(true); can cause NullReferenceException

The patch checks the value parameter for null before attempting conversion to DateTime.
tryConvertOADateTime.patch | uploaded via ZenHub

Visual Studio 2013 using Excel not found.

This is VERY odd!

I reference the DLL and add the using, and everything looks great until I try to build it, then I get the blue line under the using statement.

Is there a possibility of some conflict with another Excel?

Portable class library

Convert the library to a portable class library so maitenance and integration with different frameworks is easier.

Select which worksheet to read

I am trying to read from various excel files that may or may not have multiple worksheets. I would like to read from the latest one, i.e., if there is only 1, read that. If there are 3, read the 3rd one. Is there some simple way to select which worksheet I would like to read from, possibly by an index?

Not a legal Ole Aut date

https://exceldatareader.codeplex.com/discussions/562088

 public static object ConvertFromOATime(double value)
    {
        if ((value >= 0.0) && (value < 60.0))
        {
            value++;
        }
        //if (date1904)
        //{
        //    Value += 1462.0;
        //}

      // added to fix out of range problem, sdh, 20140826
      if (value < 657435.0)
      {
        value = 657435.0;
      }
      else if (value > 2958465.99999999)
      {
        value = 2958465.99999999;
      }
        return DateTime.FromOADate(value);
    }

New namespace ExcelDataReader?

Rather than just Excel, which is a bit conflicting

ExcelOpenXmlReader: numbers stored as text

Hi,

I come to a issue that when using the ExcelOpenXmlReader, the numbers formated as text in Excel are always converted to double, ex: the number 0006181(text) is converted to 6181.0(double).
In line 279 there is a verification to see if the format is text:

else if (xf.NumFmtId == 49)
  o = o.ToString();

but the convertion is from the object already converted to double!

Sugestion:

else if (xf.NumFmtId == 49)
   o = _xmlReader.Value;

this will preserve numbers as text, as long as they are formated as text in Excel.

Unable to read files generated by rexcel (rjava) plus suggested code fix

Excel files generated by the statistical program R xlsx package (http://cran.r-project.org/web/packages/xlsx/xlsx.pdf) are incorrectly read by ExcelDataReader. Only 1 column is read.

The same file reads correctly into Excel. Doing a save-as from Excel can then be correctly opened by ExcelDataReader.

After investigation of the worksheet1.xml file in a debugger, it appears that the Dimension property of this file contained only "A1" and not "A1:S1206" like the value of Dimension when the file is saved by Excel.

As a result, ExcelDataReader thinks the number of columns is 0 (LastCol = 1) and only the first column of the worksheet is read.

A solution is suggested, which is to replace the single "Break;" statement in ExcelOpenXmlReader.cs, line ~130, by the following:

// If LastCol is 1 then number of columns probably is wrong (although it might be just 1 column).
// Set dimension to null so rows and cols are calculated below.
if (sheet.Dimension.LastCol == 1)
sheet.Dimension = null;
else
break;

Basically, if LastCol == 1 then it probably did not read in the number of columns correctly so setting sheet.Dimension to null causes sheet.Dimension to be created further down in the code where the rows and columns are actually counted rather than determined from the Dimension element of the xml file.

Error: Buffer size is less than minimum BIFF record size.

Hi, I had error like in topic and i hope i found also solution which works fine for me.

I noticed XlsBiffEOF in ExcelBinaryReader record in some of cases doesn't break loading current workbook.
It was some reasons:

If XlsBiffEOF is encouted m_depth is not incremented,
Offset is not in end of the buffer,
moveToNextRecord has line

if (!m_canRead && m_depth > 0) m_canRead = true;

which continues execution. My stream from file which i have contains only 0x00 to end of the stream and it cannot be load correctly.

I solved my error by modifing function readWorkSheetRow
If there is XlsBiffEOF then i increment m_depth.

if (rec is XlsBiffEOF) { m_depth++;  return false; };

Then loading of workbook is stopped correctly. Other solution which i found is to modiy moveToNextRecord by removing m_canRead = false or removing whole condition - i hadn't spotted reason of that line.

Multithreaded Read issue in Log

see https://exceldatareader.codeplex.com/workitem/12936 for detail

Handle Hidden Spreadsheets

It would be helpful if the AsDataSet() function would handle hidden sheets more elegantly. Options include:

Ignore them
Provide an option to ignore them
Don't ignore them but apply an extended property to the corresponding DataTable

An example sheet is here: TestHiddenSheet.xlsx

This was created by taking the TestMultiSheet.xlsx file and hiding Sheet3.

Only one tab sheet is read

Hi there,

I have an Excel file that contains several tab sheets, named like this: HUN-CZE, HUN-FRA, FRA-ENG and so on. I load the file using this code:

IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
DataSet result = excelReader.AsDataSet();

But the resulting dataset contains only one tab sheet, named "Sheet 1".

What am I doing wrong?

Regards,

-Eduardo

Non-negative number required. Parameter name: count

From https://exceldatareader.codeplex.com/workitem/13222
"This exception is produced in XlsBiffSST.cs on last - offset - prefix; The excel file contains Chinese simplified data with some formatting. I don't know how to fix this issue. Your help will be highly appreciated in this regard."

Develop in Dot Net Compact Framework 2.0

This API can use in dot net compact framework 2.0? I got this issue "Error 2 The type 'System.Data.IDataRecord' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'."

Converting cell formatted as percentage to incorrect decimal

I have an issue where I'm reading a cell that is formatted as percentage incorrectly. For example, I can enter "0.5" into the cell, and it will display "0.5000%". However, when I read the cell back in, using GetString(), it reads in as "0.005", rather than "0.5" or "0.5%". Is there something I can do to get around this, or does this need a fix?

Text like numeric incorrectly (differently) handled.

I have in excel cell value "0682".

ExcelOpenXmlReader in function ReadSheetRow converts value to double on beginning. Is it correct approach? I though value returned by reader should be same as visible in excel workbook. For solving that issue i changed my code a bit:

if (_xmlReader.NodeType == XmlNodeType.Text && hasValue)
                    {
                        object o = _xmlReader.Value;

                        var style = NumberStyles.Any;
                        var culture = CultureInfo.InvariantCulture;

                        bool is_string = false;

                        if (null != a_t && a_t == XlsxWorksheet.N_inlineStr)
                        {
                            o = Helpers.ConvertEscapeChars(o.ToString());
                            if (_xmlReader.XmlSpace == XmlSpace.Preserve)
                            {
                                is_string = true;
                            }
                        }
                        else if (null != a_t && a_t == XlsxWorksheet.A_s) //if string
                        {
                            o = Helpers.ConvertEscapeChars(_workbook.SST[int.Parse(o.ToString())]);
                            is_string = true;
                        } // Requested change 4: missing (it appears that if should be else if)                        
                        else if (a_t == "b") //boolean
                        {
                            o = _xmlReader.Value == "1";                            
                        }

                        if (!is_string && a_s != null && !string.IsNullOrEmpty(o.ToString()))
                        {
                            XlsxXf xf = _workbook.Styles.CellXfs[int.Parse(a_s)];
                            double number;
                            long l_number;
                            int i_number;
                            if (IsDateTimeStyle(xf.NumFmtId) && double.TryParse(o.ToString(), out number))
                            {
                                o = Helpers.ConvertFromOATime(number);
                            }
                            else if (!xf.ApplyNumberFormat || xf.NumFmtId == 49)
                            {
                                o = o.ToString();
                            }
                            else if (int.TryParse(o.ToString(), NumberStyles.Integer, culture, out i_number))
                            {
                                o = i_number;
                            }
                            else if (long.TryParse(o.ToString(), NumberStyles.Integer, culture, out l_number))
                            {
                                o = l_number;
                            }
                            else if (double.TryParse(o.ToString(), NumberStyles.Any, culture ,out number))
                            {
                                o = number;
                            }                            
                        }



                        if (col - 1 < _cellsValues.Length)
                            _cellsValues[col - 1] = o;
                    }

Please remove dependency on SharpZipLib its GPL

Please change SharpZipLib to something with a less restrictive license like DotNetZip or 7ZIP SDK for C#.

Preserve information about Excel's named ranges

In Excel, one can define a "named range". It would be nice to be able to preserve those ranges in the dataset. Perhaps something like this:

let workbook = excelReader.AsDataSet(namedRangesInTheirOwnTables: true)

This could create additional tables in the DataSet, one table for each named range.

Drop Framework 2.0 support?

I really want to drop framework 2.0 support, because maintaining it is restricting the rest of the project and how it can be structured. Personally I have no use for framework 2.0 support so my motivation to keep supporting it is very low.

What do people feel about this?

Could someone fix the Project so that it works

Hi have had to change several things to get this to work.

I am talking about the TestApp

One of the references is confused. had to remove Excel.4.5 and add back. (strange behavior)
Was complaining about missing bin/Debug45/Excel.dll

2.The reference flies are missing from TestApp to PCLStorage and PCLStorage.Abstractations
if you don't add this is will complain about method not implemented i the Portable Project.

After all that when i try and Test a file it says that it has no
_workbook..... line " for (int ind = 0; ind < _workbook.Sheets.Count; ind++)"

_workbook is null.... I think this has to do with all the async methods.

Could be wrong about all this but would be nice if it just worked out the box.

Can not acquire Table with provided file. Looking for guidance.

While using this linked spreadsheet below, the index is not found in method readWorkSheetGlobals, and the method returns false. I'm currently debugging ExcelBinaryReader.cs but i'm also having to come up to speed on the BIFF format. I'm posting this issue here in the hopes that someone that has more experience with this format can easily spot the problem.

https://drive.google.com/file/d/0BxEpRvUUcK3lczhWUk01MlVmazQ/view?usp=sharing

Request: Excel Data Writer

I'm impressed by the performance of your library and would be very helpful is could add also the possibility to write a DataSet into an Excel file (xlsx or xls).

Thanks

Stream does not support seeking.

https://exceldatareader.codeplex.com/workitem/11988

"I am using a shell that calls a program that decrypts a file to StandardOutput.

Process process = new Process();
//... setup to run decrypter
process.Start();
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(process.StandardOutput.BaseStream);

Can there be a constructor that will handle non-seekable (StreamReader) Streams as well?

My constraints...
I don't want to write the sensitive data to file and the solution must handle large files (reading to memory first will not work)."

Unify shared code in ExcelOpenXmlReader & binaryreader

.net 2 and portable version have some very similar code in them, that should just be in one place.

How to read an excel file with repeated column names?

I need to read excel files with the same column names repeated several times.
DataTables will not allow repeated column headers so excelReader.AsDataSet() throws an error if the excel file has repeated column headers. Is there a way to handle this?

Thank you!

Encoding issue

One of my file has an encoding issue. First screenshot shows the original Excel view. 2nd one is via LinqPad, 3rd one is via Visual Studio watch tab. Any idea how to solve this issue? Thx

const string worksheet = "BD- 1";
const string fileName = "199811021.xls";
const int rowNumber = 3;
const int columnNumber = 8;

string directory = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
string filePath = Path.Combine(directory, fileName);

FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
DataSet result = excelReader.AsDataSet();

foreach (DataTable t in result.Tables)
{
    if (t.TableName.Equals(worksheet))
    {
        DataRow r = t.Rows[rowNumber];
        object c = r[columnNumber]; 

        // outputs
        r.Dump();
        Console.WriteLine (c);
    }
}

excelReader.Close();

Value contains phonetic string data for xls / xlsm

Asian strings such as Japanese can contain phonetic string data (furigana/yomigana), but ExcelDataReader just includes a phonetic string in a string value and doesn't separate them. The expected result is when retrieving a string value it doesn't contain phonetic string data as it's just there to show how to read it.
It seems for xls (BinaryReader) there is an uncommented code,
https://exceldatareader.codeplex.com/discussions/50930
but for xlsm (XmlReader) there's nothing.

CreateOpenXmlReader -> AsDataSet . When column has comma it creates extra columns

Hi,

I am trying to open an excel spreadsheet which has commas in its heading names.
When I call AsDataSet() it parses each comma as a new column.

Is there any workaround for this (other than to delete the commas from the source file)?

Thanks,

Conor

Support for Excel (.xlsb) Binary File Format

I actually didn't know this format existed, seems to be a hybrid between xlsx and xls. We are getting submitted to us in this format sometimes.

http://msdn.microsoft.com/en-us/library/office/cc313133(v=office.12).aspx

Unify Helpers code between .net 2 and 4.5

Read Csv Files

wizardgsz kindly submitted this to the codeplex site
https://exceldatareader.codeplex.com/workitem/13097

I think there are few points here:

Do we want this library to also cope with csv files?
If so, should we use another csv library to do the csv reading part. Are there pitfalls in reading a csv that we might fall in to that another library will have already overcome?
We need unit tests for this class

// If defined it uses core/helper functions provided by ExcelDataReader Library.
//#define USE_EXCEL_CORE

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.IO;
using System.Globalization;

using Excel;
using Excel.Core;

namespace Excel
{
    /*
        Extends ExcelReaderFactory:

        /// <summary>
        /// Creates an instance of <see cref="ExcelCsvReader"/>
        /// </summary>
        /// <param name="fileStream">The file stream.</param>
        /// <returns></returns>
        public static IExcelDataReader CreateOpenXmlReader(Stream fileStream)
        {
            IExcelDataReader reader = new ExcelCsvReader();
            reader.Initialize(fileStream);

            return reader;
        }
     */

    /// <summary>
    /// ExcelDataReader Class
    /// </summary>
    public class ExcelCsvReader: IExcelDataReader, IDataReader, IDataRecord, IDisposable
    {
        #region Helpers and private members/methods

        //private CultureInfo ITA = new CultureInfo("it-IT");
        //private CultureInfo USA = new CultureInfo("en-US");

        private const string WORKBOOK = "Workbook";
        private const string BOOK = "Book";
        private const string COLUMN = "Column";
        private const string SHEET = "Sheet";

        private readonly Encoding m_Default_Encoding = Encoding.UTF8;
        private Encoding m_encoding;
        private bool m_isValid;
        private bool m_isClosed;
        private List<CsvWorksheet> m_sheets;
        private TextReader m_stream;
        private DataSet m_workbookData;
        private string m_exceptionMessage = String.Empty;
        private bool _isFirstRowAsColumnNames = false;
        private Stream m_file;
        private int m_SheetIndex;
        private bool m_ConvertOADate;
        private bool m_IsFirstRead;
        private object[] m_cellsValues;

        internal ExcelCsvReader()
        {
            this.m_encoding = this.m_Default_Encoding;
            this.m_isValid = true;
            this.m_SheetIndex = -1;
            this.m_IsFirstRead = true;
        }

        ~ExcelCsvReader()
        {
            this.Dispose(false);
        }

        private void fail(string message)
        {
            this.m_exceptionMessage = message;
            this.m_isValid = false;
            this.m_file.Close();
            this.m_isClosed = true;
            this.m_workbookData = null;
            this.m_sheets = null;
            this.m_stream = null;
            this.m_encoding = null;
        }

        static private string[] TokenizeLine(string line)
        {
            List<string> tokens = new List<string>();

            bool quoting = false;
            string token = String.Empty;

            for (int i = 0; i < line.Length; i++)
            {
                char ch = line[i];
                if (ch == '\"')
                {
                    // Does it open a new literal string or close the token?
                    if (quoting)
                    {
                        // New token
                        tokens.Add(token);
                        token = String.Empty;
                        // Skip any char until the next ';'
                        for (; i < line.Length; i++)
                        {
                            ch = line[i];
                            if (ch == ';')
                            {
                                break;
                            }
                        }
                    }
                    else token = String.Empty;
                    quoting = !quoting;
                }
                else if (ch == ';')
                {
                    // Are we reading a literal string?
                    if (!quoting)
                    {
                        // New token
                        tokens.Add(token);
                        token = String.Empty;
                    }
                    else token += ch;
                }
                // Continue tokenizing
                else token += ch;
            }

            tokens.Add(token);

            return tokens.ToArray();
        }

#if !USE_EXCEL_CORE
        static private void FixDataTypes(DataSet dataset)
        {
            List<DataTable> tables = new List<DataTable>(dataset.Tables.Count);
            bool convert = false;
            foreach (DataTable table in dataset.Tables)
            {
                if (table.Rows.Count == 0)
                {
                    tables.Add(table);
                }
                else
                {
                    DataTable newTable = null;
                    for (int i = 0; i < table.Columns.Count; i++)
                    {
                        Type type = null;
                        foreach (DataRow row in table.Rows)
                        {
                            if (!row.IsNull(i))
                            {
                                Type curType = row[i].GetType();
                                if (curType != type)
                                {
                                    if (!(type == null))
                                    {
                                        type = null;
                                        break;
                                    }
                                    type = curType;
                                }
                            }
                        }
                        if (type != null)
                        {
                            convert = true;
                            if (newTable == null)
                            {
                                newTable = table.Clone();
                            }
                            newTable.Columns[i].DataType = type;
                        }
                    }
                    if (newTable != null)
                    {
                        newTable.BeginLoadData();
                        foreach (DataRow row2 in table.Rows)
                        {
                            newTable.ImportRow(row2);
                        }
                        newTable.EndLoadData();
                        tables.Add(newTable);
                    }
                    else
                    {
                        tables.Add(table);
                    }
                }
            }
            if (convert)
            {
                dataset.Tables.Clear();
                dataset.Tables.AddRange(tables.ToArray());
            }
        }

        static private void AddColumnHandleDuplicate(DataTable table, string columnName)
        {
            string adjustedColumnName = columnName;
            DataColumn column = table.Columns[columnName];
            int i = 1;
            while (column != null)
            {
                adjustedColumnName = string.Format("{0}_{1}", columnName, i);
                column = table.Columns[adjustedColumnName];
                i++;
            }
            table.Columns.Add(adjustedColumnName, typeof(object));
        }
#endif

        private void readWorkBookGlobals()
        {
            try
            {
                // Create the first and only Worksheet
                this.m_sheets = new List<CsvWorksheet>();
                this.m_sheets.Add(new CsvWorksheet(m_sheets.Count, SHEET + (m_sheets.Count + 1)));
            }
            catch (Exception ex)
            {
                this.fail(ex.Message);
                return;
            }
        }

        private void readWholeWorkSheetNoIndex(bool triggerCreateColumns, DataTable table)
        {
            while ((this as IDataReader).Read() && this.m_depth != this.m_maxRow)
            {
                bool justAddedColumns = false;
                if (triggerCreateColumns)
                {
                    if (this._isFirstRowAsColumnNames || (this._isFirstRowAsColumnNames && this.m_maxRow == 1))
                    {
                        for (int i = 0; i < Math.Min(this.m_maxCol, this.m_cellsValues.Length); i++)
                        {
                            if (this.m_cellsValues[i] != null && this.m_cellsValues[i].ToString().Length > 0)
                            {
#if USE_EXCEL_CORE
                                Helpers.
#endif
                                AddColumnHandleDuplicate(table, this.m_cellsValues[i].ToString());
                            }
                            else
                            {
#if USE_EXCEL_CORE
                                Helpers.
#endif
                                AddColumnHandleDuplicate(table, COLUMN + i);
                            }
                        }
                    }
                    else
                    {
                        for (int j = 0; j < Math.Min(this.m_maxCol, this.m_cellsValues.Length); j++)
                        {
                            table.Columns.Add(null, typeof(object));
                        }
                    }
                    triggerCreateColumns = false;
                    justAddedColumns = true;
                    table.BeginLoadData();
                }
                if (!justAddedColumns && this.m_depth > 0 && (!this._isFirstRowAsColumnNames || this.m_maxRow != 1))
                {
                    table.Rows.Add(this.m_cellsValues);
                }
            }
            if (this.m_depth > 0 && (!this._isFirstRowAsColumnNames || this.m_maxRow != 1))
            {
                table.Rows.Add(this.m_cellsValues);
            }
        }

        private DataTable readWholeWorkSheet(CsvWorksheet sheet)
        {
            DataTable table = new DataTable(sheet.Name);
            bool triggerCreateColumns = true;
            readWholeWorkSheetNoIndex(triggerCreateColumns, table);
            table.EndLoadData();
            return table;
        }

        public bool ConvertOaDate
        {
            get
            {
                return this.m_ConvertOADate;
            }
            set
            {
                this.m_ConvertOADate = value;
            }
        }

        #endregion

        #region IExcelDataReader Members

        void IExcelDataReader.Initialize(Stream fileStream)
        {
            this.m_file = fileStream;
            this.m_stream = new StreamReader(m_file);
            this.readWorkBookGlobals();
            this.m_SheetIndex = 0;
        }

        DataSet IExcelDataReader.AsDataSet(bool convertOADateTime)
        {
            if (!this.m_isValid)
            {
                return null;
            }

            if (this.m_isClosed)
            {
                return this.m_workbookData;
            }

            this.ConvertOaDate = convertOADateTime;
            this.m_workbookData = new DataSet();

            for (int index = 0; index < (this as IExcelDataReader).ResultsCount; index++)
            {
                DataTable table = this.readWholeWorkSheet(this.m_sheets[index]);
                if (table != null)
                {
                    this.m_workbookData.Tables.Add(table);
                }
            }

            this.m_file.Close();
            this.m_isClosed = true;
            this.m_workbookData.AcceptChanges();

#if USE_EXCEL_CORE
            Helpers.
#endif
            FixDataTypes(this.m_workbookData);

            return this.m_workbookData;
        }

        DataSet IExcelDataReader.AsDataSet()
        {
            return (this as IExcelDataReader).AsDataSet(false);
        }

        string IExcelDataReader.ExceptionMessage
        {
            get { return m_exceptionMessage; }
        }

        bool IExcelDataReader.IsFirstRowAsColumnNames
        {
            get
            {
                return this._isFirstRowAsColumnNames;
            }
            set
            {
                this._isFirstRowAsColumnNames = value;
            }
        }

        bool IExcelDataReader.IsValid
        {
            get { return this.m_isValid; }
        }

        string IExcelDataReader.Name
        {
            get
            {
                if (this.m_sheets != null && this.m_sheets.Count > 0)
                {
                    return this.m_sheets[this.m_SheetIndex].Name;
                }
                return null;
            }
        }

        int IExcelDataReader.ResultsCount
        {
            get { return m_sheets.Count; }
        }

        #endregion

        #region IDataReader Members

        void IDataReader.Close()
        {
            this.m_file.Close();
            this.m_isClosed = true;
        }

        int IDataReader.Depth
        {
            get { return this.m_depth; }
        }

        DataTable IDataReader.GetSchemaTable()
        {
            throw new NotSupportedException();
        }

        bool IDataReader.IsClosed
        {
            get
            {
                return this.m_isClosed;
            }
        }

        bool IDataReader.NextResult()
        {
            if (this.m_SheetIndex >= (this as IExcelDataReader).ResultsCount - 1)
            {
                return false;
            }
            this.m_SheetIndex++;
            this.m_IsFirstRead = true;
            return true;
        }

        private bool m_canRead;
        private int m_maxCol = 65535;
        private int m_maxRow = 65535;
        private int m_depth = 0;

        private bool readWorkSheetRow()
        {
            if ((m_stream as StreamReader).EndOfStream)
            {
                return false;
            }

            string line = m_stream.ReadLine();

            if (!String.IsNullOrEmpty(line.Trim()))
            {
                string[] tokens = TokenizeLine(line);

                // With the exception of the first line, if it represents the column header
                if (m_depth == 0 && _isFirstRowAsColumnNames)
                {
                    this.m_cellsValues = tokens;
                }
                else
                {
                    this.m_cellsValues = new object[tokens.Length];

                    for (int i = 0; i < this.m_cellsValues.Length; i++)
                    {
                        string aString = tokens[i];
                        try
                        {
                            //long Result;
                            //DateTime Date;
                            //if (long.TryParse(aString, out Result))
                            //{
                            //    string conv = Result.ToString();
                            //    if (aString == conv)
                            //    {
                            //        if ((Result & -281474976710656L) != -281474976710656L)
                            //        {
                            //            double d = BitConverter.ToDouble(BitConverter.GetBytes(Result), 0);
                            //            this.m_cellsValues[i] = d;
                            //        }
                            //    }
                            //    // else "not-significant leading zero (0)"
                            //}
                            //else if (DateTime.TryParseExact(aString, new string[] { "dd/MM/yyyy" }, ITA.DateTimeFormat, DateTimeStyles.None, out Date))
                            //{
                            //    this.m_cellsValues[i] = Date;//.ToString(USA);
                            //}
                            //else
                            {
                                this.m_cellsValues[i] = aString;
                            }
                        }
                        catch (Exception ex)
                        {
                            ex.ToString();
                        }
                    }
                }
            }

            this.m_depth++;

            return this.m_depth < this.m_maxRow;
        }

        private bool moveToNextRecord()
        {
            this.m_canRead = this.readWorkSheetRow();
            return this.m_canRead;
        }

        bool IDataReader.Read()
        {
            if (!this.m_isValid)
            {
                return false;
            }
            if (this.m_IsFirstRead)
            {
                this.m_IsFirstRead = false;
                if (this.m_SheetIndex == -1)
                {
                    this.m_SheetIndex = 0;
                }
            }
            return this.moveToNextRecord();
        }

        int IDataReader.RecordsAffected
        {
            get { throw new NotSupportedException(); }
        }

        #endregion

        #region IDataRecord Members

        int IDataRecord.FieldCount
        {
            get
            {
                if (m_cellsValues == null)
                {
                    return 0;
                }
                return Math.Min(this.m_maxCol, m_cellsValues.Length);
            }
        }

        bool IDataRecord.GetBoolean(int i)
        {
            return !(this as IDataRecord).IsDBNull(i) && bool.Parse(this.m_cellsValues[i].ToString());
        }

        byte IDataRecord.GetByte(int i)
        {
            throw new NotSupportedException();
        }

        long IDataRecord.GetBytes(int i, long fieldOffset, byte[] buffer, int bufferoffset, int length)
        {
            throw new NotSupportedException();
        }

        char IDataRecord.GetChar(int i)
        {
            throw new NotSupportedException();
        }

        long IDataRecord.GetChars(int i, long fieldoffset, char[] buffer, int bufferoffset, int length)
        {
            throw new NotSupportedException();
        }

        IDataReader IDataRecord.GetData(int i)
        {
            throw new NotSupportedException();
        }

        string IDataRecord.GetDataTypeName(int i)
        {
            throw new NotSupportedException();
        }

        DateTime IDataRecord.GetDateTime(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return DateTime.MinValue;
            }
            object val = this.m_cellsValues[i];
            if (val is DateTime)
            {
                return (DateTime)val;
            }
            string valString = val.ToString();
            double dVal;
            try
            {
                dVal = double.Parse(valString);
            }
            catch (FormatException)
            {
                return DateTime.Parse(valString);
            }
            return DateTime.FromOADate(dVal);
        }

        decimal IDataRecord.GetDecimal(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return -79228162514264337593543950335m;
            }
            return decimal.Parse(this.m_cellsValues[i].ToString());
        }

        double IDataRecord.GetDouble(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return -1.7976931348623157E+308;
            }
            return double.Parse(this.m_cellsValues[i].ToString());
        }

        Type IDataRecord.GetFieldType(int i)
        {
            throw new NotSupportedException();
        }

        float IDataRecord.GetFloat(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return -3.40282347E+38f;
            }
            return float.Parse(this.m_cellsValues[i].ToString());
        }

        Guid IDataRecord.GetGuid(int i)
        {
            throw new NotSupportedException();
        }

        short IDataRecord.GetInt16(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return -32768;
            }
            return short.Parse(this.m_cellsValues[i].ToString());
        }

        int IDataRecord.GetInt32(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return -2147483648;
            }
            return int.Parse(this.m_cellsValues[i].ToString());
        }

        long IDataRecord.GetInt64(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return -9223372036854775808L;
            }
            return long.Parse(this.m_cellsValues[i].ToString());
        }

        public string GetName(int i)
        {
            throw new NotSupportedException();
        }

        int IDataRecord.GetOrdinal(string name)
        {
            throw new NotSupportedException();
        }

        string IDataRecord.GetString(int i)
        {
            if ((this as IDataRecord).IsDBNull(i))
            {
                return null;
            }
            return this.m_cellsValues[i].ToString();
        }

        object IDataRecord.GetValue(int i)
        {
            return this.m_cellsValues[i];
        }

        int IDataRecord.GetValues(object[] values)
        {
            throw new NotSupportedException();
        }

        bool IDataRecord.IsDBNull(int i)
        {
            if (this.m_cellsValues == null) return true;
            if (this.m_cellsValues.Length < i) return true;
            return this.m_cellsValues[i] == null || DBNull.Value == this.m_cellsValues[i];
        }

        object IDataRecord.this[string name]
        {
            get { throw new NotSupportedException(); }
        }

        object IDataRecord.this[int i]
        {
            get { return this.m_cellsValues[i]; }
        }

        #endregion

        #region IDisposable Members

        private bool disposed = false;

        void IDisposable.Dispose()
        {
            this.Dispose(true);
            GC.SuppressFinalize(this);
        }

        private void Dispose(bool disposing)
        {
            if (!this.disposed)
            {
                if (disposing)
                {
                    if (this.m_workbookData != null)
                        this.m_workbookData.Dispose();
                    if (this.m_sheets != null)
                        this.m_sheets.Clear();
                }
                this.m_workbookData = null;
                this.m_sheets = null;
                this.m_stream = null;
                this.m_encoding = null;
                this.disposed = true;
            }
        }

        #endregion
    }

    internal class CsvWorksheet
    {
        public CsvWorksheet(int index, string SheetName)
        {
            this.m_Index = index;
            this.m_Name = SheetName;
        }

        private readonly int m_Index;
        private readonly string m_Name;

        public string Name
        {
            get { return this.m_Name; }
        }

        public int Index
        {
            get { return this.m_Index; }
        }
    }
}

Bugfix in TokenizeLine, last "empty" token:

            // ...
            if (line.Trim().EndsWith(";"))
                tokens.Add(token);
            else if (!String.IsNullOrEmpty(token))
                tokens.Add(token);

            return tokens.ToArray();

Tidy up project

Analyse whether it could be structured better

Need testers/contributors for ExcelDataReader.Portable (branch f-portable)

The branch f-portable contains the portable version of ExcelDataReader together with a Windows desktop implementation. The V2 ExcelDataReader for .Net 4.5 now uses the portable library.

This will become a V3 release of ExcelDataReader

Xamarin

I could really do with someone who uses Xamarin to flesh out the ExcelDataReader.iOS and ExcelDataReader.Android projects. It should be pretty easy as there isn't much to implement. Have a look at ExcelDataReader.Desktop - and you'll see that there are really just 3 interfaces to implement

Current library interface still supported

The existing Excel.IExcelDataReader interface is maintained in the Excel.4.5 project and this makes this release a drop in replacement for the current v2 library.

.Net 2 still supported (for now)

.Net framework 2 is currently still supported but the code has now diverged so I would like to drop it, or find someone who wants to support framework 2 (I have no need for it).

Older Excel version support dropped

The v2 library was a bit patchy on anything < Biff8. To make it easier to support going forward only Biff8 is going to be supported. This is anything from Excel 97 onwards.

Current pull requests & issues

I need to go through these and incorporate into the new code. Help with this would be appreciated also. Bearing in mind that we really require unit tests to be written to first expose the issue and thus demonstrate that the issue exists,. That means we need test files.

We could also do with a better way of being able to quickly create these test methods. Currently you have to do a few steps to add the excel file and make it referenceable in the test classes. Anything that makes it easier to add these tests with less steps would make maintenance much easier.

Is it possible to read background colour of a cell?

Does ExcelDataReader have this feature? or is it strictly just data?

Issue reading .xls file with header information on top

I am trying to read an .xls file with some extra lines of header information on top above the column names and the data. I keep getting the "Input array is longer than the number of columns in this table." error. Is there a way to work around this?

Reading wrongly formatted cells

From https://exceldatareader.codeplex.com/discussions/560931
"
Hello everyone! I have been reading a lot around my issue but I am still struggling to get an idea how to go around it. Any help would be appreciated!

I am reading an Excel spreadsheet that can contain wrongly formatted data, e.g. a cell could be formatted as a date but in Excel appears normally- as a string as it contains a string. When I read it with the Excel data reader it appears as 31/12/1899. How can I just get the value of the string as displayed in Excel?
"

XlsDirectoryEntry Workbook or Book cannot be found.

I have issue with one xls which cannot be loaded with ExcelDataReader library.

XlsDirectoryEntry workbookEntry = dir.FindEntry(WORKBOOK) ?? dir.FindEntry(BOOK);

I tried debug and push to XlsDirectoryEntry each STGTY_STREAM from dir but anyone load worksheet or destroys.

Elements in Entries i have:

"Root Entry", STGTY_ROOT
"EncryptedPackage", STGTY_STREAM
"-DataSpaces", STGTY_STORAGE
"Version", STGTY_STREAM
"DataSpaceMap", STGTY_STREAM
"DataSpaceInfo", STGTY_STORAGE
"StrongEncryptionDataSpace" STGTY_STREAM
"TransformInfo", STGTY_STORAGE
"StorageEncryptionTransform", STGTY_STORAGE
"-Primary", STGTY_STREAM
"EncryptionInfo", STGTY_STREAM
"", STGTY_INVALID

I'm wondering why last directory is invalid.

Partially i found a reason, xls was renamed to wrong extension. It is xlsm file - but it doesn't explain why it is not loaded correctly.

.Net 2.0 Version : ExcelBinaryReader is not adding first row to dataset table even if _isFirstRowAsColumnNames==false

Hello,
just a small bug.

In the .NET 2.0 version even if _isFirstRowAsColumnNames==false in the method readWholeWorkSheetNoIndex the first row is not added to the table of the dataset.

The line

if (!justAddedColumns && m_depth > 0 && !(_isFirstRowAsColumnNames && m_maxRow == 1))

should become

if (!(justAddedColumns && _isFirstRowAsColumnNames) && m_depth > 0 && !(_isFirstRowAsColumnNames && m_maxRow == 1))

Poor performance when reading non indexed XLS file

from codeplex issue https://exceldatareader.codeplex.com/workitem/11827

"This is based on source code downloaded on 27th May 2013. I had a file with +100000 records in it and this file doesn't have index. So it took ~ 3 hours to complete reading that file. I debugged source code and found out that ExcelBinaryReader.moveToNextRecordNoIndex method skips all 100000 records (with ID = ROW) each time it reads 1 record! So perfomance degrades even worse when file size become bigger. According to my file's structure first come all blocks with ID = ROW and then all blocks with ID related to cell. I don't know whether this is correct structure for all files which do not have index. But I modified code and skipped all rows on sheet initialization and then just reading cells it took 40 seconds to complete large file."

NonIndexedXlsReadPatch.patch | uploaded via ZenHub

All data not being pulled

I am having issues seeing all the columns in a file. When stepping through the file I only see the first column. This creates issues in my code when I try to rename columns that do not exist. Could you please help me come up with a solution as to why I can't see the whole file.

I have attached a link to a sample xlsx file that is giving me the problem.

https://www.dropbox.com/s/4f8n29p9l2cab8x/HRA%20Medical.xlsx?dl=0

Cached formula string type ignored

SpreadsheetML provides a "str" cell type on formula cells that indicates that the cached formula result should be read as a string. ExcelDataReader does not have support for this cell type and instead handles it as an untyped cell, converting the value to a double if .NET will parse it. Instead, it should be handled similar to an inline string.

Additionally, when ExcelOpenXmlReader reads a cell value it first converts it to to a double if possible. Then, if it determines that the cell was an inline string, it converts the double back to a string. This can cause loss of information, such as leading zeros and trailing commas and periods. Instead, inline string types should never be converted to a double. I believe this is similar to issues #47 and #31.

XlsStream.ReadStream hangs on corrupted file

migrated from codeplex https://exceldatareader.codeplex.com/workitem/13090

"When I try to read a corrupted .xls file (with a part of file accidentally lost), my application hangs in an infinite loop, while I expect it to fail with some exception. As I use the library in a web application, such behavior can slow down the server greatly. It also creates a vulnerability to DDOS attacks. An example corrupted file is attached.

This issue can only be reproduced in "Any CPU" or "x64" build mode, as in "x86" OutOfMemory exception is thrown.

Issue reason:
I have discovered that the apllication loops in a "do while" cycle in XlsStream.ReadStream. The fat.GetNextSector never returns FATMARKERS.FAT_EndOfChain, so the loop never stops.

Fix proposal:
Infinite loop can easily be avoided by analyzing the return value of m_fileStream.Read call (the number of bytes read from file). When the file is read to end, m_fileStream.Read returns zero, and it means that we can exit the loop. So I added additional check to the "while" statement.
With such fix, an attempt to read a corrupted file results in an "ArgumentOutOfRange" exception thrown somewhere further in code. This is way better than a hang; however, the thrown exception is not informative. I guess we can throw an exception about file corruption explicitly when we get "zero" from m_fileStream.Read, but I'm not sure that in normal operation such situation can never happen (I just don't know the ExcelDataReader inner structure well enough). I have attached both variants of the fix.

Can the fix be added to the trunk version? Should I upload the patch in the "Source Code" section?

Thanks!"
XlsStream issue.zip | uploaded via ZenHub

Large Numbers are being converted into scientific notation

From code plex
"Am I the only one experiencing this? I can't believe that.

I haven't seen any documentation on this issue.
I'm using version 2.1.2.0. Everything else works fine.

If I do Save as from excel the number formats just fine.

When doing excelDataReader.AsDataSet I get a scientific notation number.
This number 100025440003960000 becomes this - 1.00E+17.

This is a complete deal breaker for me.
I like the library because it's fast and easy to use but if I can't resolve this I have to find another library to use.

Any info would be greatly appreciated.

Thanks."

Make it possible to read a DataSet and leaving the values as string

At the end of ReadDataSet, the function Helpers.FixDataTypes is called that sets the columndatatypes.
If you want the values as strings, you need to convert the tables again, setting the column datatypes as string. It would be move convenient if you would have the option of just leaving them as they are and not fixing the datatypes.

Zero columns detected reading Excel file

I tried to use your library to read a large file. The current version failed to get the current number of columns. To Fix the problem I had to un-comment the following lines:

// source: ExcelOpenXmlReader.cs, line 134
if (_xmlReader.NodeType == XmlNodeType.Element && _xmlReader.LocalName == XlsxWorksheet.N_col)
cols++;

Even doing so, the class failed to populate the table, to fix the problem I had to change the following lines:

// source: ExcelOpenXmlReader.cs, line 243
if (a_r != null)
XlsxDimension.XlsxDim(a_r, out col, out row);
else
col++;

With these fixes the library woks fine, consume few memory and is light-years better than EPPLUS that after consuming 2 GB of memoy was unable to read the file (out of memoy exception).

Moreorever I strongly suggest to change the namespace from "Excel" to something like "ExcelReader" because the name is in conflict with the Microsoft Excel Interop Library. I had to do this change to use both on my project.

Thanks!

ExcelOpenXmlReader importing dates as doubles (includes suggested fix)

I was uploading some data with dates from Excel using ExcelOpenXmlReader and the dates were being imported as doubles.

I stepped through the ExcelOpenXmlReader.cs code (2.1.2.3) and noticed that on line 277 (in ReadSheetRow()) that xf.ApplyNumberFormat was always returning false preventing the Helpers.ConvertFromOATime(number) from being called on the next line.

So I added a "!" character in front of xf.ApplyNumberFormat and it works for me now. I hope this is the right fix and that it didn't break anything else :) . Here is the new code:

if (!xf.ApplyNumberFormat && o != null && o.ToString() != string.Empty && IsDateTimeStyle(xf.NumFmtId))
o = Helpers.ConvertFromOATime(number);

Thanks for a great tool; it's must appreciated.