
spring-batch-extensions's People

Contributors

arafalov, cesaralves, chrisjs, dependabot[bot], dgray16, fmbenhassine, hasnainjaved, joeyvmason, lpalnau, mdeinum, mminella, spring-builds, spring-operator

spring-batch-extensions's Issues

Make it possible to read FileSystemResource in PoiItemReader

Thank you for making a good library.
It has one drawback, though: it cannot read Excel files through a FileSystemResource.

In the current version, passing a FileSystemResource to openExcelFile(final Resource resource) throws an exception, because the underlying FileInputStream does not support mark/reset and is not wrapped in a PushbackInputStream.

@Override
protected void openExcelFile(final Resource resource) throws Exception {
    workbookStream = resource.getInputStream();

    if (!workbookStream.markSupported() && !(workbookStream instanceof PushbackInputStream)) {
        throw new IllegalStateException("InputStream MUST either support mark/reset, or be wrapped as a PushbackInputStream");
    }

    this.workbook = WorkbookFactory.create(workbookStream);
    this.workbook.setMissingCellPolicy(Row.CREATE_NULL_AS_BLANK);
}

However, this check is unnecessary, because WorkbookFactory.create(workbookStream) itself wraps the InputStream in a PushbackInputStream when mark is not supported.

public static Workbook create(InputStream inp) throws IOException, InvalidFormatException {
    if(!((InputStream)inp).markSupported()) {
        inp = new PushbackInputStream((InputStream)inp, 8);
    }

    if(POIFSFileSystem.hasPOIFSHeader((InputStream)inp)) {
        return new HSSFWorkbook((InputStream)inp);
    } else if(POIXMLDocument.hasOOXMLHeader((InputStream)inp)) {
        return new XSSFWorkbook(OPCPackage.open((InputStream)inp));
    } else {
        throw new IllegalArgumentException("Your InputStream was neither an OLE2 stream, nor an OOXML stream");
    }
}

So I think it would be more useful to remove this validation from openExcelFile(final Resource resource), so that Excel files can be read through a FileSystemResource, which is frequently used in batch environments.
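
The point about mark support can be reproduced with the JDK alone. The following is a minimal sketch, not the library's code: HeaderPeekSketch and its methods are hypothetical names, and withoutMarkSupport only simulates a stream that, like FileInputStream, reports markSupported() as false. It wraps such a stream in a PushbackInputStream, peeks at the first 8 bytes, and pushes them back so a later reader still sees the whole stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

final class HeaderPeekSketch {

    // Number of leading bytes inspected, matching the 8 used in the snippet above.
    static final int HEADER_SIZE = 8;

    /** Simulates a stream that, like FileInputStream, reports markSupported() == false. */
    static InputStream withoutMarkSupport(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    /** Wraps the stream in a PushbackInputStream only when it cannot rewind by itself. */
    static InputStream makeRewindable(InputStream in) {
        return in.markSupported() ? in : new PushbackInputStream(in, HEADER_SIZE);
    }

    /** Peeks at the header of a non-markable stream, then returns the first byte a later reader sees. */
    static int peekThenReadFirst(byte[] data) {
        try {
            PushbackInputStream in = (PushbackInputStream) makeRewindable(withoutMarkSupport(data));
            byte[] header = new byte[HEADER_SIZE];
            int read = in.read(header, 0, HEADER_SIZE);
            if (read > 0) {
                in.unread(header, 0, read); // put the header bytes back
            }
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the header bytes are pushed back, peekThenReadFirst returns the first byte of the original data, which is what a format check followed by a full parse needs.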

Release spring-batch-bigquery version 0.1.0

@dgray16 Please add a comment here when the module is ready for a release.

Please note that we are reviewing the internal release process within the entire portfolio, so I will add an update here when the release is done.

Is project abandoned?

Is this project dead?
I see a lot of useful pull requests that have been ignored for several years.
How can we push this forward?

Publish Spring Batch Excel 0.1.1 artifacts to maven

Could you please publish the latest version to the Maven repository?
I am using the Excel extension in one of my projects and am facing an issue that was fixed in #90. However, the latest version available in Maven, 0.1.0, does not contain the fix.

Thanks!

Add module for Neo4j

This issue is to move the current item reader/writer for Neo4j in Spring Batch to this extension repository.

Support DataFormatter in spring-batch-excel POI implementation

DataFormatter enables the POI implementation in spring-batch-excel to read cell values as they appear in Excel (rather than returning the value with the type that Excel uses internally).

I would like to add this as an option to the PoiItemReader, so the user can choose to retrieve all values as Strings, exactly as they appear in Excel.

The reason is that I have numbers that I want to read as strings, which is currently not possible.

Not able to open excel files larger than 10MB

When trying to read an Excel file larger than 10 MB, an error occurs:

"Unexpected error Tried to allocate an array of length 162,386,364, but the maximum length for this record type is 100,000,000. If the file is not corrupt or large, please open an issue on bugzilla to request increasing the maximum allowable size for this record type. As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()"

Update Apache POI

Apache POI is currently at version 3.15; we should support that version.

Use reader current row count from failed execution to get correct row on restart

I noticed that when restarting jobs after a failure, the AbstractExcelItemReader begins reading from the first row in the spreadsheet. The doRead method simply calls rowSet.next() and ignores the AbstractItemCountingItemStreamItemReader's attempt to jump to the correct row. A solution that I found to work was to override the jumpToItem method in AbstractExcelItemReader and call rowSet.next() until we reach the correct row.
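
The workaround described above can be sketched with plain JDK types. This is only an illustration under assumptions: RowCursor is a hypothetical stand-in for the reader's row set, and jumpToItem here is a free-standing method rather than the actual override in AbstractExcelItemReader.

```java
import java.util.Iterator;
import java.util.List;

final class RestartSketch {

    /** Hypothetical stand-in for the reader's row set; only next() and current() are modelled. */
    static final class RowCursor {
        private final Iterator<String> rows;
        private String current;

        RowCursor(List<String> rows) {
            this.rows = rows.iterator();
        }

        boolean next() {
            if (!rows.hasNext()) {
                return false;
            }
            current = rows.next();
            return true;
        }

        String current() {
            return current;
        }
    }

    /** Advances the cursor past the rows that were already processed before the failure. */
    static void jumpToItem(RowCursor cursor, int itemIndex) {
        for (int i = 0; i < itemIndex && cursor.next(); i++) {
            // intentionally empty: the rows are skipped, not mapped
        }
    }

    /** Returns the first row read after a restart, or null when nothing is left. */
    static String firstRowAfterRestart(List<String> rows, int itemsAlreadyRead) {
        RowCursor cursor = new RowCursor(rows);
        jumpToItem(cursor, itemsAlreadyRead);
        return cursor.next() ? cursor.current() : null;
    }
}
```

With four rows and two items already read, firstRowAfterRestart returns the third row instead of starting again from the first.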

reading empty rows

The issue is that CustomMapper reads empty rows. Even after the rows are deleted in Excel, they are still read.

Issue with spring-batch-excel using Resource which might not have getFile() implemented and does not throw a FileNotFoundException exception

The following code is used to read the excel sheets:
StreamingXlsxItemReader.java:

    protected void openExcelFile(Resource resource, String password) throws Exception {
        try {
            File file = resource.getFile();
            this.pkg = OPCPackage.open(file, PackageAccess.READ);
        } catch (FileNotFoundException var4) {
            this.inputStream = resource.getInputStream();
            this.pkg = OPCPackage.open(this.inputStream);
        }

        XSSFReader reader = new XSSFReader(this.pkg);
        this.initSheets(reader, this.pkg);
    }

PoiItemReader.java:

    protected void openExcelFile(Resource resource, String password) throws Exception {
        try {
            File file = resource.getFile();
            this.workbook = WorkbookFactory.create(file, password, false);
        } catch (FileNotFoundException var4) {
            this.inputStream = resource.getInputStream();
            this.workbook = WorkbookFactory.create(this.inputStream, password);
        }

        this.workbook.setMissingCellPolicy(MissingCellPolicy.CREATE_NULL_AS_BLANK);
    }

It's nice that there is a fallback to resource.getInputStream(), but I ran into a problem with the spring-cloud-gcp project, which uses a GoogleStorageResource: its getFile() throws UnsupportedOperationException, which the code above does not handle. Please see here:
https://github.com/spring-attic/spring-cloud-gcp/blob/main/spring-cloud-gcp-storage/src/main/java/org/springframework/cloud/gcp/storage/GoogleStorageResource.java#L244

To fix this, I am wondering whether it makes sense to check whether the Resource is a file and, if so, call getFile(); otherwise use getInputStream(). It would look like this:

    try {
        if (resource.isFile()) {
            File file = resource.getFile();
            this.pkg = OPCPackage.open(file, PackageAccess.READ);
        } else {
            this.inputStream = resource.getInputStream();
            this.pkg = OPCPackage.open(this.inputStream);
        }
    } catch (Exception ex) {
        throw new IllegalArgumentException("Unable to read data from resource", ex);
    }

    XSSFReader reader = new XSSFReader(this.pkg);
    this.initSheets(reader, this.pkg);

Remove JXL support

JExcelAPI is an abandoned project (no release since 2009, with serious bugs remaining), so JXL support should be removed.

This way we could simplify the API, remove some of the abstraction, and dedicate the module to POI. We could then also consider creating an ItemWriter for writing Excel files instead of only reading them.

skip a column

Hey,
Is there a way to skip the first column of an Excel file while reading it in a batch job?
Thank you.
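
The thread does not record an answer. One possible approach, sketched here under the assumption that each row reaches the mapping code as a String array, is to drop the first cell before mapping. SkipColumnSketch and dropFirstColumn are hypothetical names, not part of the extension.

```java
import java.util.Arrays;

final class SkipColumnSketch {

    /** Returns the row without its first cell; a null or single-cell row comes back empty. */
    static String[] dropFirstColumn(String[] row) {
        if (row == null || row.length <= 1) {
            return new String[0];
        }
        return Arrays.copyOfRange(row, 1, row.length);
    }
}
```

A custom row mapper could call such a helper on the current row and then map the remaining cells by position.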

Release spring-batch-excel version 0.1.0

@mdeinum Please add a comment here when the module is ready for a release.

Please note that we are reviewing the internal release process within the entire portfolio, so I will add an update here when the release is done.

Excel file only read once - subsequent parses hang

I don't know for sure if this is an issue with this plugin, or whether it is caused by something I am doing / not doing. Details are here: http://stackoverflow.com/questions/29127028/grails-spring-batch-excel-reader-only-reads-file-once.

When I read an XLSX file, it works the first time, but on subsequent attempts to parse the same file (after restarting Grails) the job runs indefinitely. I need to reboot the computer to be able to rerun the job (and then it will only run once before hanging again).

Lines To skip not working properly

With the POI item reader, if the initial rows are null and linesToSkip is set, the reader skips that many lines, but once it finds a row with non-null values it still takes the number of columns from row 0. This should not be the case; it should take the column count from the row it is actually reading.

// PoiSheet
@Override
public int getNumberOfColumns() {
    if (numberOfColumns < 0) {
        numberOfColumns = this.delegate.getRow(0).getLastCellNum();
    }
    return numberOfColumns;
}
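
One way to avoid depending on row 0 is to take the column count from the first row that actually exists at or after the skipped lines. The sketch below models a sheet as a list in which a missing row is null; ColumnCountSketch and its method are hypothetical and are not the PoiSheet implementation.

```java
import java.util.List;

final class ColumnCountSketch {

    /** Returns the cell count of the first existing row at or after startRow, or 0 if there is none. */
    static int numberOfColumns(List<String[]> rows, int startRow) {
        for (int i = Math.max(startRow, 0); i < rows.size(); i++) {
            String[] row = rows.get(i);
            if (row != null) {
                return row.length;
            }
        }
        return 0;
    }
}
```

With two null rows followed by a three-cell row, the method returns 3 whether it starts at row 0 or at row 2, whereas reading row 0 directly would fail on the null.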

Parsing #NA fields within excel sheet

At the moment, the switch statement that handles the various cell types does not support invalid fields. This is a problem with data sets that are generated incorrectly: the whole sheet cannot be parsed if a single field is invalid. At the moment it returns

"Cannot handle cells of type 5" for these fields. Could a case be added to the switch to support this type?

This is the field type I'm referring to:

image

Object (POJO) is getting null

I'm using the Spring Batch Excel extension to read an Excel (.xlx) file. I cloned the source, ran mvn install, and added the dependency to my Spring Boot project. I also added Apache poi-ooxml.

My Excel file has simple data:

Id  Last Name   First Name
3   Aguinaldo   Emilio
4   Aquino      Melchora
5   Dagohoy     Francisco
6   Luna        Antonio
7   Jacinto     Emilio

This is my Student class:

@Entity
public class Student {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;
    @NotBlank(message = "{NotBlank.student.lastName}")
    private String lastName;
    @NotBlank(message = "{NotBlank.student.firstName}")
    private String firstName;
    private LocalDateTime createdAt;

    // getters, setters
}

I created a utility class whose method does the actual reading of the Excel file:

public class ExcelUtils {
    public static <T> ItemReader<T> excelToItemReader(Path file, Class<T> clazz) throws Exception {
        PoiItemReader<T> reader = new PoiItemReader<>();
        reader.setLinesToSkip(1);
        System.out.println("File Name: " + file.toString()); // Displays: File Name: uploads/excel/<Excel file selected to import>
        Resource resource = new FileSystemResource(file);
        System.out.println("File exists? " + resource.exists()); // Displays: File exists? true
        reader.setResource(resource);
        reader.setRowMapper(excelRowMapper(clazz));
        return reader;
    }

    private static <T> RowMapper<T> excelRowMapper(Class<T> clazz) {
        BeanWrapperRowMapper<T> rowMapper = new BeanWrapperRowMapper<>();
        rowMapper.setTargetType(clazz);
        return rowMapper;
    }
}

After uploading the files, I would select a file to import its data to my database:

@PostMapping("/import")
public String importStudents(@RequestParam String fileName, RedirectAttributes redirectAttributes) throws Exception {
    ItemReader<Student> studentItemReader = ExcelUtils.excelToItemReader(storageService.load(fileName), Student.class);
    Student student = studentItemReader.read();
    if (student != null) {
        System.out.println("Student has data.");
        studentService.save(student);
    } else {
        System.out.println("Student is null");
        throw new Exception("Student is null");
    }

    redirectAttributes.addFlashAttribute("message", "You successfully imported students data from " + fileName + "!");

    return "redirect:/students";
}

I don't understand why student is null when no error is logged in the console at all.
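
A possible explanation, not confirmed in this thread: inside a step, Spring Batch opens stream-backed readers before the first read, but the controller above calls read() directly and nothing ever opens the reader. The sketch below uses a hypothetical StubReader (not the library's class) to illustrate how a reader that is never opened can return null without logging any error.

```java
import java.util.Iterator;
import java.util.List;

final class OpenBeforeReadSketch {

    /** Hypothetical stand-in for a stream-backed reader; it only models the open/read/close lifecycle. */
    static final class StubReader {
        private final List<String> source;
        private Iterator<String> rows; // stays null until open() is called

        StubReader(List<String> source) {
            this.source = source;
        }

        void open() {
            rows = source.iterator();
        }

        /** Returns the next row, or null when the reader is not open or is exhausted. */
        String read() {
            return (rows != null && rows.hasNext()) ? rows.next() : null;
        }

        void close() {
            rows = null;
        }
    }
}
```

If this is the cause, calling the reader's open method with a new ExecutionContext before read(), and close() afterwards, would be worth trying when the reader is used outside a step.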

Exception when running in Async

Hi,
When I run the job (PoiItemReader) with a SimpleAsyncTaskExecutor, I get the following exception:
Exception parsing Excel file (Because of null rows)
If I run the job normally (without SimpleAsyncTaskExecutor), I do not get any exception.

What could be the issue? Can someone help me out here?

Sheet index (1) is out of range (0..0)

Hey. I'm getting this error when using your extension library.

When I execute the job the first time, it works fine, but on the second execution it gives me that error.

    @Bean
    public PoiItemReader<ActivosExcel> excelReader() throws MalformedURLException {
        PoiItemReader<ActivosExcel> reader = new PoiItemReader<>();
        reader.setSaveState(false);
        reader.setLinesToSkip(1);
        reader.setResource(new UrlResource("file:\\file.xlsx") {
        });
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

That is my reader. The complete error is:

Caused by: java.lang.IllegalArgumentException: Sheet index (1) is out of range (0..0)
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.validateSheetIndex(XSSFWorkbook.java:1527) ~[poi-ooxml-3.16.jar:3.16]
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.getSheetAt(XSSFWorkbook.java:1134) ~[poi-ooxml-3.16.jar:3.16]
	at org.apache.poi.xssf.usermodel.XSSFWorkbook.getSheetAt(XSSFWorkbook.java:121) ~[poi-ooxml-3.16.jar:3.16]
	at org.springframework.batch.item.excel.poi.PoiItemReader.getSheet(PoiItemReader.java:47) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
	at org.springframework.batch.item.excel.AbstractExcelItemReader.openSheet(AbstractExcelItemReader.java:120) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
	at org.springframework.batch.item.excel.AbstractExcelItemReader.doOpen(AbstractExcelItemReader.java:112) ~[spring-batch-excel-0.5.0-SNAPSHOT.jar:na]
	at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.open(AbstractItemCountingItemStreamItemReader.java:144) ~[spring-batch-infrastructure-3.0.8.RELEASE.jar:3.0.8.RELEASE]
	... 79 common frames omitted

Can you please help me? Or should I write a custom reader with POI? If so, can you give me an example of how to do it?

Thank you.

UPDATE: The first time I use it, it works (my .xlsx file has only one sheet), but the second time it doesn't, because the reader cannot find a second sheet, so it throws that error.

As a small test I created another sheet in the file and it worked, but the problem in the code remains.

Where is the index being incremented? It should always be 0!

Need of reading one particular sheet.

Since the xlsx format supports multiple tabs with different names and different columns, such files need to be supported. This could be done by letting the user specify which sheet to read, by Id or by name.

AbstractExcelItemReader ignores empty rows in between filled rows. It would help if this behavior were configurable, since it causes incorrect row-number metadata.

AbstractExcelItemReader ignores a number of empty rows in between sets of filled rows. It would help if this behavior were made configurable, since it causes incorrect metadata to be provided for the row number.
Issue Description:
Suppose rows 1 to 5 contain data, the 6th and 7th rows are empty, and the 8th row has data (screenshot attached for reference). The reader returns the 8th row as the 6th, which is a problem if the exact row number of a cell has to be determined.

image
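
The requested behavior can be illustrated with JDK types only. The sketch below is an assumption-laden model, not the reader's code: a sheet is a list of String arrays, NumberedRow and readNonEmptyRows are hypothetical names, and each returned row keeps its zero-based position in the sheet even though empty rows are skipped.

```java
import java.util.ArrayList;
import java.util.List;

final class RowNumberSketch {

    /** A non-empty row together with its zero-based position in the sheet. */
    static final class NumberedRow {
        final int sheetIndex;
        final String[] cells;

        NumberedRow(int sheetIndex, String[] cells) {
            this.sheetIndex = sheetIndex;
            this.cells = cells;
        }
    }

    /** A row counts as empty when it is null or every cell is null or blank. */
    static boolean isEmpty(String[] row) {
        if (row == null) {
            return true;
        }
        for (String cell : row) {
            if (cell != null && !cell.trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    /** Skips empty rows but records where each remaining row sits in the sheet. */
    static List<NumberedRow> readNonEmptyRows(List<String[]> sheet) {
        List<NumberedRow> result = new ArrayList<>();
        for (int i = 0; i < sheet.size(); i++) {
            String[] row = sheet.get(i);
            if (!isEmpty(row)) {
                result.add(new NumberedRow(i, row));
            }
        }
        return result;
    }
}
```

For the scenario described above, the sixth returned row carries sheet index 7, that is, the 8th row of the sheet, rather than being reported as the 6th.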

Date format when reading

In my file, dates are in DD/MM/YYYY format. When Spring/POI reads the data, org.apache.poi.ss.usermodel.DataFormatter is used, and in its performDateFormatting method the dateFormat parameter has the pattern M/d/yy.

Is there a way to force the date pattern when reading?

My reader configuration, including the RowMapper, is:

	<bean id="caricaAnagraficheReader" class="org.springframework.batch.extensions.excel.poi.PoiItemReader" scope="step">
		<property name="resource" value="file:#{batchParameters.genericBatchParameters.allegatoNomeCompleto}" />
		<property name="linesToSkip" value="1" />
	    <property name="rowMapper">
	        <bean class="it.blue.batch.portali.components.CaricaAnagraficheRowMapper" />
    	</property>
	</bean>
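
The thread does not say whether the reader can be configured with a pattern. As a workaround sketch, assuming the cell reaches the RowMapper as a string rendered with M/d/yy as described, the mapper could parse it with that pattern and re-render it day-first. DatePatternSketch and toDayFirst are hypothetical names; note that a two-digit year is interpreted relative to the year 2000 by this formatter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DatePatternSketch {

    // Pattern the cell string is assumed to arrive in.
    static final DateTimeFormatter MONTH_FIRST = DateTimeFormatter.ofPattern("M/d/yy");

    // Pattern wanted by the application.
    static final DateTimeFormatter DAY_FIRST = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    /** Re-renders a cell string produced with M/d/yy as dd/MM/yyyy. */
    static String toDayFirst(String formattedCell) {
        LocalDate date = LocalDate.parse(formattedCell, MONTH_FIRST);
        return date.format(DAY_FIRST);
    }
}
```

For example, toDayFirst("3/7/21") yields "07/03/2021".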

startStatement() should not be required in Neo4jItemReader

Bug description
In Neo4jItemReaderBuilder, startStatement(String startStatement) is required, but Neo4j itself has deprecated the START statement and throws an error when it is used. If it is not set, the application throws a BeanCreationException with the message java.lang.IllegalArgumentException: startStatement is required.

Environment
Spring Boot: 2.7.0
Kotlin: 1.6.10
Neo4j: 4.4.4

Steps to reproduce

    @Bean
    fun postReader(): ItemReader<Post> {
        return Neo4jItemReaderBuilder<Post>()
            .name("postReader")
            .sessionFactory(getSessionFactory())
            .startStatement("")
            .matchStatement("(p:Post)")
            .returnStatement("p")
            .targetType(Post::class.java)
            .pageSize(1000)
            .build()
    }

Expected behavior
startStatement() should be optional, not mandatory.
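
The expected behavior could look roughly like the following. This is a hypothetical sketch of query assembly, not the actual Neo4jItemReader code: CypherSketch and buildQuery are illustrative names, and the START clause is emitted only when a non-blank start statement is supplied.

```java
final class CypherSketch {

    /** Builds a query, emitting the START clause only when a start statement is given. */
    static String buildQuery(String startStatement, String matchStatement, String returnStatement) {
        StringBuilder query = new StringBuilder();
        if (startStatement != null && !startStatement.trim().isEmpty()) {
            query.append("START ").append(startStatement).append(' ');
        }
        query.append("MATCH ").append(matchStatement);
        query.append(" RETURN ").append(returnStatement);
        return query.toString();
    }
}
```

With the values from the reproduction above, buildQuery("", "(p:Post)", "p") produces "MATCH (p:Post) RETURN p", with no START clause.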

ElasticsearchItemReader keeps grabbing data indefinitely because of the implementation of the doPageRead() method.

The doPageRead() method of ElasticsearchItemReader stops only when the ES query returns null.

Normally, if there are 100 items to retrieve with a page size of 50, doPageRead() is called three times: the first call retrieves 50 items, the second call retrieves the other 50, and on the third call the query returns null, so doPageRead() stops.

Here, however, the query keeps retrieving the first 50 items indefinitely, even though the SearchQuery is paginated.

I will look for a solution and share it here.
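
The intended three-call behavior can be sketched with an in-memory list. This is not the ElasticsearchItemReader implementation: PagingSketch, fetchPage, and readAll are hypothetical, and the loop stops on an empty page rather than on null. The part that matters is that the page number advances after every fetch; without that increment each call would return the first page again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class PagingSketch {

    private final List<Integer> data;
    int fetchCalls = 0; // counts how often a page was requested

    PagingSketch(List<Integer> data) {
        this.data = data;
    }

    /** Hypothetical page source: returns the requested page, or an empty list past the end. */
    List<Integer> fetchPage(int page, int pageSize) {
        fetchCalls++;
        int from = page * pageSize;
        if (from >= data.size()) {
            return Collections.emptyList();
        }
        return data.subList(from, Math.min(from + pageSize, data.size()));
    }

    /** Reads every page in order, advancing the page number after each fetch. */
    List<Integer> readAll(int pageSize) {
        List<Integer> result = new ArrayList<>();
        int page = 0;
        while (true) {
            List<Integer> items = fetchPage(page, pageSize);
            if (items.isEmpty()) {
                break;
            }
            result.addAll(items);
            page++; // without this increment the first page would be fetched forever
        }
        return result;
    }
}
```

With 100 items and a page size of 50, readAll makes three fetches: two full pages and one empty page that ends the loop.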
