htmfilho / csvsource

Converts a CSV file to SQL Insert Statements.

Home Page: https://www.hildeberto.com/csvsource/

License: Apache License 2.0

Rust 100.00%

conversion csv data migration rust sql

csvsource's Issues

Implement Filters

Is your feature request related to a problem? Please describe.

In some cases, not all rows of a CSV file are useful to import into a database. Filters would make it possible to delimit the subset of data to import.

Describe the solution you'd like

Filters could be logical expressions with AND, OR, and NOR that compare columns and values. Each filter is a string passed as an argument and interpreted as a logical expression. The expression is evaluated as each row is read, to include or exclude it.
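To make the idea concrete, here is a minimal sketch of how a filter could be evaluated against a row once it has been parsed. It assumes the operators are AND, OR, and NOR and that the only comparison is equality; the `Filter` type and its `matches` method are hypothetical names, not part of the project, and parsing the argument string into this tree is left out.

```rust
use std::collections::HashMap;

/// Hypothetical filter expression tree (illustrative names only).
enum Filter {
    /// True when the named column equals the given value.
    Eq(String, String),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
    /// True only when neither operand is true.
    Nor(Box<Filter>, Box<Filter>),
}

impl Filter {
    /// Evaluates the filter against one row, given as column name -> value.
    /// A comparison on a column that is missing from the row is false.
    fn matches(&self, row: &HashMap<&str, &str>) -> bool {
        match self {
            Filter::Eq(column, value) => row
                .get(column.as_str())
                .map_or(false, |v| *v == value.as_str()),
            Filter::And(a, b) => a.matches(row) && b.matches(row),
            Filter::Or(a, b) => a.matches(row) || b.matches(row),
            Filter::Nor(a, b) => !(a.matches(row) || b.matches(row)),
        }
    }
}
```

A row would be written to the output only when `matches` returns true for it, so rows are filtered while the file is being read rather than afterwards.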

Describe alternatives you've considered

When using a staging table, the application can do the filtering there before updating its core tables. But when inserting into a core table directly, we may have issues with unwanted data.

Use a template engine to allow variables in prefix and suffix files

Is your feature request related to a problem? Please describe.

Yes. In many cases, the content of the prefix and suffix files is related to the insert statements. For example, the prefix file may have a script to create a table, but the name of the table should be the same as the one used in the insert statements. Unfortunately, the content of the prefix and suffix files is fixed.

Describe the solution you'd like

It would be useful if the prefix and suffix files were template files, so that variables in them could be replaced at run time by the values of arguments such as --table and --column.
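As a rough illustration, variable substitution could be as simple as the sketch below. The `{{name}}` placeholder syntax and the `render` function are assumptions made for this example; a real template engine would bring its own syntax and features.

```rust
/// Replaces `{{name}}` placeholders in `template` with the given values.
/// The `{{name}}` syntax is an assumption made for this sketch.
fn render(template: &str, variables: &[(&str, &str)]) -> String {
    let mut output = template.to_string();
    for (name, value) in variables {
        // Builds the literal text "{{name}}" for the current variable.
        let placeholder = format!("{{{{{}}}}}", name);
        output = output.replace(&placeholder, value);
    }
    output
}
```

With a prefix file containing `CREATE TABLE {{table}} (id INT);` and the variable `table` set to `customer`, the rendered prefix would read `CREATE TABLE customer (id INT);`, matching the table name used in the insert statements.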

Describe alternatives you've considered

An alternative would be to edit the files, fill in the current values, and then execute the command, but that would reduce automation opportunities.

Separate the logic to generate the SQL from the logic to handle the command line

Is your feature request related to a problem? Please describe.

Today, the entire code is in a single file (main.rs). As the code grows, it becomes harder and harder to navigate and understand what it does. We want to separate the logic to generate the SQL from the logic to handle the command line.

Describe the solution you'd like

The logic to generate the SQL would be in lib.rs, making it a library usable by other applications that do not necessarily use the command line. After this change, we can publish Roma on crates.io, allowing it to be a dependency in other projects.
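A minimal sketch of what a library-side function in lib.rs might look like is shown below. The name `insert_statement` and its signature are hypothetical; as a simplification, every value is emitted as a quoted string, with single quotes doubled.

```rust
/// Hypothetical library function: builds one INSERT statement from a row.
/// It knows nothing about command-line arguments, so main.rs (or any other
/// application) can call it directly.
pub fn insert_statement(table: &str, columns: &[&str], values: &[&str]) -> String {
    // Simplification: every value is treated as a string literal.
    let quoted: Vec<String> = values
        .iter()
        .map(|v| format!("'{}'", v.replace('\'', "''")))
        .collect();
    format!(
        "INSERT INTO {} ({}) VALUES ({});",
        table,
        columns.join(", "),
        quoted.join(", ")
    )
}
```

Under this split, main.rs would only parse the arguments, read the CSV file, and pass each row to the library.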

Set the default behavior to not add transactions to the SQL file

Is your feature request related to a problem? Please describe.

Some users may need to use the exported SQL file in a database migration tool, like Liquibase. Those tools already deal with transactions, so it wouldn't be necessary to include begin transaction and commit in the file.

Describe the solution you'd like

We want to make transactions optional. The default behaviour would be to not have transactions when the argument --chunk is not present. We would add a new argument --transactional to explicitly indicate that we want a single transaction wrapping all statements. This new argument would have no effect when --chunk is present, since the whole point of --chunk is to create transaction chunks.
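The decision described above can be sketched as follows. The function `wrap_statements` is a hypothetical name, and the exact `BEGIN TRANSACTION;` and `COMMIT;` text is an assumption; `chunk` and `transactional` stand for the values of --chunk and --transactional.

```rust
/// Wraps statements in transactions according to the two arguments.
/// - `chunk` present: one transaction per group of `chunk` statements.
/// - only `transactional`: a single transaction around all statements.
/// - neither: the statements are returned unchanged.
fn wrap_statements(
    statements: &[String],
    chunk: Option<usize>,
    transactional: bool,
) -> Vec<String> {
    // --chunk takes precedence, which makes --transactional redundant with it.
    let size = match chunk {
        Some(n) if n > 0 => n,
        _ if transactional => statements.len().max(1),
        _ => return statements.to_vec(),
    };
    let mut output = Vec::new();
    for group in statements.chunks(size) {
        output.push("BEGIN TRANSACTION;".to_string());
        output.extend_from_slice(group);
        output.push("COMMIT;".to_string());
    }
    output
}
```

A migration tool such as Liquibase would use the default (no arguments), so the file contains only the insert statements.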

Describe alternatives you've considered

The only alternative to this would be to manually remove the begin transaction and commit from the beginning and end of the file.

Support an incremental numeric identifier (primary key)

Is your feature request related to a problem? Please describe.

Yes, when the insert statements target a table that doesn't support auto-incremented primary keys.

Describe the solution you'd like

Support the option to include an auto-incremented or auto-generated identifier in the insert statements. The user would specify the column of the identifier and either the initial value from which the auto-increment starts or that the identifier is a UUID.
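For the numeric case, a minimal sketch could prepend a generated identifier to each row before the insert statements are built. The function name `prepend_ids` is hypothetical, and the UUID variant is not covered because it would need an external crate.

```rust
/// Returns a copy of `rows` where each row starts with an incremental
/// identifier, beginning at `initial` and increasing by one per row.
fn prepend_ids(rows: &[Vec<String>], initial: u64) -> Vec<Vec<String>> {
    rows.iter()
        .zip(initial..) // pairs each row with initial, initial + 1, ...
        .map(|(row, id)| {
            let mut with_id = Vec::with_capacity(row.len() + 1);
            with_id.push(id.to_string());
            with_id.extend(row.iter().cloned());
            with_id
        })
        .collect()
}
```

The identifier column named by the user would be prepended to the column list in the same way, so columns and values stay aligned.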

Describe alternatives you've considered

The target table would need to support an auto-incremented or auto-generated identifier. That is not always possible, either because the database doesn't support it or because the table structure can't be changed to support it.

Read a CSV file and load the data in memory

Is your feature request related to a problem? Please describe.

This is a basic feature that is not related to a problem.

Describe the solution you'd like

Read a large CSV file.

Describe alternatives you've considered

If it is too hard to process the file manually, consider a library.

Additional context

Consider loading large files found on the internet to increase the robustness of the solution.

Format dates and times according to what is required by the target database

Is your feature request related to a problem? Please describe.

The feature is new and covers cases where the date/time format in the CSV file is not compatible with the database date/time format.

Describe the solution you'd like

Using column arguments, specify the date/time format present in the column.
Using another argument, specify the date/time format supported by the database.
Convert the format in the column to the format supported by the database.

Describe alternatives you've considered

There is no alternative to solve this problem in the current version.

Additional context

The date/time format syntax will be inherited from the time library in use.

Refactor function generate_sql to use crate sql_builder

Is your feature request related to a problem? Please describe.

According to feedback from the community, the function generate_sql is complex and can be simplified by using an existing library to generate SQL.

Describe the solution you'd like

The crate sql_builder seems to be a good option: https://docs.rs/sql-builder/latest/sql_builder/

Describe alternatives you've considered

The current implementation fulfills the needs of the project. Using sql_builder could simplify the code, but it can also increase its footprint. A lot of features in that library are not applicable in this project.

Based on the csv file, generate a sql table capable of storing the generated insert statements

Is your feature request related to a problem? Please describe.

The problem arises when the table I want to import the data into doesn't exist yet. I need to spend time creating it before I can run the insert statements.

Describe the solution you'd like

We can deduce the table needed to store the CSV data from the CSV itself. The solution would be to figure out the table name, columns, and column types, then generate the DDL script to create the table. The code would also check whether every row has a value for a given column, to decide if the column is null or not null.

The resulting DDL script would be added to the beginning of the script.
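The inference could be sketched as below. This is only an illustration under simplifying assumptions: the function name `create_table` is hypothetical, the table name is passed in rather than derived, only three column types are considered, and an empty string counts as a missing value.

```rust
/// Builds a CREATE TABLE statement by inspecting the values of each column.
fn create_table(table: &str, headers: &[&str], rows: &[Vec<&str>]) -> String {
    let columns: Vec<String> = headers
        .iter()
        .enumerate()
        .map(|(i, name)| {
            // All values of column `i`; short rows count as empty values.
            let values: Vec<&str> = rows
                .iter()
                .map(|row| row.get(i).copied().unwrap_or(""))
                .collect();
            let present: Vec<&str> =
                values.iter().copied().filter(|v| !v.is_empty()).collect();
            // INTEGER if every present value parses as i64, REAL if every
            // present value parses as f64 (which also accepts forms such as
            // `inf` and `NaN`), TEXT otherwise.
            let sql_type = if present.is_empty() {
                "TEXT"
            } else if present.iter().all(|v| v.parse::<i64>().is_ok()) {
                "INTEGER"
            } else if present.iter().all(|v| v.parse::<f64>().is_ok()) {
                "REAL"
            } else {
                "TEXT"
            };
            // NOT NULL only when at least one row exists and none is empty.
            let nullability = if !values.is_empty() && present.len() == values.len() {
                "NOT NULL"
            } else {
                "NULL"
            };
            format!("    {} {} {}", name, sql_type, nullability)
        })
        .collect();
    format!("CREATE TABLE {} (\n{}\n);", table, columns.join(",\n"))
}
```

Type names differ between databases, so a real implementation would probably need to know the target database before choosing them.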

Describe alternatives you've considered

The alternative is to create the DDL script ourselves and add it as the prefix of the file. But this job can be automated.

Check if it works with TSV files

Is your feature request related to a problem? Please describe.

This is just to check if Roma supports a file format similar to CSV.
