Git Product home page Git Product logo

agate-sql's People

Contributors

aklaver avatar chrislkeller avatar git-clueless avatar jez avatar jpmckinney avatar onyxfish avatar pmlandwehr avatar s-t-e-v-e-n-k avatar smtakeda avatar z2s8 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

agate-sql's Issues

MySQL needs REPLACE and not INSERT OR REPLACE (though it does use INSERT IGNORE)

When using csvsql to insert csv files to MySQL database, the REPLACE prefix generates incorrect SQL and fails with error.

csvsql --version
csvsql 1.0.5
python3 --version
Python 3.7.3
python --version
Python 2.7.16

Tested on Debian WSL.

Test file attached but can be anything.
test.txt

Commands tried:
csvsql --db mysql+mysqlconnector://$dbUser:$dbPass@$dbString:$dbPort/$dbSchema --tables test --unique-constraint Id --create-if-not-exist --prefix REPLACE --insert test.csv

csvsql --db mysql+mysqlconnector://$dbUser:$dbPass@$dbString:$dbPort/$dbSchema --tables test --unique-constraint Id --create-if-not-exist --prefix IGNORE --insert test.csv

When using IGNORE prefix the command finishes without any error, but with REPLACE it throws an ProgrammingError:

ProgrammingError: (mysql.connector.errors.ProgrammingError) 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'REPLACE INTO test (Name, Id, Width) VALUES ('Charles', '1', '1')' at line 1
[SQL: INSERT REPLACE INTO test (Name, Id, Width) VALUES (%(Name)s, %(Id)s, %(Width)s)]

As you can see on the bottom line, I have tried to highlight the error. It should have been only "REPLACE INTO"
MySQL Documentation states that the correct syntax for REPLACE INTO does not have INSERT in front.
https://dev.mysql.com/doc/refman/8.0/en/replace.html

As the syntax for IGNORE actually is "INSERT IGNORE INTO", this prefix does not throw an error.

Handle non-ascii text values in Python 2

over on the News Nerd slack, @chrislkeller reported problems with a unicode database.

Traceback (most recent call last):
  File "_init.py", line 16, in <module>
    new_table = agate.Table.from_sql('mysql:...', '...')
  File "/usr/local/lib/python2.7/site-packages/agatesql/table.py", line 87, in from_sql
    return agate.Table(rows, column_names, column_types)
  File "/usr/local/lib/python2.7/site-packages/agate/table/__init__.py", line 166, in __init__
    new_rows.append(Row(tuple(cast_funcs[i](d) for i, d in enumerate(row)), self._column_names))
  File "/usr/local/lib/python2.7/site-packages/agate/table/__init__.py", line 166, in <genexpr>
    new_rows.append(Row(tuple(cast_funcs[i](d) for i, d in enumerate(row)), self._column_names))
  File "/usr/local/lib/python2.7/site-packages/agate/data_types/text.py", line 36, in cast
    return six.text_type(d)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 51: ordinal not in range(128)

Further discussion in there suggested adding an encoding kwarg to from_sql. A little sniffing suggests that you can also manipulate the connection string to force results to utf-8, although I don't know what you'd do if a Connection were passed in, or if for some reason the string argument already had URL parameters.

Idea to refactor number code

Related to:

  1. comment #36 (comment)
  2. change / commit 4f8ad30

Idea / Proposal / Question

if isinstance(column.data_type, agate.Number) and dialect in ('ingres', 'mssql', 'mysql', 'oracle'):

Change the include list check to an exclude list. Looking at the code, sounds like PostgreSQL and SQLite could be an exclude list, and maybe easier to maintain.

            if isinstance(column.data_type, agate.Number) and dialect not in ('postgresql', 'sqlite'):

Test suite showing some ageing cracks

I'm the packager for this on Arch Linux and I'm starting to run into some problems. I realize you use Tox here for testing and that makes sense for your upstream use case, but it does not allow testing in place on a system to confirm that distro packages are serving their functions.

First, using setuptools as a test runner stopped working a while back because datetime as used in this project is no longer compatible with the current released version of datetime.

Second, switching to pytest as a runner, I get the following test failure. I presume this is again a time parsing issue related to Python upstream datetime changes.

============================= test session starts ==============================
platform linux -- Python 3.9.6, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /build/python-agate-sql/src/agate-sql-0.5.7
collected 18 items

tests/test_agatesql.py ...............F..                                [100%]

=================================== FAILURES ===================================
_______________ TestSQL.test_to_sql_create_statement_with_schema _______________

self = <tests.test_agatesql.TestSQL testMethod=test_to_sql_create_statement_with_schema>

        def test_to_sql_create_statement_with_schema(self):
            statement = self.table.to_sql_create_statement('test_table', db_schema='test_schema', dialect='mysql')

>           self.assertEqual(statement.replace('\t', '  '), '''CREATE TABLE test_schema.test_table (
      number DECIMAL(38, 3),
      textcol VARCHAR(1) NOT NULL,
      boolean BOOL,
      date DATE,
      datetime TIMESTAMP NULL
    );''')  # noqa
E   AssertionError: 'CREA[126 chars]\n  datetime TIMESTAMP NULL, \n  CHECK (boolean IN (0, 1))\n);' != 'CREA[126 chars]\n  datetime TIMESTAMP NULL\n);'
E     CREATE TABLE test_schema.test_table (
E       number DECIMAL(38, 3),
E       textcol VARCHAR(1) NOT NULL,
E       boolean BOOL,
E       date DATE,
E   -   datetime TIMESTAMP NULL,
E   ?                          --
E   +   datetime TIMESTAMP NULL
E   -   CHECK (boolean IN (0, 1))
E     );

tests/test_agatesql.py:142: AssertionError
=============================== warnings summary ===============================
../../../../usr/lib/python3.9/site-packages/leather/series/base.py:3
../../../../usr/lib/python3.9/site-packages/leather/series/base.py:3
../../../../usr/lib/python3.9/site-packages/leather/series/base.py:3
  /usr/lib/python3.9/site-packages/leather/series/base.py:3: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working

tests/test_agatesql.py: 400 warnings
  /usr/lib/python3.9/site-packages/packaging/version.py:127: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_with_schema
================== 1 failed, 17 passed, 403 warnings in 0.62s ==================

Additionally it's worth noting that some other deprecations warnings are in play and Python 3.10 is going to add another failure, although it looks like that might be in a transitive dependency.

It is probably worth updating this project to use current APIs and doing a minor release so that it can be used more robustly than inside a private venv bubble with old releases of stuff.

Support "Integer" data types when data has no fractional numbers (floats, decimals)

When the values are all integer, it would be nice to know that, instead of just "Number"

The daily subtitle file from opensubtitles.org is less than 100k is size, about 2000 lines, and is a nice dataset for showing off csvkit.

curl "http://dl.opensubtitles.org/addons/export/subtitles_day.txt.gz" |  gunzip -c > subtitles.txt
  
csvcut -c IDSubtitle,MovieYear subtitles.txt -t | csvstat -y 0
  1. "IDSubtitle"

	Type of data:          Number
	Contains null values:  False
	Unique values:         1856
	Smallest value:        9,747,231
	Largest value:         9,749,339
	Sum:                   18,092,851,467
	Mean:                  9,748,303.592
	Median:                9,748,352.5
	StDev:                 628.279
	Most common values:    9,747,231 (1x)
	                       9,747,232 (1x)
	                       9,747,233 (1x)
	                       9,747,234 (1x)
	                       9,747,235 (1x)

  2. "MovieYear"

	Type of data:          Number
	Contains null values:  True (excluded from calculations)
	Unique values:         76
	Smallest value:        1,931
	Largest value:         2,023
	Sum:                   3,619,369
	Mean:                  2,016.362
	Median:                2,022
	StDev:                 14.175
	Most common values:    2,023 (861x)
	                       2,016 (115x)
	                       2,015 (95x)
	                       2,021 (93x)
	                       2,019 (85x)

Row count: 1856
csvsql subtitles.txt 
CREATE TABLE subtitles (
	"IDSubtitle" DECIMAL NOT NULL, 
	"MovieName" VARCHAR NOT NULL, 
	"MovieYear" DECIMAL, 
	"LanguageName" VARCHAR NOT NULL, 
	"ISO639" VARCHAR NOT NULL, 
	"SubAddDate" TIMESTAMP, 
	"ImdbID" DECIMAL NOT NULL, 
	"SubFormat" VARCHAR NOT NULL, 
	"SubSumCD" DECIMAL NOT NULL, 
	"MovieReleaseName" VARCHAR, 
	"MovieFPS" DECIMAL NOT NULL, 
	"SeriesSeason" DECIMAL, 
	"SeriesEpisode" DECIMAL, 
	"SeriesIMDBParent" DECIMAL, 
	"MovieKind" VARCHAR NOT NULL, 
	"URL" VARCHAR NOT NULL
);

IDSubtitle, ImdbID and MovieYear would be better represented in the database as integer values, rather than decimal.

Thanks for your consideration.

Postgres TIMESTAMP not supported

agate==1.0.0
agate-sql==0.2.0

Seems database specific types are not supported.

In [1]: import agate

In [2]: import agatesql

In [3]: agatesql.patch()

In [4]: plants = agate.Table.from_sql('postgresql://postgres:@localhost/production', 'plant1')

ValueError Traceback (most recent call last)
in ()
----> 1 plants = agate.Table.from_sql('postgresql://postgres:@localhost/production', 'plant1')

/home/aklaver/py_virt/pandas/lib/python2.7/site-packages/agatesql/table.py in from_sql(cls, connection_or_string, table_name)
56 column_types.append(agate.DateTime())
57 else:
---> 58 raise ValueError('Unsupported sqlalchemy column type: %s' % sql_type)
59
60 s = select([sql_table])

ValueError: Unsupported sqlalchemy column type: <class 'sqlalchemy.dialects.postgresql.base.TIMESTAMP'>

how to install agate and agatesql in cloud foundry

Hi Team,
when i try to install agate and agate-sql from python-buildpack which is not able to install and getting error
"2017-08-06T13:21:48.42-0500 [STG/0] ERR Could not find a version that satisfies the requirement agatesql (from versions: )
2017-08-06T13:21:48.48-0500 [STG/0] ERR No matching distribution found for agatesql
2017-08-06T13:21:48.50-0500 [STG/0] ERR CondaValueError: Value error: pip returned an error."

please guide me how to install in my cloud foundry.

I appreciate you help in this regards

Thanks,
Raj

MSSQL dialect uses BOOLEAN instead of BIT

Looking at the code, I'm wondering if this is actually an SQLAlchemy issue, but if I use any of the csvkit commands with -i mssql, I get back out BOOLEAN instead of BIT and have to replace before running in MSSQL. Appreciate any tips.

TestSQL.test_to_sql_create_statement_with_schema fails

When packaging agate-sql for openSUSE, the test suite fails with this failure (using Python 3.6):

[   25s] + pytest-3.6 --ignore=_build.python36 --ignore=_build.python38 -v
[   25s] ============================= test session starts ==============================
[   25s] platform linux -- Python 3.6.12, pytest-6.2.2, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3.6
[   25s] cachedir: .pytest_cache
[   25s] rootdir: /home/abuild/rpmbuild/BUILD/agate-sql-0.5.5
[   25s] collecting ... collected 18 items
[   25s]
[   26s] tests/test_agatesql.py::TestSQL::test_back_and_forth PASSED              [  5%]
[   26s] tests/test_agatesql.py::TestSQL::test_chunk_size PASSED                  [ 11%]
[   26s] tests/test_agatesql.py::TestSQL::test_create_if_not_exists PASSED        [ 16%]
[   26s] tests/test_agatesql.py::TestSQL::test_make_sql_table_col_len_multiplier PASSED [ 22%]
[   26s] tests/test_agatesql.py::TestSQL::test_make_sql_table_min_col_len PASSED  [ 27%]
[   26s] tests/test_agatesql.py::TestSQL::test_prefixes PASSED                    [ 33%]
[   26s] tests/test_agatesql.py::TestSQL::test_sql_query_aggregate PASSED         [ 38%]
[   26s] tests/test_agatesql.py::TestSQL::test_sql_query_limit PASSED             [ 44%]
[   26s] tests/test_agatesql.py::TestSQL::test_sql_query_select PASSED            [ 50%]
[   26s] tests/test_agatesql.py::TestSQL::test_sql_query_simple PASSED            [ 55%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement PASSED     [ 61%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_no_constraints PASSED [ 66%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_unique_constraint PASSED [ 72%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_wide_width PASSED [ 77%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_with_dialects PASSED [ 83%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_with_schema FAILED [ 88%]
[   26s] tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_zero_width PASSED [ 94%]
[   26s] tests/test_agatesql.py::TestSQL::test_unique_constraint PASSED           [100%]
[   26s]
[   26s] =================================== FAILURES ===================================
[   26s] _______________ TestSQL.test_to_sql_create_statement_with_schema _______________
[   26s]
[   26s] self = <tests.test_agatesql.TestSQL testMethod=test_to_sql_create_statement_with_schema>
[   26s]
[   26s]         def test_to_sql_create_statement_with_schema(self):
[   26s]             statement = self.table.to_sql_create_statement('test_table', db_schema='test_schema', dialect='mysql')
[   26s]
[   26s]             self.assertEqual(statement.replace('\t', '  '), '''CREATE TABLE test_schema.test_table (
[   26s]       number DECIMAL(38, 3),
[   26s]       text VARCHAR(1) NOT NULL,
[   26s]       boolean BOOL,
[   26s]       date DATE,
[   26s]       datetime TIMESTAMP NULL,
[   26s]       CHECK (boolean IN (0, 1))
[   26s] >   );''')  # noqa
[   26s] E   AssertionError: 'CREA[59 chars] \n  `text` VARCHAR(1) NOT NULL, \n  boolean B[78 chars]\n);' != 'CREA[59 chars] \n  text VARCHAR(1) NOT NULL, \n  boolean BOO[76 chars]\n);
'
[   26s] E     CREATE TABLE test_schema.test_table (
[   26s] E       number DECIMAL(38, 3),
[   26s] E   -   `text` VARCHAR(1) NOT NULL,
[   26s] E   ?   -    -
[   26s] E   +   text VARCHAR(1) NOT NULL,
[   26s] E       boolean BOOL,
[   26s] E       date DATE,
[   26s] E       datetime TIMESTAMP NULL,
[   26s] E       CHECK (boolean IN (0, 1))
[   26s] E     );
[   26s]
[   26s] tests/test_agatesql.py:149: AssertionError
[   26s] =============================== warnings summary ===============================
[   26s] tests/test_agatesql.py::TestSQL::test_make_sql_table_col_len_multiplier
[   26s]   /home/abuild/rpmbuild/BUILD/agate-sql-0.5.5/tests/test_agatesql.py:191: DeprecationWarning: Please use assertEqual instead.
[   26s]
[   26s] tests/test_agatesql.py::TestSQL::test_make_sql_table_min_col_len
[   26s]   /home/abuild/rpmbuild/BUILD/agate-sql-0.5.5/tests/test_agatesql.py:203: DeprecationWarning: Please use assertEqual instead.
[   26s]
[   26s] -- Docs: https://docs.pytest.org/en/stable/warnings.html
[   26s] =========================== short test summary info ============================
[   26s] FAILED tests/test_agatesql.py::TestSQL::test_to_sql_create_statement_with_schema
[   26s] =================== 1 failed, 17 passed, 2 warnings in 0.54s ===================

Full build log with all details of the process and versions of all packages used.

sqlalchemy very outdated

Chasing a dependency tree to try to find out why SQLalchemy was being downgraded in my install set up, I notice that agate-sql has a pin on SQLAlchemy<2.

Are the any blockers on upgrading SQLAlchemy?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.