locopy's Issues

CVE-2020-14343 (High) detected in PyYAML-5.3.1.tar.gz

CVE-2020-14343 - High Severity Vulnerability

Vulnerable Library - PyYAML-5.3.1.tar.gz

YAML parser and emitter for Python

Library home page: https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz

Path to dependency file: Data-Load-and-Copy-using-Python

Path to vulnerable library: Data-Load-and-Copy-using-Python,Data-Load-and-Copy-using-Python/requirements.txt

Dependency Hierarchy:

  • PyYAML-5.3.1.tar.gz (Vulnerable Library)

Found in HEAD commit: 0b930613055b1f748f7ca0422981cd4c9d47bb5b

Vulnerability Details

A vulnerability was discovered in the PyYAML library in all versions, where it is susceptible to arbitrary code execution when it processes untrusted YAML files through the full_load method or with the FullLoader loader. yaml.load() defaults to using FullLoader, and FullLoader is still vulnerable to RCE when run on untrusted input. Applications that use the library to process untrusted input may be vulnerable to this flaw. An attacker could use this flaw to execute arbitrary code on the system by abusing the python/object/new constructor.
The fix for CVE-2020-1747 was not enough to fix this issue.

Publish Date: 2020-07-21

URL: CVE-2020-14343

CVSS 3 Score Details (9.8)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: None
    • Scope: Unchanged
  • Impact Metrics:
    • Confidentiality Impact: High
    • Integrity Impact: High
    • Availability Impact: High


Move to GitHub Actions from Travis

I think moving to GitHub Actions makes sense. I've noticed that Travis's support for open-source projects has led to long queue times, and GitHub Actions essentially replaces Travis for our purposes.

Move to a develop / master branch workflow

I'd like to move to a develop / master workflow and update the docs a bit to reflect the proper release instructions. I think it is important to set this up to ensure more structure and better practices going forward. Right now the package is relatively small, so it's not a huge deal in its current state.

`asn1crypto` new release breaks the SF connection

The error message is:

.conda/envs/py37/lib/python3.7/site-packages/asn1crypto/keys.py", line 1065, in unwrap
    'asn1crypto.keys.PublicKeyInfo().unwrap() has been removed, '
asn1crypto._errors.APIException: asn1crypto.keys.PublicKeyInfo().unwrap() has been removed, please use oscrypto.asymmetric.PublicKey().unwrap() instead

This is probably triggered by the new asn1crypto release; we need to pin asn1crypto==0.24.0 as a dependency.

Use S3 functions without providing RS creds

Right now, if you want to use just the S3 functionality, you still need to provide Redshift credentials.
This isn't ideal behaviour; we should look into refactoring the code a bit to decouple this functionality so that we can interact with S3 independently.

  • Research and refactor parts of the Cmd and S3 classes (see the sketch below).
  • Potentially use mixins here.
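
A minimal sketch of the decoupling idea (this is not locopy's actual API; the class and method names are hypothetical): pull the S3-facing logic into a standalone mixin that only needs AWS credentials, and have Redshift compose it with the command layer.

import boto3

class S3Mixin:
    """Hypothetical mixin holding only the S3-facing functionality."""

    def _init_s3(self, profile=None):
        # Only AWS credentials are needed here; no Redshift connection.
        session = boto3.Session(profile_name=profile)
        self.s3 = session.client("s3")

    def upload_to_s3(self, local_file, bucket, key):
        self.s3.upload_file(local_file, bucket, key)

class Redshift(S3Mixin):
    # Would also inherit the Cmd/DB-API logic; S3Mixin stays usable
    # on its own for pure S3 workflows.
    pass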

CVE-2019-11358 (Medium) detected in jquery-3.2.1.js

CVE-2019-11358 - Medium Severity Vulnerability

Vulnerable Library - jquery-3.2.1.js

JavaScript library for DOM operations

Library home page: https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.js

Path to vulnerable library: /Data-Load-and-Copy-using-Python/_static/jquery-3.2.1.js

Dependency Hierarchy:

  • jquery-3.2.1.js (Vulnerable Library)

Found in HEAD commit: fc064b132c13e4214bde3ffae659bafa1d52ae52

Vulnerability Details

jQuery before 3.4.0, as used in Drupal, Backdrop CMS, and other products, mishandles jQuery.extend(true, {}, ...) because of Object.prototype pollution. If an unsanitized source object contained an enumerable __proto__ property, it could extend the native Object.prototype.

Publish Date: 2019-04-20

URL: CVE-2019-11358

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None


Suggested Fix

Type: Upgrade version

Origin: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11358

Release Date: 2019-04-20

Fix Resolution: 3.4.0



New Release: 0.3.2

Updates:

  • General cleanup of the code base/linting
  • Bumping up pytest and pytest-cov versions
  • Pandas to Snowflake Support (#57)
  • Adding support for SQLite (#56) (Note: a tweak to allow compatibility for those who are interested in using it; not a primary use case)

Adding a "USE SCHEMA" call for Snowflake

In the Snowflake class, I'd like to add a feature for automatically running the command

USE SCHEMA {{ schema_name }}

upon connecting to Snowflake. The package already has a similar feature for database and warehouse, so I think it would be a relatively simple feature addition.
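
A minimal sketch of what this could look like, mirroring the existing USE DATABASE / USE WAREHOUSE behaviour (the schema parameter and class layout here are assumptions, not locopy's current code):

import snowflake.connector

class Snowflake:
    def __init__(self, schema=None, **connect_kwargs):
        self.schema = schema  # hypothetical new option
        self.connect_kwargs = connect_kwargs

    def connect(self):
        self.conn = snowflake.connector.connect(**self.connect_kwargs)
        self.cursor = self.conn.cursor()
        if self.schema is not None:
            # Same pattern as the existing database/warehouse handling.
            self.cursor.execute("USE SCHEMA {0}".format(self.schema))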

Missing files in sdist

It appears that the manifest is missing at least one file necessary to build
from the sdist for version 0.3.6. You're in good company, about 5% of other
projects updated in the last year are also missing files.

+ /tmp/venv/bin/pip3 wheel --no-binary locopy -w /tmp/ext locopy==0.3.6
Looking in indexes: http://10.10.0.139:9191/root/pypi/+simple/
Collecting locopy==0.3.6
  Downloading http://10.10.0.139:9191/root/pypi/%2Bf/4f1/46b583dff9457/locopy-0.3.6.tar.gz (20 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /tmp/venv/bin/python3 /tmp/tmp8ev7dvhg get_requires_for_build_wheel /tmp/tmp_l094jkx
       cwd: /tmp/pip-wheel-w_6j1bvw/locopy
  Complete output (18 lines):
  Traceback (most recent call last):
    File "/tmp/tmp8ev7dvhg", line 280, in <module>
      main()
    File "/tmp/tmp8ev7dvhg", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/tmp/tmp8ev7dvhg", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-3ri7vmu3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 147, in get_requires_for_build_wheel
      return self._get_build_requires(
    File "/tmp/pip-build-env-3ri7vmu3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 128, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-3ri7vmu3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 249, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/tmp/pip-build-env-3ri7vmu3/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 143, in run_setup
      exec(compile(code, __file__, 'exec'), locals())
    File "setup.py", line 26, in <module>
      with open(os.path.join(CURR_DIR, "requirements.txt"), encoding="utf-8") as file_open:
  FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-wheel-w_6j1bvw/locopy/requirements.txt'
  ----------------------------------------
ERROR: Command errored out with exit status 1: /tmp/venv/bin/python3 /tmp/tmp8ev7dvhg get_requires_for_build_wheel /tmp/tmp_l094jkx Check the logs for full command output.
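
A likely fix, assuming setup.py genuinely needs requirements.txt at build time, is to ship the file in the sdist by adding it to MANIFEST.in:

include requirements.txt

(Alternatively, setup.py could fall back to a hard-coded dependency list when the file is absent.)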

CVE-2020-11022 (Medium) detected in jquery-3.2.1.js

CVE-2020-11022 - Medium Severity Vulnerability

Vulnerable Library - jquery-3.2.1.js

JavaScript library for DOM operations

Library home page: https://cdnjs.cloudflare.com/ajax/libs/jquery/3.2.1/jquery.js

Path to vulnerable library: /Data-Load-and-Copy-using-Python/_static/jquery-3.2.1.js

Dependency Hierarchy:

  • jquery-3.2.1.js (Vulnerable Library)

Found in HEAD commit: fc064b132c13e4214bde3ffae659bafa1d52ae52

Vulnerability Details

In jQuery versions greater than or equal to 1.2 and before 3.5.0, passing HTML from untrusted sources - even after sanitizing it - to one of jQuery's DOM manipulation methods (i.e. .html(), .append(), and others) may execute untrusted code. This problem is patched in jQuery 3.5.0.

Publish Date: 2020-04-29

URL: CVE-2020-11022

CVSS 3 Score Details (6.1)

Base Score Metrics:

  • Exploitability Metrics:
    • Attack Vector: Network
    • Attack Complexity: Low
    • Privileges Required: None
    • User Interaction: Required
    • Scope: Changed
  • Impact Metrics:
    • Confidentiality Impact: Low
    • Integrity Impact: Low
    • Availability Impact: None


Suggested Fix

Type: Upgrade version

Origin: https://blog.jquery.com/2020/04/10/jquery-3-5-0-released/

Release Date: 2020-04-29

Fix Resolution: jQuery - 3.5.0



Add Postgres support

  • We figure we should add Postgres support to locopy, since Redshift is essentially a derivative of it.
  • We might need to refactor the Redshift class a bit, but it shouldn't be too much work.

Codeowners file

Please be sure to add trusted reviewers to your CODEOWNERS file.

Use black for code style

It would be nice to set up a pre-commit hook for black to:

  • ensure consistent code styling
  • improve readability
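
A minimal sketch of the hook in .pre-commit-config.yaml (the rev shown is an assumption; pin whichever black release the project standardizes on):

repos:
  - repo: https://github.com/psf/black
    rev: 19.10b0
    hooks:
      - id: black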

Boolean values transform to t/f

Version: 0.3.7

After downloading data from Redshift via S3, values in boolean columns come back as t or f instead of True or False, so I can't properly read the dataframe with nullable boolean dtype columns.
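
As a stopgap, the t/f strings can be mapped back to pandas' nullable boolean dtype after reading the unloaded file (the file and column names below are hypothetical; requires pandas >= 1.0 for the "boolean" dtype):

import pandas as pd

df = pd.read_csv("unloaded.csv")
# Map Redshift's t/f back to True/False; unmapped values become <NA>.
df["my_bool_col"] = df["my_bool_col"].map({"t": True, "f": False}).astype("boolean")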

Find better solution to classify 'object' type column

The current approach loops through every row of the column to determine whether an 'object' column could actually be a timestamp (e.g. 2019-01-01) or a float (e.g. Decimal(2.0)). This is computationally expensive when the dataframe is large.
One idea is to use sampling, but that can lead to false positives (see the sketch below).
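
A rough sketch of the sampling idea (not locopy's current code; the function and sample size are assumptions):

import pandas as pd

def looks_like_timestamp(series, sample_size=1000):
    # Inspect a random sample of an 'object' column instead of every
    # row; faster on large frames, but a clean sample can miss bad
    # rows, which is the false-positive risk noted above.
    sample = series.dropna()
    if len(sample) > sample_size:
        sample = sample.sample(sample_size, random_state=0)
    try:
        pd.to_datetime(sample, errors="raise")
        return True
    except (ValueError, TypeError):
        return False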

Azure Snowflake

Snowflake can run on Azure now (and on GCP sometime this year, I think), so we should at least note in the docs that we don't support Azure/GCP, and maybe look at supporting them (though that could be tricky to test).

Soften requirement for pyyaml

There's a strict dependency on a specific version of pyyaml, but sometimes I'm trying to use this alongside other packages that allow a range. For example, if I install pre-commit into my environment first, it installs the latest pyyaml:

https://github.com/pre-commit/pre-commit/blob/master/setup.cfg#L32

What are your thoughts on setting a minimum version for pyyaml but leaving the max version out, since locopy doesn't rely on a huge amount of the yaml package?
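
For example, a floor-only constraint in requirements.txt (the exact floor below is an assumption) would let pip pick a release that satisfies both locopy and pre-commit:

pyyaml>=5.1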

Data loss when splitting csv files with load_and_copy

Python 3.7.3
locopy==0.3.6

When locopy.Redshift().load_and_copy() is called with both splits=N and copy_options=["IGNOREHEADER AS 1"], N-1 data rows are lost. load_and_copy doesn't recreate the CSV header in each chunk, so only the first chunk still has the header and is copied correctly; the first row of every other chunk is a data row that gets skipped because of copy_options=["IGNOREHEADER AS 1"].

There's a workaround: remove the CSV file's header and call load_and_copy() without copy_options=["IGNOREHEADER AS 1"] (see the sketch below).
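
A sketch of that workaround (bucket, file, and table names are hypothetical; check your installed locopy version for the exact load_and_copy signature):

# `redshift` is an already-connected locopy.Redshift instance.
with open("data.csv") as src, open("data_noheader.csv", "w") as dst:
    next(src)  # drop the header row
    dst.writelines(src)

redshift.load_and_copy(
    local_file="data_noheader.csv",  # header-less copy of the data
    s3_bucket="my-bucket",
    table_name="my_schema.my_table",
    splits=4,  # every chunk is pure data; no IGNOREHEADER needed
)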

Max retries and connection timeout arguments

Sometimes I have connection issues when uploading data via load_and_copy, for example:

botocore.exceptions.ConnectionClosedError:
Connection was closed before we received a valid response from endpoint URL: "https://MY-S3-FILE".

I found a suggestion in this thread to use a bigger connection_timeout. I'm also interested in the retries option (it's zero by default, I guess). Both are attributes of botocore.config.Config.

Can I somehow pass Config to Redshift object?
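
For reference, this is what the botocore side looks like; the hand-off to locopy is the open question here (the Config keys below are real botocore options, but any config= parameter on the Redshift object would be hypothetical):

import boto3
from botocore.config import Config

cfg = Config(
    connect_timeout=120,
    read_timeout=120,
    retries={"max_attempts": 5},
)
# boto3 itself accepts it like this; locopy would need to expose a
# similar pass-through to its internal S3 client/session:
s3 = boto3.client("s3", config=cfg)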

Pulling data from RS without adding S3 credentials raises an error

A question was raised in an internal Slack channel about getting S3CredentialsError when trying to sample data from RS (not doing any ETL job, so it shouldn't really require S3 credentials).

The message I expect to get here would be: S3 credentials were not found. S3 functionality is disabled
Not sure if we both set the inputs wrong, but I want to do some investigation on this.

Load Pandas DF into Redshift table

This might be some nice functionality to build out.
I can see a bunch of people wrangling data with pandas and then wanting to get it into Redshift.

Snowflake's write_pandas() function

Just want to have the discussion here rather than in the PR.

Proposal: add a use_write_pandas flag to insert_dataframe_to_table that defers to the write_pandas method rather than running the INSERT INTO statements. This would give people both options.

Basically:

if use_write_pandas:
    # run self.cur.write_pandas(......)
    ...
else:
    insert_query = """INSERT INTO {table_name} {columns} VALUES {values}""".format(
        table_name=table_name, columns=column_sql, values=string_join
    )

We can keep the table creation / metadata part in this scenario.

Docs: https://docs.snowflake.com/en/user-guide/python-connector-api.html#label-python-connector-api-write-pandas
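
For reference, the call the use_write_pandas branch would make looks roughly like this (conn is an open snowflake.connector connection, df a pandas DataFrame; the table name is illustrative):

from snowflake.connector.pandas_tools import write_pandas

# Returns a success flag plus chunk/row counts we could surface in logs.
success, num_chunks, num_rows, _ = write_pandas(conn, df, table_name="MY_TABLE")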

Need to write a bit on SQL injection in the docs

  • I think, given the type of processing locopy does, we need a bit of an explanation of, and warning about, SQL injection in the docs.
  • There doesn't seem to be a great solution when dealing with table names in COPY/UNLOAD statements, as these can't be bound via parameterization the way values in a WHERE clause can (see the illustration below).
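
A minimal illustration of the distinction (paramstyle assumed to be the driver's %s "format" style, e.g. psycopg2/pg8000; cmd is an open locopy connection and the names are hypothetical):

# Values CAN be bound safely via parameterization:
cmd.execute("SELECT * FROM orders WHERE region = %s", params=("EMEA",))

# Identifiers in COPY/UNLOAD cannot, so the table name is interpolated
# and must come from trusted code, never raw user input
# (credentials clause omitted for brevity):
table = "my_schema.orders"
cmd.execute("UNLOAD ('SELECT * FROM {0}') TO 's3://my-bucket/out_'".format(table))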

Switch back to standard logging

It seems the advantages of loguru are not really materializing.
Proposal: switch back to standard logging to ensure easier compatibility with dependent workflows.

Load NULL in database as numpy.nan in pandas

NULL values in Snowflake are loaded into python as None. When using Database.to_dataframe, these values remain as None rather than being converted to numpy.nan. As a result, any column containing None is forced to an "object" data type.

This makes it difficult to validate our data, since the data type has already changed. It also necessitates an extra step for type conversion.

I haven't tested this fully, but this issue could be fixed by changing this line
fetched = [tuple(column for column in row) for row in fetched]

to something like:

fetched = [
    tuple(column if column is not None else np.nan for column in row)
    for row in fetched
]

np.nan seems to be the de facto null value to use with pandas, as it doesn't mess up one's dtypes. If Database.to_dataframe is meant to be a convenient way of porting data into pandas, I think it makes sense for the method to be aware of the issue with None and to handle nulls more gracefully.

Support for SQLite3

I tried to connect to a local SQLite3 database using Locopy and it threw the following error:

ValueError: parameters are of unsupported type

After doing some digging, it looks like the error is thrown because of the default params=None argument of locopy.database.Database.execute(); sqlite3's cursor.execute() doesn't accept None as its parameters argument. Once I started passing params=(), it worked.

e.g.

import sqlite3
import locopy

with locopy.Database(dbapi=sqlite3, database=":memory:") as cmd:
    cmd.execute("CREATE TABLE stocks (date text, qty real)", params=())

I tested params=() with locopy.snowflake.Snowflake as well and it worked.
