
Comments (8)

pnadolny13 avatar pnadolny13 commented on August 9, 2024 3

cc @menzenski since it sounds like they might be taking over some of the maintainer duties based on #28 (comment) 😄

from tap-spreadsheets-anywhere.

radbrt avatar radbrt commented on August 9, 2024 2

@menzenski plugin inheritance is one of the solutions I thought of, so we could add the connection string to the config and treat it like a normal password. I don't have any objections to that solution.
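A minimal sketch of how a config-supplied connection string could take precedence over the environment variable. The setting name `azure_storage_connection_string` and the helper `resolve_connection_string` are hypothetical, chosen only for illustration; the tap's actual config schema may differ.

```python
import os

ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"
CONFIG_KEY = "azure_storage_connection_string"  # hypothetical setting name


def resolve_connection_string(config, environ=None):
    """Prefer the tap config value, then fall back to the environment variable."""
    environ = os.environ if environ is None else environ
    value = config.get(CONFIG_KEY) or environ.get(ENV_VAR)
    if not value:
        raise ValueError(f"Set '{CONFIG_KEY}' in the config or export {ENV_VAR}")
    return value
```

With plugin inheritance, each inherited plugin could then carry its own value for this setting, so two storage accounts would not compete for a single environment variable.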


menzenski avatar menzenski commented on August 9, 2024 1

Regarding your other question about the additional parameter:

I think my personal preference would be for something like this:

def get_streamreader(uri, universal_newlines=True, newline='', open_mode='r'):
    transport_params = None

    if uri.startswith('azure://'):
        connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
        transport_params = {
            'client': BlobServiceClient.from_connection_string(connect_str),
        }

    streamreader = smart_open.open(uri, open_mode, newline=newline, errors='surrogateescape', transport_params=transport_params)
    if not universal_newlines and isinstance(streamreader, StreamReader):
        return monkey_patch_streamreader(streamreader)
    return streamreader

This would keep the single reference to smart_open.open and would also keep the monkeypatching for azure (though to be honest I don't know if that's required).


menzenski avatar menzenski commented on August 9, 2024

Thanks @pnadolny13 - @ets has indeed made me a collaborator on this repository although I haven't really taken any actions yet in that capacity.

I'll preface this comment with a caveat that I don't personally have any Azure experience. My own cloud experience is really AWS-only.

In Meltano, this works well by adding it to the .env file. But if anyone wants to read from multiple Azure storage accounts, this will be hard to configure.

My understanding is that this is something that's generally true of Meltano plugins - separating environment variables in this way isn't possible. (I don't believe I could do that with AWS S3 either).

Is there a need to connect to multiple Azure storage accounts in the same invocation of the same tap? Or could you use plugin inheritance to run the tap once for each storage account, providing different environment variables to each run?


z3z1ma avatar z3z1ma commented on August 9, 2024

I would do this @radbrt

kwarg_dispatch = {
    "azure": lambda: {
        "transport_params": {
            "client": BlobServiceClient.from_connection_string(
                os.environ['AZURE_STORAGE_CONNECTION_STRING'],
            )
        }
    },
    "gs": lambda: {  # smart_open addresses GCS as gs://, so the scheme key is "gs"
        "transport_params": {
            "client": storage.Client.from_service_account_json(
                os.environ['GOOGLE_APPLICATION_CREDENTIALS'],
                # We can add more nuanced transport params here
            )
        }
    },
    # Adding support for more is intuitive...
}

SCHEME_SEP = "://"
kwargs = kwarg_dispatch.get(uri.split(SCHEME_SEP, 1)[0], lambda: {})()
    
streamreader = smart_open.open(uri, open_mode, newline=newline, errors='surrogateescape', **kwargs)
if not universal_newlines and isinstance(streamreader, StreamReader):
    return monkey_patch_streamreader(streamreader)
return streamreader

EDIT: it lazily evaluates itself so nothing in gcs would be evaluated if the scheme was resolved to azure
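A minimal, self-contained sketch of the lazy-dispatch idea, using stand-in factories in place of the real Azure and GCS clients. The names `make_azure_kwargs`, `make_gcs_kwargs`, and `resolve_kwargs` are hypothetical and exist only for illustration. It shows that only the factory matching the URI scheme is ever called:

```python
calls = []  # records which factories actually ran


def make_azure_kwargs():
    # Stand-in for building a BlobServiceClient (hypothetical helper).
    calls.append("azure")
    return {"transport_params": {"client": "azure-client-placeholder"}}


def make_gcs_kwargs():
    # Stand-in for building a GCS client (hypothetical helper).
    calls.append("gs")
    return {"transport_params": {"client": "gcs-client-placeholder"}}


# Keys are URI schemes; smart_open addresses GCS as gs://.
KWARG_DISPATCH = {"azure": make_azure_kwargs, "gs": make_gcs_kwargs}
SCHEME_SEP = "://"


def resolve_kwargs(uri):
    """Return extra keyword arguments for the URI's scheme, or {} if none apply."""
    scheme = uri.split(SCHEME_SEP, 1)[0]
    # dict() is the default factory, so unknown schemes yield an empty dict.
    return KWARG_DISPATCH.get(scheme, dict)()


azure_kwargs = resolve_kwargs("azure://container/data.csv")
local_kwargs = resolve_kwargs("file:///tmp/data.csv")
```

Because the dictionary values are callables rather than prebuilt clients, a missing credential for one provider would not raise when a URI for another provider is opened.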


z3z1ma avatar z3z1ma commented on August 9, 2024

Or as an FP 1-liner if you go full-tilt 😆 same thing.

def get_streamreader(uri: str, universal_newlines: bool = True, newline: str = "", open_mode: str = "r"):
  return (lambda rdr: monkey_patch_streamreader(rdr) if not universal_newlines and isinstance(rdr, StreamReader)
                          else rdr)(
    smart_open.open(uri, open_mode, newline=newline, errors="surrogateescape", **{
        "azure": lambda: {
            "transport_params": {
                "client": BlobServiceClient.from_connection_string(
                    os.environ["AZURE_STORAGE_CONNECTION_STRING"],
                )
            }
        },
        "gs": lambda: {  # smart_open addresses GCS as gs://
            "transport_params": {
                "client": storage.Client.from_service_account_json(
                    os.environ["GOOGLE_APPLICATION_CREDENTIALS"],
                    # We can add more nuanced transport params here
                )
            }
        },
        # Adding support for more is intuitive...
    }.get(uri.split("://", 1)[0], lambda: {})())
  )


menzenski avatar menzenski commented on August 9, 2024

@radbrt I'd encourage you to go ahead and open a PR for this.


radbrt avatar radbrt commented on August 9, 2024

@menzenski I absolutely plan to, there were a lot of good ideas here. Will probably have time this weekend.
