Comments (5)
smart_open supports http/s, so it can open and read files over that protocol. However, take a look at line 101 here: https://github.com/ets/tap-spreadsheets-anywhere/blob/master/tap_spreadsheets_anywhere/file_utils.py
That's the section where http/s isn't supported yet. It's where we collect all the available source files for a config block. For URLs we should probably just return a single target, since a file listing isn't practical. It's a relatively easy PR if you're interested in contributing.
from tap-spreadsheets-anywhere.
Yep, I can take a stab at it.
Looking at the other `list_files_in_*` functions, I'm guessing we'd just do a simple passthrough of the URL here, returned as a single-element list?
from tap-spreadsheets-anywhere.
Agreed - that seems like the best fit for a URL.
from tap-spreadsheets-anywhere.
Working on it now. Does the list_files function need to do anything else other than this?
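def list_files_in_http_bucket(uri, search_prefix=None):
    # A URL names exactly one file, so pass it through as a single-element list.
    # LOGGER is assumed to be the module-level logger in file_utils.py.
    entries = [uri]
    LOGGER.info("Found {} files.".format(len(entries)))
    return entries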
from tap-spreadsheets-anywhere.
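For anyone following along, here is a minimal sketch of how a passthrough like this might be wired in. It is not the tap's actual code: `list_files_for_uri` is a hypothetical dispatcher invented for illustration, and the logger setup is assumed rather than copied from `file_utils.py`. Only `list_files_in_http_bucket` and its signature come from the comment above.

```python
import logging
from urllib.parse import urlparse

# Assumed module-level logger; the real module may configure this differently.
LOGGER = logging.getLogger(__name__)


def list_files_in_http_bucket(uri, search_prefix=None):
    # A URL names exactly one file, so there is nothing to enumerate.
    entries = [uri]
    LOGGER.info("Found %d files.", len(entries))
    return entries


def list_files_for_uri(uri, search_prefix=None):
    # Hypothetical dispatcher: choose a lister based on the URL scheme.
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("http", "https"):
        return list_files_in_http_bucket(uri, search_prefix)
    raise ValueError("No file lister for scheme {!r}".format(scheme))
```

Under these assumptions, an http/s source always produces exactly one target, and any scheme without a lister fails loudly instead of silently returning nothing.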
Added support in the latest update.
from tap-spreadsheets-anywhere.