Comments (8)
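The way to do this is:
from visidata import VisiData, TsvSheet

# Register a loader for the "nsv" filetype (used for .nsv files or -f nsv).
@VisiData.api
def open_nsv(vd, p):
    return NsvSheet(p.base_stem, source=p)

# Reuse the TSV loader unchanged; only the delimiter option differs.
class NsvSheet(TsvSheet):
    pass

NsvSheet.options.delimiter = '\x00'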
Also, is it literally not possible to pass a NUL character into Python from the CLI? Not even with $'\0'? That seems pretty egregious. Maybe we could make an empty separator mean NUL.
from visidata.
Ah right, thanks.
Yes, it is impossible for the shell to execute a program with arguments that contain a NUL character, because the argv strings passed to the exec*() system calls (in C) are NUL-terminated.
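The same limit shows up from inside Python. This minimal sketch tries to start a child process with an argument containing NUL; CPython refuses with a ValueError before the program is ever executed, because the argument cannot be represented as a C string. Using sys.executable as the child is just a choice for the example, so it does not depend on any other program being installed.

```python
import subprocess
import sys

def try_exec_with_nul() -> str:
    """Attempt to start a child process whose argument contains a NUL byte."""
    try:
        subprocess.run([sys.executable, "-c", "pass", "a\x00b"], check=True)
    except ValueError as exc:
        # Raised while converting argv for exec*(): NUL would end the C string early.
        return str(exc)
    return "no error"

print(try_exec_with_nul())
```

On Linux the message is typically "embedded null byte"; the exact wording may differ by platform.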
Ah, of course. Well, I'd take a PR to make options.delimiter='' mean NUL-delimited, if you're up for it. We'll want to update the (new) docs at visidata/features/xsv_guide.py too.
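One way to picture the proposal is a small mapping step in front of the split. This is a sketch only: effective_delimiter and split_fields are hypothetical helper names, not VisiData functions. An empty separator is otherwise unusable (Python's str.split('') raises ValueError), so '' is free to stand in for NUL.

```python
NUL = "\x00"

def effective_delimiter(option_value: str) -> str:
    """Hypothetical helper: treat an empty delimiter option as NUL.

    str.split('') raises ValueError('empty separator'), so an empty
    option has no other useful meaning and can safely mean NUL.
    """
    return NUL if option_value == "" else option_value

def split_fields(line: str, option_value: str) -> list[str]:
    """Split one line using the effective delimiter."""
    return line.split(effective_delimiter(option_value))

print(split_fields("a\x00b\x00c", ""))  # ['a', 'b', 'c']
print(split_fields("a\tb", "\t"))       # ['a', 'b']
```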
Okay, great!
From @midichef
The issue is more complicated than I realized, though. How should we handle comments when the delimiter is NUL?
For example, TsvSheet.options.regex_skip = '^#.*' currently skips lines that look like comments, but it should definitely not do that when handling the output of find -print0.
My intuition is that regex_skip should not be used when the row delimiter is NUL, since we're no longer in classic TSV format.
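The suggested rule can be sketched as "apply comment skipping only to newline-delimited input". The function data_rows below is hypothetical and uses re.match as a stand-in for however VisiData actually applies regex_skip; the point is that a NUL-separated list of filenames keeps an entry that merely begins with '#'.

```python
import re

def data_rows(text: str, row_delim: str = "\n", regex_skip: str = r"^#.*") -> list[str]:
    """Hypothetical sketch: split into rows, skipping comments only for newline rows."""
    rows = text.split(row_delim)
    if rows and rows[-1] == "":
        rows.pop()  # a trailing delimiter does not start a new row
    if row_delim == "\n" and regex_skip:
        pattern = re.compile(regex_skip)
        rows = [r for r in rows if not pattern.match(r)]
    return rows

# A file literally named '#notes.txt' survives NUL-delimited loading.
print(data_rows("a.txt\x00#notes.txt\x00", row_delim="\x00"))  # ['a.txt', '#notes.txt']
# Classic line-oriented input still drops comment lines.
print(data_rows("a\n# comment\nb\n"))  # ['a', 'b']
```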
There are two more issues where using NUL as a delimiter conflicts with traditional TSV behavior.
- The tsv loader assumes the data is text, not binary.
It runs open_text_source(). This causes some unusual behavior.
If we read a NUL-separated file from disk, we get one row.
echo -n 'col\0' > one-row.nsv; vd -f tsv --row-delimiter= one-row.nsv
But if we pipe the same data:
echo -n 'col\0' |vd -f tsv --row-delimiter=
then the sheet has an extra row containing just a newline. This happens because piped data passes through a RepeatFile. RepeatFile is meant for holding text data: if the data doesn't end with a newline, RepeatFile appends one. For text, that doesn't change its interpretation, but for NUL-delimited data it makes the sheet gain a row containing only a newline.
I'm not sure what the right answer is here. The code that reads piped data makes strong assumptions that the piped data is text. (This is why binary file-guessing code like guess_zip() does not work on piped data.)
- The tsv loader skips blank rows.
echo -n 'header\n\n\n\n\nval' |vd -f tsv
makes two rows. So does:
echo -n 'header\0\0\0\0\0val' |vd -f tsv --row-delimiter=
I am not sure whether this will surprise users.
That's done by the if not line check here:
visidata/visidata/loaders/tsv.py, line 89 in 7c9799c
I'll think about these situations some more. For now, for my specific use cases, the code works well enough.
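Both situations above can be reproduced with a simplified model of the described behavior. Neither function is VisiData code: simulate_pipe only models "append a final newline if missing" as described for RepeatFile, and load_rows models splitting on the row delimiter plus the if not line skip. The lists are the raw pieces produced by splitting, before any header handling.

```python
def simulate_pipe(data: str) -> str:
    """Simplified model of piped input: append a final newline if it is missing."""
    return data if data.endswith("\n") else data + "\n"

def load_rows(data: str, row_delim: str) -> list[str]:
    """Split on row_delim and skip entirely blank rows (the 'if not line' check)."""
    return [row for row in data.split(row_delim) if row]

# Issue 1: same bytes, different results depending on disk vs. pipe.
from_disk = load_rows("col\x00", "\x00")                 # ['col']
from_pipe = load_rows(simulate_pipe("col\x00"), "\x00")  # ['col', '\n']

# Issue 2: runs of delimiters collapse, for newlines and NULs alike.
text_rows = load_rows("header\n\n\n\n\nval", "\n")             # ['header', 'val']
nul_rows = load_rows("header\x00\x00\x00\x00\x00val", "\x00")  # ['header', 'val']
```

The extra '\n' piece in from_pipe is not blank, so the blank-row check cannot remove it; that is why the piped sheet gains a row.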
The tsv loader skips blank rows.
Some TSV formats separate records with \n\n, so without this check they would load a blank row after every row by default. Also, this only applies to entirely blank rows, so multi-column TSVs won't be affected. (If you have a single-column TSV, in my mind that's just a text file and should be loaded with txt.) Do you have a case where you want the blank lines to load?
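To make the "entirely blank" distinction concrete, here is a small sketch, not VisiData code. A completely empty line is skipped, which handles \n\n-separated records, while a multi-column row whose fields are all empty still contains tab characters and is therefore kept.

```python
def keep_row(line: str) -> bool:
    """Mirror of an 'if not line' check: only a completely empty line is skipped."""
    return bool(line)

def load_tsv(text: str) -> list[list[str]]:
    """Split into lines, drop entirely blank ones, then split fields on tabs."""
    return [line.split("\t") for line in text.split("\n") if keep_row(line)]

# Records separated by a blank line: no blank rows appear.
print(load_tsv("x\t1\n\ny\t2\n\n"))  # [['x', '1'], ['y', '2']]
# A row of empty fields ('\t') is not blank, so it is kept.
print(load_tsv("a\tb\n\n\t\nc\td"))  # [['a', 'b'], ['', ''], ['c', 'd']]
```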
Okay, makes sense. No, I don't have a case where I want the blank lines to load, the current behavior works for me.