
hexyl's People

Contributors

0323pin, a1346054, arnavb, awidegreen, blacklotus, cuishuang, dm9pzcaq, erichdongubler, grigorenkopv, guozhenduo, judaew, lilyball, merkrafter, mkatychev, notramo, oowl, pewz, purveshpatel511, qyriad, rinhizakura, scimas, selfup, sharifhsn, sharkdp, sorairolake, tarnadas, tommilligan, uetchy, v0idx, vlkrs


hexyl's Issues

Display input offset

... in hexadecimal, just like hexdump -C:

00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010  00 00 00 80 00 00 00 44  08 02 00 00 00 c6 25 aa  |.......D......%.|
00000020  3e 00 00 00 c2 49 44 41  54 78 5e ed d4 81 06 c3  |>....IDATx^.....|
00000030  30 14 40 d1 b7 34 dd ff  ff 6f b3 74 56 ea 89 12  |[email protected]...|
00000040  6c 28 73 e2 aa 34 49 03  87 d6 fe d8 7b 89 bb 52  |l(s..4I.....{..R|
00000050  8d 3b 87 fe 01 00 80 00  00 10 00 00 02 00 40 00  |.;............@.|
00000060  00 08 00 00 01 00 20 00  00 04 00 80 00 00 10 00  |...... .........|
00000070  00 02 00 40 00 00 08 00  00 01 00 20 00 00 00 d4  |...@....... ....|
00000080  5e 6a 64 4b 94 f5 98 7c  d1 f4 92 5c 5c 3e cf 9c  |^jdK...|...\\>..|
00000090  3f 73 71 58 5f af 8b 79  5b ee 96 b6 47 eb f1 ea  |?sqX_..y[...G...|
000000a0  d1 ce b6 e3 75 3b e6 b9  95 8d c7 ce 03 39 c9 af  |....u;.......9..|
000000b0  c6 33 93 7b 66 37 cf ab  bf f9 c9 2f 08 80 00 00  |.3.{f7...../....|
000000c0  10 00 00 02 00 40 00 00  08 00 00 01 00 20 00 00  |.....@....... ..|
000000d0  04 00 80 00 00 10 00 00  02 00 40 00 00 08 00 00  |..........@.....|
000000e0  01 00 20 00 00 8c 37 db  68 03 20 fb ed 96 65 00  |.. ...7.h. ...e.|
000000f0  00 00 00 49 45 4e 44 ae  42 60 82                 |...IEND.B`.|
000000fb

Windows

It looks like the Windows build used to work but was recently broken:

57e1c67

Support 16 color terminal output

This would allow people to use the colorschemes they have set up in their terminal emulator. It should be an easy change and I'm happy to make it myself, but let me know if you would agree to such a change in the first place. Thanks!

Output layout is broken in CJK environment

Some characters (including the box-drawing lines, ×, and •) have East Asian ambiguous width, so they are rendered single-width in some environments and double-width in others.
In environments where such characters are double-width (mainly East Asia), the output layout of current hexyl (v0.3.1) is broken.

screenshot-2019-01-12-205304 0900

There are some ways to fix this:

  • Determine the width of the line characters (via wcwidth, CLI options, or something similar) and use the appropriate number of them.
    • However, this won't solve the × and • issue.
  • Or, use only ASCII characters when the user asks for it.
  • Or, make the symbols configurable (through a config file or CLI options?)

If those characters are made configurable, #17 will be solved at the same time.

Automatic paging?

When hexyl is used on a large file, the output scrolls for a very long time, and piping it to more produces garbled output. So, please add a pager for the output.

Units of measurement of bytes?

  • Decimal (implemented: [x])
    • Description: A decimal integer, which is equivalent to specifying a single byte unit for the count.
    • Examples: 23, 1024
    • Suggested implementation: u64::from_str(...)
  • Hex (implemented: [x], in #45)
    • Description: A hexadecimal integer. Specified with a leading 0x.
    • Examples: 0x17, 0x100
    • Suggested implementation: u64::from_str_radix(...)
  • Blocks (implemented: [ ])
    • Description: A single block, which is by default 512 bytes but configurable via a config flag.
    • Examples: -b 512 -n 1block
    • N.B.: one cannot use a block unit to define the block size.
    • Suggested implementation: Add a flag to optionally define the block size, then check for a trailing block when parsing numbers. Multiply by the block size.
  • Bytes (implemented: [ ])
    • Description: A byte size familiar to most IT professionals. Specified by B at the end of the count, and can include an optional magnitude spec like kilobytes (K) or megabytes (M).
    • Examples: 23B (23 bytes), 9KB (9 kilobytes)
    • Suggested implementation: Implement a regex of the form (?P<count>\d+)(?P<magnitude_unit>[KM]?)B.
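
The parsing rules listed above could be sketched roughly as follows. This is only an illustration: `parse_count` and its `block_size` parameter are hypothetical names, not hexyl's API. It uses plain string stripping instead of the suggested regex so that it needs nothing beyond the standard library, and it assumes K and M mean powers of 1024, which the issue does not settle.

```rust
/// Hypothetical helper: parses "23", "0x17", "1block", "23B", "9KB", "2MB".
/// Returns None on malformed input or on overflow.
fn parse_count(s: &str, block_size: u64) -> Option<u64> {
    // Hex is checked first so that "0x1B" is not mistaken for a byte unit.
    if let Some(hex) = s.strip_prefix("0x") {
        return u64::from_str_radix(hex, 16).ok();
    }
    // Blocks: a trailing "block", multiplied by the configured block size.
    if let Some(n) = s.strip_suffix("block") {
        return n.parse::<u64>().ok()?.checked_mul(block_size);
    }
    // Bytes: a trailing "B" with an optional K or M magnitude (assumed binary).
    if let Some(n) = s.strip_suffix('B') {
        let (digits, factor) = if let Some(d) = n.strip_suffix('K') {
            (d, 1024)
        } else if let Some(d) = n.strip_suffix('M') {
            (d, 1024 * 1024)
        } else {
            (n, 1)
        };
        return digits.parse::<u64>().ok()?.checked_mul(factor);
    }
    // Plain decimal.
    s.parse().ok()
}
```

With a block size of 512, `parse_count("1block", 512)` yields `Some(512)` and `parse_count("9KB", 512)` yields `Some(9216)`; anything unrecognized, such as `"2byte"`, yields `None`.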

Other open questions

  • Should the block unit, which might be controversial, be scoped into another issue?
  • Should a single leading - or + sign be supported?
    • Not sure how + is useful -- xxd's manual states that for the -s option + is useful only for stdin. Not sure what that means, though.

Is a Chocolatey package interesting?

I'd be more than happy to create an automatically updating Chocolatey package to hand off to whoever wants it -- is that interesting at all? I use Chocolatey to automate my own workflow right now, and this is a slick tool I'd love to add to my automatic toolbelt. :)

Other sizes of data (group size and Endianness)

I frequently have to dump data files (ADC output, for example) that don't just have byte-oriented data. It would be nice to be able to specify data width in the dump so I get the hex data grouped in the natural data size instead of having to do the little-endian two-step and mentally group indistinguishable bytes by 2 or 4 or whatever. Something like:

--word-size=1 (uint8_t, default)
--word-size=2 (uint16_t)
--word-size=4 (uint32_t)
--word-size=8 (uint64_t)
--word-size=16 (uint128_t)

That covers the common-ish types. If you want to be really brave you could do weird crap like 3-byte or 17 byte, but that is likely low return on investment.

Not all such data is little-endian, so an extra flag for those cases where word-size > 1 would be:

--little-endian (default)
--big-endian

Also, interpretation could be signed or unsigned

--signed
--unsigned (default)

Of course with this you'd drop the byte-oriented colouration (but maybe with --signed you'd highlight negative numbers in red or something).
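
As a rough sketch of the grouping described above, the function below formats a byte slice as hex words of a given size. `Endian` and `group_words` are hypothetical names chosen for illustration, and signed interpretation is not covered. A trailing partial word is printed as-is, which is an assumption rather than a decided behavior.

```rust
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical helper: groups `data` into words of `word_size` bytes and
/// renders each word as hex with its most significant byte first.
fn group_words(data: &[u8], word_size: usize, endian: Endian) -> Vec<String> {
    assert!(word_size > 0, "word size must be non-zero");
    data.chunks(word_size)
        .map(|chunk| {
            // Little-endian words store the least significant byte first,
            // so reverse them before printing.
            let bytes: Vec<u8> = match endian {
                Endian::Big => chunk.to_vec(),
                Endian::Little => chunk.iter().rev().copied().collect(),
            };
            bytes.iter().map(|b| format!("{:02x}", b)).collect::<String>()
        })
        .collect()
}
```

Because it works on raw byte order rather than on fixed integer types, the same code handles 16-byte words and odd sizes such as 3 bytes.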

Use maximum available width by default

Hello. I tried hexyl and find it very nice.

However, it currently uses too less a part of my screen. It would be more efficient if it used the maximum width available. Please enable this by default.

There can be an option to output to a fixed width which is related to #13.

Thank you!

cargo install failure

Hi,

I'm trying to install hexyl via cargo but I am getting a couple of compilation errors.

error: expected one of `,` or `as`, found `::`
 --> .cargo/registry/src/github.com-1ecc6299db9ec823/hexyl-0.4.0/src/main.rs:8:28
  |
8 | use std::io::{self, prelude::*, StdoutLock};
  |                            ^^ expected one of `,` or `as` here

error: expected one of `;`, `as`, or `self`, found `::`
 --> .cargo/registry/src/github.com-1ecc6299db9ec823/hexyl-0.4.0/src/main.rs:8:28
  |
8 | use std::io::{self, prelude::*, StdoutLock};
  |                            ^^ expected one of `;`, `as`, or `self` here

error: aborting due to 2 previous errors

error: failed to compile `hexyl v0.4.0`, intermediate artifacts can be found at `/tmp/cargo-install.yy1MBc6NxEZT`

Some details about my environment.

OS: Fedora 29 Linux dev.localdomain 4.20.16-200.fc29.x86_64 #1 SMP Thu Mar 14 15:10:22 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Rust version: 1.33.0
rustc version: rustc 1.23.0 (766bd11c8 2018-01-01)
cargo version: cargo 0.24.0 (45043115c 2017-12-05)

Thanks

Options for reading between two offsets

The -n/--length flag is a great feature -- good especially for quickly checking if file headers match something. Another use case I can think of for limited output is inspecting, say, an entire block of some data from a file that's been dumped from a disk. Let's say I'm using Linux and reading the first block of a disk somewhere to determine its contents manually:

$ hexyl -n 512 "$disk_dump"
// Some output here...

I read the output and discover that there's an MBR at the beginning, with the first partition starting at logical block address 1. Sweet, let's mosey on over to 0x200 and read another block. I could implement this by using dd:

$ input_file="_viminfo" block_size=512 block_num=1
$ dd bs="$block_size" status=none skip="$block_num" count=1 if="$input_file" \
    | hexyl

...but there are two issues with this:

  • dd isn't usually available on Windows machines!
  • Because we're using stdin, we can't get the correct set of offsets -- the "file" starts at 0x0 regardless of what parameters I gave dd.

Perhaps something like this spitballed set of options might help:

$ hexyl \
    --start 512      \ # Could also be written as 
                     \ #
                     \ # Could be bikeshed to `--begin`?
                     \
    --length 512     \
                     \
                     \ # One could use an end offset instead of a length:
    # --end 1024     \ # Could also be written as `-e 0x400`

Having something similar to bat's --range could also be really handy, especially when combined with relative offsets (positive and negative):

$ hexyl --range 512:+512 # same as using `--skip 512 --length 512`
$ hexyl --range=-512: # read the last block
$ hexyl --block-size 4096 -1block: # like above, but use the block unit
$ hexyl --range 32:-32 # cut out a common header and footer for the input stream we don't care about
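
One possible reading of the `--range` examples above, sketched as a pure function that turns a spec into absolute offsets. The name `resolve_range` and the exact semantics are assumptions: a leading `-` counts back from the end of the input, a `+` on the end value is relative to the start, and an empty side means "from the beginning" or "to the end". The block-unit form is not handled here.

```rust
/// Hypothetical helper: resolves "START:END" against an input of `len` bytes
/// and returns the absolute half-open range (start, end).
fn resolve_range(spec: &str, len: u64) -> Option<(u64, u64)> {
    let (start_s, end_s) = spec.split_once(':')?;
    let start: u64 = if start_s.is_empty() {
        0
    } else if let Some(n) = start_s.strip_prefix('-') {
        // Negative start: count back from the end of the input.
        len.checked_sub(n.parse().ok()?)?
    } else {
        start_s.parse().ok()?
    };
    let end: u64 = if end_s.is_empty() {
        len
    } else if let Some(n) = end_s.strip_prefix('+') {
        // "+n": a length relative to the start.
        start.checked_add(n.parse().ok()?)?
    } else if let Some(n) = end_s.strip_prefix('-') {
        // Negative end: count back from the end of the input.
        len.checked_sub(n.parse().ok()?)?
    } else {
        end_s.parse().ok()?
    };
    // Reject ranges that are reversed or run past the input.
    if start <= end && end <= len {
        Some((start, end))
    } else {
        None
    }
}
```

Resolving negative offsets needs the input length, so this form only works for seekable inputs such as files, not for stdin.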

I would be more than happy to push implementation of this, since I've great personal interest in allowing more of my reverse engineering flow in the terminal. Let me know if you want me to hack on it!

Unresolved questions

  • Adding several more arguments that accept offsets/sizes might add pressure to create a system of units/radices a la xxd for the appropriate arguments. Where should the line be drawn in terms of what this project is willing to support? We've added support for xb and xib with #44. That's as far as we've decided to go right now.

Code point output

For unicode text, it would be nice if an option was available to output code points rather than bytes

Provide examples

I wanted to use hexyl as a library to pretty print the memory of a VM I am working on, for error dumps. The memory is just [u8; 4096] so I imagine it should be pretty simple, but I couldn't find an example of how to use this as a library.

Squeeze zero rows

A useful feature of hexdump is the ability to compress contiguous rows of 0s into a single *:

dd if=/dev/zero of=swapfile bs=1024k count=1
mkswap swapfile
hexdump swapfile

0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000400 0001 0000 00ff 0000 0000 0000 5dd8 a838
0000410 6fe7 d247 cc9b 6af5 a41d 50da 0000 0000
0000420 0000 0000 0000 0000 0000 0000 0000 0000
*
0000ff0 0000 0000 0000 5753 5041 5053 4341 3245
0001000 0000 0000 0000 0000 0000 0000 0000 0000
*
0100000

I was wondering if this is something hexyl can implement
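
For illustration, the squeezing behavior can be sketched in a few lines. This is not hexyl's implementation: `squeeze_rows` is a hypothetical function that returns plain-text rows of 16 bytes each, and like hexdump it squeezes any repeated row, not only rows of zeros.

```rust
/// Hypothetical sketch: renders `data` as 16-byte rows, replacing each run of
/// rows identical to the previous row with a single "*" line.
fn squeeze_rows(data: &[u8]) -> Vec<String> {
    let mut out = Vec::new();
    let mut previous: Option<&[u8]> = None;
    let mut squeezing = false;
    for (i, row) in data.chunks(16).enumerate() {
        if previous == Some(row) {
            // Same content as the row before: emit "*" once per run.
            if !squeezing {
                out.push("*".to_string());
                squeezing = true;
            }
        } else {
            let hex: Vec<String> = row.iter().map(|b| format!("{:02x}", b)).collect();
            out.push(format!("{:08x}  {}", i * 16, hex.join(" ")));
            squeezing = false;
        }
        previous = Some(row);
    }
    // Final line: the total length, as hexdump prints it.
    out.push(format!("{:08x}", data.len()));
    out
}
```

Rows are compared in full, so a final row that differs in a single byte, or that is shorter than 16 bytes, is always printed rather than squeezed.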

Some colors are printed even if they're disabled

$ printf "%32s" "" | hexyl --border none --color never
 00000000  20 20 20 20 20 20 20 20   20 20 20 20 20 20 20 20                    
 *                                                                              
 00000020                                                                       
$ printf "%32s" "" | hexyl --border none --color never | hexdump -C
00000000  20 30 30 30 30 30 30 30  30 20 20 32 30 20 32 30  | 00000000  20 20|
00000010  20 32 30 20 32 30 20 32  30 20 32 30 20 32 30 20  | 20 20 20 20 20 |
00000020  32 30 20 20 20 32 30 20  32 30 20 32 30 20 32 30  |20   20 20 20 20|
00000030  20 32 30 20 32 30 20 32  30 20 32 30 20 20 20 20  | 20 20 20 20    |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000050  0a 20 1b 5b 33 38 3b 35  3b 32 34 32 6d 2a 1b 5b  |. .[38;5;242m*.[|
00000060  30 6d 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |0m              |
00000070  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
000000b0  0a 20 30 30 30 30 30 30  32 30 20 20 20 20 20 20  |. 00000020      |
000000c0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000100  20 0a                                             | .|
00000102

The star is colored even if colors are disabled.

(On another note, would it be possible to make --color=auto the default? That's how most CLI utilities work.)

Provide library as well as binary

I'm interested in using hexyl as an hex viewer internally in another project I'm working on.

It would be great to have access to hexyl as a library, and to have it operate on generic Read and Write traits, rather than being tightly coupled to stdin/stdout.
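
To make the request concrete, here is a minimal sketch of a dump function that is generic over `Read` and `Write`. It is a hand-rolled stand-in, not hexyl's library interface; the name `dump` and the plain output format are assumptions.

```rust
use std::io::{self, Read, Write};

/// Hypothetical sketch: writes a plain hex dump of `input` to `output`,
/// 16 bytes per row, with no coupling to stdin or stdout.
fn dump<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut offset = 0usize;
    let mut row = [0u8; 16];
    loop {
        // Fill one row, tolerating short reads from pipes and sockets.
        let mut filled = 0;
        while filled < row.len() {
            match input.read(&mut row[filled..])? {
                0 => break, // end of input
                n => filled += n,
            }
        }
        if filled == 0 {
            return Ok(());
        }
        write!(output, "{:08x} ", offset)?;
        for byte in &row[..filled] {
            write!(output, " {:02x}", byte)?;
        }
        writeln!(output)?;
        offset += filled;
        if filled < row.len() {
            return Ok(()); // a partial row means the input has ended
        }
    }
}
```

Any byte slice can serve as the reader and any `Vec<u8>` as the writer, so an in-memory buffer can be dumped with `dump(&memory[..], &mut out)` and the result inspected in tests.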

Additional space on last line

If the last line of data contains less than 8 bytes, an additional space is written after the outer_sep:

hexyl/src/lib.rs

Lines 340 to 349 in 1758477

if len < 8 {
    let _ = writeln!(
        &mut self.buffer_line,
        "{0:1$}{3}{0:2$}{4} ",
        "",
        8 - len,
        8,
        self.border_style.inner_sep(),
        self.border_style.outer_sep(),
    );

$ echo -n "12345678" | hexyl --color never --border ascii | wc -L
80

$ echo -n "1234567" | hexyl --color never --border ascii | wc -L
81

This causes an empty line before the footer if the terminal is exactly 80 characters wide. Is this behavior intended or a bug? I ask because the short_input_passes test also contains the trailing space.

Gibberish on Windows

Tested with Cygwin and cmd.exe on Windows 7:

$ cat alpha.txt
hello world

$ ./hexyl alpha.txt
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐

│ [38;2;117;113;94m00000000 [0m│  [38;2;102;217;239m68  [0m [38;2;102;217;239m65
  [0m [38;2;102;217;239m6c  [0m [38;2;102;217;239m6c  [0m [38;2;102;217;239m6f
[0m [38;2;166;226;46m20  [0m [38;2;102;217;239m77  [0m [38;2;102;217;239m6f  [0m
┊  [38;2;102;217;239m72  [0m [38;2;102;217;239m6c  [0m [38;2;102;217;239m64  [0m
 [38;2;166;226;46m0a  [0m            │ [38;2;102;217;239mh [0m [38;2;102;217;239
me [0m [38;2;102;217;239ml [0m [38;2;102;217;239ml [0m [38;2;102;217;239mo [0m [
38;2;166;226;46m  [0m [38;2;102;217;239mw [0m [38;2;102;217;239mo [0m┊ [38;2;102
;217;239mr [0m [38;2;102;217;239ml [0m [38;2;102;217;239md [0m [38;2;166;226;46m
_ [0m    │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

Simplify check for "ByteCategory::AsciiPrintable"

hexyl/src/lib.rs

Lines 38 to 40 in 67b6c25

} else if self.0.is_ascii_alphanumeric()
    || self.0.is_ascii_punctuation()
    || self.0.is_ascii_graphic()

I guess that can be simplified to just look for is_ascii_graphic:

else if self.0.is_ascii_graphic()

I would also prefer to change the name of the variable "AsciiPrintable", because "printable" normally also includes the space.
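
The claim can be checked exhaustively, since there are only 256 byte values. The helper below, with the hypothetical name `checks_agree`, compares the three-way condition against `is_ascii_graphic` alone for every byte.

```rust
/// Returns true if the original three-way check and the simplified check
/// give the same answer for every possible byte value.
fn checks_agree() -> bool {
    (0..=u8::MAX).all(|b| {
        let original =
            b.is_ascii_alphanumeric() || b.is_ascii_punctuation() || b.is_ascii_graphic();
        original == b.is_ascii_graphic()
    })
}
```

The standard library defines `is_ascii_graphic` as the range 0x21 to 0x7E, which excludes the space (0x20), so the naming concern above applies to the simplified check as well.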

Please use (or support) -c for byte count

hexyl -n 100 is, in my mind, replacing head -c 100 | xxd. If it is directly replacing xxd, xxd calls this option -l (length, which I had to look up, because I use head).

Please support -c, with the same behaviour as the current -n, and/or improve the error output to remind users of the correct option:

% hexyl -c 257 /dev/urandom
error: Found argument '-c' which wasn't expected, or isn't valid in this context

USAGE:
    hexyl [file]

For more information try --help

Annoyingly, xxd uses -c for cols.

Please don't print the header before input is seen

hexyl appears to print the "header" (the top of the box) too soon.

If you have an app that thinks before printing output (e.g. gpg), it is unnecessarily ugly.

Something like this:

% (echo Er... >&2; sleep 0.2; \
 echo Thinking... >&2; sleep 0.2; \
 echo Um... >&2; sleep 0.2; \
 echo Output, finally; \
 echo 'Done!' >&2) | hexyl

Typically (it's racy!) looks like:

Er...
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
Thinking...
Um...
Done!
│00000000│ 4f 75 74 70 75 74 2c 20 ┊ 66 69 6e 61 6c 6c 79 0a │Output, ┊finally_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

I'd rather it showed:

Er...
Thinking...
Um...
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 4f 75 74 70 75 74 2c 20 ┊ 66 69 6e 61 6c 6c 79 0a │Output, ┊finally_│
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
Done!

Gaming the thinking is probably easy. Gaming the Done! is probably impossible. (gpg doesn't print the done.)
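
One way to get the preferred ordering is to defer the header until the first read returns. The sketch below assumes a hypothetical `dump_lazily` function and uses stand-in border lines rather than hexyl's real box; rows follow read boundaries instead of being aligned to 16 bytes, to keep the example short.

```rust
use std::io::{self, Read, Write};

/// Hypothetical sketch: the border lines are stand-ins, not hexyl's output.
fn dump_lazily<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut buf = [0u8; 16];
    let mut header_printed = false;
    loop {
        // Blocks until the producer writes something or closes the pipe.
        let n = input.read(&mut buf)?;
        if !header_printed {
            // The first read has returned, so the producer is done "thinking".
            writeln!(output, "+--------+")?;
            header_printed = true;
        }
        if n == 0 {
            break; // end of input
        }
        let hex: Vec<String> = buf[..n].iter().map(|b| format!("{:02x}", b)).collect();
        writeln!(output, "| {}", hex.join(" "))?;
    }
    writeln!(output, "+--------+")
}
```

In this sketch an empty input still produces a header and a footer, printed once end of input is seen; whether that is the desired behavior is a separate design choice.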

Pale cyan on white background

screenshot

In gnome-terminal or in xterm or in lxterminal.

It only looks right in the Linux text console (probably because it has a dark background).

Measure rendering performance in terminals

We've been benchmarking the performance of the tool without considering the rendering performance of the terminal. Specifically, I'm thinking about how we turn colors on and off again for every single hex pair and textual character. Ideally we wouldn't turn colors off if the next printed hex/char uses the same color.

I'm not really sure how to programmatically measure the terminal performance (and of course performance would change for different terminals), but it's worth at least trying to measure. Optimizing our color usages would be more overhead on our side and therefore slow down our benchmark (though perhaps not significantly) but if it produces faster rendering it might be worth it.

At the very least, we could investigate not printing the style suffix for each character, under the assumption that the style prefix for the next character will suffice (and then just printing the suffix prior to printing a frame character).
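
The idea in the last paragraph can be sketched as follows: emit a color escape only when the color changes, and a single reset at the end. The function name `render` and the use of 256-color codes are assumptions for illustration; hexyl's actual styling code is not shown here.

```rust
/// Hypothetical sketch: `cells` pairs a 256-color index with the text to
/// print in that color. An escape sequence is emitted only on color changes.
fn render(cells: &[(u8, &str)]) -> String {
    let mut out = String::new();
    let mut current: Option<u8> = None;
    for &(color, text) in cells {
        if current != Some(color) {
            // SGR "set foreground to 256-color index" sequence.
            out.push_str(&format!("\x1b[38;5;{}m", color));
            current = Some(color);
        }
        out.push_str(text);
    }
    if current.is_some() {
        // One reset at the end instead of one per cell.
        out.push_str("\x1b[0m");
    }
    out
}
```

For a row of sixteen bytes in the same category this writes one prefix and one suffix instead of sixteen of each, which reduces the number of bytes the terminal has to parse.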

Handle sparse files

Because hexyl truncates repeating sections, it would be nice to be able to have hexyl quickly skip over these sections instead of scanning them byte-by-byte.

Bad background color on rxvt-unicode

Hi there,
Thank you for this tool.

There is a background color that makes the characters difficult to see, on rxvt-unicode only.

rxvt-unicode with default settings:
urxvt

Maybe that doesn't look like a big problem, but with a dark terminal theme it makes the output difficult to read:
urxvt-black

That background color does not appear on other terminals, just rxvt-unicode.

hexyl doesn't respond to Ctrl-C

On macOS, when running hexyl without any input, the application can't be terminated with Ctrl-C. Like this:

Screenshot 2020-02-11 at 16 53 29

This is with hexyl 0.6.0 on macOS 10.14.6

'skip' option

I'd like to have an option to skip n bytes.

Examples:

hexyl --skip 0x20
Prints bytes from position 0x20 to the end of the input

hexyl --skip 0x50 --length 0x100
Prints bytes from position 0x50 to 0x150

Wrong hexdump

$ printf "%32s" a | hexyl
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 20 20 20 20 20 20 20 20 ┊ 20 20 20 20 20 20 20 20 │        ┊        │
│*       │                         ┊                         │        ┊        │
│00000020│                         ┊                         │        ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘
$ printf "%32s" a | hexdump -C
00000000  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
00000010  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 61  |               a|
00000020

brew install failure

> brew install hexyl
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 3 taps (homebrew/cask-versions, homebrew/core and homebrew/cask).
==> Updated Formulae
bash ✔               brew-php-switcher    fonttools            mariadb              telegraf
libidn2 ✔            buildifier           grv                  mercurial            tor
wget ✔               chakra               hcloud               mutt                 urdfdom_headers
alexjs               citus                jabba                neofetch             vegeta
amqp-cpp             clang-format         jenkins              nvm                  whois
aws-sdk-cpp          conan                kitchen-sync         picard-tools         wtf
azure-cli            console_bridge       libphonenumber       rabbitmq             yarn
bat                  emscripten           libpsl               serverless
bettercap            erlang@20            lmod                 sox
bitrise              exploitdb            lxc                  swiftlint

/usr/local/Homebrew/Library/Homebrew/config.rb:39:in `initialize': no implicit conversion of nil into String (TypeError)
        from /usr/local/Homebrew/Library/Homebrew/config.rb:39:in `new'
        from /usr/local/Homebrew/Library/Homebrew/config.rb:39:in `<top (required)>'
        from /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.7/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/local/Homebrew/Library/Homebrew/vendor/portable-ruby/2.3.7/lib/ruby/2.3.0/rubygems/core_ext/kernel_require.rb:55:in `require'
        from /usr/local/Homebrew/Library/Homebrew/global.rb:25:in `<top (required)>'
        from /usr/local/Homebrew/Library/Homebrew/brew.rb:13:in `require_relative'
        from /usr/local/Homebrew/Library/Homebrew/brew.rb:13:in `<main>'

> hexyl
-bash: hexyl: command not found

macOS High Sierra Version 10.13.6

MacBook Pro

Stop putting background color on spaces

If a byte's coloration includes a background color, the following space also gets the same background color.

The fix for this is trivial but any PR to fix it right now would conflict with #23.

Interrupting hexyl with ^C leaves ugly remnant

screenshot

It should probably handle SIGINT and terminate the output as if it was EOF (but maybe with some mark):

│00001410│ 57 37 f9 ae 0b ae 6c 8a ┊ df 0e d8 20 15 f1 d7 f6 │W7×ו×l×┊×•× •×××│
│00001420│ a5 5e 99 b0 bd bf 22 a7 ┊ b2 e0 ab ^C             | ×^×××"×┊×××^C   |

Print using custom width

This is excellent, and the default width of 16 is great! It'd be really nice if we could print according to custom alignments, though -- this is a fairly standard feature in most hex editors, and I can't imagine it being egregiously complex to add this.

Another couple of ideas:

  • Getting the max width based on a provided terminal width would be nice -- this would let me simply plug in a width and let hexyl calculate the space overhead of borders for me.
  • An auto-width flag (which is like the above for the current terminal width) to be used would also be handy -- it would have some overlap with the above, but there could be reasons for setting

Prototypical suggestion for options:

# Ideas for a fixed column width:
-c --columns        Sets the number of hex data columns to be displayed.
                    Cannot be used with other width-setting options.
-w --width

# Ideas for terminal auto-width:
-a --auto-width     Sets the number of hex data columns to be adjusted according
                    to the detected terminal width. Conceptually, this could be an
                    alias of `-t $terminal_width`.
                    Cannot be used with other width-setting options.

# Ideas for fixed terminal width:
-t --terminal-width Sets the number of terminal columns to be displayed.
                    Since the terminal width may not be evenly divisible by the
                    width per hex data column, this will use the greatest number of 
                    hex data columns that can fit in the requested width but still
                    leave some space to the right.
                    Cannot be used with other width-setting options.

I'm more than happy to push on this, if effort is a concern. :)
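
The terminal-width calculation could look roughly like this. The constants are derived from the 80-column boxes shown in other issues on this page (a 10-column offset area plus 35 columns per 8-byte panel), so they are an assumption about the current layout, and `panels_for_width` is a hypothetical name.

```rust
// Left border, 8 offset digits, and the separator after them.
const FIXED_COLUMNS: usize = 10;
// Per 8-byte panel: 25 hex columns + separator + 8 text columns + separator.
const COLUMNS_PER_PANEL: usize = 35;

/// Hypothetical helper: the greatest number of 8-byte panels that fit in
/// `terminal_width` columns, leaving any remainder unused on the right.
fn panels_for_width(terminal_width: usize) -> usize {
    terminal_width.saturating_sub(FIXED_COLUMNS) / COLUMNS_PER_PANEL
}
```

An 80-column terminal gives 2 panels (16 bytes per row), a 115-column terminal gives 3, and anything narrower than 45 columns gives 0, which a real implementation would need to treat as an error or clamp to 1.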

Argument --length silently takes precedence over --bytes.

When specifying both --length and --bytes in either long or short form, the value of --length will be chosen while the value specified by --bytes will be discarded silently. While I don't think anybody would intentionally run into it, having a warning or error just in case is something that would help improve UX.

hexyl/src/bin/hexyl.rs

Lines 174 to 176 in cc5b308

let mut reader = if let Some(length) = matches
    .value_of("length")
    .or_else(|| matches.value_of("bytes"))
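
A minimal sketch of the suggested check, independent of any argument-parsing library: `pick_length` is a hypothetical helper that takes the two optional values and reports a conflict instead of silently preferring one.

```rust
/// Hypothetical helper: returns the single value supplied, or an error if
/// both --length and --bytes were given.
fn pick_length<'a>(
    length: Option<&'a str>,
    bytes: Option<&'a str>,
) -> Result<Option<&'a str>, String> {
    match (length, bytes) {
        (Some(_), Some(_)) => {
            Err("--length and --bytes cannot be used together".to_string())
        }
        // At most one is set here, so `or` cannot discard a value.
        (l, b) => Ok(l.or(b)),
    }
}
```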


Note: I'm doing a class assignment regarding creating tests (from scratch) for an open-source command line program, and I chose hexyl for it. You might have a couple more of these on the way, depending on what the assignment brings to light :)

User-specified offset

I would like to integrate hexyl into GDB using the Python API. It would therefore be useful if I could supply a different starting offset, so that the offsets hexyl displays match the addresses in the process's virtual memory.

Print error message if parsing of --length/--skip/… fails

We should print an error if the parsing of the size argument for --length, --skip, … fails. Currently, we silently ignore the argument:

▶ echo 12345 | hexyl --skip 2b   
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000002│ 33 34 35 0a             ┊                         │345_    ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

~                                                                                                       
▶ echo 12345 | hexyl --skip 2byte
┌────────┬─────────────────────────┬─────────────────────────┬────────┬────────┐
│00000000│ 31 32 33 34 35 0a       ┊                         │12345_  ┊        │
└────────┴─────────────────────────┴─────────────────────────┴────────┴────────┘

I'd prefer if we use anyhow for error handling (if we want to rely on a library at all).

If we want to have good error messages, we probably want to modify

fn parse_byte_count(n: &str, block_size: PositiveI64) -> Option<PositiveI64>

to return a Result<PositiveI64> with several possible error paths.
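
A sketch of what a `Result`-returning parser with distinct error paths might look like. It is simplified on purpose: it uses `u64` instead of `PositiveI64`, drops the `block_size` parameter, handles only decimal and 0x-prefixed hex, and defines its own error type with the standard library rather than using anyhow. The `ByteCountError` type and its messages are hypothetical.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum ByteCountError {
    Empty,
    InvalidNumber(String),
}

impl fmt::Display for ByteCountError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ByteCountError::Empty => write!(f, "empty byte count"),
            ByteCountError::InvalidNumber(s) => write!(f, "invalid byte count: '{}'", s),
        }
    }
}

/// Simplified stand-in for the function named above: decimal and hex only.
fn parse_byte_count(n: &str) -> Result<u64, ByteCountError> {
    if n.is_empty() {
        return Err(ByteCountError::Empty);
    }
    let parsed = match n.strip_prefix("0x") {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => n.parse::<u64>(),
    };
    // Keep the offending input in the error so the message can quote it.
    parsed.map_err(|_| ByteCountError::InvalidNumber(n.to_string()))
}
```

With this shape the caller can print the error and exit, so an input like `2byte` produces a message that quotes the bad value instead of being silently ignored.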

Argument `--length=0`/`--bytes=0` should be considered an error.

This one might be a bit pedantic, but a user requesting that hexyl print nothing seems like a user error. The entire point of hexyl is to print a binary preview of a file, and explicitly asking it not to defeats the purpose of using it in the first place.


Note: I'm doing a class assignment regarding creating tests (from scratch) for an open-source command line program, and I chose hexyl for it. You might have a couple more of these on the way, depending on what the assignment brings to light :)

Skip duplicate lines

hexdump will skip duplicate lines and output a single * for any number of duplicate lines. Would you accept a PR that implements that behaviour?

0000280 04 00 00 00 2a 00 00 00 00 00 00 00 00 00 00 00
0000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000300 00 00 00 00 00 00 00 00 ac 0f 00 00 01 00 00 00
0000310 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000330 0c 00 00 00 38 00 00 00 18 00 00 00 02 00 00 00
0000340 01 fa e4 04 00 00 01 00 2f 75 73 72 2f 6c 69 62

Unicode dump

A version of hexdump I wanted to write but never got around to, would do:

  • display Unicode characters on the right side, plus a colored filler (? ?) for the space not taken (for wcwidth()==1 it's 1 extra character for <U+0800, 2 extra for <=U+FFFF, 3 extra for non-BMP; likewise for wcwidth()==2). Obviously, a CJK (width 2) character might go over the right edge but as it always takes at least three bytes, there'll be enough space in the next line.

  • display controls using appropriate symbols — Unicode provides a set for this exact task at U+2400. Some specific characters could be better shown using more readable symbols: for 07, for 0a, for 1b, for 08, for 7f, and especially one of for for 00. You do get a lot of nulls, newlines and escapes in dumps...

  • with an option, display Unicode code points rather than individual bytes on the left side — ie, "U+FFFD" instead of "ef bf bd".

Sounds like your hexyl would be a perfect place to implement the above...
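
The code point display from the last bullet can be sketched with the standard library alone. `code_points` is a hypothetical helper; it decodes the input as UTF-8 and, as an assumption about desired behavior, maps invalid sequences to U+FFFD rather than failing.

```rust
/// Hypothetical helper: decodes `bytes` as UTF-8 (lossily) and formats each
/// character as a "U+XXXX" code point label.
fn code_points(bytes: &[u8]) -> Vec<String> {
    String::from_utf8_lossy(bytes)
        .chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect()
}
```

For the bytes ef bf bd this returns a single entry, "U+FFFD", matching the example in the bullet.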

Use a different vertical separator character

On Windows, when I use the Fira Code font in Alacritty, the ┊ character is not available, so it looks like this:

image

Fira Code seems like a popular font, so it would be nice to use a different character.

Beat xxd

I did a bit of benchmarking and I can't help but notice that xxd is faster than hexyl.
On my machine on a file of about 700M:

$ time xxd myfile > /dev/null

real	0m43.245s
user	0m42.950s
sys	0m0.272s

$ time hexyl --color=never --no-squeezing --border=none myfile > /dev/null

real	1m10.967s
user	1m1.371s
sys	0m9.592s

It would be nice to beat xxd in speed... I have no idea how to do it, though.
