awk-jvm's Introduction

Hi there 👋

:bowtie: I'm an independent JVM (Scala, Kotlin & Java) contractor specializing in backend web development. Please contact me at (github-username) (at) pm.me for work.

:neckbeard: I tweet memorable quotes from podcasts at @podquotesio and AWK stuff at @mawkic.

🤓 My other interests are Rust and Haskell.

awk-jvm's People

Contributors

bmx, rethab, tyingq

awk-jvm's Issues

suggestion how to read binary input directly with gawk

Hmm, what do you mean by that? Here's a fully functional hex encoder for gawk (sorry for the poor formatting, I dug it up from my pile).

Even in gawk's Unicode mode, I got it to hex-encode two different binary MP3 files with ease and without any error messages popping up (try not to use it in gawk -P POSIX mode, where all kinds of weird behavior may bubble up). I think the octal encoder also works, but I haven't tested it lately. Let me know whether this works.

If the offset 8^8 doesn't work, use 0xDC00 instead. If that also fails, try -4^4 as a last resort.

gawk -e '
function hexencode(str, chr) {
  for (chr in b2hex) {
    if (chr !~ /[[:alnum:]%\\]/) { gsub(chr, b2hex[chr], str) }
  }
  return str
}
function octencode(str, chr) {
  gsub(/\\/, b2oct["\\"], str)
  gsub(/[0-7]/, "\06&", str)
  for (chr in b2oct) {
    if (chr !~ /[0-7\\]/) { gsub(chr, b2oct[chr], str) }
  }
  return str
}
BEGIN {
  offset = 8^8
  for (x = 0; x < 256; x++) {
    byte = sprintf("%c", x + offset)
    b2hex[byte] = sprintf("\\x%.2X", x)
    b2oct[byte] = sprintf("\\%03o", x)
  }
  spc1 = "/\\^[]"
  spc2 = "~!@#%&_-{}:;\42\47\140 <>,$.|()*+=?"
  for (x = length(spc1); x; x -= 1) {
    byte = substr(spc1, x, 1)
    b2hex[("\\" (byte))] = b2hex[byte]
    b2oct[("\\" (byte))] = b2oct[byte]
    delete b2hex[byte]
    delete b2oct[byte]
  }
  for (x = length(spc2); x; x--) {
    byte = substr(spc2, x, 1)
    b2hex[("[" (byte) "]")] = b2hex[byte]
    b2oct[("[" (byte) "]")] = b2oct[byte]
    delete b2hex[byte]
    delete b2oct[byte]
  }
}
BEGIN { RS = FS = "^$"; OFS = ""; ORS = "" }
END { print hexencode($0) }'

This encoder may not conform 100% to the URL-encoding spec; it's simply something I quickly slapped together a while back. It currently skips only the alphanumeric characters, so it also encodes punctuation symbols that the spec doesn't require encoding. Feel free to modify it.
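For anyone who wants output closer to the URL-encoding spec, here is a minimal sketch of a percent-encoder. It is not the encoder above: it assumes gawk with the -b option, uses %XX output instead of \xXX, and leaves the RFC 3986 unreserved characters (letters, digits, "-", ".", "_", "~") untouched. The function name urlencode and the table name ord are made up for illustration.

```sh
# urlencode STRING: hypothetical helper, prints STRING percent-encoded byte by byte
urlencode() {
  printf '%s' "$1" | LC_ALL=C gawk -b '
    BEGIN {
      # Lookup table from single byte to its numeric value
      for (i = 0; i <= 255; i++) ord[sprintf("%c", i)] = i
      # gawk idiom: this regex only matches empty input, so all input is one record
      RS = "^$"
    }
    {
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        # Keep RFC 3986 unreserved characters, encode every other byte as %XX
        out = out ((c ~ /[A-Za-z0-9._~-]/) ? c : sprintf("%%%02X", ord[c]))
      }
      printf "%s", out
    }'
}
```

Usage: urlencode 'a b/c~' prints a%20b%2Fc~. Because -b makes gawk work on bytes, a multibyte UTF-8 character comes out as one %XX group per byte.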

Remove hexdump dependency

If you are willing to use the gawk -b argument it isn't hard to make a hexdumper. The following is something I cooked up that gives identical output to your hexdump -v -e '/1 "%01u "' script.

I recommend using https://www.gnu.org/software/gawk/manual/gawk.html#Extension-Sample-Readfile instead of the gross randomstring() stuff below.

#!/usr/bin/gawk -bf
# If you look up in the shebang the -b argument is what makes this work. It
# forces gawk to read the characters in as a stream of bytes rather than
# encoded characters.
#
# I know that makes no sense, but the docs describe it as:
#
# > an easy way to tell gawk, "Hands off my data!"
#
# and that turns out to be just what we need.

function randomstring() {
  output = ""
  for (i=0; i < 16; i++) {
    output = output sprintf("%04x", int(rand() * 65536))
  }
  return output
}

BEGIN {
  srand()
  # By setting RS to a big random string we get the file as a single record
  # without using a gnu extension or the ugly concat loop, since the string is
  # very unlikely to appear in any file ever. You could just hardcode a uuid if
  # you don't mind it not being future proof.
  RS = randomstring()
  FPAT = "."

  # We just build an encoding table rather than try to compute this somehow
  for (i=0; i <= 255; i++) {
    c = sprintf("%c", i)
    codes[c] = i
  }
}

{
  for (i=1; i <= NF; i++) {
    printf("%d ", codes[$i])
  }
}
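The readfile extension recommended above could replace the randomstring() record separator. The following is only a sketch under a few assumptions: gawk is built with dynamic extension support, the bundled readfile extension is installed, and the file to dump is passed as the first argument. The wrapper name dump_bytes is made up for illustration.

```sh
# dump_bytes FILE: hypothetical wrapper, prints each byte of FILE as a decimal number
dump_bytes() {
  gawk -b -l readfile '
    BEGIN {
      # Same lookup table as in the script above: byte -> decimal value
      for (i = 0; i <= 255; i++) codes[sprintf("%c", i)] = i

      # readfile() returns the whole file as one string;
      # on error it returns "" and sets ERRNO
      data = readfile(ARGV[1])
      if (data == "" && ERRNO != "") {
        print "readfile: " ERRNO > "/dev/stderr"
        exit 1
      }

      n = length(data)
      for (i = 1; i <= n; i++) printf "%d ", codes[substr(data, i, 1)]
    }' "$1"
}
```

Since everything happens in BEGIN, gawk never splits the input into records or fields, so neither RS nor FPAT has to be set.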

run github actions on windows

We're currently running the tests on macos-latest and ubuntu-latest. Can we make it work on windows-latest as well? Given that both javac and gawk are installed there by default, this should not be too difficult :)
