command-line-text-processing's Introduction

Command Line Text Processing

Learn about various commands available for common and exotic text processing needs. Examples have been tested on GNU/Linux. Syntax and features may vary on other distributions and platforms, so consult their respective man pages for details.


⚠️ ⚠️ I'm no longer actively working on this repo. Instead, I've converted the existing chapters into ebooks (see the ebook section below for links), available under the same license. These ebooks are better formatted, updated for newer versions of the software, and include exercises, solutions, etc. Since all the chapters have been converted, I'm archiving this repo.



Ebooks

Individual online ebooks with better formatting, explanations, exercises, solutions, etc:

See https://learnbyexample.github.io/books/ for links to pdf/epub versions and other ebooks.


Chapters

As mentioned earlier, I'm no longer actively working on these chapters:


Webinar recordings

I recorded a couple of videos based on content in the chapters. I'm not sure if I'll do more:


Exercises

Check out the exercises directory to solve practice questions on grep, right from the command line itself.


Contributing

  • Please open an issue for typos or bugs
    • As this repo is no longer actively worked upon, please do not submit pull requests
  • Share the repo with friends/colleagues, on social media, etc to help reach other learners
  • In case you need to reach me, mail me at echo '[email protected]' | tr 'a-z' 'n-za-m' or send a DM via twitter

Acknowledgements


License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

command-line-text-processing's People

Contributors

learnbyexample

command-line-text-processing's Issues

GitBook download link failure

I don't have a GitBook account, and it appears I'm unable to create one on the "legacy" version of the site when clicking on this link:

Download ebook for offline reading - link

401
Unauthorized
New accounts should be created on the new version of GitBook. Head to https://www.gitbook.com

Edit: I was able to download the ebook, but I wasn't able to preview it or log in on the legacy URL.

Complex character class description as code

In gnu_grep.md (around line ) and gnu_sed.md (around line 1402) there is a table of equivalent word and space character classes. Unfortunately, the bracket expressions are valid-ish markdown, so they do not get rendered correctly. They should be escaped as code in both places.

This:

| Character classes | Description |
| ------------- | ----------- |
| \w | Same as [0-9a-zA-Z_] or [[:alnum:]_] |
| \W | Same as [^0-9a-zA-Z_] or [^[:alnum:]_] |
| \s | Same as [[:space:]] |
| \S | Same as [^[:space:]] |

Should be this:

| Character classes | Description |
| ------------- | ----------- |
| `\w` | Same as `[0-9a-zA-Z_]` or `[[:alnum:]_]` |
| `\W` | Same as `[^0-9a-zA-Z_]` or `[^[:alnum:]_]` |
| `\s` | Same as `[[:space:]]` |
| `\S` | Same as `[^[:space:]]` |

The ticks around the slash classes aren't necessary for formatting, but I think are good style (elsewhere in the document these are generally formatted as code).

For consistency and clarity, it might also make sense to apply this pattern to the table a couple paragraphs above this. I don't see any markdown issues there, but I think it would generally be clearer to list the actual regex patterns as code.
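
The equivalence listed in the table can be checked from the command line. This is only a minimal sketch, assuming GNU grep (both `\w` and `\+` are GNU extensions) and a made-up sample line that does not come from gnu_grep.md:

```shell
# Hypothetical sample line, not taken from the chapter.
sample='foo_bar 42-baz'

# -o prints each match on its own line.
with_backslash=$(printf '%s\n' "$sample" | grep -o '\w\+')
with_bracket=$(printf '%s\n' "$sample" | grep -o '[[:alnum:]_]\+')

printf '%s\n' "$with_backslash"

# Both forms should extract the same words.
[ "$with_backslash" = "$with_bracket" ] && echo 'same matches'
```

Both commands should pick out `foo_bar`, `42` and `baz`, since the space and the hyphen are not word characters.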

Error in output

In section Filtering, subsection Fixed string matching, the command $ awk 'index($0,"a+b")' eqns.txt should return all lines of the input file, since a+b appears in each line, right?
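
To make the question concrete, here is a small sketch of how `index` differs from a regex match. The input lines are hypothetical and are not the contents of eqns.txt, so this says nothing about what the chapter's own output should be:

```shell
# Hypothetical input; the real eqns.txt is not reproduced here.
input=$(printf '%s\n' 'a+b=c' 'aab' 'x=a+b')

# index() does a literal substring search, so + is not a metacharacter.
literal=$(printf '%s\n' "$input" | awk 'index($0, "a+b")')

# As a regex, a+b means "one or more a immediately followed by b".
regex=$(printf '%s\n' "$input" | awk '/a+b/')

printf 'literal:\n%s\n' "$literal"
printf 'regex:\n%s\n' "$regex"
```

With this input, the literal search keeps the two lines containing the text `a+b`, while the regex keeps only `aab`.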

XML/HTML parsing

What did you have in mind for this? I could maybe throw something together for the libxml2/libxslt tools (xmllint, xsltproc) and/or xmlgawk (gawk with a SAX-like XML parser). xmllint also has an --html option to parse XHTML (maybe regular HTML as well, but I haven't tried).

I've also written awk scripts to convert various patterned text to XML, if that sounds useful.

Do more with less

First, I'm sorry for submitting a PR (#11) without reading the README more closely. Just wanted to contribute what I hope will be some useful knowledge to the project.

IMHO, less is a largely underutilized tool that can greatly improve upon, if not replace, the functionality of several other commands--namely cat & tail--as you'll see in my PR. Also less is a tool of convenience because how many times have you opened a file with it, then exited less to open it in your editor? Yet you can jump straight into your editor from within less! Lastly, you can view multiple files with less in a manner--and keystrokes--akin to changing buffers in vim.

tr command

Decoding your rot13 email address at the bottom of page 2 does not work on Solaris unless the GNU utilities are installed. The string1 and string2 arguments of tr on Solaris are treated literally, so ranges such as a-z are not expanded.

Here is a rot13 encode/decode alias that works with all versions of tr and handles both upper and lower case.
The alias may appear wrapped in this text box, but it belongs on a single line.
alias rot13="tr 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' 'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'"

Now you can simply pipe text to rot13.
echo '[email protected]' | rot13
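
A quick way to sanity-check the explicit-alphabet approach is to confirm that applying it twice returns the original text. The sketch below uses a shell function rather than an alias (an assumption made here because bash does not expand aliases in non-interactive scripts by default) and a made-up test string:

```shell
# Same character lists as the alias above, wrapped in a function.
rot13() {
    tr 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
       'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
}

# Hypothetical test string; punctuation and spaces pass through unchanged.
encoded=$(printf '%s\n' 'Hello, World' | rot13)
decoded=$(printf '%s\n' "$encoded" | rot13)

printf '%s\n%s\n' "$encoded" "$decoded"
```

Because rot13 is its own inverse, the second line printed should be the original string.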

ex13_regex_character_class_part2 mismatch for question 1

Submitting my solution for question 1 of ex13_regex_character_class_part2 results in the below:

Mismatch for question 1:
Expected output is:
a[2]
foo_bar
appx_pi
greeting
food[4]
b[0][1]

But it is the exact same output I'm getting. I've even tried with the solution from .ref_solutions, without any luck.


Using GNU bash, version 4.4.23(1)-release (x86_64-pc-msys)

Add pretty-print commands

XML: xmllint --format
JSON: python -m json.tool "$file"

That kind of thing might be useful for this collection.
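
As a rough illustration of the JSON case, the sketch below pipes a one-line document through the standard library's `json.tool` module. The JSON text is made up, and it assumes a `python3` executable is on the PATH (the suggestion above uses `python`):

```shell
# Hypothetical one-line JSON document.
compact='{"name":"sample","tags":["a","b"]}'

# json.tool reads stdin when no file is given and re-indents the document.
pretty=$(printf '%s' "$compact" | python3 -m json.tool)

printf '%s\n' "$pretty"
```

Each key and array element should end up on its own indented line, which makes nested data much easier to scan.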

Missing cat command

In gnu_sed.md around line 182, there should be an additional cat command that results in the output being shown.

This:

# original file gets preserved in 'greeting.txt.bkp'
Hi there
Have a nice day

Should be this instead:

# original file gets preserved in 'greeting.txt.bkp'
$ cat greeting.txt.bkp
Hi there
Have a nice day
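
For context, the backup behaviour being described can be reproduced in a scratch directory. This is a sketch only: the file contents come from the output quoted above, but the substitution itself is an assumption and may differ from the command used in gnu_sed.md:

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Hi there' 'Have a nice day' > greeting.txt

# -i.bkp edits in place and saves the original as greeting.txt.bkp.
# The s/Hi/Hello/ substitution is only an illustrative choice.
sed -i.bkp 's/Hi/Hello/' greeting.txt

backup=$(cat greeting.txt.bkp)
edited=$(cat greeting.txt)

printf 'backup:\n%s\nedited:\n%s\n' "$backup" "$edited"
```

The backup file should still hold the original two lines, while greeting.txt holds the modified text.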

$NF vs NF in the first paragraph

$NF points to last field ### This is not true. It contains the last field "value". NF (without the $ sign) indirectly points to the last field, by holding the total Number of Fields found in $0 (the current record).

NF is built-in variable and can be used in expressions ### to reference the last, or number of, fields in the current record being examined/processed. More simply, NF is the total Number of Fields found in the current record.

$(NF-1) points to second last field and so on ### Again, this would be the "contents" of the second to last field in the current record. NF-1 (without the $ sign) represents the field Number of the second to last field.
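
The distinction between the field number and the field contents is easy to see on a single record. A minimal sketch with a made-up input line:

```shell
# Hypothetical record with four whitespace-separated fields.
record='apple banana cherry date'

# NF is the number of fields; $NF and $(NF-1) are the contents of
# the last and second-to-last fields.
fields=$(printf '%s\n' "$record" | awk '{print NF, $NF, $(NF-1)}')

printf '%s\n' "$fields"
```

This should print the count first, followed by the last and second-to-last field values.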

Add section for tee(1)

I think it would be a good idea to mention the tee command, probably somewhere in the "Cat, Less, Tail and Head" chapter
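
For anyone unfamiliar with it, here is a rough sketch of what such a section might demonstrate: tee writes its input to a file while also passing it along the pipeline. The input lines and the file name copy.txt are made up for illustration:

```shell
# Scratch directory for the saved copy.
workdir=$(mktemp -d)

# tee saves the stream to copy.txt and forwards it to wc unchanged.
count=$(printf '%s\n' 'one' 'two' 'three' | tee "$workdir/copy.txt" | wc -l)
saved=$(cat "$workdir/copy.txt")

printf 'lines counted: %s\n' "$((count))"
printf '%s\n' "$saved"
```

The line count comes from the downstream command, and the same three lines are available afterwards in the saved file.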
