Git Product home page Git Product logo

linux_csv_commands's Introduction

Linux_csv_commands

csv handle by Linux

Create csv

echo "column_a,column_b,column_c,column_d
1,foo,10,bar
2,baz,20,qux
3,quux,30,corge
4,grault,40,garply
" > output.csv

Visualizing

column --separator "," --table output.csv

Process

Command Pipeline

# sum the values in column_a and column_c, exclude column_b, and output the result.
# The pipeline reads the CSV file, processes it to sum `column_a` and `column_c`, excludes `column_b`, and adds a new header row to the output. This approach is efficient for processing large CSV files as it uses stream processing without creating intermediate files.
tail --lines=+2 output.csv | awk --field-separator ',' '{sum=$1+$3; print sum","$4}' OFS="," | sed '1i\sum_a_plus_c,column_d' 

Pretty

tail --lines=+2 output.csv | awk --field-separator ',' '{sum=$1+$3; print sum","$4}' OFS="," | sed '1i\sum_a_plus_c,column_d' | column --separator "," --table

Here's a breakdown of what each command does:

  • tail -n +2 data.csv: This command skips the header row of the CSV file. tail -n +2 starts reading from the second line, effectively removing the header.

  • awk -F, '{sum=$1+$3; print sum","$4}' OFS=,: This command uses awk to process each line of the CSV file.

    • -F, sets the field separator to a comma.
    • {sum=$1+$3; print sum","$4} calculates the sum of column_a ($1) and column_c ($3), and prints the result along with column_d ($4).
    • OFS=, sets the Output Field Separator to a comma, ensuring the output remains in CSV format.
  • sed '1i\sum_a_plus_c,column_d': This command uses sed to insert a new header row. The 1i indicates that the line should be inserted before the first line.

Compare

# e.g.
diff file1.csv file2.csv

# e.g.
diff <(sort -t, -k1,1 file1.csv) <(sort -t, -k1,1 file2.csv)
# or
comm -3 <(sort -t, -k1,1 file1.csv) <(sort -t, -k1,1 file2.csv)

Transform each file, Sort the results and Compare the sorted results

comm -3 <(tail --lines=+2 file1.csv | awk -F ',' '{sum=$1+$3; print sum","$4}' | sed '1i\sum_a_plus_c,column_d' | sort) \
<(tail --lines=+2 file2.csv | awk -F ',' '{sum=$1+$3; print sum","$4}' | sed '1i\sum_a_plus_c,column_d' | sort)
  1. tail --lines=+2 file1.csv: Skip the header row in file1.csv.
  2. awk -F ',' '{sum=$1+$3; print sum","$4}': Compute the sum of the first and third columns, then print the sum and the fourth column, separated by a comma.
  3. sed '1i\sum_a_plus_c,column_d': Add a new header row with the specified column names.
  4. sort: Sort the output to ensure it's in the correct order for comparison.
  5. Repeat the same transformation for file2.csv.
  6. comm -3: Compare the sorted results, showing lines unique to each file.

Insert Data on Postgres

CREATE TABLE information (
    id SERIAL PRIMARY KEY,
    sum_a_plus_c INT,
    column_d INT
);
cat data.csv | psql --dbname=postgres://postgres:123456@localhost:5432/test \
--command "COPY information(sum_a_plus_c, column_d) FROM STDIN WITH (FORMAT csv, DELIMITER ',', HEADER true);"

Export Data from Postgres

psql --dbname=postgres://postgres:123456@localhost:5432/test \
--command "select * from information where id >= 1;" --csv

or

psql --dbname=postgres://postgres:123456@localhost:5432/test \
--command "COPY (select * from information where id >= 1) TO STDOUT WITH CSV DELIMITER ','"

linux_csv_commands's People

Contributors

enghitalo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.