japananh / til Goto Github PK

Today I learned

til's Introduction

TIL - Today I learned

After extensively using Notion for tech notes, I've recently transitioned to a GitHub repo. This shift allows me to enjoy the process more, enrich my GitHub profile, and showcase my efforts, as GitHub tracks and visually represents all activities.

Check it out!

Test workflow!!!

til's Issues

Prevent API Security Risks

Prevent critical attacks in APIs

1. CORS

Example: https://github.com/japananh/zero-and-one/blob/main/middleware/cors.go

2. DoS (Denial of Service)

package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gin-contrib/limit"
	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()

	// Apply the limiting middleware.
	// Pick the number based on server capacity, expected traffic, response time,
	// and how resource-intensive the handlers are; validate it with load testing
	// and adjust it based on monitoring.
	// Note: MaxAllowed(10) most likely caps the number of requests in flight at
	// the same time (that is how gin-limit style middleware behaves), not the
	// number of requests per second.
	r.Use(limit.MaxAllowed(10))

	r.GET("/api/data", getData)

	if err := r.Run(":8080"); err != nil {
		log.Fatal(err)
	}
}

func getData(c *gin.Context) {
	// Simulate some work
	time.Sleep(100 * time.Millisecond)

	c.JSON(http.StatusOK, gin.H{"message": "Data retrieved successfully"})
}

3. SQL Injection

4. XSS (Cross-Site Scripting)

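To protect against XSS, escape user-generated content for the context in which it is rendered. Go's html/template package does this automatically when rendering pages; for manual escaping, use the helper that matches the output context.

package main

import (
	"html"
	"net/url"
)

// escapeForHTML makes input safe to place inside HTML text.
func escapeForHTML(input string) string {
	return html.EscapeString(input)
}

// escapeForQuery makes input safe to use as a URL query value.
// Don't chain the two helpers: HTML-escaping first and then URL-escaping
// double-encodes the value.
func escapeForQuery(input string) string {
	return url.QueryEscape(input)
}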

5. SSRF (Server-Side Request Forgery)

How to enable TLS/SSL for Postgres 15

Step 1: Create a postgres container with Docker

docker run -e POSTGRES_PASSWORD=postgres -d -p 5432:5432 --name pg postgres

Step 2: Edit postgres config

# Go to the created container
docker exec -it pg bash

# Now you're in the container; go to the data folder where the config lives
cd /var/lib/postgresql/data

# Install `vim` to edit the file (if needed); you are already root, so `sudo` is not required
apt-get update && apt-get install -y vim

# Edit the file
vim postgresql.conf

# Find these lines, uncomment them, and edit them as below
ssl = on
ssl_cert_file = 'cert.pem'
ssl_key_file = 'private.pem'

Note:

To search in vim: /<text-to-search>
To undo in vim: :u

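Step 3: Generate a self-signed certificate and an RSA key to use in the config file above

# Still inside the container, in /var/lib/postgresql/data:
# generate a self-signed certificate (cert.pem) and an RSA private key (private.pem)
openssl req -x509 -newkey rsa:4096 -nodes -keyout private.pem -out cert.pem

# Restrict the key's permissions; postgres refuses a key file that other users can read
chmod 600 private.pem

# Make postgres the owner so it can read the key
chown postgres private.pem

# Leave the container and restart it so the new settings take effect
exit
docker restart pg

Finally, you're good to go ^^.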

Golang vs Nodejs

Feature/Aspect	Go (Golang)	Node.js
Language	Statically-typed language	Dynamically-typed (JavaScript)
Concurrency	Goroutines (lightweight thread-like structures). Built-in concurrency model	Event-driven, non-blocking I/O model
Performance	Compiled ahead of time to machine code. Typically faster for CPU-bound tasks	JIT-compiled by the V8 engine. Well suited to I/O-bound tasks
Standard Library	Comprehensive standard library supporting many functionalities	Rich ecosystem with NPM for additional libraries/packages
Error Handling	Explicit error handling using error return values	Traditional throw/catch, along with callback-based errors
Use Cases	System programming, web servers, data processing, CLI tools	Web servers, real-time applications, scripting, tooling
Web Frameworks	Gin, Echo, Beego, etc.	Express.js, Koa, NestJS, etc.
Package Management	Modules (as of Go 1.11)	NPM (Node Package Manager)
Community and Ecosystem	Growing community, less mature than Node.js but very active	Massive community, very mature and vast ecosystem
Real-world Example	Docker	LinkedIn's backend mobile application

SQL save point

https://www.postgresql.org/docs/current/sql-savepoint.html

Go’s Concurrency Building Blocks

1. Goroutines

A goroutine is a function that is running concurrently (remember: not necessarily in parallel!) alongside other code. You can start one simply by placing the go keyword before a function:

func main() {
    go sayHello()
    // continue doing other things
}

func sayHello() {
    fmt.Println("hello")
}

Every Go program has at least one goroutine: the main goroutine, which is automatically created and started when the process begins.

Goroutines are not OS threads, and they’re not exactly green threads—threads that are managed by a language’s runtime—they’re a higher level of abstraction known as coroutines. Coroutines are simply concurrent subroutines (functions, closures, or methods in Go) that are non-preemptive—that is, they cannot be interrupted.

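Goroutines don’t define their own suspension or reentry points; Go’s runtime observes the runtime behavior of goroutines and automatically suspends them when they block and then resumes them when they become unblocked. (Since Go 1.14 the runtime can also preempt long-running goroutines asynchronously, so the "cannot be interrupted" description applies mainly to older releases.)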

Go’s mechanism for hosting goroutines is an implementation of what’s called an M:N scheduler, which means it maps M green threads to N OS threads. Goroutines are then scheduled onto the green threads. When we have more goroutines than green threads available, the scheduler handles the distribution of the goroutines across the available threads and ensures that when these goroutines become blocked, other goroutines can be run.

Go follows a model of concurrency called the fork-join model. The word fork refers to the fact that at any point in the program, it can split off a child branch of execution to be run concurrently with its parent. The word join refers to the fact that at some point in the future, these concurrent branches of execution will join back together. Where the child rejoins the parent is called a join point.
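
The fork and join points can be made explicit with sync.WaitGroup. A minimal sketch; `runForkJoin` is a hypothetical helper name, and each goroutine writes only to its own slice index so no extra locking is needed:

```go
package main

import (
	"fmt"
	"sync"
)

// runForkJoin forks n goroutines and joins them all before returning.
func runForkJoin(n int) []string {
	var wg sync.WaitGroup
	results := make([]string, n)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) { // fork: a child branch starts here
			defer wg.Done()
			results[id] = fmt.Sprintf("hello from goroutine %d", id)
		}(i)
	}

	wg.Wait() // join point: the parent blocks until every child is done
	return results
}

func main() {
	for _, line := range runForkJoin(3) {
		fmt.Println(line)
	}
}
```

Without the `wg.Wait()` join point, the function could return before any child goroutine has run.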

Tool vs Framework vs Library vs Service vs Platform

Tool: A software application with a specific purpose. GitHub provides specific tools for interaction, like the GitHub CLI or GitHub Desktop, but GitHub itself is broader than a single tool.
Framework: Provides a structure or set of conventions to build software applications.
Library: A collection of routines, functions, or classes that an application can use.
Service: A function one program or machine provides for other programs or machines.
Platform: Provides an environment for developing, running, or managing applications. GitHub fits this definition as it provides an environment for developers to collaborate, manage, and track their codebase changes.

Ory Kratos Demo

Ory provides solutions for authentication, authorization, and permission. It contains 3 main services:

Ory Kratos: Identity Management Server (social login, session, recovery, OTP, OIDC, ...)
Ory Hydra: OAuth 2.0 and OpenID Connect provider (Identity Server + APIs for third-party apps)
Ory Keto: access control server

Demo: To run Ory Kratos, you must run both FE and BE apps. Check README.md on each repo for more details.

Sqlite3 Locks

Problem

Today, I ran into an issue with Sqlite3 locks.

We used Sqlite3 for unit testing and PostgreSQL for running services. One of the main reasons for using Sqlite3 for unit testing is that it is a lightweight database management system that stores its data in a file. This makes it easy to run each unit test against a separate database without a dedicated server or complex setup.

But this simplicity has a downside. When I added unit tests (to increase test coverage) for a function that spawns several goroutines, the tests failed with the error "database table is locked".

The root cause was that my function spawned many goroutines that ran SELECT/UPDATE queries against the DB, and Sqlite3 doesn't handle concurrent requests well. In dev/staging/prod environment, we didn't see that issue due to

Solution

I tried the following solutions:

If the dialect is Sqlite3, set the number of goroutines to 1: Most effective! It solved my problem. But if the number of requests grows, we must replace Sqlite3 with another database.
Enable WAL: WAL reduces locking by writing data changes to a separate file called the WAL (write-ahead log) first and then moving them from the WAL into the database. I enabled it but saw no change.
Enable recursive triggers: I enabled it but saw no change.
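
The first solution can be sketched with a semaphore channel that caps how many goroutines touch the database at once. This is an illustration under assumptions, not the original project's code: `workerLimit`, `runTasks`, and the dialect string "sqlite3" are hypothetical, and the tasks stand in for real queries.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// workerLimit returns how many goroutines may use the database at once.
// The dialect name "sqlite3" is an assumption for this sketch.
func workerLimit(dialect string, want int) int {
	if dialect == "sqlite3" {
		return 1
	}
	return want
}

// runTasks runs every task with at most `limit` running concurrently and
// returns the highest number of tasks observed running at the same time.
func runTasks(limit int, tasks []func()) int32 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var running, peak int32

	for _, task := range tasks {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while `limit` tasks are running
		go func(task func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			// Record the peak concurrency for demonstration purposes.
			n := atomic.AddInt32(&running, 1)
			for {
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt32(&running, -1)
		}(task)
	}
	wg.Wait()
	return peak
}

func main() {
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() {} // stand-in for a SELECT/UPDATE
	}
	fmt.Println("sqlite3 peak concurrency:", runTasks(workerLimit("sqlite3", 8), tasks))
}
```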

References

Tricolor Algorithms in Go - TODO

https://www.developer.com/languages/tricolor-algorithm-golang/#:~:text=This%20Tricolor%20Marks%20and%20Sweep,objects%20in%20the%20white%20set
Mastering Go (last chapter)

Homomorphic encryption

💡 Future tech: Homomorphic encryption performs calculations and database queries directly on encrypted data, without decrypting it first, but it is currently very slow.

💯 Demo: https://github.com/IBM/fhe-toolkit-linux/blob/master/GettingStarted.md

🐛 Bug: FATAL: Aborting. The FHE Toolkit is not supported on the arm64 platform.

Best practice struct configuration pattern for Golang

Reference: https://www.youtube.com/watch?v=MDy7JQN5MN4

Before

package main

import "fmt"

type Server struct {
    maxConn int
    id      string
    tls     bool
}

// Problem: the parameter list grows every time we need another config option.
func NewServer(maxConn int, id string, tls bool) *Server {
    return &Server{
        maxConn: maxConn,
        id:      id,
        tls:     tls,
    }
}

func main() {
    s := NewServer(20, "id", false)

    fmt.Printf("%+v\n", s)
}

After

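package main

import "fmt"

// OptFunc mutates the options; NewServer applies each one in order.
type OptFunc func(*Opts)

type Opts struct {
    maxConn int
    id      string
    tls     bool
}

func defaultOpts() Opts {
    return Opts{
        maxConn: 10,
        id:      "default",
        tls:     false,
    }
}

// withTLS already has the OptFunc signature, so it is passed without being called.
func withTLS(opts *Opts) {
    opts.tls = true
}

// withMaxConn needs an argument, so it returns an OptFunc closure.
func withMaxConn(n int) OptFunc {
    return func(opts *Opts) {
        opts.maxConn = n
    }
}

type Server struct {
    Opts
}

func NewServer(opts ...OptFunc) *Server {
    // Start from the defaults, then let each option override them.
    o := defaultOpts()

    for _, fn := range opts {
        fn(&o)
    }

    return &Server{
        Opts: o,
    }
}

func main() {
    s := NewServer(withTLS, withMaxConn(20))

    fmt.Printf("%+v\n", s)
}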

Query top 10 biggest tables in database

Script to create millions of records: https://gist.github.com/japananh/4bd13ca2af0813af2246164cbe64b48d
Query top 10 biggest tables in database (largest first):

SELECT
    relname AS "tables",
    pg_size_pretty (
        pg_total_relation_size (X.oid)
    ) AS "size"
FROM
    pg_class X
LEFT JOIN pg_namespace Y ON (Y.oid = X.relnamespace)
WHERE
    nspname NOT IN (
        'pg_catalog',
        'information_schema'
    )
AND X.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
    pg_total_relation_size (X.oid) DESC
LIMIT 10;

Explain analyze in Postgres

CMS - Content management system - No code BE

Start everything with 1 container: api, database, webhook, cronjob, asset storage, ...

Directus: https://github.com/directus/directus

PocketBase: https://github.com/pocketbase/pocketbase

Consistent hashing

Consistent Hashing

Overview

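Consistent hashing is a technique used in distributed computing and data storage systems, often relevant in the context of backend development. It places both nodes and data items on the same hash ring, so adding or removing a node only remaps the items next to that node instead of reshuffling everything.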

Example

Imagine you have a distributed system with 4 nodes, and you want to distribute data items across these nodes using consistent hashing.

1. Node Identification

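Each node is placed on a ring of 360 positions by hashing it and taking the result modulo 360:

Node S0 is assigned a hash identifier of hash(S0) % 360 = 0.
Node S90 is assigned a hash identifier of hash(S90) % 360 = 90.
Node S180 is assigned a hash identifier of hash(S180) % 360 = 180.
Node S270 is assigned a hash identifier of hash(S270) % 360 = 270.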

2. Data Item Mapping

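Data items are hashed using the same hash function, producing numerical values between 0 and N, which are then reduced modulo 360 to a position on the ring. Each item is stored on the first node at or after its position, wrapping around to S0 after position 359. We'll consider a few data items and their hash values:

Data Item 1 hashes to 1000. -> 1000 % 360 = 280. 280 is between S270 and S0 -> Data Item 1 is mapped to the nearest node following it, which is Node S0 (hash 0)
Data Item 2 hashes to 1500. -> 1500 % 360 = 60 -> S90
Data Item 3 hashes to 2000. -> 2000 % 360 = 200 -> S270
Data Item 4 hashes to 3000. -> 3000 % 360 = 120 -> S180
Data Item 5 hashes to 4000. -> 4000 % 360 = 40 -> S90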

Now, the data items are consistently distributed across the nodes.

3. Add/Remove Node

If a new node, S50, with a hash identifier of 50, is added to the system, only Data Item 5 (hash 4000, position 40) needs to be remapped, because it is the only item whose position falls between S0 and S50; it will be assigned to Node S50. The rest of the data items remain with their current nodes. This minimizes data movement and keeps the distribution efficient.

Similarly, if a node goes offline, its data can be reassigned to the next available node on the ring, ensuring data availability and load balancing.
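
The example can be reproduced with a small sketch. It assumes the same 360-position ring and takes raw hash values as input rather than hashing real keys; `Ring`, `NewRing`, `Add`, and `Lookup` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Ring is a hypothetical consistent-hash ring with 360 positions.
type Ring struct {
	nodes []int // node positions, kept sorted
}

// NewRing builds a ring from the given node positions.
func NewRing(positions ...int) *Ring {
	r := &Ring{nodes: append([]int(nil), positions...)}
	sort.Ints(r.nodes)
	return r
}

// Add inserts a node position and keeps the positions sorted.
func (r *Ring) Add(pos int) {
	r.nodes = append(r.nodes, pos)
	sort.Ints(r.nodes)
}

// Lookup maps a raw hash value to the first node at or after hash % 360,
// wrapping around to the first node when it runs past the last one.
func (r *Ring) Lookup(hash int) int {
	pos := hash % 360
	i := sort.SearchInts(r.nodes, pos)
	if i == len(r.nodes) {
		i = 0
	}
	return r.nodes[i]
}

func main() {
	ring := NewRing(0, 90, 180, 270)
	items := []int{1000, 1500, 2000, 3000, 4000}

	for _, h := range items {
		fmt.Printf("hash %d -> position %d -> S%d\n", h, h%360, ring.Lookup(h))
	}

	ring.Add(50) // only items with positions in (0, 50] move
	for _, h := range items {
		fmt.Printf("after adding S50: hash %d -> S%d\n", h, ring.Lookup(h))
	}
}
```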

Auto Assign Github action

Auto Assign GitHub reviewers and assignees

My company didn't have any workflow that auto-adds reviewers and assignees, so I decided to do it myself.
Ref: https://github.com/kentaro-m/auto-assign-action

AES - GCM in Golang

https://github.com/japananh/cryptography-intro/blob/main/aes.go

Back-of-the-envelope estimation

https://bytebytego.com/courses/system-design-interview/back-of-the-envelope-estimation

According to Jeff Dean, Google Senior Fellow, “back-of-the-envelope calculations are estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements” [1]

[1] J. Dean. Google Pro Tip: Use Back-Of-The-Envelope-Calculations To Choose The Best Design: http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html

The Difference Between Concurrency and Parallelism

Concurrency is a property of the code; parallelism is a property of the running program. - Katherine Cox-Buday, Concurrency in Go

The chunks of our program may appear to be running in parallel, but really they’re executing in a sequential manner faster than is distinguishable. The CPU context switches to share time between different programs, and over a coarse enough granularity of time, the tasks appear to be running in parallel. If we were to run the same binary on a machine with two cores, the program’s chunks might actually be running in parallel.

This reveals a few interesting and important things. The first is that we do not write parallel code, only concurrent code that we hope will be run in parallel. Once again, parallelism is a property of the runtime of our program, not the code.

The second interesting thing is that we see it is possible—maybe even desirable—to be ignorant of whether our concurrent code is actually running in parallel. This is only made possible by the layers of abstraction that lie beneath our program’s model: the concurrency primitives, the program’s runtime, the operating system, the platform the operating system runs on (in the case of hypervisors, containers, and virtual machines), and ultimately the CPUs. These abstractions are what allow us to make the distinction between concurrency and parallelism, and ultimately what give us the power and flexibility to express ourselves. We’ll come back to this.

The third and final interesting thing is that parallelism is a function of time, or context. Remember in “Atomicity” where we discussed the concept of context? There, context was defined as the bounds by which an operation was considered atomic. Here, it’s defined as the bounds by which two or more operations could be considered parallel.

For example, if our context was a space of five seconds, and we ran two operations that each took a second to run, we would consider the operations to have run in parallel. If our context was one second, we would consider the operations to have run sequentially.

BigO

Time complexity

Space complexity

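Space complexity is a measure of the amount of memory or auxiliary space required by an algorithm as a function of the input size. It helps to analyze how efficiently an algorithm uses memory.
Use cases: Space complexity matters most when memory is the constrained resource, for example with very large inputs or on memory-limited machines.
Example: quicksort
Time complexity: O(n log n) on average, O(n^2) in the worst case
Space complexity: O(log n) on average for an in-place quicksort (the recursion stack). The demo below allocates new less/greater slices at every level, so it uses O(n) extra space.
Demo: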

package main

import "fmt"

func quickSort(arr []int) {
	if len(arr) <= 1 {
		return
	}

	pivot := arr[0]
	less := make([]int, 0)
	greater := make([]int, 0)

	for _, num := range arr[1:] {
		if num <= pivot {
			less = append(less, num)
		} else {
			greater = append(greater, num)
		}
	}

	quickSort(less)
	quickSort(greater)

	copy(arr, less)
	arr[len(less)] = pivot
	copy(arr[len(less)+1:], greater)
}

func main() {
	arr := []int{3, 6, 8, 10, 1, 2, 1}
	quickSort(arr)
	fmt.Println(arr)
}

Channels in Go

1. How were channels invented?

The communication in the Go channel is inspired by CSP and guarded command.

CSP stands for “Communicating Sequential Processes,” which is both a technique and the name of the paper that introduced it. In this paper, Hoare suggests that input and output are two overlooked primitives of programming—particularly in concurrent code. CSP was only a simple programming language constructed solely to demonstrate the power of communicating sequential processes.

A guarded command, which Edsger Dijkstra had introduced in a previous paper written in 1974, “Guarded commands, nondeterminacy and formal derivation of programs”, is simply a statement with a left and righthand side, split by a →. The lefthand side served as a conditional, or guard for the righthand side in that if the lefthand side was false or, in the case of a command, returned false or had exited, the righthand side would never be executed.

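// Channels can be restricted to a single direction.
writeCh := make(chan<- interface{}) // send-only
readCh := make(<-chan interface{})  // receive-only

<-writeCh            // receiving from a send-only channel
readCh <- struct{}{} // sending to a receive-only channel

Both statements fail to compile: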

invalid operation: <-writeCh (receive from send-only type
    chan<- interface {})
invalid operation: readCh <- struct {} literal (send to receive-only
    type <-chan interface {})
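
In practice, directional channel types usually appear in function signatures: a bidirectional channel is created once and converts implicitly when passed to each side. A small sketch with hypothetical `produce` and `sum` functions:

```go
package main

import "fmt"

// produce only sends, so it accepts a send-only channel.
func produce(out chan<- int, n int) {
	for i := 1; i <= n; i++ {
		out <- i
	}
	close(out) // closing is allowed on the sending side
}

// sum only receives, so it accepts a receive-only channel.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	ch := make(chan int) // bidirectional; converted implicitly at each call
	go produce(ch, 5)
	fmt.Println(sum(ch)) // 15
}
```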

2. How are channels created in Go?

When the Go compiler encounters the statement ch := make(chan int), it leads to the creation of a channel that is capable of transmitting integers. The process involves several steps under the hood, both at compile time and at runtime, to set up and initialize this channel for use in your Go program. Here's a simplified view of what happens:

2.1. Compile Time

Type Checking: The compiler verifies that the make function is called with a valid channel type, in this case, chan int. This ensures type safety, meaning the channel will only accept integers.
Code Generation: The compiler generates the necessary instructions to allocate and initialize a channel at runtime. This includes setting up any internal data structures required for the channel's operation.

2.2. Runtime

When the compiled code reaches the make(chan int) statement during execution, the Go runtime performs the following steps:

Channel Allocation: The runtime allocates memory for the channel. This memory includes not just the channel itself but also the internal data structures needed to manage the channel's state and the messages it will pass.
Initialization: The runtime initializes the channel's internal data structures. These structures include:
- A queue for storing sent values (for buffered channels, this queue has a capacity; for unbuffered channels, the capacity is effectively zero).
- Synchronization primitives to manage access to the channel, ensuring that send and receive operations are safe to use across multiple goroutines.
- Status flags or similar mechanisms to track whether the channel is open or closed.
Setting Zero Capacity: For an unbuffered channel like ch := make(chan int), the channel is set up with zero capacity. This means that send operations will block until another goroutine is ready to receive the value, facilitating direct handoff and synchronization between goroutines.
Returning a Reference: The runtime returns a reference to the newly created channel, which is assigned to the variable ch in your Go program. This reference is what you use to send and receive values through the channel.

2.3. Internal Data Structures

Although the exact implementation details can vary and may evolve over time, the Go runtime represents each channel as an internal struct (hchan in the runtime package) that holds, among other fields:

Send and Receive Queues: To manage goroutines that are waiting to send to or receive from the channel.
Locks or Atomic Operations: To ensure that concurrent access to the channel by multiple goroutines is safe and does not lead to race conditions.
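
Some of this internal state is observable through the built-ins `cap` and `len` and through the second value of a receive. A short sketch; `describe` is a hypothetical helper:

```go
package main

import "fmt"

// describe reports a channel's capacity and the number of queued values.
func describe(ch chan int) (capacity, length int) {
	return cap(ch), len(ch)
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 3)
	buffered <- 1
	buffered <- 2

	c, l := describe(unbuffered)
	fmt.Println("unbuffered:", c, l) // capacity 0, nothing queued
	c, l = describe(buffered)
	fmt.Println("buffered:", c, l) // capacity 3, two values queued

	// Closing flips the channel's closed state; queued values can still be
	// received, after which receives yield the zero value and ok == false.
	close(buffered)
	for i := 0; i < 3; i++ {
		v, ok := <-buffered
		fmt.Println(v, ok)
	}
}
```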

3. When to use channels in Go?

4. Go’s Philosophy on Concurrency

Share memory by communicating; don’t communicate by sharing memory.

This phrase, "Share memory by communicating; don’t communicate by sharing memory", encapsulates a fundamental principle of concurrent programming in Go. It contrasts two approaches to concurrency:

4.1. Communicate by Sharing Memory

This traditional approach involves multiple threads accessing and modifying shared data structures. Synchronization primitives such as mutexes, semaphores, or locks are typically used to prevent race conditions and ensure data consistency. While effective in certain contexts, this model can be error-prone and difficult to reason about, especially as the complexity of the concurrency increases. The challenges include deadlocks, race conditions, and the cognitive load of tracking which parts of the code are accessing shared resources.

For example, a counter shared between goroutines would typically be protected with a mutex to prevent simultaneous updates.

var (
    counter int
    mutex   sync.Mutex
)

func Increment() {
    mutex.Lock()
    counter++
    mutex.Unlock()
}

4.2. Share Memory by Communicating

Go advocates for a different model of concurrency where goroutines communicate with each other through channels to pass data. In this model, instead of multiple goroutines accessing shared data, the data is sent from one goroutine to another. This passing of data ensures that only one goroutine has access to the particular piece of data at any time. By using channels as the primary means of synchronization and communication, the need for explicit locks is reduced, and the program becomes easier to understand and maintain.

In this model, CounterManager runs in its own goroutine, listening for increment requests. Other goroutines send an increment request through the channel. This design ensures that only one goroutine updates the counter at a time, based on messages received, thus "sharing memory by communicating."

var (
    counter int
    ch      = make(chan bool)
)

func CounterManager() {
    for range ch {
        counter++
    }
}

func Increment() {
    ch <- true
}
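
The snippet above never starts the manager or reads the result. A runnable sketch of the same idea follows, under a few assumptions: the counter lives inside the manager goroutine instead of a package-level variable, and a second channel hands back the final value. `runCounter` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// runCounter starts a manager goroutine that owns the counter, sends it n
// increment requests from n goroutines, and returns the final value.
func runCounter(n int) int {
	requests := make(chan struct{})
	result := make(chan int)

	// The manager is the only goroutine that ever touches counter.
	go func() {
		counter := 0
		for range requests {
			counter++
		}
		result <- counter // requests was closed: report the total
	}()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			requests <- struct{}{} // ask the manager to increment
		}()
	}

	wg.Wait()       // every request has been received by the manager
	close(requests) // tell the manager there is nothing more to do
	return <-result
}

func main() {
	fmt.Println(runCounter(100)) // 100
}
```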

Go’s philosophy on concurrency can be summed up like this: aim for simplicity, use channels when possible, and treat goroutines like a free resource.

Terminology of Indexing in SQL database

Indexing in SQL database

Types of indexes

Clustered index: The table's rows are physically stored in index order, so a table can have at most one clustered index. In MySQL (InnoDB), every table has a clustered index, normally the primary key.
Non-clustered index: The index is stored in a separate structure whose entries point to the rows of the table on disk. All indexes in Postgres are non-clustered indexes.

Query Plan

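Sequential Scan: Scans the whole table row by row and returns the rows that match.
Index Only Scan: Answers the query from the index alone because the index contains every column the query needs; there is no need to read the table (as long as the visibility map marks the pages as all-visible).
Index Scan: Looks up matching entries in the index, then fetches the corresponding rows from the table.
Bitmap Index Scan: Builds a bitmap from the index that marks which rows (or pages) of the table match (1) or don't (0); the marked pages are then read from the table in one pass.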

B Tree vs B+ Tree

Buffered vs Unbuffered channel in Golang

Feature	Unbuffered Channel	Buffered Channel
Definition	No storage, transfers data directly between goroutines.	Contains a buffer, allowing storage of multiple values.
Capacity	0 (zero)	>= 1 (defined at creation)
Creation	`ch := make(chan int)`	`ch := make(chan int, bufferSize)`
Behavior (Send operation)	Will block until the sent value is received by another goroutine.	Will send immediately if the buffer has space, or else it will block.
Behavior (Receive operation)	Will block until a value is sent by another goroutine.	Will receive immediately if the buffer has values, or else it will block.
Use Case	- Real-time processes where immediate handling is crucial. - Ensuring step-by-step synchronization between goroutines.	- Situations where senders might momentarily produce data faster than receivers can handle. - When you want some "elasticity" between producers and consumers of data.
Synchronization	Synchronization is direct. When one goroutine sends a value on the channel, it blocks until another goroutine receives that value. This ensures direct hand-off between the sender and the receiver, which can be thought of as a form of strict synchronization.	Depends on buffer's state: For send operation: - If the buffer is not full, a goroutine can send a value to the channel without blocking. The value goes into the buffer. - If the buffer is full, the sending goroutine will block until there is space in the buffer (i.e. until some other goroutine receives a value from the channel). For receive operation: - If the buffer is not empty, a goroutine can receive a value from the channel without blocking. - If the buffer is empty, the receiving goroutine will block until there is a value in the buffer (i.e., some other goroutine sends a value to the channel).
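
The blocking rules in the table can be observed without actually blocking by using `select` with a `default` case. A minimal sketch; `trySend` is a hypothetical helper:

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether it succeeded.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // a plain send would have blocked here
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 2)

	fmt.Println(trySend(unbuffered, 1)) // false: no receiver is waiting
	fmt.Println(trySend(buffered, 1))   // true: buffer has space
	fmt.Println(trySend(buffered, 2))   // true: buffer has space
	fmt.Println(trySend(buffered, 3))   // false: buffer is full

	<-buffered                        // receiving frees one slot
	fmt.Println(trySend(buffered, 3)) // true again
}
```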

Visibility Map in Postgres

The Visibility Map (VM) is a bitmap associated with each table in PostgreSQL that stores two bits per table data page. The first (all-visible) bit tracks which pages contain only tuples (rows) visible to all active transactions; the second (all-frozen) bit tracks pages whose tuples are all frozen.

For the all-visible bit:

A set bit means that all tuples on the page are visible to all transactions.
An unset bit means the page might contain some tuples that are not visible to all transactions.

Ref: https://www.postgresql.org/docs/current/storage-vm.html

WAL (Write ahead log)

How does WAL work?

WAL (Write ahead log or Redo log) is a method used to ensure that changes to a database are recorded before the actual data is updated. It's primarily used in database management systems (DBMS) like PostgreSQL and SQLite.
How it works: When a change is made to the data, it's first written to a log (the WAL), and then the actual data is updated. This ensures that changes are durable and can be replayed if there's a crash or failure.
Configure how much WAL is kept between checkpoints in the configuration file (postgresql.conf):
- max_wal_size: default 1 GB
- min_wal_size: default 80 MB
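
The "log first, then apply" ordering can be illustrated with a toy in-memory store. This is only a sketch of the idea, not how PostgreSQL or SQLite implement it: a real WAL is written to durable storage and synced before the change is acknowledged. `Store`, `Set`, and `Recover` are hypothetical names.

```go
package main

import "fmt"

// entry is one logged change: set Key to Value.
type entry struct {
	Key, Value string
}

// Store is a toy key-value store with an in-memory write-ahead log.
type Store struct {
	wal  []entry
	data map[string]string
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{data: map[string]string{}}
}

// Set records the change in the log first and only then applies it.
func (s *Store) Set(key, value string) {
	s.wal = append(s.wal, entry{key, value}) // 1. write ahead
	s.data[key] = value                      // 2. update the actual data
}

// Recover rebuilds the data by replaying a surviving log in order.
func Recover(wal []entry) *Store {
	s := NewStore()
	for _, e := range wal {
		s.Set(e.Key, e.Value)
	}
	return s
}

func main() {
	s := NewStore()
	s.Set("a", "1")
	s.Set("b", "2")
	s.Set("a", "3")

	// Simulate a crash: the data map is lost, only the log survives.
	recovered := Recover(s.wal)
	fmt.Println(recovered.data["a"], recovered.data["b"]) // 3 2
}
```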

Best practices

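A checkpoint flushes the changes recorded in the WAL to the data files on disk. It is triggered when the WAL written since the last checkpoint approaches max_wal_size, or when checkpoint_timeout elapses. Keeping max_wal_size small makes checkpoints more frequent, which shortens crash recovery but increases disk I/O, so tune it against your write load.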

References

https://www.postgresql.org/docs/current/runtime-config-wal.html
