Git Product home page Git Product logo

andreasjhkarlsson.github.io's People

Contributors

andreasjhkarlsson avatar

Stargazers

Felix Kaspar avatar LowLevelCodingCH avatar Phoenixthrush UwU avatar  avatar Braden Farmer avatar  avatar Zoont avatar David Bischof avatar Till Vogelsang avatar Martin Kapal avatar Bohdan Mart avatar  avatar zgurea avatar Joshua Chung avatar  avatar Leah Casey avatar Green Lightning avatar

Watchers

Bohdan Mart avatar  avatar

Forkers

mr-pixelzen

andreasjhkarlsson.github.io's Issues

Blog article "4 billion if statements": Some improvements

I just read your article 4 billion if statements and I have to say I'm impressed by how blazingly efficient your program is - just 10 seconds per number, wow! However, I did see some room for improvement:

  • Your program only works for 32-bit numbers, but what about 64-bit numbers? Of course, instead of 4 billion comparsions, this would need 18 quintillion of them. This would make the file with the instructions slightly bigger: according to my calculations, it would be 172 exabyte big. Another problem that follows from that is that the file wouldn't fit into the 64-bit address space of x64 anymore, which is only a measly 18 exabyte in size. I'm sure both problems can be trivially solved by just bying a larger disk to store the file as well as a CPU with a larger address space.
  • To find out which number was passed to your program, you search through all 32-bit integers (so numbers in range 0-4294967296) sequentially. While that does match the algorithm from the original TikTok post, it's actually not the most efficient. Instead, you could recursively split the search space in half: first check whether the given number is bigger or less than the middle of the range 0-4294967296, which is 2147483648. If it was less, recursively do the same for the range 0-2147483648, and if it was bigger, for the range 2147483648-4294967296. That still matches the TikTok algorithm closely - the only difference is how the if's are arranged, and that each condition isn't ==, but >= (or similar). Even the number of total comparisons stays the same.
  • You generate the entire 40 GB of comparisons at "compile-time" and store them on disk. However, for users who want to have a more lightweight solution, it might be advantageous to generate the comparisons at runtime, and only on demand. Of course, this comes with a performance penalty for users who have a very fast disk because of the overhead of generating the instructions instead of just reading them from a file.
  • Your program is built for Windows. Instead, I suggest using Linux, which objectively is the better operating system.

I included all of these optimizations and came up with this:

eighteen_quintillion_if_statements.c
#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

// Idea stolen from https://andreasjhkarlsson.github.io/jekyll/update/2023/12/27/4-billion-if-statements.html

#define die(msg) do { puts(msg); exit(EXIT_FAILURE); } while(0)

size_t page_size;

char *create_lazy_code_page(size_t num, size_t bit);

void fill_code_page(char *page, size_t num, size_t bit) {
	fprintf(stderr, "\tFilling  code page for num 0x%016lx, bit 0x%016lx at %p\n", num, bit, page);

	size_t off = 0;
	#define emit(b) (page[off++] = (b))

	// should never happen - we should stop at bit 1, not bit 0.
	if(bit == 0)
		die("bit 0");

	// The code we write will overwrite the jump to the end of the page, where the call to fill_code_page is.
	// This means that future invocations will immediately start with the code we emit,
	// without wasting any cycles on checking whether the page has already been filled or not!

	emit(0x48); // mov rax, <num with bit set>
	emit(0xb8);
	for(char i = 0; i < 8; i ++)
		emit((char) ((num | bit) >> (8*i)));

	emit(0x48); // cmp rdi, rax
	emit(0x39);
	emit(0xc7);

	if(bit == 1) {
		emit(0xb8); // mov eax, 1
		emit(0x01);
		emit(0x00);
		emit(0x00);
		emit(0x00);

		// Can't use xor esi, esi here: that would clobber the flags
		emit(0xbe); // mov esi, 0
		emit(0x00);
		emit(0x00);
		emit(0x00);
		emit(0x00);

		// Now move 0 into rax if argument was above or equal to (num|bit), which means bit was set, which means number is odd.
		emit(0x48); // cmovae rax, rsi
		emit(0x0f);
		emit(0x43);
		emit(0xc6);

		emit(0xc3); // ret
	}
	else {
		char *handler_bit0 = create_lazy_code_page(num      , bit >> 1);
		char *handler_bit1 = create_lazy_code_page(num | bit, bit >> 1);

		emit(0x48); // mov rax, <handler for bit=0>
		emit(0xb8);
		for(char i = 0; i < 8; i ++)
			emit((char) (((size_t)handler_bit0) >> (8*i)));

		emit(0x48); // mov rsi, <handler for bit=1>
		emit(0xbe);
		for(char i = 0; i < 8; i ++)
			emit((char) (((size_t)handler_bit1) >> (8*i)));

		// Now move bit1 handler into rax if argument was above (num|bit) - which means bit was set
		emit(0x48); // cmovae rax, rsi
		emit(0x0f);
		emit(0x43);
		emit(0xc6);

		// jump to the correct handler
		emit(0xff); // jmp rax
		emit(0xe0);
	}
}

char *create_lazy_code_page(size_t num, size_t bit) {
	char *page = mmap(0, page_size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
	if(page == (char*) -1)
		die("mmap");
	fprintf(stderr, "\tNew lazy code page for num 0x%016lx, bit 0x%016lx at %p\n", num, bit, page);

	size_t off = 0;
	#define emit(b) (page[off++] = (b))

	size_t fill_call_length = 10+10+10+10+1+2+1+5;

	// Place the call to fill_code_page at the end of the page
	// so that fill_code_page doesn't overwrite it.
	// So, first, jump to fill_call_length bytes before the end of the page -
	// that's exactly how long the call (and the jmp back to the start) is.
	emit(0xe9); // jmp <fill_call_length bytes before end of page>
	for(char i = 0; i < 4; i ++)
		// additional -5 because that's how long this jmp itself is
		emit((char) ((page_size - fill_call_length - 5) >> (8*i)));

	off = page_size - fill_call_length;

	// save rdi - this also aligns the stack correctly
	emit(0x57); // push rdi

	emit(0x48); // mov rdi, <page>
	emit(0xbf);
	for(char i = 0; i < 8; i ++)
		emit((char) (((size_t)page) >> (8*i)));

	emit(0x48); // mov rsi, <num>
	emit(0xbe);
	for(char i = 0; i < 8; i ++)
		emit((char) (num >> (8*i)));

	emit(0x48); // mov rdx, <bit>
	emit(0xba);
	for(char i = 0; i < 8; i ++)
		emit((char) (bit >> (8*i)));

	emit(0x48); // mov rax, fill_code_page
	emit(0xb8);
	for(char i = 0; i < 8; i ++)
		emit((char) (((size_t) &fill_code_page) >> (8*i)));

	emit(0xff); // call rax
	emit(0xd0);

	emit(0x5f); // pop rdi

	emit(0xe9); // jmp <to start of page>
	for(char i = 0; i < 4; i ++)
		emit((char) ((-page_size) >> (8*i)));

	return page;
}

int main(int argc, char **argv) {
	page_size = sysconf(_SC_PAGE_SIZE);
	if(page_size == -1)
		die("sysconf");

	char(*core_code_entry)(size_t) = (char(*)(size_t)) create_lazy_code_page(0, 1L << 63);

	if(argc <= 1)
		for(;;) {
			size_t input;
			scanf("%zu", &input);
			char is_even = core_code_entry(input);
			puts(is_even ? "even" : "odd");
		}
	else
		for(int i = 1; i < argc; i ++) {
			size_t input;
			sscanf(argv[i], "%zu", &input);
			char is_even = core_code_entry(input);
			printf("%zu: %s\n", input, is_even ? "even" : "odd");
		}
	return EXIT_SUCCESS;
}

This program lazily allocates one RWX page per comparison to be made. At first, each such page doesn't even contain the comparison itself, but instead only bootstrap code. Once the comparison is actually needed the first time, the bootstrap code will run and generate the real comparison. Depending on the outcome, the generated code will jump to one of two new lazily filled-out RWX pages. Each of those will initially only contain bootstrap code as well and will get filled out with the correct subsequent comparison the first time they are actually needed.

Note that this filling out actually overwrites the bootstrap code, which means not a single cycle is wasted on checking whether a given comparison has already been generated or not! To be able to observe this speed bonus, the program supports checking multiple numbers for evenness in a single run.

Example:

$ time ./eighteen_quintillion_if_statements \
		1234 12345 18446744073709551615 18446744073709551614 18446744073709551613 18446744073709551612
	New lazy code page for num 0x0000000000000000, bit 0x8000000000000000 at 0x7f00268d7000
	Filling  code page for num 0x0000000000000000, bit 0x8000000000000000 at 0x7f00268d7000
	New lazy code page for num 0x0000000000000000, bit 0x4000000000000000 at 0x7f00268d6000
	New lazy code page for num 0x8000000000000000, bit 0x4000000000000000 at 0x7f00268d5000
	Filling  code page for num 0x0000000000000000, bit 0x4000000000000000 at 0x7f00268d6000
	[... snip: 182 lines ...]
	Filling  code page for num 0x00000000000004d0, bit 0x0000000000000002 at 0x7f0026678000
	New lazy code page for num 0x00000000000004d0, bit 0x0000000000000001 at 0x7f0026676000
	New lazy code page for num 0x00000000000004d2, bit 0x0000000000000001 at 0x7f0026675000
	Filling  code page for num 0x00000000000004d2, bit 0x0000000000000001 at 0x7f0026675000
1234: even
	Filling  code page for num 0x0000000000002000, bit 0x0000000000001000 at 0x7f002668d000
	[... snip: 35 lines ...]
	Filling  code page for num 0x0000000000003038, bit 0x0000000000000001 at 0x7f002665e000
12345: odd
	Filling  code page for num 0x8000000000000000, bit 0x4000000000000000 at 0x7f00268d5000
	[... snip: 185 lines ...]
	Filling  code page for num 0xfffffffffffffffe, bit 0x0000000000000001 at 0x7f00265e1000
18446744073709551615: odd
18446744073709551614: even
	Filling  code page for num 0xfffffffffffffffc, bit 0x0000000000000001 at 0x7f00265e2000
18446744073709551613: odd
18446744073709551612: even

real	0m0.012s
user	0m0.001s
sys	0m0.008s

More please!

Hey, I just watched https://youtu.be/nlFjL0B43-w which "reviewed" a post you have. I came to the site, but disappointed to find no more. Nice style and informative too. Please write more!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.