tuhdo / os01

Bootstrap yourself to write an OS from scratch. A book for self-learner.

GDB 2.38% Makefile 11.16% Assembly 10.18% C 0.06% TeX 76.21%

operating-system book

os01's Issues

[Suggestion] Using GitBook with Markdown instead of LaTeX

https://github.com/GitbookIO/gitbook
It has more features.

[page 122] [Typo] : Example lists the wrong section

Example 5.3.1 states that the section is the NULL section.
The section is the .interp section at index 1. The NULL section is at index 0.

Assembly Incorrect?

On page 25, we have that this C:

if (argc1) {
  i=i;
} else {
  i=0;
}

And equivalent assembly:

cmp DWORD PTR [ebp+0x8],0x0
je 80483f7 <main+0x1c>
mov DWORD PTR [ebp-0x4],0x1
jmp 80483fe <main+0x23>
mov DWORD PTR [ebp-0x4],0x0

In the "if" portion of the assembly code (line 3), the stack slot at [ebp-0x4] is set to 0x1. This means i is set to hexadecimal 1, i.e. 1.

However, in the "if" portion of the C code, i is set to itself.

Am I missing something by chance?

"Byte sized" vs. "Quadword sized"

In chapter 4.8.1, "fundamental data types", you show a diagram (figure 4.8.1) which shows the different sized integers. On my PDF it's on page 72.

The quadword-sized [un]signed integers in the diagram are incorrectly labeled as "byte-sized [un]signed integer".

[pg.31][grammar] wrong word

"knowing" in line 1 should be "nothing".

[page 71][figure layout]: segment and address are reversed

The blue box is supposed to contain the segment address 0x1234 and the red box is supposed to contain the address 0x5678. The values do not have the correct colors.

Source code?

I've taken a glance at the PDF in your repository and it looks interesting. Do you happen to have the source code you used to create it to begin with?

(Note: Storing binary files, particularly large binary files in a Git repository is not a good idea as it increases the overall repository size. You can solve this by not including the PDF as part of the repository, but as a GitHub release download or a file on a webserver or similar.)

Reference to production implementation

For each chapter in part 3, the readers need to refer to a production implementation to check their code and improve it, so they gain practical knowledge that later can be used elsewhere. The reference code should be extracted from the Linux kernel.

Add line numbers for listing

For consistency.

[page 141][inconsistent] : Doesn't match other examples

In the other examples you use:
printf("%s\n", __FUNCTION__);

In this example you diverge and tell printf to print 'destructor'.
It might confuse the reader to have one example different from the others.

[pg.9][Repetition Error] Reuses last sentence

" Most examples revolve around variants of a“Hello World” program. Most examples revolve around variants of a “Hello World” program, which will acquaint you with core concepts. "

[page 71][content] CS and EIP values are swapped

Hi,

On page 71, when describing jmp far [eax], you say:

The far address consumes total of 6 bytes in size for a 16-bit segment and 32-bit address, which is encoded as m16:32 from the table 4.7.1. As can be seen from the figure above, the blue part is a segment address, loaded into cs register with the value 0x1234; the red part is the memory address within that segment, loaded into eip register with the value 0x5678 and start executing from there.

The two values 0x1234 and 0x5678 are switched.

Also, in figure 4.7.1, shouldn't 0x1234 be laid out as 0x34120000?

In addition to that, could you add a short comment on endianness, explaining why m16:32 is laid out in memory in reverse order?

Thank you.
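On the endianness question, here is a minimal sketch of how an m16:32 far pointer sits in memory. It assumes the intended values are CS = 0x1234 and EIP = 0x5678. The helper names pack_far_pointer and unpack_far_pointer are made up for illustration. On x86 the 32-bit offset is stored first, at the lower address, followed by the 16-bit segment selector, and each field is little-endian.

```python
import struct


def pack_far_pointer(selector, offset):
    """Lay out an m16:32 far pointer the way it sits in memory."""
    # "<IH": little-endian, 32-bit offset first, then the 16-bit selector.
    return struct.pack("<IH", offset, selector)


def unpack_far_pointer(raw):
    """Return (selector, offset) from the 6 bytes of an m16:32 operand."""
    offset, selector = struct.unpack("<IH", raw)
    return selector, offset


raw = pack_far_pointer(selector=0x1234, offset=0x5678)
print(raw.hex(" "))  # 78 56 00 00 34 12
```

Read from the lowest address upwards, the bytes are the offset 0x00005678 and then the selector 0x1234, each with its least significant byte first.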

[page 50][typo] hight level -> high level

3rd line of objdump section.

SUGGESTION: Modern Storage Devices

Let me first say that I really like your book and its presentation style, which does not lose sight of the forest for the trees. In particular, I really like the autodidact approach and starting from first principles. With that in mind, I thought the mention of storage devices on pg. 22 could have used a little more of this kind of treatment rather than saying "...the modern devices are so complex that is is impossible and unnecessary to understand every implementation detail..."
Rather than say that, I propose a quick high-level summary along the lines of:

"Modern storage devices are implemented by injecting an electron into a material, where it is held and later retrieved by opening and closing potential barriers that exist by virtue of the material's elemental properties and are manipulated by applying a voltage across the two materials. The materials involved affect the response and retrieval times as well as how reliably the memory holds the charge over time. The details of this process can be found in any introductory solid-state theory textbook."

...While admittedly that is quite terse and probably could use some rephrasing, it does capture the basic gist of it without going into details of quantum mechanics or any talk of valence bands, etc.

Another approach might be to rephrase the above as an analogy along the lines of Hungry Hungry Hippos, where the electrons are the balls traveling along the conductive path / bus, the opening and closing of the hippo's mouth is the barrier / oxide, and the inside of the mouth is the potential well of the receiving substrate. The player's fingers, which activate the hippo's mouth, would be the applied voltage. If you need more detail, I would say the best intro to solid state physics would be https://www.amazon.com/Semiconductor-Device-Fundamentals-Robert-Pierret/dp/0201543931/ref=sr_1_1?s=books&ie=UTF8&qid=1487522457&sr=1-1&keywords=semiconductor+device+fundamentals

Whether or not this is a worthwhile digression (maybe in an appendix?), I think this would help maintain the "from basic principles" approach.

Provide instructions to build the book from source

Perhaps a Makefile that includes the command to build the pdf...

Thanks,
Ali

[page 140][typo] : FINI_ARRAY has a typo '.abort()'.

It seems you want to use the abort function call, but imply you are referencing a section '.abort()'.

[pg.50][Typo] Hight -> High

In section 4.1, there is this sentence:

"Now, we use objdump to examine how hight level source code maps..."

Hight should be spelled high.

[Pg.63][Confusing] SIB Table

On page 63, we have this instruction:

jmp [EAX*2 + EBX]

Which turns into:

00000000 67 ff 24 43

43 is the SIB code.

The lookup table provided says that 43 corresponds with row [EBX*2] and column EAX. This suggests EBX*2 + EAX, which doesn't match up with the original EAX*2 + EBX. I have double checked with OS Wiki, and 43 is indeed the correct SIB code.

As is, the table feels confusing.
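One way to cross-check the table is to split the SIB byte into its three bit fields by hand. The sketch below is not taken from the book and the function names are illustrative. It assumes the standard 32-bit layout: scale in bits 7-6, index in bits 5-3, base in bits 2-0. It ignores the special cases, namely index 100 meaning "no index" and base 101 depending on the ModR/M mod field.

```python
# 32-bit register numbers as used in ModR/M and SIB fields.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(sib):
    """Split a SIB byte into (scale, index register, base register)."""
    scale = 1 << (sib >> 6)            # bits 7-6: factor 1, 2, 4 or 8
    index = REG32[(sib >> 3) & 0b111]  # bits 5-3: index register
    base = REG32[sib & 0b111]          # bits 2-0: base register
    return scale, index, base


def describe(sib):
    """Render the effective-address expression encoded by a SIB byte."""
    scale, index, base = decode_sib(sib)
    return f"[{index}*{scale} + {base}]"


print(describe(0x43))  # [eax*2 + ebx]
print(describe(0x58))  # [ebx*2 + eax]
```

Under that layout, 0x43 decodes to the scaled index EAX*2 plus the base EBX, which matches the original instruction. That suggests the scaled-index and base axes of the table are being read the other way round.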

String writing does not appear to be working

The code used to write a string, exercise 7.5.1, does not appear to be working. Specifically, the string is not written to the screen, even though multiple manual calls to my PutChar implementation do work. Not even the reference implementation was able to write the string to the display.
Stripped down code:

bits 16
start: jmp boot

boot:
  cli	; no interrupts
  cld	; all that we need to init
  
  call PrintBootMsg
  hlt	; halt the system

   ; dl = x; dh = y
MovCursor:
; [redacted]
   ret

   ; al = chr, cx = repeat
PutChar:
;   [...] (redacted, is functional though)
   ret
   
   ;; ds:si = Zero terminated string
Print:
.loop:
   lodsb
   or al, al
   jz .done
   mov cx, 1
   call PutChar
   jmp .loop
   
   .done:
   ret
   
; Print the boot message
PrintBootMsg:
   ; Reset cursor
   mov bh, 0
   mov bl, 0
   call MovCursor
  
   ; Print
   mov si, bootMsg
   call Print
   ret
   
   ;; constant and variable definitions
bootMsg db "Booting the Operating System!", 10, 13, 0
cursor_X db 0
cursor_Y db 0

   ; We have to be 512 bytes. Clear the rest of the bytes with 0
times 510 - ($-$$) db 0

dw 0xAA55   ; Boot Signature

The string does appear to be assembled into the binary, but not loaded into memory. I tried inspecting the system memory with gdb x/512sb but I couldn't find any trace of the string. Moving the db instructions further up does have an effect on the output file but not on actual program execution.

Using NASM version 2.10.09.
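One possible cause, offered as an assumption rather than a confirmed diagnosis: the stripped-down listing has no org 0x7c00 directive and never sets ds, so the offset placed in si may not point at the copy of the string that the BIOS loaded at 0x7c00. The sketch below only illustrates the real-mode address arithmetic. MSG_OFFSET is a hypothetical offset for bootMsg, and ds = 0 at boot is itself an assumption.

```python
def linear_address(segment, offset):
    """Real-mode translation: segment * 16 + offset (20-bit wrap ignored)."""
    return (segment << 4) + offset


LOAD_ADDRESS = 0x7C00  # where the BIOS places the boot sector
MSG_OFFSET = 0x30      # hypothetical offset of bootMsg inside the binary

# Without "org 0x7c00" the assembler emits si = MSG_OFFSET. With ds = 0,
# lodsb then reads from 0x00030 instead of the loaded string.
print(hex(linear_address(0x0000, MSG_OFFSET)))                 # 0x30

# With "org 0x7c00" and ds = 0, or with ds = 0x07c0 and no org,
# the address lands on the loaded copy of the string.
print(hex(linear_address(0x0000, LOAD_ADDRESS + MSG_OFFSET)))  # 0x7c30
print(hex(linear_address(0x07C0, MSG_OFFSET)))                 # 0x7c30
```

If this is the cause, the string would still be in memory near 0x7c00 plus its offset in the file, so inspecting that range in gdb is one way to confirm or rule it out.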

Add cross compiler as an exercise with some guidance and invoke linker scripts with the new toolchain

At the moment, the book uses -m32 and -m64 and -m elf_i386 (for ld), which is not the proper way to create an operating system image for a target machine, although it is the quickest way to get to a bootable image. We cannot ignore the proper way, so it will be integrated as an exercise by:

Following the cross-compiler tutorial on the OSDev wiki.
Then integrating the new toolchain into the existing code base.
If readers have done this correctly, the new image should behave the same as before.

After that, the Makefiles must be updated to use the new toolchain for invoking the linker scripts in the book.

What is the error in this line of code? My program ends unexpectedly

#include
#include

using namespace std;

struct employee

{
string empID;
char *empName;
float rate, bsal, gsal, netsal;
float dutyAllow, fuelAllow;
float tax, socSec;
int ndayWork;

float someone();
string userinfo();
};
string employee::userinfo()
{
employee::empName= new char [40];
cout<< "Enter Your name"<< endl;
cin>>employee::empName;
cout<<"Enter Your ID Number"<< endl;
cin>>employee::empID;
A:
cout<< "Enter Your Number of Days Of work"<< endl;
cin>>employee::ndayWork;

if (employee::ndayWork>31)
{
   goto A;
}

}

float employee::someone()
{
char choice;
cout<<"WELCOME TO KOFORIDUA NURSES PORTAL*"<< endl;
cout<< "Please enter Your denomination= ";
cout<<"Nurse = 1"<< endl<< "Doctor=2"<< endl;
cin>> choice;

switch (choice)
{
	case 1:
		if(choice<=1)
		{
		
		 cout<<"Daily Mark is 8Ghc"<<endl;
		 
		 
		 employee::netsal=8* employee::ndayWork;
		 cout<<"Your name is "<<employee::empName<<endl;
		 cout<<"Your ID is "<<employee::empID<<endl;
		 cout<<"Your monthly salary is "<< employee::netsal<< endl;
		}
		break;
	case 2:
		if(choice==2)
		{
			
		cout<<"Daily Mark is 10Ghc"<<endl;
		 employee::netsal=10* employee::ndayWork;
		 cout<<"Your name is "<<employee::empName<<endl;
		 cout<<"Your ID is "<<employee::empID<<endl;
		 cout<<"Your monthly salary is "<< employee::netsal<< endl;
		
		}
		break;
	default:
		cout<<"Error"<< endl;
		
	
}

}

int main()
{

int salary;


employee Doctor, Nurse;
Nurse.someone();
Nurse.userinfo();



	
system("PAUSE");
return 0;

}

[Page 18][Grammar]

"The software engineer must also select the right programming techniques that are apply to the problem domain he is trying to solve because many techniques that are effective in one domain might not be in another. " It should be " ... the right programming techniques that apply to the problem domain ... ".
Also, this is page 18 out of 313 in the PDF, but it is page 4 according to the book's own numbering. I am not sure which one to cite.

Creating your own programming language

I have come across this repo by coincidence and I am so happy I found it. I was always fascinated by operating systems and their implementations since college days.

Another thing which always interested me was programming languages. There is a lot of theory and a lot of practical books as well, but there isn't a place where I could find that connects the theory with practical implementation together at the same time and provides a deep understanding to how these concepts relate.

I would love to hear if you know of such a resource or book.
Kindest regards,

[pg.31] [Typo] "knowing" to "nothing"

"if a programmer knows absolutely knowing about hardware"

question / suggestion: is all the code compatible with / optimized for x86_64 instructions?

Hi,

I am reading your book, and it seems to target 32-bit processors rather than 64-bit processors. For example, in Chapter 4 "x86 Assembly and C" you introduce x86 assembly, and in Chapter 6 "Runtime inspection and debug" you intentionally compile C programs with the -m32 flag.

My (limited) understanding is that x86 and x64 assembly have some (significant) differences, and surely debugging an x64 gcc program differs, too. Nowadays (2017), almost all newly released personal computers use 64-bit processors.

Is it unnecessary to pay attention to the differences? Or is it sufficient to write code for 32-bit processors on almost all computers (is all practical code written this way)?

I am definitely a newbie here, so maybe I am missing something.

Thank you.

[pg. 44][grammar] Missing word in a phrase

2nd paragraph:

Physically, buses are just electrical wires
that connect all components together and each wire transfer a single
big of data.

It probably should be "bit of data" (each wire carries one bit), or "big piece of data" if a word is missing.

[page 63][content] SIB byte table and example

Hi,

On page 63, you use jmp [eax*2 + ebx] as a simple example to show how the SIB byte works.
The instruction is assembled to 0x67 ff 24 43, and 0x43 is the SIB byte, but when I look for 0x43 in the table, it looks like the final address would be [ebx*2 + eax].
It seems to me, from the table, that the SIB byte for jmp [eax*2 + ebx] would be 0x58, which it clearly isn't.

Also, in the next page, there's a small typo:

After the prefix, comes the opcode 0x67 [...]

It should be 0xff.

I'm loving your book and I'll continue to contribute!

Complete the chapter on descriptors that introduces debugging the CPU using QEMU and the Intel manuals

The first chapter in part 3 is intended to teach x86 memory descriptors and the guidelines for implementing a simple runtime memory model. In the process, it guides readers in using QEMU logging and the various info commands, in combination with the Intel manuals, to debug CPU exceptions. This is the first step in building a foundation for working on more complicated features in future chapters.

[page 149][missing example]: missing command to generate output

On the prior page you say 'we check the sections of both files'.

The command is given to output the sections of math.o,
but no command is given to output the sections of hello.o.

Examples 4.5.2 and 4.5.3

It seems that the example 4.5.2 is generated with 16-bit code and the example 4.5.3 is generated with 32 bit code.

Example 4.5.2

The book says

jmp [0x1234]
Then, the machine code is:
ff 26 34 12

And it works with the following instructions for nasm :

bits 16
jmp [0x1234]

Example 4.5.3

The book says :

add eax, ecx
Then the machine code is:
01 c8

But assembling this :

bits 16
add eax, ecx

Gives 66 01 c8, while assembling :

bits 32
add eax, ecx

gives the expected result.

By the way, thanks for this book, it is really useful. I am learning assembly with it in the hope of being able to complete Exercise 7.5.1 (I saw that there is some code in the repo, but I don't want to look at it at the moment).
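The one-byte difference reported above is the operand-size override prefix 0x66, which is needed when a 32-bit operand is used in a 16-bit code segment. The sketch below is a hand-rolled illustration, not NASM's implementation, and the function name is made up. It builds the bytes for add eax, ecx from the opcode 0x01 (ADD r/m32, r32) and a ModR/M byte with mod=11, reg=ecx, rm=eax.

```python
OPERAND_SIZE_PREFIX = 0x66


def encode_add_eax_ecx(bits):
    """Encode `add eax, ecx` for a 16- or 32-bit default operand size."""
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    opcode = 0x01                               # ADD r/m32, r32
    modrm = (0b11 << 6) | (0b001 << 3) | 0b000  # mod=11, reg=ecx, rm=eax
    body = bytes([opcode, modrm])
    if bits == 16:
        # 32-bit operands in 16-bit code need the 0x66 override prefix.
        return bytes([OPERAND_SIZE_PREFIX]) + body
    return body


print(encode_add_eax_ecx(16).hex(" "))  # 66 01 c8
print(encode_add_eax_ecx(32).hex(" "))  # 01 c8
```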

Grammatical fix (pg: 9)

On page 9 of the PDF, there is a line which says

e.g. Intel are critical for implementing an operating system or any other software that direct controls the hardware.

which should read

e.g. Intel are critical for implementing an operating system or any other software that directly controls the hardware.

It is better to have the source in this repo, so that people can send pull requests. You will probably want to define some guidelines/format like [page no] [type => grammatical/typo] so you can easily pinpoint each issue and prevent the same issue from being raised again.

Question: Page width seems short

Hi,

It seems that the page margins are not well adjusted, as the text covers only around 60% of the page. It doesn't look good to my eye, so I was wondering whether that was on purpose.

In any case, thanks for writing the book! It's something I have wanted for quite some time :)

Best regards,
Antonio Huete

[pg.123][grammar] Usage of 'then' instead of 'than'.

Found this one here:

The attribute section(“...”) put a function into a particular section rather then the default .text.

[typo][64] Prefix is repeated two times when describing generated code

At the bottom of page 63 there is the code 00000000 67 ff 24 43.

Then on page 64:

First of all, the first byte, 0x67 is not an opcode but a prefix. The
number is a predefined prefix for address-size override prefix. After
the prefix, comes the opcode 0x67 and the ModR/M byte 0x24.

0x67 is repeated; the opcode should be 0xff.

[pg.52][Typo] "%rb" to "%rbp"

"In the above example, the assembly instruction is push %rb."

Translation

Hi,

Do you plan to release translations in different languages when the English version is finished? Because if so, I'd gladly help 😄!

[pg. iii][typo] Repeated sentence in the third paragraph.

"Most examples revolve around variants of a 'Hello World' program" seems to be repeated for no reason.

k + k = 2^k?

In chapter 2, on page 15, the sentence "a k-input gate uses
k PMOS and k NMOS transistors, a total of 2^k transistors" is bolded. Shouldn't it be 2k instead of 2^k?
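A quick arithmetic check, going only by the quoted sentence (k PMOS plus k NMOS transistors). The function name is illustrative. It shows how the linear count 2k diverges from 2^k as the number of inputs grows.

```python
def cmos_gate_transistors(k):
    """k PMOS (pull-up) plus k NMOS (pull-down) transistors."""
    return k + k


# Columns: inputs k, transistor count 2k, and 2**k for comparison.
for k in (2, 3, 4, 8):
    print(k, cmos_gate_transistors(k), 2 ** k)
```

The two expressions happen to agree at k = 2, which may be how the slip went unnoticed.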

[page 138][typo] : execution of constructor output is incorrect

The code for hello.c has two functions init1 and init2.
The output shows 'constructor' as the output when the printf's are asking to return the function names.

[page 207][typo] dd command should have seek=1

Hi,

I believe the dd command should be invoked with seek=1 in this case, since we are writing the 2nd sector.

Original:
$ dd if=sample of=disk.img bs=512 count=1 seek=0

Correction:
$ dd if=sample of=disk.img bs=512 count=1 seek=1
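The effect of seek can be checked with a little arithmetic: dd skips seek output blocks of bs bytes before writing, so the write begins at byte seek * bs. The sketch below mimics that on an in-memory image. The function name and the 4-sector image are hypothetical, and details such as dd truncating the output file unless conv=notrunc is given are not modelled.

```python
SECTOR_SIZE = 512


def write_sector(image, data, seek, bs=SECTOR_SIZE):
    """Mimic `dd bs=<bs> count=1 seek=<seek>` on an in-memory disk image."""
    start = seek * bs  # dd skips `seek` output blocks of `bs` bytes each
    block = data[:bs]  # count=1: copy at most one block
    image[start:start + len(block)] = block
    return start


disk = bytearray(4 * SECTOR_SIZE)             # hypothetical 4-sector image
print(write_sector(disk, b"sample", seek=0))  # 0, i.e. the first sector
print(write_sector(disk, b"sample", seek=1))  # 512, i.e. the second sector
```

With bs=512, seek=0 overwrites the first sector, while seek=1 starts at byte 512, the second sector.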

[pg. 9][Typo] Repeat of "Hello World" Sentence

In the section "Why another book on Operating Systems", the sentence "Most examples revolve around variants of a “Hello World” program" is repeated.

[page 148][typo] : compiling lib.c instead of math.c

gcc -m32 -c lib.c should be
gcc -m32 -c math.c

grammar in Preface, second sentence

You probably have spent years programming, but still understand [THE] operating system as a collection of abstract concepts

[page 96][typo] 0 instead of 1

Hi,

At the end of the logical AND explanation, you describe what happens when both operands are not 0:

If both i and j are not 0, the result is certainly 1, or true.
(a) Set it accordingly with the instruction at 0x80484a7.
(b) Then jump over the instruction at 0x80484ae to set the
variable logical_and at [ebp-0x8] to 0.

The last 0 should be a 1.

Document Links Missing

As specified on page 9, the book's website should have copies of:

Intel® 64 and IA-32 Architectures Software Developer’s Manual (Volume 1, 2, 3)
Intel® 3 Series Express Chipset Family Datasheet
System V Application Binary Interface

However, the book's website doesn't have links to these documents.

I found the documents, but wanted to point out this discrepancy.

[pg.12][style] Period in the next line

[page 44][content] Transistors aren't capacitors.

(2nd to last paragraph in 3.2.2)

Technically, the charge in a DRAM cell is stored in a capacitor, which is a different type of electrical component than a transistor. I'd suggest something like the following:

"At the physical level, RAM is implemented as a grid of cells that each contain a transistor and an electrical device called a capacitor, which stores charge for short periods of time. The transistor controls access to the capacitor; when switched on, it allows a small charge to be read from or written to the capacitor. The charge on the capacitor slowly dissipates, requiring the inclusion of a refresh circuit to periodically read values from the cells and write them back after amplification from an external power source."

... if the element is repeated accessed many times...

should be

... if the element is repeatedly accessed many times ...

apologies if this was already spotted