Process Creation

While we won’t dive too deeply into the details of process creation, it’s important to understand that there’s no “magic” involved in running new programs or getting command-line arguments to main. When a command shell (or any other program, since the shell is just another program) runs another program, it makes two system calls. First, it creates a new process using fork. The new process is a near-identical copy of the original. The most visible difference is the return value of fork, which is 0 in the child and the child’s process ID in the parent, and this is how each side knows which one it is. Then the new process calls execve, which replaces the program it is running with the requested one.
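As a minimal sketch of how the return value of fork separates the two sides, assuming a POSIX system: the function name fork_tells_sides_apart and the exit code 17 are made up for this illustration, and the parent waits for the child only so that no stray process is left behind.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit code the child uses to report "fork returned 0 here". */
#define CHILD_SAW_ZERO 17

/*
 * Forks once. Returns 1 if the child saw 0 and the parent saw the
 * child's PID, 0 if not, and -1 if fork or waitpid failed.
 */
int fork_tells_sides_apart(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed, so no child exists */

    if (pid == 0) {
        /* Child side: fork returned 0. Leave immediately, without
         * running any cleanup handlers inherited from the parent. */
        _exit(CHILD_SAW_ZERO);
    }

    /* Parent side: pid holds the child's process ID. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == CHILD_SAW_ZERO;
}
```

Both branches run the same code after fork returns; only the value of pid tells them which role they have.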

The execve system call takes three arguments: the path of the file containing the binary (or script) to run, the array of argument strings to pass as argv, and the array of environment strings to pass as envp. Both arrays must end with a NULL pointer. Even if the program never refers to envp explicitly, the environment is still passed so that library functions such as getenv can access it.
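The three arguments can be seen in a short sketch. It assumes a POSIX system with a shell at /bin/sh. The helper names run_program and check_greeting, the GREETING variable, and the use of 127 as a "could not run" code are choices made for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs the file at `path` in a child process with the given argv and envp.
 * Returns the child's exit status, or -1 if it could not be obtained.
 */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);             /* reached only if execve failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/*
 * Starts /bin/sh with exactly one environment entry and asks it to test
 * whether $GREETING equals "hello". Returns the shell's exit status.
 */
int check_greeting(char *env_entry)
{
    /* Each array ends with NULL so the receiver can find its end. */
    char *argv[] = { "sh", "-c", "test \"$GREETING\" = hello", NULL };
    char *envp[] = { env_entry, NULL };
    return run_program("/bin/sh", argv, envp);
}
```

Calling check_greeting("GREETING=hello") should yield status 0, since the new program sees only the environment the caller chose to pass.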

When the OS executes execve, it discards the currently running program image and loads the specified binary into memory in its place. Because the caller no longer exists after a successful call, execve returns only on error. The argv and envp values are written into the new program’s memory in a specific layout defined by the ABI (Application Binary Interface), which is a contract between the OS and the programs it runs. The OS then starts executing at the entry point recorded in the executable. On Linux with gcc, this entry point is typically a symbol called _start, but the details are platform-specific.
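The "returns only on error" behavior can be observed directly. This is a small sketch assuming a POSIX system; the helper name exec_or_errno is hypothetical, and it is only safe to call with a path that is expected to fail, since a successful execve would replace the calling program.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Attempts to exec `path`. If execve succeeds, this function never
 * returns because the calling program has been replaced. If control
 * comes back, the call failed and errno says why.
 */
int exec_or_errno(const char *path)
{
    char *argv[] = { (char *)path, NULL };
    char *envp[] = { NULL };    /* empty environment, still NULL-terminated */

    execve(path, argv, envp);

    /* Reaching this line means execve failed. */
    return errno;
}
```

With a path that does not exist, the function should return ENOENT, and the calling program keeps running as if nothing had happened.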

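This startup code is linked into every C program by default (unless you tell the linker otherwise). It initializes the C library, obtains argc, and eventually calls main. Depending on the platform, argc is found by counting the entries of argv up to the terminating NULL or, as on Linux, read from the initial stack, where the kernel stores it next to argv. On such platforms, main is called with argc, argv, and envp regardless of how it is declared. If main declares fewer parameters, the extra arguments are still passed but simply ignored.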
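The relationship between argc and argv can be sketched in plain C. This is not real startup code, which is platform-specific and usually written partly in assembly. The names entry_fn, count_entries, call_entry, and demo_main are invented here, and demo_main stands in for a program's main.

```c
#include <stddef.h>

/* Shape of the call the startup code makes into the program. */
typedef int (*entry_fn)(int argc, char **argv, char **envp);

/* Counts the entries of a NULL-terminated pointer array. */
int count_entries(char **vec)
{
    int n = 0;
    while (vec[n] != NULL)
        n++;
    return n;
}

/* Illustrative stand-in for the last step of startup code:
 * work out argc, then hand all three values to the entry function. */
int call_entry(entry_fn entry, char **argv, char **envp)
{
    int argc = count_entries(argv);
    return entry(argc, argv, envp);
}

/* Hypothetical "main" that encodes what it received in its return
 * value: argc in the tens digit, number of environment entries in
 * the ones digit. */
int demo_main(int argc, char **argv, char **envp)
{
    (void)argv;                 /* not needed for this report */
    return argc * 10 + count_entries(envp);
}
```

The terminating NULL is what makes the count recoverable, which is why argc always equals the number of argv entries before it.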

When main returns, control passes back to the startup code, which calls exit with main’s return value. exit performs the C library’s cleanup, such as running handlers registered with atexit and flushing open stdio streams, and then terminates the process with that value as its exit status.
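One way to watch that cleanup happen is to register a handler in a child process and check that it ran. The sketch below assumes a POSIX system. The names notify_fd, cleanup_handler, and exit_runs_cleanup are hypothetical, and the pipe exists only so the parent can observe the handler.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int notify_fd = -1;      /* write end of the pipe, set in the child */

/* Runs inside exit(), before the process actually terminates. */
static void cleanup_handler(void)
{
    ssize_t ignored = write(notify_fd, "x", 1);
    (void)ignored;
}

/*
 * Forks a child that registers a cleanup handler and calls exit(code),
 * as startup code would do with main's return value. Returns 1 if the
 * handler ran and the parent collected `code`, 0 if not, -1 on error.
 * `code` should be in the range 0..255.
 */
int exit_runs_cleanup(int code)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        close(fds[0]);
        notify_fd = fds[1];
        atexit(cleanup_handler);
        exit(code);             /* runs cleanup_handler, then terminates */
    }

    close(fds[1]);
    char byte;
    ssize_t got = read(fds[0], &byte, 1);   /* 1 if the handler wrote, 0 at EOF */
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return got == 1 && WIFEXITED(status) && WEXITSTATUS(status) == code;
}
```

Replacing exit(code) with _exit(code) in the child would skip the handler, which is the difference between the library-level exit and the underlying system call.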

The shell (or whichever program created it) can then wait for the “child” process to finish, using a call such as wait or waitpid, and collect its exit status.
