Dr. Mark Humphrys

School of Computing. Dublin City University.


Memory Implementation


Where is the data?

Prog has variables like   int x  

Where is x in memory?
Recall Compile-time, Load-time and Run-time binding.
If user runs 2 copies of same program (e.g. 2 instances of   textedit), where are the 2 copies of x in the user memory space?
If 3 users run the program on multi-user system, where are the 3 copies of x in the global memory space?
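
A minimal C sketch of the question (my own illustration, not part of the notes): print the address the process sees for x. If you run two copies at once, both may well print the same virtual address, even though the two copies of x sit in different physical locations.

 #include <stdio.h>

 int x = 42;     /* a global variable like "int x" above */

 int main(void)
 {
     /* %p prints the address this process sees for x */
     printf("x is at virtual address %p\n", (void *) &x);
     return 0;
 }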




Where is the code?

Program uses system library code such as that defined in the C libraries like   <stdio.h>   in   /usr/include  
See also the C++ libraries like   <iostream.h>   in   /usr/local/include/g++  

Where is this code?
Linked in with the executable prog? (see libraries in   /usr/lib)
Compile verbose:   CC -v   to see where the libraries are.

Separate to the executable prog, and only loaded into memory at run-time?
What happens if multiple progs reference these library routines? 1 copy in memory, or multiple copies?

What happens if only some of the library is referenced? Does the whole library have to be loaded into memory?

Indeed, what about the program itself? Does the whole program have to be loaded into memory? e.g. Some exception or error handling routines may almost never be called. e.g. Microsoft Word - millions of functions that won't be used - 90 % of the program may never be called. Do we have to load it all into memory?




Dynamic Loading

Load part of program if and when called. Load into process space. Program has a predefined address for the routine.

The ultimate extension of Dynamic Loading is Demand Paging (see below).


Dynamic Linking

Load system library code. Load into shared space. Multiple processes can access same code. 1 copy of   printf   in memory instead of dozens of copies of   printf. Program's address for the routine not decided until run-time.

DLL, Run-time library - Can upgrade the library without recompiling the application. e.g. Change the look and feel of the windows from Win 95 style to Win 98 style. Replace   draw-window.dll   with   draw-window.dll (updated). Don't have to recompile application. But next time you run it it will look different. This is very object-oriented.
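
A rough sketch of run-time linking using the POSIX dlopen()/dlsym() interface (the library name libdraw.so and routine name draw_window are made up for illustration). The program's address for the routine is only filled in when it asks for it, so swapping in an updated libdraw.so changes behaviour without recompiling:

 #include <stdio.h>
 #include <dlfcn.h>

 int main(void)
 {
     /* load the shared library at run-time */
     void *lib = dlopen("./libdraw.so", RTLD_LAZY);
     if (lib == NULL) {
         fprintf(stderr, "dlopen: %s\n", dlerror());
         return 1;
     }

     /* look up the routine by name - its address is only known now */
     void (*draw_window)(void) = (void (*)(void)) dlsym(lib, "draw_window");
     if (draw_window != NULL)
         draw_window();

     dlclose(lib);
     return 0;
 }

(On Linux, compile with something like cc prog.c -ldl.)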

Question - Why do we want to avoid recompiling applications?




Shared library code v. Shared library data

Shared library read-only code is no problem.
But how about if shared library routines use temporary data variables?
Can't have shared read-write data variables across multiple processes - will get conflict.

Solution - Each process has 1 copy of rw data, and there is 1 global copy of the read-only code.
e.g. When you:

  #include <stdio.h>
you include a declaration of each global variable in your program - so each process will have its own copies of these.
You also include declarations of all the library routines you will be calling at run-time, so we can check your code is correct at compile-time:
 int f ( double, int );
You then may dynamically link at run-time to the single global copy of the read-only definitions of the library code:
 int f ( double x, int i )
 {
  /* actual code to implement f */
 }



Program File Types

Your source code - .java, .c, .cxx - text file.
Header file - .h, .hxx - text file, list of declarations, #included.
Library file - .a, .dll (and other extensions) - binary file, compiled code of library functions, linked to.
Object file - .o, .obj - compiled prog, awaiting compile-time linking of library functions to form .exe
Executable file - .exe, .com - ready to execute, either has library functions built in, or will link to them dynamically.
Executable file - .class - ready to be executed by a standard program, the Java Virtual Machine (java)

The compiler/linker looks at what library functions you actually use. Strip out declarations that aren't used. Don't need to include definitions of (or link to) functions that aren't called. Make prog.exe as small as possible.




Overlays - Run program that is bigger than memory

Program bigger than memory. But has serial structure: pass 1, then pass 2. Can run pass 1, then unload it and load in pass 2 from disk over it.

Programmer must design it - must break up the program correctly so the 2 pieces do not need each other. Programming overlays is difficult.

This idea is replaced on all modern Operating Systems by paging (below) - automatically breaking up the program into an arbitrary no. of pieces and swapping them to/from disk and memory.




Contiguous memory allocation

Prog resides between BASE and LIMIT.
Might have load-time binding: variable "x" maps to "BASE + 760". When prog loaded at BASE = 1400, references to "x" become references to location 2160. This is only done once, at load-time.
Multiple copies of x, each in its different process memory segment.

Can we move the process at run-time to a new contiguous section?
If so, then we can have run-time binding.
Overhead - "x" has to be resolved to "BASE + 760" every time it is referenced, instead of just once at startup.

What if, during execution, a process asks for more memory?

If no process above me, simply extend the LIMIT register.

If process above me, and if load-time binding, then that process can't be moved to extend my space, and I can't be moved into a larger hole. So I can't get more memory. i.e. In general, load-time binding plus contiguous process means you can't ask for more memory after you've started execution. You must ask for all of it up front.
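
A rough sketch of what run-time binding means for the hardware (my own illustration; BASE = 1400 follows the example above, the LIMIT value is made up): every logical address is checked against LIMIT and relocated by BASE on every reference.

 #include <stdio.h>

 #define BASE  1400     /* where this process was loaded        */
 #define LIMIT 2000     /* size of this process's memory space  */

 /* translate a logical address (offset within the process) */
 long translate(long logical)
 {
     if (logical < 0 || logical >= LIMIT) {
         fprintf(stderr, "addressing error: %ld out of range\n", logical);
         return -1;
     }
     return BASE + logical;
 }

 int main(void)
 {
     /* "x" at offset 760 resolves to physical 1400 + 760 = 2160 */
     printf("offset 760 -> physical %ld\n", translate(760));
     return 0;
 }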




Swapping - Run more programs than size of memory

Often, memory can be full of processes that aren't actually running:

Swapping is when we move a process out of memory to disk to clear space in memory so other processes can run. We do not terminate the process. It is just that when it becomes active again there will be a noticeable delay as the memory image is loaded from disk and the process continues executing at the point where it left off.

Ideally we would never swap. Because swapping, involving disk I/O, is a huge overhead. We only do it because memory is scarce and we often run short of memory.

Also, we try to only swap processes that won't be needed for some time, e.g. idle web browsers. Ideally, only swap Waiting processes.



Swapping demands run-time binding

If load-time binding, a swapped-out process can't come back into a different space. But another process might have started in its old space, and that process too might be un-relocatable. A war will start between the two, each being alternately swapped in and out, when they could live together if only they had started in different spaces. There might even be loads of memory free, but those two will still be swapping all the time.

i.e. Effectively swapping can't work at all.

Recall Compile-time, Load-time and Run-time binding.

  1. Compile-time - Multiple processes effectively impossible. (Why?)
  2. Load-time - Swapping effectively impossible.
And swapping is not a luxury - If you don't do it, you get memory entirely full of idle processes. Programs may run for days/weeks, so for any real flexibility need run-time binding and swapping.
The ultimate in run-time binding is paging (see below).




Contiguous memory allocation - Multiple processes

Holes in memory. Have a choice of where to put incoming process.

  1. First-fit - first hole found that is big enough.
  2. Best-fit - Smallest hole that is big enough. Perhaps fit more in if match them to holes nearest in size.
  3. Worst-fit - Largest hole. Idea is to keep large leftover holes. See Ex. 8.5. Worst-fit can backfire - reduces all large holes to medium holes so large processes can't run.
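
A minimal first-fit sketch (my own; the hole sizes are made-up illustration values). Best-fit would instead scan all the holes and keep the smallest one that is big enough; worst-fit would keep the largest.

 #include <stdio.h>

 #define NHOLES 4

 int holes[NHOLES] = { 300, 1000, 260, 500 };   /* free hole sizes in K */

 /* first-fit: return index of the first hole big enough, or -1 */
 int first_fit(int request)
 {
     int i;
     for (i = 0; i < NHOLES; i++)
         if (holes[i] >= request)
             return i;
     return -1;
 }

 int main(void)
 {
     int i = first_fit(700);
     if (i >= 0)
         printf("700 K goes into hole %d (%d K), leaving %d K\n",
                i, holes[i], holes[i] - 700);
     else
         printf("no hole big enough - process must wait\n");
     return 0;
 }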

Example

Memory size 2560 K. OS takes 400 K. Incoming processes:

process   memory (K)   time
P1        600          10
P2        1000         5
P3        300          20
P4        700          8
P5        500          15
2160 K free.
P1, P2 and P3 are loaded, leaving a hole of 260 (2160 - 600 - 1000 - 300); the other 2 must wait.
P2 terminates, freeing a hole of 1000. P4 (700) goes into it, leaving holes of 300 and 260. P5 (500) must still wait, even though there is enough free memory in total (560) - the problem is it's not contiguous.



Fragmentation

External fragmentation - Enough resources (memory) exists to satisfy request, but it is fragmented and hence unusable.

Can easily end up with a small unusable block between every 2 processes. If added together could run several more processes.

Internal fragmentation - We allocate more resources than are actually needed. e.g. We allocate memory in blocks of 16 K, because we have a list of allocated blocks (for OS memory management purposes), and we don't want to need an entry on the list for every single byte in memory (list would have to be 16,000 times as big). As a result, we may allocate some memory that is not actually being used. Internal frag. normally not as much of a problem as external frag., but it can add up.
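
A small sketch of the arithmetic (the request size is a made-up example): allocating in 16 K blocks means rounding every request up to a whole number of blocks, and the rounded-up part is internal fragmentation.

 #include <stdio.h>

 #define BLOCK (16 * 1024)    /* allocation unit: 16 K */

 int main(void)
 {
     long request   = 100000;   /* bytes actually needed (example) */
     long allocated = ((request + BLOCK - 1) / BLOCK) * BLOCK;   /* round up */

     printf("requested %ld, allocated %ld, wasted %ld bytes\n",
            request, allocated, allocated - request);
     return 0;
 }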


Compaction

Move processes in memory so end up with just 1 big gap at one of the ends.

Demands run-time binding.

Takes ages. Is exactly the same kind of thing as de-fragging a hard disk.

Modern OS doesn't do this anyway because it doesn't use contiguous memory spaces. Instead (yes, yet again) it uses paging.




Paging

Paging was introduced to UNIX in Berkeley 3 BSD (1979).
No external frag. at all.
Some internal frag., especially if page size large. But not significant.



Paging - Exercise

4 byte pages and frames.

Logical space is 4 pages, pages 0-3:
page 0 is logical addresses 0-3, page 1 is 4-7, ..., page 3 is 12-15. Page table:

page   frame
0      5
1      6
2      1
3      2
Physical memory 8 frames, frames 0-7:
physical addresses 0-3 in frame 0, .., 28-31 in frame 7.

Q. Where do these logical addresses map to:

  1. 0
  2. 3
  3. 4
  4. 13
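
To check your answers, a sketch of the translation rule used here (4-byte pages, page table as above): physical address = frame number x page size + offset within the page.

 #include <stdio.h>

 #define PAGE_SIZE 4

 int page_table[4] = { 5, 6, 1, 2 };    /* page -> frame, as in the table above */

 int translate(int logical)
 {
     int page   = logical / PAGE_SIZE;    /* which page             */
     int offset = logical % PAGE_SIZE;    /* where within that page */
     return page_table[page] * PAGE_SIZE + offset;
 }

 int main(void)
 {
     int addrs[] = { 0, 3, 4, 13 };
     int i;
     for (i = 0; i < 4; i++)
         printf("logical %2d -> physical %2d\n", addrs[i], translate(addrs[i]));
     return 0;
 }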




Page faults

Page fault - Page not in memory.

Suspend process while we fetch the page off disk. May have to remove someone else's page to make room (page replacement strategy). Restart process from the same point.

Page replacement strategy

If memory is full, and we just had a page fault (process needs to bring new page into memory), we need to find a "victim" page (of this or some other process) and remove it from memory. This is dangerous - we may need it again almost immediately. If we get the victim-selection algorithm wrong, we could spend all our time reading and writing pages to/from disk.

Page faults more costly now than ever (in relative terms) since hard disk speed hasn't kept up with CPU/memory advances. There's more disk space, but it takes so long to access. Better for program to use up some CPU time and memory space (big page size, caches) to save on actual accesses to disk.

Q. If we have to remove a page from memory, why do we carefully save some pages, and simply discard others? (e.g. we simply discard pages containing read-only code).
A. We save pages that have changed. We can get the read-only code again from disk, so quicker to discard it.




Virtual memory

We can use demand paging to run programs that are bigger than memory.

How far can we take this concept?
Let total memory (or, say, memory available to this program) consist of 8 frames, and try to run a program that addresses thousands of pages. It could still run if there was intense locality of reference. Even if not, there would "just" be lots of swapping.

How about if memory consists of only 1 frame?
Seems we need at least 2 frames - else can't execute instruction (one page) to access data (other page).

Exercise - How about extremist non-contiguous approach. Say page size = 1 byte, and demand paging. Could this work?

Locality of reference - at any given time, prog only needs say 4-5 pages in memory. A different set of 4-5 pages at different times, but never needs say 20-30 pages in memory at same time. At any given time, is only referring to a localised group of functions and data ("Working set").
If high locality of reference, prog is very suitable for demand paging.

Demand paging is the ultimate in Dynamic Loading.
90 % of program may never be used (Word).
Or at least, 90 % of time spent on 10 % of program code. That's all that needs to be in memory.




A program addresses (in order) pages 1,1,4,1,6,6,6,1,6,1,1,1,6,1,6,6,1, making 17 page requests in a row.

Q. If there are 3 frames always available to it, how many page faults will there be?
A. 3 faults (each first time).

Q. If there is only 1 frame available to it, how many page faults will there be?
A. 11 faults (fault every time addresses new page).

Q. If there are 2 frames always available to it, how many page faults will there be?
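
The answer for 2 frames depends on which page gets evicted, i.e. on the replacement strategy (next section). A rough simulator sketch (my own, using Least Recently Used replacement); it reproduces the answers above when NFRAMES is set to 3 or 1, and can be used to try 2.

 #include <stdio.h>

 #define NFRAMES 2

 int main(void)
 {
     int refs[] = { 1,1,4,1,6,6,6,1,6,1,1,1,6,1,6,6,1 };
     int nrefs  = sizeof(refs) / sizeof(refs[0]);

     int frames[NFRAMES];
     int used = 0, faults = 0;
     int r, i;

     for (r = 0; r < nrefs; r++) {
         int page = refs[r], hit = -1;

         for (i = 0; i < used; i++)          /* is the page already in a frame? */
             if (frames[i] == page)
                 hit = i;

         if (hit < 0) {                      /* page fault */
             faults++;
             if (used < NFRAMES) {
                 hit = used++;               /* a free frame is available */
             } else {
                 /* evict least recently used page (slot 0), shift the rest down */
                 for (i = 1; i < used; i++)
                     frames[i - 1] = frames[i];
                 hit = used - 1;
             }
         } else {                            /* hit: move page to most-recent end */
             for (i = hit + 1; i < used; i++)
                 frames[i - 1] = frames[i];
             hit = used - 1;
         }
         frames[hit] = page;                 /* slot used-1 is most recently used */
     }

     printf("%d page faults with %d frames\n", faults, NFRAMES);
     return 0;
 }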




Page replacement

Q. The optimal page replacement strategy is to replace the page that will not be used for the longest time. Why can we not actually implement this strategy?


Q. Replacing the Least Recently Used page might be a good approximation of the impossible optimal strategy. But again, why do we not in general implement this strategy?

A. Overhead.


How we approximate a Least Recently Used strategy:
Hardware sets a bit when a page is used. If we keep resetting the bits, pages whose bit is still 0 have not been used recently - those are the ones we can get rid of.




Second chance algorithm

All memory references: Hardware sets page bit=1.

Go thru pages in memory (FIFO linked list):
{
	Select victim (according to FIFO)
	if (bit=1)
	{
	 set (bit=0) and move to end of linked list
	 (it now has a 2nd chance to get bit set to 1 again
	  before it rises back to the start of the list)
	 move on to next one (FIFO)
	}
	else if (bit=0)
	 replace victim
}
If page used often enough it stays in memory.
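
A rough C sketch of the same idea in its common "clock" form (my own illustration; the resident pages and reference bits are made-up values). Instead of physically moving a page to the end of the FIFO list, a "hand" sweeps around a circular list of frames, which comes to the same thing.

 #include <stdio.h>

 #define NFRAMES 4

 int page[NFRAMES]    = { 10, 11, 12, 13 };   /* made-up resident pages         */
 int ref_bit[NFRAMES] = {  1,  0,  1,  1 };   /* set to 1 by hardware on use    */
 int hand = 0;                                /* FIFO position                  */

 /* pick a victim frame: pages with bit = 1 get a second chance */
 int choose_victim(void)
 {
     for (;;) {
         if (ref_bit[hand] == 1) {
             ref_bit[hand] = 0;               /* clear bit: second chance  */
             hand = (hand + 1) % NFRAMES;     /* move on (FIFO)            */
         } else {
             int victim = hand;               /* bit = 0: replace this one */
             hand = (hand + 1) % NFRAMES;
             return victim;
         }
     }
 }

 int main(void)
 {
     int v = choose_victim();
     printf("victim is frame %d (page %d)\n", v, page[v]);
     return 0;
 }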

