Dr. Mark Humphrys

School of Computing. Dublin City University.

Online coding site: Ancient Brain

coders   JavaScript worlds

Search:


Files Implementation


Opening a file

When file first referenced, directory structure must be searched. Does file exist. Where is file actually stored on disk. Do I have permission to read (write) it. May also take the opportunity to read the first n locations, assuming they will be used, rather than (inevitably) make the disk seek later. This data is then cached so search does not have to be done again. e.g. In C++:


#include <iostream.h>
#include <fstream.h>

{
 ifstream f ( "dir/file53.txt", ios::in );

 if ( ! f )
 {
  cout << "Can't open file. \n";
  exit(1);
 }

// else just use the f pointer 
// no longer refer to it by file name (don't do search again)

 f >> x;		// read 1st line into variable 

}	// end of scope closes file (remove cache info in memory)




File system mounting

Mounting a file system is like opening a new file, for entire file system:
i.e. Things that need only be done once:

Where do we access the hardware. Where do we attach the new file system (assign a unique drive number, assign point in hierarchy). Does it contain a valid file system in expected format.




Contiguous allocation

Advantages:
Fast searching. Can quickly jump to block n for direct access (e.g. binary file with fields). Don't have to move disk head back in forth. Especially if writing sequential data, head is always at correct position.

Disadvantages:
Get fragmentation problems like with memory. 1 G of disk space free but no more than 50 M in a row.
Defragmentation can take hours.

  


Non-contiguous allocation




Linked Allocation

1st block links to 2nd block, 2nd links to 3rd, etc., the link field for the last block contains an EOF marker (in fact, EOF may be half-way through the last block - internal fragmentation).

To expand file, simply grab free block anywhere and link it on end of list.

  1. Links actually held with blocks on disk. Problem: Lot of disk reads to go to n'th block of file. Can't go there direct. Must follow the chain.

    or:

  2. Centralised Linked List.
    e.g. FAT (DOS/Win). File Allocation Table is a table of all blocks on disk, each entry is the no. of the next block in this file or EOF. Directory entry for a file contains no. of the start block of the file. Can just search thru the index in the FAT to go to n'th block of file - don't have to read the actual blocks on disk. Still have to follow a chain, but might be able to cache the whole FAT in memory - follow chain in memory rather than with disk-reads.

    Free-blocks list in FAT is rather elegant - it is one enormous linked list linking together all unused blocks. When delete a file, just remove directory entry and hook its linked list of blocks onto end of free list. Minimal work to delete:

    1. Remove directory entry and its pointer to start block.
    2. Make end of free list now point to start block.

    Windows NT family replaces FAT with NTFS but provides backward compatibility.




Indexed Allocation

Centralised index also, like FAT. (i.e. Data not scattered round disk.)
In this case, each file has a list of blocks that it is held in. e.g. File's index is:   9, 16, 1, 10, 25  
Just a variable-size array.
Benefit - Direct access to n'th block. Direct access is good - Easier to append. Or rewrite after location 300 K (only 2nd half of file has been edited).

Problem - Index is proportional to size of file.
Solutions:

  1. Linked indexes. 1st index gives first 100 locations, and pointer to 2nd index which has next 100 locations and pointer to 3rd index. Problem - possible long chain to search through.

  2. Multi-level indexes - More efficient. 1st index tells you, given address, which 2nd-level index to go to. May have to go to 3rd-level index to get actual block no. But, because of exponential growth, end up being able to access a 200 M file in only 3 levels of searches, whereas linked would take forever. Problem: Always need 3 levels of searches, even for small files. And 90 % of files are small files anyway. Hence:

  3. Combined scheme (e.g. UNIX inode)
    - Direct access to index for say first 50 K, then multi-level index for rest of file. Ends up can still deal with large files when necessary, yet 90 % of accesses are done directly.
Indexes, like FAT, stored on disk when machine turned off. FAT may be cached in memory at boot-up. Index for a file may be cached in memory by Open() system call.




Disk is large. I/O speed is the problem.

Also along these lines - pre-allocate a big chunk of disk to FAT / inodes (pre-allocated even if disk empty). Can reserve a good chunk of disk for these structures if they help fast I/O.

We also cache file information from Open() system call and disk blocks in memory. This is good - avoid disk I/O again. But limited because memory is limited. Block replacement algorithm like page replacement in demand paging - perhaps use LRU to decide who to kick out of cache.

Just like memory info sometimes ends up on disk (swapping, demand paging), so disk info sometimes ends up in memory (disk-block cache). Disk is overflow of crowded memory. Memory is not overflow of disk - disk is not crowded. Rather, memory is fast mirror of bits of disk that we are working on.
e.g. Read block, keep in memory, if read again can return the info immediately without any I/O either now or later.

Technically, system does not have to actually write blocks to disk until a read request comes in for that file (from this process or other process). But this can be dangerous - if power goes off we may lose data that we thought was saved to disk long ago. In reality write-to-disk is asynchronous but not delayed indefinitely. Actually will happen quite quickly if nothing else going on.




Keeping track of file size

How do we always know how big a file is?

file_size = (no. of blocks - 1) (size of block)
              + (how many bytes used in last block)
Last block has data (binary or text) ending with EOF / Ctrl-D char half-way through block.
If blocks have pointers to next block, last block has a NULL / -1 / EOF pointer.




UNDELETE - Part One

Many systems now have a "soft" delete, where "deleted" files are moved to a "Trash can", "Waste basket" or "Recycle bin", but this bin is not emptied until the user asks for it to be emptied. The files can be recovered at any time.

Obviously, if the files can be recovered, then the blocks they occupy on disk cannot be marked free. Rather, they are still in use.

Exercise - On a UNIX system where a Recycle bin may not exist, how would you implement your own?




UNDELETE - Part Two

When you finally empty the Recycle Bin, the blocks are finally marked free. However, even after this "real" DELETE action, the files still exist on disk!

Why? Because there is no need for the OS to actually overwrite those blocks on disk. Doing so would take time, and is not necessary. All that is necessary is that the blocks are now marked free and available for use by other files. Generally, DELETE does the absolute minimum necessary to mark the space free and move on.

One unfortunate consequence of this is that no OS really deletes your files, and anyone who is willing to put a bit of effort in might be able to recover long-ago deleted data from your disk.

Question - How would you implement a "delete forever" command?

Exercise - In the above, when you invented a "Recycle bin" for UNIX, how would you implement a "Empty Recycle bin"? (e.g. If you have over-ridden the system "rm" command, then how do you really rm things?)




UNDELETE in a FAT system (DOS/Windows)

In a FAT, UNDELETE is easy. Recall above - DELETE leaves the linked list of blocks intact (or, to be precise, entries in the FAT that can be interpreted as a linked list). It just hooks the linked list on to the end of the free-blocks linked list. One (perhaps unexpected) consequence of this is that the file is easy to recover. Just find the start-block and we can recover the whole file. Because we hook to the end of the free-list, the blocks may not be overwritten for a long time.

And in fact the old directory entry may still exist. All DELETE did was strike out the 1st char of the filename to mark the entry free. The no. of the start block is still intact. UNDELETE can list the marked-free directory entries, user supplies the 1st char of the filename, and UNDELETE is often able to recover the whole file.

If dir entry reused / overwritten, have to search every block on disk by hand. Although when find start block, can use FAT to find next blocks.
- If it was recently that you deleted, you could start search at end of free-blocks list.




Is UNDELETE a bad thing?

Perhaps UNDELETE being easy is an advantage of a FAT. Or perhaps it is a disadvantage. e.g. In the Monica Lewinsky case people under investigation were startled to discover that old deleted files of theirs could be undeleted and recovered from their PCs.

Perhaps an OS should make clear to the user if their data still exists, and "Empty the Trash Bin" should mean "Delete forever", even if it takes a long time. Perhaps have a "Shredder" icon. Could even say "Shredding these files will take 2 hours. Continue?"

It gets worse. Data must have been in memory sometime. Might have been in swap space. Could it still exist in swap file on disk? Conclusion - If selling computer, the only real security is to remove and destroy the hard disk first.




UNDELETE in UNIX

Lot more difficult. The blocks do not refer to each other. If the list of them is lost, we may have to read every single block on disk to recover the file - no idea where it is.

And the list is lost, because the inode index data structures are immediately reused by other files.

Summary - after "rm" on UNIX, the data blocks still exist on disk (vulnerable to overwriting of course), but to UNDELETE you might have to read the entire disk (i.e. every block) by hand.

If you delete a file and it is not in an old backup, it's effectively gone, unless you have the resources to pay someone to search disk by hand (and file is that important).




ancientbrain.com      w2mind.org      humphrysfamilytree.com

On the Internet since 1987.      New 250 G VPS server.

Note: Links on this site to user-generated content like Wikipedia are highlighted in red as possibly unreliable. My view is that such links are highly useful but flawed.