Dr. Mark Humphrys

School of Computing. Dublin City University.



Using files


File - A named section of disk.

Implementation: A file is not necessarily a contiguous section of disk (but that fact may be hidden from users and programs).
Normally both user and programmer never deal with the disk directly, but only through named files.

In some high-performance applications (e.g. writing a high-speed search engine), you may need to implement your own file system, but this is difficult and full of dangers.



File Types



File system divisions

Windows file system can spread over multiple pieces of hardware. Each is given its own (single-letter) drive:

 drive:\dir\file
Can also partition a single piece of hardware into multiple drives.

UNIX file system can spread over multiple pieces of hardware too. But everything appears as sub-directories of a single file hierarchy.
Path may indicate hardware, something equivalent to:

 /drive/dir/file
or may hide hardware entirely:
 /dir/file
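
The difference shows up when a program parses paths. A minimal sketch using Python's pathlib, which handles both styles without touching the disk. The example paths are made up for illustration:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows: the drive letter is a separate component of the path.
win = PureWindowsPath(r"C:\dir\file")
print(win.drive)         # the drive part, "C:"
print(win.parts)

# UNIX: no drive. Everything hangs off the single root "/".
nix = PurePosixPath("/dir/file")
print(repr(nix.drive))   # empty string: there is no drive component
print(nix.parts)
```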





Hierarchical file system

Can organise files in separate dirs (Many web authors seem not to have discovered sub-dirs!).
Crucial to keep user files separate from system files (Why?).
Windows C:\Users\me
UNIX $HOME
Can reuse same file names in different sub-dirs (like index.html).




Long file names

All modern OS's allow long filenames:
 photos.kenya.apr.1963.html


Legacy systems:




Short file names are good for ..

Short file names are good, though, for:

  1. File names you type. e.g. If you are typing file names at command-line. All-lower-case is easiest to type.

  2. Program names at the command-line (i.e. the program you call has a short filename). sed, grep, ls, cut, etc. All-lower-case is easiest to type.

  3. Some people say also URLs?
    Maybe you should never type URLs. At most you type the host name that you saw somewhere. For everything else you cut and paste, or click.

    Maybe short URLs: http://en.wikipedia.org/wiki/Othello make the web a more pleasant experience than long URLs:
    http://dmoz-odp.org/Arts/Literature/World_Literature/British/Shakespeare/Works/Plays/Tragedies/Othello/.

    It is nice to have short, "guessable" URLs.
    See "URL as UI"
    See URL shortening. (Used e.g. on Twitter.)




Short URLs should probably be used in posters and ads:
This health poster on campus caught my eye.
This probably should use a shorter, and lowercase, URL, like:
hse.ie/chlamydia

Q. Is there still a problem with that URL?



Some web server set-ups generate super-complex URLs, which can then get pasted into documents.
This is apparently a real ad.


  

Symbolic link (cross-link, breaking the hierarchy, "shortcut") in UNIX


Directory

Can selectively break the hierarchy with shortcuts.

 ln -s dir shortcut
or in Windows see "Create Shortcut".
To see where shortcut leads on Windows: Hover on it.

e.g. On one system I used, there was no real /bin dir. It was just a link to /usr/bin:

$ ls -l /bin
lrwxrwxrwx   1 root     root           9 Apr 14  1997 /bin -> ./usr/bin

File

Can also just give a file multiple names:
 ln -s file secondname
Or have pointer in one dir to file in other dir.
Can do this on Windows as well (have multiple shortcuts to a data file or program).
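
The same links can be made from inside a program. A sketch in Python, assuming a UNIX-like system (on Windows, creating symbolic links may need extra privileges). The names realdir, shortcut and secondname are made up, and everything happens in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    realdir = os.path.join(tmp, "realdir")
    os.mkdir(realdir)
    datafile = os.path.join(realdir, "file")
    with open(datafile, "w") as f:
        f.write("hello\n")

    # Equivalent of: ln -s realdir shortcut
    shortcut = os.path.join(tmp, "shortcut")
    os.symlink(realdir, shortcut)

    # Equivalent of: ln -s realdir/file secondname
    secondname = os.path.join(tmp, "secondname")
    os.symlink(datafile, secondname)

    link_target = os.readlink(shortcut)    # where the link points
    is_link = os.path.islink(secondname)   # it is a link, not a copy
    with open(secondname) as f:            # reading follows the link
        content = f.read()
    via_shortcut = os.path.exists(os.path.join(shortcut, "file"))
```

os.readlink does the job of the arrow in the ls -l output: it reports where the link leads without following it.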

On DCU Linux you will see lots of pointers:

$ ls -l /bin      | grep '^l'
$ ls -l /usr/bin  | grep '^l'
 
$ ls -l /usr/bin/touch
lrwxrwxrwx 1 root root 10 May  1  2020 /usr/bin/touch -> /bin/touch

  
Sometimes you see a link like this:
 /bin/ls -> /usr/bin/ls 
Q. Why do programs sometimes call a specific path to a program, e.g. they call /bin/ls rather than just ls ?



Problems with cross-links

With shortcuts, a recursive search of the disk can get into an infinite loop (a link may point back to a parent directory), or at least produce duplicates. e.g. List all files on disk: if you follow symbolic links, you may list files twice.
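
One common way round the loop problem is to remember which real directories have already been visited. A minimal sketch in Python, assuming a UNIX-like system where a directory is identified by its device and inode numbers. The function name list_files is made up, and permission errors are not handled:

```python
import os

def list_files(root):
    """Return files under root, following symbolic links to directories
    but visiting each real directory only once."""
    seen = set()      # (device, inode) of directories already visited
    found = []
    stack = [root]
    while stack:
        d = stack.pop()
        st = os.stat(d)                # stat follows links
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue                   # loop or duplicate: skip it
        seen.add(key)
        for entry in os.scandir(d):
            if entry.is_dir():         # follows symbolic links by default
                stack.append(entry.path)
            elif entry.is_file():
                found.append(entry.path)
    return found
```

This only removes duplicate directories. A file that has a second name through a file-level link would still be listed under both names.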

Q. Also, if you delete a file, do you delete the symbolic links to it? If so, how do you find them? Do you keep a reverse directory of them? Also, say I make a symbolic link to another user's file, and they delete the file. They can't delete my link.
A. If a link doesn't work, so what. Might even leave it dangling as a reminder.
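
There is no reverse directory, so dangling links have to be found by searching. A sketch in Python (the function name dangling_links is made up): a link is dangling if the link itself exists but its target does not.

```python
import os

def dangling_links(root):
    """Return paths of symbolic links under root whose target is missing."""
    broken = []
    # os.walk does not follow symbolic links to directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink: the link itself exists. exists: its target exists.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return broken
```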

Security

If your directory is accessible by others on your local machine, someone on your machine can make it readable by the world on the Web (either maliciously or accidentally):

cd     /homes/your-userid/public_html
ln -s  /homes/other-userid/dir          shortcut
The world can then read the other user's directory through:
http://host/~your-userid/shortcut/
Has valid uses too. Might want to make one of your own dirs visible without having to have it under public_html, e.g. public_html disk is full, dir is on another disk.
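
To check what a web directory exposes, one could list every link in it that leads outside. A sketch in Python, assuming a UNIX-like system. The function name links_leaving is made up, and the web server's own settings on following links are not considered:

```python
import os

def links_leaving(webroot):
    """Return (link, real_target) pairs for symbolic links under webroot
    that resolve to somewhere outside webroot."""
    base = os.path.realpath(webroot)
    escaping = []
    for dirpath, dirnames, filenames in os.walk(base):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)   # resolve all links
                # Outside if base is not a path prefix of the target.
                if os.path.commonpath([base, target]) != base:
                    escaping.append((path, target))
    return escaping
```

Running this on public_html would report the shortcut above, together with the directory it really points at.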

Another example - ftp may only drop you in home directory rather than root directory and you may not be able to go upwards. What you do is put symbolic links in your home directory and you can access any directory through them:

  ln -s /var/mail  email
  ln -s /htdocs    ht




"Hierarchy with some cross-links" a very powerful model

General conclusion is that a basic hierarchy, with some cross-links for difficult points, is an excellent way to structure complex data (e.g. Open Directory) - rather than a total cross-link free-for-all on one hand (e.g. the Web with just search engines and no directories), or a rigid hierarchy on the other (e.g. Dewey library system).

Interestingly, family trees are also basically hierarchical, with arbitrary cross-links, rather than strictly hierarchical as many people seem to think.




Recycle bin (Windows)

Windows Recycle bin visible through GUI, but also visible as directory through Windows command line:



Backup

If it's data (1's and 0's), there's no real excuse for losing it. You can make automated copies and store them all over the world. Disk space is big and cheap. Machines are often idle. The network is always on. Backups can be automated across the network by scripts.

In future, backup and long-term storage will be an increasingly important service, like a bank.

  1. Removable media - DVDs, CDs, tapes, USB keys, external hard disk.
    v.
  2. Backup to cloud / server. Distributed file system. Network read-write ftp, automated scripts, mirrors.




Other people back you up

Even if you back up nothing, your web pages are being backed up by other people:



Backup policy

  1. Periodically dump entire file system to backup.
    v.
  2. Keep a running "mirror", and only back up things that have changed since the last time they were synced.
Perhaps only back up user files.
OS, system and application files can be recovered from install CDs / tapes.
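
The second policy can be sketched in a few lines of Python. This is only an illustration, not a real backup tool: the names mirror and changed are made up, a file counts as changed if its size or modification time differs, and files deleted from the source are not removed from the mirror.

```python
import os
import shutil

def changed(s, d):
    """Treat a file as changed if size or modification time differs."""
    a, b = os.stat(s), os.stat(d)
    return a.st_size != b.st_size or int(a.st_mtime) != int(b.st_mtime)

def mirror(src, dst):
    """Copy files from src to dst that are new or changed since the
    last run. Returns the relative paths that were copied."""
    copied = []
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            d = os.path.join(target_dir, name)
            if not os.path.exists(d) or changed(s, d):
                shutil.copy2(s, d)   # copy2 keeps the modification time
                copied.append(os.path.normpath(os.path.join(rel, name)))
    return copied
```

The first run copies everything. Later runs copy only what has changed, which is what makes frequent backups cheap.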

Which of these is the most dangerous:

  1. Keep 1 synchronised copy of your files. Back up the changes every night.
  2. Keep 1 synchronised copy of your files. Back up the changes every hour.
  3. Take a copy of all of your files once a week. Keep all these old copies. Do no backups at all during the week.
  4. Take a copy of all of your files once a month. Keep all these old copies. Do no backups at all during the month.
Remember - it may take days or even months before an intrusion and destruction, or accidental damage, is noticed.
User may realise 2 years later that he has deleted some file and needs it back.
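
One way to reason about the question is a toy model in Python. The assumptions are mine, not part of any real backup system: backups run at hours 0, interval, 2*interval and so on, a synchronised copy faithfully copies the damage at its next run, and the function name good_copy_survives is made up.

```python
def good_copy_survives(interval, keep_all, damaged_at, noticed_at):
    """True if a backup taken before the damage still exists when the
    damage is noticed. Times are in hours."""
    backup_times = list(range(0, noticed_at + 1, interval))
    if not keep_all:
        backup_times = backup_times[-1:]   # only the latest copy exists
    return any(t < damaged_at for t in backup_times)

# Damage at hour 100, noticed three days later.
damaged, noticed = 100, 100 + 72
nightly_sync = good_copy_survives(24, False, damaged, noticed)
hourly_sync = good_copy_survives(1, False, damaged, noticed)
weekly_kept = good_copy_survives(24 * 7, True, damaged, noticed)
monthly_kept = good_copy_survives(24 * 30, True, damaged, noticed)
```

Try different gaps between damage and noticing it. The model says nothing about how much recent work is lost when an old copy is restored, which is the other side of the trade-off.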



