The Web
Many things can be done on the server-side
to speed up Web performance:
- Multi-threaded.
Start servicing a new client while still responding to the last one.
- Cache of (maybe huge numbers of) files in memory.
Disk reads are slow.
So don't make a separate disk access for every file request.
Instead, maintain a cache in RAM of frequently accessed files
and/or small files, which are easy to hold in RAM.
OS will cache files in RAM
and Web server can also cache files in RAM.
Example of caching files in RAM
Example: Search my
genealogy site.
Searches text of web pages.
Over 2,400 web pages.
But search is instant.
I think it is caching
every single web page in RAM.
Web pages are text files, and so are small compared to images and video.
2,400 pages is only about 20 MB in total.
Could easily hold all that in RAM.
The entire site is about 60 GB.
So the HTML text is less than 1/1000 of the site.
This is normal enough.
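The genealogy-site example suggests keeping every page in RAM. Below is a minimal Python sketch of that idea, not how that site actually works: the class `PageCache` is hypothetical, and the choice to re-read a file from disk only when its modification time has changed is an assumption.

```python
import os
from pathlib import Path


class PageCache:
    """Hold the text of web pages in RAM, keyed by file path."""

    def __init__(self):
        self._pages = {}  # path -> (modification time, page text)

    def get(self, path):
        """Return the page text, doing a disk read only if the file changed."""
        mtime = os.stat(path).st_mtime  # metadata check, no read of file contents
        cached = self._pages.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # served from RAM
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        self._pages[path] = (mtime, text)
        return text

    def load_site(self, root):
        """Pre-load every .html file under root (a few thousand pages fit easily)."""
        for p in Path(root).rglob("*.html"):
            self.get(str(p))

    def search(self, word):
        """Return paths of cached pages containing word (case-insensitive).

        Searches only the copies already in RAM, so there is no disk access.
        """
        word = word.lower()
        return sorted(path for path, (_, text) in self._pages.items()
                      if word in text.lower())
```

At about 20 MB for 2,400 pages, holding everything in a dictionary like this is cheap, and each search is just a scan over memory.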
- Multiple disks.
Site could be spread over multiple disks
to allow many reads going on at once.
- Defragmentation of disks.
Reduces seek times.
- Multiple servers.
"Server farm".
- Content delivery network (CDN)
- copies of resources are distributed to servers in many locations, close to users.
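The multi-threaded point at the top of this list can be sketched with Python's standard library. This is a minimal static file server for illustration, not a production Web server; the function name `make_server` and its defaults are assumptions.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(port=8000, directory=".", host="127.0.0.1"):
    """Build a multi-threaded static file server.

    ThreadingHTTPServer (Python 3.7+) handles each connection in its own
    thread, so the server can start servicing a new client while it is
    still responding to the last one.
    """
    # Serve files from the given directory rather than the current one.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)


# Usage (blocks until interrupted):
# make_server(8000, "public_html").serve_forever()
```

A single-threaded `HTTPServer` would make every client wait while one slow download finishes; the threaded version overlaps them.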
Related to how the site is designed:
- Minification
- Do various transforms to JS and other files (e.g. removing whitespace and comments)
to reduce size (reduce download time) and make parsing faster.
Though text files tend to be tiny anyway, so the gain may be small.
- Bundling of files.
One network request for a bundled JS file for the page,
instead of 20 network requests for 20 JS files.
Same for CSS - bundle into one CSS file.
Reducing network calls can make a big difference.
Example of file bundling
Example: On my Ancient Brain site,
at time of writing I have 5 JS files for each page,
that I bundle into one JS file
page.js.
And I have 13 CSS files for each page,
that I bundle into one CSS file
main.css.
- Small / low-resolution images (for any images used inline).
Can click to expand.
Definition of "small" changes over time.
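The bundling step can be as simple as concatenating files in a fixed order. This is a minimal Python sketch, not the author's actual build process for `page.js` and `main.css`; the function name `bundle` and the input file names in the usage line are hypothetical. Real bundlers do much more, such as resolving module imports.

```python
from pathlib import Path


def bundle(sources, output):
    """Concatenate several JS (or CSS) files into one bundle file,
    so the page makes one network request instead of one per file."""
    parts = []
    for src in sources:
        path = Path(src)
        text = path.read_text(encoding="utf-8")
        # In JS, a ";" guards against a file that ends without a semicolon.
        # A stray ";" would break the next CSS rule, so only add it for JS.
        guard = "\n;" if path.suffix == ".js" else ""
        # A comment marks where each original file starts (valid in JS and CSS).
        parts.append(f"/* ---- {path.name} ---- */\n{text}{guard}")
    Path(output).write_text("\n".join(parts) + "\n", encoding="utf-8")
    return output
```

Usage, with hypothetical source names: `bundle(["menu.js", "game.js"], "page.js")`. Order matters, since later scripts may depend on earlier ones.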
For high-demand sites:
Multiple copies of the entire site: a "server farm".
A front end routes requests to different CPUs.
Problem: OK to have all (small size) requests come in through one front end
and get routed to searching nodes.
Not OK to have all (large size) replies go back through one front end: it becomes a bottleneck.
Solution: TCP handoff
- a trick to have the searching node reply directly to the client,
in a manner that is invisible to the client (which still thinks it is talking to the front end).
The reply load is therefore distributed over all the nodes.
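The front end's routing job can be modelled as a round-robin dispatcher. This is a toy Python sketch under stated assumptions: the node names are hypothetical, round-robin is only one possible routing policy, and the TCP handoff itself happens in the network stack and is not shown.

```python
from itertools import cycle


class FrontEnd:
    """Round-robin dispatcher: each new request goes to the next node.

    Only the (small) requests pass through here; with TCP handoff the
    chosen node would send its (large) reply straight to the client.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._turn = cycle(self.nodes)
        self.handled = {node: 0 for node in self.nodes}  # requests per node

    def route(self, request_path):
        """Pick the node for this request; return (node, request_path)."""
        node = next(self._turn)
        self.handled[node] += 1
        return node, request_path
```

With three nodes, requests go to node1, node2, node3, node1, and so on, spreading the load evenly when requests cost roughly the same.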
- Caching can be done on server and client.
- Server can cache files in memory.
Could, say, check the file date each time the file is requested, and only do a disk read if it has changed.
- Client can cache files in memory or disk.
While its copy is still fresh, it does not ask the server. Just uses the local copy.
- Server can tell client how long to cache a resource for.
Uses HTTP headers (e.g. Cache-Control, Expires).
- MDN web docs
- Caching on Apache
- How long to cache for?
- Some files change regularly:
As site develops, HTML, CSS, JS might change many times.
- Some files hardly ever change: JPEG might be unchanged for years.
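How a server might pick these headers can be sketched in Python. The `MAX_AGE` table and its values are assumptions for illustration; `Cache-Control`, `Last-Modified` and `If-Modified-Since` are standard HTTP headers, and a `304 Not Modified` reply tells the client to keep using its cached copy.

```python
import os
from email.utils import formatdate, parsedate_to_datetime

# Hypothetical policy: files that change often get a short cache time,
# files that hardly ever change get a long one (values in seconds).
MAX_AGE = {
    ".html": 60 * 10,             # 10 minutes
    ".css": 60 * 60,              # 1 hour
    ".js": 60 * 60,               # 1 hour
    ".jpg": 60 * 60 * 24 * 365,   # 1 year
    ".jpeg": 60 * 60 * 24 * 365,
    ".png": 60 * 60 * 24 * 365,
}


def cache_headers(path, mtime):
    """Response headers telling the client how long to cache this file."""
    ext = os.path.splitext(path)[1].lower()
    max_age = MAX_AGE.get(ext, 0)
    return {
        # "no-cache" means the client must re-check with the server each time.
        "Cache-Control": f"max-age={max_age}" if max_age else "no-cache",
        "Last-Modified": formatdate(mtime, usegmt=True),
    }


def not_modified(if_modified_since, mtime):
    """True if the client's copy is still current, so the server can
    reply 304 Not Modified instead of resending the file."""
    if not if_modified_since:
        return False
    try:
        client_time = parsedate_to_datetime(if_modified_since).timestamp()
    except (TypeError, ValueError):  # unparseable date: just resend the file
        return False
    return int(mtime) <= client_time  # HTTP dates have 1-second resolution
```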
Server logs
HTTP servers can log all accesses.
Can have separate log for errors.
Typical web server logs
(apart from being colour-coded; normal logs are not colour-coded).
From askapache.com.
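Because each access is one line in a fixed layout, logs are easy to process with a script. This Python sketch assumes Apache's Common Log Format; the sample line in the test is the standard example from the Apache documentation, and the function names are hypothetical.

```python
import re
from collections import Counter

# Apache "Common Log Format": host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Split one access-log line into fields, or return None if it doesn't match."""
    m = CLF.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # "-" means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def status_counts(lines):
    """Count responses by HTTP status code (e.g. how many 404s)."""
    return Counter(e["status"] for e in map(parse_line, lines) if e is not None)
```

Because the regex only matches from the start of the line, it also accepts the longer "Combined" format, which appends referrer and user-agent fields.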
Some URL formats.
These show how the Web has tried to provide a unifying interface to all Internet protocols, data and activities.
URI schemes listed above (in use):
- http:
- plain HTTP.
Anyone who intercepts the traffic can see sensitive data.
- being replaced by https
- file:
- can browse off disk
- very useful
(may not need prefix)
- mailto:
- very useful
- but spammers search for these
Obsolete:
- ftp:
- guest login with no password - pre-web publication system
- news: - Usenet
- pre-web discussion system
- now survives on Google Groups
- gopher: - pre-web publication system
- telnet:
- telnet login to a server as guest
- not used any more
Others (media):
- mms: (Microsoft protocol)
- old streaming media
- rtsp:
- streaming media
(old RTE streaming)
- rtmp:
- streaming media
(old RTE streaming, also TG4)
- A lot of streaming now uses http: and https:,
using adaptive bitrate streaming.
Others (phone):
- wtai:
- old WAP system - can launch phone call
- tel:
- launch phone call
- callto:
- launch phone call / chat
- sms:
- send text to phone
Others:
- https: - secure HTTP.
After years of being used only for serious sites involving shopping, banking and logins,
this has taken off and become the norm for the entire Web.
Mixed content rule:
an https page should include only https content, not http content. Browsers may block or warn about mixed content.
- data:
- can include an image directly in the page, without it being a separate file.
- view-source:
- can make a link like this to view the source of a page.
Blocked by Chrome, and yet when you view source, you get a "view-source:" URL. So why block it?
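The scheme is simply the part of a URL before the first colon. This Python sketch (function names hypothetical) uses the standard `urllib.parse.urlsplit` to extract it, flags http resources on an https page under the mixed content rule, and builds a `data:` URL from raw bytes.

```python
import base64
from urllib.parse import urlsplit


def scheme_of(url):
    """Return the URI scheme (the part before the first ':'), lower-cased."""
    return urlsplit(url).scheme


def mixed_content(page_url, resource_urls):
    """List resources that an https page would load over plain http.

    Relative URLs (no scheme) inherit the page's scheme, so they are fine.
    """
    if scheme_of(page_url) != "https":
        return []
    return [u for u in resource_urls if scheme_of(u) == "http"]


def data_url(raw_bytes, mime_type):
    """Embed bytes directly in a data: URL instead of a separate file."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For example, `scheme_of("mailto:someone@example.com")` gives `"mailto"`, and `data_url(b"hi", "text/plain")` gives `"data:text/plain;base64,aGk="`.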
HTTP client
Web browser
Uses MIME types to decide how to handle content it cannot display itself:
(a) Plug-in - runs inside the browser process.
(b) Helper application - runs as a separate process.
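The browser's decision can be sketched as a lookup on the MIME type, which the server normally sends in the `Content-Type` header. This Python sketch is illustrative only: the tables of types are hypothetical, and plug-ins such as Flash are now obsolete in real browsers.

```python
import mimetypes

# Hypothetical table of what a browser does with each MIME type.
SHOWN_BY_BROWSER = {"text/html", "text/plain", "text/css",
                    "image/jpeg", "image/png", "image/gif"}
PLUG_IN_TYPES = {"application/x-shockwave-flash"}  # Flash: plug-ins now obsolete


def handling_for(mime_type):
    """Decide how to handle content, given its MIME type (Content-Type)."""
    if mime_type in SHOWN_BY_BROWSER:
        return "browser displays it"
    if mime_type in PLUG_IN_TYPES:
        return "plug-in (inside browser process)"
    return "helper application (separate process)"


def handling_for_file(filename):
    """A local file (file: URL) has no Content-Type header,
    so guess the MIME type from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return handling_for(mime_type or "application/octet-stream")
```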
HTTP is stateless, so we need ways of relating one client-server request
with other client-server requests. Uses:
Identify user (pay-to-view, register, personalisation).
Shopping carts.
- Cookies
- Server sends data that is stored on client side (in file).
- Server can only read cookies that it previously sent
(not other sites' cookies).
- How to see your cookies
in different browsers.
- PHP cookies
- Security issue:
- Can you spoof someone else's cookie?
- e.g. For user login: if the userid is stored as a simple cookie, you could set the cookie:
userid = someoneelse
and then be logged in as them.
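- Sessions are more secure:
the data is kept on the server, and the client only holds a long random session ID that is hard to guess.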
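The difference between a spoofable cookie and a session can be sketched with Python's standard `http.cookies` and `secrets` modules. The names `SESSIONS`, `start_session` and `current_user` are hypothetical; a real site would keep sessions in a database or cache, and send the cookie over https with the Secure flag.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session id -> user data, held on the server only


def start_session(userid):
    """Log a user in: store their data server-side, return a Set-Cookie value."""
    sid = secrets.token_hex(32)  # 256 random bits: infeasible to guess
    SESSIONS[sid] = {"userid": userid}
    cookie = SimpleCookie()
    cookie["sessionid"] = sid
    cookie["sessionid"]["httponly"] = True  # page JavaScript cannot read it
    cookie["sessionid"]["path"] = "/"
    return cookie["sessionid"].OutputString()


def current_user(cookie_header):
    """Work out who is logged in from the request's Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("sessionid")
    if morsel is None:
        return None
    session = SESSIONS.get(morsel.value)
    return session["userid"] if session else None
```

A forged `userid=someoneelse` cookie is simply ignored here: the server only trusts the user ID it stored itself, found via the random session ID.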
Many things can be done on the client-side
to speed up Web performance.
Actually, all of these things, though taking place on the client, involve server support too:
- Client-side caching
- Site-wide (or ISP-wide) cache
via proxy server.
- Lazy load
- of images etc.
- Infinite scroll
- Load more of page on scroll to bottom.
Use with moderation.
See article
about why this is only suitable for some types of sites.
- Delayed loading of resources.
Delayed running of scripts.
Fetch some resources / run some JS only after initial page is rendered.
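Infinite scroll runs in the browser, but the server has to supply content in batches. This Python sketch shows one way the server side might do it; the endpoint `/more`, the `offset` parameter and the batch size of 20 are assumptions.

```python
import json
from urllib.parse import parse_qs


def page_of(items, offset=0, limit=20):
    """Return one batch of items, plus the offset of the next batch
    (None when there is nothing more to load)."""
    offset = max(0, offset)
    batch = items[offset:offset + limit]
    next_offset = offset + len(batch)
    return {"items": batch,
            "next": next_offset if next_offset < len(items) else None}


def handle_more(query_string, items, limit=20):
    """Answer a hypothetical request like GET /more?offset=40 with JSON.
    The page's JavaScript would call this when the user nears the bottom."""
    params = parse_qs(query_string)
    offset = int(params.get("offset", ["0"])[0])
    return json.dumps(page_of(items, offset, limit))
```

When `next` comes back as `null`, the page's script knows to stop asking for more.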
DCU is (apparently) not using proxy servers any more.
But they are still in use outside DCU.
In DCU, some machines may communicate with the outside world through a proxy server.
Some communicate directly (not through a proxy).
- wwwproxy.computing.dcu.ie
= 136.206.11.243
(forwards requests through 136.206.11.249)
- proxy.dcu.ie
alternates between different IP addresses
(for load balancing)
- port: 8080 or 3128
- lookup shows it alternates randomly between:
- 136.206.1.17
- 136.206.1.20
To set proxy, something like:
- Firefox - Tools - Options - Advanced - Network - Settings
- IE - Tools - Options - Connections - LAN settings
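Browsers are not the only clients that can use a proxy. As a hedged Python sketch, `urllib.request` can be told to send requests through one; the proxy address in the usage comment is a placeholder, not a real DCU setting. If no proxies are given explicitly, urllib picks them up from the `http_proxy`/`https_proxy` environment variables by default.

```python
import urllib.request


def proxy_opener(proxy_url):
    """Build a URL opener that sends http and https requests via a proxy,
    e.g. proxy_url = "http://proxy.example.com:8080" (placeholder)."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (needs network access and a working proxy):
# opener = proxy_opener("http://proxy.example.com:8080")
# print(opener.open("http://www.example.com/").status)
```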
You may use a proxy auto-config (PAC) file:
- https://www.computing.dcu.ie/proxy.pac
- http://proxy.dcu.ie/proxy.pac
Test the IP address other sites see: