Dr. Mark Humphrys

School of Computing. Dublin City University.



Link checker


Write a Java program to:
  1. Take a URL as a command-line argument.
  2. Catch errors if the URL is malformed or not found.
  3. If the URL is good, download the page.
  4. Extract all links in the page.
    • See Parsing HTML with Java.
  5. Find all broken links.
    • For this exercise, we will narrowly define a "broken" link as any link with an HTTP return code of 404, or a link that times out.
    • For timeout settings see Networking Properties.
  6. Output is a web page:
    • Output the list of broken links to a web page that you can browse (offline), with each link clickable.
    • Use this for debugging. If your program claims a link is broken, you can test it here.
    • Do not bother listing any links to Google.
    • Only list URLs that return code 404 or time out. Do not list other URLs.
    • Remove all duplicates.
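
The fetching and checking steps above might be sketched as follows. This is only a rough outline under several assumptions, not a model answer: the class and method names (LinkCheckerSketch, extractLinks, isBroken) are made up for illustration, the 5000 ms timeout is an arbitrary choice, and a crude regular expression stands in for a proper HTML parser. It sets timeouts on each connection with setConnectTimeout and setReadTimeout rather than through system-wide networking properties.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkCheckerSketch {
    // Assumption: a crude matcher for href values in <a> tags. It stops at a
    // quote or '#', so fragments are dropped. A real HTML parser is more robust.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct http(s) links in html, resolved against base. */
    static Set<String> extractLinks(String html, URI base) {
        Set<String> links = new LinkedHashSet<>(); // a set removes duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href (for example, one containing spaces): skip it.
            }
        }
        return links;
    }

    /** Narrow definition from the exercise: HTTP 404, or a timeout. */
    static boolean isBroken(String link, int timeoutMillis) {
        try {
            URLConnection conn = URI.create(link).toURL().openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return false; // not an http(s) link
            }
            HttpURLConnection http = (HttpURLConnection) conn;
            http.setConnectTimeout(timeoutMillis);
            http.setReadTimeout(timeoutMillis);
            try {
                return http.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND;
            } finally {
                http.disconnect();
            }
        } catch (SocketTimeoutException e) {
            return true; // connect or read timed out
        } catch (IOException | IllegalArgumentException e) {
            return false; // other failures fall outside the narrow definition
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java LinkCheckerSketch <url>");
            return;
        }
        URI page;
        String html;
        try {
            page = URI.create(args[0]);
            try (InputStream in = page.toURL().openStream()) {
                html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IllegalArgumentException | IOException e) {
            // Bad URL syntax, unknown host, or page not found.
            System.err.println("Cannot fetch " + args[0] + ": " + e);
            return;
        }
        for (String link : extractLinks(html, page)) {
            if (isBroken(link, 5000)) { // 5000 ms is an arbitrary timeout
                System.out.println(link);
            }
        }
    }
}
```

This version only prints the broken links to standard output. Any failure other than a 404 or a timeout (DNS errors, refused connections, other status codes) is deliberately not reported, to match the narrow definition above.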



Test on these URLs:

Your final output should demonstrate your program working on these URLs:

https://humphryscomputing.com/computers.internet.links.html
https://humphryscomputing.com/news.links.html
https://humphrysfamilytree.com/links.html
https://humphrysfamilytree.com/sources.html
https://humphrysfamilytree.com/sources.local.html
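
The output page for each of the URLs above could be generated along the following lines. This is a minimal sketch with hypothetical names (BrokenLinkReport, filter, toHtml). It assumes that "links to Google" means the host google.com or any of its subdomains, which is only one possible reading, and it expects a list of links that some other code has already found to be broken.

```java
import java.net.URI;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BrokenLinkReport {
    /** Drops duplicates and Google links, keeping first-seen order. */
    static Set<String> filter(Collection<String> brokenLinks) {
        Set<String> kept = new LinkedHashSet<>();
        for (String link : brokenLinks) {
            String host;
            try {
                host = URI.create(link).getHost();
            } catch (IllegalArgumentException e) {
                host = null; // unparsable link: keep it so it can be inspected
            }
            // Assumption: "Google" means google.com and its subdomains only.
            boolean google = host != null
                && (host.equalsIgnoreCase("google.com")
                    || host.toLowerCase().endsWith(".google.com"));
            if (!google) {
                kept.add(link);
            }
        }
        return kept;
    }

    /** Escapes characters that are special in HTML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Builds a small HTML page listing each broken link as a clickable link. */
    static String toHtml(String pageUrl, Collection<String> brokenLinks) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">")
          .append("<title>Broken links</title></head><body>\n");
        sb.append("<h1>Broken links in ").append(escape(pageUrl)).append("</h1>\n<ul>\n");
        for (String link : filter(brokenLinks)) {
            String e = escape(link);
            sb.append("<li><a href=\"").append(e).append("\">").append(e).append("</a></li>\n");
        }
        sb.append("</ul>\n</body></html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder input for illustration only, not real results.
        List<String> sample = List.of(
            "http://example.com/missing",
            "https://www.google.com/search?q=x",
            "http://example.com/missing");
        System.out.print(toHtml("http://example.com/", sample));
    }
}
```

Redirecting the output to a file (for example, java BrokenLinkReport.java > broken.html) gives a page that can be opened in a browser to click through each reported link.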


To hand up:

See What to hand up. Include a printout of the output table produced when the program is run on the URLs above.

