Link checker
Write a Java program to:
- Take a URL as a command-line argument.
- Catch errors if bad URL or URL not found.
- If good URL, download the page.
- Extract all links in the page.
- See Parsing HTML with Java
- Find all broken links.
- For this exercise, we will narrowly define a "broken" link as any link that returns an HTTP status code of 404, or any link that times out.
- For timeout settings, see Networking Properties.
- Output is a web page:
- Output the list of broken links to a web page that you can browse (offline) and click on the links.
- Use this for debugging: if your program claims a link is broken, you can test it here.
- Do not bother listing any links to Google.
- Only list URLs that return code 404 or that time out. Do not list other URLs.
- Remove all duplicates.
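
The steps above can be sketched roughly as follows. This is only one possible shape, not a model answer. The class name `LinkCheckSketch`, the 10-second `TIMEOUT_MS`, and the helpers `isGoogle`, `extractLinks` and `isBroken` are all made up for illustration. The regular expression is a naive stand-in for a proper HTML parser, and "links to Google" is assumed here to mean hosts at or under `google.com`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkCheckSketch {
    // Assumed timeout; the assignment does not fix a value.
    static final int TIMEOUT_MS = 10_000;

    // Naive href matcher (assumption): captures the value up to any quote or '#'.
    static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /** Assumed meaning of "links to Google": google.com and its subdomains. */
    static boolean isGoogle(String host) {
        String h = host.toLowerCase();
        return h.equals("google.com") || h.endsWith(".google.com");
    }

    /** Collects absolute http(s) links in page order, without duplicates or Google links. */
    static Set<String> extractLinks(String html, URI base) {
        Set<String> links = new LinkedHashSet<>(); // a Set drops duplicates
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI u = base.resolve(m.group(1).trim()); // make relative links absolute
                String scheme = u.getScheme();
                boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
                if (web && u.getHost() != null && !isGoogle(u.getHost())) {
                    links.add(u.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed href: skipped in this sketch.
            }
        }
        return links;
    }

    /** True only for HTTP 404 or a timeout; any other failure is not "broken" here. */
    static boolean isBroken(String link) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(link).toURL().openConnection();
            conn.setConnectTimeout(TIMEOUT_MS);
            conn.setReadTimeout(TIMEOUT_MS);
            try {
                return conn.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND;
            } finally {
                conn.disconnect();
            }
        } catch (SocketTimeoutException e) {
            return true;  // timed out
        } catch (IOException | IllegalArgumentException e) {
            return false; // e.g. unknown host: outside the narrow definition
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java LinkCheckSketch <url>");
            return;
        }
        try {
            URI pageUri = URI.create(args[0]);
            URLConnection conn = pageUri.toURL().openConnection();
            conn.setConnectTimeout(TIMEOUT_MS);
            conn.setReadTimeout(TIMEOUT_MS);
            String html;
            try (InputStream in = conn.getInputStream()) {
                html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            for (String link : extractLinks(html, pageUri)) {
                if (isBroken(link)) System.out.println(link);
            }
        } catch (IllegalArgumentException | MalformedURLException e) {
            System.err.println("Bad URL: " + args[0]);
        } catch (IOException e) {
            System.err.println("Could not download " + args[0] + ": " + e);
        }
    }
}
```

The sketch sets timeouts per connection with `setConnectTimeout` and `setReadTimeout`. The system properties `sun.net.client.defaultConnectTimeout` and `sun.net.client.defaultReadTimeout` are another way to get a similar effect. The sketch prints broken links to the console rather than writing the required web page.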
Test on these URLs:
Your final output should demonstrate your program working on these URLs:
https://humphryscomputing.com/computers.internet.links.html
https://humphryscomputing.com/news.links.html
https://humphrysfamilytree.com/links.html
https://humphrysfamilytree.com/sources.html
https://humphrysfamilytree.com/sources.local.html
To hand up:
What to hand up
(Include a printout of the output table when run on the URLs above.)
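
As a rough illustration of the output table, the sketch below renders a list of broken links as a standalone HTML page with clickable links. The class name `BrokenLinkReport`, the file name `broken-links.html`, the two-column layout and the placeholder link in `main` are all assumptions, not requirements of the exercise.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class BrokenLinkReport {
    /** Escapes characters that are special in HTML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds a page with one table row per broken link (layout is an assumption). */
    static String render(String checkedPage, List<String> brokenLinks) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">")
          .append("<title>Broken links</title></head><body>\n")
          .append("<h1>Broken links in ").append(escape(checkedPage)).append("</h1>\n")
          .append("<table border=\"1\">\n<tr><th>#</th><th>Link</th></tr>\n");
        int n = 1;
        for (String link : brokenLinks) {
            String safe = escape(link);
            sb.append("<tr><td>").append(n++).append("</td><td><a href=\"")
              .append(safe).append("\">").append(safe).append("</a></td></tr>\n");
        }
        sb.append("</table>\n</body></html>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder data; a real run would pass in the links found to be broken.
        List<String> broken = List.of("https://example.org/missing.html");
        Path out = Path.of("broken-links.html"); // assumed output file name
        Files.writeString(out, render("https://example.org/", broken), StandardCharsets.UTF_8);
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Opening the generated file in a browser lets you click each listed link and confirm by hand that it really is broken.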