How to write a Shell script to download videos from YouTube
Here is a challenging Shell script exercise.
Write a program to download a video from YouTube.
In Shell script.
From first principles.
Introduction
Ever since YouTube started video hosting in 2005, it has been possible to write a Shell script to download a video from YouTube. As at 2020, it is still possible. It is a nice demo of the power of Shell scripts.
There are lots of YouTube downloaders out there, but it is a good exercise to write one yourself from first principles. You may be surprised that one can be written in Shell.
Usage
Usage will be like:
youtube (url)
For example:
youtube https://www.youtube.com/watch?v=rQZfCd9BOJE
The script should find the MP4 file and download it to y.mp4.
Many formats may exist.
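A possible starting point for the script is sketched below, assuming a POSIX shell. The function name check_args and the error messages are made up for this example. It only checks that exactly one argument was given and that it looks like a URL.

```shell
#!/bin/sh
# Sketch of argument handling for the "youtube" script.
# check_args is a hypothetical helper name, chosen for this example.

check_args() {
    # Expect exactly one argument: the URL of the YouTube page.
    if [ "$#" -ne 1 ]; then
        echo "Usage: youtube (url)" >&2
        return 1
    fi
    # Accept only something that looks like a web URL.
    case "$1" in
        http://*|https://*) return 0 ;;
        *) echo "Not a URL: $1" >&2
           return 1 ;;
    esac
}
```

In the finished script this could be called as: check_args "$@" || exit 1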
Testing
URL of the page
The URL you see in your browser:
https://www.youtube.com/watch?v=ID
is a permanent "home page" for the movie, with comments, related movies, etc.
It is not the URL of the movie itself.
However if we look inside the
HTML source of this page, we can find the URL of the movie.
The movie may only exist at that URL for the next few minutes.
URL of the video
Where is the URL of the actual video
(as opposed to the URL of the page)?
In fact there are multiple video URLs for different formats.
They are buried deep inside the source code.
They look like this:
"https:\/\/SERVERID.googlevideo.com\/videoplayback?ARG=VALUE\\u0026ARG=VALUE\\u0026 ..."
They are delimited by double quotes.
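Because the URLs are delimited by double quotes, one way to pull them out is to split the page on the double-quote character and keep the pieces that mention googlevideo.com. This is a minimal sketch. The function name raw_video_urls and the sample line are made up for illustration.

```shell
# raw_video_urls: read HTML on standard input and print every
# double-quoted piece that mentions googlevideo.com, one per line.
# The URLs are still in their escaped form at this point.
raw_video_urls() {
    tr '"' '\n' | grep 'googlevideo\.com'
}

# Made-up sample line in the style of the page source.
sample='x = "https:\/\/SERVERID.googlevideo.com\/videoplayback?a=1\\u0026b=2";'
printf '%s\n' "$sample" | raw_video_urls
```

printf '%s\n' is used rather than echo so that the backslashes in the sample are passed through untouched.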
Fixing the URL
The URL of the video looks a bit strange.
It seems like we need to make some changes to it:
- "https:\/\/" looks like it should be changed to "https://"
- "\/" looks like it should be changed to "/"
- If you are familiar with how URLs pass arguments, "\\u0026" looks like it should be changed to an ampersand ("&"). The code 0026 is the Unicode code for "&".
That is pretty much it.
Make these fixes and you can fetch the video.
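The fixes above could be done with a single sed command, sketched here on a made-up URL. The function name fix_url is hypothetical. As an extra assumption, the pattern accepts one or more backslashes in front of u0026, in case the page uses "\u0026" as well as "\\u0026".

```shell
# fix_url: read escaped URLs on standard input, print usable ones.
fix_url() {
    # 1st expression: one or more backslashes followed by u0026 become "&".
    #   "\\" in the pattern is one literal backslash; "\&" in the
    #   replacement is a literal ampersand.
    # 2nd expression: delete every remaining backslash, so "\/" becomes "/".
    sed -e 's/\\\\*u0026/\&/g' -e 's/\\//g'
}

printf '%s\n' 'https:\/\/SERVERID.googlevideo.com\/videoplayback?ARG=VALUE\\u0026ARG=VALUE' | fix_url
```

Deleting all backslashes is safe here only because a usable URL of this kind is not expected to contain any.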
Recipe that currently works
This recipe works as at 2020:
- Take in the URL of the page as a command-line argument.
- Use wget to get the web page.
Top tip: Get the web page once and save it to a file. Then when debugging, use that file, without going to fetch the page from YouTube again. When you have debugged the program, you can fix it so it always fetches the page.
- Use sed to put a new line in front of every "http". This is to isolate the "http" lines.
- Then grep for "googlevideo".
- Use tr to change all double quotes (") to new lines.
- grep for "googlevideo" again.
- grep for "videoplayback".
Now we have a short list of URLs.
But the URLs need some editing.
- Use sed to change all "\\u0026" to "&".
Top tip: "&" means something to sed in the replacement. So use "\&" which means "literally the ampersand character".
Also notice we have "\" in the first pattern (the pattern to search for). That has special meaning to sed too: it is the escape character, so each literal backslash has to be written as "\\". You can either fix that now, or note that any backslash left over will be deleted by the next step.
See what the URLs look like now. Do they look normal yet?
- Use tr to delete the "\" character.
OK now do the URLs look normal?
- You now have a listing of multiple video URLs
with different "itag" values.
The itag values look like
itag=VALUE
- Pick the URL to download based on this guide to itags: YouTube video stream format codes.
I suggest this one:
- itag=18 (MP4 360).
Save as file.mp4
Alternatives include:
- itag=5 (FLV).
Save as file.flv
- itag=22 (MP4 720).
Save as file.mp4
Not all formats always exist.
- We pipe the above to a grep for the itag we want.
- We now have a single URL, that looks something like this:
https://r1---sn-q0cedn7s.googlevideo.com/videoplayback?expire=1584569784&ei=V0lyXpLcOpuwxN8Pr8ynuAI&ip=136.206.217.30&id=o-AIqQ_-mxoy2Hncpz_rfUDe5HbfwbhqfhkvNhmTuYIQen&itag=18&source=youtube&requiressl=yes&mh=9w&mm=31%2C26&mn=sn-q0cedn7s%2Csn-5hne6nsr&ms=au%2Conr&mv=m&mvi=0&pl=16&initcwndbps=1503750&vprv=1&mime=video%2Fmp4&gir=yes&clen=752255&ratebypass=yes&dur=22.453&lmt=1559497970109948&mt=1584548075&fvip=1&c=WEB&txp=5431432&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cgir%2Cclen%2Cratebypass%2Cdur%2Clmt&sig=ADKhkGMwRgIhAIQYDX0NVV_9eQX57RzjTNKe4wPBWAXwdzhGcRGw7fxrAiEA7W2dAd6aZGw9edUHEDgLAanvI5Bm98WWVfrux7O9xmk%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=ABSNjpQwRgIhAM2L6hyS3JFtbQ6M5F7bGi8grfz6MNOb_EZ2cPtLwbB4AiEAy_LiHMqu3DI1_DCqTjTlZ0ykwq7l1wN3vy36Eudgdco%3D
- wget the URL to output to a file like file.mp4 and you are done!
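The steps above can be put together roughly as follows. This is a sketch of the recipe, not a definitive implementation: all function names and the cache file page.html are made up for the example, the sed pattern assumes one or more backslashes in front of u0026, and head is added as a safeguard in case more than one URL carries the same itag. The functions only run when called, so the filters can be tested on a saved page without going to YouTube.

```shell
#!/bin/sh
# Sketch of the recipe. All function and file names are hypothetical.

# get_page URL FILE: fetch the page once; reuse FILE if it already exists.
# Handy while debugging. Remove the test to always fetch the page.
get_page() {
    if [ ! -s "$2" ]; then
        wget -q -O "$2" "$1"
    fi
}

# video_urls: read the page on standard input, print cleaned-up video URLs.
# The first sed puts a new line in front of every "http"; the line break
# inside the quotes is deliberate and must not be indented.
video_urls() {
    sed 's/http/\
http/g' |
    grep 'googlevideo' |
    tr '"' '\n' |                 # double quotes become new lines
    grep 'googlevideo' |
    grep 'videoplayback' |
    sed 's/\\\\*u0026/\&/g' |     # escaped u0026 becomes a literal &
    tr -d '\\'                    # delete all remaining backslashes
}

# pick_itag N: keep only the first URL whose itag parameter is exactly N.
pick_itag() {
    grep -E "[?&]itag=$1(&|\$)" | head -n 1
}

# youtube URL: download the itag=18 (MP4 360) stream to file.mp4.
youtube() {
    get_page "$1" page.html || return 1
    url=$(video_urls < page.html | pick_itag 18)
    if [ -z "$url" ]; then
        echo "No itag=18 URL found" >&2
        return 1
    fi
    wget -O file.mp4 "$url"
}
```

To use it, add a last line that calls: youtube "$1"
While debugging, run "video_urls < page.html" on its own to see the list of URLs before one is picked.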
Play video
Video can be played in various ways, depending on installation:
- In browser.
Use file://
Or put video in web directory and use http://
- "Videos"
- VLC.
Might have to change:
Tools - Preferences - Audio - Output Type - UNIX OSS audio
- RealPlayer
The script can launch the player automatically:
vlc file &
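The launch step might be wrapped so that the script does not fail when VLC is not installed. This is a minimal sketch. The helper names first_available and play are hypothetical, and the command names vlc and totem (assumed here to be the command for "Videos") depend on the installation.

```shell
# first_available CMD...: print the first command name that is installed.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# play FILE: launch the first player found, in the background.
# The list of player commands is an assumption; adjust to your installation.
play() {
    player=$(first_available vlc totem) || {
        echo "No video player found" >&2
        return 1
    }
    "$player" "$1" &
}
```

It could then be called at the end of the script as: play file.mp4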
Links