Tag Archive : cache


Caching YouTube Videos with Squid and Nginx

December 9, 2015 | Article

In this article we will discuss how to cache YouTube videos using Squid as a caching proxy server and Nginx as a web server. For this article I use:

  1. Squid
  2. Nginx
  3. Ruby
  4. FreeBSD as server OS

The important parts of this article are Squid, Nginx, and Ruby. If you have installed Squid and Nginx on another operating system, that's fine: the method described here is generic.

In the rest of the article, I assume you have installed Squid, Nginx, and Ruby. There is no strict version requirement, but I recommend installing the latest versions you can find.

Advantages and Disadvantages

In most parts of the world bandwidth is expensive, especially in developing countries. In some scenarios, caching YouTube or other Flash videos can speed up loading and save bandwidth. A single video can easily be a dozen megabytes or more. With this method the video is saved in the cache, and when another user requests the same video it is served from the cache instead of being downloaded again. As long as users request the same content, this saves a lot of bandwidth.

People on the same LAN sometimes watch the same videos. When someone posts a YouTube link on Facebook, Twitter, or other social media, other people are likely to watch it too, so that particular video may be viewed many times within a few hours. Since videos are usually shared this way, the chance of multiple hits on popular videos among my LAN users and friends is high.

Despite these advantages, caching YouTube videos also has disadvantages.

The chance that another user watches the same video is not that high, and may even be small. A specific search on YouTube returns hundreds of results. The more results (and video variations) there are, the smaller the chance that another user searches for the same thing and clicks the same result. That's why we say the probability is small. For example, if a search returns 300 results, basic probability tells us that the chance of clicking the same video is 1/300. Quite small.
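Under the simplest assumptions (each of the N search results is equally likely to be clicked, and the second user's choice does not depend on the first user's), the 1/300 figure works out as:

```latex
P(\text{second user clicks the cached video}) = \frac{1}{N} = \frac{1}{300} \approx 0.33\%
```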

If you intend to cache YouTube videos, you will need fast hardware with plenty of disk space to hold such a cache.

Squid Configuration

You can either replace your Squid configuration entirely or modify it. If you have already configured other things, it is better to modify it. Rather than giving a complete configuration file, this article points out which directives you need to add to it.

On FreeBSD the default configuration file is usually /usr/local/etc/squid/squid.conf. The location may differ on other operating systems or if you installed Squid from source. In the rest of the article, we will refer to the configuration file as squid.conf.

Now edit squid.conf and adjust these settings to your needs:

##
# PORT and Transparent Option
##
http_port 8080 transparent
server_http11 on
icp_port 0

# Number of rotated log files to keep (14 days with daily rotation)
# You need to rotate your log files with a cron job. For example:
# 0 0 * * * /usr/local/squid/bin/squid -k rotate
logfile_rotate 14
debug_options ALL,1
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log

##
# Cache object size
##
cache_mem 8 MB
minimum_object_size 0 bytes
maximum_object_size 100 MB
maximum_object_size_in_memory 128 KB

##
# Youtube Cache Section
##
url_rewrite_program /usr/local/etc/nginx/nginx.rb
url_rewrite_host_header off
acl youtube_videos url_regex -i ^http://[^/]+\.youtube\.com/videoplayback\?
acl range_request req_header Range .
acl begin_param url_regex -i [?&]begin=
acl id_param url_regex -i [?&]id=
acl itag_param url_regex -i [?&]itag=
acl sver3_param url_regex -i [?&]sver=3
cache_peer 127.0.0.1 parent 8081 0 proxy-only no-query connect-timeout=10
cache_peer_access 127.0.0.1 allow youtube_videos id_param itag_param sver3_param !begin_param !range_request
cache_peer_access 127.0.0.1 deny all
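To check which requests the cache_peer_access rule above actually hands to Nginx, the ACL logic can be mirrored in a few lines of Ruby. This is a minimal sketch under my own reading of the ACLs: sent_to_nginx_peer? is a hypothetical helper, not part of Squid, and it only approximates Squid's regex matching. All ACLs on one cache_peer_access line must match (logical AND), and the entries prefixed with `!` must not match.

```ruby
# Sketch: mirrors the Squid ACLs above in Ruby (this is not Squid code).
YOUTUBE_VIDEOS = %r{\Ahttp://[^/]+\.youtube\.com/videoplayback\?}i  # acl youtube_videos
ID_PARAM       = /[?&]id=/i                                          # acl id_param
ITAG_PARAM     = /[?&]itag=/i                                        # acl itag_param
SVER3_PARAM    = /[?&]sver=3/i                                       # acl sver3_param
BEGIN_PARAM    = /[?&]begin=/i                                       # acl begin_param

# Hypothetical helper: true when the cache_peer_access allow line matches,
# i.e. Squid may forward the request to the Nginx peer on 127.0.0.1:8081.
# range_header stands in for "acl range_request req_header Range ."
def sent_to_nginx_peer?(url, range_header = nil)
  YOUTUBE_VIDEOS.match?(url) &&
    ID_PARAM.match?(url) &&
    ITAG_PARAM.match?(url) &&
    SVER3_PARAM.match?(url) &&
    !BEGIN_PARAM.match?(url) &&
    range_header.to_s.empty?        # "." needs at least one character to match
end

url = "http://r1.youtube.com/videoplayback?id=abc&itag=34&sver=3"
puts sent_to_nginx_peer?(url)                  # prints true
puts sent_to_nginx_peer?(url + "&begin=1000")  # prints false
puts sent_to_nginx_peer?(url, "bytes=0-1023")  # prints false
```

Excluding begin= and Range requests presumably keeps partial downloads from being stored as if they were complete videos.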

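Now create the cache directory and give ownership to the user Squid runs as (proxy here; adjust it to match cache_effective_user on your system). Note that Squid only uses this directory if a cache_dir line in squid.conf points to it.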

mkdir /cache1
chown proxy:proxy /cache1
chmod -R 755 /cache1

Then initialize the squid cache directories:

squid -z

Nginx Configuration

As with Squid, you can either replace your Nginx configuration entirely or modify it. If you have already configured other things, it is better to modify it. Rather than giving a complete configuration file, this article points out which directives you need to add to it.

On FreeBSD the default configuration file is usually /usr/local/etc/nginx/nginx.conf. The location may differ on other operating systems or if you installed Nginx from source. In the rest of the article, we will refer to the configuration file as nginx.conf.

Now edit nginx.conf and adjust these settings to your needs:

user www-data;
worker_processes 4;
pid /var/run/nginx.pid;
events {
   worker_connections 768;
}
http {
   sendfile on;
   tcp_nopush on;
   tcp_nodelay on;
   keepalive_timeout 65;
   types_hash_max_size 2048;
   include /usr/local/etc/nginx/mime.types;
   default_type application/octet-stream;
   access_log /var/log/nginx/access.log;
   error_log /var/log/nginx/error.log;
   gzip on;
   gzip_static on;
   gzip_comp_level 6;
   gzip_disable "msie6";
   gzip_vary on;
   gzip_types text/plain text/css text/xml text/javascript application/json application/x-javascript application/xml application/xml+rss;
   gzip_proxied expired no-cache no-store private auth;
   gzip_buffers 16 8k;
   gzip_http_version 1.1;
   include /usr/local/etc/nginx/conf.d/*.conf;
   include /usr/local/etc/nginx/sites-enabled/*;
# starting youtube section
   server {
      listen 127.0.0.1:8081;
      location / {
         root /usr/local/www/nginx_cache/files;
         #try_files "/id=$arg_id.itag=$arg_itag" @proxy_youtube; # Old one
         #try_files  "$uri" "/id=$arg_id.itag=$arg_itag.flv" "/id=$arg_id-range=$arg_range.itag=$arg_itag.flv" @proxy_youtube; #old2
         try_files "/id=$arg_id.itag=$arg_itag.range=$arg_range.algo=$arg_algorithm" @proxy_youtube;
      }
      location @proxy_youtube {
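         # proxy_pass with variables needs a resolver; 8.8.8.8 is only an example, use your own DNS server
         resolver 8.8.8.8;
         proxy_pass http://$host$request_uri;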
         proxy_temp_path "/usr/local/www/nginx_cache/tmp";
         #proxy_store "/usr/local/www/nginx_cache/files/id=$arg_id.itag=$arg_itag"; # Old 1
         proxy_store "/usr/local/www/nginx_cache/files/id=$arg_id.itag=$arg_itag.range=$arg_range.algo=$arg_algorithm";
         proxy_ignore_client_abort off;
         proxy_method GET;
         proxy_set_header X-YouTube-Cache "[email protected]";
         proxy_set_header Accept "video/*";
         proxy_set_header User-Agent "YouTube Cacher (nginx)";
         proxy_set_header Accept-Encoding "";
         proxy_set_header Accept-Language "";
         proxy_set_header Accept-Charset "";
         proxy_set_header Cache-Control "";
      }
   }
}
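The try_files and proxy_store lines above build the cache file name from four query arguments. The following Ruby sketch reproduces that name so you can predict where a given video ends up on disk. The names nginx_arg and cache_path are hypothetical, and the helper assumes (from my understanding of nginx's $arg_ variables) that the first occurrence of an argument is used, its value is not URL-decoded, and a missing argument expands to an empty string.

```ruby
require "uri"

# Root used by try_files/proxy_store in the youtube server block above
CACHE_ROOT = "/usr/local/www/nginx_cache/files"

# Hypothetical helper mimicking nginx $arg_NAME: first "NAME=value" pair wins,
# the raw value is returned, and a missing argument becomes "".
def nginx_arg(query, name)
  query.to_s.split("&").each do |pair|
    key, value = pair.split("=", 2)
    return value if key == name && !value.nil?
  end
  ""
end

# Hypothetical helper: the file path nginx would look up / store for a URL
def cache_path(url)
  query = URI.parse(url).query
  key = "id=#{nginx_arg(query, 'id')}" \
        ".itag=#{nginx_arg(query, 'itag')}" \
        ".range=#{nginx_arg(query, 'range')}" \
        ".algo=#{nginx_arg(query, 'algorithm')}"
  "#{CACHE_ROOT}/#{key}"
end

puts cache_path("http://127.0.0.1:8081/videoplayback?id=abc&itag=34&range=0-1000&algorithm=throttle")
# prints /usr/local/www/nginx_cache/files/id=abc.itag=34.range=0-1000.algo=throttle
```

Because only id, itag, range, and algorithm go into the name, requests that differ only in other parameters share one cached file, while the same video fetched with different range values is stored as separate files.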

Creating Files and Folders

This setup uses a few specific directories. Create and configure them if they don't exist yet.

mkdir -p /usr/local/www/nginx_cache/tmp
mkdir -p /usr/local/www/nginx_cache/files
# nginx writes to both tmp (proxy_temp_path) and files (proxy_store)
chown -R www-data /usr/local/www/nginx_cache

Next, create the Ruby script nginx.rb:

touch /usr/local/etc/nginx/nginx.rb
chmod 755 /usr/local/etc/nginx/nginx.rb

Then edit the file so it looks like this:

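#!/usr/bin/env ruby
#
# URL rewriter helper for Squid (url_rewrite_program).
# Squid writes one request per line on STDIN; we answer with one URL per
# line on STDOUT: either the original URL or a rewritten one pointing at
# the local Nginx cache on 127.0.0.1:8081.

require "syslog"

# One parsed request line from Squid
class SquidRequest
   attr_accessor :url, :user
   attr_reader :client_ip, :method

   # Normalize the HTTP method to lowercase ("GET" -> "get")
   def method=(s)
      @method = s.to_s.downcase
   end

   # Squid sends "ip/fqdn"; keep only the IP part
   def client_ip=(s)
      @client_ip = s.to_s.split('/').first
   end
end

# Read requests from Squid and answer each with the URL returned by the block
def read_requests
   # URL <SP> client_ip "/" fqdn <SP> user <SP> method [<SP> kvpairs]<NL>
   STDIN.each_line do |ln|
      r = SquidRequest.new
      r.url, r.client_ip, r.user, r.method, *_kvpairs = ln.rstrip.split(' ')
      # Flush after every answer, otherwise the reply sits in the buffer
      (STDOUT << "#{yield r}\n").flush
   end
end

def log(msg)
   Syslog.log(Syslog::LOG_ERR, "%s", msg)
end

def main
   Syslog.open('nginx.rb', Syslog::LOG_PID)
   log("Started")

   read_requests do |r|
      # Only GET requests for YouTube videoplayback URLs without a "begin"
      # offset go to Nginx; everything else passes through unchanged
      if r.method == 'get' && r.url !~ /[?&]begin=/ && r.url =~ %r{\Ahttp://[^/]+\.youtube\.com/(videoplayback\?.*)\z}
         log("YouTube Video [#{r.url}].")
         "http://127.0.0.1:8081/#{$1}"
      else
         r.url
      end
   end
end
main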
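Because nginx.rb calls main at load time and blocks on STDIN, its rewrite decision is awkward to try out directly. The sketch below copies that decision into a pure function (rewrite_line is a hypothetical name, not part of the script) and feeds it sample lines in the helper input format given in the script's comment.

```ruby
# Sketch: the rewrite decision of nginx.rb as a pure function (hypothetical name).
VIDEO_URL = %r{\Ahttp://[^/]+\.youtube\.com/(videoplayback\?.*)\z}

# line: URL <SP> client_ip "/" fqdn <SP> user <SP> method [<SP> kvpairs]
def rewrite_line(line)
  url, _client, _user, method = line.rstrip.split(" ")
  if method.to_s.downcase == "get" && url !~ /[?&]begin=/ && url =~ VIDEO_URL
    "http://127.0.0.1:8081/#{$1}"   # $1 is the "videoplayback?..." part
  else
    url                             # everything else passes through unchanged
  end
end

puts rewrite_line("http://r4.youtube.com/videoplayback?id=abc&itag=34 192.168.1.10/- - GET\n")
# prints http://127.0.0.1:8081/videoplayback?id=abc&itag=34
puts rewrite_line("http://example.com/ 192.168.1.10/- - GET\n")
# prints http://example.com/
```

You can also pipe the same sample line into the real helper, for example echo "http://r4.youtube.com/videoplayback?id=abc&itag=34 192.168.1.10/- - GET" | /usr/local/etc/nginx/nginx.rb, and it should print the rewritten URL.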

At this point, the server is configured as a video caching proxy. You can start the services now.

Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently requested web pages. Squid has extensive access controls and makes a great server accelerator. It runs on most available operating systems and is licensed under the GNU GPL.

Squid is used by hundreds of Internet Providers world-wide to provide their users with the best possible web access. Squid optimises the data flow between client and server to improve performance and caches frequently-used content to save bandwidth. Squid can also route content requests to servers in a wide variety of ways to build cache server hierarchies which optimise network throughput.

Thousands of web-sites around the Internet use Squid to drastically increase their content delivery. Squid can reduce server load and improve delivery speeds to clients. Squid can also be used to deliver content from around the world – copying only the content being used, rather than inefficiently copying everything. Finally, Squid’s advanced content routing configuration allows you to build content clusters to route and load balance requests via a variety of web servers.

In this article we will discuss how to install Squid, give a simple configuration, and then use it as a local cache server. Our goal is to improve response times and minimize bandwidth usage on a Slackware64 machine.

I use the following:

  1. Slackware64 14.0 with multilib support.
  2. Squid Cache 3.3.3 source code

Obtain the Materials

The only material we need is Squid's source code, which can be downloaded from the official site. At the time of writing, the latest stable version is 3.3.3, which can be downloaded directly from there.

As stated on the site, we need Perl installed on our system. On Slackware64 it is installed by default, unless you have removed it. Make sure Perl is available.

Installation

Create a working directory. You can use any directory you want, but in this case I use my home directory /home/xathrya/squid. The archive we downloaded is squid-3.3.3.tar.xz.

Now extract the archive and configure the build. In this article I use /usr/local/squid as the installation root, which is Squid's default prefix. If you want to install Squid in another directory, pass --prefix=/path/to/new/squid to ./configure, where /path/to/new/squid is a path such as /usr. After compilation finishes, install Squid with root privileges. The complete commands are given below:

tar -Jxf squid-3.3.3.tar.xz
cd squid-3.3.3
./configure
make
make install clean

The compilation might take some time, depending on your machine.

Setup

Squid is now installed, but we need to do some setup to make it work properly.

Before we proceed, we need to decide what resources are allocated to Squid and what configuration we need. In my case, Squid is started on demand; the cache uses a dedicated 48.0 GiB partition mounted at /cache (you can use another directory, and a dedicated partition is not required); and Squid uses peers that can be configured dynamically without editing the main configuration file directly.

Your needs might differ from mine, so adjust accordingly.

Create Basic Configuration File

In this example, the configuration file is located at /usr/local/squid/etc/squid.conf, but it may differ if you installed Squid somewhere other than /usr/local/squid. In general, the Squid configuration file is located at <root directory>/etc/squid.conf.

Now adjust your configuration file. Below is the configuration I use:

###############################################################
##
## BlueWyvern Proxy Service
## XGN-Z30A : SquidProxy
##
###############################################################

##
#      Proxy Manager Information
##
cache_mgr [email protected]
visible_hostname proxy.bluewyvern.celestial-being.net

###############################################################

##
#    Basic Configuration
##
cache_effective_user squid
cache_effective_group squid

# DNS server (not required)
# Use this if you want to specify a list of DNS servers to use instead
# of those given in /etc/resolv.conf
#dns_nameservers 127.0.0.1 8.8.8.8

# Set Squid to listen on port 1351 (the default is 3128)
http_port 1351

# Timeouts
dead_peer_timeout 30 seconds
peer_connect_timeout 30 seconds

# Load the peer
include /usr/local/squid/etc/peers.conf

###############################################################

##
#    Access Control List
#
#    Allow clients from the local networks below and from this machine;
#    requests from any other address are rejected. Also define some safe ports
##
acl localnet src 10.0.0.0/8        # RFC1918 possible internal network
acl localnet src 172.16.0.0/12    # RFC1918 possible internal network
acl localnet src 192.168.0.0/16    # RFC1918 possible internal network
acl localnet src fc00::/7       # RFC 4193 local private network range
acl localnet src fe80::/10      # RFC 4291 link-local (directly plugged) machines

acl SSL_ports port 443
acl Safe_ports port 80        # http
acl Safe_ports port 21        # ftp
acl Safe_ports port 443        # https
acl Safe_ports port 70        # gopher
acl Safe_ports port 210        # wais
acl Safe_ports port 1025-65535    # unregistered ports
acl Safe_ports port 280        # http-mgmt
acl Safe_ports port 488        # gss-http
acl Safe_ports port 591        # filemaker
acl Safe_ports port 777        # multiling http
acl CONNECT method CONNECT

#
# Recommended minimum Access Permission configuration:
#
# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager

# Deny requests to certain unsafe ports
http_access deny !Safe_ports

# Deny CONNECT to other than secure SSL ports
http_access deny CONNECT !SSL_ports

# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost

#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#

# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost

# And finally deny all other access to this proxy
http_access deny all

###############################################################

##
#    Directory & Logs
#
#    We use /cache for directory
#    I have 48.0 GiB (about 51.5 GB) available
#        64 directories, 256 subdirectories for each directory
#
##

# Cache directory 48GiB = 51500MB
cache_dir ufs /cache 51500 64 256

# Core dumps also go to /cache
coredump_dir /cache

# Squid logs
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log

# Defines an access log format
logformat custom %{%Y-%m-%d %H:%M:%S}tl %03tu %>a %tr %ul %ui %Hs %mt %rm %ru %rv %st %Sh %Ss

###############################################################

##
#    Other
##
refresh_pattern ^ftp:        1440    20%    10080
refresh_pattern ^gopher:    1440    0%    1440
refresh_pattern -i (/cgi-bin/|\?) 0    0%    0
refresh_pattern .        0    20%    4320
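The cache_dir comments above mix decimal gigabytes and binary gibibytes, so it is worth checking the arithmetic. This Ruby sketch assumes that Squid reads the cache_dir size as megabytes of 2^20 bytes (my understanding of the ufs store; confirm it in the cache_dir documentation for your version); cache_dir_fits? is a hypothetical helper. Under that assumption a 48 GiB partition holds at most 49152 MB, so the configured 51500 would not fit.

```ruby
# Worked check of the cache_dir sizing above.
# ASSUMPTION: Squid reads the cache_dir size as MB of 2**20 bytes (MiB).
GIB = 2**30
MIB = 2**20

# Hypothetical helper: true if a cache_dir of size_mb MiB fits on the partition
def cache_dir_fits?(size_mb, partition_bytes)
  size_mb * MIB <= partition_bytes
end

partition = 48 * GIB                     # the 48.0 GiB /cache partition
puts((partition / 1e9).round(1))         # prints 51.5  (decimal GB)
puts partition / MIB                     # prints 49152 (MiB)
puts cache_dir_fits?(51_500, partition)  # prints false
puts 64 * 256                            # prints 16384 (second-level directories)
```

If the assumption holds, a cache_dir size comfortably below 49152 leaves room for swap.state and core dumps on the same partition.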

Create user squid and group squid if they don't exist yet. Then create the cache directory if you don't have one and change its ownership to user squid and group squid (or whichever user and group you assign to Squid; see cache_effective_user and cache_effective_group in squid.conf). I also use the /var/log/squid directory for Squid's logs. When all of that is ready, we need to do the initial setup. Below is the snippet I use:

ln -s /usr/local/squid/sbin/squid /usr/bin/squid

# Create group and user squid if they don't exist yet
if ! grep -q "^squid:" /etc/group; then
groupadd squid
fi

if ! grep -q "^squid:" /etc/passwd; then
useradd -g squid -s /bin/false -M squid
fi

# Cache directory (see cache_dir in squid.conf)
if [ ! -d /cache ]; then
mkdir /cache
fi

chown squid:squid /cache

# Log directory
if [ ! -d /var/log/squid ]; then
mkdir /var/log/squid
fi

chown squid:squid /var/log/squid

# Create the cache directory structure
/usr/local/squid/sbin/squid -z

Now create the file /usr/local/squid/etc/peers.conf and list all the peers you want to use, one cache_peer line per peer.
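Since peers.conf is a separate file pulled in by include, it can be regenerated from a list instead of being edited by hand, which fits the dynamic peer setup described earlier. A minimal Ruby sketch: the Peer struct, the helpers, and the peer addresses are hypothetical placeholders; each line follows the cache_peer hostname type http-port icp-port [options] form already used in the YouTube article.

```ruby
# Hypothetical peer record; the values below are placeholders, not real peers
Peer = Struct.new(:host, :type, :http_port, :icp_port, :options)

# Render one peer as a squid.conf cache_peer line
def cache_peer_line(peer)
  ["cache_peer", peer.host, peer.type, peer.http_port, peer.icp_port, *peer.options].join(" ")
end

# Write all peers to the file pulled in by "include" in squid.conf
def write_peers_conf(peers, path)
  File.write(path, peers.map { |p| cache_peer_line(p) }.join("\n") + "\n")
end

peers = [
  Peer.new("10.0.0.2", "parent", 3128, 0, %w[no-query default]),
  Peer.new("10.0.0.3", "sibling", 3128, 3130, %w[proxy-only])
]
puts peers.map { |p| cache_peer_line(p) }
# prints:
# cache_peer 10.0.0.2 parent 3128 0 no-query default
# cache_peer 10.0.0.3 sibling 3128 3130 proxy-only

# write_peers_conf(peers, "/usr/local/squid/etc/peers.conf")  # run as root
```

After writing the file, squid -k reconfigure makes the running Squid reread its configuration, including included files.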

Creating Scripts

Everything is ready now. Next, we create a control script for managing Squid. With this script I can start and stop Squid and purge content from the cache. The script I use is:

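#! /bin/bash
ROOTFOLDER=/usr/local/squid
SQUID=${ROOTFOLDER}/sbin/squid
SQUIDCLIENT=${ROOTFOLDER}/bin/squidclient

case $1 in
"start")
# squid starts as a daemon when run without arguments
$SQUID
ifconfig | grep inet
;;
"purge")
# the port must match http_port in squid.conf (1351 here)
$SQUIDCLIENT -h 127.0.0.1 -p 1351 -m PURGE "$2"
;;
"stop")
$SQUID -k shutdown
;;
*)
echo "Usage: $0 {start|stop|purge URL}"
;;
esac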

Known Issues

Compile & Installation

  • Squid 3.5 onward uses GnuTLS instead of OpenSSL. If you get an error message such as “error: ‘gnutls_transport_set_int’ was not declared in this scope”, chances are you have an old GnuTLS installed. Make sure you have installed a newer version; the safe choice is the latest one. I tested GnuTLS v3.3 with Nettle 2.7.1.

Runtime
