<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jeff Beard &#187; Systems Administration</title>
	<atom:link href="http://jeffbeard.org/category/computing/systems-administration/feed/" rel="self" type="application/rss+xml" />
	<link>http://jeffbeard.org</link>
	<description>Blog.blog</description>
	<lastBuildDate>Sun, 01 Jan 2012 17:51:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Intercept HTTP requests with Squid</title>
		<link>http://jeffbeard.org/2011/04/intercept-http-requests-with-squid/</link>
		<comments>http://jeffbeard.org/2011/04/intercept-http-requests-with-squid/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 13:08:53 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Systems Administration]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[log requests]]></category>
		<category><![CDATA[log web traffic]]></category>
		<category><![CDATA[squid]]></category>
		<category><![CDATA[transparent proxy]]></category>

		<guid isPermaLink="false">http://jeffbeard.org/?p=260</guid>
		<description><![CDATA[On one of my projects we had some questions about how much bandwidth was being used by requests to a third party service but we didn&#8217;t have a view beyond general traffic on the network interface. I hit upon the idea of using a transparent proxy to log requests then use log analysis to [...]]]></description>
			<content:encoded><![CDATA[<p>On one of my projects we had some questions about how much bandwidth was being used by requests to a third-party service, but we didn&#8217;t have a view beyond general traffic on the network interface. I hit upon the idea of using a transparent proxy to log requests and then using log analysis to break out data transfer amounts per third-party service. And since we already had <a href="http://www.squid-cache.org/">squid</a> as part of our infrastructure applications it seemed like a good choice.</p>
<p>The tricky part of this setup is that everything is hosted on the same hardware node and we also have some web services that needed to be left untouched. These requirements implied some network configuration using <a href="http://www.netfilter.org/">iptables</a> to force outbound web requests through the proxy.<br />
<span id="more-260"></span><br />
So the first thing I needed to do was install squid. On this project we use <a href="http://www.centos.org/">CentOS</a> on all our hosts so this was easily accomplished like this:</p>
<p><code>sudo yum install squid</code></p>
<p>Next was adjusting the configuration. The default squid.conf comes with lots of documentation which is helpful but makes the configuration file difficult to read and navigate so the first thing I did was get rid of it like so:<br />
<code><br />
cd /etc/squid &#038;&#038; sudo cp squid.conf squid.conf.orig &#038;&#038; sudo egrep -v '^#' squid.conf > /tmp/squid.conf<br />
</code></p>
<p>This leaves a lot of empty lines in the file which can be removed like this:</p>
<p><code><br />
sudo sed '/^$/d' /tmp/squid.conf > /tmp/squid.clean &#038;&#038; sudo mv /tmp/squid.clean squid.conf<br />
</code></p>
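<p>The two cleanup passes above can also be collapsed into a single grep. This is only a sketch: the function name <code>strip_conf</code> is made up for illustration and is not something squid ships, and it prints the result rather than overwriting squid.conf.</p>

```shell
# strip_conf FILE: print FILE without comment lines and empty lines.
# Hypothetical helper for illustration; not shipped with squid.
strip_conf() {
    # -E enables the (a|b) alternation, -v inverts the match:
    # '^#' drops lines starting with a comment marker, '^$' drops empty lines.
    grep -Ev '^(#|$)' "$1"
}

# Example (writes to a scratch file instead of overwriting squid.conf):
# strip_conf /etc/squid/squid.conf.orig > /tmp/squid.clean
```

<p>Like the original pair of commands, this only drops lines that begin with # or are completely empty, so indented comments and whitespace-only lines are kept.</p>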
<p>Next up was setting up networking and squid.</p>
<p>The squid site has a great set of examples, <a href="http://wiki.squid-cache.org/ConfigExamples/Intercept/LinuxLocalhost">one of which</a> looked like it would suit my purposes nicely.</p>
<p>First I configured squid by adding this directive:</p>
<p><code>http_port 3128 transparent</code></p>
<p>Then I started squid:</p>
<p><code>sudo service squid start</code></p>
<p>I also wanted to make sure squid starts when the system is rebooted:</p>
<p><code>sudo chkconfig --levels 2345 squid on</code></p>
<p>Next up was network configuration. </p>
<p>I needed to set up iptables with some NAT rules to force requests through the proxy server. The first command clears out any existing rules in the nat table. If you already have a custom iptables configuration, use this with caution:</p>
<p><code>sudo iptables -t nat -F</code></p>
<p>The next rule is for a typical transparent proxy setup: it redirects port 80 traffic arriving on eth0 to squid. In the setup that I was working with I did not need this rule, something I discovered when running this command took the existing web sites offline. So if you have a web server on the same host <strong>DO NOT</strong> do this:</p>
<p><code><br />
sudo iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 80 -j REDIRECT --to-port 3128<br />
</code></p>
<p>Here is the start of the iptables configuration we implemented. </p>
<p>Apply the rules to force local HTTP traffic through the transparent proxy:</p>
<p><code><br />
gid=`id -g squid`<br />
sudo iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --gid-owner $gid -j ACCEPT<br />
sudo iptables -t nat -A OUTPUT -p tcp --dport 80 -j DNAT --to-destination HOSTIP:3128<br />
</code></p>
<p>Replace the string &#8220;HOSTIP&#8221; with the IP address of the host you&#8217;re configuring.</p>
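<p>To avoid editing HOSTIP by hand, the two OUTPUT rules can be generated by a small function. This is a sketch, not the script we used: the name <code>print_proxy_rules</code> and its arguments are made up for illustration, and it only prints the commands so they can be reviewed before being run as root.</p>

```shell
# print_proxy_rules HOSTIP GID [PORT]
# Hypothetical helper: prints the two NAT rules from this post with the
# placeholders filled in. It does not touch iptables itself.
print_proxy_rules() {
    hostip=$1
    gid=$2
    port=${3:-3128}   # 3128 matches the http_port directive above

    # Rule 1: requests made by squid itself (matched by group id) go out
    # directly; otherwise the proxy's own traffic would loop back into it.
    echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --gid-owner $gid -j ACCEPT"
    # Rule 2: every other local request to port 80 is rewritten to the proxy.
    echo "iptables -t nat -A OUTPUT -p tcp --dport 80 -j DNAT --to-destination $hostip:$port"
}

# Example (192.0.2.10 is a placeholder address):
# print_proxy_rules 192.0.2.10 "$(id -g squid)"
```

<p>The order matters because iptables stops at the first matching rule, so the ACCEPT for the squid group has to come before the DNAT.</p>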
<p>At this point I needed to test the setup so I tailed the access log. Or at least I tried to. The default permissions on the /var/log/squid directory prevented me from viewing its contents. I fixed that with this:</p>
<p><code>sudo chmod 0775 /var/log/squid</code></p>
<p>Then I was able to tail the /var/log/squid/access.log. So I created a request to see if it was logged:</p>
<p><code>wget http://www.google.com</code></p>
<p>I saw the request logged in the squid access log so I was satisfied that it and the networking were functional. However, the log format wasn&#8217;t what we needed to feed to <a href="http://awstats.sourceforge.net/">awstats</a>, which is what I was going to use to process the log.</p>
<p>Since we already use squid and process its logs I grabbed the configuration from our production configuration file:</p>
<p><code><br />
logformat combined %>a %ui %un [%{%d/%b/%Y:%H:%M:%S %z}tl] "%rm %ru HTTP/%rv" %Hs %<st "%{User-Agent}>h" %Ss:%Sh<br />
access_log /var/log/squid/access.log combined<br />
</code></p>
<p>Then I restarted squid and tested again. It looked good so I tested one of our batch processes that makes HTTP requests to make sure that it did what I wanted. It did; however, I noticed that query strings from the URI were not being logged. A quick google told me that I needed to update the squid.conf with this:</p>
<p><code>strip_query_terms off</code></p>
<p>As it turns out, squid strips the query string after the &#8220;?&#8221; by default. This is apparently to &#8220;protect privacy&#8221; but we needed the query string to identify individual requests more accurately.</p>
<p>At this point I had the system set up and working. It logged all the outbound HTTP requests and the existing web services remained unaffected. All that was left to do was set up awstats to process the logs.</p>
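<p>Before awstats is in place, the per-service totals can be spot-checked straight from the access log. This is a minimal sketch, assuming the combined logformat shown earlier, where the request URL is whitespace-separated field 7 and the reply size is field 10. The function name <code>bytes_per_host</code> is made up for illustration and is not part of squid or awstats.</p>

```shell
# bytes_per_host LOGFILE: total reply bytes per destination host, largest first.
# Hypothetical helper; assumes the "combined" logformat defined above, where
# whitespace-separated field 7 is the request URL and field 10 is the reply size.
bytes_per_host() {
    awk '
        {
            # "http://host/path" split on "/" puts the host in element 3
            split($7, parts, "/")
            total[parts[3]] += $10
        }
        END {
            # %.0f avoids integer overflow in some awk builds for large totals
            for (host in total) printf "%s %.0f\n", host, total[host]
        }
    ' "$1" | sort -k2,2nr
}

# Example: bytes_per_host /var/log/squid/access.log
```

<p>The field positions depend on the ident and user name fields containing no spaces, so treat the output as a sanity check rather than a replacement for a real log analyzer.</p>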
]]></content:encoded>
			<wfw:commentRss>http://jeffbeard.org/2011/04/intercept-http-requests-with-squid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unix CLI Command Repository</title>
		<link>http://jeffbeard.org/2009/02/unix-cli-command-repository/</link>
		<comments>http://jeffbeard.org/2009/02/unix-cli-command-repository/#comments</comments>
		<pubDate>Thu, 05 Feb 2009 15:48:45 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[Computing]]></category>
		<category><![CDATA[Systems Administration]]></category>

		<guid isPermaLink="false">http://jeffbeard.org/?p=38</guid>
		<description><![CDATA[I just found the Command Line Fu command repository via reddit.com. Tons of very useful commands for a variety of tasks. If you&#8217;re not a *nix sysadmin, you&#8217;re missing the joy of firing off scripts from the command line that can do an extraordinary amount of work in a short amount of [...]]]></description>
			<content:encoded><![CDATA[<p>I just found the <a href="http://www.commandlinefu.com/commands/browse">Command Line Fu</a> command repository via <a title="reddit" href="http://www.reddit.com">reddit.com</a>. Tons of very useful commands for a variety of tasks.</p>
<p>If you&#8217;re not a *nix sysadmin, you&#8217;re missing the joy of firing off scripts from the command line that can do an extraordinary amount of work in a short amount of time. I was a veritable hero early in my career when I was able to help a client replace a copyright string in some 5000 files and create backups with this one:</p>
<p><code>find . -type f -name '*.html' | xargs grep -l '&amp;copy;' | xargs perl -pi.bak -e 's/&amp;copy;1997/&amp;copy;1999/g'</code></p>
<p>Or something like that.</p>
<p>Regardless, the client was amazed and happy that one could work such magic in just a few minutes, and the Command Line Fu repository looks chock full of opportunities to amaze folks with your wizardry.</p>
]]></content:encoded>
			<wfw:commentRss>http://jeffbeard.org/2009/02/unix-cli-command-repository/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Web Site, Again</title>
		<link>http://jeffbeard.org/2009/01/new-web-site-again/</link>
		<comments>http://jeffbeard.org/2009/01/new-web-site-again/#comments</comments>
		<pubDate>Sun, 01 Feb 2009 01:14:35 +0000</pubDate>
		<dc:creator>Jeff</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Systems Administration]]></category>
		<category><![CDATA[hosting]]></category>
		<category><![CDATA[slicehost]]></category>
		<category><![CDATA[ubuntu]]></category>
		<category><![CDATA[webhosting]]></category>

		<guid isPermaLink="false">http://jeffbeard.org/?p=3</guid>
		<description><![CDATA[I&#8217;ve set up a new blog on a new server for a variety of reasons, mostly having to do with it being a terrible idea to host on my home network. I&#8217;ve chosen a hosting service provider rather than hosting it on my own hardware. I&#8217;m using Slicehost for now. It&#8217;s fairly inexpensive (not the cheapest [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve set up a new blog on a new server for a variety of reasons, mostly having to do with it being a terrible idea to host on my home network. I&#8217;ve chosen a hosting service provider rather than hosting it on my own hardware. I&#8217;m using <a href="http://www.slicehost.net">Slicehost</a> for now. It&#8217;s fairly inexpensive (not the cheapest by far), has a good amount of positive buzz around it and they use operating system virtualization. This is cool since I get full control of my &#8220;slice&#8221; (<a title="Ubuntu" href="http://www.ubuntu.com">Ubuntu</a>) just like a regular server but upgrades are done through a web-based management console so I don&#8217;t have to drive to a Denver hosting facility to do maintenance. If I need to upgrade memory or storage I can do so with a click or two. Plus I can take snapshots of the entire slice which can be used to rebuild the whole VPS through a nifty administrative console. They don&#8217;t offer much storage but I plan to use <a title="Amazon S3" href="http://aws.amazon.com/s3/">Amazon S3</a> for storage on the cheap.</p>
<p>Anyway, so far so good.</p>
<p>Next up is moving my 25 web sites to the new server! (Who knew I was so rich in Internet property!)</p>
<p>Anyway, I&#8217;ll be posting up my old articles and files over the next couple of weeks but in the meantime here&#8217;s a nice picture:</p>
<p><a class="tt-flickr tt-flickr-Medium" title="The Flatirons" href="http://www.flickr.com/photos/bippity/3113560932/"><img class="alignleft" src="http://farm4.static.flickr.com/3189/3113560932_842ed2f82d.jpg" alt="The Flatirons" width="350" height="288" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://jeffbeard.org/2009/01/new-web-site-again/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

