In this tutorial you'll see how you can track your website's visitors and getting information such as user-agent or referrer. Also, we'll be using reverse DNS to find out more interesting information about his IP.
For this tutorial you’ll need a PHP server and a MySQL database.
First let’s create a MySQL table where we will store information about the visitors:
CREATE TABLE `logs` (
`VisID` int(10) unsigned NOT NULL auto_increment,
`VisIP` varchar(15) NOT NULL default '',
`VisRef` varchar(255) NOT NULL default '',
`VisUrl` varchar(255) NOT NULL default '',
`VisDate` datetime NOT NULL default '0000-00-00 00:00:00',
`VisAgent` varchar(255) NOT NULL default ''
PRIMARY KEY (`VisID`)
) TYPE=MyISAM ;
We need these fields to store the IP of the visitor, the referrer, the URL he’s currently visiting, the date of the visit and the user-agent (browser).
From the PHP script you need to connect to the MySQL database using the appropriate credentials, and select the database where the logs table can be found:
mysql_pconnect("meServer", "meUsername", "mePassword") or die (mysql_error());
mysql_select_db("meDatabase") or die ("An error occured. Could not select database.");
Now on every page on which you want the visitor to be tracked, you include the following PHP line (you will probably want to insert it in header.php or some other file which you include in all your pages):
mysql_query("INSERT INTO logs(VisIP, VisRef, VisUrl, VisDate, VisAgent) VALUES(\"".$HTTP_SERVER_VARS['REMOTE_ADDR']."\", \"".$HTTP_SERVER_VARS['HTTP_REFERER']."\", \"".$HTTP_SERVER_VARS['REQUEST_URI']."\", NOW(), \"".$HTTP_SERVER_VARS['HTTP_USER_AGENT']."\")");
Here we simply insert data into the table. Let’s see what the server variables ($HTTP_SERVER_VARS[]) can do:
$HTTP_SERVER_VARS[‘REMOTE_ADDR’] – Gets the visitor’s IP.
$HTTP_SERVER_VARS[‘HTTP_REFERER’] – Gets the URL that referred the visitor to the current page; so if the visitor clicks your link from http://www.geekpedia.com, you will get http://www.geekpedia.com. This way you know where your visitors come from.
$HTTP_SERVER_VARS[‘REQUEST_URI’] – The current URL that’s being visited.
$HTTP_SERVER_VARS[‘HTTP_USER_AGENT’] – The User-Agent gives information about the browser, plugins and extensions installed.
Now what’s left is retrieving data from the database and analyzing the visitors. We do this on another page (which you will probably want to password protect):
// We need this function later, to convert from a MySQL DATETIME to our own date format
function ToDateTime($datestamp, $format)
{
if ($datestamp!=0)
{
list($date, $time)=split(" ", $datestamp);
list($year, $month, $day)=split("-", $date);
list($hour, $minute, $second)=split(":", $time);
$stampeddate=mktime($hour,$minute,$second,$month,$day,$year);
$datestamp=date($format,$stampeddate);
return $datestamp;
}
}
// Get the values from the database, order by visitor IP so that we have them grouped by visitor
$LogQ = mysql_query("SELECT VisID, VisIP, VisRef, VisUrl, VisAgent, VisDate FROM logs ORDER BY VisIP");
echo "<table border='1' cellpadding='8'>";
echo "<tr>";
echo "<td align='center'><b>ID</b></td>";
echo "<td><b>IP</b></td>";
echo "<td><b>Host</b></td>";
echo "<td><b>URL</b></td>";
echo "<td><b>Referrer</b></td>";
echo "<td><b>User Agent</b></td>";
echo "<td><b>Date</b></td>";
echo "</tr>";
// Store the current date in a variable, in the Year-Day-Month format
$CurrDate = date("Y")."-".date("d")."-".date("m");
// Loop through the values in the database
while($LogF = mysql_fetch_array($LogQ))
{
// Store the date of the visit in a variable
$Date = $LogF['VisDate'];
// Compare the current date to the date in the database
//
if it matches we highlight the row since it means the visit was made recently
if($CurrDate == ToDateTime($Date, "Y-d-m"))
{
$NewBGColor = "#FFD7D7";
}
else
{
$NewBGColor = "";
}
// Reverse DNS
$Host = gethostbyaddr($LogF['VisIP']);
// Highlight visits from the same IP with the same color
if($LogF['VisIP'] != $LastIP)
{
if($BGColor == "#FFFBEA")
{
$BGColor = "#FFF3CA";
}
else
{
$BGColor = "#FFFBEA";
}
}
echo "<tr bgcolor='".$BGColor."'>";
// Unique ID for the visitor
echo "<td bgcolor='".$NewBGColor."'>".$LogF['VisID']."</td>";
// Link the IP to a WhoIs service. If clicked you get more information about the IP
echo "<td><a href='http://www.whois.sc/".$LogF['VisIP']."' target='_blank'>".$LogF['VisIP']."</a></td>";
// If the reverse DNS function returns a domain containing microsoft.com, highlight with red
if(strstr(strtolower($Host), "microsoft"))
{
echo "<td><font color='#CC0000'><b>".$Host."</b></font></td>";
}
// If it's my hostname (or IP since some IPs don't reverse DNS), use a distinct color
else if($Host == "myhost.astral.ro")
{
echo "<td><font color='#999999'>".$Host."</font></td>";
}
// Else simply show the reverse DNS result
else
{
echo "<td>".$Host."</td>";
}
// Show the page the visitor was visiting at the time; since it's relative don't forget to add the domain
echo "<td><a href='http://www.geekpedia.com".$LogF['VisUrl']."' target='_blank'>".$LogF['VisUrl']."</a></td>";
// If the referrer is specified, link to it
if($LogF['VisRef'] != "")
{
echo "<td><a href='".$LogF['VisRef']."' target='_blank'>".$LogF['VisRef']."</a></td>";
}
else
{
echo "<td> </td>";
}
// Display the user agent in one of the cells
echo "<td>".$LogF['VisAgent']."</td>";
// Display the date and time when the visit was made
echo "<td>".$Date."</td>";
echo "</tr>";
// We need to store the last IP so that we can highlight the same IPs
$LastIP = $LogF['VisIP'];
}
echo "</table>";
Many of the features of this script are optional. For example the one that highlights rows that were visited in the current day can easily be removed.
This script will normally run fast, however there’s something that’s slowing it down a lot – the gethostbyaddr() function which does the reverse DNS on the IP.
For those who don’t know, reverse DNS helps you find out the domain from which the visitor is visiting (this can reveal the ISP, proxy, or company the user is visiting from). For example an IP such as 198.116.144.49 might not say much, however doing a reverse DNS will reveal nasa.gov. So if your script takes a lot to load when you have many visitors stored in the table, don’t worry, it will eventually load but it’s currently doing reverse DNS on each and every IP. This can be optimized a bit by skipping reverse DNS if the previous IP is the same as the current IP.
Some folks may say reverse DNS can be done before entering the values in the database, however that will slow down the page loading when the visitors are browsing your website.
Say we want to monitor visitors from Microsoft Corporation. We know their IPs reverse DNS to tidexxx.microsoft.com, where xxx is a number from 0 to 999, such as tide181.microsoft.com. Also, some IPs from Microsoft Corporation resolve to research.microsoft.com (their Research team). In the script we monitor these visitors using strstr(strtolower($Host), "microsoft"
) which checks to see if microsoft can be found in the resolve DNS string. microsoft.com should work even better. You can use this method to track the Google crawler, since the Google crawler resolves to domains such as crawl-66-249-64-28.googlebot.com, so using gooblebot.com instead of microsoft.com will bring you the desired result. The Google crawler and several other crawlers can also be tracked by the user agent (Googlebot/2.1 (+http://www.google.com/bot.html)).
You will need to delete these logs from time to time or they’ll get really big. You can use a cron job, or you can do it manually by placing the following line in a different page which you can access to empty the table:
mysql_query("TRUNCATE TABLE logs");
Here’s how the page looks like when having 3 visits stored in the table made by 2 visitors (one visited a page, and another visited 2 pages):