
mod_uid.c version 1.1

a module issuing the "correct" cookies for counting the site visitors

Download: ftp://ftp.lexa.ru/pub/apache-rus/contrib/

Contents

  1. Copyright
  2. Description
  3. Installation (Apache 1.x)
  4. Installation (Apache 2.0.x)
  5. Configuration
  6. Cookie format
  7. What can be written to the log
  8. Why not mod_usertrack
  9. TODO

Copyright

Copyright (C) 2000-2002 Alex Tutubalin, lexa@lexa.ru

May be distributed and used in derived products under conditions analogous to those of the Apache License: the author's copyright and the reference to http://www.lexa.ru/lexa must be preserved, and the derived product should not be called mod_uid.

A prototype of this module was written by the author when he was working at Rambler Co.; the present version has been significantly modified.

The author is grateful to Dmitry Khrustalev for valuable advice.

Description

The standard Apache distribution does not provide adequate means for user tracking (the problems with mod_usertrack are described below); this module fills that gap.

What it actually does:

  • if the client sends a Cookie header containing a cookie with the configured name, the module stores that cookie in the request notes under the name uid_got (so it can then be written to the log);
  • if the client arrives without the required cookie, the module issues a Set-Cookie header and stores the issued cookie in the notes under the name uid_set (this can also be written to the log);
  • if P3P support is enabled, a P3P header is issued together with the Set-Cookie header.
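
The three cases above can be sketched as a small Python function. This is only an illustration of the decision flow, not the module's code: the name handle_request, the new_cookie_value placeholder and the boolean p3p flag are assumptions made for the example, and the expires= and domain= parameters are left out for brevity.

```python
def handle_request(cookie_header, cookie_name="uid", p3p=False,
                   new_cookie_value="NEWCOOKIE"):
    """Return (notes, response_headers) for one request.

    Hypothetical helper: it mirrors the uid_got / uid_set behaviour
    described in the text, nothing more.
    """
    notes = {}
    headers = []
    received = None

    # Look for our cookie in a header such as "lang=en; uid=XYZ".
    for part in (cookie_header or "").split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name == cookie_name:
            received = value
            break

    if received is not None:
        # The client already has the cookie: remember it for the log.
        notes["uid_got"] = received
    else:
        # No cookie: issue one and remember what was sent.
        notes["uid_set"] = new_cookie_value
        headers.append(("Set-Cookie", f"{cookie_name}={new_cookie_value}; path=/"))
        if p3p:
            # P3P accompanies Set-Cookie only; it is never sent on its own.
            headers.append(("P3P", 'CP="NOI PSA OUR BUS UNI"'))
    return notes, headers
```

Only one of the two notes is filled in for a given request, which is what keeps issued and received cookies apart in the log.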

Advantages:

  • the cookie contains its issue date and the "service number" (the number specified in the configuration), so one can tell when the user first arrived at the site and where exactly he/she arrived;
  • multiserver operation is supported: with careful configuration (or none at all ;), the cookie issued to the user is guaranteed to be unique;
  • the cookie issued to the user and the cookie received from him/her are kept separate in the log file;
  • the cookies are 128 bits long, so a log analyzer can handle them (fast lookup, etc.) with existing source code written for IPv6 addresses (for example, libpatricia);
  • minimal P3P support is provided.

Installation (Apache 1.x)

When configuring Apache, add --add-module=/path/to/mod_uid.c to the ./configure parameters:
tar xzvf apache_1.3xxx
tar xzvf mod_uid-1.0.xx.tar.gz
cd apache_1.3xx
./configure --prefix=/usr/local/apache ... --add-module=../mod_uid-1.0.xx/mod_uid.c other-params
make
make install

Installation (Apache 2.0)

For Apache 2.0.x, use mod_uid2.c and install it with the apxs program:
tar xzvf mod_uid-1.xx.tar.gz
cd mod_uid-1.xx
/usr/local/apache/bin/apxs -i -c -a mod_uid2.c
This command compiles (-c), installs (-i) and activates (-a) the mod_uid2 module.

Configuration Directives

All configuration directives may be specified in any context: server config, VirtualHost, Location, and so on. To use them in .htaccess, set AllowOverride FileInfo (or All).

UIDActive On/Off
Turns cookie issuing on or off.
When set to Off, cookies received from the client are still decoded and may be written to the log.
Default: On

UIDCookieName string
The name of the cookie issued to the client. It should not match any other cookie name used on the site.
Default: uid

UIDService number
The "service number" is a strictly positive (nonzero) unique number identifying the given server in the cluster or the given document or document set.
This number is used for two purposes:
  1. If several servers are used within one domain (with the same cookie parameter domain=) or with one hostname, then the use of different UIDService numbers guarantees that the cookies issued by different servers will be unique.
  2. The use of different UIDService numbers for different parts of the server makes it possible to reveal (by log analysis) which of the parts was first visited by the client.
Default: server IP address.

UIDDomain .domain.name
The domain for which the cookie is issued.
In multiserver configurations, this directive gives all the servers a common cookie namespace (for example, mail.rambler.ru, www.rambler.ru, and info.rambler.ru all use the .rambler.ru domain).
If the domain= parameter must be omitted for a certain set of documents but kept for the server as a whole, use UIDDomain none in the corresponding config section (Location/Directory/...).
Default: no domain; that is, the user's browser will return the cookie only to the originating server.

UIDPath string
The path for which the cookie is issued (parameter path= in Set-Cookie:)
Default: /

UIDExpires number
Sets the expiration date for the cookie. Two forms are accepted:
UIDExpires number - the number of seconds to add to the current time.
UIDExpires plus 3 year 4 month 2 day 1 hour 15 minutes - the same, expressed in human-readable form.
Default: current date plus 10 years.
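
To make the two forms concrete, here is a minimal Python sketch that converts either of them to a number of seconds. It is not the module's parser: the function name expires_to_seconds is hypothetical, only the units that appear in the example above are handled, and the lengths chosen for a year (365 days) and a month (30 days) are assumptions, since the text does not say how the module counts them.

```python
# Unit lengths in seconds. The year and month lengths are assumptions.
UNIT_SECONDS = {
    "year": 365 * 86400,
    "month": 30 * 86400,
    "day": 86400,
    "hour": 3600,
    "minute": 60,
}


def expires_to_seconds(value):
    """Convert a UIDExpires-style argument to seconds (hypothetical helper)."""
    tokens = value.split()
    if len(tokens) == 1:
        # Plain form: the number of seconds itself.
        return int(tokens[0])
    if tokens[0] != "plus":
        raise ValueError("expected a number or 'plus <count> <unit> ...'")
    pairs = tokens[1:]
    if not pairs or len(pairs) % 2:
        raise ValueError("counts and units must come in pairs")
    total = 0
    for count, unit in zip(pairs[0::2], pairs[1::2]):
        unit = unit.lower().rstrip("s")  # accept "minute" and "minutes"
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unknown unit: {unit}")
        total += int(count) * UNIT_SECONDS[unit]
    return total
```

Under these assumptions, the example "plus 3 year 4 month 2 day 1 hour 15 minutes" comes to 105153300 seconds.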

UIDP3P On/Off/Always
Controls whether the P3P header is issued together with the cookie.
Variants:
  • Off - P3P header is not issued;
  • On - issued only if the domain parameter is issued for the cookie;
  • Always - always issued (i.e. even without domain).
Default: Off.
This directive is needed to satisfy MS IE6+ in multiserver configurations and, for example, when a page includes "counter" code from another server. If the cookie is issued without domain=, or the domain includes the name of the server that serves the main document, MS IE6+ with default settings accepts it anyway; however, cookies may be suppressed for compound documents assembled from different servers.
mod_uid issues only the P3P header (by default, with a compact policy only); support for /w3c/p3p.xml and the like is up to the owner of the server.
The P3P header is issued only when mod_uid issues the Set-Cookie header; if you issue other cookies as well and need P3P for them, that has to be solved separately.

UIDP3PString string
Text of the P3P header sent to the client.
Default: CP="NOI PSA OUR BUS UNI"

Cookie Format

The cookie in binary form is unsigned int cookie[4], where
    cookie[0] is the "service number" (specified via UIDService);
    cookie[1] is the issue time (unix time);
    cookie[2] is the pid of the process that issued the cookie;
    cookie[3] contains a sequencer unique within the process (upper 24 bits, starting value 0x030303) and the cookie version number (lower 8 bits, currently 2).
These 128 bits are converted to network byte order, base64-encoded, and sent to the client. (In version 1, everything was sent in host order, which complicated support for server clusters with mixed architectures.)
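
The layout described above can be illustrated with a short Python sketch that packs the four 32-bit words in network byte order and base64-encodes them. The function names encode_cookie and decode_cookie are hypothetical, and the exact base64 details the module uses (padding, alphabet) are not stated in this document, so standard base64 is assumed here.

```python
import base64
import struct

COOKIE_VERSION = 2  # current version number according to the text


def encode_cookie(service, issued, pid, sequencer, version=COOKIE_VERSION):
    """Pack the four fields into 128 bits and base64-encode them."""
    # cookie[3]: sequencer in the upper 24 bits, version in the lower 8.
    last = ((sequencer & 0xFFFFFF) << 8) | (version & 0xFF)
    # "!4I" = four unsigned 32-bit integers in network (big-endian) order.
    raw = struct.pack("!4I", service, issued, pid, last)
    return base64.b64encode(raw).decode("ascii")


def decode_cookie(text):
    """Reverse encode_cookie and return the fields as a dict."""
    raw = base64.b64decode(text)
    service, issued, pid, last = struct.unpack("!4I", raw)
    return {
        "service": service,
        "issued": issued,
        "pid": pid,
        "sequencer": last >> 8,
        "version": last & 0xFF,
    }
```

Because the byte order is fixed before encoding, a cookie packed on one architecture decodes to the same fields on another, which is the point of the change made in version 2.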
Uniqueness
Of course, only an insurance policy can fully guarantee anything ;), and if more than 2^128 cookies are issued within a single domain, some of them will necessarily be duplicates. However, the cookie format was designed so that cookies are unique for any reasonable number of them.
  1. If the "service number" is unique within the given domain (each server has its own), different servers will certainly issue different cookies.
  2. Including the issue time and pid in the cookie relies on pids of different processes not repeating within one second. This holds on all UNIX systems I know: pids increase monotonically up to a certain maximum (2^16 or higher). That is, the cookie[1]/cookie[2] pair can repeat on one server only if more than 2^16 fork() calls are made per second, which is hardly possible at present.
  3. The sequencer (the upper 24 bits of cookie[3]) keeps cookies unique within one process during one second. Its capacity allows one process to issue on the order of 1.0E+07 cookies per second (24 bits give 16,777,216 sequencer values).
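
The arithmetic behind points 2 and 3 can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are made up for the example, and how the sequencer behaves when it overflows is not described in this document, so values_before_wrap simply counts the values between the starting point and the top of the 24-bit range.

```python
SEQUENCER_BITS = 24         # upper 24 bits of cookie[3]
SEQUENCER_START = 0x030303  # starting value given in the format description
PID_SPACE = 2 ** 16         # smallest pid range mentioned in the text


def sequencer_values(bits=SEQUENCER_BITS):
    """Number of distinct values a sequencer of this width can hold."""
    return 2 ** bits


def values_before_wrap(bits=SEQUENCER_BITS, start=SEQUENCER_START):
    """Values left between the starting point and the top of the range."""
    return sequencer_values(bits) - start


def pid_sequencer_pairs(pid_space=PID_SPACE, bits=SEQUENCER_BITS):
    """Distinct (pid, sequencer) combinations available within one second."""
    return pid_space * sequencer_values(bits)
```

With the figures from the text, sequencer_values() is 16777216, which is where the estimate of roughly 1.0E+07 cookies per process per second comes from.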

What can be written to the log

mod_uid writes one of the following two values to "notes":
  1. if a cookie was received from the client, it is placed in note uid_got;
  2. if a cookie was sent to the client, it is placed in note uid_set.
Cookies are logged as four 32-bit hexadecimal numbers in host order (in version 2, a network-to-host conversion is performed; in version 1, everything is saved "as is" on the assumption that the server architecture has not changed since the cookie was issued). In LogFormat, these notes are referenced as \"%{uid_got}n\" and \"%{uid_set}n\", respectively.
With a LogFormat such as
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{uid_got}n\" \"%{uid_set}n\"" combined_cookie
log entries look approximately like this:
Cookie sent to the client:
62.104.212.93 - - [05/Jan/2002:00:02:06 +0300] "GET / HTTP/1.0" 200 13487 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)" "-" "ruid=000000013C36184E00009A2100002901"

Cookie received from the client:
216.136.145.172 - - [05/Jan/2002:00:14:59 +0300] "GET /buttons/but-support-e.gif HTTP/1.0" 200 252 "http://apache.lexa.ru/english/meta-http-eng.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "ruid=000000013C361B5000009A0100009501" "-"
This format is readily understood by common log analyzers, including Webtrends, which counts visitors nicely from such a log.
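
For custom log analysis, the two trailing fields can be split back into the cookie components. The Python sketch below assumes the log layout shown in the sample entries (uid_got and uid_set as the last two quoted fields, each either "-" or name=32 hex digits); the function names are hypothetical, and requests containing escaped quotes would need a more careful parser.

```python
import re
from datetime import datetime, timezone


def parse_uid_note(note):
    """Split a logged note such as 'ruid=0000...' into cookie fields."""
    if note == "-":
        return None  # nothing was received / sent for this request
    name, _, hexdigits = note.partition("=")
    if len(hexdigits) != 32:
        raise ValueError("expected 32 hexadecimal digits")
    # Four 32-bit words, eight hex digits each.
    service, issued, pid, last = (
        int(hexdigits[i:i + 8], 16) for i in range(0, 32, 8)
    )
    return {
        "name": name,
        "service": service,
        "issued": datetime.fromtimestamp(issued, tz=timezone.utc),
        "pid": pid,
        "sequencer": last >> 8,   # upper 24 bits
        "version": last & 0xFF,   # lower 8 bits
    }


def uid_fields(line):
    """Return (uid_got, uid_set) parsed from one log line."""
    got, sent = re.findall(r'"([^"]*)"', line)[-2:]
    return parse_uid_note(got), parse_uid_note(sent)
```

Applied to the first sample entry, the uid_set note decodes to service number 1 and an issue time equal to the request timestamp in the same line, which is a quick sanity check that the fields are being split correctly.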

Why not mod_usertrack from the Apache distribution?

Because it has several drawbacks:
  • it does not strictly guarantee that the same cookie will never be issued to two users, although the probability is minimized by taking getpid(), remote_ip, and the time to the millisecond into account;
  • it does not support multiserver operation, where the probability of issuing identical cookies increases;
  • one may want to see the cookie sent to the user in the log as well, and separately from the received one, whereas mod_usertrack mixes them;
  • one may want to see the "service number" (see above) to understand which of our services the user visited first.

TODO

  1. Support for various cookie formats (Netscape/Cookie/Cookie2, as in mod_usertrack), but only if it becomes really necessary; so far I haven't noticed any such need.
  2. There is a vague suspicion that the sequencer increment should be protected by a mutex in multithreaded Apache and on multiprocessor machines.

 




Copyright © Lexa Software, 1996-2009.