Git Product home page Git Product logo

mod_uid's Introduction

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <title>mod_uid.c, version 1.0</title>
  </head>
  <body bgcolor="#ffffff">

    <h1 align=center >mod_uid.c version 1.0</h1>
    <center>a module issuing the "correct" cookies for counting the
      site visitors<p></center>
    <h3>Contents</h3>
    <ol>
      <li><a href="#1">Copyright</a>
      <li><a href="#2">Purpose</a>
      <li><a href="#3">Installation (Apache 1.x)</a>
      <li><a href="#3a">Installation (Apache 2.0.x)</a>
      <li><a href="#4">Configuration</a>
      <li><a href="#format">Cookie format</a>
      <li><a href="#5">What can be written to the log</a>
      <li><a href="#6">Why not mod_usertrack</a>
      <li><a href="#7">TODO</a>
    </ol>
    <a name="1"></a>
    <h3>Copyright</h3>
    
    Copyright (C) 2000-2002 Alex Tutubalin, [email protected] <p>
      
      May be distributed and used in derived products under the conditions
      analogous to the
        <a href="http://www.apache.org/LICENSE.txt">Apache License</a>: 
      the author's copyright and the reference to 
       <a href="http://www.lexa.ru/lexa/">http://www.lexa.ru/lexa</a>
       must be preserved, and
       the derived product should not be called <b>mod_uid</b>.<p>

      A prototype of this module was written by the author when he was
      working at <a href="http://www.rambler.ru">Rambler Co.</a>; the
      present version has been significantly modified.<p>

      The author is grateful to <b>Dmitry Khrustalev</b> for valuable advice.<p>

      <a name="2"></a>
    <h3>Description</h3>

The standard distribution of Apache does not provide adequate means for 
user tracking (for problems associated with <b>mod_usertrack</b>, see 
<a href="#6">below</a>), and this module provides them. 

<p>
What it actually does:
<ul>
 <li> if the user has provided the cookie header with the correct cookie-name,
the module writes this cookie in notes with the name <tt>uid_got</tt>
(accordingly, then it may be written to the log);
 <li> if the user has arrived without the required cookie, the module
issues the SetCookie header for him/her and writes the cookie thus issued
in notes with the name <tt>uid_set</tt> (and this may also be written 
to the log);
 <li> if built-in P3P support is included, the P3P header is also
issued as the Set-Cookie header is issued.
</ul>
<p>
Advantages:
<ul>
 <li>the cookie contains the date it is issued and the "service number"
(that is, the number specified during configuring); thus, it helps one
understand when the user first arrived at our site and where exactly
he/she arrived;
 <li>multiserver work is supported: under accurate configuring
   (or its total absence ;), it is guaranteed that the cookie issued
   to the user will be unique;
 <li>the cookie issued to the user and the one received from him/her are
not mingled in the log file;
 <li>the cookies are 128 bit long, and one may work with them in the
log analyzer (quick search etc.) using ready source code intended for
working with IPv6 (for example, <TT>libpatricia</TT>);
 <li>support of P3P (minimal) is provided.
</ul>

      <a name="3"></a>
<h3>Installation (Apache 1.x)</h3>
While configuring Apache, add the following to ./configure parameters:
    --add-module=/path/to/mod_uid.c:
<pre>
tar xzvf apache_1.3xxx
tar xzvf mod_uid-1.0.xx.tar.gz
cd apache_1.3xx
./configure --prefix=/usr/local/apache \
... --add-module=../mod_uid_1.0.xx/mod_uid.c other-params
make
make install
</pre>

      <a name="3a"></a>

<h3>Installation (Apache 2.0)</h3>
You should use mod_uid2.c with Apache 2.0.x<br>
Use the <b>apxs</b> program for installation:
<pre>
tar xzvf mod_uid-1.xx.tar.gz
cd mod_uid-1.xx
/usr/local/apach/bin/apxs -i -c -a mod_uid2.c
</pre>
This command will compile (-c), install (-i) and activate (-a)
    mod_uid2 module.

      <a name="4"></a>

<h3>Configuration Directives</h3>

All the configuration directives may be specified wherever desired:
Server/VirtualServer/Location/... To specify them in <tt>.htaccess</tt>, 
one should allow AllowOverride FileInfo (or All).<p>

    <dl>
      <dt><b>UIDActive</b> On/Off</dt>
      <dd>Cookie issue turned on/off.<br> If set to "off", the
        cookies received from the client are decoded all the same and may be
        written to the log.<br>
        <i>Default:</i> On
      </dd>
      
      <dt><p></dt>
      <dt><b>UIDCookieName</b> string</dt>
      <dd>Cookie name (<i>default</i> - uid).<br>
        The name of the cookie issued to the client. Should not
        match any other name(s) used at the site.
      </dd>
      <dt><p></dt>
      <dt><b>UIDService</b> number</dt>
      <dd>The "service number" is a strictly positive (nonzero)
        unique number identifying the given server in the cluster or
        the given document or document set.<br>
        This number is used for two purposes:
        <ol>
          <li>If several servers are used within one domain 
          (with the same cookie parameter <tt>domain=</tt>)
           or with one hostname, then the use of different
          <b>UIDService</b> numbers guarantees that the cookies issued
           by different servers will be unique.
          <li>The use of different <b>UIDService</b> numbers for
          different parts of the server makes it possible to reveal
          (by log analysis) which of the parts was first visited by
          the client.
        </ol>
        <i>Default:</i> server IP address.
      </dd>
      <dt><p></dt>
      <dt><b>UIDDomain</b> .domain.name</dt>
      <dd> Name of the domain for which the cookie is issued<br>
        In multiserver configurations, this directive makes it
        possible to have a common cookie namespace for all the servers
        (for example, mail.rambler.ru, www.rambler.ru, and info.rambler.ru 
        use the .rambler.ru domain)<br>
        If <tt>domain=</tt> has to be set to "off" for a certain document
        set but stay "on" for the server as a whole, one should use 
        <code>UIDDomain none</code> in the corresponding config 
        section (Location/Directory/...).<br>
        <i>Default:</i> no domain; that is, the user's browser will
        return the cookie only to the originating server.
      </dd>
      <dt><p></dt>
      <dt><b>UIDPath</b>  string</dt>
      <dd>The path for which the cookie is issued (parameter <tt>path=</tt> in
        Set-Cookie:)<br>
        <i>Default:</i> /
        </dd>
      <dt><p></dt>
      <dt><b>UIDExpires</b> number</dt>
      <dd>
      Sets the expiration date for the cookie.<br>
      <code>UIDExpires number</code> - <tt>number</tt> of seconds to be
       added to the current time.<br>
      <code>UIDExpires plus 3 year 4 month 2 day 1 hour 15
        minutes</code> - the same expressed in normal human language.
      <br>
      <i>Default:</i> current date plus 10 years.
      </dd>
      <dt><p></dt>
      <dt><b>UIDP3P</b> On/Off/Always</dt>
      <dd>Controls if the P3P header is issued together with the 
        cookie.<br>
        Variants:
        <ul>
            <li>  Off - P3P header is not issued;
             <li> On - issued only if the <tt>domain</tt> parameter
             is issued for the cookie;
          <li>  Always - always issued (i.e. even without <tt>domain</tt>).
        </ul>
        <i>Default:</i> Off.<br>
        This directive is required for satisfying MS IE6+ in the
        multiserver configuration and, for example, for including 
        the "counter" code from another server in the page. In case
        the cookie is issued without <tt>domain=</tt> or <tt>domain</tt>
        includes the current server name for the main document, MS IE6+
        with default settings will be satisfied all the same; however, 
        the cookies may be suppressed for compound documents collected from
        different servers.<br>
        <b>mod_uid</b> issues only the P3P header (by default,
        only with compact policy); support of /w3c/p3p.xml and the like
        is up to the owner of the server.<br>
        The P3P header is issued only if <b>mod_uid</b> issues the Set-Cookie
        header; that is, if you have to issue other cookies as well and
        also need P3P for them, the problem of P3P issuing should be solved
        separately and independently.
        </dd>
      <dt><p></dt>
      <dt><b>UIDP3PString</b> string</dt>
      <dd>Text of the P3P header sent to the client.<br>
        <i>Default:</i> CP="NOI PSA OUR BUS UNI"
        </dd>
</dl>

      <a name="format"></a>

<h3>Cookie Format</h3>

   The cookie format in the binary form is
   <tt>unsigned int cookie[4]</tt>, where<br>
<ul>
   cookie[0] is the "service number" (specified via <b>UIDService</b>);<br>
   cookie[1] is the issue time (unix time);<br>
   cookie[2] is the <b>pid</b> of the process that issued the cookie;<br>
   cookie[3] contains a unique sequencer within the limits of the process
             (upper 24 bits, starting value 0x030303) and<br> 
             the cookie version number (lower 8 bits, now equal to 2).<br> 
</ul>
These 128 bits are converted with respect for the network byte order,
encoded (base64) and sent to the client. (In ver. 1, everything was sent
in the host order, and support of server clusters with different
architectures was thus complicated.)<br>

<h4>Uniqueness</h4>

Evidently, only insurance can fully guarantee anything. ;) And if
more than 2^128 cookies are issued within a single domain, some of them
will be duplicate. However, the cookie format was developed in such
a way that the cookies must be unique if their number is reasonable.

<ol>
<li> If the "service number" is unique (each server has its own)
within the given domain, different servers will surely issue
different cookies.
<li> Inclusion of the issue time and <b>pid</b> in the cookie implies
that <b>pid</b>s of different processes are not duplicated during
one second. This is true for all UNIX systems I know: <b>pid</b>s
monotonically increase up to a certain maximum (2^16 or higher). That 
is, cookie[1]/cookie[2] may be duplicated within one server if more
than 2^16 <b>fork()</b> is done per second, which is hardly possible 
in the present state of matters.
<li> The sequencer (the upper 24 bits in cookie[3]) enables one
to verify the uniqueness of the cookie within one process during one
second. The capacity of the sequencer makes it possible to issue
up to 1.0E+07 cookies per second by one process.
</ol>

      <a name="5"></a>

<h3>What can be written to the log</h3>

<b>mod_uid</b> writes one of the following two values to "notes":
<ol>
<li>if a cookie was received from the client, it is placed in note
        <tt>uid_got</tt>;
<li>if a cookie was sent to the client, it is placed
in note <tt>uid_set</tt>.
</ol>
Cookies are logged as four 32-bit hexadecimal numbers in the host order
    (in ver. 2, a network-host conversion is performed; in ver. 1,
    everything is saved "as is" under the assumption that the server
    architecture did not change since the cookie had been issued).

In <b>LogFormat</b>, these notes may be used in the form of \"%{uid_got}n\" 
    and \"%{uid_set}n\", respectively.<br>
Using LogFormat of the type
<pre>
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{uid_got}n\" \"%{uid_set}n  combined_cookie
</pre>
we'll have approximately this kind of log entries:
<pre>
Cookie sent to the client:
62.104.212.93 - - [05/Jan/2002:00:02:06 +0300] "GET / HTTP/1.0" 200
13487 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x
4.90)" "-" "ruid=000000013C36184E00009A2100002901" 

Cookie received from the client:
216.136.145.172 - - [05/Jan/2002:00:14:59 +0300] "GET /buttons/but-support-e.gif
 HTTP/1.0" 200 252 "http://apache.lexa.ru/english/meta-http-eng.html" 
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" 
"ruid=000000013C361B5000009A0100009501" "-" 
</pre>
Such a format is easily understood by widespread log analyzers, including
<b>Webtrends</b>, which nicely counts visitors according to such a log.

      <a name="6"></a>
<h3> Why not <b>mod_usertrack</b> from the Apache distribution? </h3>

 Because it has several drawbacks:
<ul>
 <li> it does not strictly guarantee that the same cookie will not
be issued to two users, although, of course, the probability of
such an event is minimized due to consideration of <tt>getpid()</tt>, 
<tt>remote_ip</tt>, and time up to milliseconds;
 <li> it does not support multiserver work, and the probability of
issuing identical cookies increases in this case;
 <li> one might wish to see the cookie sent to the user also in the log, 
and see it separately, whereas <b>mod_usertrack</b> mingles them;
 <li> one might wish to see the "service number" (see above) in order
to understand which of our services was visited by the user during his
first visit.
</ul>

      <a name="7"></a>
<h3>TODO</h3>
<ol>
<li>
 Support of various formats (Netscape/Cookie/Cookie2, as in
 <b>mod_usertrack</b>), but only if it becomes really necessary -
 and so far I haven't noticed any such necessity.
<li> There is a vague suspicion that the sequencer increment
should be surrounded with mutexes at multithread-apache and
multiprocessor computers.
</ol>


<!-- $Id: README.html,v 1.6 2002/04/20 16:56:19 lexa Exp $ --> 
</body>
</html>

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.