Fast Master Dataset Expansion

                            1999-08-12


Changes: 1999-08-12
   - some typos fixed; added URL for 3rd bear;

Changes: 1999-08-10
   - tighter wording on definition;
   - explanation of second inflation;
   - explanation of code needed to do subsequent inflations;

This note proposes an enhancement to IMAGE which would
allow for very fast (linear time) capacity increases of master datasets.


Background:
   There are two ways to expand a master dataset today: the
   traditional "rehashing" method, and the MDX method.

   The rehashing method requires that each old entry be read, and a
   new hash value calculated to determine the entry's new location.
   Because of the nature of hashing, the locality of the old entries
   may not be preserved, resulting in the new entries being distri-
   buted in varying manners, potentially increasing the cost (time) 
   of the rehashing dramatically.  Rehashing has the potential of 
   reducing chain lengths, and also the potential for increasing chain 
   lengths.  Depending upon data layout, the time required for later 
   DBPUTs to find a free entry can become extremely large (This is 
   known as the "third bear of IMAGE", and is described  by the 
   originator of the term, Fred White, at 
   http://www.adager.com/TechnicalPapers.html). 
   This happens when a new entry hashes to the start of a large run 
   on contiguous in-use entries.

   The MDX method involves using secondary entries above the
   original capacity (now also called the "hashing capacity",
   and utilizes a free chain / highwater mark for that area,
   just like detail datasets.  Using MDX (as opposed to rehashing)
   guarantees that chain lengths will continue to grow.  However,
   the time required to find a free entry is short.

   The "hashing algorithm" is:

         hash_value := (computed 32-bit value) mod original_capacity + 1;

         primary_entry_location := hash_value;

   (The computed-32 bit value is the direct I/J value in some cases,
    and a more complicated value in other cases ... but we're not
   interested in the internals of this value here.)


Proposal:

   Introduce a new kind of "expansion", called "inflation".

   Every master has a scale factor, which is initially 1.

   The "hashing algorithm" would be changed to:

         hash_value := (computed 32-bit value) mod original_capacity + 1;
             (i.e., no change in the above)

         primary_entry_location := hash_value * scale_factor;
                                                 (round down)

   (Note: deflation, where the scale_factor is less than 1,
    would not be allowed.  See Discussion #2 below for reasons.)


Discussion #1: doubling the capacity (scale_factor of 2)

   Let's see how this works for the easy case, where we are
   inflating the dataset by a factor of 2 (e.g., capacity of 1000
   being inflated to a capacity of 2000).  In this case, we say
   that the scale_factor is 2.

   Let's assume we have five entries in the original dataset:

      entry  99 ... a primary, linked to entry 101
      entry 100 ... a primary, linked to entry 102
      entry 101 ... a secondary (primary is 99)
      entry 102 ... a secondary (primary is 100)
      entry 103 ... a primary with no synonyms.

   Note that there are no free entries between 99 and 103.

   When we expand the capacity via the inflation method (with
   a scale_factor of 2), those entries are moved to:

      entry 198 ... a primary, linked to entry 202 (old  99)
      entry 200 ... a primary, linked to entry 204 (old 100)
      entry 202 ... a secondary (primary is 198)   (old 101)
      entry 204 ... a secondary (primary is 200)   (old 102)
      entry 206 ... a primary with no synonyms.    (old 103)

   Note that there are now free entries at ever other slot,
   thus allowing room for growth (attacking the "third bear
   of IMAGE" problem).

   How complicated was this expansion?  Trivial!

   The expansion can be done with a single sequential read of the
   old master, with each entry being written to the newly computed
   location to the new master.  We also update the pointers in
   the synonym chain area of each entry, inflating them with the
   same algorithm.

   Note that each new entry gets written to a higher entry number
   than the prior new entry.  I.e., the entry ordering is maintained!

   We are not, in any way, "rehashing"!

   Unlike a "rehash" expansion, we never have to work hard to figure out
   where to put each entry.


Discussion #2: increasing capacity by 50%  (scale_factor of 1.5)

   What if the scale factor was 1.5?   Using the same five
   entries from above (the ones at 99..103), we would get:

      entry 148 ... a primary, linked to entry 151 (old  99)
      entry 150 ... a primary, linked to entry 153 (old 100)
      entry 151 ... a secondary (primary is 148)   (old 101)
      entry 153 ... a secondary (primary is 150)   (old 102)
      entry 154 ... a primary with no synonyms.    (old 103)

   Ok...some of those were easy (e.g., 100 -> 150).  What
   happened to 99, 101, and 103?

        99 * 1.5 = 148.5, so we round down to 148.
       101 * 1.5 = 151.5, so we round down to 151.
       103 * 1.5 = 154.5, so we round down to 154.

   Scale factors of less than 1 would be disallowed (because they
   would result in two different hash values being deflated into the
   same entry value).


Discussion #3: second inflation

   What happens if we originally inflated by 50% (scale_factor 1.5)
   (which resulted in a capacity of 1500),
   and *then* we come back and inflate by 100% (scale_factor of 2),
   resulting in a capacity of 3000?

   The second inflation would do:

      entry 297 ... a primary, linked to entry 303 (old 148) (old  99)
      entry 300 ... a primary, linked to entry 306 (old 150) (old 100)
      entry 303 ... a secondary (primary is 297)   (old 151) (old 101)
      entry 306 ... a secondary (primary is 300)   (old 153) (old 102)
      entry 309 ... a primary with no synonyms.    (old 154) (old 103)

   The dataset's scale factor is now 3!

   It's clear how we got 300 and 306 (150 * 2, and 153 * 2), but
   how did we get 297, 303, and 309?


   We couldn't simply take the current entry number times the new
   scale factor (e.g., 148 * 1.5), because the following *MUST*
   hold true:

       A primary entry must reside at the expected location.
       If the cumulative scale factor is 3, and the hash_value
       is 98, then the primary *MUST* be at 297 (99 * 3).

   When we did the first expansion, 99 went to 148 ... which
   was a *TRUNCATED* value.  (99 * 1.5 = 148.5)

   So, to calculate the new entry location we always have to
   do the following:

        orig_entry# := some_function (current_entry#)

        new_entry# := orig_entry#  * new scale_factor;
                                             (round down)

   How do we obtain the original entry number?  Relatively
   simple:

       orig_entry# := current_entry# / prior_scale_value
                      (perhaps plus 1)

   (The "perhaps plus 1" is because we don't know whether the
   current_entry# was the result of rounding down or not.)

   The actual code to compute it is:

           test := trunc (current_entry_num / prior_scale_value);

           if trunc (test * prior_scale_value) = current_entry_num then
              orig_entry_num := test

           else if trunc ((test  + 1) * prior_scale_value) = current_entry_num then
              orig_entry_num := test + 1

           else
              complain ... shouldn't happen.

   I've tested the above code for a large number of cases and it appears
   to work.


Other topics

   Other changes needed in IMAGE code:

      a) Storing (and reporting) original and current primary capacities.
         (E.g., DBUTIL, QUERY, DBINFO)

      b) Updating the "capability mask" to include inflation.
         (This would prevent older IMAGEs from opening databases with
         inflated datasets.)

   Third party tools should be made aware of the change...both
   database modifying tools and programs like SUPRTOOL and QueryCalc.

   The average chain length is not affected whatsoever.  (This could
   be good, or it could be bad.)


Extra Enhancement

   In some cases, it should be possible to inflate some secondary
   entries into locations one higher than normally would be inflated.

   E.g., with a scale_factor of 2, a secondary entry at 100
   could be relocated to 201 instead of 200.  Why do this?  Because
   it leaves 200 (the inflated primary 100) free for a new primary.
   It's easy to do, because we know when we're adjusting pointers
   for the various entries that any pointer we encounter is a pointer
   to a secondary, thus when converting a forward pointer to 100, we
   automatically know that 100 is a secondary.


Summary

   NOTHING IS REHASHED!  As a result, the expansion (inflation) is
   extremely quick.

   The benefits are basically twofold:

      a) reduction of third bear problem, because "ventilation holes"
         have been added, making DBPUT faster (until they fill up,
         of course)

      b) it's a method of expansion that requires a single sequential read of
         the old master, and a single sequential write of the new master,
         with no other (disk) work involved!


Stan & Gavin