ENCHILADA #4: Expanded Root File
Stan Sieler
sieler@allegro.com
2000-03-06

Background:

Many years ago (5? more?), I proposed expanding the IMAGE root file to hold extra information. This information would include, but would not be limited to, things like:

  - list of files associated with the database;
  - DB tool vendor information (e.g., date of last repack);
  - measurement data.

Since then, other enhancement suggestions have been proposed, many of which would require remembering extra information about a database. In some cases, it's a small amount of information; in other cases, it's a large amount.

At some point, the SIEC lumped a bunch of enhancement requests together and properly noticed that they all had something in common: a need to remember new data about a given database... the ENCHILADA (a name coined by Steve Cooper).

This proposal addresses the need for handling small and medium amounts of database-local storage.

Philosophy:

Propose an implementation that meets the needs of a number of vendors (and/or HP), imposes minimal constraints upon the programmer and minimal workload on HP, and yet provides the most power possible. The proposal should comfortably handle between 0 and 10 MB of data.

Proposal:

Expand the root file to hold tagged blocks of data, stored after the end of the current root file contents. Expose (or add & expose) some routines to find a block (based on a matching tag) and manipulate it.

These tags and blocks would use the tics_expandable_header_type implemented for the Jumbo dataset project. (I.e., each block of data is of arbitrary size, with a 32-byte prefix of type tics_expandable_header_type.)

The tag tells us who created the data block (the "creator ID") and what its usage is (the "header ID"). For example, one header ID is eh_id_filelist, which tells us that the data block is a list of files associated with the database. Another header ID is eh_id_vendor, which tells us that IMAGE does *not* know much about the data... but that the associated vendor does!

Note: the header_ID/creator_ID pair is unique. If a given vendor wants multiple blocks, various means exist within this proposal to accommodate that desire. I feel that ensuring uniqueness will help to avoid possible problems later.

The tag/data blocks may appear in any order, and are specifically not guaranteed to be in any particular order. Additionally, the order might change from time to time for a given database.

A previously unused word, perhaps userlabel #0, word 62, would hold the file system record number of the start of this new area. (Let's call it root'first'exh'rec, for "root first expandable header record number".)

Each user (internal to HP and external) is assigned a small unique integer (an "id").

When dealing with data of this type, we need to address:

  - adding a block of data;
  - finding a block of data;
  - deleting a block of data;
  - modifying a block of data;
  - replacing a block of data.

(Obviously, modifying/replacing (and shrinking/enlarging) could be implemented as delete & add, if necessary.)

A new bit in the root_features_used flag would be set to "ON" if any tag/data blocks exist.

Usages

We (SIEC?) would define several "pre-defined" usage types for things that we anticipate being commonly used. For example, the format of the file list for header ID eh_id_filelist will be well defined, to allow various utilities/vendors to access and update the list of files associated with the database.

Vendors and other utility writers (including HP) would be free to have their own unique data (the layout/meaning of which is known only to them) as well.
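The operations listed above can be modeled with a minimal Python sketch that treats the extension area as one byte string starting at root'first'exh'rec. This is not IMAGE code: the internal layout of tics_expandable_header_type is not given in this proposal, so the field order, big-endian packing, and the use of an eh_id_stopper tag as the terminator are assumptions, and the helper names (find_block, add_block, and so on) are hypothetical stand-ins for the ti_* routines proposed in this document.

```python
import struct

# ASSUMED tag layout: the proposal only says each tag is a 32-byte
# tics_expandable_header_type. Here: big-endian header_id (2 bytes),
# creator_id (2 bytes), rounded data length (4 bytes), 24 reserved bytes.
TAG_FORMAT = ">HHI24x"
TAG_SIZE = struct.calcsize(TAG_FORMAT)  # 32

# Values taken from the enumerations in this proposal.
EH_ID_STOPPER, EH_ID_FILELIST, EH_ID_VENDOR = 0, 3, 4
EHC_ADAGER = 4


def new_area():
    """An empty extension area: just a stopper tag (assumed terminator)."""
    return bytearray(struct.pack(TAG_FORMAT, EH_ID_STOPPER, 0, 0))


def _walk(area):
    """Yield (tag_offset, header_id, creator_id, length) up to the stopper."""
    off = 0
    while off + TAG_SIZE <= len(area):
        hid, cid, length = struct.unpack_from(TAG_FORMAT, area, off)
        if hid == EH_ID_STOPPER:
            return
        yield off, hid, cid, length
        off += TAG_SIZE + length


def _exact(area, header_id, creator_id):
    """Tag offset of the block with exactly this ID pair, or None."""
    for off, hid, cid, _ in _walk(area):
        if hid == header_id and cid == creator_id:
            return off
    return None


def find_block(area, header_id, creator_id):
    """Like ti_find_tag_block: an ID of 0 matches anything.
    Returns (tag_offset, data_offset) of the first match, or None."""
    for off, hid, cid, _ in _walk(area):
        if header_id in (0, hid) and creator_id in (0, cid):
            return off, off + TAG_SIZE
    return None


def get_nth_block(area, nth):
    """Like ti_get_nth_tag_block: 1-based; None if fewer than nth blocks."""
    for i, (off, _, _, _) in enumerate(_walk(area), start=1):
        if i == nth:
            return off, off + TAG_SIZE
    return None


def add_block(area, header_id, creator_id, data):
    """Like ti_add_tag_block: pad data to a multiple of 4, insert it before
    the stopper, and reject a duplicate header_id/creator_id pair."""
    if _exact(area, header_id, creator_id) is not None:
        raise ValueError("header_id/creator_id pair already exists")
    length = (len(data) + 3) & ~3
    stopper = sum(TAG_SIZE + n for _, _, _, n in _walk(area))
    tag = struct.pack(TAG_FORMAT, header_id, creator_id, length)
    area[stopper:stopper] = tag + bytes(data) + bytes(length - len(data))


def delete_block(area, tag_offset):
    """Like ti_delete_tag_block: remove the tag and its data."""
    _, _, length = struct.unpack_from(TAG_FORMAT, area, tag_offset)
    del area[tag_offset:tag_offset + TAG_SIZE + length]


def replace_block(area, header_id, creator_id, data):
    """Replace (shrink or enlarge) a block as delete followed by add."""
    off = _exact(area, header_id, creator_id)
    if off is not None:
        delete_block(area, off)
    add_block(area, header_id, creator_id, data)
```

For example, adding b"DB01.DATA.ACCT" (14 bytes) under EH_ID_FILELIST stores a 32-byte tag plus 16 data bytes ahead of the stopper. Because replace_block is delete-then-add, a replaced block moves to the end of the chain, which is one concrete way the block order can change, as the proposal explicitly allows. A real implementation would also maintain root'first'exh'rec, the root_features_used bit, and the locking rules; the sketch omits all three.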
Changes to existing HP datatypes:

   type
      tics_expandable_header_ids = (
         eh_id_stopper,     {0}
         eh_id_unknown,     {1}
         eh_id_jumbo,       {2 : Jumbo Datasets}
   -->   eh_id_filelist,    {3 : list of files associated}
   -->   eh_id_vendor,      {4 : vendor specific}
         last_eh_id);       {*always* the last}

      tics_eh_creator_ids = (
         ehc_unknown,       {0}
         ehc_chunkset,      {1}  {chunkset routines}
         ehc_dbchange,      {2}
         ehc_dbutil,        {3}
         ehc_adager,        {4}
         ehc_bradmark,      {5}
         ehc_hicomp,        {6}
         ...
         );

New routines:

   ti_find_tag_block (
      anyvar f_status    : hpe_status_a2;               {OUT}
             options     : tics_chunk_ctl_options_type; {IN}
             root_fid    : shortint;                    {IN}
             header_id   : tics_expandable_header_ids;  {IN}
             creator_id  : tics_eh_creator_ids;         {IN}
      var    tag_ptr     : globalanyptr;                {OUT}
      var    data_ptr    : globalanyptr);               {OUT}

      {Cloned from existing ccui_find_chunkset, with about 3 lines changed}

      Searches the tag/data blocks for the first block that satisfies the search criteria:
         - if header_id <> 0, then the header_id must match the one in the tag;
         - if creator_id <> 0, then the creator_id must match the one in the tag.
      If found, tag_ptr points to the beginning of the tag area, and data_ptr points to the beginning of the data area.

   ti_get_nth_tag_block (
      anyvar f_status    : hpe_status_a2;               {OUT}
             options     : tics_chunk_ctl_options_type; {IN}
             root_fid    : shortint;                    {IN}
             nth         : integer;                     {IN}
      var    tag_ptr     : globalanyptr;                {OUT}
      var    data_ptr    : globalanyptr);               {OUT}

      Retrieves the nth tag/data block (the first one is nth = 1). If found, tag_ptr points to the beginning of the tag area, and data_ptr points to the beginning of the data area. If there are fewer than nth entries, a warning is returned and tag_ptr and data_ptr are set to nil.

   ti_delete_tag_block (
      anyvar f_status    : hpe_status_a2;               {OUT}
             options     : tics_chunk_ctl_options_type; {IN}
             root_fid    : shortint;                    {IN}
      var    tag_ptr     : globalanyptr;                {IN}
      var    data_ptr    : globalanyptr);               {IN}

      Note: the caller must have either exclusive access to the root file or LOCK access (and FLOCK must succeed).
   ti_add_tag_block (
      anyvar f_status    : hpe_status_a2;               {OUT}
             options     : tics_chunk_ctl_options_type; {IN}
             root_fid    : shortint;                    {IN}
             header_id   : tics_expandable_header_ids;  {IN}
             creator_id  : tics_eh_creator_ids;         {IN}
      var    data_ptr    : globalanyptr;                {IN}
             bytes       : integer);                    {IN}

      Adds the specified data block to the root file. "Bytes" is the size of the data portion; IMAGE rounds this value up to a multiple of 4 before storing it in the new tag.

      Note: if a tag/data block with a matching header_id/creator_id already exists, this add is rejected.

      Note: the caller must have either exclusive access to the root file or LOCK access (and FLOCK must succeed).

   etc.

--------------------------------------------------------------------------

Arguments Pro/Con

For this section, let's quickly number the various proposals:

   1- A database for all databases on the system, later amended to a database per group.
   2- An external privileged mode file for each database using ENCHILADA.
   3- Storing the metadata in the database, in either hidden or visible datasets and items.
   4- Extending the root file and storing the data in this extension area (this proposal).

(Apologies to Steve... the numbers above were suggested by Denys, and do not reflect chronological ordering.)

Note that proposals #2 and #3 overlap to some extent (because a new dataset is an external priv mode file).

Implementation cost:
   Proposal #1 has a high cost.
   Proposal #2 has a medium cost.
   Proposal #3 has a medium to high cost. (Security implications, for example, contribute.)
   Proposal #4 is partially done... and it has the lowest cost of all proposals, by a wide margin. I speak as a TurboIMAGE implementor here!

STORE transparency (and other backup/copy/rename/purge tools):
   Proposal #1 causes great problems for backup/copy/rename/purge tools.
   Proposal #2 causes medium problems for backup/copy/rename/purge tools.
   Proposal #3 causes minor to medium problems for backup/copy/rename/purge tools.
   Proposal #4: no problems.

Database Modification Tools:
   Proposal #1: (dealt with under STORE above)
   Proposal #2: (dealt with under STORE above)
   Proposal #3 can cause major problems, particularly if a new dataset is used. (E.g., the tool must be modified to prevent changing/deleting the dataset.)
   Proposal #4: minor problems (tools that rebuild the root file must copy the old tag/data blocks to the new root file, a simple task).

IMAGE run time routines:
   Proposal #1: depending upon implementation, ranges from no impact to major impact.
   Proposal #2: depending upon implementation, ranges from no impact to major impact.
   Proposal #3: if implemented as a "hidden" dataset, has a medium impact on some run time routines (e.g., DBOPEN, DBFIND, DBGET, DBINFO, DBCONTROL). The intrinsics must be modified to "not see" the dataset as appropriate. Is the hidden dataset #1? #N+1? If the latter, what happens when a new set is added by a DB tool?
   Proposal #4: no impact.

IMAGE limits:
   Proposal #1: no impact (other than a possible limit to the number of entries the global database might have).
   Proposal #2: no impact.
   Proposal #3: medium impact. Some databases are already at the 199-set limit. Although the number of sets is increasing to 250, it would appear that this proposal always takes at least one potential dataset away from the user.
   Proposal #4: no impact.

Data access:
   Proposal #1: major work. A locking strategy is necessary, and a deadlock/stall strategy must be implemented. New routines must be written to access the data. What about data allocation, deallocation, reuse, compaction?
   Proposal #2: medium work. New routines must be written to access the data. What about data allocation, deallocation, reuse, compaction?
   Proposal #3: medium work. Standard IMAGE intrinsic access can be used, which at first glance makes one incorrectly think "no work". But the use of IMAGE intrinsics means that all the data is returned in the same size/shape, which means that some users of the data will have to invent their own mapping techniques. For example, assuming the dataset has records with a single X100 field, the associated-file-list users would have to decide: how do we store three 40-character filenames? How do we store one 110-character filename?
   Proposal #4: low to medium work. New routines must be written to access the data. However, some are based on existing code, and the others are quite easy to implement. The advantage to the user is obvious: no data reblocking/repacking is necessary.

Portability (to other HP 3000s):
   Proposal #1 causes *MAJOR* problems with cross-machine portability. We'd have to solve the "how do I add data to the system-wide database during RESTORE" problem, for example.
   Proposal #2: minor work.
   Proposal #3: no work.
   Proposal #4: no work.

Backwards compatibility (transporting a new database with ENCHILADA data to an older machine):
   Proposal #1: no impact.
   Proposal #2: no impact.
   Proposal #3: possibly major impact. Depending upon how the hidden dataset is implemented in the root file structures, it may cause problems on the older machine.
   Proposal #4: no impact.

Mistake Tolerance (e.g., can users screw up the ENCHILADA data, either on the original machine or on an old, pre-ENCHILADA machine?):
   Proposal #1: little possibility. (The data could be lost by an old copy/restore tool.)
   Proposal #2: some possibility. The user could lose the file during a RESTORE (via an older RESTORE product).
   Proposal #3: high possibility. On an old machine, the user might be able to easily modify the data. An old backup/copy tool might lose a hidden dataset. The user might be able to DBUPDATE/DBDELETE/whatever on the current machine or on an old machine.
   Proposal #4: little possibility. (The data could be lost by an old database modification tool that builds a new root file.)
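Returning to the Data access comparison: Proposal #3's fixed-size records force each user of the file list to invent a mapping. As an illustration only, here is a minimal Python sketch of one such invented convention, assuming the single X100 field from that example. The 3-digit length prefix and the helper names (pack_fixed, unpack_fixed) are hypothetical, not part of any IMAGE or vendor format.

```python
RECORD_SIZE = 100  # one X100 field per record, as in the Data access example


def pack_fixed(names, record_size=RECORD_SIZE):
    """Pack filenames into fixed-size records.

    Hypothetical convention: each name is written as a 3-digit length
    followed by the name itself; the resulting stream is cut into
    record_size pieces, and the last piece is padded with blanks.
    """
    stream = "".join(f"{len(name):03d}{name}" for name in names)
    records = [stream[i:i + record_size]
               for i in range(0, len(stream), record_size)]
    if records:
        records[-1] = records[-1].ljust(record_size)
    return records


def unpack_fixed(records):
    """Reverse pack_fixed: walk the length-prefixed names until the padding."""
    stream = "".join(records)
    names, i = [], 0
    while i + 3 <= len(stream) and stream[i:i + 3].strip():
        length = int(stream[i:i + 3])
        names.append(stream[i + 3:i + 3 + length])
        i += 3 + length
    return names


# Three 40-character names need 3 * (3 + 40) = 129 bytes: two records.
three = [c * 40 for c in "ABC"]
# One 110-character name needs 3 + 110 = 113 bytes: it straddles two records.
one = ["F" * 110]
print(len(pack_fixed(three)), len(pack_fixed(one)))  # prints "2 2"
```

Every reader and writer of such a dataset would have to agree on this convention; under Proposal #4, the same list could instead be stored as a single variable-sized tag/data block.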