Archive-name: databases/sybase-faq/part14
URL: http://www.isug.com/Sybase_FAQ Version: 1.7 Maintainer: David Owen Last-modified: 2003/03/02 Posting-Frequency: posted every 3rd month A how-to-find-the-FAQ article is posted on the intervening months. See reader questions & answers on this topic! - Help others by sharing your knowledge 6.2.7: Hierarchy traversal - BOMs ------------------------------------------------------------------------------- Alright, so you wanna know more about representing hierarchies in a relational database? Before I get in to the nitty gritty I should at least give all of the credit for this algorithm to: "_Hierarical_Structures:_The_Relational_Taboo!_, _(Can_ Transitive_Closure_Queries_be_Efficient?)_", by Michael J. Kamfonas as published in 1992 "Relational Journal" (I don't know which volume or issue). The basic algorithm goes like this, given a tree (hierarchy) that looks roughly like this (forgive the ASCII art--I hope you are using a fixed font to view this): a / \ / \ / \ b c / \ /|\ / \ / | \ / \ / | \ d e f | g Note, that the tree need not be balanced for this algorithm to work. The next step assigned two numbers to each node in the tree, called left and right numbers, such that the left and right numbers of each node contain the left and right numbers of the ancestors of that node (I'll get into the algorithm for assigning these left and right numbers later, but, hint: use a depth-first search): 1a16 / \ / \ / \ 2b7 8c15 / \ /|\ / \ / | \ / \ / | \ 3d4 5e6 9f10 11g12 13h14 Side Note: The careful observer will notice that these left and right numbers look an awful lot like a B-Tree index. So, you will notice that all of the children of node 'a' have left and right numbers between 1 and 16, and likewise all of the children of 'c' have left and right numbers between 8 and 15. In a slightly more relational format this table would look like: Table: hier node parent left_nbr right_nbr ----- ------ -------- --------- a NULL 1 16 b a 2 7 c a 8 15 d b 3 4 e b 5 6 f c 9 10 g c 11 12 h c 13 14 So, given a node name, say @node (in Sybase variable format), and you want to know all of the children of the node you can do: SELECT h2.node FROM hier h1, hier h2 WHERE h1.node = @node AND h2.left_nbr > h1.left_nbr AND h2.left_nbr < h1.right_nbr If you had a table that contained, say, the salary for each node in your hierarchy (assuming a node is actually a individual in a company) you could then figure out the total salary for all of the people working underneath of @node by doing: SELECT sum(s.salary) FROM hier h1, hier h2, salary s WHERE h1.node = @node AND h2.left_nbr > h1.left_nbr AND h2.right_nbr > h1.right_nbr AND s.node = h2.node Pretty cool, eh? And, conversely, if you wanted to know how much it cost to manage @node (i.e. the combined salary of all of the boss's of @node), you can do: SELECT sum(s.salary) FROM hier h1, hier h2, salary s WHERE h1.node = @node AND h2.left_nbr < h1.left_nbr AND h2.left_nbr > h1.right_nbr AND s.node = h2.node Now that you can see the algorithm in action everything looks peachy, however the sticky point is the method in which left and right numbers get assigned. And, unfortunately, there is no easy method to do this relationally (it can be done, it just ain't that easy). For an real- world application that I have worked on, we had an external program used to build and maintain the hierarchies, and it was this program's responsibility to assign the left and right numbers. But, in brief, here is the algorithm to assign left and right numbers to every node in a hierarchy. Note while reading this that this algorithm uses an array as a stack, however since arrays are not available in Sybase, they are (questionably) emulated using a temp table. DECLARE @skip int, @counter int, @idx int, @left_nbr int, @node varchar(10) /*-- Initialize variables --*/ SELECT @skip = 1000, /* Leave gaps in left & right numbers */ @counter = 0, /* Counter of next available left number */ @idx = 0 /* Index into array */ /* * The following table is used to emulate an array for Sybase, * for Oracle this wouldn't be a problem. :( */ CREATE TABLE #a ( idx int NOT NULL, node varchar(10) NOT NULL, left_nbr int NOT NULL ) /* * I know that I always preach about not using cursors, and there * are ways to get around it, but in this case I am more worried * about readability over performance. */ DECLARE root_cur CURSOR FOR SELECT h.node FROM hier h WHERE h.parent IS NULL FOR READ ONLY /* * Here we are populating our "stack" with all of the root * nodes of the hierarchy. We are using the cursor in order * to assign an increasing index into the "stack"...this could * be done using an identity column and a little trickery. */ OPEN root_cur FETCH root_cur INTO @node WHILE (@@sqlstatus = 0) BEGIN SELECT @idx = @idx + 1 INSERT INTO #a VALUES (@idx, @node, 0) FETCH root_cur INTO @node END CLOSE root_cur DEALLOCATE CURSOR root_cur /* * The following cursor will be employed to retrieve all of * the children of a given parent. */ DECLARE child_cur CURSOR FOR SELECT h.node FROM hier h WHERE h.parent = @node FOR READ ONLY /* * While our stack is not empty. */ WHILE (@idx > 0) BEGIN /* * Look at the element on the top of the stack. */ SELECT @node = node, @left_nbr = left_nbr FROM #a WHERE idx = @idx /* * If the element at the top of the stack has not been assigned * a left number yet, then we assign it one and copy its children * on the stack as "nodes to be looked at". */ IF (@left_nbr = 0) BEGIN /* * Set the left number of the current node to be @counter + @skip. * Note, we are doing a depth-first traversal, assigning left * numbers as we go. */ SELECT @counter = @counter + @skip UPDATE #a SET left_nbr = @counter WHERE idx = @idx /* * Append the children of the current node to the "stack". */ OPEN child_cur FETCH child_cur INTO @node WHILE (@@sqlstatus = 0) BEGIN SELECT @idx = @idx + 1 INSERT INTO #a VALUES (@idx, @node, 0) FETCH child_cur INTO @node END CLOSE child_cur END ELSE BEGIN /* * It turns out that the current node already has a left * number assigned to it, so we just need to assign the * right number and update the node in the actual * hierarchy. */ SELECT @counter = @counter + @skip UPDATE h SET left_nbr = @left_nbr, right_nbr = @counter WHERE h.node = @node /* * "Pop" the current node off our "stack". */ DELETE #a WHERE idx = @idx SELECT @idx = @idx - 1 END END /* WHILE (@idx > 0) */ DEALLOCATE CURSOR child_cur While reading through this, you should notice that assigning the left and right numbers to the entire hierarchy is very costly, especially as the size of the hierarchy grows. If you put the above code in an insert trigger on the hier table, the overhead for inserting each node would be phenomenal. However, it is possible to reduce the overall cost of an insertion into the hierarchy. 1. By leaving huge gaps in the left & right numbers (using the @skip variable), you can reduce the circumstances in which the numbers need to be reassigned for a given insert. Thus, as long as you can squeeze a new node between an existing pair of left and right numbers you don't need to do the re-assignment (which could affect all of the node in the hierarchy). 2. By keeping an extra flag around in the hier table to indicate which nodes are leaf nodes (this could be maintained with a trigger as well), you avoid placing leaf nodes in the array and thus reduce the number of updates. Deletes on this table should never cause the left and right numbers to be re-assigned (you could even have a trigger automagically re-parent orphaned hierarchy nodes). All-in-all, this algorithm is very effective as long as the structure of the hierarchy does not change very often, and even then, as you can see, there are ways of getting around a lot of its inefficiencies. Back to top ------------------------------------------------------------------------------- 6.2.8: Calling OS commands from a trigger or a stored procedure ------------------------------------------------------------------------------- 11.5 and above The Adaptive Server (11.5) will allow O/S calls from within stored procedures and triggers. These stored procedures are known as extended stored procedures. Pre-11.5 Periodically folks ask if it's possible to make a system command or call a UNIX process from a Trigger or a Stored Procedure. Guaranteed Message Processing The typical ways people have implemented this capability is: 1. Buy Open Server and bind in your own custom stuff (calls to system() or custom C code) and make Sybase RPC calls to it. 2. Have a dedicated client application running on the server box which regularly scans a table and executes the commands written into it (and tucks the results into another table which can have a trigger on it to gather results...). It is somewhat tricky but cheaper than option 1. Sybase ASE 10.0.2.5 and Above - syb_sendmsg() This release includes a new built-in function called syb_sendmsg(). Using this function you can send a message up to 255 bytes in size to another application from the ASE. The arguments that need to be passed to syb_sendmsg() are the IP address and port number on the destination host, and the message to be sent. The port number specified can be any UDP port, excluding ports 1-1024, not already in use by another process. An example is: 1> select syb_sendmsg("120.10.20.5", 3456, "Hello") 2> go This will send the message "Hello" to port 3456 at IP address '120.10.20.5'. Because this built-in uses the UDP protocol to send the message, the ASE does not guarantee the receipt of the message by the receiving application. Also, please note that there are no security checks with this new function. It is possible to send sensitive information with this command and Sybase strongly recommends caution when utilizing syb_sendmsg to send sensitive information across the network. By enabling this functionality, the user accepts any security problems which result from its use (or abuse). To enable this feature you should run the following commands as the System Security Officer. 1. Login to the ASE using 'isql'. 2. Enable the syb_sendmsg() feature using sp_configure. 1> sp_configure "allow sendmsg", 1 2> go 1> sp_configure "syb_sendmsg port number", <port number> 2> go 1> reconfigure with override -- Not necessary with 11.0 and above 2> go The server must be restarted to set the port number. Using syb_sendmsg() with Existing Scripts Since syb_sendmsg() installs configuration parameter "allow sybsendmsg", existing scripts that contain the syntax 1> sp_configure allow, 1 2> go to enable updates to system tables should be altered to be fully qualified as in the following: 1> sp_configure "allow updates", 1 2> go If existing scripts are not altered they will fail with the following message: 1> sp_configure allow, 1 2> go Configuration option is not unique. duplicate_options ---------------------------- allow updates allow sendmsg (return status = 1) (The above error is a little out of date for the latest releases of ASE, there are now 8 rows that contain "allow", but the result is the same.) Backing Out syb_sendmsg() The syb_sendmsg() function requires the addition on two config values. If it becomes necessary to roll back to a previous ASE version which does not include syb_sendmsg(), please follow the instructions below. 1. Edit the RUNSERVER file to point to the SWR ASE binary you wish to use. 2. isql -Usa -P<sa password> -Sserver_name -n -iunconfig.sendmsg -ooutput_file Sample C program #include <stdlib.h> #include <stdio.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <unistd.h> #include <fcntl.h> main(argc, argv) int argc; char *argv[]; { struct sockaddr_in sadr; int portnum,sck,dummy,msglen; char msg[256]; if (argc <2) { printf("Usage: udpmon <udp portnum>\n"); exit(1); } if ((portnum=atoi(argv[1])) <1) { printf("Invalid udp portnum\n"); exit(1); } if ((sck="socket(AF_INET,SOCK_DGRAM,IPPROTO_UDP))" < 0) { printf("Couldn't create socket\n"); exit(1); } sadr.sin_family = AF_INET; sadr.sin_addr.s_addr = inet_addr("0.0.0.0"); sadr.sin_port = portnum; if (bind(sck,&sadr,sizeof(sadr)) < 0) { printf("Couldn't bind requested udp port\n"); exit(1); } for (;;) { if((msglen="recvfrom(sck, msg, sizeof(msg), 0, NULL, &dummy))" < 0) printf("Couldn't recvfrom() from udp port\n"); printf("%.*s\n", msglen, msg); } } Back to top ------------------------------------------------------------------------------- 6.2.9: Identities and Sequential Keys ------------------------------------------------------------------------------- This has several sections, culled from various sources. It is better described as "Everything you've ever wanted to know about identities." It will serve to answer the following frequently asked questions: What are the Features and Advantages of using Identities? What are the Problems with and Disadvantages of Identities? Common Questions about Identities * Is Identity the equivalent of Oracle's Auto-sequencing? * How do I configure a table to use the Identity field? * How do I configure the burn factor? * How do I find out if my tables have Identities defined? * What is my current identity burn factor vulnerability? How do I optimize the performance of a table that uses Identities? How do I recover from a huge gap in my identity column? How do I fix a table that has filled up its identity values? OK, I hate identities. How do I generate sequential keys without using the Identity feature? How do I optimize a hand-made sequential key system for best performance? - Question 8.1 of the comp.database.sybase FAQ has a quick blurb about identities and sequential numbers. Search down in the page for the section titled, "Generating Sequential Numbers." Question 8.1 is a general document describing Performance and Tuning topics to be considered and thus doesn't go into as much detail as this page. - There's a white paper by Malcolm Colton available from the sybase web site. Goto the Sybase web site http://www.sybase.com and type Surrogate in the search form. Select the Surrogate Primary Keys, Concurrency, and the Cache Hit Ratio document. ------------------------------------------------------------------------------- Advantages/Features of Using Identities There's an entire section devoted to Identity columns in the ASE Reference manual, Chapter 5 Sybase System 10 introduced many changes over the 4.9.x architecture. One of these changes was the Identity feature. The identity column is a special column type that gets automatically updated by the server upon a new row insert. Its purpose is to guarantee a unique row identifier not based on the other data in the row. It was integrated with the server and made memory based for fast value retrieval and no locking (as was/is the case with homegrown sequential key generation schemes). The Advantages and Features of Identities include: * A non-SQL based solution to the problem of having an default unique value assigned to a row. ASE prefetches identity values into cache and adds them automatically to rows as they're inserted into tables that have a type Identity column. There's no concurrency issues, no deadlocking in high-insert situations, and no possibility of duplicate values. * A high performance Unique identifier; ASE's optimizer is tuned to work well with Unique indexes based on the identity value. * The flexibility to insert into the identity field a specific value in the case of a mistaken row deletion. (You can never update however). You accomplish this by: 1> set identity_insert [datababase]..[table] on 2> go Note however that the System will not verify the uniqueness of the value you specifically insert (unless of course you have a unique index existing on the identity column). * The flexibility during bcp to either retain existing identity values or to reset them upon bcping back in. To retain the specific identity values during a bcp out/in process, bcp your data out normally (no special options). Then create your bcp in target table with ddl specifying the identity column in the correct location. Upon bcp'ing back in, add the "-E" option at the end of the bcp line, like this (from O/S prompt): % bcp [database]..[new_table] in [bcp datafile] -Usa -S[server] -f [fmt file] -E For procedures on resetting identity values during a bcp, see the section regarding Identity gaps. * Databasewide Identity options: 1) The ability to set Sybase to automatically create an Identity column on any table that isn't created with a primary key or a unique constraint specified. 2) Sybase can automatically include an Identity field in all indexes created, guaranteeing all will be unique. These two options guarantee increased index performance optimization and guarantees the use of updateable cursors and isolation level 0 reads. These features are set via sp_dboption, like this: 1> sp_dboption [dbname], "auto identity", true 2> go or 1> sp_dboption [dbname], "identity in nonunique index", true 2> go To tune the size of the auto identity (it defaults to precision 10): 1> sp_configure "size of auto identity", [desired_precision] 2> go (the identity in nonunique index db_option and the size of auto identity sp_configure value are new with System 11: the auto identity existed with the original Identity feature introduction in System 10) Like other dboptions, you can set these features on the model database before creating new databases and all your future databases will be configured. Be warned of the pitfalls of large identity gaps however; see the question regarding Burn Factor Vulnerability in the Common Questions about Identities section. * The existence of the @@identity global variable, which keeps track of the identity value assigned during the last insert executed by the server. This variable can be used programming SQL around tables that have identity values (in case you need to know what the last value inserted was). If the last value inserted in the server was to a non-identity table, this value will be "0." Back to start of 6.2.9 ------------------------------------------------------------------------------- Disadvantages/Drawbacks of Using Identities Despite its efficacy of use, the Identity has some drawbacks: * The mechanism that Sybase uses to allocate Identities involves a memory based prefetch scheme for performance. The downside of this is, during non-normal shutdowns of ASE (shutdown with nowait or flat out crashes) ASE will simply discard or "burn" all the unused identity values it has pre-allocated in memory. This sometimes leaves large "gaps" in your monotonically increasing identity columns and can be unsettling for some application developers and/or end users. NOTE: Sybase 11.02.1 (EBF 6717) and below had a bug (bugid 96089) which would cause "large gaps to occur in identity fields after polite shutdowns." The Sybase 11.02.2 rollup (EBF 6886) fixed this problem. If you're at or below 11.02.1 and you use identities, you should definitely upgrade. * (paraphrased from Sybooks P&T guide, Chapter 6): If you do a large number of inserts and you have built your clustered index on an Identity column, you will have major contention and deadlocking problems. This will instantly create a hot spot in your database at the point of the last inserted row, and it will cause bad contention if multiple insert requests are received at once. Instead, create your clustered index on a field that will somewhat randomize the inserts across the physical disk (such as last name, account number, social security number, etc) and then create a non-clustered index based on the identity field that will "cover" any eligible queries. The drawback here, as pointed out in the Identity Optimization section in more detail, is that clustering on another field doesn't truly resolve the concurrency issues. The hot spot simply moves from the last data page to the last non-clustered index page of the index created on the Identity column. * If you fill up your identity values, no more inserts can occur. This can be a big problem, especially if you have a large number of inserts and you have continually crashed your server. However this problem most often occurs when you try to alter a table and add an Identity column that's too small, or if you try to bcp into a table with an identity column thetas too small. If this occurs, follow the procedures for recovering from identity gaps. * I've heard (but not been able to reproduce) that identities jump significantly when dumping and loading databases. Not confirmed. NOTE: there are several other System 11 bugs related to Identities. EBF 7312 fixes BugId 97748, which caused duplicate identity values to be inserted at times. EBF 6886 fixed (in addition to the above described bug) an odd bug (#82460) which caused a server crash when bcping into a table w/ an identity added via alter table. As always, try to stay current on EBFs. Back to start of 6.2.9 ------------------------------------------------------------------------------- Common questions about Identities Is the Identity the equivalent of Oracle's auto-sequencing?: Answer: More or less yes. Oracle's auto-sequencing feature is somewhat transparent to the end user and automatically increments if created as a primary key upon a row insert. The Sybase Identity column is normally specified at table creation and thus is a functional column of the table. If however you set the "auto identity" feature for a database, the tables created will have a "hidden" identity column that doesn't even appear when you execute a select * from [table]. See the Advantages of Identities for more details. * How do I configure Identities?: You can either create your table initially with the identity column: 1> create table ident_test 2> (text_field varchar(10), 3> ident_field numeric(5,0) identity) 4> go Or alter an existing table and add an identity column: 1> alter table existing_table 2> add new_identity_field numeric(7,0) identity 3> go When you alter a table and add an identity column, the System locks the table while systematically incrementing and adding unique values to each row. IF YOU DON'T SPECIFY a precision, Sybase defaults the size to 18! Thats 1,000,000,000,000,000,000-1 possible values and some major major problems if you ever crash your ASE and burn a default number of values... (10^18 with the default burn factor will burn 5^14 or 500,000,000,000,000 values...yikes). * How do I Configure the burn factor?: The number of identity values that gets "burned" upon a crash or a shutdown can by found by logging into the server and typing: 1> sp_configure "identity burning set factor" 2> go the Default value set upon install is 5000. The number "5000" in this case is read as ".05% of all the potential identity values you can have in this particular case will be burned upon an unexpected shutdown." The actual number depends on the size of the identity field as you specified it when you created your table. To set the burn factor, type: 1> sp_configure "identity burning set factor", [new value] 2> go This is a static change; the server must be rebooted before it takes effect. * How do I tell which tables have identities?: You can tell if a table has identities one of two ways: 1. sp_help [tablename]: there is a field included in the sp_help output describing a table called "Identity." It is set to 1 for identity fields, 0 otherwise. 2. Within a database, execute this query: 1> select object_name(id) "table",name "column", prec "precision" 2> from syscolumns 3> where convert(bit, (status & 0x80)) = 1 4> go this will list all the tables and the field within the table that serves as an identity, and the size of the identity field. * What is my identity burn factor vulnerability right now?: In other words, what would happen to my tables if I crashed my server right now? Identities are created type numeric, scale 0, and precision X. A precision of 9 means the largest identity value the server will be able to process is 10^9-1, or 1,000,000,000-1, or 999,999,999. However, when it comes to Burning identities, the server will burn (based on the default value of 5000) .05% of 1,000,000,000 or 500,000 values in the case of a crash. (You may think an identity precision allowing for 1 Billion rows is optimistic, but I once saw a precision set at 14...then the database crashed and their identity values jumped 5 TRILLION. Needless to say they abandoned their original design. Even worse, SQL server defaults precision to 18 if you don't specify it upon table creation...that's a MINIMUM 10,000,000,000 jump in identity values upon a crash with the absolute minimum burn factor) Lets say you have inserted 5 rows into a table, and then you crash your server and then insert 3 more rows. If you select all the values of your identity field, it will look like this: 1> select identity_field from id_test 2> go identity_field -------------- 1 2 3 4 5 500006 500007 500008 (8 rows affected) Here's your Identity burning options (based on a precision of 10^9 as above): Burn value % of values # values burned during crash 5000 .05% 500,000 1000 .01% 100,000 100 .001% 10,000 10 .0001% 1,000 1 .00001% 100 So, the absolute lowest amount of numbers you'll burn, assuming you configure the burn factor down to 1 (sp_configure "identity burning set factor", 1) and a precision of 9, is 100 values. Back to start of 6.2.9 --------------------------------------------------------------------------- Optimizing your Identity setup for performance and maintenance If you've chosen to use Identities in your database, here are some configuration tips to avoid typical Identity pitfalls: + Tune the burn factor!: see the vulnerability section for a discussion on what happens to identity values upon ASE crashes. Large jumps in values can crash front ends that aren't equipped to handle and process numbers upwards of 10 Trillion. I've seen Powerbuilder applications crash and/or not function properly when trying to display these large identity values. + Run update statistics often on tables w/ identities: Any index with an identity value as the first column in the search condition will have its performance severely hampered if Update statistics is not run frequently. Running a nightly update statistics/sp_recompile job is a standard DBA task, and should be run often regardless of the existence of identities in your tables. + Tune the "Identity Grab Size": ASE defaults the number of Identity values it pre-fetches to one (1). This means that in high insert environments the Server must constantly update its internal identity placeholder structure before adding the row. By tuning this parameter up: 1> sp_configure "identity grab size", [number] 2> go You can prefetch larger numbers of values for each user as they log into the server an insert rows. The downside of this is, if the user doesn't use all of the prefetched block of identity values, the unused values are lost (seeing as, if another user logs in the next block gets assigned to him/her). This can quickly accelerate the depletion of identity values and can cause gaps in Identity values. (this feature is new with System 11) + Do NOT build business rules around Identity values. More generally speaking the recommendation made by DBAs is, if your end users are EVER going to see the identity field during the course of doing their job, then DON'T use it. If your only use of the Identity field is for its advertised purpose (that being solely to have a uniquely identifying row for a table to index on) then you should be fine. + Do NOT build your clustered index on your Identity field, especially if you're doing lots of inserts. This will create a hot spot of contention at the point of insertion, and in heavier OLTP environments can be debilitating. - There is an excellent discussion in document http://www.sybase.com/ detail?id=860 on the performance and tuning aspects of Identities. It supplements some of the information located here (Note: this will open in a new browser window). Back to start of 6.2.9 --------------------------------------------------------------------------- Recovery from Large Identity value gaps or Recovery from Identity insert errors/Full Identity tables This section will discuss how to re-order the identity values for a table following a crash/abnormal shutdown that has resulted in huge gaps in the values. The same procedure is used in cases where the identity field has "filled up" and does not allow inserts anymore. Some applications that use Identities are not truly candidates for this process (i.e., applications that depend on the identity field for business purposes as opposed to simple unique row identifiers). Applications like this that wish to rid their dependence on identities will have to re-evaluate their database design. + Method 1:bcp out and in: - First, (from O/S command line): % bcp database..table out [data_file] -Usa -S[server] -N This will create a binary bcp datafile and will force the user to create a .fmt file. The -N option tells the server to skip the identity field while bcp'ing out. - drop and recreate the table in question from ddl (make sure your table ddl specifies the identity field). - Now bcp back in: % bcp database.table in [data_file -Usa -S[server] -f[fmt file] -N The -N option during bcp in tells the server to ignore the data file's placeholder column for the defined identity column. Coincidentally, if you bcp out w/o the -N option, drop the table, recreate from ddl specifying the identity field, and bcp back in w/o the -N option, the same effect as above occurs. (note: if you bcp out a table w/ identity values and then want to preserve the identity values during the bcp back in, use the "-E" option.) + Method 2: select into a new table, adding the identity column as you go : Follow this process: 1> select [all columns except identity column] 2> [identity column name ] = identity(desired_precision) 3> into [new_table] 4> from [old table] 5> go + There are alternate methods that perform the above in multi steps, and might be more appropriate in some situations. o You can bcp out all the fields of a table except the identity column (create the bcp format file from the original table, edit out the identity column, and re-bcp). At this point you can create a new table with or without the identity column; if you create it with, as you bcp back in the Server will assign new identity values. If you create it without, you can bcp back in normally and then alter the table and add the identity later. o You can select all columns but the identity into a new table, then alter that table and add an identity later on. Back to start of 6.2.9 --------------------------------------------------------------------------- How do I generate Sequential Keys w/o the Identity feature? There are many reasons not to use the Identity feature of Sybase. This section will present several alternative methods, along with their advantages and drawbacks. The methods are presented in increasing order of complexity. The most often implemented is Method 3, which is a more robust version of Method 2 and which uses a surrogate-key storage table. Throughout this section the test table I'm adding lines to and generating sequential numbers for is table inserttest, created like this: 1> create table inserttest 2> (testtext varchar(25), counter int) 3> go + Method 1: Create your table with a column called counter of type int. Then, each time you insert a row, do something like this: 1> begin tran 2> declare @nextkey int 3> select @nextkey=max(counter)+1 from inserttest holdlock 4> insert inserttest (testtext,counter) values ("test_text,@nextkey") 5> go 1> commit tran 2> go This method is rather inefficient, as large tables will take minutes to return a max(column) value, plus the entire table must be locked for each insert (since the max() will perform a table scan). Further, the select statement does not guarantee an exclusive lock when it executes unless you have the "holdlock" option; so either duplicate values might be inserted to your target table or you have massive deadlocking. + Method 2: See Question 10.1.1 of the comp.database.sybase FAQ is the May 1994 (Volume 3, Number 2) Sybase Technical Note (these links will open in a new browser window). Search down in the tech note for the article titled, "How to Generate Sequential Keys for Table Key Columns." This has a simplistic solution that is expanded upon in Method 3. + Method 3: Create a holding table for keys in a common database: Here's our central holding table. 1> create table keystorage 2> (tablename varchar(25), 4> lastkey int) 5> go And initially populate it with the tablenames and last values inserted (enter in a 0 for tables that are brand new). 1> insert into keystorage (tablename,lastkey) 2> select "inserttest", max(counter) from inserttest 3> go Now, whenever you go to insert into your table, go through a process like this: 1> begin tran 2> update keystorage set lastkey=lastkey+1 where tablename="inserttest" 3> go 1> declare @lastkey int 2> select @lastkey = lastkey from keystorage where tablename="inserttest" 3> insert inserttest (testtext,counter) values ("nextline",@lastkey) 4> go 1> commit tran 2> go There is plenty of room for error checking with this process: for example (code adapted from Colm O'Reilly (colm@mail.lk.blackbird.ie) post to Sybase-L 6/20/97): 1> begin tran 2> update keystorage set lastkey=lastkey+1 where tablename="inserttest" 3> if @@rowcount=1 4> begin 5> declare @lastkey int 6> select @lastkey=lastkey from keystorage where tablename="inserttest" 7> end 8> commit tran 9> begin tran 10> if @lastkey is not null 11> begin 12> insert inserttest (testtext,counter) values ("third line",@lastkey) 13> end 14> commit tran 15> go This provides a pretty failsafe method of guaranteeing the success of the select statements involved in the process. You still have a couple of implementation decisions though: o One transaction or Two? The above example uses two transactions to complete the task; one to update the keystorage and one to insert the new data. Using two transactions reduces the amount of time the lock is held on keystorage and thus is better for high insertion applications. However, the two transaction method opens up the possibility that the first transaction will commit and the second will roll back, leaving a gap in the sequential numbers. (of course, this gap is small potatoes compared to the gaps that occur in Identity values). Using one transaction (deleting lines 8 and 9 in the SQL above) will guarantee absolutely no gaps in the values, but will lock the keystorage table longer, reducing concurrency in high insert applications. o Update first or select first? The examples given generally update the keystorage table first, THEN select the new value. Performing the select first (you will have to rework the creation scheme slightly; by selecting first you're actually getting the NEXT key to add, where as by updating first, the keystorage table actually holds the LAST key added) you allow the application to continue processing while it waits for the update lock on the table. However, performing the update first guarantees uniqueness (selects are not exclusive). Some DBAs experienced with this keystorage table method warn of large amounts of blocking in high insert activity situations, a potential drawback. + Method 4: Enhance the above method by creating an insert trigger on your inserttest table that performs the next-key obtainment logic. Or you could create an insert trigger on keystorage which updates the table and obtains your value for you. Integrating the trigger logic to your application might make this approach more complex. Also, because of the nature of the trigger you'll have to define the sequence number columns as allowing NULL values (a bad thing if you're depending on the sequential number as your primary key). Plus, triggers will slow the operation down because after obtaining the new value via trigger, you'll have to issue an extra update command to insert the rest of your table values. + Method 5: (Thanks to John Drevicky (jdrevicky@tca-techsys.com)) The following procedure is offered as another example of updating and returning the Next Sequential Key, with an option that allows automatic reuse of numbers...... ----------------------------------------------------------------- ---- -- DECLARE @sql_err int, @sql_count int -- begin tran -- select @out_seq = 0 -- UPDATE NEXT_SEQUENCE SET next_seq_id = ( next_seq_id * ( sign(1 + sign(max_seq_id - next_seq_id) ) -- evaluates: 0 [when -- next > max]; else 1 * sign(max_seq_id - next_seq_id) -- evaluates: 0 [when next = max]; -- 1 [next < max]; -- -1 [next > max] ) -- both evaluate to 1 when next < max ) + 1 -- increment by [or restart at] 1 WHERE seq_type = @in_seq_type -- select @sql_err = @@error, @sql_count = @@rowcount -- IF @sql_err = 0 and @sql_count = 1 BEGIN select @out_seq = next_seq_id from NEXT_SEQUENCE where seq_type = @in_seq_type -- commit tran return 0 END ELSE BEGIN RAISERROR 44999 'Error %1! returned from proc derive_next_sequence...no update occurred', @sql_err rollback tran END + Other Methods: there are several other implementation alternatives available that involve more complex logic but which might be good solutions. One example has a central table that stores pre-inserted sequential numbers that are deleted as they're inserted into the production rows. This method allows the sequence numbers to be recycled if their associated row is deleted from the production table. An interesting solution was posted to Sybase-L 6/20/97 by Matt Townsend ( mtowns@concentric.net) and is based on the millisecond field of the date/time stamp. His solution guarantees uniqueness without any surrogate tables or extra inserts/updates, and is a superior performing solution to other methods described here (including Identities), but cannot support exact sequential numbers. Some other solutions are covered in a white paper available at Sybase's Technical library discussing Sequential Keys (this will open in a new browser window). Back to start of 6.2.9 --------------------------------------------------------------------------- Optimizing your home grown Sequential key generating process for any version of Sybase + max_rows_per_page/fillfactor/table padding to simulate row level locking: This is the most important tuning mechanism when creating a hand -made sequence key generation scheme. Because of Sybase's page level locking mechanism, your concurrency performance in higher-insert activity situations could be destroyed unless the server only grabs one row at a time. However since Sybase doesn't currently have row-level locking, we simulate row-level locking by creating our tables in such a way as to guarantee one row per 2048 byte page. o For pre-System 11 servers; Calculate the size of your rows, then create dummy fields in the table that get populated with junk but which guarantee the size of the row will fill an entire page. For example (code borrowed from Gary Meyer's 5/8/94 ISUG presentation ( gmeyer@netcom.com)): 1> create table keystorage 2> (tablename varchar(25), 3> lastkey int, 4> filler1 char(255) not null, 5> filler2 char(255) not null, 6> filler3 char(255) not null, 7> filler4 char(255) not null, 8> filler5 char(255) not null, 9> filler6 char(255) not null, 9> filler7 char(255) not null) 10> with fillfactor = 100 11> go We use 7 char(255) fields to pad our small table. We also specify the fillfactor create table option to be 100. A fillfactor of 100 tells the server to completely fill every data page. Now, during your initial insertion of a line of data, do this: 1> insert into keystorage 2> (tablename,lastkey, 3> filler1,filler2,filler3,filler4,filler5,filler6,filler7) 4> values 5> ("yourtable",0, 6> replicate("x",250),replicate("x",250), 7> replicate("x",250),replicate("x",250), 8> replicate("x",250),replicate("x",250), 9> replicate("x",250)) 10> go This pads the row with 1750 bytes of junk, almost guaranteeing that, given a row's byte size limit of 1962 bytes (a row cannot span more than one page, thus the 2048 page size minus server overhead == 1962), we will be able to simulate row level locking. o In Sybase 11, a new create table option was introduced: max_rows_per_page. It automates the manual procedures above and guarantees at a system level what we need to achieve; one row per page. 1> create table keystorage 2> (tablename varchar(25), 3> lastkey int) 4> with max_rows_per_page = 1 5> go + Create unique clustered indexes on the tablename/entity name within your keystorage table. This can only improve its performance. Remember to set max_rows_per_page or the fillfactor on your clustered index, as clustered indexes physically reorder the data. + Break up the process into multiple transactions wherever possible; this will reduce the amount of time any table lock is held and will increase concurrency in high insertion environments. + Use Stored Procedures: Put the SQL commands that update the keystorage table and then insert the updated key value into a stored procedure. Stored procedures are generally faster than individual SQL statements in your code because procedures are pre-compiled and have optimization plans for index usage stored in Sybase's system tables. + Enhance the keystorage table to contain a fully qualified table name as opposed to just the tablename. This can be done by adding fields to the table definition or by just expanding the entity name varchar field definition. Then place the keystorage table in a central location/ common database that applications share. This will eliminate multiple keystorage tables but might add length to queries (since you have to do cross-database queries to obtain the next key). - There is an excellent discussion located in the whitepapers section of Sybase's home page discussing the performance and tuning aspects of any type of Sequential key use. It supplements the information here (note: this page will open in a new browser window). Back to start of 6.2.9 Back to top ------------------------------------------------------------------------------- 6.2.10: How can I execute dynamic SQL with ASE? ------------------------------------------------------------------------------- Adaptive Server Enterprise: System 12 ASE 12 supports dynamic SQL, allowing the following: declare @sqlstring varchar(255) select @sqlstring = "select count(*) from master..sysobjects" exec (@sqlstring) go Adaptive Server Enterprise: 11.5 and 11.9 There is a neat trick that was reported first by Bret Halford ( bret@sybase.com ). (If anyone knows better, point me to the proof and I will change this!) It utilises the CIS features of Sybase ASE. * Firstly define your local server to be a remote server using sp_addserver LOCALSRV,sql_server[,INTERFACENAME] go * Enable CIS sp_configure "enable cis",1 go * Finally, use sp_remotesql, sending the sql to the server defined in point 1. declare @sqlstring varchar(255) select @sqlstring = "select count(*) from master..sysobjects" sp_remotesql LOCALSRV,@sqlstring go Remember to ensure that all of the databases referred to in the SQL string are fully qualified since the call to sp_remotesql places you back in your default database. Sybase ASE (4.9.x, 10.x and 11.x before 11.5) Before System 11.5 there was no real way to execute dynamic SQL. Rob Verschoor has some very neat ideas that fills some of the gaps (http://www.euronet.nl/ ~syp_rob/dynsql.html). Dynamic Stored Procedure Execution With System 10, Sybase introduced the ability to execute a stored procedure dynamically. declare @sqlstring varchar(255) select @sqlstring = "sp_who" exec @sqlstring go For some reason Sybase chose never to document this feature. Obviously all of this is talking about executing dynamic SQL within the server itself ie stored procedures and triggers. Dynamic SQL within client apps is a different matter altogether. Back to top ------------------------------------------------------------------------------- 6.2.11: Is it possible to concatenate all the values from a column and return a single row? ------------------------------------------------------------------------------- Hey, this was quite cool I thought. It is now possible to concatenate a series of strings to return a single column, in a sort of analogous manner to sum summing all of the numbers in a column. Obviously, in versions before 12.5, the longest string that you can have is 255 characters, but with very long varchars, this may prove useful to someone. Use a case statement, a la, 1> declare @string_var varchar(255) 2> 3> select @string_var = "" 4> 5> select @string_var = @string_var + 6> (case 1 when 1 7> then char_col 8> end) 9> from tbl_a 10> 11> print "%1!", @string_var 12> go (1 row affected) ABCDEFGH (8 rows affected) 1> select * from tbl_a 2> go char_col -------- A B C D E F G H (8 rows affected) 1> Back to top ------------------------------------------------------------------------------- 6.2.12: Selecting rows N to M without Oracle's rownum? ------------------------------------------------------------------------------- Sybase does not have a direct equivalent to Oracle's rownum but its functionality can be emulated in a lot of cases. If you are simply trying to retrieve the first N rows of a table, then simple use: set rowcount replacing <N> with your desired number of rows. (set rowcount 0 restores normality.) If it is simply the last N rows, then use a descending order-by clause in the select. 1> set rowcount 2> go 1> select foo 2> from bar 3> order by barID desc 4> go If you are trying to retrieve rows 100 to 150, say, from a table in a given order. You could use this to retrieve rows for a set of web pages, but there are probably more efficient ways using cursors or well written queries or even Sybperl! The general idea is select the rows into a temporary table adding an identity column at the same time. Only select enough rows to do the job using the rowcount trick. Finally, return the rows from the temporary table where the identity column is between 100 and 150. Something like this: set rowcount 150 select pseudo_key = identity(3), col1, col2 into #tempA from masterTable where clause... order by 2,3 select col1,col2 from #tempA where pseudo_key between 100 and 150 Remember to reset rowcount back to 0 before issuing any more SQL or you will only get back 150 rows! A small optimisation would be to select only the key columns for the source table together with the identity key. Once you have the set of rows you require in the temporary table, join this back to the source using the key columns to get any data that you require. An alternative, which might be better if you needed to join back to this table a lot, would be to insert enough rows to cover the range as before, but then delete the set of unwanted rows. This would be a very efficient mechanism if the majority of your queries involved the first few rows of a table. A typical application for this might be a search engine displaying relevant items first. The chances are that the user is going to be bored after the first couple of pages and go back to playing 'Internet Doom'. set rowcount 150 select col1, col2 into #tempA from masterTable where clause... set rowcount 100 delete #tempA Sybase does not guarantee to return rows in any particular order, so the delete may not delete the correct set of rows. In the above example, you should add an order-by to the 'select' and build a clustered index on a suitable key in the temporary table. The following stored proc was posted to the Sybase-L mailing list and uses yet another mechanism. You should check that it works as expected in your environment since it relies on the fact a variable will be set using the last row that is returned from a result set. This is not published behaviour and is not guaranteed by Sybase. CREATE PROCEDURE dbo.sp_get_posts @perpage INT, @pagenumber INT WITH RECOMPILE AS -- if we're on the first page no need to go through the @postid push IF @pagenumber = 1 BEGIN SET ROWCOUNT @perpage SELECT ... RETURN END -- otherwise DECLARE @min_postid NUMERIC( 8, 0 ), @position INT SELECT @position = @perpage * ( @pagenumber - 1 ) + 1 SET ROWCOUNT @position -- What happens here is it will select through the rows -- and order the whole set. -- It will stop push postid into @min_postid until it hits -- ROWCOUNT and does this out of the ordered set (a work -- table). SELECT @min_postid = postid FROM post WHERE ... ORDER BY postid ASC SET ROWCOUNT @perpage -- we know where we want to go (say the 28th post in a set of 50). SELECT ... FROM post WHERE postid >= @min_postid ... ORDER BY postid ASC Yet another solution would be to use a loop and a counter. Probably the least elegant, but again, it would depend on what you were trying to do as to what would be most appropriate. As you can see, none of these are particularly pretty. If you know of a better method, please forward it to dowen@midsomer.org. Back to top ------------------------------------------------------------------------------- 6.2.13: How can I return number of rows that are returned from a grouped query without using a temporary table? ------------------------------------------------------------------------------- This question is certainly not rocket science, but it is often nice to know how many rows are returned as part of a group by. This might be for a report or a web query, where you would want to tell the user how many rows were returned on page one. It is easy using a temp table, but how to do it without a temp table is a little harder. I liked this solution and thought that it might not be obvious to everyone, it was certainly educational to me. Thanks go to Karl Jost for a very nice answer. So, give data like: name item ---- ---- Brown 1 Smith 2 Brown 5 Jones 7 you wish to return a result set of the form: name sum(item) rows ---- --------- ---- Brown 6 3 Jones 7 3 Smith 2 3 rather than name sum(item) rows ---- --------- ---- Brown 6 2 Jones 7 1 Smith 2 1 Use the following, beguilingly simple query: select name, sum(item), sum(sign(count(*))) from data group by name Back to top ------------------------------------------------------------------------------- Useful SQL Tricks SQL Fundamentals ASE FAQ Useful SQL Tricks 6.3.1 How to feed the result set of one stored procedure into another. 6.3.2 Is it possible to do dynamic SQL before ASE 12? Open Client SQL Advanced ASE FAQ ------------------------------------------------------------------------------- Note: A number of the following tips require CIS to be enabled (at this precise moment, all of them require CIS :-) The optimiser does take on a different slant, however small, when CIS is enabled, so it is up to you to ensure that things don't break when you do turn it on. Buyer beware. Test, test, test and when you have done that, check some more. ------------------------------------------------------------------------------- 6.3.1: How to feed the result set of one stored procedure into another. ------------------------------------------------------------------------------- I am sure that this is all documented, but it is worth adding here. It uses CIS, as do a number of useful tricks. CIS is disabled by default before 12.0 and not available before 11.5. It is courtesy of BobW from sybase.public.ase.general, full acceditation will be granted if I can find out who he is. Excellent tip! So, the scenario is that you have a stored procedure, AP_A, and you wish to use the result set that it returns in a query. Create a proxy table for SP_A. create table proxy_SP_A ( a int, b int, c int, _p1 int null, _p2 int null ) external procedure at "SELF.dbname.dbo.SP_A" Columns a, b, c correspond to the result set of SP_A. Columns _p1, _p2 correspond to the @p1, @p2 parameters of SP_A. "SELF" is an alias put in sysservers to refer back to the local server. If you only have one row returned the proxy table can be used with the following: declare @a int, @b int, @c int select @a = a, @b = b, @c = c from proxy_SP_B where _p1 = 3 and _p2 = 5 More rows can be handled with a cursor. Back to top ------------------------------------------------------------------------------- 6.3.2: Is it possible to do dynamic SQL before ASE 12? ------------------------------------------------------------------------------- Again, using CIS, it is possible to fake dynamic SQL. Obviously for this to work, CIS must be enabled. In addition, the local server must be added to sysservers as a remote server. There is a stored procedure, sp_remotesql, that takes as an arguments a remote server and a string, containing SQL. As before, adding SELF as the 'dummy' server name pointing to the local server as if it were a remote server, we can execute the following: sp_remotesql "SELF","select * from sysdatabases" Which will do just what you expect, running the query on the local machine. The stored proc will take 251 (according to its own documentation) arguments of char(255) or varchar(255) arguments, and concatenate them all together. So we can do the following: 1> declare @p1 varchar(255),@p2 varchar(255),@p3 varchar(255), @p4 varchar(255) 2> 3> select @p1 = "select", 4> @p2 = " name ", 5> @p3 = "from ", 6> @p4 = "sysdatabases" 7> 8> exec sp_remotesql "SELF", @p1, @p2, @p3, @p4 9> go (1 row affected) name ------------------------------ bug_track dbschema master model sybsystemprocs tempdb (6 rows affected, return status = 0) Obviously, when the parameters are concatenated, they must form a legal T-SQL statement. If we remove one of the spaces from the above statement, then we see: 1> declare @p1 varchar(255),@p2 varchar(255),@p3 varchar(255), @p4 varchar(255) 2> 3> select @p1 = "select", 4> @p2 = "name ", 5> @p3 = "from ", 6> @p4 = "sysdatabases" 7> 8> exec sp_remotesql "SELF", @p1, @p2, @p3, @p4 9> go Msg 156, Level 15, State 1 , Line 1 Incorrect syntax near the keyword 'from'. (1 row affected, return status = 156) Back to top ------------------------------------------------------------------------------- Open Client SQL Advanced ASE FAQ User Contributions:Part1 - Part2 - Part3 - Part4 - Part5 - Part6 - Part7 - Part8 - Part9 - Part10 - Part11 - Part12 - Part13 - Part14 - Part15 - Part16 - Part17 - Part18 - Part19 [ Usenet FAQs | Web FAQs | Documents | RFC Index ] Send corrections/additions to the FAQ Maintainer: dowen@midsomer.org (David Owen)
Last Update March 27 2014 @ 02:11 PM
|
Comment about this article, ask questions, or add new information about this topic: