Login or Sign Up to become a member!
LessThanDot Site Logo

LessThanDot

A Technical Community for IT Professionals

Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.

LTD Social Sitings

Lessthandot twitter Lessthandot Linkedin Lessthandot facebook Lessthandot rss

Note: Watch for social icons on posts by your favorite authors to follow their postings on these and other social sites.

Highly Rated Users

Forum
No Posts Rated

Top 50
Given
Received

Links

Wiki
Blog

Forum Statistics

Users
Members:
1834
Members Online:
4
Guests Online:
29

Total Post History
Posts:
81393
Topics:
18665

7-Day Post History
New Posts:
6
New Topics:
1
Active Topics:
2

Our newest member
lrhwy

Other

FAQ
All times are UTC [ DST ]

Google Ads

CTE version for split string function

Microsoft SQL Server
Please wait...

CTE version for split string function

Postby Niikola on Wed Sep 02, 2009 3:47 pm

I posted CTE version for split string function as a comment in
http://blogs.lessthandot.com/index.php/ ... 5-clr-vs-t

My test shows it's much faster than function that uses numbers table.
Here is the code again.
  1. CREATE FUNCTION fnSplitWithCTE
  2. (
  3.    @List VARCHAR(8000),
  4.    @Delimiter CHAR(1),
  5.    @ShowEmpty BIT
  6. )
  7. RETURNS @ret TABLE
  8. (
  9.    pos INT PRIMARY KEY,
  10.    VALUE VARCHAR(8000)
  11. )
  12. AS BEGIN
  13.    DECLARE @LEN INT
  14.    SET @LEN=LEN(@List)+1
  15.  
  16.    ;With a AS
  17.    (
  18.      SELECT 1 AS nStart,
  19.             isNull(NULLIF(CHARINDEX(@Delimiter,@List,1),0),@LEN) AS nEnd,
  20.             RTRIM(LTRIM(SUBSTRING(@List,1,isNull(NULLIF(CHARINDEX(@Delimiter,@List,1),0),@LEN)-1))) AS VALUE
  21.      UNION All
  22.      SELECT nEnd+1,
  23.             isNull(NULLIF(CHARINDEX(@Delimiter,@List,nEnd+1),0),@LEN),
  24.             RTRIM(LTRIM(SUBSTRING(@List,nEnd+1,isNull(NULLIF(CHARINDEX(@Delimiter,@List,nEnd+1),0),@LEN)-nEnd-1)))
  25.      FROM a
  26.      WHERE nEnd<@LEN
  27.    )
  28.    INSERT INTO @ret
  29.    SELECT Row_Number() OVER (ORDER BY nStart),
  30.           NULLIF(VALUE,'')
  31.     FROM a
  32.     WHERE (@ShowEmpty=1 or LEN(VALUE)>0)
  33.     RETURN
  34. END
  35.  
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain

Re: CTE version for split string function

Postby onpnt on Wed Sep 02, 2009 3:50 pm

Thanks! I'll run the test after I free up from the 24 hours of pass sessions. I'll psot them back here over the blog so it's easier for everyone to pick them up
Tarwn: Yeah yeah, all you do is SELECT * all day long and say "no" to people...Life of a DBA
User avatar
onpnt
LTD Admin
LTD Admin
LTD Silver - Rating: 623LTD Silver - Rating: 623LTD Silver - Rating: 623LTD Silver - Rating: 623LTD Silver - Rating: 623
LTD Silver - Rating: 623LTD Silver - Rating: 623LTD Silver - Rating: 623LTD Silver - Rating: 623
 
Posts: 1608
Joined: Tue Oct 09, 2007 5:23 pm
Location: Kenosha, WI
Unrated

Re: CTE version for split string function

Postby Naomi on Wed Sep 02, 2009 6:39 pm

Please make your two other parameters optional (comma and 0)
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Naomi on Wed Sep 02, 2009 6:39 pm

Also can you just post the test case scenario?
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby SQLDenis on Wed Sep 02, 2009 6:46 pm

Naomi wrote:Please make your two other parameters optional (comma and 0)


How do you do that in a function?

even if you do this
  1. CREATE FUNCTION fnSplitWithCTE
  2. (
  3.    @List VARCHAR(8000),
  4.    @Delimiter CHAR(1) = null,
  5.    @ShowEmpty BIT =  null
  6. ...
  7. ...
  8. ..


or this
  1. ALTER FUNCTION fnSplitWithCTE
  2. (
  3.    @List VARCHAR(8000),
  4.    @Delimiter CHAR(1) = ',',
  5.    @ShowEmpty BIT =  1
  6. )
  7. ....
  8. ....


and you call it like this

  1. SELECT * FROM dbo.fnSplitWithCTE('abc,mcde')


you will be greeted with the following message

Server: Msg 313, Level 16, State 3, Line 1
An insufficient number of arguments were supplied for the procedure or function dbo.fnSplitWithCTE.
User avatar
SQLDenis
LTD Admin
LTD Admin
LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467
LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467
LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467
 
Posts: 21784
Joined: Wed Oct 10, 2007 6:43 pm
Location: Princeton, New Jersey, USA,World, Solar System, Milky Way, Universe and Beyond
Unrated

Re: CTE version for split string function

Postby Naomi on Wed Sep 02, 2009 6:57 pm

You're right :( I thought for a moment we can use it without optional parameters.
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Niikola on Wed Sep 02, 2009 7:21 pm

Naomi wrote:You're right :( I thought for a moment we can use it without optional parameters.

You can call it as
fnStrng('abc,def',DEFAULT,DEFAULT)
but I never liked it :)

P.S. (added later)
It's easy to make another function with one parameter which will call this one.
Something like

Create Function fnSplitCSV(@list as varchar(8000))
Returns table as
Return (Select * From fnSplitCSV(@list,',',0) )

It should not degrade performance (a lot), although, I prefer to create it as separate one.

There is one downside of using CTE - default limit of 100 iterations.
If you expect more than 100 items in the list you have to increase number of allowed CTE iterations.
Last edited by Niikola on Wed Sep 02, 2009 7:47 pm, edited 1 time in total.
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain
Unrated

Re: CTE version for split string function

Postby Naomi on Wed Sep 02, 2009 7:34 pm

What is the difference between your function and this http://www.sommarskog.se/arrays-in-sql-2005.html#CTEs and can you post case scenarios?
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Niikola on Wed Sep 02, 2009 7:56 pm

Naomi wrote:What is the difference between your function and this http://www.sommarskog.se/arrays-in-sql-2005.html#CTEs and can you post case scenarios?

I didn't read this before.
I like his idea of using binary collation and his stop clause
My solution has additional option - how to handle consecutive delimiters, which could be helpful when you want to split sentence by spaces and you don't want to get empty string for double spaces.
One small point - which could be easily added in his solution - is position of parameter

In fact, those two solutions are same, with few smaller differences. Probably his solution would win the race, but mine is maybe a bit more comfortable :)
It's up to you to choose coupe or sedan.
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain
Unrated

Re: CTE version for split string function

Postby Naomi on Wed Sep 02, 2009 7:57 pm

Right - the link above also addressed that.

In that link, though, I don't understand why did we use non-standard collation.
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Niikola on Wed Sep 02, 2009 8:17 pm

Naomi wrote:In that link, though, I don't understand why did we use non-standard collation.


String operations against variables in binary collations are fastest as they don't have to take in account additional rules and constraints assigned to other collations.
Take as example ..._Ci_AI collations - Case Insensitive/Accent Insensitive.
Any string operation has to perform some additional operation in order to check if two characters are equal even if they have different code (eg. AÀÁÂÃÄÅaàáâãäå is composed of the same letters according to _CI_AI collation)

According to BOL "Binary collations are useful in achieving improved application performance due to their relative simplicity. "

I do not know why he took Slovenian_bin instead of some other, but it doesn't really matter.
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain

Re: CTE version for split string function

Postby Naomi on Wed Sep 02, 2009 8:48 pm

Thanks for the explanation. BTW, I tried to send you an e-mail using your gmail account.
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Niikola on Thu Sep 03, 2009 4:45 pm

I've got some results comparing my version of split functions (one with CTE "fnSplitWithCTE" and another using Numbers table "fnSplitVC"). I would refer to them as CTE and NUM functions.
You can find below how I populated test data and query used for testing
General idea was to test it against decent number of rows (10,000) and against different number of delimiters.
I created table that has 5 varchar columns with sizes of 256, 512, 1024, 2048 and 4096
Every column was populated with 8 comma ',' , 16 dot '.', 32 colon ':' and 64 semicolon ';' delimiters at random positions.
My numbers table has smallint number column only. It's populated with whole range including negative numbers.

In the code bellow I was changing <functionName>, <fieldname> and <delimiter> .
I used next query to be sure that functions did the job and results were really used (that's why min and max functions)

  1. DECLARE @t1 TIME
  2. SELECT t.Id, s.nItems, s.minLen, s.maxLen
  3. FROM testSplit t WITH(nolock)
  4. Cross apply (SELECT COUNT(*) AS nItems,
  5.                     MIN(LEN(c.VALUE)) AS minLen,
  6.                     MAX(LEN(c.VALUE)) AS maxLen
  7.                FROM dbo.<functionName>(t.<fieldname>, <delimiter>, 1) c
  8.             ) s
  9.  


Here are results:

CTE version
-----------------------------------
number of delimiters
--------------------------
Length 6 16 32 64
------ --------------------------
256 6.1 8.8 14.0 24.3
512 8.1 10.9 16.3 27.0
1,024 12.0 15.0 20.7 31.9
2,048 19.8 23.2 29.4 41.5
4,096 35.3 39.6 46.8 60.6
-----------------------------------

NUM version
-----------------------------------
number of delimiters
--------------------------
Length 6 16 32 64
------ --------------------------
256 59.5 60.9 62.6 67.0
512 62.1 63.4 65.6 69.9
1,024 67.6 69.0 71.9 76.6
2,048 78.9 80.7 83.6 89.9
4,096 101.0 104.0 108.0 115.5
-----------------------------------


Graphic presentation below, shows that execution time depends more on string length than on number of delimiters.
NUM function is much less dependent on number of delimiters while CTE function has a slightly less better curve of string size dependency.

Overall, CTE showed much better performance (at least my versions of CTE and NUM functions)

SplitFunctionsComparison.png


Code for my version of function that uses numbers table (it could be faster I believe):
Code is hidden, SHOW


Test table schema and population:
Code is hidden, SHOW
You do not have the required permissions to view the files attached to this post.
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain

Re: CTE version for split string function

Postby Emtucifor on Thu Sep 03, 2009 5:34 pm

I have a feeling there's a better numbers table solution... I wish I had time to make/find one for you.
God cries a little bit every time someone builds a database.
User avatar
Emtucifor
Guru
Guru
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033
 
Posts: 2835
Joined: Fri May 30, 2008 9:30 pm
Location: Bellingham, WA
Unrated

Re: CTE version for split string function

Postby Niikola on Thu Sep 03, 2009 6:14 pm

Emtucifor wrote:I have a feeling there's a better numbers table solution... I wish I had time to make/find one for you.

Anyway I want to rerun test as I forgot to remove Primary Key from table definition (I used it for some testing). It could bring some improvements in both solutions.
I will appreciate any suggestion or solution you or anyone else post. It's not a big deal to include additional versions in testing.

p.s. I know my number based solution is not the best one, most probably CTE could be improved too
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain
Unrated

Re: CTE version for split string function

Postby Naomi on Thu Sep 03, 2009 6:36 pm

Check http://blogs.lessthandot.com/index.php/ ... tions--1#3

I may have time on Sunday to test.
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Niikola on Thu Sep 03, 2009 10:10 pm

Here are few more results.
I took one NUM and one CTE function from "Arrays and Lists in SQL Server 2005" by Erland Sommarskog
http://www.sommarskog.se/arrays-in-sql-2005.html
And NUM versions (original from the blog and a just fixed up a bit one from one of the last comments ) from "Split string in SQL Server 2005+ CLR vs. T-SQL" by onpnt
http://blogs.lessthandot.com/index.php/ ... 5-clr-vs-t
(*) By mistake i wrote "original i emtucifor's" (in Serbian i means and)

I did some smaller changes, eg. adding delimiter parameter and changing data types to varchar(8000)
I also removed primary key and @ShowEmpty parameter from my functions.
In addition I created both of my functions as inline and multi-statement table functions
fnSplitNUM_Somm  Sommarskog NUM  inline
fnSplitCTE_nik Niikola CTE multi-statement
fnSplitCTE_nil Niikola CTE inline
fnSplitCTE_Somm Sommarskog CTE inline
fnSplitNUM_nik Niikola NUM multi-statement
fnSplitNUM_nil Niikola NUM inline
fnSplitNUM_onpnt onpnt NUM multi-statement
fnSplitNUM_emt Emtucifor(modification of onpnt's function) NUM multi-statement


Here are the functions:
Code is hidden, SHOW


Results are from my SQL Server 2005 Express installed on my laptop
testSplit table had 1,000 rows (comparing to 10,000 from previous test on test server)
Results are again averages from 10 executions.


Average ranking is next:
fnName           rank
---------------- ----
fnSplitCTE_Somm 1.15
fnSplitCTE_nil 2.30
fnSplitNUM_nil 3.25
fnSplitCTE_nik 3.50
fnSplitNUM_onpnt 4.85
fnSplitNUM_nik 5.95
fnSplitNUM_Emt 7.00
fnSplitNUM_Somm 8.00
-------------------- ----


CTE version Sommarskog is fastest by far.
NUM function Sommarskog performs slowest (by far, again) and it does not show last item if string finishes with delimiter.
CTE functions again performs better than NUM ones. The only one that has a bit better overall result than worst performing CTE is my inline version of NUM against my multi-statement version of CTE - but differences are minor. In fact, CTE performs better with smaller number of delimiters while NUM is better with 64 delimiters - especially in case of shorter strings.

And full results (in seconds):
--------- ----- ---------------------------
number of delimiters
fnName Size ---------------------------
8 16 32 64
--------- ----- ------ ------ ------ ------
CTE_Somm 256 0.23 0.40 0.77 1.36
CTE_Somm 512 0.25 0.41 0.75 1.43
CTE_Somm 1,024 0.25 0.43 0.81 1.51
CTE_Somm 2,048 0.29 0.49 0.90 1.73
CTE_Somm 4,096 0.35 0.59 1.07 2.03
--------- ----- ------ ------ ------ ------
CTE_nil 256 0.41 0.60 0.96 1.66
CTE_nil 512 0.62 0.80 1.18 1.90
CTE_nil 1,024 1.00 1.21 1.64 2.42
CTE_nil 2,048 1.78 2.02 2.47 3.34
CTE_nil 4,096 3.30 3.71 4.26 5.21
--------- ----- ------ ------ ------ ------
CTE_nik 256 0.58 0.83 1.30 2.29
CTE_nik 512 0.78 1.03 1.54 2.50
CTE_nik 1,024 1.16 1.47 1.98 3.00
CTE_nik 2,048 1.94 2.29 2.83 4.00
CTE_nik 4,096 3.54 3.91 4.57 5.92
--------- ----- ------ ------ ------ ------
NUM_nil 256 0.61 0.61 0.67 0.77
NUM_nil 512 0.88 0.94 1.02 1.15
NUM_nil 1,024 1.53 1.57 1.68 1.83
NUM_nil 2,048 2.73 2.85 3.03 3.26
NUM_nil 4,096 5.25 5.54 5.68 6.10
--------- ----- ------ ------ ------ ------
NUM_onpnt 256 3.71 3.69 3.80 4.04
NUM_onpnt 512 3.76 3.80 3.95 4.21
NUM_onpnt 1,024 4.00 4.01 4.12 4.46
NUM_onpnt 2,048 4.72 4.45 4.59 4.90
NUM_onpnt 4,096 5.17 5.31 5.50 5.84
--------- ----- ------ ------ ------ ------
NUM_nik 256 3.72 3.79 4.12 4.24
NUM_nik 512 3.88 3.98 4.12 4.43
NUM_nik 1,024 4.19 4.32 4.47 4.82
NUM_nik 2,048 4.88 4.99 5.17 5.56
NUM_nik 4,096 6.17 6.32 6.57 7.05
--------- ----- ------ ------ ------ ------
NUM_Emt 256 8.36 8.380 8.54 8.80
NUM_Emt 512 8.43 8.500 8.66 8.90
NUM_Emt 1,024 8.67 8.766 9.03 9.22
NUM_Emt 2,048 9.19 9.390 9.47 9.74
NUM_Emt 4,096 10.13 10.250 10.43 10.82
--------- ----- ------ ------ ------ ------
NUM_Somm 256 13.29 13.17 13.47 13.38
NUM_Somm 512 15.76 15.64 16.09 15.66
NUM_Somm 1,024 21.26 21.43 21.42 21.71
NUM_Somm 2,048 115.60 115.62 115.92 115.83
NUM_Somm 4,096 239.77 239.31 239.87 240.95
--------- ----- ------ ------ ------ ------


P.S. Edited one result as NUM_onpnt for 4,096 chars and 8 delimiters showed 7.22 because one of the executions were extremely long (something happened probably that is not related to the function). I calculated average of other 9 executions and it gives 5.17
Last edited by Niikola on Fri Sep 04, 2009 8:05 pm, edited 2 times in total.
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain
Unrated

Re: CTE version for split string function

Postby Emtucifor on Fri Sep 04, 2009 4:48 pm

what is fnSplitNUM_emt?
God cries a little bit every time someone builds a database.
User avatar
Emtucifor
Guru
Guru
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033
 
Posts: 2835
Joined: Fri May 30, 2008 9:30 pm
Location: Bellingham, WA
Unrated

Re: CTE version for split string function

Postby SQLDenis on Fri Sep 04, 2009 5:07 pm

Emtucifor wrote:what is fnSplitNUM_emt?

original i emtucifor's modification from the last comment)


Look here at your comment http://blogs.lessthandot.com/index.php/ ... 5-clr-vs-t
User avatar
SQLDenis
LTD Admin
LTD Admin
LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467
LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467
LTD Gold - Rating: 3467LTD Gold - Rating: 3467LTD Gold - Rating: 3467
 
Posts: 21784
Joined: Wed Oct 10, 2007 6:43 pm
Location: Princeton, New Jersey, USA,World, Solar System, Milky Way, Universe and Beyond
Unrated

Re: CTE version for split string function

Postby Niikola on Fri Sep 04, 2009 6:04 pm

Emtucifor wrote:what is fnSplitNUM_emt?

http://blogs.lessthandot.com/index.php/ ... vs-t#c1061
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain
Unrated

Re: CTE version for split string function

Postby Emtucifor on Fri Sep 04, 2009 7:46 pm

It's not my function... I just fixed up his a bit. :)
God cries a little bit every time someone builds a database.
User avatar
Emtucifor
Guru
Guru
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033
 
Posts: 2835
Joined: Fri May 30, 2008 9:30 pm
Location: Bellingham, WA
Unrated

Re: CTE version for split string function

Postby Niikola on Fri Sep 04, 2009 8:00 pm

Emtucifor wrote:It's not my function... I just fixed up his a bit. :)

Ok I'll rephrase my post
Smile...tomorrow will be worse
User avatar
Niikola
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 139LTD Bronze - Rating: 139LTD Bronze - Rating: 139
 
Posts: 169
Joined: Thu Jun 25, 2009 12:02 pm
Location: Barcelona, Spain
Unrated

Re: CTE version for split string function

Postby Naomi on Thu Sep 17, 2009 3:21 pm

Niikola wrote:I posted CTE version for split string function as a comment in
http://blogs.lessthandot.com/index.php/ ... 5-clr-vs-t

My test shows it's much faster than function that uses numbers table.
Here is the code again.
  1. CREATE FUNCTION fnSplitWithCTE
  2. (
  3.    @List VARCHAR(8000),
  4.    @Delimiter CHAR(1),
  5.    @ShowEmpty BIT
  6. )
  7. RETURNS @ret TABLE
  8. (
  9.    pos INT PRIMARY KEY,
  10.    VALUE VARCHAR(8000)
  11. )
  12. AS BEGIN
  13.    DECLARE @LEN INT
  14.    SET @LEN=LEN(@List)+1
  15.  
  16.    ;With a AS
  17.    (
  18.      SELECT 1 AS nStart,
  19.             isNull(NULLIF(CHARINDEX(@Delimiter,@List,1),0),@LEN) AS nEnd,
  20.             RTRIM(LTRIM(SUBSTRING(@List,1,isNull(NULLIF(CHARINDEX(@Delimiter,@List,1),0),@LEN)-1))) AS VALUE
  21.      UNION All
  22.      SELECT nEnd+1,
  23.             isNull(NULLIF(CHARINDEX(@Delimiter,@List,nEnd+1),0),@LEN),
  24.             RTRIM(LTRIM(SUBSTRING(@List,nEnd+1,isNull(NULLIF(CHARINDEX(@Delimiter,@List,nEnd+1),0),@LEN)-nEnd-1)))
  25.      FROM a
  26.      WHERE nEnd<@LEN
  27.    )
  28.    INSERT INTO @ret
  29.    SELECT Row_Number() OVER (ORDER BY nStart),
  30.           NULLIF(VALUE,'')
  31.     FROM a
  32.     WHERE (@ShowEmpty=1 or LEN(VALUE)>0)
  33.     RETURN
  34. END
  35.  


Do we need separate functions if we want to use it for numeric IDs passing, e.g. declare table
TABLE
(
pos INT PRIMARY KEY,
vaLUE Int)
Premature optimization is the root of all evil in programming.
Donald Knuth, repeating C. A. R. Hoare
Naomi
Senior Sage
Senior Sage
LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234LTD Bronze - Rating: 234
 
Posts: 1748
Joined: Thu Mar 26, 2009 5:46 am
Location: WI
Unrated

Re: CTE version for split string function

Postby Emtucifor on Thu Sep 17, 2009 6:16 pm

No, you don't need it, just convert the value to an int if you want, or use automatic type conversion.

Your new post made me look at this again and I had a little time. I came up with my own version of the CTE. (And this CTE method is pretty clever if you ask me, and not a surprise that Nigel Rivett was one of the first originators of it. I really enjoyed the time that he was active on a certain forum I was active on...)

After writing my CTE, I looked at Naomi's link to Sommarskog's site and found that my version is almost identical to his. Oh well. I did make a few tweaks after looking at that, but the basic idea is unchanged.

Compared to the version Niikola posted, this CTE rates about 55% of the CPU/duration and 80% of the reads.

  1. CREATE FUNCTION SplitString (
  2.     @List VARCHAR(MAX),
  3.     @Delimiter VARCHAR(MAX)
  4. ) RETURNS TABLE
  5. AS
  6. RETURN
  7.     WITH Tokens AS (
  8.         SELECT
  9.             StartPos = CONVERT(BIGINT, 1),
  10.             EndPos = CONVERT(BIGINT, CHARINDEX(@Delimiter, @List COLLATE Latin1_General_BIN2 + @Delimiter))
  11.         UNION ALL
  12.         SELECT
  13.             EndPos + 1,
  14.             CONVERT(BIGINT, CHARINDEX(@Delimiter, @List COLLATE Latin1_General_BIN2 + @Delimiter, EndPos + 1))
  15.         FROM Tokens
  16.         WHERE EndPos > 0
  17.     )
  18.     SELECT
  19.         Item = Row_Number() OVER (ORDER BY StartPos),
  20.         Token = SUBSTRING(@List, StartPos, EndPos - StartPos)
  21.     FROM Tokens
  22.     WHERE EndPos <> 0

Unfortunately, to call it with more than 100 tokens one has to add OPTION (MAXRECURSION 0) to any query using it. Another version of the function that uses a table variable and has this hint inside the function works fine but performs 15-20% worse.

Also, this function will handle strings longer than 8000 characters, and can handle more than 100 tokens. Niikola, for my comparisons I modified your function to do the same. It required converting the initial SELECT 1 to bigint as well as adding OPTION (MAXRECURSION 0).

I think that adding any options to the CTE such as to suppress blank values just hurts performance for no reason when those features aren't used. I also think that making the parameter list longer just makes the function harder to use--just (InputString, Delimiter) for me and I'll never forget it or have to look it up. Adding WHERE Token <> '' to a query using the function is no problem for me.

In the context of contract work, I like Nigel Rivett's preference of simply using the CTE inline rather than in a function, because as a contractor he wants to leave as small a footprint as possible. My speculations on some reasons for this are: 1) No need to give away expertise unnecessarily through exposure of techniques that would make usable building blocks for others. 2) Less dependence on requirements outside of the area worked in makes for fewer problems--imagine some admin seeing the function and dropping it, altering it, or changing permissions on it without realizing he was breaking something else. 3) As a contractor, the idea is Get In, Get It Done, Get Out. A quick copy and paste of the CTE split code from one's personal library is faster/smaller/quicker than creating a function. 4) Future maintenance won't require going to look at the function to ensure it still does what it's supposed to do and no one's changed it, it can all be seen in one place.

It's a bit like a chef carrying his own knives with him wherever he goes. Or like packing your own parachute and then not letting it out of your sight until you jump.

(Edit: Made code actually use the passed-in delimiter.)
Last edited by Emtucifor on Wed Sep 30, 2009 9:35 pm, edited 3 times in total.
God cries a little bit every time someone builds a database.
User avatar
Emtucifor
Guru
Guru
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033LTD Gold - Rating: 1033
LTD Gold - Rating: 1033
 
Posts: 2835
Joined: Fri May 30, 2008 9:30 pm
Location: Bellingham, WA

Re: CTE version for split string function

Postby Natas on Fri Sep 18, 2009 2:07 pm

what does this line do and why do you need it?

  1. COLLATE Latin1_General_BIN2
User avatar
Natas
Senior Apprentice
Senior Apprentice
LTD Bronze - Rating: 14
 
Posts: 224
Joined: Thu Sep 04, 2008 1:52 pm
Unrated

Next