TDS and character encoding

Thread starter: raymond_b_jimenez
I've seen a dump of the TDS traffic going from my web server to the SQL
Server database, and it seems to be encoded in Unicode (two bytes per
character). It seems it would make a big difference to performance if it
travelled as one byte per character. Why might this be?

rj
 
I've seen a dump of the TDS traffic going from my web server to the SQL
Server database, and it seems to be encoded in Unicode (two bytes per
character). It seems it would make a big difference to performance if it
travelled as one byte per character. Why might this be?

I have never eavesdropped on TDS, but Unicode is indeed the character
set of SQL Server. You are perfectly able to name your tables in
Cyrillic or Hindi characters if you feel like it. And of course
character strings may include all sorts of characters. So a batch of SQL
statements that is sent over the wire must be Unicode. That is beyond
dispute.

However, you don't encode something "in Unicode". Unicode is the
character set, and there are several encodings available for it, of
which the most popular are UTF-16 and UTF-8. In UTF-16, each character
in the base plane takes up 2 bytes, and characters beyond it take up 4
bytes. (The base plane covers the vast majority of living languages.)
In UTF-8, ASCII characters take up one byte, other characters in the
Latin, Greek and Cyrillic scripts take up two bytes, and Chinese and
Japanese characters take up three bytes.
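These per-character sizes are easy to verify in any environment that exposes Unicode encodings; a quick Python check (the sample characters are my own picks):

```python
# Byte counts for a few representative characters under the two
# most common Unicode encodings.
samples = {
    "A": "ASCII letter",
    "Ж": "Cyrillic letter",
    "中": "Chinese character",
    "𝄞": "musical symbol, outside the base plane",
}

for ch, desc in samples.items():
    utf16 = len(ch.encode("utf-16-le"))  # no BOM, payload only
    utf8 = len(ch.encode("utf-8"))
    print(f"{desc}: UTF-16 = {utf16} bytes, UTF-8 = {utf8} bytes")
```

ASCII comes out at 1 byte in UTF-8 versus 2 in UTF-16, while the Chinese character goes the other way: 3 bytes in UTF-8 versus 2 in UTF-16.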

SQL Server uses UTF-16 exclusively. It is true that for network traffic
in the western world it would be more efficient if TDS used UTF-8, but
as you can see, that is not necessarily the case in the Far East. And
had TDS used UTF-8, both ends of the wire would have had to convert to
and from UTF-16, so any reduction in network traffic could be eaten up
by extra CPU time.


--
Erland Sommarskog, SQL Server MVP, (e-mail address removed)

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
 
Snooping into the TDS would be the very last place I would look when trying
to improve performance. It would be like polishing a clean mirror to remove
one's zits.

--
____________________________________
William (Bill) Vaughn
Author, Mentor, Consultant, Dad, Grandpa
Microsoft MVP
INETA Speaker
www.betav.com
www.betav.com/blog/billva
Please reply only to the newsgroup so that others can benefit.
This posting is provided "AS IS" with no warranties, and confers no rights.
__________________________________
Visit www.hitchhikerguides.net to get more information on my latest book:
Hitchhiker's Guide to Visual Studio and SQL Server (7th Edition)
and Hitchhiker's Guide to SQL Server 2005 Compact Edition (EBook)
-----------------------------------------------------------------------------------------------------------------------
 
Well William, that is clearly not the case when you have a REAL
database with REAL traffic. By REAL, I mean a 25 Mbps stream between
the IIS servers and SQL Server... Getting rid of about 10 Mbps of
unneeded traffic does not seem like polishing to me...
I can guarantee you that this is having a serious impact on performance,
and when you really dig into it (things like TCP/IP slow start...), you
get to understand why it has a huge impact on the client, the DB server
and performance.
rj
 
Given that SQL Server has the highest TPC-E benchmarks in the industry,
don't you think that the SQL Server team has made the TDS stream as
efficient as possible? IMHO, it's not the line protocol or the lowest
layers of the interface that should be the focus of performance tuning;
the applications, database designs and query methodologies should
dominate your attempts to improve throughput and scalability. Reducing
the traffic on the TDS channel will only go a long way toward improving
performance if you really do have to move that much volume over the wire
to make a difference.

SQL Server Holds Record for TPC-E Database Benchmark
by Brian Moran, (e-mail address removed)

SQL Server now holds every conceivable world record for the TPC-E database
benchmark. That news would be slightly more impressive if TPC-E scores
existed for any database besides SQL Server, but heck, winning a race with
just one runner doesn't mean that runner did a bad job. I first wrote about
TPC-E, the latest benchmark from the Transaction Processing Performance
Council, in my commentary "TPC's New Benchmark Strives for Realism," October
2006, InstantDoc ID 93955.

Microsoft became the first database vendor to have a published TPC-E result
when Unisys published a TPC-E score on July 12 using SQL Server 2005 on a
dual-core 16-processor ES7000. IBM followed suit with a dual-core
2-processor server two weeks later, and Dell posted a dual-core 4-processor
result on August 24. Both IBM's and Dell's results used SQL Server, so SQL
Server is currently the only database vendor listed, meaning SQL Server
currently holds all the top scores. Sane vendors don't post TPC-E scores
that make them look bad, but I suspect it's only a matter of time before IBM
and Oracle post TPC-E scores for their database products that leapfrog the
latest SQL Server scores, which will in turn be bested by Microsoft in the
never-ending game of benchmark leapfrog.
Read the full article at:
http://lists.sqlmag.com/t?ctl=642B5:CC6CF972C3DECB8DA4B50D3688BDE645



--
William (Bill) Vaughn
 
Well William, that is clearly not the case when you have a REAL
database with REAL traffic. By REAL, I mean a 25 Mbps stream between
the IIS servers and SQL Server... Getting rid of about 10 Mbps of
unneeded traffic does not seem like polishing to me...
I can guarantee you that this is having a serious impact on performance,
and when you really dig into it (things like TCP/IP slow start...), you
get to understand why it has a huge impact on the client, the DB server
and performance.

Rather than blaming TDS, maybe you should look into trimming the
application. TDS is not going to change.

The first step is to analyse what is making up those 25 Mbps. Is it SQL
commands? Or is it data? SQL batches are, as I discussed in my previous
post, Unicode by necessity. Data is another matter.

As I said, I have not eavesdropped on TDS, but I would expect varchar
data to be sent as single bytes. That is, the value 'character' would be
eleven bytes on the wire (nine characters plus, presumably, a two-byte
length prefix). On the other hand, the value N'character' would be 20
bytes. And of course, metadata goes as Unicode.
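Those numbers are consistent with a two-byte length prefix per value. A toy calculation (the prefix size and the single-byte encoding are assumptions for illustration, not verified against the TDS specification):

```python
def wire_size(value: str, unicode_column: bool, prefix_bytes: int = 2) -> int:
    """Hypothetical on-the-wire size of a character value: payload bytes
    plus an assumed two-byte length prefix (not verified against TDS)."""
    encoding = "utf-16-le" if unicode_column else "latin-1"
    return prefix_bytes + len(value.encode(encoding))

print(wire_size("character", unicode_column=False))  # 11, like varchar
print(wire_size("character", unicode_column=True))   # 20, like nvarchar
```

The nvarchar value costs almost twice the bytes of the varchar value for the same nine characters.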

Now, what can you do to reduce the amount of network traffic? If you
feel that you don't need Unicode, use varchar for your character data
and not nvarchar. (But keep in mind that the day you need to support,
say, Japanese may be closer than you think.) But most of all, trim your
result sets of unneeded columns. Make sure that there are not a lot of
"SELECT *" queries, and that you don't retrieve rows you don't need.

Furthermore, network traffic is not only about bytes, but also about
round trips. Don't get the details of the order and then make one call
for each product on the order; get all the data at once.
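The round-trip point can be made concrete with a toy simulation; the "database" and the round-trip counter below are purely illustrative:

```python
# Toy illustration: fetching an order per-product versus in one batch.
# The "database" is a dict; each call to query() stands for one round trip.
roundtrips = 0
products_by_order = {42: ["widget", "gadget", "sprocket"]}

def query(fn):
    global roundtrips
    roundtrips += 1
    return fn()

# Chatty pattern: one call for the product list, then one per product.
roundtrips = 0
ids = query(lambda: products_by_order[42])
details = [query(lambda p=p: p.upper()) for p in ids]
chatty = roundtrips  # 1 + len(ids) = 4

# Batched pattern: everything in a single call (think: one JOIN query).
roundtrips = 0
details = query(lambda: [p.upper() for p in products_by_order[42]])
batched = roundtrips  # 1

print(chatty, batched)
```

With three products the chatty pattern costs four round trips where one would do, and the gap grows linearly with the order size.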

And, yes, while you would have seen a gross cut if TDS were UTF-8 on
the wire instead of UTF-16, a Chinese user would have seen an increase
instead.

--
Erland Sommarskog, SQL Server MVP
 
In addition to agreeing with others' statements about refactoring the
application to reduce network traffic, I offer the following: let's
assume that you are an Amazon.com wannabe and you really DO have 25 Mbps
steady-state bandwidth needs between IIS and SQL Server. That gives you
almost THREE orders of magnitude before running out of currently
available network capacity (10 Gbps Ethernet). Even cheap gigabit
Ethernet gives a huge amount of headroom to grow into.
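That back-of-envelope figure checks out:

```python
import math

current_mbps = 25
ten_gbe_mbps = 10_000   # 10 Gbps Ethernet
gigabit_mbps = 1_000    # gigabit Ethernet

headroom = ten_gbe_mbps / current_mbps   # 400x
orders = math.log10(headroom)            # about 2.6 orders of magnitude
print(f"{headroom:.0f}x headroom, {orders:.1f} orders of magnitude")
print(f"Gigabit alone gives {gigabit_mbps / current_mbps:.0f}x headroom")
```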

BTW, I have been a SQL Server database consultant for 10+ years. I have
NEVER YET been to a client that had a fully optimized database application,
and most of them were HORRID from a performance standpoint. :-))
 
Rather than blaming TDS, maybe you should look into trimming the
application. TDS is not going to change.
BTDT. Especially with Profiler, which is a great tool.
The first step is to analyse what is making up those 25 Mbps. Is it SQL
commands? Or is it data? SQL batches are, as I discussed in my previous
post, Unicode by necessity. Data is another matter.
Queries and replies. Replies are in the 3-4 packet range. Lots of
stored procedures, of course.
As I said, I have not eavesdropped on TDS, but I would expect varchar
data to be sent as single bytes. That is, the value 'character' would be
eleven bytes on the wire. On the other hand, the value N'character'
would be 20 bytes. And of course, metadata goes as Unicode.
Don't take anything for granted. Seeing it is a completely different
experience.
Now, what can you do to reduce the amount of network traffic? If you
feel that you don't need Unicode, use varchar for your character data
and not nvarchar. (But keep in mind that the day you need to support,
say, Japanese may be closer than you think.) But most of all, trim your
result sets of unneeded columns. Make sure that there are not a lot of
"SELECT *" queries, and that you don't retrieve rows you don't need.
BTDT

Furthermore, network traffic is not only about bytes, but also about
roundtrips. Don't get the details of the order, and then make one
call for each product on the order, but get all data at once.
You're absolutely correct! That's why I have another thread going about
"SET NO_BROWSETABLE ON". This one seems particularly annoying, as it
seems to be getting inserted automatically by ADO.NET. It accounts for
about 20% of the data traffic, and one additional round trip to the DB
server for each query.
And, yes, while you would have seen a gross cut if TDS was UTF-8 on
the wire and not UTF-16, a Chinese user would have seen an increase
instead.
Wouldn't it be great to have an option?

rj
 
In addition to agreeing with others' statements about refactoring the
application to reduce network traffic, I offer the following: let's
assume that you are an Amazon.com wannabe and you really DO have 25 Mbps
steady-state bandwidth needs between IIS and SQL Server. That gives you
almost THREE orders of magnitude before running out of currently
available network capacity (10 Gbps Ethernet). Even cheap gigabit
Ethernet gives a huge amount of headroom to grow into.
While all that is true, you have several other factors pounding you
with these amounts of traffic. To name two: TCP/IP slow start and
interrupts on the receiving servers. The latter can be solved with
better network cards, but mine are already offloading work to the
hardware.

rj
 
Given that SQL Server has the highest TPC-E benchmarks in the industry,
don't you think that the SQL Server team has made the TDS stream as
efficient as possible? IMHO, it's not the line protocol or the lowest
layers of the interface that should be the focus of performance tuning;
the applications, database designs and query methodologies should
dominate your attempts to improve throughput and scalability. Reducing
the traffic on the TDS channel will only go a long way toward improving
performance if you really do have to move that much volume over the wire
to make a difference.
Don't know a lot about TPC-E benchmarks. Are they measured over a
network?

rj
 
You're absolutely correct! That's why I have another thread going about
"SET NO_BROWSETABLE ON". This one seems particularly annoying, as it
seems to be getting inserted automatically by ADO.NET. It accounts for
about 20% of the data traffic, and one additional round trip to the DB
server for each query.

And as I've noted in another thread, SqlClient does not do this when
you use it plainly. Run a plain ExecuteReader or DataAdapter.Fill and
you will not see it. There was a link posted that led to an article
pointing out the problem with the CommandBuilder.

So that is one more thing to look into: exactly what in your usage of
ADO.NET (and you still have not told us which data provider you are
using) triggers SET NO_BROWSETABLE ON.
Wouldn't it be great to have an option?

If you think so, submit the suggestion at
http://connect.microsoft.com/SqlServer/Feedback.
Personally, I don't think it would be worth the pain, not least because
it would also require changes in the client APIs. And all it would
affect is the query batches sent to SQL Server and the metadata sent
back. If the query batches sent to SQL Server are killing your network,
maybe you should look into using stored procedures.
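A rough byte count illustrates why stored procedures help here: the text of an ad-hoc batch travels as UTF-16, while a procedure call needs only the procedure name plus binary parameter values. The statement and procedure name below are made up, and TDS framing overhead is ignored:

```python
# Hypothetical ad-hoc batch text, sent as UTF-16 on the wire.
adhoc_batch = (
    "SELECT OrderID, ProductID, Quantity, UnitPrice "
    "FROM dbo.OrderDetails WHERE OrderID = @p1"
)
# Hypothetical RPC call: the name travels as UTF-16, the int
# parameter as 4 binary bytes.
proc_call = "dbo.GetOrderDetails"

adhoc_bytes = len(adhoc_batch.encode("utf-16-le"))
rpc_bytes = len(proc_call.encode("utf-16-le")) + 4  # assumed int param

print(adhoc_bytes, rpc_bytes)
```

Even for this small statement, the procedure call carries a fraction of the bytes, and the saving repeats on every execution.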


--
Erland Sommarskog, SQL Server MVP
 
Don't know a lot about TPC-E benchmarks. Are they measured over a
network?

Database benchmarks are typically done with a dedicated database server and
remote client(s). You can download the results disclosure reports from
http://www.tpc.org/tpce/tpce_perf_results.asp to get details of the actual
configurations used. Looking at the specs of the network gear, it doesn't
look to me like the benchmark sponsors were too concerned about network
performance.

I agree with the others in this thread that the application and database
design are by far the biggest contributing factors to overall performance.
A little common sense, like filtering data on the server rather than the
client, goes a long way towards improving scalability and performance.
 
Erland said:
If you think so, submit the suggestion at
http://connect.microsoft.com/SqlServer/Feedback.
Personally, I don't think it would be worth the pain, not least because
it would also require changes in the client APIs. And all it would
affect is the query batches sent to SQL Server and the metadata sent
back. If the query batches sent to SQL Server are killing your network,
maybe you should look into using stored procedures.

Thinking about it, rather than having to select the character encoding,
it would be better if the option were for compression of the network
traffic in general.

But I find it difficult to believe that this would be a good option for
the traffic between a web server and a SQL Server that are on the same
LAN. It could possibly be an option if you are on a slow connection over
VPN. In general, I have a feeling that the network considerations for
SQL Server assume LAN connections, because that is surely the most
common scenario.
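To get a feel for what a general compression option could buy on a repetitive result set, here is a sketch using zlib as a stand-in (no such wire option existed in SQL Server at the time; the payload is fabricated):

```python
import zlib

# A repetitive, metadata-heavy payload such as a wide result set
# compresses very well; compression trades CPU time for bandwidth.
rows = "".join(f"row-{i:06d}|ACME Corp|US-WEST|ACTIVE\n" for i in range(1000))
payload = rows.encode("utf-16-le")

compressed = zlib.compress(payload, level=6)
print(f"{len(payload)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(payload):.1%})")
```

This is exactly the trade-off discussed earlier for UTF-8: fewer bytes on the wire, paid for with CPU at both ends, which is why it makes more sense over a slow VPN link than on a LAN.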


--
Erland Sommarskog, SQL Server MVP
 
Now, what can you do to reduce the amount of network traffic? If you
feel that you don't need Unicode, use varchar for your character data
and not nvarchar. (But keep in mind that the day you need to support,
say, Japanese may be closer than you think.) But most of all, trim your
result sets of unneeded columns. Make sure that there are not a lot of
"SELECT *" queries, and that you don't retrieve rows you don't need.

Furthermore, network traffic is not only about bytes, but also about
round trips. Don't get the details of the order and then make one call
for each product on the order; get all the data at once.

Pardon me for interjecting here, Erland. I am not disagreeing with you
but agreeing; it is just that this subject seems hard to tackle.
I have thought that network packet traffic contributes a lot: ideally
you want to package as many results as you can into each packet (this
is Nagle's-algorithm territory), but I see precious little written on
the subject. I see nothing in Microsoft whitepapers or anywhere else.

For example, if I make a straight SELECT off a table for a forward-only,
read-only cursor using ADO, what is the ideal cache size for maximum
throughput? It seems to me that it ought to be network packet size /
size of record. I am sure there are some overhead bytes.
It ought to be determinable rather than just empirical guesswork.
But how?
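Stephen's formula can at least be written down. The 8-byte packet header below matches what is commonly reported for TDS, but the per-row overhead is a placeholder guess, not a documented figure:

```python
def rows_per_packet(packet_size: int, row_bytes: int,
                    packet_overhead: int = 8,
                    per_row_overhead: int = 2) -> int:
    """Estimate how many rows fit in one network packet: usable payload
    divided by row size plus an assumed per-row overhead."""
    usable = packet_size - packet_overhead
    return usable // (row_bytes + per_row_overhead)

# Example: default 4096-byte packet, 120-byte rows.
print(rows_per_packet(packet_size=4096, row_bytes=120))
```

The real answer still has to be confirmed empirically (with Profiler or a network trace), but a formula like this at least gives a defensible starting point instead of a magic number.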

Thanks

Stephen Howe
 
And as I've noted in another thread, SqlClient does not do this when
you use it plainly. Run a plain ExecuteReader or DataAdapter.Fill and
you will not see it. There was a link posted that led to an article
pointing out the problem with the CommandBuilder.

Can you get classic ADO and SQLOLEDB in combination not to do that?
Thanks

Stephen Howe
 
In addition to agreeing with others' statements about refactoring the
application to reduce network traffic

Sure, I don't disagree. I could arrange to get 5 records back from an SP
query instead of 1 record back per query.
And it may well be that 5 records fit in a network packet.
That is 1 round trip rather than 5 round trips.
But it could equally be that I could squeeze in 10 records, or 20, etc.
How do we determine this without endlessly going round some design cycle
trying magic numbers, which immediately change if the fields in the
query change?
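One way out of the magic-number cycle is to derive the batch size from the column definitions themselves, so it recomputes when the query changes. The column widths and overhead figures below are illustrative, not measured:

```python
# Hypothetical column definitions: (name, max bytes on the wire).
columns = [("OrderID", 4), ("ProductID", 4), ("Quantity", 2),
           ("UnitPrice", 8), ("ProductName", 80)]

PACKET_SIZE = 4096       # default network packet size
ASSUMED_OVERHEAD = 8     # assumed per-packet header bytes

row_bytes = sum(width for _, width in columns)
batch_size = (PACKET_SIZE - ASSUMED_OVERHEAD) // row_bytes
print(f"{row_bytes} bytes/row -> fetch {batch_size} rows per round trip")
```

Add or widen a column and the batch size falls out automatically, rather than being retuned by hand each time the query changes.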

Thanks

Stephen Howe
 
Stephen said:
Can you get classic ADO and SQLOLEDB in combination not to do that?

Yes, don't use it.

Seriously, it is very difficult to get ADO to do only what it is
supposed to and not a lot more. For instance, say that you set up a
Command object to call a stored procedure with a number of parameters.
The first call is clean, just a direct RPC call. But if you keep the
object around to call the procedure again with different parameters, ADO
now issues SET FMTONLY ON to get some information about the result sets.
And if you think NO_BROWSETABLE is bad, you should know that FMTONLY
causes SQL Server to sift through the statements in the procedure in a
sort of half-NOEXEC mode.

I did some quick testing with a general test app that I have for ADO,
and it appears that you get the browse-table stuff when you request a
lock type other than ReadOnly. If you perform your updates through
stored procedures, I see no reason why you should use anything but
ReadOnly and implement optimistic locking yourself.
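"Implement optimistic locking yourself" typically means carrying a version value (or the original column values) into the UPDATE's WHERE clause. A minimal in-memory sketch of the pattern, with a dict standing in for the table:

```python
# Minimal optimistic-concurrency sketch: each row carries a version
# number; an update only succeeds if the version is unchanged.
table = {1: {"name": "widget", "version": 7}}

def update_name(row_id: int, new_name: str, expected_version: int) -> bool:
    row = table[row_id]
    if row["version"] != expected_version:
        return False          # someone else changed the row: reject
    row["name"] = new_name
    row["version"] += 1       # bump version, like a rowversion column
    return True

print(update_name(1, "gizmo", expected_version=7))   # True: versions match
print(update_name(1, "doodad", expected_version=7))  # False: stale version
```

In SQL this becomes an UPDATE whose WHERE clause compares the stored version against the one the client read, checking the affected-row count to detect a lost update.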


--
Erland Sommarskog, SQL Server MVP
 
IMHO, with all other things being equal, designing for fewer round trips
to the server usually equals better scalability, and usually better
performance as well. Note that these two things are NOT synonymous and
can often be at odds with each other - especially at the high end of
each spectrum.

Performance analysis and tuning is a VERY complex and complicated
process when you are really trying to optimize an entire application.
There are HUGE numbers of moving parts and parameters within each
subsystem. Usually, however, an expert in this realm of problem solving
can use knowledge, experience and his/her toolset to quickly isolate the
most egregious offenders and point the dev/hardware team to the sections
that are either "low-hanging fruit" or will give the biggest bang for
the buck.
 
Can you get classic ADO and SQLOLEDB in combination not to do that?
Yes, don't use it.

Seriously, it is very difficult to get ADO to do only what it is
supposed to and not a lot more.

Well, it is a stupid state of affairs that Microsoft invested 7 years in
ADO - and even now we don't fully know what it does - there is a heck of
a lot that is undocumented or poorly documented.

I am amazingly tired of playing the game of "ooohh, you have discovered
some flaws in our existing mature technology - why don't you shift to
our brand-new technology, which does not have that flaw?"
Because it is only a matter of time before someone discovers that
ADO.NET has flaws, and how long will it be before MS declares ADO.NET
and .NET legacy, gets bored with them, and starts creating other new
"cool technology"? And then we are back to the same cycle. It is a crap
state of affairs.

There are limited ways of accessing databases - rowsets, stored
procedures, input/output parameters - so why can't they sort it out so
that if there are, say, 6 different methods of retrieving data, we all
know the overhead of using each method, nothing is left undocumented,
and we all know under what circumstances SET FMTONLY ON and
NO_BROWSETABLE (which I have seen before - 4 years ago) are issued and
why?

Does anyone really believe that with ODBC, DAO, RDO, ADO and ADO.NET,
each generation has been an improvement?
They are all interfaces on the same set of database technology - which
does not change.

Stephen Howe
 