VarChar VS. nVarChar

  • Thread starter: Jonathan Wood

I'm just curious how many people use the Unicode string types for their
data.

At this stage, it doesn't seem needed and takes up about twice the storage.
Am I better off using nVarChar, or is VarChar just fine?
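
The "about twice the storage" figure comes from nvarchar storing characters as two-byte UTF-16 code units, versus one byte per character for typical varchar data. A quick Python sketch of the raw encoded-size difference (just an illustration of the encodings, not SQL Server itself):

```python
# nvarchar stores UTF-16 (2 bytes per code unit); varchar stores
# single-byte characters for plain ASCII data -- hence "twice the storage".
text = "Hello, world"

ascii_bytes = len(text.encode("ascii"))      # varchar-style storage
utf16_bytes = len(text.encode("utf-16-le"))  # nvarchar-style storage

print(ascii_bytes, utf16_bytes)  # 12 vs 24 bytes for this string
```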

Thanks.
 
Unless I was working on a really memory constrained system I would use NVarChar.
And if the system was that memory constrained, I probably wouldn't be using
SQL Server. So far I have never used varchar other than when I needed the
extra size for something, just to be safe, and didn't want the hassle of dealing
with 'text' and 'ntext' types. DB-stored HTML pieces is one case.

If you have too much data then you could just use varchar, but I doubt that
is actually the case for you. Just think, with these damn 64-bit machines,
all our ints are going to double in size too :(

Go with nvarchar unless there is a real reason not to.

// Andrew
 
This may be a bit dated, but in my "Inside SQL Server 7.0" book (1999) it
strongly advises using the least amount of storage in a row. The less you
store, the more rows you can fit in a page, so the fewer disk accesses you
need - these are the things that take time.
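
The page arithmetic behind that advice is easy to sketch: SQL Server data pages are 8 KB, with roughly 8,060 bytes usable for row data, so smaller rows mean more rows per page and fewer page reads per scan. A back-of-the-envelope sketch in Python (the row sizes are invented illustration values):

```python
# Rows per 8 KB data page for two hypothetical average row sizes.
# 8060 is the approximate usable bytes per SQL Server data page;
# the row sizes below are made up for illustration.
USABLE_PAGE_BYTES = 8060

def rows_per_page(avg_row_bytes):
    return USABLE_PAGE_BYTES // avg_row_bytes

varchar_row = 200    # hypothetical row with varchar text columns
nvarchar_row = 350   # same row if the text columns double in size

print(rows_per_page(varchar_row))   # 40 rows per page
print(rows_per_page(nvarchar_row))  # 23 rows per page
```

Fewer rows per page means a scan over the same row count touches more pages, which is where the disk-access argument comes from.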

 
Heh, well, I guess if the things I'm trying to figure out now are things
that "experts" have conflicting ideas about, then I must finally be starting
to pick this stuff up. :-)
 
Yeah, nvarchar opinions are like...

It's safer to use nvarchar, and I haven't seen it hurt my performance yet.
Theoretically you might need to read a few less pages, but I bet you could
get more bang for your buck by indexing properly any day.

Granted, huge amounts of data, or rows pushing the 8k limit, would be really
good reasons to not do it. As always, depends on your situation. So far
I haven't had any perf problems and I haven't needed to pick varchar, shorts,
bit fields, or do any particular squeezing. If I was concerned about the
volume of data over the wire, that might be what pushed me, assuming I was
_guaranteed_ to never need the unicodeness later.

Also, ya can always change it =)
 
According to the "Inside SQL Server 7.0" book (1999), the smaller the data,
the more rows you can fit within one 8k page, so it's not just a case of
trying to fit a row into an 8k page, but fitting as many as possible into an
8k page.


 
If you know your application will never be internationalized, you can go
with whichever one you feel comfortable with. As long as you are not trying to
insert Unicode, you should be fine. If the data can ever be
globalized/localized, you can bite the bullet later, but it is easier to go
ahead and do it now.

I am not overly worried about storage, personally, as storage is cheap.
There are some instances where I might be concerned, but I would use varchar
or nvarchar based on my needs, not size.

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 
Cowboy,
If you know your application will never be internationalized, you can go
with whichever one you feel comfortable with. As long as you are not trying to
insert Unicode, you should be fine. If the data can ever be
globalized/localized, you can bite the bullet later, but it is easier to
go ahead and do it now.

There are no plans to go global but I know better than to say it could never
happen.

I guess the question is: how hard is it to change strings to nvarchar down
the road if needed? Actually, it seems like it would be only a matter of
modifying the table definitions and the data is automatically converted for
me. Do you know if there's more to it than that?
I am not overly worried about storage, personally, as storage is cheap.
There are some instances where I might be concerned, but I would use
varchar or nvarchar based on my needs, not size.

Storage is a little more expensive when it is on a shared hosting account,
and others have pointed out that there may be performance issues when data
is larger, but, yeah, that's probably not a major concern.

Thanks!
 
It's a big question, especially if you don't actually need Unicode for your
storage; for example, if you are using only English and you never intend to
store anything else but English. (I won't enter the discussion of storage
for other Latin languages such as French. Theoretically, you don't
necessarily need Unicode to store only French or a mix of French and
English. However, there is a big step between theory and practice
because of the mix of code pages, but this discussion is probably not your
cup of tea.)

The current assumption made by many people is that the use of Unicode
(nchar, nvarchar and ntext) instead of ASCII will double the size of your
database and cut its performance in half. However, a lot of stuff stored
inside a database has nothing to do with text and won't be affected by the
use of Unicode. Typically, if you take a database and convert it to
Unicode, you will probably get something like a 30% increase in storage and
a performance decrease of about 10%. Of course, the exact mileage will vary
from database to database, but these are the values that I got some years ago
when I ran these tests myself.
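
The "30% rather than 100%" estimate follows from the fact that only the character columns double under Unicode; keys, ints, dates, and so on are unaffected. A rough sketch with invented per-row byte counts:

```python
# Only text columns double under Unicode; ints, datetimes, and keys do not.
# The column byte counts below are invented for illustration.
fixed_bytes = 70      # ints, datetimes, keys -- unaffected by Unicode
text_bytes = 30       # varchar data in the row

before = fixed_bytes + text_bytes     # 100 bytes per row
after = fixed_bytes + 2 * text_bytes  # 130 bytes: only the text doubles

growth = (after - before) / before
print(f"{growth:.0%}")  # 30% overall growth for this column mix
```

A more text-heavy column mix grows more, which is why the mileage varies from database to database.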

Now, is this difference really important or not? Well, I would say that in
most cases, if your client is happy with the current speed but would not be
happy with a 10% decrease, then he probably won't be happy in a few months
anyway, when the usual increase in load happens (for example, from hiring a
single new employee). And if a difference of 30% in size and 10% in speed
really matters to you, then you are probably already in some big trouble
anyway. (The only big difference in performance that I know of is with the
use of the operator LIKE with an argument *beginning* with %, something like
LIKE '%....%'. However, if you don't use it, or use it rarely, then it's not
really a concern at all.)

Think about it: the computer in front of you is already using Unicode
for practically everything that you see in its GUI. When was the last
time you heard someone claim that using ANSI for the GUI would make his
computer run faster? (Maybe your computer is running too slowly at this
moment, but if so, it's probably because you are using Vista instead of
WinXP, not because you are using Unicode for its display.)

One final thought: it's easy to be misguided by performing tests that are
not relevant to your situation. For example, if you have a 20 MB database
and don't use the operator LIKE '%...%' with it, then there is no point
in creating a 10 GB database and using it to test the operator LIKE
'%...%' in order to evaluate your own situation.

Finally, it's a mistake to say that it will be easy to convert a database to
Unicode, especially if it has already been deployed to more than one
location (what will you do if you have to manage the design of a
database using ANSI in some places and Unicode in others?) or if you
have to perform a lot of validation steps before deployment (a change to
Unicode will require that all validation tests be redone, as well as
a lot of other things: documentation, backup, shutdown, etc.; things
that might not be possible to do).
 
Sylvain,
The current assumption made by many people is that the use of Unicode
(nchar, nvarchar and ntext) instead of ASCII will double the size of your
database and cut its performance in half. However, a lot of stuff
stored inside a database has nothing to do with text and won't be affected
by the use of Unicode. Typically, if you take a database and convert it
to Unicode, you will probably get something like a 30% increase in storage
and a performance decrease of about 10%. Of course, the exact mileage
will vary from database to database, but these are the values that I got
some years ago when I ran these tests myself.

I've been using Unicode, but if those estimates are accurate, I'd consider
changing to ASCII. The increase in storage could be a concern, but the
decrease in performance is the issue for me.

I'm very new to ASP.NET and dynamic Websites. Things run fine when I'm the
only user. But I'm quite concerned about what happens if the site becomes
very successful.
Think about it: the computer in front of you is already using Unicode
for practically everything that you see in its GUI. When was the
last time you heard someone claim that using ANSI for the GUI would
make his computer run faster? (Maybe your computer is running too
slowly at this moment, but if so, it's probably because you are using
Vista instead of WinXP, not because you are using Unicode for its
display.)

I'm not concerned about performance issues on the user's computer. I only
wonder what type of performance Unicode in a database causes on the server.
One final thought: it's easy to be misguided by performing tests that are
not relevant to your situation. For example, if you have a 20 MB database
and don't use the operator LIKE '%...%' with it, then there is no point
in creating a 10 GB database and using it to test the operator
LIKE '%...%' in order to evaluate your own situation.

I don't currently have plans for implementing LIKE, but that could very well
be requested by the client as they discover they need to find stuff.
Finally, it's a mistake to say that it will be easy to convert a database
to Unicode, especially if it has already been deployed to more than
one location (what will you do if you have to manage the design of
a database using ANSI in some places and Unicode in others?) or if
you have to perform a lot of validation steps before deployment (a
change to Unicode will require that all validation tests be redone,
as well as a lot of other things: documentation, backup, shutdown,
etc.; things that might not be possible to do).

The whole point of writing an application as a Website is that I only need
to deploy the application (and database) to one location. I'm not sure what
type of validation you are talking about, but if I change the fields from
varchar to nvarchar in Visual Studio, the data appears to be silently
converted for me. Certainly, that seemed pretty easy.

Thanks.
 
Unless you are already working near saturation, the decrease in
performance should mean nothing to you. If, for example, the
CPU is actually working at a mean level of 30%, then increasing it to 33%
should go totally unnoticed. To that, you must add the overhead of the
network communication and of the application running on the client side.
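
The arithmetic in that example is just a relative overhead applied to a low baseline; a tiny sketch:

```python
# A 10% relative slowdown on a CPU running at a 30% mean utilization
# only raises utilization to 33% -- likely to go unnoticed.
baseline_utilization = 0.30
relative_overhead = 0.10  # the rough Unicode estimate from above

new_utilization = baseline_utilization * (1 + relative_overhead)
print(f"{new_utilization:.0%}")  # 33%
```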

If you are in one of those rare situations where your application is running
the CPU near 100% for extended periods of time, then I don't know what
kind of suggestion you are expecting from a public newsgroup.

In the worst-case scenario, wait a month before buying the hardware; the
increase in power that you will get by waiting a few weeks should be
sufficient to cover the difference in speed and storage.

As for your application, I know nothing about it or about its requirements,
present and future, so I cannot say anything about it. Simply saying that
it's a web application tells me nothing about its real requirements, or about
whether it will be easy or not to change later from ANSI to Unicode. It's up
to you to make your own decision.
 
Sylvain,
If you are in one of those rare situations where your application is
running the CPU near 100% for extended periods of time, then I don't
know what kind of suggestion you are expecting from a public newsgroup.

General SQL efficiency suggestions, perhaps?

Jonathan Wood
SoftCircuits Programming
http://www.softcircuits.com
 
If you are interested in studying general SQL efficiency, then there are a
lot of books already available on these questions, and most of these books
will probably go into much greater depth than most of the answers you will
get here.

For the rest, it wouldn't be a bad idea to concentrate your thoughts on other
aspects of SQL Server (query plans, index optimization, locking, etc.)
instead of doing a comparison between varchar and nvarchar. There is much
more to gain by studying those questions than by questioning the use and
misuse of Unicode.

ANSI is a fossil that has climbed the evolutionary ladder up to the present,
but whatever look you give it, it will still remain a fossil. Unless you
are an archaeologist, you are wasting your time studying fossils.
 
I'm currently reading a stack of books. Unfortunately, SQL is only one of
them.

This issue came up so I thought I'd ask here for some comments. To me, that
is the very purpose of these newsgroups--not just so someone can tell you to
go read books about it.
 
Yes, you can always squeeze more. But do you NEED to? Are you using smallints,
or making sure that you stuff all your bit flags into a single integer and
then parse them out yourself? I doubt it.

Go ahead and use nvarchar unless there is a REASON not to. Imagined performance
problems are probably not it.

From one of the most famous guys out there:
"We should forget about small efficiencies, say about 97% of the time: premature
optimization is the root of all evil."
 
Here's something to read about saving space and the benefits of doing so.
It's for SQL Server 2000, but should be applicable to later versions. As
well as suggesting varchar instead of nvarchar where appropriate, it has other
useful tips:
http://www.sql-server-performance.com/articles/per/saving_space_p1.aspx


 
Thanks. With the release of SQL Server 2008, I suspect some of the items
discussed there have changed. Still, it has me thinking: It's probably
better to use varchar unless you actually need nvarchar.
 