French characters in query string

  • Thread starter: darin dimitrov

darin dimitrov

Hello,

How can I convert a URL-encoded string containing some French
characters back to the original string?

I have the following HTML form:
<form>
name = <input type="text" name="name" value="Noël Pirès"/>
<input type="submit" value="OK"/>
</form>

When I submit this form, the following data is sent at the protocol
level using HTTP GET: name=No%EBl+Pir%E8s

I tried converting it with the HttpUtility.UrlDecode method in order
to obtain the original string, but without success. The method
returned "name=Nol Pirs", with the accented French characters omitted.
What function should I use to decode such a string?

In fact, what I would like to implement is the following: I have a text
file containing the URL-encoded string, and in a console application I
want to make the conversion, something like:

------------------------
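// Read the URL-encoded text from the file.
FileStream fin = new FileStream(@"c:\test.txt", FileMode.Open, FileAccess.Read);
byte[] buf = new byte[fin.Length];
fin.Read(buf, 0, buf.Length);
fin.Close();
string s = Encoding.Default.GetString(buf);

// Decode it; this is the call that loses the accented characters.
s = HttpUtility.UrlDecode(s);
System.Console.Write(s);
------------------------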
 
darin dimitrov said:
How can I convert a URL-encoded string containing some French
characters back to the original string?

As far as I know, there's not much in the way of standardisation for
what URL encodings mean above the top of ASCII. The most robust way of
proceeding, IMO, is to use post data instead.
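
To make the ambiguity concrete, here is a minimal sketch (not from the thread) that percent-decodes the query value to raw bytes and then interprets those bytes under two different encodings. The class and method names are made up for illustration, and it assumes System.Web's HttpUtility is available.

```csharp
using System;
using System.Text;
using System.Web;

static class EncodingAmbiguityDemo
{
    // Percent-decodes to raw bytes, then interprets the bytes with the given encoding.
    public static string DecodeAs(string encoded, Encoding encoding)
    {
        // The input only contains ASCII characters, so ASCII is enough for this step.
        byte[] raw = HttpUtility.UrlDecodeToBytes(encoded, Encoding.ASCII);
        return encoding.GetString(raw);
    }

    static void Main()
    {
        string value = "No%EBl+Pir%E8s";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Bytes 0xEB and 0xE8 are "ë" and "è" in ISO-8859-1.
        Console.WriteLine(DecodeAs(value, latin1));

        // The same bytes are not valid UTF-8, so the decoder substitutes
        // the replacement character U+FFFD for each of them.
        Console.WriteLine(DecodeAs(value, Encoding.UTF8));
    }
}
```

Nothing in the query string itself says which of the two readings the client intended.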
 
Jon Skeet said:
As far as I know, there's not much in the way of standardisation for
what URL encodings mean above the top of ASCII. The most robust way of
proceeding, IMO, is to use post data instead.


I will post the solution I found myself in case someone else
is interested:

The problem was that the HttpUtility.UrlDecode(string) overload I was
calling gives no way to specify the encoding, so instead I used the
HttpUtility.UrlDecodeToBytes method, like this:

<code>
string s = "name=No%EBl+Pir%E8s";
byte[] binaryData = HttpUtility.UrlDecodeToBytes(s, Encoding.Default);
s = Encoding.Default.GetString(binaryData);
</code>

Now s = "name=Noël Pirès"
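
For anyone reading this later, the same result can probably be had in one call: HttpUtility also has a UrlDecode(string, Encoding) overload. A minimal sketch follows, assuming the client encoded the form as ISO-8859-1. The class and method names are only illustrative.

```csharp
using System;
using System.Text;
using System.Web;

static class DecodeWithOverload
{
    // Decodes a URL-encoded string, interpreting the escaped bytes with the given encoding.
    public static string Decode(string encoded, Encoding clientEncoding)
    {
        return HttpUtility.UrlDecode(encoded, clientEncoding);
    }

    static void Main()
    {
        // Assumption: the client used ISO-8859-1 (Latin-1) when submitting the form.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Console.WriteLine(Decode("name=No%EBl+Pir%E8s", latin1));
    }
}
```

Naming the encoding explicitly, rather than relying on Encoding.Default, keeps the result from depending on the machine's settings: Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later.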
 
darin dimitrov said:
I will post the solution I found myself in case someone else
is interested:

The problem was that the HttpUtility.UrlDecode(string) overload I was
calling gives no way to specify the encoding, so instead I used the
HttpUtility.UrlDecodeToBytes method, like this:

<code>
string s = "name=No%EBl+Pir%E8s";
byte[] binaryData = HttpUtility.UrlDecodeToBytes(s, Encoding.Default);
s = Encoding.Default.GetString(binaryData);
</code>

Now s = "name=Noël Pirès"

That's fine so long as you know that the client was using the same
encoding - but do you?
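
If the client's encoding is not known, one common workaround is to guess. The sketch below tries strict UTF-8 first and falls back to ISO-8859-1 when the bytes are not valid UTF-8. This is only a heuristic and an assumption on my part, not something any standard guarantees, and the names are illustrative.

```csharp
using System;
using System.Text;
using System.Web;

static class GuessingDecoder
{
    // Heuristic: strict UTF-8 first, ISO-8859-1 as the fallback.
    public static string Decode(string encoded)
    {
        byte[] raw = HttpUtility.UrlDecodeToBytes(encoded, Encoding.ASCII);
        try
        {
            // throwOnInvalidBytes: true makes malformed UTF-8 raise an exception
            // instead of silently producing replacement characters.
            return new UTF8Encoding(false, true).GetString(raw);
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return Encoding.GetEncoding("iso-8859-1").GetString(raw);
        }
    }

    static void Main()
    {
        Console.WriteLine(Decode("No%EBl+Pir%E8s"));       // takes the Latin-1 fallback path
        Console.WriteLine(Decode("No%C3%ABl+Pir%C3%A8s")); // takes the UTF-8 path
    }
}
```

The guess can still be wrong: some Latin-1 byte sequences happen to be valid UTF-8, and the fallback cannot tell ISO-8859-1 apart from other single-byte encodings.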
 
Jon Skeet said:
That's fine so long as you know that the client was using the same
encoding - but do you?


In my particular case I know the encoding that my clients are using,
but I agree that this is not a general solution. I would appreciate
any better suggestions.
 
darin dimitrov said:
In my particular case I know the encoding that my clients are using,
but I agree that this is not a general solution. I would appreciate
any better suggestions.

As I said before, a much better way of passing data is in the body of
the request, as that has an encoding associated with it.
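
As a rough illustration of what "an encoding associated with it" can look like, the sketch below reads the charset parameter from a Content-Type header value and uses it to decode a form body. The header value, the body, and the UTF-8 fallback are all assumptions made up for the example. A client may omit the charset, in which case the server still has to pick a default.

```csharp
using System;
using System.Net.Mime;
using System.Text;
using System.Web;

static class BodyDecoder
{
    // Picks the encoding named in a Content-Type header value.
    // Falling back to UTF-8 when no charset is present is an assumption of this sketch.
    public static Encoding EncodingFromContentType(string contentType)
    {
        string charset = new ContentType(contentType).CharSet;
        return string.IsNullOrEmpty(charset)
            ? Encoding.UTF8
            : Encoding.GetEncoding(charset);
    }

    static void Main()
    {
        // Hypothetical request data, for illustration only.
        string header = "application/x-www-form-urlencoded; charset=ISO-8859-1";
        string body = "name=No%EBl+Pir%E8s";

        Encoding encoding = EncodingFromContentType(header);
        Console.WriteLine(HttpUtility.UrlDecode(body, encoding));
    }
}
```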
 