Uri problem/bug

  • Thread starter: Sunny

Sunny

Hi all,
I have found a possible bug in the Uri class constructor.
When I do something like this:
Uri test = new Uri(@"http://www.test.com/dir1/page1.html");

Uri test2 = new Uri(test, @"../../page2.html");

test2.AbsolutePath gives me http://www.test.com/../page2.html.

I know that you cannot go above the root directory of the site, and that is
why the result is strange. If you try to use test2 against a website, the
request fails, while if you open such a page (page1.html) in a browser (IE)
and click on such a link, it navigates to http://www.test.com/page2.html.
So I'm wondering ...

And yes, there is such a page (not mine) which I have to parse.

I am checking for such links and replacing "../" with "", but maybe this
should be reported somewhere as a possible bug.
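
The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: UriCleanup and both StripLeadingParentSegments overloads are hypothetical names, not part of System.Uri. On a runtime where the Uri constructor already discards the extra ".." steps, the helper simply returns an equivalent Uri.

```csharp
using System;

// Hypothetical helper, not part of the framework.
public static class UriCleanup
{
    // Drops every "/../" left at the start of an absolute path,
    // e.g. "/../page2.html" -> "/page2.html".
    public static string StripLeadingParentSegments(string path)
    {
        while (path.StartsWith("/../", StringComparison.Ordinal))
        {
            path = path.Substring(3); // keep the slash that follows ".."
        }
        return path == "/.." ? "/" : path;
    }

    // Applies the same fix to an already resolved Uri.
    public static Uri StripLeadingParentSegments(Uri resolved)
    {
        UriBuilder builder = new UriBuilder(resolved);
        builder.Path = StripLeadingParentSegments(resolved.AbsolutePath);
        return builder.Uri;
    }
}
```

With this in place, the image URL would be taken from UriCleanup.StripLeadingParentSegments(test2).AbsoluteUri. Note that AbsoluteUri is the property that holds the full URL; AbsolutePath holds only the path part.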



Sunny
 
I know that this is not a valid construct (since ".." points to the parent
directory in most OSes, and certainly on all web servers).

I have to parse an HTML page (let's say page1.html from the example) in
which, maybe because of bad (?) design, there is
<img src="../../image.gif">. Even so, IE and Netscape display the image,
but if I use Uri objects to build the full URL from the base page, I get
the result I posted, and then I cannot use imgUri.AbsolutePath directly to
download the image.
As for the problem itself, I just check the resulting URL and correct it by
removing "../", but I wanted to know why the Uri class and browsers behave
differently here.
I posted this just to start a discussion in that direction. I assume there
are not that many badly coded web pages, but ...

Thanks for reading all this
Sunny

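P.S. I try to download the image (if it matters) with:
// Requires: using System.IO; using System.Net;
// sUrl is the absolute URL string built from the Uri.
WebClient source = new WebClient();
using (Stream myData = source.OpenRead(sUrl))
{
    byte[] buffer = new byte[4096];
    int br;
    // Read returns 0 only at end of stream; a short read does not mean
    // the data is finished, so loop until 0 instead of until a partial buffer.
    while ((br = myData.Read(buffer, 0, buffer.Length)) > 0)
    {
        // process the first br bytes of buffer here
    }
}

Actually, my code fails in the OpenRead method.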

Sunny
 
Hi Sunny,

I ran a test on my machine with
<img src="../../image.gif">
and found that if the current path is the root of the website (usually the
wwwroot directory), ../../image.gif is resolved to image.gif.
That is, when the current path is at the root of the website, each ../
that would go to the parent directory is skipped.
So when the current directory is the root of the website, e.g.
http://localhost/test.htm
then <img src="../../image.gif"> in test.htm is equivalent to
<img src="../image.gif"> and to <img src="image.gif">.
I suggest that when you parse the HTML file, you get the current path and
check whether it is the root of the website; in that case you may need to
ignore the ../ as discussed above.
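
A minimal sketch of that suggestion, assuming only leading "../" steps in the link need handling. RelativeLinks, ClampParentSteps and Resolve are hypothetical names chosen for illustration. The idea is to count how many directories the base page sits below the root and to drop any "../" steps beyond that depth before the reference is handed to the Uri constructor.

```csharp
using System;

// Hypothetical helper, not part of the framework.
public static class RelativeLinks
{
    // Removes leading "../" steps that would climb above the site root.
    // Only leading steps are handled; "a/../../b" is left untouched.
    public static string ClampParentSteps(Uri basePage, string relative)
    {
        // "/dir1/page1.html" -> { "", "dir1", "page1.html" } -> depth 1
        // "/test.htm"        -> { "", "test.htm" }           -> depth 0
        string[] parts = basePage.AbsolutePath.Split('/');
        int depth = parts.Length - 2;

        // Count and strip the leading "../" steps of the reference.
        int steps = 0;
        string rest = relative;
        while (rest.StartsWith("../", StringComparison.Ordinal))
        {
            steps++;
            rest = rest.Substring(3);
        }

        // Put back only as many steps as the base page depth allows.
        string prefix = "";
        for (int i = 0; i < Math.Min(steps, depth); i++)
        {
            prefix += "../";
        }
        return prefix + rest;
    }

    // Resolves the clamped reference against the base page.
    public static Uri Resolve(Uri basePage, string relative)
    {
        return new Uri(basePage, ClampParentSteps(basePage, relative));
    }
}
```

For the page in this thread, RelativeLinks.Resolve(new Uri("http://www.test.com/dir1/page1.html"), "../../page2.html") passes "../page2.html" to the constructor, so the result no longer depends on how the Uri class treats excess ".." steps.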

Did I misunderstand your meaning?
I look forward to hearing from you.


Regards,
Peter Huang
Microsoft Online Partner Support
Get Secure! www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
--------------------
Subject: Re: Uri problem/bug
Date: Sun, 7 Sep 2003 22:59:41 -0500
Newsgroups: microsoft.public.dotnet.languages.csharp

Greg Ewing said:
Sunny, when you create test2 you are creating it with the following URL:

http://www.test.com/dir1/../../page2.html

which is really

http://www.test.com/../page2.html (the first .. goes up one directory,
eliminating the dir1 reference)

So, the Uri class is returning the correct AbsolutePath. Does that make
sense? If you can give some more details about what you are trying to do,
I'm sure we could help you here.
 

Hi Peter,
Yes, you have understood me correctly. There is no problem at all, as I
check for that (../../).
The main point of my original post was that web browsers and the Uri
constructor deal with this case differently (and the way browsers do it
is the right one). I think this is a possible bug in the Uri class, since
most operating systems use ".." to point to the parent directory.
So I cannot accept that new Uri(baseuri, sPath) can return a URI that
looks like "http://test.com/../image.gif", leaving you to check for this
yourself.
I started this thread only as a warning to other developers, and to the
MS team, if they read this group.

And if someone thinks that Uri SHOULD act like this, for whatever reason,
I'd like to hear it.

Thanks
Sunny
 
Hi Sunny,

http://test.com/../image.gif
The URI you posted will not work in IE either. It seems the URI is parsed
from the HTML file, isn't it?
I think IE behaves this way for compatibility reasons. Since many links on
the web would otherwise not work, browsers have to be very tolerant.
Since http://test.com/../image.gif does not work in IE, but
<img src="../image.gif"> does, it is all because of compatibility.
The Uri class, however, cannot guarantee that http://test.com/image.gif
will work, so it does not check whether "../" should be skipped when it
reaches the root.


Regards,
Peter Huang
Microsoft Online Partner Support
--------------------
From: Sunny
Subject: Re: Uri problem/bug
Date: Mon, 8 Sep 2003 10:28:59 -0500
 
Hi Sunny,

I think the behavior is by design.
RFC 2396 (section 5.2, step 6) says:
e) All occurrences of "<segment>/../", where <segment> is a
complete path segment not equal to "..", are removed from the
buffer string. Removal of these path segments is performed
iteratively, removing the leftmost matching pattern on each
iteration, until no matching pattern remains.

f) If the buffer string ends with "<segment>/..", where <segment>
is a complete path segment not equal to "..", that
"<segment>/.." is removed.

For more information, please refer to the link below
http://www.ietf.org/rfc/rfc2396.txt
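
The two quoted steps can be sketched as plain string handling. This is an illustration only, not the Uri class's actual implementation: Rfc2396 and CollapseParentSegments are hypothetical names, and the buffer is assumed to be an absolute path starting with "/" from which the "./" segments of the earlier steps have already been removed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the framework.
public static class Rfc2396
{
    public static string CollapseParentSegments(string buffer)
    {
        List<string> segs = new List<string>(buffer.Split('/'));

        // Step e: repeatedly remove the leftmost "<segment>/../" where
        // <segment> is not "..". Index 0 is the empty string before the
        // leading "/", so the search starts at 1; "i + 2 < Count" ensures
        // the ".." is followed by a slash.
        bool removed = true;
        while (removed)
        {
            removed = false;
            for (int i = 1; i + 2 < segs.Count; i++)
            {
                if (segs[i] != ".." && segs[i + 1] == "..")
                {
                    segs.RemoveRange(i, 2);
                    removed = true;
                    break;
                }
            }
        }

        // Step f: remove a trailing "<segment>/..", keeping the slash before it.
        int n = segs.Count;
        if (n >= 3 && segs[n - 1] == ".." && segs[n - 2] != "..")
        {
            segs.RemoveRange(n - 2, 2);
            segs.Add("");
        }

        return string.Join("/", segs.ToArray());
    }
}
```

For the merged path from this thread, "/dir1/../../page2.html", step e removes "dir1/../" once and then finds nothing more to remove, because the remaining ".." has no ordinary segment in front of it. The result is "/../page2.html", which is the path the Uri class was reported to return.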

Regards,
Peter Huang
Microsoft Online Partner Support

--------------------
From: Sunny
Subject: Re: Uri problem/bug
Date: Tue, 9 Sep 2003 09:44:16 -0500
 