FileUtils.normalize/isLeadingPath have bitten me

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
Hi all

while looking into https://bz.apache.org/bugzilla/show_bug.cgi?id=62502
I realized that I had a false expectation of what normalize would do in
a certain edge case.

If you feed into it a path with more "../" segments than can be
travelled up, like

FileUtils.normalize("/tmp/dest/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../Temp/evil.txt")

it will realize it would go outside of the file system root and returns
a new File instance with the original path. I'm not sure what I had
expected (an exception maybe) but this is now what I had assumed, my
fault.

Then isLeadingPath calls normalize for both its arguments and ends up
seeing "/tmp/dest/" is a prefix of
"/tmp/dest/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../Temp/evil.txt"
and unzip allows the file to escape.

It seems to depend on the OS what it makes of the path, on Linux I
receive an exception but apparently Windows just swallows the excess ../
segments.

I'm not 100% sure how to fix this properly.

Is normalize doing the right thing? If so, we need to fix isLeadingPath
for example by simply always returning false if either normalized path
contains "../" segments (because that means the path cannot exist on
disk at all).

Or should the behavior of normalize change? This unit test snippet

        assertEquals("will not go outside FS root (but will not throw an exception either)",
                new File(localize("/1/../../b")),
                FILE_UTILS.normalize(localize("/1/../../b")));

makes me think we better leave it as is as it seems to be by design (and
I just have forgotten about it).

In either case, once we agree on the fix I propose to cut new releases
immediately.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Jaikiran Pai
I don't have enough background on the expectations of the normalize
method, so can't really say how it should behave. Plus, it gets used in
relatively larger number of places as compared to isLeadingPath, so not
too sure how it might impact other places if we do change its existing
semantics.

However, looking at the FileUtils#isLeadingPath(...) implementation, I
wonder why it even uses normalize. Given that the goal of that API (as
stated in the javadoc) is to figure out if one path leads the other, to
me that translates to being a check to see whether the "leading" param
to that method is a parent of the "path". I suppose that can be
implemented by using the java.io.File#getCanonicalFile() call on each of
the params and then doing a iterative getParent() call on the canonical
File returned for the "path" and seeing if it ends up leading to the
canonical File returned for "leading". The call to
java.io.File#getCanonicalFile() should take care of the ".", ".." and
even symbolic link reference resolutions (which I guess is a good
thing?). Do you think this has merits? Or is the expectation of the
isLeadingPath API solely based on the names of that passed files rather
than their actual resolved location on the filesystem? I haven't yet
tried it out myself to have a more clearer understanding of how it will
end up behaving in the context that we use this API.

-Jaikiran


On 28/06/18 7:17 PM, Stefan Bodewig wrote:

> Hi all
>
> while looking into https://bz.apache.org/bugzilla/show_bug.cgi?id=62502
> I realized that I had a false expectation of what normalize would do in
> a certain edge case.
>
> If you feed into it a path with more "../" segments than can be
> travelled up, like
>
> FileUtils.normalize("/tmp/dest/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../Temp/evil.txt")
>
> it will realize it would go outside of the file system root and returns
> a new File instance with the original path. I'm not sure what I had
> expected (an exception maybe) but this is now what I had assumed, my
> fault.
>
> Then isLeadingPath calls normalize for both its arguments and ends up
> seeing "/tmp/dest/" is a prefix of
> "/tmp/dest/../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../../Temp/evil.txt"
> and unzip allows the file to escape.
>
> It seems to depend on the OS what it makes of the path, on Linux I
> receive an exception but apparently Windows just swallows the excess ../
> segments.
>
> I'm not 100% sure how to fix this properly.
>
> Is normalize doing the right thing? If so, we need to fix isLeadingPath
> for example by simply always returning false if either normalized path
> contains "../" segments (because that means the path cannot exist on
> disk at all).
>
> Or should the behavior of normalize change? This unit test snippet
>
>          assertEquals("will not go outside FS root (but will not throw an exception either)",
>                  new File(localize("/1/../../b")),
>                  FILE_UTILS.normalize(localize("/1/../../b")));
>
> makes me think we better leave it as is as it seems to be by design (and
> I just have forgotten about it).
>
> In either case, once we agree on the fix I propose to cut new releases
> immediately.
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
On 2018-06-28, Jaikiran Pai wrote:

> However, looking at the FileUtils#isLeadingPath(...) implementation, I
> wonder why it even uses normalize. Given that the goal of that API (as
> stated in the javadoc) is to figure out if one path leads the other,
> to me that translates to being a check to see whether the "leading"
> param to that method is a parent of the "path". I suppose that can be
> implemented by using the java.io.File#getCanonicalFile() call on each
> of the params and then doing a iterative getParent() call on the
> canonical File returned for the "path" and seeing if it ends up
> leading to the canonical File returned for "leading". The call to
> java.io.File#getCanonicalFile() should take care of the ".", ".." and
> even symbolic link reference resolutions (which I guess is a good
> thing?).

I think I have written that method long ago so I should be able to
answer that. Unfortunately I'm not.

In may cases we did not want to resolve canonical paths. I think the
expectation is that for

/dir
  /dir2
  /dir3
     link -> /dir/dir2

isLeadingPath("/dir/dir3", "/dir/dir3/link") returns true which it would
not do if links have been resolved.

> Do you think this has merits? Or is the expectation of the
> isLeadingPath API solely based on the names of that passed files
> rather than their actual resolved location on the filesystem?

Yes, I think so.

> I haven't yet tried it out myself to have a more clearer understanding
> of how it will end up behaving in the context that we use this API.

It is used in SymbolicLinkUtils in a way that might break if symbolic
links are resolved.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Jaikiran Pai

On 28/06/18 8:37 PM, Stefan Bodewig wrote:

> On 2018-06-28, Jaikiran Pai wrote:
>
>> However, looking at the FileUtils#isLeadingPath(...) implementation, I
>> wonder why it even uses normalize. Given that the goal of that API (as
>> stated in the javadoc) is to figure out if one path leads the other,
>> to me that translates to being a check to see whether the "leading"
>> param to that method is a parent of the "path". I suppose that can be
>> implemented by using the java.io.File#getCanonicalFile() call on each
>> of the params and then doing a iterative getParent() call on the
>> canonical File returned for the "path" and seeing if it ends up
>> leading to the canonical File returned for "leading". The call to
>> java.io.File#getCanonicalFile() should take care of the ".", ".." and
>> even symbolic link reference resolutions (which I guess is a good
>> thing?).
> I think I have written that method long ago so I should be able to
> answer that. Unfortunately I'm not.
>
> In may cases we did not want to resolve canonical paths. I think the
> expectation is that for
>
> /dir
>    /dir2
>    /dir3
>       link -> /dir/dir2
>
> isLeadingPath("/dir/dir3", "/dir/dir3/link") returns true which it would
> not do if links have been resolved.
That's a good example and yes it would return false if we would change
it to use canonical paths.

Which then makes me wonder - in the context of this specific
untar/expand/unzip issue, should we probably be using a different custom
very specific logic (which relies on canonical files and getParent())
instead of a call to isLeadingPath()? I don't have an answer though and
I will have to sleep over this a bit to see if it has other implications
and if it does indeed solve the issue at hand.

-Jaikiran

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
On 2018-06-28, Jaikiran Pai wrote:

> On 28/06/18 8:37 PM, Stefan Bodewig wrote:

>> /dir
>>    /dir2
>>    /dir3
>>       link -> /dir/dir2

>> isLeadingPath("/dir/dir3", "/dir/dir3/link") returns true which it would
>> not do if links have been resolved.

> That's a good example and yes it would return false if we would change
> it to use canonical paths.

> Which then makes me wonder - in the context of this specific
> untar/expand/unzip issue, should we probably be using a different
> custom very specific logic (which relies on canonical files and
> getParent()) instead of a call to isLeadingPath()?

Probably. I used isLeadingPath because it has been already there - and )
simply didn't realize it wouldn't do what I expected it to.

https://github.com/apache/commons-compress/blob/a080293da69f3fe3d11d5214432e1469ee195870/src/main/java/org/apache/commons/compress/archivers/examples/Expander.java#L245
is how I implemented it in Commons Compress' example code.

> I don't have an answer though and I will have to sleep over this a bit
> to see if it has other implications and if it does indeed solve the
> issue at hand.

I appreciate this a lot, many thanks.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
On 2018-06-28, Stefan Bodewig wrote:

> On 2018-06-28, Jaikiran Pai wrote:

>> Which then makes me wonder - in the context of this specific
>> untar/expand/unzip issue, should we probably be using a different
>> custom very specific logic (which relies on canonical files and
>> getParent()) instead of a call to isLeadingPath()?

> Probably. I used isLeadingPath because it has been already there - and )
> simply didn't realize it wouldn't do what I expected it to.

I decided to "fix" isLeadingPath for this edge case and add a method
using canonical paths instead - and use that in unzip and friends.

Please have a look at the proposed solution, I won't close the Bugzilla
issue before we have agreed this is the proper fix.

Cheers

        Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Nicolas Lalevée
At my first read of the code I wondered if paths ending with slash are properly handled in every case. After more careful reading, it is seems ok. Maybe some unit tests about it would be nice, just to be sure.

Then, about tackling the bug, I am on the same page as you both. It looks good to me.

Nicolas

> Le 1 juil. 2018 à 11:27, Stefan Bodewig <[hidden email]> a écrit :
>
> On 2018-06-28, Stefan Bodewig wrote:
>
>> On 2018-06-28, Jaikiran Pai wrote:
>
>>> Which then makes me wonder - in the context of this specific
>>> untar/expand/unzip issue, should we probably be using a different
>>> custom very specific logic (which relies on canonical files and
>>> getParent()) instead of a call to isLeadingPath()?
>
>> Probably. I used isLeadingPath because it has been already there - and )
>> simply didn't realize it wouldn't do what I expected it to.
>
> I decided to "fix" isLeadingPath for this edge case and add a method
> using canonical paths instead - and use that in unzip and friends.
>
> Please have a look at the proposed solution, I won't close the Bugzilla
> issue before we have agreed this is the proper fix.
>
> Cheers
>
>        Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Jaikiran Pai
In reply to this post by Stefan Bodewig
Hi Stefan,

Sorry that I couldn't get to this earlier. My plan to spend more time on
this during the weekend didn't work out.

I just checked the commits related to this and it looks mostly correct.
However, I am still not 100% sure we have covered it all with this new
method that was introduced[1] and used in the unzip part. What I mean
is, the API expectations are right. However the implementation _might_
need a bit of rethink since it still relies on path string comparison
(checks like endsWith) instead of a getParent() check. My suspicion is
all theoretical at the moment and may not even be valid. I'm going to
build the latest changes of Ant give it a try against a bunch of my
theoretical use cases and see if my suspicions are really valid or not.

[1]
https://github.com/apache/ant/commit/6a41d62cb9ab4e640b72cb4de42a6c211dea645d

-Jaikiran


On 01/07/18 2:57 PM, Stefan Bodewig wrote:

> On 2018-06-28, Stefan Bodewig wrote:
>
>> On 2018-06-28, Jaikiran Pai wrote:
>>> Which then makes me wonder - in the context of this specific
>>> untar/expand/unzip issue, should we probably be using a different
>>> custom very specific logic (which relies on canonical files and
>>> getParent()) instead of a call to isLeadingPath()?
>> Probably. I used isLeadingPath because it has been already there - and )
>> simply didn't realize it wouldn't do what I expected it to.
> I decided to "fix" isLeadingPath for this edge case and add a method
> using canonical paths instead - and use that in unzip and friends.
>
> Please have a look at the proposed solution, I won't close the Bugzilla
> issue before we have agreed this is the proper fix.
>
> Cheers
>
>          Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
On 2018-07-02, Jaikiran Pai wrote:

> Sorry that I couldn't get to this earlier. My plan to spend more time
> on this during the weekend didn't work out.

It's been a weekend :-)

No worries.

> I just checked the commits related to this and it looks mostly
> correct. However, I am still not 100% sure we have covered it all with
> this new method that was introduced[1] and used in the unzip
> part. What I mean is, the API expectations are right. However the
> implementation _might_ need a bit of rethink since it still relies on
> path string comparison (checks like endsWith) instead of a getParent()
> check.

I mostly kept the string comparison for performance reasons, but maybe
that point is moot and performance should not be a concern
here. Rewriting it to use getParent is no problem at all.

> My suspicion is all theoretical at the moment and may not even be
> valid. I'm going to build the latest changes of Ant give it a try
> against a bunch of my theoretical use cases and see if my suspicions
> are really valid or not.

Thank you

      Stefan


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
In reply to this post by Nicolas Lalevée
Hi Nicolas

many thanks for lending an extra pair of eyes. I'll add a few more
tests.

Stefan

On 2018-07-01, Nicolas Lalevée wrote:

> At my first read of the code I wondered if paths ending with slash are properly handled in every case. After more careful reading, it is seems ok. Maybe some unit tests about it would be nice, just to be sure.

> Then, about tackling the bug, I am on the same page as you both. It looks good to me.

> Nicolas

>> Le 1 juil. 2018 à 11:27, Stefan Bodewig <[hidden email]> a écrit :

>> On 2018-06-28, Stefan Bodewig wrote:

>>> On 2018-06-28, Jaikiran Pai wrote:

>>>> Which then makes me wonder - in the context of this specific
>>>> untar/expand/unzip issue, should we probably be using a different
>>>> custom very specific logic (which relies on canonical files and
>>>> getParent()) instead of a call to isLeadingPath()?

>>> Probably. I used isLeadingPath because it has been already there - and )
>>> simply didn't realize it wouldn't do what I expected it to.

>> I decided to "fix" isLeadingPath for this edge case and add a method
>> using canonical paths instead - and use that in unzip and friends.

>> Please have a look at the proposed solution, I won't close the Bugzilla
>> issue before we have agreed this is the proper fix.

>> Cheers

>>        Stefan

>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]



> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
In reply to this post by Stefan Bodewig
On 2018-07-02, Stefan Bodewig wrote:

> On 2018-07-02, Jaikiran Pai wrote:

>> I just checked the commits related to this and it looks mostly
>> correct. However, I am still not 100% sure we have covered it all with
>> this new method that was introduced[1] and used in the unzip
>> part. What I mean is, the API expectations are right. However the
>> implementation _might_ need a bit of rethink since it still relies on
>> path string comparison (checks like endsWith) instead of a getParent()
>> check.

> I mostly kept the string comparison for performance reasons, but maybe
> that point is moot and performance should not be a concern
> here. Rewriting it to use getParent is no problem at all.

Done

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Jaikiran Pai
Hi Stefan,

I did some testing manually for this new method, with both symlinks and
non-symlinks with both the string check version and the getParent()
version. In both of those, I couldn't get it to break in any odd ways
(which is a good thing). It also means that my theory that the string
comparison may not always be a best idea is just theoretical. However, I
just feel a bit more comfortable seeing the getParent() version since
that then removes any kind of file separator or odd backslash/frontslash
permutations that we may not have thought of and instead leaves it to
the JRE implementation to deal with it. Again, this is me being a bit
paranoid than any real demoable issue with the string comparison code.
So thank you for using the getParent() call in this implementation.

At this point, I think these commits address the issue that we sought
out to fix. So unless someone else sees any issues, I think we can go
ahead and do the release that you had planned for.

-Jaikiran


On 03/07/18 1:16 PM, Stefan Bodewig wrote:

> On 2018-07-02, Stefan Bodewig wrote:
>
>> On 2018-07-02, Jaikiran Pai wrote:
>>> I just checked the commits related to this and it looks mostly
>>> correct. However, I am still not 100% sure we have covered it all with
>>> this new method that was introduced[1] and used in the unzip
>>> part. What I mean is, the API expectations are right. However the
>>> implementation _might_ need a bit of rethink since it still relies on
>>> path string comparison (checks like endsWith) instead of a getParent()
>>> check.
>> I mostly kept the string comparison for performance reasons, but maybe
>> that point is moot and performance should not be a concern
>> here. Rewriting it to use getParent is no problem at all.
> Done
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: FileUtils.normalize/isLeadingPath have bitten me

Stefan Bodewig
On 2018-07-03, Jaikiran Pai wrote:

> I did some testing manually for this new method, with both symlinks
> and non-symlinks with both the string check version and the
> getParent() version. In both of those, I couldn't get it to break in
> any odd ways (which is a good thing). It also means that my theory
> that the string comparison may not always be a best idea is just
> theoretical. However, I just feel a bit more comfortable seeing the
> getParent() version since that then removes any kind of file separator
> or odd backslash/frontslash permutations that we may not have thought
> of and instead leaves it to the JRE implementation to deal with
> it. Again, this is me being a bit paranoid than any real demoable
> issue with the string comparison code.

I welcome paranoia in particular if security is involved. :-)

> At this point, I think these commits address the issue that we sought
> out to fix. So unless someone else sees any issues, I think we can go
> ahead and do the release that you had planned for.

Thanks. I'll let it sit for a bit longer and will cut release candidates
later the coming days.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]