[GitHub] ant pull request #76: bz-43144 - Improve the performance of the tar task whe...

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[GitHub] ant pull request #76: bz-43144 - Improve the performance of the tar task whe...

asfgit
GitHub user jaikiran opened a pull request:

    https://github.com/apache/ant/pull/76

    bz-43144 - Improve the performance of the tar task when it uses a zipfileset

    https://bz.apache.org/bugzilla/show_bug.cgi?id=43144 is an issue where users have reported that the tar task is extremely slow when used with a `zipfileset`.
   
    The comment in that bug, from @bodewig, rightly notes that the reason for this slowness has to do with the code where we iterate over the zip entries as a `Resource` and then open an `InputStream` for each entry during the `tar` task. The implementation of the `ZipResource#getInputStream` opens a new `ZipFile` every time and as Stefan notes in that bug, the reason it's done this way is because we don't know what would be the right time to close a `ZipFIle` if the implementation of the `ZipResource` would want to hold on to a open `ZipFile` instance and reuse it.
   
    However, there are occasions, like the one here, where the callers of the `ArchiveFileSet` are aware and know when and how long they expect the underlying archive to be open. The commit in this PR, introduces a new method (`openArchive()`) on the `ArchiveFileSet` to allow such callers to have more control over the opening and closing of the underlying resource(s) like the `ZipFile`. The whole goal of this new method is to allow iterating over the entries in the archive (just like the existing `iterator()` method) and yet the same time exposing a way for users to explicitly `close` the returned `ArchiveEntries`. When to use the `iterator()` method and when to use the `openArchive` method will be left to the users of ArchiveSet.
   
    The commit in this PR intentionally just exposes only one method `openArchive` as a `public` method and keeps the rest of the new methods either private or package private. Right now, only `ZipFileSet` overrides the new package private method to provide a scanner which holds on to open resource(s), if it's asked to do so.
   
    The changes in this commit maintain backward compatibility of the existing methods and does _not_ introduce any change in behaviour of their usages or semantics.
   
    The javadocs of the new methods explain more about what each one does and how the usage typically looks like.
   
    I have run a few of the existing Tar related tests locally and haven't found any regressions. I have also run a manual test to see how the new performance with this change looks like. I used a random zip file which is around 5MB in size and has 927 entries (as reported by the unzip command):
   
    ```
    unzip -l source.zip
    <contents of the file> ....
    ---------                     -------
     22222385                     927 files
    ```
    I then used the following build file:
   
    ```
    <?xml version="1.0" encoding="UTF-8"?>
    <project name="test" default="main">
        <target name="main">
          <mkdir dir="build"/>
          <tar destFile="build/test.tar" longfile="gnu">
          <zipfileset src="source.zip"/>
           </tar>
        </target>
    </project>
    ```
    Ran the following command:
    ```
    time ant
    ```
    With Ant 1.10.5 (the latest released), it takes a while to complete and reports:
   
    ```
    Total time: 23 seconds
   
    real 0m23.485s
    user 0m16.767s
    sys 0m9.368s
    ```
    So around 23 seconds for the tar task.
   
    With this patch and the freshly built Ant distribution, this same build file (I did the necessary cleanup of the  build generated tar file, before running it again) reports:
   
    ```
    Total time: 0 seconds
   
    real 0m1.068s
    user 0m1.994s
    sys 0m0.258s
    ```
   
    Around 1 second for the exact same task. So as expected there's a big improvement. I have done basic comparison of the generated tar files, with Ant 1.10.5 and this patched version to check it contains all the expected content and it looked fine. I will see if I can add some kind of automated testing around this if possible. Until then I would like inputs on whether this looks fine and any suggestions/feedback on this change. Right now, this is targetted for master branch. I'll take a look later if it's easy (I guess, should be) to have this in 1.9.x too.
   
    P.S: The linked bug also has one comment which states that the `copy` task when it uses a `zipfileset` is impacted by this performance issue too. I haven't checked or verified that. At a later date, I'll take a look if it needs this same/similar fix.
   
   
   
     

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jaikiran/ant bz-43144-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ant/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #76
   
----
commit 81b8c80c550ba560770a1f82de60c4d0b11ace91
Author: Jaikiran Pai <jaikiran@...>
Date:   2018-10-31T13:59:29Z

    bz-43144 - (performance fix) Explicitly control when an archive is opened and closed when a zipfileset is used as a resource collection in the tar task

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant issue #76: bz-43144 - Improve the performance of the tar task when it us...

asfgit
Github user asfgit commented on the issue:

    https://github.com/apache/ant/pull/76
 
   
    Refer to this link for build results (access rights to CI server needed):
    https://builds.apache.org/job/Ant%20Github-PR-Windows/88/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant issue #76: bz-43144 - Improve the performance of the tar task when it us...

asfgit
In reply to this post by asfgit
Github user asfgit commented on the issue:

    https://github.com/apache/ant/pull/76
 
   
    Refer to this link for build results (access rights to CI server needed):
    https://builds.apache.org/job/Ant%20Github-PR-Linux/82/



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant issue #76: bz-43144 - Improve the performance of the tar task when it us...

asfgit
In reply to this post by asfgit
Github user jaikiran commented on the issue:

    https://github.com/apache/ant/pull/76
 
    I'm going to put this on hold for a bit and won't merge it yet, for the reasons noted in the recent comments in the linked issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant pull request #76: bz-43144 - Improve the performance of the tar task whe...

asfgit
In reply to this post by asfgit
Github user jaikiran closed the pull request at:

    https://github.com/apache/ant/pull/76


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant issue #76: bz-43144 - Improve the performance of the tar task when it us...

asfgit
In reply to this post by asfgit
Github user bodewig commented on the issue:

    https://github.com/apache/ant/pull/76
 
    Quite a bit of code :-)
    Looks as if you had found a solution with minimal API impact that can be extended to cases other than just `tar` although I'm afraid we are using resource collections wrapping resource collections in some places.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant issue #76: bz-43144 - Improve the performance of the tar task when it us...

asfgit
In reply to this post by asfgit
Github user bodewig commented on the issue:

    https://github.com/apache/ant/pull/76
 
    Quite a bit of code :-)
    Looks as if you had found a solution with minimal API impact that can be extended to cases other than just `tar` although I'm afraid we are using resource collections wrapping resource collections in some places.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[GitHub] ant issue #76: bz-43144 - Improve the performance of the tar task when it us...

asfgit
In reply to this post by asfgit
Github user jaikiran commented on the issue:

    https://github.com/apache/ant/pull/76
 
    >> Looks as if you had found a solution with minimal API impact that can be extended to cases other than just tar although I'm afraid we are using resource collections wrapping resource collections in some places.
   
    You are right, plus unlike the `tar` task, the way `Resource`(s) are passed around, it isn't that straightforward to keep track of them. I might still find a way similar to this to get those covered too. I'll spend some more time on this in the upcoming days.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]