Database backed DirectoryScanner


Mike S.
Hello,

I need to search for and collect very large sets of files using a variety
of criteria. All my research for libraries to support this keeps coming back to
Ant and its powerful FileSet and DirectoryScanner. So I'm hoping to make use
of them, especially since Ant works on several platforms.

Given that the number of files will be very large, I'm concerned about how
DirectoryScanner blocks until it has collected all the results in memory.
I ran across one case where the DirectoryScanner search duration exceeded
a desired time frame (https://bz.apache.org/bugzilla/show_bug.cgi?id=57253),
so I know this isn't just a theoretical concern.

I can't be the first person to consider accessing DirectoryScanner results
before the scan is complete, or storing the results persistently as they are
found, for example in a database. That is why I wanted to run the idea by this
mailing list, in case it is a terrible idea, there's a better way, or someone
has already done it.

Implementation-wise, it doesn't seem very difficult. Looking at the design of
DirectoryScanner, I see all results are ultimately stored in Vectors.
So I could extend Vector and override its behavior with database queries.
And since DirectoryScanner exposes the ability to set the Vectors
(via protected methods), I would extend DirectoryScanner to use this new
database-backed Vector.
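
A minimal sketch of the Vector side of this idea follows. It assumes a
hypothetical ResultStore interface as the persistence hook; the
InMemoryResultStore here is only a stand-in so the example runs without a
database, and a real version would presumably wrap JDBC. Which Vector methods
DirectoryScanner actually calls, and how the Vector would be plugged into it,
are not shown and would need to be checked against the Ant source.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.Vector;

/** Hypothetical persistence hook; a real version might issue SQL via JDBC. */
interface ResultStore {
    void insert(String path);
    boolean exists(String path);
    int count();
}

/** In-memory stand-in so the sketch runs without a database. */
class InMemoryResultStore implements ResultStore {
    private final Set<String> rows = new LinkedHashSet<>();

    public synchronized void insert(String path) { rows.add(path); }
    public synchronized boolean exists(String path) { return rows.contains(path); }
    public synchronized int count() { return rows.size(); }
}

/** Vector that holds no elements itself and delegates to the store. */
class StoreBackedVector extends Vector<String> {
    private static final long serialVersionUID = 1L;
    private final transient ResultStore store;

    StoreBackedVector(ResultStore store) {
        this.store = store;
    }

    // Both add paths are overridden because neither is guaranteed to call the other.
    @Override
    public synchronized boolean add(String path) {
        store.insert(path);
        return true;
    }

    @Override
    public synchronized void addElement(String path) {
        store.insert(path);
    }

    // Lookup is answered by the store instead of scanning an in-memory array.
    @Override
    public boolean contains(Object o) {
        return o instanceof String && store.exists((String) o);
    }

    @Override
    public synchronized int size() {
        return store.count();
    }
}
```

Only add(), addElement(), contains() and size() are covered. Read methods such
as get(), iterator(), toArray() and copyInto() still see the empty inherited
array, so a complete version would have to override those as well.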

Doing this would offer these benefits:
* Avoiding memory limits
* Access to the results from other threads or processes while the search
  is in progress
* Continued support for the Vector.contains() lookups used during the search
* Real-time searching, aggregating, reporting, etc. through the database
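
The second benefit, reading results while the search is still running, can be
illustrated in-process with a queue standing in for the database. This is only
a sketch under assumptions: PublishingVector, StreamingDemo and the DONE marker
are hypothetical names, the "scan" is simulated by a loop rather than a real
DirectoryScanner, and unlike the database-backed approach this version still
keeps every result in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Vector that also publishes each added path to a queue for other threads. */
class PublishingVector extends Vector<String> {
    private static final long serialVersionUID = 1L;
    private final transient BlockingQueue<String> sink;

    PublishingVector(BlockingQueue<String> sink) {
        this.sink = sink;
    }

    @Override
    public synchronized boolean add(String path) {
        sink.add(path);
        return super.add(path);
    }

    @Override
    public synchronized void addElement(String path) {
        sink.add(path);
        super.addElement(path);
    }
}

class StreamingDemo {
    // Hypothetical end-of-scan marker; assumed never to be a real path.
    static final String DONE = "\u0000done";

    /** Simulates a scan on one thread while the caller consumes hits as they arrive. */
    static List<String> run(List<String> simulatedHits) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Vector<String> results = new PublishingVector(queue);

        Thread scanner = new Thread(() -> {
            for (String hit : simulatedHits) {
                results.addElement(hit); // stands in for the scanner recording a match
            }
            queue.add(DONE);
        });
        scanner.start();

        List<String> seen = new ArrayList<>();
        try {
            // The consumer does not wait for the scan to finish before reading.
            for (String s = queue.take(); !s.equals(DONE); s = queue.take()) {
                seen.add(s);
            }
            scanner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming results", e);
        }
        return seen;
    }
}
```

Replacing the queue with inserts into a database table would extend the same
pattern to readers in other processes.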

Again, I can't be the first one to consider this, but I can't find any mention
of such an idea anywhere. So if anyone has any thoughts on the matter, I would
definitely appreciate any feedback.

Thank you for your time and consideration.

Regards,
Mike
