Reproducible JARs

5 messages

Reproducible JARs

George Bateman
Dear List,

I'm currently trying to make Processing <https://processing.org> build
reproducibly (so building it twice yields the exact same output file).
Currently this involves, as far as I can tell, unzipping every JAR
file, touching the files with a constant modification time, and
re-zipping it. (Even if you were to touch the .class files in advance,
there's MANIFEST.MF, which is created by Ant and which you have to
unzip to access.)
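Here is a rough illustration of that unzip, re-stamp and re-zip step. It is a minimal Java sketch that assumes Java 9 or later; the class name JarNormalizer is hypothetical and not part of Ant. It rewrites every entry, including META-INF/MANIFEST.MF, with one fixed timestamp and keeps the entry order. It drops extra fields and per-entry comments, which may or may not be acceptable for a given build. It also re-deflates everything with the running JVM's Deflater, so the zlib caveat raised later in this thread still applies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.time.LocalDateTime;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: rewrites a JAR so every entry has the same timestamp. */
final class JarNormalizer {
    private JarNormalizer() {}

    static byte[] normalize(byte[] archive, LocalDateTime fixedTime) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive));
             ZipOutputStream zos = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // Copy only the name, so original times and extra fields are dropped.
                ZipEntry copy = new ZipEntry(entry.getName());
                // setTimeLocal writes the MS-DOS date/time fields directly,
                // so the result does not depend on the JVM's default time zone.
                copy.setTimeLocal(fixedTime);
                zos.putNextEntry(copy);
                in.transferTo(zos); // copies the current entry's uncompressed data
                zos.closeEntry();
            }
        } // closing zos writes the central directory
        return out.toByteArray();
    }
}
```

For example, `JarNormalizer.normalize(Files.readAllBytes(jar), LocalDateTime.of(2000, 1, 1, 0, 0))` should give byte-identical output for two JARs that differ only in entry times.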

Unless there's some efficient way of doing this that I've missed,
could you advise me on how I'd go about writing a patch for Ant that
makes reproducible JARs easier? I'd been thinking of adding a
"modificationtime" attribute to the jar and zip tasks, and giving that
time to all the files, but I'd be grateful if you could give me a
rough idea of how Ant works and which files I'd need to be looking at
and editing.

Thanks,
George Bateman.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Reproducible JARs

Stefan Bodewig
On 2017-07-22, George Bateman wrote:

> I'm currently trying to make Processing <https://processing.org> build
> reproducibly (so building it twice yields the exact same output file).
> Currently this involves, as far as I can tell, unzipping every JAR
> file, touching the files with a constant modification time, and
> re-zipping it. (Even if you were to touch the .class files in advance,
> there's MANIFEST.MF, which is created by Ant and which you have to
> unzip to access.)

The manifest may contain the current time stamp, to make things worse
for you :-)

I'm not sure what you want can be achieved at all. The best you can get
is the exact same jar when the build has been run on the same OS with
the same version of java, javac and zlib (and probably a few other
things I'm missing right now).

Different javacs will create different class files. ZIP (and thus JAR)
creation uses zlib under the covers and different versions may result in
different deflated output.

> Unless there's some efficient way of doing this that I've missed,
> could you advise me on how I'd go about writing a patch for Ant that
> makes reproducible JARs easier? I'd been thinking of adding a
> "modificationtime" attribute to the jar and zip tasks, and giving that
> time to all the files, but I'd be grateful if you could give me a
> rough idea of how Ant works and which files I'd need to be looking at
> and editing.

First of all you'd have to modify the class
org.apache.tools.ant.taskdefs.Zip and add a setModificationtime method
to it - this will create the attribute for both tasks, as the
implementation of <jar> (org.apache.tools.ant.taskdefs.Jar) is a
subclass of the Zip class. You'll need to think about what the value of
the attribute should be - milliseconds since epoch? A formatted string
containing the timestamp to set?

And then you need to look for all places where Zip or Jar invoke setTime
on a ZipEntry instance.
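Here is a hypothetical, self-contained sketch of the shape such a patch might take. It is not Ant's code. The class FixedTimeZipTask and its methods are invented for illustration, and java.util.zip.ZipEntry stands in for whatever entry class Ant's Zip task actually uses. The sketch picks milliseconds since the epoch as the attribute value, which is only one of the two options mentioned above. Ant maps an attribute such as modificationtime="..." to a setter named setModificationtime, and each existing setTime call would go through one helper. Note that java.util.zip.ZipEntry.setTime stores MS-DOS local time using the JVM's default time zone. Builders in different zones could therefore still get different bytes.

```java
import java.util.zip.ZipEntry;

/**
 * Hypothetical sketch, NOT Ant's actual code: a Zip-like task whose
 * "modificationtime" attribute overrides every entry's timestamp.
 */
class FixedTimeZipTask {
    // null means "keep each file's own timestamp" (the current behaviour).
    private Long fixedModificationTime;

    /** Ant's introspection would map modificationtime="..." to this setter. */
    public void setModificationtime(String millisSinceEpoch) {
        fixedModificationTime = Long.valueOf(millisSinceEpoch.trim());
    }

    /** Time to store for an entry whose source was last modified at lastModified. */
    long effectiveTime(long lastModified) {
        return fixedModificationTime != null ? fixedModificationTime : lastModified;
    }

    /** Every place that used to call entry.setTime(...) would go through here. */
    ZipEntry newEntry(String name, long lastModified) {
        ZipEntry entry = new ZipEntry(name);
        entry.setTime(effectiveTime(lastModified));
        return entry;
    }
}
```

Routing the calls through one method keeps the patch small: when the attribute is unset, behaviour is unchanged.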

Stefan

Re: Reproducible JARs

Gintautas Grigelionis
I believe the easier way would be to diff the fileset of classes to be
packed to the fileset of classes in the jar before creating a new jar.
There's a tool called zipdiff that could do the trick (it may ignore
timestamps, etc). That would correspond to a javac deciding on whether to
recompile the classes (which is based on timestamps, I guess ;-). It sounds
to me like you're looking for an equivalent of ClearCase winkin. If that's
the case, have you read [1]?
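The idea of comparing archive contents while ignoring timestamps could look roughly like the following. This is not zipdiff's API. It is a hypothetical Java sketch (Java 9+ for readAllBytes) that indexes each entry by name and by the SHA-256 of its uncompressed bytes. Entry timestamps, entry order and compression therefore do not affect the comparison.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: compares two archives by entry names and contents only. */
final class ZipContentDiff {
    private ZipContentDiff() {}

    /** Maps entry name to SHA-256 of its uncompressed bytes; times are ignored. */
    static Map<String, String> contentIndex(byte[] archive) throws IOException {
        Map<String, String> index = new TreeMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                index.put(entry.getName(), sha256Hex(in.readAllBytes()));
            }
        }
        return index;
    }

    static boolean sameContent(byte[] a, byte[] b) throws IOException {
        return contentIndex(a).equals(contentIndex(b));
    }

    private static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte x : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", x));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available", e);
        }
    }
}
```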

Gintas

[1]
https://www.packtpub.com/books/content/ibm-rational-clearcase-challenges-java-development

2017-07-23 12:35 GMT+02:00 Stefan Bodewig <[hidden email]>:

> [...]

Re: Reproducible JARs

George Bateman

Hi Gintas and Stefan, and List,

Thanks very much for your emails. Sorry for the slight delay in replying!

On 25 July 2017 at 07:39, Gintautas Grigelionis <[hidden email]> wrote:
> I believe the easier way would be to diff the fileset of classes to be
> packed to the fileset of classes in the jar before creating a new jar.
> There's a tool called zipdiff that could do the trick (it may ignore
> timestamps, etc). That would correspond to a javac deciding on whether to
> recompile the classes (which is based on timestamps, I guess ;-).

That doesn't solve the timestamps problem, though: two people building from scratch, with no JAR or class files to begin with, will still get different modification times in their JARs.

> It sounds to me like you're looking for an equivalent of ClearCase
> winkin. If that's the case, have you read [1]?

I have now, thank you! I'll be honest, adding that sort of capability to Ant sounds just a little beyond my capabilities. Also, it doesn't seem to be necessary; javac seems to be giving the same exact output every time for a given javac version.

Stefan wrote:
> The manifest may contain the current time stamp, to make things worse
> for you :-)

Fortunately, I don't see anywhere in Jar.java that mentions time or date, except passing it to Zip#zipFile() as the modification time, and I override it there anyway.

> I'm not sure what you want can be achieved at all. The best you can get
> is the exact same jar when the build has been run on the same OS with
> the same version of java, javac and zlib (and probably a few other
> things I'm missing right now).
>
> Different javacs will create different class files. ZIP (and thus JAR)
> creation uses zlib under the covers and different versions may result in
> different deflated output.

That's fine. It's sufficient to be able to say that if you set up your build environment to my exact but reasonable instructions (as you get in a Debian source package, for example), you will get the same thing I got. This would prove that I built the binaries from the source I said I did just by comparing the single eventual output file.
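Comparing "the single eventual output file" only needs a digest: both builders compute it and compare the strings. Here is a minimal Java sketch; the class name ArtifactChecksum is hypothetical, and a command-line tool such as sha256sum would do the same job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: SHA-256 of a build artifact as lowercase hex. */
final class ArtifactChecksum {
    private ArtifactChecksum() {}

    static String sha256Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // stream the file so large JARs are fine
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```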

Re: date formatting, I've added a "parseLenientDateTime" method to the end of DateUtils.java. It's compatible with the date attribute to <touch> but also accepts milliseconds since epoch and a small range of ISO 8601-type formats, so that dates output by other software can more easily be read by Ant. Would it be okay if I submitted a patch that got <touch> to use parseLenientDateTime as well? It's currently a bit awkward to make git, for example, output a date that can be read by <touch>.
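The actual parseLenientDateTime patch is not shown here, so the following is only a guess at what such a method could look like. It is a hypothetical sketch in a made-up LenientDates class (Java 9+), built on these assumptions:

- A bare integer is treated as milliseconds since the epoch.
- An ISO 8601 string with an offset is accepted, as printed by git log --date=iso-strict.
- The <touch> patterns MM/dd/yyyy hh:mm a and MM/dd/yyyy hh:mm:ss a are parsed in the US locale.
- A zone-less ISO date-time is interpreted in a caller-supplied zone.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of a lenient date parser; not the code from the patch. */
final class LenientDates {
    private LenientDates() {}

    // Assumed <touch>-style patterns (US locale, 12-hour clock), then plain ISO.
    private static final List<DateTimeFormatter> LOCAL_FORMATS = List.of(
            DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US),
            DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm a", Locale.US),
            DateTimeFormatter.ISO_LOCAL_DATE_TIME);

    /** Returns milliseconds since the epoch; zone applies only to zone-less input. */
    static long parseLenientDateTime(String text, ZoneId zone) {
        String s = text.trim();
        if (s.matches("-?\\d+")) {
            return Long.parseLong(s); // already milliseconds since the epoch
        }
        try {
            // ISO 8601 with an explicit offset, e.g. 2017-07-22T12:00:00+02:00
            return OffsetDateTime.parse(s).toInstant().toEpochMilli();
        } catch (DateTimeParseException ignored) {
            // fall through to the zone-less formats
        }
        for (DateTimeFormatter format : LOCAL_FORMATS) {
            try {
                return LocalDateTime.parse(s, format).atZone(zone).toInstant().toEpochMilli();
            } catch (DateTimeParseException ignored) {
                // try the next format
            }
        }
        throw new IllegalArgumentException("Unrecognised date/time: " + text);
    }
}
```

One design consequence of this sketch: a short bare number such as "2017" is read as milliseconds, not as a year.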

I've attached the output of git diff so you can see it, but what's the correct way to submit the code to you?

Best wishes,

George.


Re: Reproducible JARs

Stefan Bodewig
On 2017-08-23, George Bateman wrote:

> Stefan wrote:
>> The manifest may contain the current time stamp, to make things worse
>> for you :-)

> Fortunately, I don't see anywhere in Jar.java that mentions time or date,
> except passing it to Zip#zipFile() as the modification time, and I override
> it there anyway.

I thought we might add a build timestamp inside the default manifest. I
just checked and we don't. All is well.

>> I'm not sure what you want can be achieved at all. The best you can get
>> is the exact same jar when the build has been run on the same OS with
>> the same version of java, javac and zlib (and probably a few other
>> things I'm missing right now).

>> Different javacs will create different class files. ZIP (and thus JAR)
>> creation uses zlib under the covers and different versions may result in
>> different deflated output.

> That's fine. It's sufficient to be able to say that if you set up your
> build environment to my exact but reasonable instructions (as you get in a
> Debian source package, for example), you will get the same thing I got.
> This would prove that I built the binaries from the source I said I did
> just by comparing the single eventual output file.

In this case you could <touch> all files that end up inside the archive
to some fixed timestamp and obtain the same jar for all of them.
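Outside Ant, the equivalent of running <touch> over everything that goes into the archive might look like this Java sketch. The class FixedTimeToucher is hypothetical. It also stamps directories, on the assumption that directory entries in the archive take their times from the directories themselves. As Stefan goes on to say, this alone would not fix the MANIFEST.MF and META-INF timestamps.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: gives every file and directory under root the same mtime. */
final class FixedTimeToucher {
    private FixedTimeToucher() {}

    /** Returns how many paths (including root itself) were touched. */
    static int touchTree(Path root, FileTime fixedTime) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.collect(Collectors.toList()); // pre-order, root first
        }
        for (Path p : paths) {
            // Only the timestamp changes; file contents are untouched.
            Files.setLastModifiedTime(p, fixedTime);
        }
        return paths.size();
    }
}
```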

It may not work for the timestamp of the MANIFEST.MF, come to think of
it. See line 556 in Jar.java (master branch), the manifest will always
use the current timestamp. Same for the META-INF directory.

> I've attached the output of git diff so you can see it, but what's the
> correct way to submit the code to you?

Attachments are stripped, so your diff never made it through.

Please open an enhancement request in bugzilla[1] and attach your patch
or alternatively open a github pull request.

Stefan

[1] https://bz.apache.org/bugzilla/