You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, or at least in version 0.9.13, the command line client adapter is
unable to retrieve files larger than a few megs without throwing an
OutOfMemoryError. The reason is the results of the 'cat' call are read into a
ByteArrayOutputStream, and either at that point or later in the application, the
error is thrown. This problem was discovered when trying to use Ivy
(http://jayasoft.org/ivy) to download large dependencies.
This patch adds a readStreamWithOverflow() method which reads the stream into a
ByteArrayOutputStream, however if more than 1 meg is read, a temporary file is
created, the byte array is dumped into it, and the rest of the stream is writted
into the temp file. The appropriate InputStream is then returned. This new
method is used by the getContent(SVNUrl...) method instead of the previous
implementation. This patch makes it possible, at least in my testing, to
download any size of file without any memory problems and without any
preceivable performance penalty.
The patch is as follows from the 0.9.13 tag:
Index: CmdLineClientAdapter.java
===================================================================
--- CmdLineClientAdapter.java (revision 1402)
+++ CmdLineClientAdapter.java (working copy)
@@ -55,12 +55,14 @@
package org.tigris.subversion.svnclientadapter.commandline;
import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
+import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
@@ -321,11 +323,7 @@
try {
InputStream content = _cmd.cat(toString(arg0), toString(arg1));
- //read byte-by-byte and put it in a vector.
- //then take the vector and fill a byteArray.
- byte[] byteArray;
- byteArray = streamToByteArray(content, false);
- return new ByteArrayInputStream(byteArray);
+ return readStreamWithOverflow(content);
} catch (IOException e) {
throw SVNClientException.wrapException(e);
} catch (CmdLineException e) {
@@ -1206,4 +1204,60 @@
// TODO Auto-generated method stub
return null;
}
+
+
+ /**
+ * Reads the contents of an InputStream into a byte array, overflowing to
+ * a temporary file if needed. If the bytes read exceeds 1MB, a tempory
+ * file is created and used to store the rest of the bytes.
+ *
+ * @param inStream The original InputStream
+ * @return A ByteArrayInputStream of read bytes, or if overflow occurred,
+ * a FileInputStream reading from the temporary file
+ *
+ */
+ private InputStream readStreamWithOverflow(InputStream inStream) throws
IOException {
+
+ try {
+ ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
+ OutputStream outStream = outBuffer;
+ File overflowFile = null;
+ byte[] buffer = new byte[4096];
+ int total = 0;
+ int overflowCutoff = 1 * 1024 * 1024;
+ int len;
+ while ((len = inStream.read(buffer)) > 0) {
+ total += len;
+ // When too many bytes have been read, shove buffer into
+ // temporary file and write everything else to it
+ if (total > overflowCutoff && overflowFile == null) {
+
+ byte[] b = outBuffer.toByteArray();
+ overflowFile = File.createTempFile("ivy", ".tmp");
+ //System.out.println("Overflow detected, using
"+overflowFile+" file");
+ outStream = new FileOutputStream(overflowFile);
+ outStream.write(b, 0, b.length);
+ outBuffer = null;
+ }
+ outStream.write(buffer, 0, len);
+ }
+
+ if (overflowFile != null) {
+ try {
+ outStream.close();
+ } catch (IOException ex) {
+ // ignore
+ }
+ return new FileInputStream(overflowFile);
+ } else {
+ return new ByteArrayInputStream(outBuffer.toByteArray());
+ }
+ }
+ finally {
+ if (inStream != null) {
+ inStream.close();
+ }
+ }
+ }
+
}
Metadata Imported from Tigris (Issue 321)
Creation Date: 2005-06-21 12:22:59
Reporter: mrdon
Subcomponent: svnClientAdapter
Version: current
Milestone: 1.0.0
Keywords:
Cc: conor, dlr, markphip
Comments
2005-07-02 21:07:00 by dlr
Adding myself to the CC list.
2005-07-02 21:52:40 by dlr
This issue applies not only to cat, but to diff and propget as well.
2005-07-02 22:32:57 by dlr
That data is read out of a java.lang.Process object byte-by-byte into a Vector
in CmdLineClientAdapter.streamToByteArray() is rather inefficient. Streaming
the data to disk when it exceeds a certain threshold is not a bad idea (though
the suggested 1 MB is too large a buffer), but I wonder if we could get away
with instead returning the value of Process.getInputStream()?
The JavaDoc of Process has a little to say about that:
http://java.sun.com/j2se/1.3/docs/api/java/lang/Process.html
"All its standard io (i.e. stdin, stdout, stderr) operations will be redirected
to the parent process through three streams (Process.getOutputStream(),
Process.getInputStream(), Process.getErrorStream()). The parent process uses
these streams to feed input to and get output from the subprocess. Because some
native platforms only provide limited buffer size for standard input and output
streams, failure to promptly write the input stream or read the output stream of
the subprocess may cause the subprocess to block, and even deadlock."
Clearly, that last sentence could theoretically pose a problem. However, I
wonder how often such a thing would come up in practice? The InputStream is
likely to be read immediately by callers, thus avoiding a deadlock or OS buffer
overrun. This might be further obviated by wrapping the returned InputStream in
a BufferedInputStream.
2005-07-02 22:57:54 by dlr
The suggested approach of queuing up data exceeding a threshold in a temporary
file only works okay so long as you actually have permission to write to the
local disk, and have possibly up to 2x the space available for the data read off
of the InputStream on disk. Furthermore, it's much slower since it's possibly
involving disk (hardware) I/O, instead of attempting to use only memory. The
provided patch is a good example of how to take this approach, but also includes
some commented-out debug statements, and uses the text "ivy" when creating the
temporary file. Use of Commons I/O (which would be a new dependency) and/or
Lang (an existing dependency) might obviate the need for some of this patch's code.
2005-07-03 00:46:36 by mrdon
Right, the key issue is if there is a need to immediately retrieve the contents of the Process's InputStream
or it can be returned wrapped in a buffer. I currently use this patch to download files up to 250 MB, but
have not tested returning a buffered input stream directly. As you pointed out, these seems to be an issue
in other places as well. I was talking with Conor MacNeill from Apache Ant at Javaone, and he said he was
having similar issues with the JNI-based svn client. In his case, the fix involved modifying API's which
thankfully isn't the case here. I certainly appreciate anything you can do to resolve this issue for future
revisions.
2005-07-14 10:55:28 by dlr
Thanks Don.
We've been continuing this discussion on the dev list. Here's a link to the
email thread:
http://subclipse.tigris.org/servlets/BrowseList?list=dev&by=thread&from=342908
Related JavaHL issue, including a scalability patch by Conor MacNeill:
http://subversion.tigris.org/issues/show_bug.cgi?id=2299
2005-08-03 17:42:53 by dlr
SVN 2299 has been committed to SVN's trunk, and is expected to show up in a
release no later than Subversion 1.3.0.
The text was updated successfully, but these errors were encountered:
Description
Metadata Imported from Tigris (Issue 321)
Comments
2005-07-02 21:07:00 by dlr
2005-07-02 21:52:40 by dlr
2005-07-02 22:32:57 by dlr
2005-07-02 22:57:54 by dlr
2005-07-03 00:46:36 by mrdon
2005-07-14 10:55:28 by dlr
2005-08-03 17:42:53 by dlr
The text was updated successfully, but these errors were encountered: