prashan_pul #1
base: master
Commits on Jan 13, 2014
- `97cd27e`
- `27311b1`
- `8038da2` Merge pull request #2 from jegonzal/GraphXCCIssue: Improving documentation and identifying potential bug in CC calculation.
- `30328c3` Updated JavaStreamingContext to make scaladoc compile.
  `sbt/sbt doc` used to fail; this fixed it.
Commits on Jan 14, 2014
- `e2d25d2`
- `01c0d72` Merge pull request #410 from rxin/scaladoc1: Updated JavaStreamingContext to make scaladoc compile. `sbt/sbt doc` used to fail; this fixed it.
- `dc041cd`
- `c0bb38e`
- `1bd5cef`
- `ae4b75d` Add EdgeDirection.Either and use it to fix CC bug.
  The bug was due to a misunderstanding of the activeSetOpt parameter to Graph.mapReduceTriplets. Passing EdgeDirection.Both causes mapReduceTriplets to run only on edges with *both* vertices in the active set. This commit adds EdgeDirection.Either, which causes mapReduceTriplets to run on edges with *either* vertex in the active set. This is what connected components needed.
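For readers following along, here is a minimal sketch of connected components written against the Pregel interface with the new direction. It assumes 2014-era GraphX (using the later `VertexId` spelling) and is illustrative rather than the exact patch:

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Sketch: label each vertex with its own id, then propagate the minimum id.
// activeDirection = EdgeDirection.Either makes the send phase run on any edge
// with *either* endpoint in the active set -- the behavior the fix introduces.
def connectedComponents[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED]): Graph[VertexId, ED] = {
  val ccGraph = graph.mapVertices { case (vid, _) => vid }
  def sendMessage(edge: EdgeTriplet[VertexId, ED]) = {
    if (edge.srcAttr < edge.dstAttr) Iterator((edge.dstId, edge.srcAttr))
    else if (edge.srcAttr > edge.dstAttr) Iterator((edge.srcId, edge.dstAttr))
    else Iterator.empty
  }
  Pregel(ccGraph, Long.MaxValue, activeDirection = EdgeDirection.Either)(
    vprog = (_, attr, msg) => math.min(attr, msg),
    sendMsg = sendMessage,
    mergeMsg = (a, b) => math.min(a, b))
}
```

With `EdgeDirection.Both`, an edge whose newly-updated endpoint faces a quiescent endpoint would never fire, so minimum labels could stop propagating before convergence.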
- `cfe4a29` Improvements in example code for the programming guide, as well as adding serialization support for GraphImpl to address issues with failed closure capture.
- `1233b3d`
- `02a8f54`
- `a4e12af` Merge branch 'graphx' of github.com:ankurdave/incubator-spark into graphx.
  Conflicts: graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala
- `87f335d`
- `ae06d2c`
- `1dce9ce`
- `79a5ba3`
- `161ab93`
- `622b7f7`
- `552de5d`
- `4c22c55`
- `8e5c732`
- `9317286`
- `0b18bfb`
- `0fbc0b0`
- `d4cd5de`
- `ee8931d`
- `9e84e70` Add default value for HadoopRDD's `cloneRecords` constructor arg, to maintain backwards compatibility.
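The compatibility trick here is plain Scala default parameters. A self-contained miniature (class name and paths hypothetical, not Spark's actual HadoopRDD signature):

```scala
// Giving the new constructor parameter a default means pre-existing call
// sites that omit it keep compiling and keep their old behavior.
class RecordReader(val path: String, val cloneRecords: Boolean = true)

object CompatDemo extends App {
  val legacy = new RecordReader("hdfs:///data/part-00000")         // old call site, unchanged
  val optOut = new RecordReader("hdfs:///data/part-00000", false)  // new call site
  println(s"${legacy.cloneRecords} ${optOut.cloneRecords}")        // true false
}
```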
- `a2fee38` Merge pull request #411 from tdas/filestream-fix: Improved logic of finding new files in FileInputDStream.
  Earlier, if HDFS had a hiccup and reported the existence of a new file (mod time T sec) at time T + 1 sec, fileStream could have missed that file. With this change, it should be able to find files that are delayed by up to <batch size> seconds; that is, even if a file is reported at T + <batch time> sec, the file stream should be able to catch it. The new logic, at a high level, is as follows: it keeps track of the new files it found in the previous interval and the mod time of the oldest of those files (let's call it X). In the current interval, it ignores files that were seen in the previous interval and files whose mod time is older than X. So if a new file is reported by HDFS in the current interval but has a mod time within the previous interval, it will be considered; if its mod time is earlier than the previous interval (that is, earlier than X), it will be ignored. This is the current limitation, and a future version will improve this behavior. Also reduced line lengths in DStream to <=100 chars.
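The selection rule described above reduces to a small predicate. A hedged sketch (names illustrative, not the actual FileInputDStream internals):

```scala
// Accept a file iff (1) its mod time is not older than the oldest mod time
// seen among the previous interval's files ("X" above), and (2) it was not
// already selected in the previous interval.
def shouldSelect(path: String, modTime: Long,
                 prevIntervalFiles: Set[String], oldestPrevModTime: Long): Boolean =
  modTime >= oldestPrevModTime && !prevIntervalFiles.contains(path)
```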
- `33022d6`
- `b07bc02` Merge pull request #412 from harveyfeng/master: Add default value for HadoopRDD's `cloneRecords` constructor arg.
  Small mend to https://github.com/apache/incubator-spark/pull/359/files#diff-1 for backwards compatibility.
- `cc93c2a`
- `8399341`
- `d4d9ece`
- `84d6af8`
- `c6023be` Fix infinite loop in GraphGenerators.generateRandomEdges.
  The loop occurred when numEdges < numVertices. This commit fixes it by allowing generateRandomEdges to generate a multigraph.
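A sketch of why allowing a multigraph removes the loop: sampling destination vertices *with replacement* never has to wait for `numEdges` distinct neighbors. The signature below is approximated from the description, not copied from GraphGenerators:

```scala
import scala.util.Random
import org.apache.spark.graphx.Edge

// With replacement, repeated (src, dst) pairs are allowed, so generating
// numEdges edges always terminates even when numEdges < numVertices.
def randomEdges(src: Long, numEdges: Int, maxVertexId: Int,
                rng: Random = new Random): Array[Edge[Int]] =
  Array.fill(numEdges)(Edge(src, rng.nextInt(maxVertexId).toLong, 1))
```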
- `59e4384`
- `c28e5a0`
- `e14a14b`
- `67795db`
- `6f6f8c9`
- `c6dbfd1`
- `76ebdae`
- `08b9fec` Merge pull request #409 from tdas/unpersist: Automatically unpersisting RDDs that have been cleaned up from DStreams.
  Earlier, RDDs generated by DStreams were forgotten but not unpersisted; the system relied on the natural BlockManager LRU to drop the data. The cleaner.ttl was a hammer for cleaning up RDDs, but it has to be set separately and very conservatively (at best, a few minutes). Automatic unpersisting lets the system handle this itself, which reduces memory usage. As a side effect it also improves GC performance, as fewer objects are stored in memory. In fact, for some workloads, it may allow RDDs to be cached as deserialized, which speeds up processing without too much GC overhead. This is disabled by default; to enable it, set the configuration spark.streaming.unpersist to true (a sketch follows this entry). A future release will set it to true by default. Also reduced the sleep time in TaskSchedulerImpl.stop() from 5 seconds to 1 second; from my conversation with Matei, there does not seem to be any good reason for the sleep (for letting messages be sent out) to be so long.
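As promised above, a minimal sketch of opting in, assuming the 0.9-era `SparkConf`-based constructors:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// spark.streaming.unpersist is off by default in this release; setting it to
// true lets Spark Streaming unpersist RDDs it has dropped from its DStreams.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("UnpersistDemo")
  .set("spark.streaming.unpersist", "true")
val ssc = new StreamingContext(conf, Seconds(1))
```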
- `2cd9358`
- `af645be`
- `0ca0d4d` Merge pull request #401 from andrewor14/master: External sorting - add number of bytes spilled to Web UI.
  Additionally, update the test suite for external sorting to induce spilling.
- `0d94d74`
- `12386b3` Since getLong() and getInt() have side effects, restore their parentheses, and remove an empty line.
- `68641bc` Merge pull request #413 from rxin/scaladoc: Adjusted visibility of various components and documentation for the 0.9.0 release.
- `4bafc4f`
- `945fe7a` Merge pull request #408 from pwendell/external-serializers: Improvements to external sorting.
  1. Adds the option of compressing outputs.
  2. Adds batching to the serialization to prevent OOM on the read side.
  3. Slight renaming of config options.
  4. Use Spark's buffer size for reads in addition to writes.
- `80e73ed`
- `4a805af` Merge pull request #367 from ankurdave/graphx: GraphX: Unifying Graphs and Tables.
  GraphX extends Spark's distributed fault-tolerant collections API and interactive console with a new graph API which leverages recent advances in graph systems (e.g., [GraphLab](http://graphlab.org)) to enable users to easily and interactively build, transform, and reason about graph-structured data at scale. See http://amplab.github.io/graphx/. Thanks to @jegonzal, @rxin, @ankurdave, @dcrankshaw, @jianpingjwang, @amatsukawa, @kellrott, and @adamnovak. Tasks left:
  - [x] Graph-level uncache
  - [x] Uncache previous iterations in Pregel
  - [x] ~~Uncache previous iterations in GraphLab~~ (postponed to post-release)
  - [x] Describe GC issue with GraphLab
  - [ ] Write `docs/graphx-programming-guide.md`
  - [x] Mention future Bagel support in docs
  - [ ] Section on caching/uncaching in docs: as with Spark, cache something that is used more than once. In an iterative algorithm, try to cache and force (i.e., materialize) something every iteration, then uncache the cached things that depended on the newly materialized RDD but that won't be referenced again (see the sketch after this entry).
  - [x] Undo modifications to core collections and instead copy them to org.apache.spark.graphx
  - [x] Make Graph serializable to work around capture in Spark shell
  - [x] Rename graph -> graphx in package name and subproject
  - [x] Remove standalone PageRank
  - [x] ~~Fix amplab/graphx#52 by checking `iter.hasNext`~~
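The caching/uncaching item in the task list can be made concrete. A minimal sketch of the per-iteration discipline, where `step` stands in for one iteration of any graph algorithm (names illustrative):

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx.Graph

def iterate[VD: ClassTag, ED: ClassTag](input: Graph[VD, ED], numIters: Int)(
    step: Graph[VD, ED] => Graph[VD, ED]): Graph[VD, ED] = {
  var g = input.cache()
  for (_ <- 1 to numIters) {
    val next = step(g).cache()                // cache the new iterate...
    next.vertices.count(); next.edges.count() // ...and force (materialize) it
    g.unpersistVertices(blocking = false)     // drop what won't be referenced again
    g = next
  }
  g
}
```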
- `c2852cf`
- `fdaabdc` Merge pull request #380 from mateiz/py-bayes: Add Naive Bayes to Python MLlib, and some API fixes.
  - Added a Python wrapper for Naive Bayes.
  - Updated the Scala Naive Bayes to better match the style of our other algorithms, and in particular to make it easier to call from Java (added builder pattern, removed default value in the train method).
  - Updated Python MLlib functions to not require a SparkContext; we can get that from the RDD the user gives.
  - Added a toString method in LabeledPoint.
  - Made the Python MLlib tests run as part of run-tests as well (before, they could only be run individually through each file).
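A hedged sketch of the resulting Scala-side call style, assuming the 0.9-era MLlib API where features were plain `Array[Double]` (file path and format illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.regression.LabeledPoint

object NaiveBayesExample extends App {
  val sc = new SparkContext("local", "NaiveBayesExample")
  val data = sc.textFile("data/naive_bayes.csv").map { line =>
    val values = line.split(',').map(_.toDouble)
    LabeledPoint(values.head, values.tail) // first column is the label
  }
  val model = NaiveBayes.train(data, lambda = 1.0) // smoothing passed explicitly
  println(model.predict(Array(0.0, 1.0, 0.0)))
  sc.stop()
}
```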
- `4e497db` Removed StreamingContext.registerInputStream and registerOutputStream; they were useless, as InputDStream has been made to register itself. Also made DStream.register() private[streaming], since it is not useful to expose the confusing function. Updated a lot of documentation.
- `0984647`
- `055be5c` Merge pull request #415 from pwendell/shuffle-compress: Enable compression by default for spills.
- `a3da468`
- `845e568`
- `f8e239e` Merge remote-tracking branch 'apache/master' into filestream-fix.
  Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
- `f8bd828`
- `980250b` Merge pull request #416 from tdas/filestream-fix: Removed unnecessary DStream operations and updated docs.
  Removed StreamingContext.registerInputStream and registerOutputStream; they were useless. InputDStream has been made to register itself, and just registering a DStream as an output stream causes RDD objects to be created but never computed. Also made DStream.register() private[streaming] for the same reasons. Updated docs, especially adding package documentation for the streaming package. Also changed NetworkWordCount's input storage level to MEMORY_ONLY, since replication on the local machine causes warning messages (as replication fails), which is scary for a new user trying out his or her first example.
- `1442cd5` Modifications as suggested in PR feedback:
  - more variants of mapPartitions added to JavaRDDLike
  - move setGenerator to JavaRDDLike
  - clean up
  Saurabh Rawat committed Jan 14, 2014
- `2303479`
- `fa75e5e` Merge pull request #420 from pwendell/header-files: Add missing header files.
- `57fcfc7`
- `486f37c`
- `3fcc68b` Merge pull request #423 from jegonzal/GraphXProgrammingGuide: Improving the graphx-programming-guide.
  This PR tracks a few minor improvements to the content and formatting of the graphx-programming-guide.
- `0bba773`
- `71b3007` Broadcast variable visibility change & doc update.
  Note that previously the Broadcast class was accidentally marked private[spark]. It needs to be public for broadcast variables to work. Also exposing the broadcast variable id.
- `6a12b9e`
- `f8c12e9`
- `55db774`
- `1b5623f` Maintain Serializable API compatibility by reverting back to java.io.Serializable for Broadcast and Accumulator.
- `f12e506`
- `6f965a4`
- `938e4a0`
- `b683608`
- `5b3a3e2`
- `2ce23a5` Merge pull request #425 from rxin/scaladoc: API doc update & make Broadcast public.
  In #413 Broadcast was mistakenly made private[spark]. I changed it back to public. Also exposing id in public, given the R frontend requires that. Copied some of the documentation from the programming guide to the API doc for Broadcast and Accumulator. This should be cherry-picked into branch-0.9 as well for the 0.9.0 release.
- `8ea2cd5`
- `b1b22b7`
- `8ea056d`
- `d601a76` Merge pull request #427 from pwendell/deprecate-aggregator: Deprecate rather than remove old combineValuesByKey function.
- `193a075` Merge pull request #429 from ankurdave/graphx-examples-pom.xml: Add GraphX dependency to examples/pom.xml.
- `74b46ac` Merge pull request #428 from pwendell/writeable-objects: Don't clone records for text files.
Commits on Jan 15, 2014
- `1210ec2`
- `ad294db` Merge pull request #431 from ankurdave/graphx-caching-doc: Describe caching and uncaching in the GraphX programming guide.
- `3a386e2` Merge pull request #424 from jegonzal/GraphXProgrammingGuide: Additional edits for clarity in the graphx programming guide.
  Added an overview of the Graph and GraphOps functions and fixed numerous typos.
- `148757e`
- `f4d9019`
- `147a943`
- `dfb1524`
- `1f4718c` Changed SparkConf to not be serializable. Also fixed unit-test log paths in log4j.properties of external modules.
- `0e15bd7`
- `087487e` Merge pull request #434 from rxin/graphxmaven: Fixed SVDPlusPlusSuite in Maven build.
  This should go into 0.9.0 also.
- `139c24e` Merge pull request #435 from tdas/filestream-fix: Fixed the flaky tests by making SparkConf not serializable.
  SparkConf was being serialized with CoGroupedRDD and Aggregator, which somehow caused OptionalJavaException while being deserialized as part of a ShuffleMapTask. SparkConf should not even be serializable (according to a conversation with Matei). This change fixes that. @mateiz @pwendell
- `0aea33d` Expose method and class, so that we can use them from user code (particularly since the checkpoint directory is autogenerated now).
- `3d9e66d` Merge pull request #436 from ankurdave/VertexId-case: Rename VertexID -> VertexId in GraphX.
remove "-XX:+UseCompressedStrings" option
remove "-XX:+UseCompressedStrings" option from tuning guide since jdk7 no longer supports this.
Configuration menu - View commit details
-
Copy full SHA for 263933d - Browse repository at this point
Copy the full SHA 263933dView commit details -
- `cef2af9` Merge pull request #366 from colorant/yarn-dev: More yarn code refactoring.
  Try to retrieve common code in yarn alpha/stable for Client and WorkerRunnable to reduce duplicated code, by putting it into a trait in the common dir and extending it. The same could be done for the remaining files in alpha/stable, but those have much more overlapping code with different API calls here and there within functions, and would need a much closer review; it might also divide functions into overly small pieces and thus might not deserve to be done this way. So just make it work for these two files first.
- `494d3c0` Merge pull request #433 from markhamstra/debFix: Updated Debian packaging.
- `9259d70`
- `00a3f7e`
- `5fecd25` Merge pull request #441 from pwendell/graphx-build: GraphX shouldn't list Spark as provided.
  I noticed this when building an application against GraphX to audit the released artifacts.
- `9e63753` Made some classes private[streaming] and deprecated a method in JavaStreamingContext.
- `2a05403` Merge pull request #443 from tdas/filestream-fix: Made some classes private[streaming] and deprecated a method in JavaStreamingContext.
  The classes `RawTextHelper`, `RawTextSender` and `RateLimitedOutputStream` are not useful in the streaming API; they are not used by the core functionality and were there as support classes for an obscure example. One of them, RawTextSender, has a main function which can be executed using bin/spark-class even if it is made private[streaming]. In the future I will probably remove these classes completely; for the time being, I am just converting them to private[streaming]. Accessing the underlying JavaSparkContext in JavaStreamingContext was through `JavaStreamingContext.sc`; this is deprecated, and the preferred method is `JavaStreamingContext.sparkContext`, to keep it consistent with `StreamingContext.sparkContext`.
- `59f475c` Merge pull request #442 from pwendell/standalone: Workers should use the working directory as spark home if it's not specified.
  If users don't set SPARK_HOME in their environment file when launching an application, the standalone cluster should default to the spark home of the worker.
- `2ffdaef`
- `4f0c361` Merge pull request #444 from mateiz/py-version: Clarify that Python 2.7 is only needed for MLlib.
Commits on Jan 16, 2014
- `a268d63` Fail rather than hanging if a task crashes the JVM.
  Prior to this commit, if a task crashed the JVM, the task (and all other tasks running on that executor) was marked as KILLED rather than FAILED. As a result, the TaskSetManager would retry the task indefinitely rather than failing the job after maxFailures. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the VM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.
- `0675ca5` Merge pull request #439 from CrazyJvm/master: SPARK-1024 Remove "-XX:+UseCompressedStrings" option from tuning guide.
  Removes the "-XX:+UseCompressedStrings" option from the tuning guide, since JDK 7 no longer supports it.
fix "set MASTER automatically fails" bug.
spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell , but there's a problem. The condition is "if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];" we sure will set SPARK_MASTER_IP explicitly, the SPARK_MASTER_PORT option, however, we probably do not set just using spark default port 7077. So if we do not set SPARK_MASTER_PORT, the condition will never be true. We should just use default port if users do not set port explicitly I think.
Configuration menu - View commit details
-
Copy full SHA for 7a0c5b5 - Browse repository at this point
Copy the full SHA 7a0c5b5View commit details -
- `8400536`
- `84595ea` Merge pull request #414 from soulmachine/code-style: Code clean-up for mllib.
  - Removed unnecessary parentheses
  - Removed unused imports
  - Simplified `filter...size()` to `count ...`
  - Removed obsolete parameter comments
- `718a13c`
- `c06a307` Merge pull request #445 from kayousterhout/exec_lost: Fail rather than hanging if a task crashes the JVM.
  Prior to this commit, if a task crashed the JVM, the task (and all other tasks running on that executor) was marked as KILLED rather than FAILED. As a result, the TaskSetManager would retry the task indefinitely rather than failing the job after maxFailures. Eventually this made the job hang, because the standalone scheduler removes the application after 10 workers have failed, and then the app is left in a state where it's disconnected from the master and waiting to reconnect. This commit fixes that problem by marking tasks as FAILED rather than KILLED when an executor is lost. The downside of this commit is that if task A fails because another task running on the same executor caused the VM to crash, the failure will incorrectly be counted as a failure of task A. This should not be an issue because we typically set maxFailures to 3, and it is unlikely that a task will be co-located with a JVM-crashing task multiple times.
- `4e510b0` Fixed Windows spark shell launch script error.
  JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
- `1a0da89`
- `edd82c5`
- `11e6534` Updated Java API docs for streaming, along with very minor changes in the code examples.
Commits on Jan 17, 2014
- `fcb4fc6`
- `d4fd89e` Merge pull request #438 from ScrapCodes/clone-records-java-api: Clone records Java API.
- `d749d47` Merge pull request #451 from Qiuzhuang/master: Fixed Windows spark shell launch script error.
  JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
- `b690e11`
- `d28bf41`
- `cb13b15`
- `eb2d8c4`
- `dbec69b`
- `c9b4845`
- `5c639d7`
- `4e96757`
- `caf97a2`
- `fa32998`
- `85b95d0`
Commits on Jan 18, 2014
- `e91ad3f`
- `5316bca`
- `aa981e4` Merge pull request #461 from pwendell/master: Use renamed shuffle spill config in CoGroupedRDD.scala.
  This one got missed when it was renamed.
- `fd833e7` Allow files added through SparkContext.addFile() to be overwritten.
  This is useful for cases when a file needs to be refreshed and re-downloaded by the executors periodically. Signed-off-by: Yinan Li <[email protected]>
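A hedged sketch of the intended use, assuming the overwrite behavior is gated by a `spark.files.overwrite` property as in this era's configuration docs (paths illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("RefreshableFileDemo")
  .set("spark.files.overwrite", "true") // allow a re-added file to replace the old copy

val sc = new SparkContext(conf)
sc.addFile("hdfs:///config/lookup-table.txt") // re-adding the same name later refreshes it
```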
Commits on Jan 19, 2014
- `bf56995` Merge pull request #462 from mateiz/conf-file-fix: Remove Typesafe Config usage and conf files to fix nested property names.
  With Typesafe Config we had the subtle problem of no longer allowing nested property names, which are used for a few of our properties: http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html This PR is for branch 0.9 but should be added into master too. (cherry picked from commit 34e911c) Signed-off-by: Patrick Wendell <[email protected]>
- `4c16f79` Merge pull request #426 from mateiz/py-ml-tests: Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+).
  We disabled these earlier because Jenkins didn't have these versions.
- `73dfd42` Merge pull request #437 from mridulm/master: Minor API usability changes.
  - Expose checkpoint directory, since it is autogenerated now
  - Null check for jars
  - Expose SparkHadoopUtil, so that configuration creation is abstracted even from user code, to avoid duplicating functionality already in Spark
- `fe8a354` Merge pull request #459 from srowen/UpdaterL2Regularization: Correct L2 regularized weight update with canonical form.
  Per a thread on the user@ mailing list, and comments from Ameet, I believe the weight update for L2 regularization needs to be corrected (see the note after this entry). See http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
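For reference, the canonical L2-regularized gradient step the correction targets, in standard notation with step size $\alpha_t$ and regularization strength $\lambda$; this is textbook form, not a quote from the patch:

```latex
w_{t+1} = w_t - \alpha_t \bigl( \nabla L(w_t) + \lambda w_t \bigr)
        = (1 - \alpha_t \lambda)\, w_t - \alpha_t \nabla L(w_t)
```

That is, the regularization term shrinks the current weights multiplicatively before the loss gradient is applied.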
- `584323c` Addressed comments from Reynold.
  Signed-off-by: Yinan Li <[email protected]>
- `720836a`
- `ceb79a3`
- `dd56b21`
- `256a355` Merge pull request #458 from tdas/docs-update: Updated Java API docs for streaming, along with very minor changes in the code examples.
  Docs updated for Scala (StreamingContext, DStream, PairDStreamFunctions) and Java (JavaStreamingContext, JavaDStream, JavaPairDStream). Examples updated: JavaQueueStream no longer uses a deprecated method; ActorWordCount uses the public interface the right way.
- `792d908` Merge pull request #470 from tgravescs/fix_spark_examples_yarn: Only log an error on a missing jar so the Spark examples can run on YARN.
  Right now, to run the Spark examples on YARN you have to use the --addJars option and put the jar in HDFS. To make that nicer, so the user doesn't have to specify the --addJars option, change it to simply log an error instead of throwing.
Commits on Jan 20, 2014
- `f9a95d6`
change TestClient & Worker to Some("xxx") kill manager if it is started remove unnecessary .get when fetch "SPARK_HOME" values
Configuration menu - View commit details
-
Copy full SHA for 29f4b6a - Browse repository at this point
Copy the full SHA 29f4b6aView commit details -
- `3e85b87`
Commits on Jan 21, 2014
- `cdb003e`
- `54867e9`
- `1b29914`
- `c324ac1`
- `f84400e`
- `de526ad`
- `d46df96`
- `2e95174`
- `e437069` Restricting /lib to top-level directory in .gitignore.
  This patch was proposed by Sean Mackrory.
- `e0b741d`
- `7373ffb` Merge pull request #483 from pwendell/gitignore: Restricting /lib to top-level directory in .gitignore.
  This patch was proposed by Sean Mackrory.
- `0367981` Merge pull request #482 from tdas/streaming-example-fix: Added StreamingContext.awaitTermination to streaming examples.
  StreamingContext.start() currently starts a non-daemon thread which prevents termination of a Spark Streaming program even if the main function has exited. Since the expected behavior of a streaming program is to run until explicitly killed, this was sort of fine when Spark Streaming applications were launched from the command line. However, when launched in Yarn-standalone mode, this did not work, as the driver effectively got terminated when the main function exited. So Spark Streaming examples did not work on Yarn. This addition to the examples ensures that they work on Yarn, and also teaches everyone that StreamingContext.awaitTermination() is necessary for Spark Streaming programs to keep running. The true bug fix, making sure all threads started by Spark Streaming are daemon threads, is left for post-0.9.
- `6b4eed7` Merge pull request #449 from CrazyJvm/master: SPARK-1028: fix "set MASTER automatically fails" bug.
  spark-shell intends to set MASTER automatically if we do not provide the option when we start the shell, but there's a problem. The condition is `if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];`. We will surely set SPARK_MASTER_IP explicitly, but we probably do not set SPARK_MASTER_PORT, instead just using Spark's default port 7077. So if we do not set SPARK_MASTER_PORT, the condition is never true. I think we should just use the default port if users do not set the port explicitly.
- `a917a87`
- `65869f8`
- `c67d3d8` Merge pull request #484 from tdas/run-example-fix: Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.
  The bin/run-example script was not passing Java properties set through SPARK_JAVA_OPTS to the example. This is important for examples like Twitter**, as the Twitter authentication information must be set through Java properties. Hence added the same JAVA_OPTS code to run-example as is in the bin/spark-class script. Also added SPARK_MEM, in case someone wants to run the example with a different amount of memory. This can be removed if it is not in tune with the intended semantics of the run-example scripts. @matei Please check this soon; I want this to go in 0.9-rc4.
- `a9bcc98`
- `77b986f` Merge pull request #480 from pwendell/0.9-fixes: Handful of 0.9 fixes.
  This patch addresses a few fixes for Spark 0.9.0 based on the last release candidate. @mridulm gets credit for reporting most of the issues here. Many of the fixes here are based on his work in #477 and follow-up discussion with him.
- `adf4261` Incorporate Tom's comments: update doc and code to reflect that core requests may not always be honored.
- `3a067b4`
- `f854498` Merge pull request #469 from ajtulloch/use-local-spark-context-in-tests-for-mllib: [MLlib] Use a LocalSparkContext trait in test suites.
  Replaces the 9 instances of

  ```scala
  class XXXSuite extends FunSuite with BeforeAndAfterAll {
    @transient private var sc: SparkContext = _

    override def beforeAll() {
      sc = new SparkContext("local", "test")
    }

    override def afterAll() {
      sc.stop()
      System.clearProperty("spark.driver.port")
    }
  ```

  with

  ```scala
  class XXXSuite extends FunSuite with LocalSparkContext {
  ```
- `069bb94` Clarify spark.default.parallelism.
  It's the task count across the cluster, not per worker, per machine, per core, or anything else.
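To make the clarified semantics concrete, a small sketch: the value is the cluster-wide default partition count, not a per-worker setting (values illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[4]")
  .setAppName("ParallelismDemo")
  .set("spark.default.parallelism", "200") // total default tasks across the cluster

val sc = new SparkContext(conf)
println(sc.defaultParallelism) // 200: the default for shuffles and ops like groupByKey
```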
- `749f842` Merge pull request #489 from ash211/patch-6: Clarify spark.default.parallelism.
  It's the task count across the cluster, not per worker, per machine, per core, or anything else.
Commits on Jan 22, 2014
- `90ea9d5` Replace the code that checks for Option != None with an Option.isDefined call in Scala code.
  This hopefully will make the code cleaner.
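The cleanup in miniature, as a runnable sketch:

```scala
object OptionStyle extends App {
  val maybePort: Option[Int] = Some(7077)

  if (maybePort != None) println(s"before: ${maybePort.get}")   // works, but indirect
  if (maybePort.isDefined) println(s"after: ${maybePort.get}")  // states the intent
}
```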
- `36f9a64`
- `19da82c` Fixed bug where task set managers are added to queue twice.
  This bug leads to a small performance hit, because task set managers will get offered each rejected resource offer twice, but it doesn't lead to any incorrect functionality.
- `d009b17` Merge pull request #315 from rezazadeh/sparsesvd: Sparse SVD.

  # Singular Value Decomposition
  Given an *m x n* matrix *A*, compute matrices *U, S, V* such that *A = U * S * V^T*. There is no restriction on m, but we require n^2 doubles to fit in memory. Further, n should be less than m. The decomposition is computed by first computing *A^T A = V S^2 V^T*, computing the SVD locally on that (since *n x n* is small), from which we recover *S* and *V*. Then we compute *U* via easy matrix multiplication as *U = A * V * S^-1*. Only the singular vectors associated with the largest k singular values are returned. If there are k such values, then the dimensions of the return will be:
  * *S* is *k x k* and diagonal, holding the singular values on the diagonal.
  * *U* is *m x k* and satisfies *U^T U = eye(k)*.
  * *V* is *n x k* and satisfies *V^T V = eye(k)*.
  All input and output is expected in sparse matrix format, 0-indexed as tuples of the form ((i,j),value), all in RDDs.

  # Testing
  Tests included. They test:
  - Decomposition promise (A = USV^T)
  - For small matrices, output is compared to that of jblas
  - Rank-1 matrix test included
  - Full-rank matrix test included
  - Middle-rank matrix forced via k included

  # Example Usage

      import org.apache.spark.SparkContext
      import org.apache.spark.mllib.linalg.SVD
      import org.apache.spark.mllib.linalg.SparseMatrix
      import org.apache.spark.mllib.linalg.MatrixEntry

      // Load and parse the data file
      val data = sc.textFile("mllib/data/als/test.data").map { line =>
        val parts = line.split(',')
        MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble)
      }
      val m = 4
      val n = 4

      // recover top 1 singular vector
      val decomposed = SVD.sparseSVD(SparseMatrix(data, m, n), 1)

      println("singular values = " + decomposed.S.data.toArray.mkString)

  # Documentation
  Added to docs/mllib-guide.md
- `5bcfd79` Merge pull request #493 from kayousterhout/double_add: Fixed bug where task set managers are added to queue twice.
  @mateiz can you verify that this is a bug and wasn't intentional? (https://github.com/apache/incubator-spark/commit/90a04dab8d9a2a9a372cea7cdf46cc0fd0f2f76c#diff-7fa4f84a961750c374f2120ca70e96edR551) This bug leads to a small performance hit, because task set managers will get offered each rejected resource offer twice, but it doesn't lead to any incorrect functionality. Thanks to @hdc1112 for pointing this out.
- `576c4a4` Merge pull request #478 from sryza/sandy-spark-1033: SPARK-1033. Ask for cores in Yarn container requests.
  Tested on a pseudo-distributed cluster against the Fair Scheduler and observed a worker taking more than a single core.
- `fd0c5b8` Depend on Commons Math explicitly instead of accidentally getting it from Hadoop (which stops working in 2.2.x), and also use the newer commons-math3.
- `a1238bb` Merge pull request #492 from skicavs/master: Fixed job name and usage information for the JavaSparkPi example.
- `4476398`
- `3184fac` Merge pull request #495 from srowen/GraphXCommonsMathDependency: Fix graphx Commons Math dependency.
  `graphx` depends on Commons Math (2.x) in `SVDPlusPlus.scala`, but the module doesn't declare this dependency. It happens to work because it is included by Hadoop artifacts; however, that stopped being true as of a month or so ago, and building against recent Hadoop would fail. (That's how we noticed.) The simple fix is to declare the dependency, as it should be. It's also worth noting that `commons-math` is the old-ish 2.x line, while `commons-math3` is where the newer 3.x releases are: a drop-in replacement, but a different artifact and package name. Changing this one usage to `commons-math3` works, tests pass, and it isn't surprising that it does, so it is probably also worth changing. (A comment in some test code also references `commons-math3`, FWIW.) It does raise another question, though: `mllib` looks like it uses the `jblas` `DoubleMatrix` for general-purpose vector/matrix work. Should `graphx` really use Commons Math for this? Beyond the tiny scope here, but worth asking.
Commits on Jan 23, 2014
- `2b3c461`
- `6285513` Fix bug in worker clean-up in UI.
  Introduced in d5a96fe. This should be picked into 0.8 and 0.9 as well.
- `034dce2` Merge pull request #447 from CodingCat/SPARK-1027: fix for SPARK-1027 (https://spark-project.atlassian.net/browse/SPARK-1027).
  Fixes:
  1. change sparkHome from String to Option[String] in ApplicationDesc
  2. remove the sparkHome parameter in the LaunchExecutor message
  3. adjust involved files
- `a1cd185`
- `cc0fd33`
- `a5a513e`
- `19a01c1`
fixed ClassTag in mapPartitions
eklavya committed Jan 23, 2014
Commit 60e7457
Merge pull request #499 from jianpingjwang/dev1
Replace commons-math with jblas in SVDPlusPlus
Commit a2b47da
Merge pull request #406 from eklavya/master
Extending Java API coverage. Hi, I have added three new methods to JavaRDD. Please review and merge.
Commit fad6aac
Commit 0035dbb
Commit 6156990
Commits on Jan 24, 2014
Remove Hadoop object cloning and warn users making Hadoop RDD's.
The code introduced in #359 used Hadoop's WritableUtils.clone() to duplicate objects when reading from Hadoop files. Some users have reported exceptions when cloning data in various file formats, including Avro and another custom format. This patch removes that functionality to ensure stability for the 0.9 release. Instead, it puts a clear warning in the documentation that copying may be necessary for Hadoop data sets.
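For background, Hadoop RecordReaders reuse a single Writable instance across records, which is why uncopied Hadoop data can turn up duplicated once cached or collected. A minimal sketch of the user-side copying the warning describes, with a hypothetical path and standard text-file types:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical workaround sketch: Hadoop reuses the same Writable objects
// across records, so copy values into fresh, immutable JVM values before
// caching. The path below is illustrative.
def cachedLines(sc: SparkContext): RDD[(Long, String)] =
  sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///tmp/input")
    .map { case (offset, text) => (offset.get, text.toString) } // copy here
    .cache()
```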
Commit 7101017
Fix bug on read-side of external sort when using Snappy.
This case wasn't handled correctly and this patch fixes it.
Commit 0213b40
Commit c58d4ea
Commit f830684
Commit 268ecbd
Merge pull request #501 from JoshRosen/cartesian-rdd-fixes
Fix two bugs in PySpark cartesian(): SPARK-978 and SPARK-1034. This pull request fixes two bugs in PySpark's `cartesian()` method:
- [SPARK-978](https://spark-project.atlassian.net/browse/SPARK-978): PySpark's cartesian method throws a ClassCastException
- [SPARK-1034](https://spark-project.atlassian.net/browse/SPARK-1034): Py4JException on PySpark cartesian result
The JIRAs have more details describing the fixes.
Commit cad3002
Merge pull request #502 from pwendell/clone-1
Remove Hadoop object cloning and warn users making Hadoop RDD's. The code introduced in #359 used Hadoop's WritableUtils.clone() to duplicate objects when reading from Hadoop files. Some users have reported exceptions when cloning data in various file formats, including Avro and another custom format. This patch removes that functionality to ensure stability for the 0.9 release. Instead, it puts a clear warning in the documentation that copying may be necessary for Hadoop data sets.
Commit c319617
Commit ff44732
Merge pull request #503 from pwendell/master
Fix bug on read-side of external sort when using Snappy. This case wasn't handled correctly and this patch fixes it.
Commit 3d6e754
Deprecate mapPartitionsWithSplit in PySpark.
Also, replace the last reference to it in the docs. This fixes SPARK-1026.
Commit 4cebb79
Merge pull request #505 from JoshRosen/SPARK-1026
Deprecate mapPartitionsWithSplit in PySpark (SPARK-1026). This commit deprecates `mapPartitionsWithSplit` in PySpark (see [SPARK-1026](https://spark-project.atlassian.net/browse/SPARK-1026)) and removes the remaining references to it from the docs.
Commit 05be704
Commits on Jan 26, 2014
Increase JUnit test verbosity under SBT.
Upgrade junit-interface plugin from 0.9 to 0.10. I noticed that the JavaAPISuite tests didn't appear to display any output locally or under Jenkins, making it difficult to know whether they were running. This change increases the verbosity to more closely match the ScalaTest tests.
Commit 531d9d7
Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040)
This fixes an issue where collectAsMap() could fail when called on a JavaPairRDD that was derived by transforming a non-JavaPairRDD. The root problem was that we were creating the JavaPairRDD's ClassTag by casting a ClassTag[AnyRef] to a ClassTag[Tuple2[K2, V2]]. To fix this, I cast a ClassTag[Tuple2[_, _]] instead, since this actually produces a ClassTag of the appropriate type because ClassTags don't capture type parameters:

scala> implicitly[ClassTag[Tuple2[_, _]]] == implicitly[ClassTag[Tuple2[Int, Int]]]
res8: Boolean = true

scala> implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[Tuple2[Int, Int]]] == implicitly[ClassTag[Tuple2[Int, Int]]]
res9: Boolean = false
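To make the ClassTag trick above concrete, here is a minimal sketch of the casting pattern (the helper name is illustrative, not the actual Spark code):

```scala
import scala.reflect.ClassTag

// Because ClassTags don't capture type parameters, a ClassTag[Tuple2[_, _]]
// can safely be viewed as a ClassTag[(K, V)] for any K and V, and it still
// reports Tuple2 as the runtime class, unlike a cast from ClassTag[AnyRef].
def pairClassTag[K, V]: ClassTag[(K, V)] =
  implicitly[ClassTag[Tuple2[_, _]]].asInstanceOf[ClassTag[(K, V)]]

// pairClassTag[Int, String].runtimeClass == classOf[Tuple2[_, _]]
```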
Commit 740e865
Merge pull request #511 from JoshRosen/SPARK-1040
Fix ClassCastException in JavaPairRDD.collectAsMap() (SPARK-1040). This fixes [SPARK-1040](https://spark-project.atlassian.net/browse/SPARK-1040), an issue where JavaPairRDD.collectAsMap() could sometimes fail with a ClassCastException. I applied the same fix to the Spark Streaming Java APIs. The commit message describes the fix in more detail. I also increased the verbosity of JUnit test output under SBT to make it easier to verify that the Java tests are actually running.
Commit c66a2ef
Merge pull request #504 from JoshRosen/SPARK-1025
Fix PySpark hang when input files are deleted (SPARK-1025). This pull request addresses [SPARK-1025](https://spark-project.atlassian.net/browse/SPARK-1025), an issue where PySpark could hang if its input files were deleted.
Commit c40619d
Commits on Jan 27, 2014
Commit 6a5af7b
Merge pull request #460 from srowen/RandomInitialALSVectors
Choose initial user/item vectors uniformly on the unit sphere, rather than within the unit square, to possibly avoid bias in the initial state and improve convergence. The current implementation picks the N vector elements uniformly at random from [0,1). This means they all point into one quadrant of the vector space. As N gets just a little large, the vectors tend strongly to point into the "corner", towards (1,1,1,...,1). The vectors are not unit vectors either. I suggest choosing the elements as Gaussian ~ N(0,1) and normalizing. This gets you uniform random choices on the unit sphere, which is more what's of interest here. It has worked a little better for me in the past. This is pretty minor, but I wanted to warm up by suggesting a few tweaks to ALS. Please excuse my Scala; I'm pretty new to it.
Author: Sean Owen <[email protected]>
== Merge branch commits ==
commit 492b13a Author: Sean Owen <[email protected]> Date: Mon Jan 27 08:05:25 2014 +0000 Style: spaces around binary operators
commit ce2b5b5 Author: Sean Owen <[email protected]> Date: Sun Jan 19 22:50:03 2014 +0000 Generate factors with all positive components, per discussion in https://github.com/apache/incubator-spark/pull/460
commit b6f7a8a Author: Sean Owen <[email protected]> Date: Sat Jan 18 15:54:42 2014 +0000 Choose initial user/item vectors uniformly on the unit sphere rather than within the unit square to possibly avoid bias in the initial state and improve convergence
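For illustration, the sampling idea proposed here amounts to the following sketch (not the ALS code itself; note that the merged version further constrains factors to positive components, per the PR discussion):

```scala
import java.util.Random

// Minimal sketch: draw each component i.i.d. from N(0,1) and normalize.
// The resulting direction is uniformly distributed on the unit sphere in R^n.
def randomUnitVector(n: Int, rand: Random): Array[Double] = {
  val v = Array.fill(n)(rand.nextGaussian())
  val norm = math.sqrt(v.map(x => x * x).sum)
  v.map(_ / norm)
}

// Example: a random unit vector in R^10
val u = randomUnitVector(10, new Random(42))
```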
Commit f67ce3e
Merge pull request #490 from hsaputra/modify_checkoption_with_isdefined
Replace the check for None Option with isDefined and isEmpty in Scala code. Propose replacing the Scala check for Option "!= None" with Option.isDefined, and "=== None" with Option.isEmpty. I think using a method call where possible, rather than an operator function plus an argument, will make the Scala code easier to read and understand. Passes compile and tests.
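For illustration, the proposed style change amounts to this (a hypothetical snippet, not taken from the PR):

```scala
val appId: Option[String] = Some("app-20140127")

// Before: comparing an Option against None directly
if (appId != None) println("running")

// After: using the Option API, as this PR proposes
if (appId.isDefined) println("running")
if (appId.isEmpty) println("not started")
```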
Commit f16c21e
Commits on Jan 28, 2014
Merge pull request #516 from sarutak/master
modified SparkPluginBuild.scala to use https protocol for accessing gith... We cannot build Spark behind a proxy even when we execute sbt with the -Dhttp(s).proxyHost, -Dhttp(s).proxyPort, -Dhttp(s).proxyUser, and -Dhttp(s).proxyPassword options. That's because the git protocol is used to clone junit_xml_listener.git. I could build after modifying SparkPluginBuild.scala. I reported this issue to JIRA: https://spark-project.atlassian.net/browse/SPARK-1046
Commit 3d5c03e
Merge pull request #466 from liyinan926/file-overwrite-new
Allow files added through SparkContext.addFile() to be overwritten. This is useful for the cases when a file needs to be refreshed and downloaded by the executors periodically. For example, a possible use case is: the driver periodically renews a Hadoop delegation token and writes it to a token file. The token file needs to be downloaded by the executors whenever it gets renewed. However, the current implementation throws an exception when the target file exists and its contents do not match those of the new source. This PR adds an option to allow files to be overwritten to support use cases similar to the above.
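A sketch of the resulting usage, assuming the option is exposed as the spark.files.overwrite configuration key (my recollection of the key name; treat it as an assumption and check your Spark version's docs):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the token-refresh use case described above. The config key
// "spark.files.overwrite" is an assumption, not confirmed by this PR text.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("token-refresh")
  .set("spark.files.overwrite", "true")
val sc = new SparkContext(conf)

// The driver can re-add the renewed token file; with overwriting enabled,
// executors download the new contents instead of failing on a mismatch.
sc.addFile("/var/run/tokens/hadoop.token")
```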
Commit 84670f2
Commits on Jan 29, 2014
Commit 1381fc7
Commit f8c742c
Merge pull request #497 from tdas/docs-update
Updated Spark Streaming Programming Guide. Here is the updated version of the Spark Streaming Programming Guide. This is still a work in progress, but the major changes are in place, so feedback is most welcome. In general, I have tried to make the guide easier to understand even if the reader does not know much about Spark. The updated website is hosted here: http://www.eecs.berkeley.edu/~tdas/spark_docs/streaming-programming-guide.html
The major changes are:
- Overview illustrates the use cases of Spark Streaming, with its various input and output sources
- An example right after the overview, to quickly give an idea of what a Spark Streaming program looks like
- Made the Java API and examples a first-class citizen like Scala, by using tabs to show both Scala and Java examples (similar to the AMP Camp tutorial's code tabs)
- Highlighted the DStream operations updateStateByKey and transform because of their powerful nature
- Updated the driver node failure recovery text to highlight automatic recovery in Spark standalone mode
- Added information about linking to and using external input sources like Kafka and Flume
- In general, reorganized the sections to better show the Basic section and the more advanced sections like Tuning and Recovery
Todos:
- Links to the docs of external Kafka, Flume, etc.
- Illustrate the window operation with a figure as well as an example
Author: Tathagata Das <[email protected]>
== Merge branch commits ==
commit 18ff105 Author: Tathagata Das <[email protected]> Date: Tue Jan 28 21:49:30 2014 -0800 Fixed a lot of broken links.
commit 34a5a60 Author: Tathagata Das <[email protected]> Date: Tue Jan 28 18:02:28 2014 -0800 Updated github url to use SPARK_GITHUB_URL variable.
commit f338a60 Author: Tathagata Das <[email protected]> Date: Mon Jan 27 22:42:42 2014 -0800 More updates based on Patrick and Harvey's comments.
commit 89a81ff Author: Tathagata Das <[email protected]> Date: Mon Jan 27 13:08:34 2014 -0800 Updated docs based on Patrick's PR comments.
commit d5b6196 Author: Tathagata Das <[email protected]> Date: Sun Jan 26 20:15:58 2014 -0800 Added spark.streaming.unpersist config and info on StreamingListener interface.
commit e3dcb46 Author: Tathagata Das <[email protected]> Date: Sun Jan 26 18:41:12 2014 -0800 Fixed docs on StreamingContext.getOrCreate.
commit 6c29524 Author: Tathagata Das <[email protected]> Date: Thu Jan 23 18:49:39 2014 -0800 Added example and figure for window operations, and links to Kafka and Flume API docs.
commit f06b964 Author: Tathagata Das <[email protected]> Date: Wed Jan 22 22:49:12 2014 -0800 Fixed missing endhighlight tag in the MLlib guide.
commit 036a7d4 Merge: eab351d a1cd185 Author: Tathagata Das <[email protected]> Date: Wed Jan 22 22:17:42 2014 -0800 Merge remote-tracking branch 'apache/master' into docs-update
commit eab351d Author: Tathagata Das <[email protected]> Date: Wed Jan 22 22:17:15 2014 -0800 Update Spark Streaming Programming Guide.
Commit 7930209
Merge pull request #494 from tyro89/worker_registration_issue
Issue with failed worker registrations. I've been going through the Spark source after having some odd issues with workers dying and not coming back. After some digging (I'm very new to Scala and Spark) I believe I've found a worker registration issue. It looks to me like a failed registration follows the same code path as a successful registration, which ends up with workers believing they are connected (since they received a `RegisteredWorker` event) even though they are not registered on the Master. This is a quick fix that I hope addresses this issue (assuming I didn't completely misread the code and I'm about to look like a silly person :P). I'm opening this PR now to start a chat with you guys while I do some more testing on my side :)
Author: Erik Selin <[email protected]>
== Merge branch commits ==
commit 973012f Author: Erik Selin <[email protected]> Date: Tue Jan 28 23:36:12 2014 -0500 break logwarning into two lines to respect line character limit.
commit e3754dc Author: Erik Selin <[email protected]> Date: Tue Jan 28 21:16:21 2014 -0500 add log warning when worker registration fails due to attempt to re-register on same address.
commit 14baca2 Author: Erik Selin <[email protected]> Date: Wed Jan 22 21:23:26 2014 -0500 address code style comment
commit 71c0d7e Author: Erik Selin <[email protected]> Date: Wed Jan 22 16:01:42 2014 -0500 Make a failed registration not persist, not send a `RegisteredWorker` event and not run `schedule`, but rather send a `RegisterWorkerFailed` message to the worker attempting to register.
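A schematic sketch of the fix described above, with all names illustrative rather than taken from the Master code: a duplicate registration should answer with a failure message instead of falling through to the success path.

```scala
// Illustrative sketch only; the real Master actor has far more state.
sealed trait RegistrationResponse
case object RegisteredWorker extends RegistrationResponse
case class RegisterWorkerFailed(reason: String) extends RegistrationResponse

class MasterSketch {
  private val registered = scala.collection.mutable.Set[String]()

  def register(workerAddress: String): RegistrationResponse =
    if (registered.contains(workerAddress)) {
      // Previously the failure case could still look like success to the worker.
      RegisterWorkerFailed("attempt to re-register worker at same address")
    } else {
      registered += workerAddress
      RegisteredWorker
    }
}
```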
Commit 0ff38c2
Commits on Jan 30, 2014
Merge pull request #524 from rxin/doc
Added spark.shuffle.file.buffer.kb to configuration doc.
Author: Reynold Xin <[email protected]>
== Merge branch commits ==
commit 0eea1d7 Author: Reynold Xin <[email protected]> Date: Wed Jan 29 14:40:48 2014 -0800 Added spark.shuffle.file.buffer.kb to configuration doc.
Commit ac712e4
Commits on Feb 1, 2014
Merge pull request #527 from ankurdave/graphx-assembly-pom
Add GraphX to assembly/pom.xml.
Author: Ankur Dave <[email protected]>
== Merge branch commits ==
commit bb0b33e Author: Ankur Dave <[email protected]> Date: Fri Jan 31 15:24:52 2014 -0800 Add GraphX to assembly/pom.xml
Commit a8cf3ec
Commits on Feb 3, 2014
Merge pull request #529 from hsaputra/cleanup_right_arrowop_scala
Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency. It looks like there are some ⇒ Unicode characters (maybe from scalariform) in the Scala code. This PR changes them to => to get some consistency in the Scala code. If we wanted ⇒ as the default, we could use the sbt plugin scalariform to make sure all Scala code has ⇒ instead of =>. Also removed unused imports found in TwitterInputDStream.scala while I was there =)
Author: Henry Saputra <[email protected]>
== Merge branch commits ==
commit 29c1771 Author: Henry Saputra <[email protected]> Date: Sat Feb 1 22:05:16 2014 -0800 Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency.
Commit 0386f42
Merge pull request #530 from aarondav/cleanup. Closes #530.
Remove explicit conversion to PairRDDFunctions in cogroup(). As SparkContext._ is already imported, using the implicit conversion appears to make the code much cleaner. Perhaps there was some sinister reason for doing the conversion explicitly, however.
Author: Aaron Davidson <[email protected]>
== Merge branch commits ==
commit aa4a63f Author: Aaron Davidson <[email protected]> Date: Sun Feb 2 23:48:04 2014 -0800 Remove explicit conversion to PairRDDFunctions in cogroup(). As SparkContext._ is already imported, using the implicit conversion appears to make the code much cleaner. Perhaps there was some sinister reason for doing the conversion explicitly, however.
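For illustration, the cleanup relies on the implicit conversion to PairRDDFunctions that importing SparkContext._ brings into scope (a sketch, not the changed code itself):

```scala
import org.apache.spark.SparkContext._ // brings rddToPairRDDFunctions into scope
import org.apache.spark.rdd.RDD

// Sketch: with the implicit conversion imported, pair-RDD methods such as
// cogroup() are available directly on an RDD[(K, V)], so there is no need
// to wrap it in PairRDDFunctions explicitly.
def cogroupExample(a: RDD[(String, Int)], b: RDD[(String, Int)]) =
  a.cogroup(b) // the implicit conversion supplies PairRDDFunctions.cogroup
```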
Commit 1625d8c
Merge pull request #528 from mengxr/sample. Closes #528.
Refactor RDD sampling and add randomSplit to RDD (update). Replace SampledRDD with PartitionwiseSampledRDD, which accepts a RandomSampler instance as input. The current sampling with and without replacement can be easily integrated via BernoulliSampler and PoissonSampler. The benefits are: 1) RDD.randomSplit is implemented in the same way, related to https://github.com/apache/incubator-spark/pull/513; 2) stratified sampling and importance sampling can be implemented in the same manner as well. Unit tests are included for the samplers and RDD.randomSplit. This should perform better than my previous request, where the BernoulliSampler creates many Iterator instances: https://github.com/apache/incubator-spark/pull/513
Author: Xiangrui Meng <[email protected]>
== Merge branch commits ==
commit e8ce957 Author: Xiangrui Meng <[email protected]> Date: Mon Feb 3 12:21:08 2014 -0800 more docs to PartitionwiseSampledRDD
commit fbb4586 Author: Xiangrui Meng <[email protected]> Date: Mon Feb 3 00:44:23 2014 -0800 move XORShiftRandom to util.random and use it in BernoulliSampler
commit 987456b Author: Xiangrui Meng <[email protected]> Date: Sat Feb 1 11:06:59 2014 -0800 relax assertions in SortingSuite because the RangePartitioner has large variance in this case
commit 3690aae Author: Xiangrui Meng <[email protected]> Date: Sat Feb 1 09:56:28 2014 -0800 test split ratio of RDD.randomSplit
commit 8a410bc Author: Xiangrui Meng <[email protected]> Date: Sat Feb 1 09:25:22 2014 -0800 add a test to ensure seed distribution and minor style update
commit ce7e866 Author: Xiangrui Meng <[email protected]> Date: Fri Jan 31 18:06:22 2014 -0800 minor style change
commit 750912b Author: Xiangrui Meng <[email protected]> Date: Fri Jan 31 18:04:54 2014 -0800 fix some long lines
commit c446a25 Author: Xiangrui Meng <[email protected]> Date: Fri Jan 31 17:59:59 2014 -0800 add complement to BernoulliSampler and minor style changes
commit dbe2bc2 Author: Xiangrui Meng <[email protected]> Date: Fri Jan 31 17:45:08 2014 -0800 switch to partition-wise sampling for better performance
commit a1fca52 Merge: ac712e4 cf6128f Author: Xiangrui Meng <[email protected]> Date: Fri Jan 31 16:33:09 2014 -0800 Merge branch 'sample' of github.com:mengxr/incubator-spark into sample
commit cf6128f Author: Xiangrui Meng <[email protected]> Date: Sun Jan 26 14:40:07 2014 -0800 set SampledRDD deprecated in 1.0
commit f430f84 Author: Xiangrui Meng <[email protected]> Date: Sun Jan 26 14:38:59 2014 -0800 update code style
commit a8b5e20 Author: Xiangrui Meng <[email protected]> Date: Sun Jan 26 12:56:27 2014 -0800 move package random to util.random
commit ab0fa2c Author: Xiangrui Meng <[email protected]> Date: Sun Jan 26 12:50:35 2014 -0800 add Apache headers and update code style
commit 985609f Author: Xiangrui Meng <[email protected]> Date: Sun Jan 26 11:49:25 2014 -0800 add new lines
commit b21bddf Author: Xiangrui Meng <[email protected]> Date: Sun Jan 26 11:46:35 2014 -0800 move samplers to random.IndependentRandomSampler and add tests
commit c02dacb Author: Xiangrui Meng <[email protected]> Date: Sat Jan 25 15:20:24 2014 -0800 add RandomSampler
commit 8ff7ba3 Author: Xiangrui Meng <[email protected]> Date: Fri Jan 24 13:23:22 2014 -0800 init impl of IndependentlySampledRDD
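As a usage illustration of the randomSplit API this adds (a sketch; the weights and seed are arbitrary):

```scala
import org.apache.spark.rdd.RDD

// Sketch: split an RDD into 70%/30% train/test parts with the randomSplit
// API introduced here; a fixed seed makes the split reproducible.
def trainTestSplit(data: RDD[Double]): (RDD[Double], RDD[Double]) = {
  val Array(train, test) = data.randomSplit(Array(0.7, 0.3), seed = 42L)
  (train, test)
}
```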
Commit 23af00f
Commits on Feb 4, 2014
Merge pull request #535 from sslavic/patch-2. Closes #535.
Fixed typo in scaladoc.
Author: Stevo Slavić <[email protected]>
== Merge branch commits ==
commit 0a77f78 Author: Stevo Slavić <[email protected]> Date: Tue Feb 4 15:30:27 2014 +0100 Fixed typo in scaladoc
Commit 0c05cd3
Merge pull request #534 from sslavic/patch-1. Closes #534.
Fixed wrong path to compute-classpath.cmd: compute-classpath.cmd is in the bin directory, not in sbin.
Author: Stevo Slavić <[email protected]>
== Merge branch commits ==
commit 23deca3 Author: Stevo Slavić <[email protected]> Date: Tue Feb 4 15:01:47 2014 +0100 Fixed wrong path to compute-classpath.cmd: compute-classpath.cmd is in bin, not in sbin directory
Commit 9209287
Commits on Feb 5, 2014
Merge pull request #540 from sslavic/patch-3. Closes #540.
Fix line end character stripping for Windows. The LogQuery Spark example would produce unwanted results when run on Windows because of different, platform-specific trailing line end characters (not only \n but \r too). This fix uses Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform-specific details.
Author: Stevo Slavić <[email protected]>
== Merge branch commits ==
commit 1e43ba0 Author: Stevo Slavić <[email protected]> Date: Wed Feb 5 14:48:29 2014 +0100 Fix line end character stripping for Windows: LogQuery Spark example would produce unwanted result when run on Windows platform because of different, platform specific trailing line end characters (not only \n but \r too). This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform specific stuff.
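One standard-library way to do this is shown below (a sketch; whether the example uses exactly this call is not shown here):

```scala
// stripLineEnd removes a trailing "\n" or "\r\n" from a string, so input
// produced on Windows and on Unix normalizes to the same value.
val unixLine = "GET /index.html HTTP/1.1\n"
val windowsLine = "GET /index.html HTTP/1.1\r\n"

assert(unixLine.stripLineEnd == windowsLine.stripLineEnd)
```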
Commit f7fd80d
Merge pull request #544 from kayousterhout/fix_test_warnings. Closes #544.
Fixed warnings in test compilation. This commit fixes two problems: a redundant import and a deprecated function.
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit da9d2e1 Author: Kay Ousterhout <[email protected]> Date: Wed Feb 5 11:41:51 2014 -0800 Fixed warnings in test compilation. This commit fixes two problems: a redundant import, and a deprecated function.
Commit cc14ba9
Commits on Feb 6, 2014
Merge pull request #549 from CodingCat/deadcode_master. Closes #549.
Remove actorToWorker in master.scala, which is actually not used. actorToWorker is not actually used in the code, so just remove it.
Author: CodingCat <[email protected]>
== Merge branch commits ==
commit 52656c2 Author: CodingCat <[email protected]> Date: Thu Feb 6 00:28:26 2014 -0500 remove actorToWorker in master.scala, which is actually not used
Commit 18c4ee7
Merge pull request #526 from tgravescs/yarn_client_stop_am_fix. Closes #526.
Spark on YARN: yarn-client mode doesn't always exit immediately. https://spark-project.atlassian.net/browse/SPARK-1049 If you run in yarn-client mode but you don't get all the workers you requested right away, and then you exit your application, the application master stays around until it gets the number of workers you initially requested. This is a waste of resources. The AM should exit immediately upon the client going away. This fix simply checks whether the driver has closed while it's waiting for the initial number of workers.
Author: Thomas Graves <[email protected]>
== Merge branch commits ==
commit 03f40a6 Author: Thomas Graves <[email protected]> Date: Fri Jan 31 11:23:10 2014 -0600 spark on yarn - yarn-client mode doesn't always exit immediately
Commit 3802096
Merge pull request #545 from kayousterhout/fix_progress. Closes #545.
Fix off-by-one error with task progress info log.
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit 29798fc Author: Kay Ousterhout <[email protected]> Date: Wed Feb 5 13:40:01 2014 -0800 Fix off-by-one error with task progress info log.
Commit 79c9552
Merge pull request #498 from ScrapCodes/python-api. Closes #498.
Python API additions.
Author: Prashant Sharma <[email protected]>
== Merge branch commits ==
commit 8b51591 Author: Prashant Sharma <[email protected]> Date: Fri Jan 24 11:50:29 2014 +0530 Josh's and Patrick's review comments.
commit d37f967 Author: Prashant Sharma <[email protected]> Date: Thu Jan 23 17:27:17 2014 +0530 fixed doc tests
commit 27cb54b Author: Prashant Sharma <[email protected]> Date: Thu Jan 23 16:48:43 2014 +0530 Added keys and values methods for PairFunctions in python
commit 4ce76b3 Author: Prashant Sharma <[email protected]> Date: Thu Jan 23 13:51:26 2014 +0530 Added foreachPartition
commit 05f0534 Author: Prashant Sharma <[email protected]> Date: Thu Jan 23 13:02:59 2014 +0530 Added coalesce function to python API
commit 6568d2c Author: Prashant Sharma <[email protected]> Date: Thu Jan 23 12:52:44 2014 +0530 added repartition function to python API.
Commit 084839b
Merge pull request #554 from sryza/sandy-spark-1056. Closes #554.
SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone.
Author: Sandy Ryza <[email protected]>
== Merge branch commits ==
commit 1f2443d Author: Sandy Ryza <[email protected]> Date: Thu Feb 6 15:03:50 2014 -0800 SPARK-1056. Fix header comment in Executor to not imply that it's only used for Mesos and Standalone
Commit 446403b
Commits on Feb 7, 2014
Merge pull request #321 from kayousterhout/ui_kill_fix. Closes #321.
Inform DAG scheduler about all started/finished tasks. Previously, the DAG scheduler was not always informed when tasks started and finished. The simplest example here is for speculated tasks: the DAGScheduler was only told about the first attempt of a task, meaning that SparkListeners were also not told about multiple task attempts, so users can't see what's going on with speculation in the UI. The DAGScheduler also wasn't always told about finished tasks, so in the UI, some tasks will never be shown as finished (this occurs, for example, if a task set gets killed). The other problem is that the fairness accounting was wrong: the number of running tasks in a pool was decreased when a task set was considered done, even if all of its tasks hadn't yet finished.
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit c8d547d Author: Kay Ousterhout <[email protected]> Date: Wed Jan 15 16:47:33 2014 -0800 Addressed Reynold's review comments. Always use a TaskEndReason (remove the option), and explicitly signal when we don't know the reason. Also, always tell DAGScheduler (and associated listeners) about started tasks, even when they're speculated.
commit 3fee1e2 Author: Kay Ousterhout <[email protected]> Date: Wed Jan 8 22:58:13 2014 -0800 Fixed broken test and improved logging
commit ff12fca Author: Kay Ousterhout <[email protected]> Date: Sun Dec 29 21:08:20 2013 -0800 Inform DAG scheduler about all finished tasks. Previously, the DAG scheduler was not always informed when tasks finished. For example, when a task set was aborted, the DAG scheduler was never told when the tasks in that task set finished. The DAG scheduler was also never told about the completion of speculated tasks. This led to confusion with SparkListeners because information about the completion of those tasks was never passed on to the listeners (so in the UI, for example, some tasks will never be shown as finished). The other problem is that the fairness accounting was wrong -- the number of running tasks in a pool was decreased when a task set was considered done, even if all of its tasks hadn't yet finished.
Commit 18ad59e
Merge pull request #450 from kayousterhout/fetch_failures. Closes #450.
Only run the ResubmitFailedStages event after a fetch fails. Previously, the ResubmitFailedStages event was called every 200 milliseconds, leading to a lot of unnecessary event processing and clogged DAGScheduler logs.
Author: Kay Ousterhout <[email protected]>
== Merge branch commits ==
commit e603784 Author: Kay Ousterhout <[email protected]> Date: Wed Feb 5 11:34:41 2014 -0800 Re-add check for empty set of failed stages
commit d258f0e Author: Kay Ousterhout <[email protected]> Date: Wed Jan 15 23:35:41 2014 -0800 Only run ResubmitFailedStages event after a fetch fails. Previously, the ResubmitFailedStages event was called every 200 milliseconds, leading to a lot of unnecessary event processing and clogged DAGScheduler logs.
Commit 0b448df
Merge pull request #533 from andrewor14/master. Closes #533.
External spilling: generalize batching logic. The existing implementation consists of a hack for Kryo specifically and only works for LZF compression. Introducing an intermediate batch-level stream takes care of pre-fetching and other arbitrary behavior of higher level streams in a more general way.
Author: Andrew Or <[email protected]>
== Merge branch commits ==
commit 3ddeb7e Author: Andrew Or <[email protected]> Date: Wed Feb 5 12:09:32 2014 -0800 Also privatize fields
commit 090544a Author: Andrew Or <[email protected]> Date: Wed Feb 5 10:58:23 2014 -0800 Privatize methods
commit 13920c9 Author: Andrew Or <[email protected]> Date: Tue Feb 4 16:34:15 2014 -0800 Update docs
commit bd5a1d7 Author: Andrew Or <[email protected]> Date: Tue Feb 4 13:44:24 2014 -0800 Typo: phyiscal -> physical
commit 287ef44 Author: Andrew Or <[email protected]> Date: Tue Feb 4 13:38:32 2014 -0800 Avoid reading the entire batch into memory; also simplify streaming logic. Additionally, address formatting comments.
commit 3df7005 Merge: a531d2e 164489d Author: Andrew Or <[email protected]> Date: Mon Feb 3 18:27:49 2014 -0800 Merge branch 'master' of github.com:andrewor14/incubator-spark
commit a531d2e Author: Andrew Or <[email protected]> Date: Mon Feb 3 18:18:04 2014 -0800 Relax assumptions on compressors and serializers when batching. This commit introduces an intermediate layer of an input stream on the batch level. This guards against interference from higher level streams (i.e. compression and deserialization streams), especially pre-fetching, without specifically targeting particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
commit 164489d Author: Andrew Or <[email protected]> Date: Mon Feb 3 18:18:04 2014 -0800 Relax assumptions on compressors and serializers when batching. This commit introduces an intermediate layer of an input stream on the batch level. This guards against interference from higher level streams (i.e. compression and deserialization streams), especially pre-fetching, without specifically targeting particular libraries (Kryo) and forcing shuffle spill compression to use LZF.
Commit 1896c6e
Merge pull request #506 from ash211/intersection. Closes #506.
SPARK-1062: Add rdd.intersection(otherRdd) method.
Author: Andrew Ash <[email protected]>
== Merge branch commits ==
commit 5d9982b Author: Andrew Ash <[email protected]> Date: Thu Feb 6 18:11:45 2014 -0800 Minor fixes: style: (v,null) => (v, null); mention the shuffle in Javadoc
commit b86d02f Author: Andrew Ash <[email protected]> Date: Sun Feb 2 13:17:40 2014 -0800 Overload .intersection() for numPartitions and custom Partitioner
commit bcaa349 Author: Andrew Ash <[email protected]> Date: Sun Feb 2 13:05:40 2014 -0800 Better naming of parameters in intersection's filter
commit b10a6af Author: Andrew Ash <[email protected]> Date: Sat Jan 25 23:06:26 2014 -0800 Follow spark code format conventions of tab => 2 spaces
commit 965256e Author: Andrew Ash <[email protected]> Date: Fri Jan 24 00:28:01 2014 -0800 Add rdd.intersection(otherRdd) method
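A sketch of how such an intersection can be built from existing primitives, in the spirit of this PR (illustrative, not necessarily the merged code):

```scala
import scala.reflect.ClassTag
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// Sketch: cogroup on (value, null) pairs, keep keys that appear on both
// sides, and emit each surviving value once. Incurs a shuffle via cogroup.
def intersectionSketch[T: ClassTag](a: RDD[T], b: RDD[T]): RDD[T] =
  a.map(v => (v, null)).cogroup(b.map(v => (v, null)))
    .filter { case (_, (left, right)) => left.nonEmpty && right.nonEmpty }
    .keys
```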
Commit 3a9d82c
Commits on Feb 8, 2014
Merge pull request #552 from martinjaggi/master. Closes #552.
TeX formulas in the documentation using MathJax, and splitting the MLlib documentation by technique. See JIRA https://spark-project.atlassian.net/browse/MLLIB-19 and https://github.com/shivaram/spark/compare/mathjax
Author: Martin Jaggi <[email protected]>
== Merge branch commits ==
commit 0364bfa Author: Martin Jaggi <[email protected]> Date: Fri Feb 7 03:19:38 2014 +0100 minor polishing, as suggested by @pwendell
commit dcd2142 Author: Martin Jaggi <[email protected]> Date: Thu Feb 6 18:04:26 2014 +0100 enabling inline latex formulas with $.$; same mathjax configuration as used in math.stackexchange.com; sample usage in the linear algebra (SVD) documentation
commit bbafafd Author: Martin Jaggi <[email protected]> Date: Thu Feb 6 17:31:29 2014 +0100 split MLlib documentation by techniques and linked from the main mllib-guide.md site
commit d1c5212 Author: Martin Jaggi <[email protected]> Date: Thu Feb 6 16:59:43 2014 +0100 enable mathjax formula in the .md documentation files; code by @shivaram
commit d73948d Author: Martin Jaggi <[email protected]> Date: Thu Feb 6 16:57:23 2014 +0100 minor update on how to compile the documentation
Commit fabf174
Merge pull request #454 from jey/atomic-sbt-download. Closes #454.
Make sbt download an atomic operation. Modifies the `sbt/sbt` script to gracefully recover when a previous invocation died in the middle of downloading the SBT jar.
Author: Jey Kottalam <[email protected]>
== Merge branch commits ==
commit 6c600eb Author: Jey Kottalam <[email protected]> Date: Fri Jan 17 10:43:54 2014 -0800 Make sbt download an atomic operation
Commit 7805080
Merge pull request #561 from Qiuzhuang/master. Closes #561.
Kill drivers in postStop() for Worker. JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068
Author: Qiuzhuang Lian <[email protected]>
== Merge branch commits ==
commit 9c19ce6 Author: Qiuzhuang Lian <[email protected]> Date: Sat Feb 8 16:07:39 2014 +0800 Kill drivers in postStop() for Worker. JIRA SPARK-1068: https://spark-project.atlassian.net/browse/SPARK-1068
Commit f0ce736
Commits on Feb 9, 2014
Merge pull request #542 from markhamstra/versionBump. Closes #542.
Version number to 1.0.0-SNAPSHOT. Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell
Author: Mark Hamstra <[email protected]>
== Merge branch commits ==
commit 1b00a8a Author: Mark Hamstra <[email protected]> Date: Wed Feb 5 09:30:32 2014 -0800 Version number to 1.0.0-SNAPSHOT
Commit c2341c9
Merge pull request #565 from pwendell/dev-scripts. Closes #565.
SPARK-1066: Add developer scripts to repository. These are some developer scripts I've been maintaining in a separate public repo. This patch adds them to the Spark repository so they can evolve here and are clearly accessible to all committers. I may do some small additional clean-up in this PR, but wanted to put them here in case others want to review. There are a few types of scripts here:
1. A tool to merge pull requests.
2. A script for packaging releases.
3. A script for auditing release candidates.
Author: Patrick Wendell <[email protected]>
== Merge branch commits ==
commit 5d5d331 Author: Patrick Wendell <[email protected]> Date: Sat Feb 8 22:11:47 2014 -0800 SPARK-1066: Add developer scripts to repository.
Commit f892da8
Merge pull request #560 from pwendell/logging. Closes #560.
[WIP] SPARK-1067: Default log4j initialization causes errors for those not using log4j. To fix this, we add a check when initializing log4j.
Author: Patrick Wendell <[email protected]>
== Merge branch commits ==
commit ffdce51 Author: Patrick Wendell <[email protected]> Date: Fri Feb 7 15:22:29 2014 -0800 Logging fix
Commit b6d40b7
Merge pull request #562 from jyotiska/master. Closes #562.
Added example Python code for sort. I added example Python code for sorting. Right now, PySpark has limited examples for new people wanting to use the project. This example code sorts integers stored in a file. I was able to sort 5 million, 10 million and 25 million integers with this code.
Author: jyotiska <[email protected]>
== Merge branch commits ==
commit 8ad8faf Author: jyotiska <[email protected]> Date: Sun Feb 9 11:00:41 2014 +0530 Added comments in code on collect() method
commit 6f98f1e Author: jyotiska <[email protected]> Date: Sat Feb 8 13:12:37 2014 +0530 Updated python example code sort.py
commit 945e39a Author: jyotiska <[email protected]> Date: Sat Feb 8 12:59:09 2014 +0530 Added example python code for sort
Commit 2ef37c9
Merge pull request #556 from CodingCat/JettyUtil. Closes #556.
[SPARK-1060] startJettyServer should explicitly use IP information. https://spark-project.atlassian.net/browse/SPARK-1060
In the current implementation, the web server in Master/Worker is started with:
val (srv, bPort) = JettyUtils.startJettyServer("0.0.0.0", port, handlers)
Inside startJettyServer:
val server = new Server(currentPort) // here, the Server takes "0.0.0.0" as the hostname, i.e. it will always bind to the IP address of the first NIC
This can cause wrong IP binding. E.g., if the host has two NICs, N1 and N2, and the user specifies SPARK_LOCAL_IP as N2's IP address, then when starting the web server, for the reason stated above, it will always bind to N1's address.
Author: CodingCat <[email protected]>
== Merge branch commits ==
commit 6c6d9a8 Author: CodingCat <[email protected]> Date: Thu Feb 6 14:53:34 2014 -0500 startJettyServer should explicitly use IP information
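For illustration, binding Jetty to one specific address rather than all interfaces looks like this (a sketch using the standard Jetty API, not the patched Spark code):

```scala
import java.net.InetSocketAddress
import org.eclipse.jetty.server.Server

// Sketch: new Server(port) binds to all interfaces ("0.0.0.0"); passing an
// InetSocketAddress binds the server to one specific IP, e.g. SPARK_LOCAL_IP.
val host = sys.env.getOrElse("SPARK_LOCAL_IP", "127.0.0.1")
val server = new Server(new InetSocketAddress(host, 8080))
server.start()
```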
Commit b6dba10