[WIP] extend doc #423

Open
wants to merge 1 commit into
base: master
3 changes: 2 additions & 1 deletion .gitignore
@@ -15,4 +15,5 @@ website/static/api
_gen/
project/metals.sbt
.vscode/settings.json
.bsp/sbt.json
.bsp
.scala-build
@@ -3,7 +3,8 @@ title: 'Getting started'
weight: 1
---

Add any of these lines to your build.sbt:
## Saddle modules
Add the appropriate saddle modules to your build.sbt:
```scala
// The core library
libraryDependencies += "io.github.pityka" % "saddle-core" % "@VERSION@"
@@ -21,25 +22,39 @@ libraryDependencies += "io.github.pityka" % "saddle-time" % "@VERSION@"
libraryDependencies += "io.github.pityka" % "saddle-stats" % "@VERSION@"
```

### Dependencies
## Dependencies
The actively maintained artifacts have minimal dependencies:

- `saddle-core` depends on [cats-kernel](https://github.com/typelevel/cats)
- `saddle-linalg` depends on [netlib-java](https://github.com/fommil/netlib-java)
- `saddle-binary` depends on [ujson](http://www.lihaoyi.com/upickle/)
- `saddle-circe` depends on [circe](https://github.com/circe/circe)

### Example: SVD on the Iris dataset
```scala mdoc
## Imports
You most likely need the following two imports:
```scala
import org.saddle._
import org.saddle.order._
```

Note that `org.saddle.order._` imports `cats.kernel.Order[_]` typeclass instances into the scope.
If you import cats instances another way, you should not import `org.saddle.order._`.

The `Order[Double]` and `Order[Float]` instances in `org.saddle.order` define a total ordering and
order `NaN` above all other values, consistent with `java.lang.Double.compare`.
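
The ordering that `java.lang.Double.compare` defines can be seen without saddle. The following is a minimal plain-Scala sketch: the `totalOrdering` value is a hypothetical stand-in built on the JDK method, not the instance that `org.saddle.order` provides.

```scala
// Plain Scala, no saddle dependency. `totalOrdering` is a hypothetical
// stand-in that delegates to java.lang.Double.compare.
val totalOrdering: Ordering[Double] = new Ordering[Double] {
  def compare(x: Double, y: Double): Int = java.lang.Double.compare(x, y)
}

val values = List(2.0, Double.NaN, Double.PositiveInfinity, -1.0)

// Under this ordering NaN sorts after every other value,
// including positive infinity.
val sorted = values.sorted(totalOrdering)

// By contrast, primitive comparisons that involve NaN are always false.
val nanIsGreater = Double.NaN > Double.PositiveInfinity
```

Because the comparison is total, sorting is well defined even when the data contains `NaN`.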

## Example: SVD on the Iris dataset
```scala mdoc:silent
import scala.io.Source
import org.saddle._
import org.saddle.linalg._
val irisURL = "https://gist.githubusercontent.com/pityka/d05bb892541d71c2a06a0efb6933b323/raw/639388c2cbc2120a14dcf466e85730eb8be498bb/iris.csv"
val iris = csv.CsvParser.parseSourceWithHeader[Double](
source = Source.fromURL(irisURL),
cols = List(0,1,2,3),
recordSeparator = "\n").toOption.get

import org.saddle.linalg._
```
```scala mdoc
val centered = iris.mapVec(_.demeaned)
val SVDResult(u, s, vt) = centered.toMat.svd(2)
val pca = u.mDiagFromRight(s).toFrame
68 changes: 68 additions & 0 deletions docs/content/docs/02_guide/01_overview.md
@@ -0,0 +1,68 @@
---
title: 'Overview'
weight: 1
---

# Overview
This is a high-level overview of saddle. It contains links to more detailed sections.

## Construction

```scala mdoc
import org.saddle._
val vec = Vec(1, 2, 3)
Series("a" -> 1, "b" -> 2, "c" -> 3)
val series = Series(Vec(1,2,3), Index("a", "b", "c"))
val series1 = Series(Vec(4,5,6), Index("a", "b", "c"))
val frame = Frame("col_a" -> series, "col_b" -> series1)
```

## Missing data

A key aspect of saddle is how it deals with missing data, also known as NA. Values in vectors, series and frames can be missing; keys cannot.

```scala mdoc
Vec[Int](1, na, 3)
Series(Vec[Int](1, 2, na), Index("a", "b", "c"))
Frame("col_a" -> Series(Vec[Int](1, 2, na), Index("a", "b", "c")))
```

This is necessary for supporting alignment by index and other
[joins]({{< relref "#joins" >}}).
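
To see why alignment needs missing values, consider two keyed collections that share only some keys. The following is a conceptual sketch in plain Scala, where `Option` stands in for NA and `Map` stands in for a series. It is not how saddle implements joins, and all names are illustrative.

```scala
// Conceptual sketch, no saddle dependency: Option plays the role of NA.
val left = Map("a" -> 1, "b" -> 2)
val right = Map("b" -> 20, "c" -> 30)

// Outer alignment: every key from either side appears exactly once.
val allKeys = (left.keySet ++ right.keySet).toList.sorted

// A key that is absent from one side yields a missing value on that side.
val aligned: List[(String, Option[Int], Option[Int])] =
  allKeys.map(key => (key, left.get(key), right.get(key)))
```

Here `"a"` has no value on the right and `"c"` has none on the left, so the aligned result can only be represented if values are allowed to be missing.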

## Selection
Elements can be selected by position:
```scala mdoc
vec.at(2)
series.at(2)
frame.at(0, 1)
frame.colAt(1)
frame.rowAt(0)
```
Or sliced:
```scala mdoc
vec.slice(0, 2)
series.slice(0, 2)
frame.rowSlice(0, 2)
```

Elements of `Series` and `Frame`s can further be selected by key:
```scala mdoc
series.get("a")
series(* -> "b")
frame.first("b").get("col_a")
frame("b" -> "c", "col_b" -> *)
```

## Joins
TODO

## Element-wise operations
TODO

## Linear algebra





@@ -1,31 +1,25 @@
---
title: 'Usage'
title: 'Data Structures'
weight: 2
---

### Imports
You most likely need the following two imports:
```scala
import org.saddle._
import org.saddle.order._
```

Note that `org.saddle.order._` imports `cats.kernel.Order[_]` typeclass instances into the scope.
If you import cats instances another way, you should not import `org.saddle.order._`.
# Introduction to Saddle data structures

The `Order[Double]` and `Order[Float]` instances in `org.saddle.order` define a total ordering and
order `NaN` above all other values, consistent with `java.lang.Double.compare`.
Saddle's functionality is provided by three main data structures:
[`Vec[T]`]({{< relref "#vector" >}}),
[`Series[K, V]`]({{< relref "#series" >}})
and [`Frame[RI, CI, V]`]({{< relref "#frame" >}}).
## Vector

### 1D vector: Vec[T]
A vector is an immutable, memory-efficient sequence indexed by offset. Its main implementation is a wrapper around an `Array`. Alternative implementations can exist, for example for ranges. It is the underlying structure of `Series` and `Frame`.
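
The idea of one offset-indexed interface with several backing implementations can be sketched in plain Scala. Everything below is hypothetical: `OffsetSeq`, `ArrayBacked` and `RangeBacked` are illustrative names and do not mirror saddle's actual class hierarchy. The defensive copy is a choice made for this sketch only.

```scala
// Hypothetical names throughout; not saddle's actual classes.
trait OffsetSeq {
  def length: Int
  def raw(offset: Int): Int
  def toList: List[Int] = List.tabulate(length)(i => raw(i))
}

// Array-backed: copies the input so that later writes to the
// source array cannot change the sequence.
final class ArrayBacked(source: Array[Int]) extends OffsetSeq {
  private val values = source.clone()
  def length: Int = values.length
  def raw(offset: Int): Int = values(offset)
}

// Range-backed: stores only the start and the length,
// and computes each element on demand.
final class RangeBacked(start: Int, val length: Int) extends OffsetSeq {
  def raw(offset: Int): Int = {
    require(offset >= 0 && offset < length, s"offset $offset out of bounds")
    start + offset
  }
}
```

Both implementations answer the same queries, but the range-backed one uses constant memory regardless of its length.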

Factories:
Construction:
```scala mdoc
import org.saddle._
Vec.empty[Double]
Vec(1, 2, 3)
Vec(1 to 3 : _*)
Vec(Array(1,2,3))
Vec.empty[Double]

Vec(1 to 3 : _*)
vec.ones(2)
vec.zeros(3)
vec.rand(20)
@@ -57,14 +51,15 @@ Slicing:
import org.saddle._
import org.saddle.ops.BinOps._
Vec(1,2,3).at(2) // Boxes and keeps NA
Vec(1,2,3).raw(2)
Vec(1,2,3).raw(2)
Vec(1,2,3).apply(2) // same as raw
Vec(1,2,3).take(0,2)
Vec(1,2,3).take(1 -> *)
Vec(1,2,3).take(* -> 1)
```

### 1D vector with index: Series[K,V]
## Series

A Series combines a Vec with an Index that provides an ordered key-value mapping. We’ll talk more about the details of Index later.

The key type of a Series must have a natural ordering (i.e., an Ordering of that type within the implicit scope). However, the Series maintains the order in which its data was supplied unless ordered otherwise.
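
The difference between having an ordering available and actually being sorted can be illustrated in plain Scala. In this sketch a `Vector` of key-value pairs stands in for a Series; it does not use saddle, and the names are illustrative.

```scala
// Plain Scala, no saddle dependency: key-value pairs stand in for a Series.
val supplied = Vector("c" -> 3, "a" -> 1, "b" -> 2)

// Without an explicit sort, the keys stay in the order they were supplied.
val keysAsSupplied = supplied.map(_._1)

// Sorting by key relies on the implicit Ordering[String] found in scope.
val sortedByKey = supplied.sortBy(_._1)
```

The ordering is required so that sorting and ordered lookups are possible, but it is only applied when requested.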
@@ -249,7 +244,7 @@ We mentioned joins. Let’s look at a few join operations; the result is a Frame
a.join(b, how=index.OuterJoin)
```

### Matrix: Mat[T]
## Matrix

A `Mat[T]` represents a matrix of values. Internally it is stored as a single contiguous array in row-major order.
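
Row-major storage means that the rows are laid out one after another in the backing array. The following is a minimal plain-Scala sketch of the resulting index arithmetic. The helper names `elementAt`, `rowOf` and `columnOf` are hypothetical and are not part of saddle's API.

```scala
// Hypothetical helpers, no saddle dependency: a 2 x 3 matrix flattened row by row.
val numRows = 2
val numCols = 3
val data = Array(1, 2, 3, 4, 5, 6) // rows: (1, 2, 3) and (4, 5, 6)

// In row-major order, element (row, col) lives at offset row * numCols + col.
def elementAt(row: Int, col: Int): Int = data(row * numCols + col)

// A whole row is a contiguous slice of the backing array.
def rowOf(row: Int): Array[Int] =
  data.slice(row * numCols, (row + 1) * numCols)

// A column is strided: its elements are numCols positions apart.
def columnOf(col: Int): Array[Int] =
  Array.tabulate(numRows)(row => elementAt(row, col))
```

One consequence of this layout is that reading a row touches adjacent memory, while reading a column jumps through the array in steps of `numCols`.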

@@ -355,7 +350,7 @@ Some other interesting methods on Mat:
mat.rand(2,2).roundTo(2)
```

### Homogeneous table with row and column index (data frame) : Frame[RX,CX,T]
## Frame

A Frame combines a Mat with a row index and a column index, which provide a way to index into the Mat.
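
Conceptually, a labelled lookup translates a row key and a column key into offsets and then reads the value at those offsets. The following plain-Scala sketch assumes a row-major value array and uses hypothetical names (`rowKeys`, `colKeys`, `lookup`); it is not saddle's implementation.

```scala
// Hypothetical sketch, no saddle dependency: keys are translated
// to offsets, which then index into the flattened values.
val rowKeys = Vector("r1", "r2")
val colKeys = Vector("col_a", "col_b", "col_c")
val values = Array(1, 2, 3, 4, 5, 6) // 2 x 3, stored row by row

def lookup(rowKey: String, colKey: String): Option[Int] = {
  val row = rowKeys.indexOf(rowKey)
  val col = colKeys.indexOf(colKey)
  // indexOf returns -1 for an unknown key, which maps to None here.
  if (row < 0 || col < 0) None
  else Some(values(row * colKeys.length + col))
}
```

For example, `lookup("r2", "col_b")` resolves to row offset 1 and column offset 1, while an unknown key yields `None`.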

5 changes: 5 additions & 0 deletions docs/content/docs/02_guide/_index.md
@@ -0,0 +1,5 @@
---
title: 'Guide'
weight: 2
bookCollapseSection: true
---