From 09c708730e589fa5618253739334a8ad41ffbd62 Mon Sep 17 00:00:00 2001 From: Tim Nielens Date: Sun, 15 May 2022 15:17:20 +0200 Subject: [PATCH] extend doc --- .gitignore | 3 +- ...tting_started.md => 01_getting_started.md} | 27 ++++++-- docs/content/docs/02_guide/01_overview.md | 68 +++++++++++++++++++ .../{usage.md => 02_guide/02_data_structs.md} | 37 +++++----- docs/content/docs/02_guide/_index.md | 5 ++ 5 files changed, 112 insertions(+), 28 deletions(-) rename docs/content/docs/{getting_started.md => 01_getting_started.md} (73%) create mode 100644 docs/content/docs/02_guide/01_overview.md rename docs/content/docs/{usage.md => 02_guide/02_data_structs.md} (96%) create mode 100644 docs/content/docs/02_guide/_index.md diff --git a/.gitignore b/.gitignore index 97289b8d..20123ec1 100644 --- a/.gitignore +++ b/.gitignore @@ -15,4 +15,5 @@ website/static/api _gen/ project/metals.sbt .vscode/settings.json -.bsp/sbt.json +.bsp +.scala-build diff --git a/docs/content/docs/getting_started.md b/docs/content/docs/01_getting_started.md similarity index 73% rename from docs/content/docs/getting_started.md rename to docs/content/docs/01_getting_started.md index df5a78b9..4bdd916c 100644 --- a/docs/content/docs/getting_started.md +++ b/docs/content/docs/01_getting_started.md @@ -3,7 +3,8 @@ title: 'Getting started' weight: 1 --- -Add any of these lines to your build.sbt: +## Saddle modules +Add the appropriate saddle modules to your build.sbt: ```scala // The core library libraryDependencies += "io.github.pityka" % "saddle-core" % "@VERSION@" @@ -21,7 +22,7 @@ libraryDependencies += "io.github.pityka" % "saddle-time" % "@VERSION@" libraryDependencies += "io.github.pityka" % "saddle-stats" % "@VERSION@" ``` -### Dependencies +## Dependencies The actively maintained artifacts have minimal dependencies: - `saddle-core` depends on [cats-kernel](https://github.com/typelevel/cats) @@ -29,17 +30,31 @@ The actively maintained artifacts have minimal dependencies: - `saddle-binary` 
depends on [ujson](http://www.lihaoyi.com/upickle/) - `saddle-circe` depends on [circe](https://github.com/circe/circe) -### Example: SVD on the Iris dataset -```scala mdoc +## Imports +You most likely need the following two imports: +```scala +import org.saddle._ +import org.saddle.order._ +``` + +Note that `org.saddle.order._` imports `cats.kernel.Order[_]` typeclass instances into the scope. +If you import cats instances another way, you should not import `org.saddle.order._`. + +The `Order[Double]` and `Order[Float]` instances in `org.saddle.order` define a total ordering and +order `NaN` above all other values, consistent with `java.lang.Double.compare`. + +## Example: SVD on the Iris dataset +```scala mdoc:silent import scala.io.Source import org.saddle._ +import org.saddle.linalg._ val irisURL = "https://gist.githubusercontent.com/pityka/d05bb892541d71c2a06a0efb6933b323/raw/639388c2cbc2120a14dcf466e85730eb8be498bb/iris.csv" val iris = csv.CsvParser.parseSourceWithHeader[Double]( source = Source.fromURL(irisURL), cols = List(0,1,2,3), recordSeparator = "\n").toOption.get - -import org.saddle.linalg._ +``` +```scala mdoc val centered = iris.mapVec(_.demeaned) val SVDResult(u, s, vt) = centered.toMat.svd(2) val pca = u.mDiagFromRight(s).toFrame diff --git a/docs/content/docs/02_guide/01_overview.md b/docs/content/docs/02_guide/01_overview.md new file mode 100644 index 00000000..4b4f4977 --- /dev/null +++ b/docs/content/docs/02_guide/01_overview.md @@ -0,0 +1,68 @@ +--- +title: 'Overview' +weight: 1 +--- + +# Overview +This is a high-level overview of saddle. It contains links to more detailed sections. 
+ +## Construction + +```scala mdoc +import org.saddle._ +val vec = Vec(1, 2, 3) +Series("a" -> 1, "b" -> 2, "c" -> 3) +val series = Series(Vec(1,2,3), Index("a", "b", "c")) +val series1 = Series(Vec(4,5,6), Index("a", "b", "c")) +val frame = Frame("col_a" -> series, "col_b" -> series1) +``` + +## Missing data + +A key aspect of saddle is how it deals with missing data, also known as NA. Values in vectors, series and frames can be missing; keys cannot. + +```scala mdoc +Vec[Int](1, na, 3) +Series(Vec[Int](1, 2, na), Index("a", "b", "c")) +Frame("col_a" -> Series(Vec[Int](1, 2, na), Index("a", "b", "c"))) +``` + +This is necessary for supporting alignment by index and other +[joins]({{< relref "#joins" >}}). + +## Selection +Elements can be selected by position: +```scala mdoc +vec.at(2) +series.at(2) +frame.at(0, 1) +frame.colAt(1) +frame.rowAt(0) +``` +Or sliced: +```scala mdoc +vec.slice(0, 2) +series.slice(0, 2) +frame.rowSlice(0, 2) +``` + +Elements of `Series` and `Frame`s can further be selected by key: +```scala mdoc +series.get("a") +series(* -> "b") +frame.first("b").get("col_a") +frame("b" -> "c", "col_b" -> *) +``` + +## Joins +TODO + +## Element-wise operations +TODO + +## Linear algebra + + + + + diff --git a/docs/content/docs/usage.md b/docs/content/docs/02_guide/02_data_structs.md similarity index 96% rename from docs/content/docs/usage.md rename to docs/content/docs/02_guide/02_data_structs.md index 0941db09..de397ea4 100644 --- a/docs/content/docs/usage.md +++ b/docs/content/docs/02_guide/02_data_structs.md @@ -1,31 +1,25 @@ --- -title: 'Usage' +title: 'Data Structures' weight: 2 --- -### Imports -You most likely need the following two imports: -```scala -import org.saddle._ -import org.saddle.order._ -``` - -Note that `org.saddle.order._` imports `cats.kernel.Order[_]` typeclass instances into the scope. -If you import cats instances an other way then you should not import `org.saddle.order._`. 
+# Introduction to Saddle data structures -The `Order[Double]` and `Order[Float]` instances in `org.saddle.order` define a total ordering and -order `NaN` above all other values, consistent with `java.lang.Double.compare`. +Saddle's functionality is provided by three main data structures: +[`Vec[T]`]({{< relref "#vector" >}}), +[`Series[K, V]`]({{< relref "#series" >}}) +and [`Frame[RI, CI, V]`]({{< relref "#frame" >}}). +## Vector -### 1D vector: Vec[T] +A vector is an immutable, memory-efficient sequence indexed by offset. Its main implementation is a wrapper around an `Array`. Alternative implementations can exist, for example for ranges. It is the underlying structure of `Series` and `Frame`. -Factories: +Construction: ```scala mdoc import org.saddle._ +Vec.empty[Double] Vec(1, 2, 3) -Vec(1 to 3 : _*) Vec(Array(1,2,3)) -Vec.empty[Double] - +Vec(1 to 3 : _*) vec.ones(2) vec.zeros(3) vec.rand(20) @@ -57,14 +51,15 @@ Slicing: import org.saddle._ import org.saddle.ops.BinOps._ Vec(1,2,3).at(2) // Boxes and keeps NA -Vec(1,2,3).raw(2) +Vec(1,2,3).raw(2) Vec(1,2,3).apply(2) // same as raw Vec(1,2,3).take(0,2) Vec(1,2,3).take(1 -> *) Vec(1,2,3).take(* -> 1) ``` -### 1D vector with index: Series[K,V] +## Series + A Series combines a Vec with an Index that provides an ordered key-value mapping. We’ll talk more about the details of Index later. The key type of a must have a natural ordering (ie, an Ordering of that type within the implicit scope). However, the Series maintains the order in which its data was supplied unless ordered othewise. @@ -249,7 +244,7 @@ We mentioned joins. Let’s look at a few join operations; the result is a Frame a.join(b, how=index.OuterJoin) ``` -### Matrix: Mat[T] +## Matrix A `Mat[T]` represents a matrix of values. Internally it is stored as a single contiguous array in row-major order. 
@@ -355,7 +350,7 @@ Some other interesting methods on Mat: mat.rand(2,2).roundTo(2) ``` -### Homogeneous table with row and column index (data frame) : Frame[RX,CX,T] +## Frame A Frame combines a Mat with a row index and a column index which provides a way to index into the Mat. diff --git a/docs/content/docs/02_guide/_index.md b/docs/content/docs/02_guide/_index.md new file mode 100644 index 00000000..86b19c03 --- /dev/null +++ b/docs/content/docs/02_guide/_index.md @@ -0,0 +1,5 @@ +--- +title: 'Guide' +weight: 2 +bookCollapseSection: true +---