Merge pull request #42 from datapages/metadata_minimization
Metadata minimization
ben-domingue authored Jan 30, 2025
2 parents e201376 + 1d50777 commit 1c723dd
Showing 9 changed files with 128 additions and 468 deletions.
31 changes: 3 additions & 28 deletions _load-data.qmd
@@ -6,16 +6,12 @@ library(stringr)

```{r}
# set user and dataset
# dataset <- redivis::user("datapages")$dataset("item_response_warehouse")
# get metadata table
# metadata_table <- dataset$table("metadata")
# metadata <- metadata_table$to_tibble()
project <- redivis::user("mikabr")$project("irw")
library(redivis)
project <- redivis::user("bdomingu")$dataset("irw_meta:bdxt:v1_1")
metadata_table <- project$table("metadata:h5gs")
# get metadata table
metadata_table <- project$table("metadata_output")
metadata <- metadata_table$to_tibble()
metadata <- metadata |>
mutate(partition = if_else(n_categories == 2, "dichotomous", "polytomous"))
@@ -24,27 +20,6 @@ metadata <- metadata |>
# cont_vars_list <- set_names(cont_vars, cont_vars |> str_replace_all("_", " ") |> str_to_sentence()) |> as.list()
# ojs_define(cont_vars = cont_vars_list)
# get item summary table
item_table <- project$table("item_summary_output")
item_summary <- item_table$to_tibble()
# get subject summary table
subject_table <- project$table("subject_summary_output")
subject_summary <- subject_table$to_tibble()
# combine item and subject summaries, put into data structure for selector
summaries <- full_join(
item_summary |> nest(items = -dataset_name),
subject_summary |> nest(subjects = -dataset_name),
by = "dataset_name"
) |>
mutate(data = map2(items, subjects, \(i, s) list(items = i, subjects = s)),
summaries = map2(dataset_name, data, \(n, d) set_names(list(d), n))) |>
arrange(dataset_name) |>
pull(summaries) |>
flatten()
# pass data to ojs
ojs_define(metadata = metadata)
ojs_define(summaries = summaries)
```
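This hunk replaces the old project-based load (`mikabr`/`irw`, `metadata_output`) with a direct read of `metadata:h5gs` from the `irw_meta:bdxt:v1_1` Redivis dataset and drops the item/subject summary tables. A minimal R sketch of the resulting load, with an illustrative tabulation of the derived `partition` column (the `count()` step is not in the diff):

```r
library(redivis)
library(dplyr)

# Read the IRW metadata table straight from the irw_meta Redivis dataset
# (identifiers as in the chunk above).
meta_dataset <- redivis::user("bdomingu")$dataset("irw_meta:bdxt:v1_1")
metadata <- meta_dataset$table("metadata:h5gs")$to_tibble()

# Derive the dichotomous/polytomous partition and tabulate it (illustrative).
metadata |>
  mutate(partition = if_else(n_categories == 2, "dichotomous", "polytomous")) |>
  count(partition)
```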
9 changes: 0 additions & 9 deletions _viz-datasets.qmd
@@ -1,15 +1,6 @@
```{ojs}
// selector for dataset
summaries_map = new Map(Object.entries(summaries))
viewof dataset = Inputs.select(summaries_map, {label: 'Dataset'})
// transpose all data for plotting
items = transpose(dataset.items)
subjects = transpose(dataset.subjects)
metadata_trans = transpose(metadata)
```
```{ojs}
Plot = import("https://esm.sh/@observablehq/[email protected]")
// plot items
5 changes: 2 additions & 3 deletions _viz-metadata.qmd
@@ -1,11 +1,12 @@
```{ojs}
metadata_trans = transpose(metadata)
// selectors for x and y variables
vars = new Map([["Number responses", "n_responses"],
["Number participants", "n_participants"],
["Number items", "n_items"],
["Responses per participant", "responses_per_participant"],
["Responses per item", "responses_per_item"],
["Sparsity (#responses/(#ids*#items))", "sparsity"]])
["Density (#responses/(#ids*#items))", "density"]])
// vars = new Map(Object.entries(cont_vars))
// console.log(Array.from(vars.values())[1])
@@ -18,9 +19,7 @@ color_opts = new Map([["None", null],
["Dichotomous vs. Polytomous", "partition"]])
viewof color_var = Inputs.select(color_opts, {label: "Color"})
```
```{ojs}
plt_color = color_var || default_color
// histogram
4 changes: 2 additions & 2 deletions analysis.qmd
@@ -25,7 +25,7 @@ compute_metadata <- function(df) {
n_items = n_distinct(df$item),
responses_per_participant = n_responses / n_participants,
responses_per_item = n_responses / n_items,
sparsity = (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
density = (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
)
}
@@ -68,7 +68,7 @@ def compute_metadata(df):
'n_items': [df['item'].nunique()],
'responses_per_participant': [len(df) / df['id'].nunique()],
'responses_per_item': [len(df) / df['item'].nunique()],
'sparsity': [(sqrt(len(df)) / df['id'].nunique()) * (sqrt(len(df)) / df['item'].nunique())]
'density': [(sqrt(len(df)) / df['id'].nunique()) * (sqrt(len(df)) / df['item'].nunique())]
})
dataset = redivis.user('datapages').dataset('item_response_warehouse')
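The only substantive change in `analysis.qmd` is the rename from `sparsity` to `density`; the formula is untouched and simplifies algebraically to `n_responses / (n_participants * n_items)`, which matches the "#responses/(#ids*#items)" label in `_viz-metadata.qmd`. A small R sketch on toy data (not from the repo) checking that the two forms agree:

```r
library(dplyr)  # for n_distinct()

# Toy long-format responses (illustrative only, not an IRW dataset).
df <- data.frame(
  id   = c(1, 1, 2, 2, 3),
  item = c("a", "b", "a", "b", "a"),
  resp = c(0, 1, 1, 0, 1)
)

n_responses    <- nrow(df)
n_participants <- n_distinct(df$id)
n_items        <- n_distinct(df$item)

# The metric as written in compute_metadata() ...
density_repo   <- (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
# ... and its algebraic simplification.
density_simple <- n_responses / (n_participants * n_items)

all.equal(density_repo, density_simple)  # TRUE
```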
41 changes: 26 additions & 15 deletions data.qmd
@@ -10,27 +10,44 @@ Below we show metadata for the entire IRW, an example dataset, and illustrations

<iframe width="800" height="500" allowfullscreen src="https://redivis.com/embed/tables/bdomingu.irw_meta:bdxt:current.metadata:h5gs#cells" style="border:0;"></iframe>


## Individual dataset


```{ojs}
summaries_map = new Map(Object.entries(summaries))
viewof dataset = Inputs.select(summaries_map, {label: 'Dataset'})
function createDatasetMap(df) {
let datasetMap = new Map();
// Convert column-based object into an array of row objects
let dataArray = Array.from({ length: df[Object.keys(df)[0]].length }, (_, i) => {
return Object.fromEntries(Object.entries(df).map(([key, values]) => [key, values[i]]));
});
// Populate the Map using dataset_name as the key
for (let row of dataArray) {
datasetMap.set(row.dataset_name, row); // 'dataset_name' must exist in the metadata
}
return datasetMap;
}
// Convert metadata table into datasetMap
dataset_map = createDatasetMap(metadata);
// Dropdown selector for dataset
viewof dataset = Inputs.select(dataset_map, { label: 'Dataset' });
// Function to find the key that corresponds to the selected dataset
// Function to find dataset name from the Map
function findDatasetNameFromMap(map, selectedDataset) {
for (let [key, value] of map.entries()) {
if (value === selectedDataset) {
return key;
return key;
}
}
return null;
}
// Look up the name of the dataset and set up the URL
dataset_name = findDatasetNameFromMap(summaries_map, dataset);
dataset_url = "https://redivis.com/embed/tables/datapages.item_response_warehouse:as2e:current." + dataset_name
// Look up the name of the dataset and construct the URL
dataset_name = findDatasetNameFromMap(dataset_map, dataset);
dataset_url = `https://redivis.com/embed/tables/datapages.item_response_warehouse:as2e:current.${dataset_name}`;
html`<iframe id="myIframe" width="800" height="500" allowfullscreen style="border:0;" src = "${dataset_url}"></iframe>`
```
@@ -52,9 +69,6 @@ You can also access IRW data programmatically using the Redivis API for [R](http
dataset <- redivis::user("datapages")$dataset("item_response_warehouse")
df <- dataset$table("4thgrade_math_sirt")$to_tibble()
# metadata
project <- redivis::user("mikabr")$project("irw")
metadata <- project$table("metadata_output")$to_tibble()
```

## Python
@@ -69,8 +83,5 @@ import redivis
dataset = redivis.user('datapages').dataset('item_response_warehouse')
df = dataset.table('4thgrade_math_sirt').to_pandas_dataframe()
# metadata
project = redivis.user('mikabr').project('irw')
metadata = project.table('metadata_output').to_pandas_dataframe()
```
:::
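With the metadata lines dropped from these access examples, a reader wanting both a response table and its metadata row would combine the warehouse dataset with the `irw_meta` dataset used in `_load-data.qmd`. A minimal R sketch, assuming (as the `dataset_url` construction above implies) that `dataset_name` in the metadata matches the Redivis table name:

```r
library(redivis)
library(dplyr)

# One IRW response table (table name taken from the examples above).
irw <- redivis::user("datapages")$dataset("item_response_warehouse")
df  <- irw$table("4thgrade_math_sirt")$to_tibble()

# Its metadata row, read from the irw_meta dataset (see _load-data.qmd);
# the dataset_name match is an assumption for illustration.
meta <- redivis::user("bdomingu")$dataset("irw_meta:bdxt:v1_1")$table("metadata:h5gs")$to_tibble()
meta |> filter(dataset_name == "4thgrade_math_sirt")
```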
Binary file removed data/.~IRW Data Dictionary.xlsx
