
CUSTOM_STEPS_LIST.md

List of Custom Steps in this Project

Abbreviations used in Custom Step names

The list of available Custom Steps further down on this page uses the following abbreviations to group related steps.

| Abbreviation | Explanation |
| --- | --- |
| CAS | Steps in this category provide utilities for working with data in CAS |
| CV | Computer Vision |
| DQ | Data Quality |
| LLM | Large Language Model |
| NLP | Natural Language Processing |
| OCR | Optical Character Recognition |
| SDG | Synthetic Data Generation |

Available Custom Steps

| Name | Brief Description | Owner/Contact | Viya Version Supported | Last Update |
| --- | --- | --- | --- | --- |
| _template | Template to use for contributions | SAS | 2020.1.5 or later | 04OCT2024 |
| Airflow - Generate DAG | Generates an Apache Airflow DAG using SAS Studio Flow, where flow steps represent Airflow tasks using the SAS Airflow Provider | Nicolas Robert | 2023.12 or later | 21FEB2024 |
| Anonymize and Mask Data | Anonymize and Mask Data using QKB definitions | Mary Kathryn Queen | 2023.06 or later | 19FEB2024 |
| Append Table | Appends data to a target table with support for maintaining a unique incremental ID | Torben Juul Johansson | 2020.1.5 or later | 26OCT2022 |
| CAS - Convert Char to Varchar | Create a copy of a table and convert chars to varchars | Carlo Petti | 2023.01 or later | 05SEP2023 |
| CAS - Generate unique ID | Generates a new column containing a unique identifier (ID) per observation for a given input CAS table | Sundaresh Sankaran | 2022.11 or later | 07FEB2023 |
| CAS - Load Tables from Folders in Filesystem | Load all files in a directory to CAS tables | Sundaresh Sankaran / Wilbram Hazejager | 2022.11 or later | 21DEC2022 |
| CAS - Submit Python and R code | Submit Python/R code to CAS server using CAS Gateway action | David Weik | 2023.11 or later | 11JAN2024 |
| CAS - Validate unique ID | Validates whether a column contains unique values for a given input CAS table | Sundaresh Sankaran | 2023.06 or later | 11JUL2023 |
| Catalog - List Agents | Extract list of configured SAS Information Catalog agents into a table | David Weik | 2023.12 or later | 15JAN2024 |
| Catalog - Run Agent | Triggers the run of a SAS Information Catalog agent | David Weik | 2023.12 or later | 15JAN2024 |
| Create Listing Of Directory CLOD | Create a table containing the names of all files in a directory | Stephan Weigandt | 2020.1.5 or later | 25SEP2024 |
| CV - Create Object Detection Table | Create CAS table (training dataset) with images and labels for use with CAS actions for Computer Vision (CV) and for use with SAS DLPy Python library | Neela Vengateshwaran | 2024.01 or later | 21JUL2024 |
| CV - Load Images | Load image files into a CAS table for use with CAS image analytics actions | Robert W Blanchard | 2024.05 or later | 04OCT2024 |
| CV - Merge Data with Images | Merge or append data that contains images | Robert W Blanchard | 2024.05 or later | 19SEP2024 |
| CV - Train Models | Develop Computer Vision (CV) models to accomplish one of four prediction tasks: 1. Image classification, 2. Image regression, 3. Object detection, 4. Multi-task | Robert W Blanchard | 2024.05 or later | 19SEP2024 |
| Data Synthesis with Python faker | Generate synthetic data using Python faker module (also includes custom steps to install/load Python modules) | Angus Looney / Duncan Bain | 2021.1.1 or later | 22DEC2022 |
| Detect Data Drift | Calculate metrics for tracking changes between two groups of records (representing two time intervals) inside a table | David Weik | 2024.07 or later | 15AUG2024 |
| Download Job Execution Log | Store Job Execution Log in a user-specified location on SAS Compute Server file system | Remco Gooijer | 2024.01 or later | 03NOV2024 |
| DQ - Cluster Analysis | Compare pairs of rows in a cluster and identify potential false positives | Clemens Knobloch | 2023.10 or later | 23JAN2024 |
| DQ - Change Case | Upper-, Lower-, or Propercase data values using QKB locale-specific rules | Clemens Knobloch | 2023.01 or later | 15MAR2023 |
| DQ - Clustering | Cluster records based on column values, e.g. match codes | Lorenzo Toja / Arnold Toporowski / Nikolaus Hartung | 2021.1.1 or later | 05AUG2024 |
| DQ - Create QKB Reference Tables | Create QKB Reference Tables | Mary Kathryn Queen | 2023.06 or later | 10OCT2023 |
| DQ - Identify | Obtain the Identity Type of data values using the dqIdentify function | Arnold Toporowski | 2021.1.1 or later | 29NOV2022 |
| DQ - Match Code | Create Match Codes based on locale, using SAS QKB and dqMatch function | Lorenzo Toja / Arnold Toporowski | 2021.1.1 or later | 19MAR2024 |
| DQ - Parsing | Parse a string into a set of tokens using QKB locale-specific rules | Clemens Knobloch | 2023.01 or later | 15MAR2023 |
| DQ - Standardize | Create standardized values based on locale, using SAS QKB and dqStandardize function (includes support for generating masked values) | Lorenzo Toja / Arnold Toporowski | 2024.01 or later | 21MAR2024 |
| DQ - Surviving Record | Extract the best record (aka Golden Record) from clusters of records, with support for standard deduplication routines and user-defined rules | Lorenzo Toja | 2023.06 or later | 24AUG2023 |
| DuckDB | Uses DuckDB to read and write data in various DBMS and file types | Clemens Knobloch | 2023.01 or later | 12JUL2023 |
| Dynamic Aggregations From Timeseries DAFT | Perform dynamic aggregations on timeseries data | Stephan Weigandt | 2022.1.2 or later | 08MAY2023 |
| Export - ADLS File Writer | Write SAS tables to Parquet files on Azure Data Lake Storage (ADLS) | Alfredo Lorie | 2023.02 or later | 20APR2023 |
| Export - GCSFS File Writer | Write SAS and CAS datasets to Google Cloud Storage (GCS) in Parquet and Delta Lake format using Python | Ignacio Rodríguez | 2023.11 or later | 21DEC2023 |
| Export - Parquet | Export SAS tables to Parquet files using SAS Libname engine | Neil Griffin | 2023.10 or later | 15DEC2023 |
| Extract Job Definition Metadata | Extract all content (SAS Code, HTML forms, XML Prompts and Parameters) from a SAS Job Definition and store it in files | David Weik | 2023.11 or later | 23NOV2023 |
| Extract Text Features | Supports extracting many different features from text fields | David Weik / Ulrich Reincke / Rens Feenstra | 2022.10 or later | 27NOV2022 |
| Extract VA Report Metadata | Extract info about CAS tables and columns used, calculated column definitions and their usage, across objects in VA report | Remco Gooijer | 2023.05 or later | 16APR2024 |
| FTP Directory Listing | Create a table containing names of files in an FTP directory | Remco Gooijer | 2023.01 or later | 14AUG2024 |
| FTP Download Files | Download files from an FTP server, where the list of files to download is provided in an input table | Remco Gooijer | 2023.01 or later | 14AUG2024 |
| GEO - Shape Files | Manage Shape Files used in GIS systems for use in SAS Visual Analytics | Stefano Tucciarone | 2023.10 or later | 18APR2024 |
| GeoDistance with Rounding | Calculate the distance between 2 supplied lat/long locations in either kilometers or miles | Mary Kathryn Queen | 2020.1.5 or later | 28SEP2022 |
| Get Exchange Rates | Get Exchange Rates from Service Provider | David Weik | 2023.08 or later | 04SEP2023 |
| Git - Clone Git Repo | Clone Git Repo as part of running a flow | Sundaresh Sankaran | 2022.11 or later | 25JAN2023 |
| Git - Delete Local Repo | Delete LOCAL Repo as part of running a flow | Sundaresh Sankaran | 2022.11 or later | 25JAN2023 |
| Git - List Local Repo Changes | List changed files inside local Git repository folder into a dataset as part of running a flow for easy reporting | Sundaresh Sankaran | 2022.11 or later | 07FEB2023 |
| Git - Stage, Commit, Pull and Push Changes | Perform stage, commit, pull and push changes as part of running a flow | David Weik | 2023.01 or later | 19FEB2023 |
| Great Expectations - Execute Rule | Run business rules based on Great Expectations Python package | Stephen Kotiang | 2023.03 or later | 11OCT2023 |
| Great Expectations - Generate Expectation Suite | Generate rules on input data using Great Expectations | Mackenzie Looney | 2023.04 or later | 19OCT2023 |
| Great Expectations - Run Expectations Suite | Compare data against an Expectation Suite | Mackenzie Looney | 2023.04 or later | 19OCT2023 |
| HTTP Request | Send HTTP requests using proc HTTP and use URL parameter values from input table | Clemens Knobloch | 2024.06 or later | 17OCT2024 |
| Import - ADLS File Reader | Read Parquet files from Azure Data Lake Storage (ADLS) with support for Delta Lake file format | Alfredo Lorie | 2023.02 or later | 25APR2023 |
| Import - CSV with long column names | Import CSV file with long column names (>32 chars) in header row | Ignacio Rodríguez | 2023.11 or later | 15JAN2024 |
| Import - Data Ingestion Auto Pilot DIAP (Light) for External Files | Ingest external file(s) from directory with the push of a button | Stephan Weigandt | 2020.1.5 or later | 04OCT2024 |
| Import - Extract Table from PDF | Extract tabular data from within a PDF document and load it into a SAS dataset | Sundaresh Sankaran / Dragos Coles | 2023.03 or later | 01MAY2023 |
| Import - GCSFS File Reader | Read Parquet and Delta Lake files from Google Cloud Storage (GCS) and write to SAS and CAS datasets using Python | Ignacio Rodríguez | 2023.11 or later | 21DEC2023 |
| Import - Google Sheets | Import public Google Sheets as a SAS data set | David Weik | 2022.12 or later | 12JAN2023 |
| Import - HTML Table | Import HTML table(s) from web page as SAS data set(s) using Python Pandas | David Weik | 2023.07 or later | 28JUL2023 |
| LLM - Azure OpenAI RAG | Uses a Retrieval Augmented Generation (RAG) approach to provide the right context to an Azure OpenAI Large Language Model (LLM) for purposes of answering a question | Samiul Haque / Sundaresh Sankaran | 2024.01 or later | 10JUL2024 |
| LLM - Prompt Catalog | Submit queries to a Large Language Model, test prompts (prompt engineering) and save prompt history | Xin Ru Lee | 2023.12 or later | 24JUL2024 |
| Log File Scraper | Extract ERRORS and WARNINGS from one or more SAS log files and make them available in a table | Remco Gooijer | 2021.1.5 or later | 08NOV2024 |
| Lookup | Add column by performing lookup on another table (using data step hash object) | Torben Juul Johansson | 2021.2.1 or later | 21SEP2022 |
| Loop Deployed Object | Parallel execution of a deployed flow or a SAS program for a given set of parameter values | Remco Gooijer | 2023.01 or later | 03OCT2024 |
| Loop Flow | Iteratively execute another Studio Flow, much like Loop/Loop End in SAS Data Integration Studio | Torben Juul Johansson | 2023.09 or later | 08MAR2024 |
| NLP - Categories Testing Framework | Test and assess SAS Visual Text Analytics categorization model | Sundaresh Sankaran | 2023.12 or later | 10JAN2024 |
| NLP - Extract Identities | Pull entities out of documents or freeform text | Arnold Toporowski | 2022.12 or later | 13DEC2023 |
| NLP - Extract Rule Configuration | Extracts the rule configuration within rules-based Visual Text Analytics Concepts or Categories model definitions for use in downstream applications | Sundaresh Sankaran | 2023.04 or later | 01AUG2023 |
| NLP - Identify Language | Identifies the language used for text data in an input table and creates a column containing the ISO 639-1 language code | Sundaresh Sankaran | 2022.12 or later | 15FEB2023 |
| NLP - Predefined Sentiment Analysis | Analyse a text corpus for the sentiment expressed in it | Sundaresh Sankaran | 2023.03 or later | 04APR2023 |
| NLP - Profile Text | Profile text within a document corpus and understand its linguistic structure | Sundaresh Sankaran | 2022.07 or later | 19OCT2022 |
| NLP - Score Text Classifier | Score a text corpus with a text classifier model trained using the deep learning (BERT-based) textClassifier.trainTextClassifier CAS action | Sundaresh Sankaran | 2023.02 or later | 19MAR2023 |
| NLP - Sentence Splitter | Splits a text column into multiple observations with constituent sentences using CAS actions | Sundaresh Sankaran | 2023.08 or later | 03NOV2023 |
| NLP - Train Text Classifier | Train a text classifier model based on deep learning (BERT-based transformer) architecture using textClassifier.trainTextClassifier CAS action (supports GPUs) | Sundaresh Sankaran | 2023.02 or later | 18MAR2023 |
| OCR - AWS Textract | Use the AWS Textract service to perform different types of OCR on files that can be stored in S3 buckets or on the SAS Compute file system | Jannic Horst | 2022.09 or later | 08JAN2024 |
| OCR - Azure AI Document Intelligence | Uses Microsoft Azure's Document Intelligence service to perform different types of OCR for files stored in the SAS Server file system or available at a URL | Sundaresh Sankaran / Jannic Horst | 2024.02 or later | 22MAR2024 |
| OCR - Document Analysis - Execute Batch OCR Process | Uses SAS Document Analysis to perform a batch run on files stored in the SAS Server file system | William Nadolski | 2024.08 or later | 09OCT2024 |
| OCR - Document Analysis - Produce Usage Report Output | Uses SAS Document Analysis to generate a usage report on previous batch processes | William Nadolski | 2024.08 or later | 29OCT2024 |
| Python - Load Objects to SAS | Load Python objects to SAS Compute or CAS tables | Sundaresh Sankaran | 2023.08 or later | 01SEP2022 |
| Python - Virtual Environments | A collection of 5 SAS Studio custom steps which help you create, activate, and switch between virtual Python environments for use within SAS Viya | Sundaresh Sankaran | 2020.1.5 or later | 12JUL2022 |
| R Runner | Submit R scripts with support for input and output table | Samiul Haque / Sundaresh Sankaran | 2023.08 or later | 18AUG2023 |
| Rank Columns - Starter template | Simple Example (based on template) | SAS | 2020.1.5 or later | 26AUG2022 |
| SAS Content - Copy File from File System | Copy file from Compute file system into SAS Content folder programmatically | Sundaresh Sankaran | 2022.11 or later | 09JAN2024 |
| SAS Content - Create Folder | Creates a new folder in SAS Content programmatically | Sundaresh Sankaran | 2022.11 or later | 18DEC2023 |
| SAS Content - Obtain Folder URI | Obtain URI of selected SAS Content folder and save it in a global macro variable | Sundaresh Sankaran | 2022.11 or later | 18DEC2023 |
| SCD Loader | Slowly Changing Dimensions loader with support for type 1 and type 2 changes | Torben Juul Johansson | 2020.1.5 or later | 28SEP2022 |
| SDG - Generate Synthetic Data through GANs | Generate synthetic data using a trained GAN model | Sundaresh Sankaran | 2022.09 or later | 04NOV2024 |
| SDG - Generate Synthetic Data through SMOTE | Generate synthetic data based on an input table, using the Synthetic Minority Oversampling TEchnique (SMOTE) | Sundaresh Sankaran | 2024.10 or later | 02NOV2024 |
| SDG - Train a Synthetic Data Generator through GANs | A collection of 3 steps which help you train, score and assess Synthetic Data models | Sundaresh Sankaran | 2024.10 or later | 04NOV2024 |
| Send SMTP Email | Send Email message | Mary Kathryn Queen | 2022.1.4 or later | 03APR2023 |
| Send Teams Message | Send Microsoft Teams Messages to a Teams channel | David Weik / Tamara Fischer | 2022.10 or later | 14JUN2023 |
| Surrogate Key Generator | Generates a surrogate key based on a business key | Torben Juul Johansson | 2020.1.5 or later | 29SEP2022 |
| Translate Text | Translates text stored in a column using DeepL API | David Weik | 2023.04 or later | 10MAY2023 |
| Update column labels | Update column labels from a (metadata) table, delimited file, or interactively | Ignacio Rodríguez | 2023.11 or later | 21DEC2023 |
| Vector Databases - Hydrate Chroma DB Collection | Populate a Chroma vector database collection with documents and embeddings contained in a CAS table | Sundaresh Sankaran | 2023.12 or later | 24JAN2024 |
| Vector Databases - Query Chroma DB Collection | Query a Chroma vector database collection with documents and store results in CAS table | Sundaresh Sankaran | 2023.12 or later | 30JAN2024 |
| Vector Search - Fast KNN | Identify nearest neighbors to observations in an input query table | Sundaresh Sankaran | 2023.11 or later | 09FEB2024 |
| Zip data | Move or copy files residing on the Compute file system to zip files using Python | Neil Griffin | 2024.05 or later | 25JUN2024 |