The "CAS - Load Tables from Folders in Filesystem" custom step loads all files matching a specified pattern from a filesystem folder directly into in-memory Cloud Analytics Services (CAS) tables, without pulling data through SAS Compute. Each file in the folder is loaded to a separate table named after the file (without its extension).
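As a rough illustration of the naming rule, the sketch below maps file names to the table names they would become by dropping the final extension. It is plain Python, not part of the custom step, and `table_name_for` and `planned_tables` are hypothetical helpers. It assumes only the last suffix is stripped (so `archive.tar.gz` would map to `archive.tar`).

```python
from pathlib import PurePath


def table_name_for(file_name: str) -> str:
    """Hypothetical helper: the file name minus its final extension."""
    return PurePath(file_name).stem


def planned_tables(file_names: list[str]) -> dict[str, str]:
    """Map each file name to the table name it would be loaded into."""
    return {name: table_name_for(name) for name in file_names}


if __name__ == "__main__":
    for source, table in planned_tables(["sales.csv", "customers.sas7bdat"]).items():
        print(f"{source} -> {table}")
```

Running it prints one `file -> table` line per input file, which can be a quick way to preview the table names before running the step.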
This custom step is useful when you need to load multiple files to CAS in a single operation.
For example, suppose your source code and data are stored in a Git repository. After cloning the repository using Git Integration, you can run this step to load all the required data to CAS tables.
Tested in SAS Viya 4, Stable 2022.11.
Note that this custom step is intended to output Cloud Analytics Services (CAS) tables. Ensure a CAS connection is established before running this step. References to output table names and locations below refer to CAS tables and caslibs.
- Folder containing Input Data: Select the folder on the filesystem that contains your input data. If the folder path is already assigned to a caslib, that caslib is used as the input caslib from which data is loaded, and files that have already been loaded to CAS may be reloaded. Proceed with the step only if this is what you want; otherwise, run a loadTable action that references the existing caslib directly.
- File extension: Optionally, filter on a specific file extension. Currently you can select either a single extension or ALL; a future enhancement will allow you to select multiple extensions.
- Note that when a folder contains files with the same name but different extensions, the CAS loadTable action tries to load them in a fixed order (for example, .sashdat files are processed before .sas7bdat files, which are processed before .csv files) and will overwrite CAS tables of the same name. A future update will address this issue. For now, check your folder contents beforehand and resolve duplicate file names before running this step.
- Pattern: Optionally, provide a wildcard pattern so that only files whose names match the pattern are loaded. Leave this blank to skip pattern filtering. You do not need to include the % wildcard character in the pattern.
- Output Caslib: Provide an output caslib name (PUBLIC is the default). If you also want to promote the output tables, the output caslib must be a global caslib.
- Promote Output Tables: By default, each CAS table is promoted upon load. Uncheck this box if you want to use the tables only within your SAS Studio session.
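The filtering and the duplicate-name check described in the list above can be approximated in plain Python before running the step. This is a hypothetical sketch, not the step's implementation: `select_files` and `duplicate_table_names` are illustrative names, and the semantics are assumptions. Extensions are compared case-insensitively without the leading dot, and the pattern is treated as a substring match on the file name without its extension (mimicking a LIKE '%pattern%' filter). Duplicate detection groups files whose names differ only by extension, comparing names case-insensitively.

```python
from collections import defaultdict
from pathlib import PurePath


def select_files(file_names, extension="ALL", pattern=""):
    """Return the files that would be loaded under the assumed filter semantics."""
    selected = []
    for name in file_names:
        path = PurePath(name)
        ext = path.suffix.lstrip(".").lower()
        # "ALL" disables the extension filter; otherwise require an exact match.
        if extension.upper() != "ALL" and ext != extension.lstrip(".").lower():
            continue
        # Assumed substring match on the name without its extension; blank means no filter.
        if pattern and pattern not in path.stem:
            continue
        selected.append(name)
    return selected


def duplicate_table_names(file_names):
    """Group files that would collide as tables (same name, different extension)."""
    groups = defaultdict(list)
    for name in file_names:
        # Assumed case-insensitive comparison of table names.
        groups[PurePath(name).stem.lower()].append(name)
    return {stem: names for stem, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    files = ["sales.csv", "sales.sas7bdat", "customers.csv", "notes.txt"]
    print(select_files(files, extension="csv"))
    print(select_files(files, pattern="sal"))
    print(duplicate_table_names(files))
```

For the sample list, the duplicate check reports `sales.csv` and `sales.sas7bdat` as a collision, which is the situation the note on duplicate file names warns about.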
Ensure you have write access to the caslib you want to save output tables to. Also, when writing output tables to commonly used or shared caslibs (such as PUBLIC), be aware that the output changes for all users of that caslib and table.
The number of output tables depends on the number of matching files in the selected folder. You can use these output tables for visualization in SAS Visual Analytics.
To check the status of the load, refer either to the Libraries tab in SAS Studio (left navigation bar) or to Manage Data / Prepare Data in the main menu.
Refer to the "About" tab on the step for further details.
Here's SAS documentation for the two main actions used within this Custom Step.
- A SAS Viya 4 environment (monthly release 2022.07 or later) with SAS Studio Flows.
- Refer to the steps listed here.
Version 1.1 (21DEC2022)