Collecting System Information from a Cluster

Problem Definition

Goal: Use actor-IaC to collect system information (CPU, memory, disk, GPU, OS, network) from multiple compute nodes in a cluster.

In cluster management, knowing the hardware and software configuration of each node is a fundamental task. With actor-IaC, you can run the same information-collection steps on multiple nodes in parallel.

Network Configuration Assumed

This tutorial assumes that the operator terminal connects to the cluster's compute nodes over SSH. The following diagram shows the network configuration between the operator terminal and the cluster. The operator terminal is the machine running actor-IaC. If it is on a network outside the cluster, it reaches each compute node through a gateway.

This tutorial uses the following network configuration as an example.

Item	Value
Gateway IP address	192.168.5.1
Compute node IP addresses	192.168.5.13, .14, .15, .21, .22, .23
SSH username	youruser

Tutorial Structure

This tutorial proceeds in the following order.

Satisfying Prerequisites

For actor-IaC to connect to remote nodes, SSH authentication must be configured. Choose either public key authentication or password authentication.

SSH Connection with Public Key Authentication: Configure authentication using SSH key pairs. Recommended for security and convenience.
SSH Password Authentication: Configure password-based authentication when key pairs cannot be used.
Sudo Execution: Configure sudo for commands requiring elevated privileges.
SSH Non-Interactive Mode: Handle non-interactive SSH sessions.
Login Configuration Reference: Detailed reference for inventory file login settings.
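Once authentication is configured, a quick manual check can confirm that every node accepts non-interactive SSH through the gateway before actor-IaC is involved. The sketch below is written for bash. It assumes the OpenSSH client (the -J ProxyJump option needs OpenSSH 7.3 or later), key-based authentication, and the example addresses from the network table above. check_nodes and SSH_CMD are hypothetical names used only here, and the script does not read actor-IaC's own gateway settings.

```shell
#!/usr/bin/env bash
# Sketch: confirm non-interactive SSH to every compute node through the
# gateway. Assumes the OpenSSH client (-J needs OpenSSH 7.3+), key-based
# authentication, and the example addresses above. check_nodes and SSH_CMD
# are hypothetical names used only in this sketch.

GATEWAY="youruser@192.168.5.1"
SSH_USER="youruser"
NODES="192.168.5.13 192.168.5.14 192.168.5.15 192.168.5.21 192.168.5.22 192.168.5.23"

# Command used to connect; override it (for example with a stub) for a dry run.
SSH_CMD="${SSH_CMD:-ssh}"

check_nodes() {
  local ip failed=0
  for ip in $NODES; do
    # BatchMode=yes makes ssh fail instead of prompting (for a password,
    # for example), which is how an unattended tool has to connect.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 \
         -J "$GATEWAY" "$SSH_USER@$ip" true >/dev/null 2>&1; then
      echo "OK   $ip"
    else
      echo "FAIL $ip"
      failed=1
    fi
  done
  return "$failed"
}
```

After sourcing the script, run check_nodes: it prints OK or FAIL for each address and returns non-zero if any node failed. With BatchMode enabled, the first connection to a host whose key is not yet in known_hosts also fails, so accept each host key once interactively beforehand. If the operator terminal sits inside the cluster network, drop the -J option.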

Main Topic

Once prerequisites are met, create and execute a workflow.

System Information Collection Workflow: Complete example of creating and executing a workflow to collect system information from multiple nodes.

How to do it

Prerequisites

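actor-IaC must be installed (see the installation tutorial). The commands below run ./actor_iac.java from ~/works/testcluster-iac, so place actor_iac.java there.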
SSH requirements must be satisfied

1. Create Working Directory

mkdir -p ~/works/testcluster-iac/sysinfo
cd ~/works/testcluster-iac

2. Create Inventory File

Define the target nodes for system information collection in inventory.ini.

cat > inventory.ini << 'EOF'
[compute]
node13 actoriac_host=192.168.5.13
node14 actoriac_host=192.168.5.14
node15 actoriac_host=192.168.5.15
node21 actoriac_host=192.168.5.21
node22 actoriac_host=192.168.5.22
node23 actoriac_host=192.168.5.23

[compute:vars]
actoriac_user=youruser
EOF

Item	Description
`[compute]`	Group name
`node13` to `node23`	Identifier for each node
`actoriac_host=...`	IP address of each node
`[compute:vars]`	Variables applied to the entire group
`actoriac_user=youruser`	SSH connection username
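To see how the entries above combine, the following bash sketch prints each node's name, address, and the user inherited from [compute:vars]. It assumes the simple INI layout used in this tutorial and is only an illustration of how the file reads. show_inventory is a hypothetical helper, not actor-IaC's own parser.

```shell
# Hypothetical helper: print "name<TAB>host<TAB>user" for each node in an
# inventory laid out like the one above. It assumes the simple INI form used
# here ([group] sections with "name key=value" lines, plus [group:vars]
# whose values apply to every node in that group). It only illustrates how
# the file reads; it is not actor-IaC's own parser.
show_inventory() {
  awk '
    /^[[:space:]]*$/ || /^[[:space:]]*[#;]/ { next }   # skip blank and comment lines
    /^\[.*:vars]$/ {                                    # [group:vars] section
      sect = "vars"; grp = substr($0, 2, length($0) - 7); next
    }
    /^\[.*]$/ {                                         # [group] section
      sect = "hosts"; grp = substr($0, 2, length($0) - 2); next
    }
    sect == "hosts" {                                   # "name key=value ..."
      name = $1; order[++n] = name; group[name] = grp
      for (i = 2; i <= NF; i++) {
        split($i, kv, "="); hostvar[name, kv[1]] = kv[2]
      }
      next
    }
    sect == "vars" {                                    # "key=value"
      split($0, kv, "="); groupvar[grp, kv[1]] = kv[2]
    }
    END {
      for (j = 1; j <= n; j++) {
        name = order[j]
        printf "%s\t%s\t%s\n", name, hostvar[name, "actoriac_host"],
               groupvar[group[name], "actoriac_user"]
      }
    }
  ' "${1:-inventory.ini}"
}
```

Running show_inventory in ~/works/testcluster-iac lists one line per node, for example node13, 192.168.5.13 and youruser separated by tabs, which makes it easy to spot a mistyped address before running a workflow.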

3. Create Sub-Workflow

Create sysinfo/collect-sysinfo.yaml, the sub-workflow that runs on each node.

cat > sysinfo/collect-sysinfo.yaml << 'EOF'
name: collect-sysinfo

description: |
  Sub-workflow to collect system information from each compute node.
  Retrieves hostname, OS, CPU, memory, disk, GPU, and network information.

steps:
  - states: ["0", "1"]
    note: Retrieve hostname and OS information
    actions:
      - actor: this
        method: executeCommand
        arguments:
          - |
            echo "===== HOSTNAME ====="
            hostname -f
            echo ""
            echo "===== OS INFO ====="
            grep -E "^(NAME|VERSION|ID)=" /etc/os-release
            uname -a

  - states: ["1", "2"]
    note: Retrieve CPU architecture, core count, and model name
    actions:
      - actor: this
        method: executeCommand
        arguments:
          - |
            echo "===== CPU INFO ====="
            lscpu | grep -E "^(Architecture|CPU\(s\)|Model name|Thread|Core|Socket)"

  - states: ["2", "3"]
    note: Retrieve memory capacity (total, used, free)
    actions:
      - actor: this
        method: executeCommand
        arguments:
          - |
            echo "===== MEMORY INFO ====="
            free -h

  - states: ["3", "4"]
    note: Retrieve disk device list and mount status
    actions:
      - actor: this
        method: executeCommand
        arguments:
          - |
            echo "===== DISK INFO ====="
            lsblk -d -o NAME,SIZE,TYPE,MODEL 2>/dev/null || lsblk -d -o NAME,SIZE,TYPE
            echo ""
            df -h | grep -E "^(/dev|Filesystem)"

  - states: ["4", "5"]
    note: Retrieve GPU presence and model name (uses nvidia-smi for NVIDIA GPUs)
    actions:
      - actor: this
        method: executeCommand
        arguments:
          - |
            echo "===== GPU INFO ====="
            if command -v nvidia-smi > /dev/null 2>&1; then
                nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader 2>/dev/null || echo "nvidia-smi failed"
            else
                lspci 2>/dev/null | grep -i -E "(vga|3d|display)" || echo "No GPU detected via lspci"
            fi

  - states: ["5", "end"]
    note: Retrieve network interfaces and IP addresses
    actions:
      - actor: this
        method: executeCommand
        arguments:
          - |
            echo "===== NETWORK INFO ====="
            ip -4 addr show | grep -E "(^[0-9]+:|inet )" | head -20
EOF
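Each step above moves the workflow from one state to the next ("0" to "1", and so on, up to "end"). A small bash check can confirm that these pairs form an unbroken path before anything runs on the cluster. This is a sketch under two assumptions: that each states: ["from", "to"] pair is a transition, and that every state has exactly one outgoing step, as in this tutorial. check_state_chain is a hypothetical helper and only recognises the one-line states: form used here.

```shell
# Hypothetical helper: check that the "states" pairs of a workflow form one
# path from "0" to "end". It assumes each step's states: ["from", "to"] is a
# transition and that every state has a single outgoing step, as in this
# tutorial; it only recognises the one-line form states: ["a", "b"].
check_state_chain() {
  # Extract "from to" pairs from lines like:  - states: ["0", "1"]
  sed -n 's/^[[:space:]]*-[[:space:]]*states:[[:space:]]*\["\([^"]*\)",[[:space:]]*"\([^"]*\)"].*/\1 \2/p' "$1" |
  awk '
    { next_state[$1] = $2 }
    END {
      cur = "0"; steps = 0
      while (cur != "end") {
        if (!(cur in next_state)) { printf "no step leaves state \"%s\"\n", cur; exit 1 }
        if (++steps > NR) { print "loop detected"; exit 1 }
        cur = next_state[cur]
      }
      printf "OK: %d step(s), 0 -> end\n", steps
    }'
}
```

For the file above, check_state_chain sysinfo/collect-sysinfo.yaml prints "OK: 6 step(s), 0 -> end". A typo such as ["2", "4"] instead of ["2", "3"] is reported as a state that no step leaves.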

4. Create Main Workflow

Create sysinfo/main-collect-sysinfo.yaml, the main workflow that calls the sub-workflow.

cat > sysinfo/main-collect-sysinfo.yaml << 'EOF'
name: main-collect-sysinfo

description: |
  Main workflow to collect system information from all compute nodes in parallel.
  Executes collect-sysinfo.yaml on each node.

steps:
  - states: ["0", "end"]
    note: Execute collect-sysinfo.yaml in parallel on all nodes
    actions:
      - actor: nodeGroup
        method: apply
        arguments:
          actor: "node-*"
          method: runWorkflow
          arguments: ["collect-sysinfo.yaml"]
EOF
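The main workflow selects its targets with the pattern "node-*". Assuming glob-style matching similar to the shell's (an assumption; this tutorial shows only this one pattern), the bash sketch below shows which actor names such a pattern selects. match_actors is a hypothetical helper, and the names passed to it are the actor names that appear in the output prefixes, such as node-node13.

```shell
# Hypothetical helper: print the names that a glob-style pattern selects.
# Assumes actor-IaC matches patterns like "node-*" the way the shell matches
# globs; check the actor-IaC reference for the exact rules.
match_actors() {
  local pattern="$1" name
  shift
  for name in "$@"; do
    # An unquoted $pattern in a case label is treated as a glob pattern.
    case "$name" in
      $pattern) echo "$name" ;;
    esac
  done
}
```

For example, match_actors 'node-*' nodeGroup node-node13 node-node14 prints node-node13 and node-node14 but not nodeGroup, because "nodeGroup" has no hyphen after "node". Under this assumption, a pattern such as "node-node1*" would select only node13 to node15.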

5. Execute Workflow

./actor_iac.java run -w sysinfo/main-collect-sysinfo.yaml -i inventory.ini -g compute

Option	Description
`-w sysinfo/main-collect-sysinfo.yaml`	Workflow file to execute
`-i inventory.ini`	Inventory file
`-g compute`	Target group

actor-IaC runs the sub-workflow on all six nodes in parallel. Output from each node is prefixed with its node actor name, such as [node-node13]. All logs are also saved automatically to actor-iac-logs.mv.db.
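Because the six nodes run concurrently, their console lines are interleaved. If you capture the console with tee, a short bash sketch can regroup it by node. This assumes that every line of node output starts with a [node-<name>] prefix and that the run writes to standard output; split_by_node and the by-node directory are hypothetical names. The saved log database remains the authoritative record.

```shell
# Hypothetical helper: split a captured console log into one file per node.
# Assumes each line of node output starts with a "[node-<name>]" prefix and
# that the run was captured with tee, for example:
#   ./actor_iac.java run -w sysinfo/main-collect-sysinfo.yaml -i inventory.ini -g compute | tee run.log
# Lines without such a prefix are ignored.
split_by_node() {
  local log="$1" outdir="${2:-by-node}"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    match($0, /^\[node-[^]]+]/) {
      name = substr($0, 2, RLENGTH - 2)   # e.g. node-node13
      line = substr($0, RLENGTH + 1)
      sub(/^ /, "", line)                 # drop the space after the prefix
      print line > (dir "/" name ".txt")
    }
  ' "$log"
}
```

After split_by_node run.log, the by-node directory holds one file per node, such as by-node/node-node13.txt, with the prefix removed.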

If SSH authentication errors occur, see the troubleshooting section of the SSH requirements tutorial.

6. Directory Structure

Directory structure after completion:

~/works/testcluster-iac/
├── actor_iac.java
├── actor-iac-logs.mv.db    ← Automatically created after execution
├── inventory.ini
└── sysinfo/
    ├── collect-sysinfo.yaml
    └── main-collect-sysinfo.yaml

7. Verify Workflow Contents (Optional)

You can inspect workflows with the list and describe commands.

Workflow list:

./actor_iac.java list -w sysinfo

Workflow details:

./actor_iac.java describe -w sysinfo/collect-sysinfo.yaml --steps

The describe command displays the description: and note: fields you write in a workflow.

Under the hood

Actor Tree Generation

When a workflow runs, actor-IaC reads the inventory file and builds an actor tree. It creates one node actor for each node defined in the inventory and places it as a child of the nodeGroup actor. Each node actor is named node- followed by the inventory name, for example node-node13. In this example, six node actors are created for the six nodes (node13 to node23).

ROOT actor
└── nodeGroup actor ("nodeGroup")
    ├── node-node13 actor
    ├── node-node14 actor
    ├── node-node15 actor
    ├── node-node21 actor
    ├── node-node22 actor
    └── node-node23 actor

The purpose of actor-IaC is to execute the same configuration tasks in parallel on multiple servers. To achieve parallel execution, actor-IaC places multiple node actors as child actors under the nodeGroup actor.

Actor	Role
nodeGroup actor	Executes the main workflow and controls which sub-workflows run on which nodes
node actor	Executes a sub-workflow, which defines the specific commands to run on that node

Parallel Execution of Sub-Workflows

When the main workflow main-collect-sysinfo.yaml calls the nodeGroup.apply() method, the nodeGroup actor runs the sub-workflow collect-sysinfo.yaml in parallel on every child actor whose name matches the specified pattern (node-*). Each node actor executes the commands on its remote node over an SSH connection and returns the results.
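The fan-out described above can be approximated in plain bash, which may help when reasoning about what nodeGroup.apply does. This is an analogy, not actor-IaC's implementation: fan_out, RUN_ON, and the name:host argument format are hypothetical, and the default RUN_ON=ssh assumes each host is reachable directly or through a ProxyJump entry in ~/.ssh/config.

```shell
# Analogy only: a plain-bash version of the fan-out pattern, NOT actor-IaC's
# implementation. Each argument is a hypothetical "name:host" pair; RUN_ON is
# a hypothetical hook that runs a command on a host (ssh by default).
RUN_ON="${RUN_ON:-ssh}"

fan_out() {
  local cmd="$1"
  shift
  local pair name host pid rc=0
  local pids=()
  for pair in "$@"; do
    name="${pair%%:*}"
    host="${pair#*:}"
    # One background job per node; every output line gets a [node-<name>]
    # prefix. pipefail makes the job fail if the remote command fails.
    ( set -o pipefail; "$RUN_ON" "$host" "$cmd" 2>&1 | sed "s/^/[node-$name] /" ) &
    pids+=("$!")
  done
  # Wait for all jobs; report failure if any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, fan_out 'hostname -f' node13:youruser@192.168.5.13 node14:youruser@192.168.5.14 prints each node's hostname with its prefix. Because the jobs run concurrently, lines from different nodes can appear in any order, which is why each line carries the node name.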

For aggregating and persisting log output, see the tutorial on using the results.
