Machines

Machines are the compute resources that execute jobs in your Compute Share organization.

Overview

A machine in Compute Share represents a physical or virtual computer that runs the Compute Share agent and can execute jobs submitted by organization members.

Machine States

Machines can be in one of three states:

  • Available 🟢 - Ready to accept new jobs
  • Busy 🟡 - Currently executing one or more jobs
  • Offline ⚫ - Agent is not responding (no recent heartbeat)
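
The three states can be read as a simple decision rule. The sketch below is illustrative only and is not the agent's actual logic: the `classify_machine` function and the `OFFLINE_AFTER_SECONDS` threshold (three missed heartbeats) are assumptions made for this example. Only the 30-second heartbeat interval comes from this page.

```python
from enum import Enum

HEARTBEAT_INTERVAL_SECONDS = 30  # interval given under "Monitoring Health"
# Assumed threshold (three missed heartbeats); the real cutoff is not documented here.
OFFLINE_AFTER_SECONDS = 3 * HEARTBEAT_INTERVAL_SECONDS


class MachineState(Enum):
    AVAILABLE = "Available"
    BUSY = "Busy"
    OFFLINE = "Offline"


def classify_machine(seconds_since_heartbeat, running_jobs):
    """Map heartbeat age and running-job count to one of the three states."""
    if seconds_since_heartbeat > OFFLINE_AFTER_SECONDS:
        return MachineState.OFFLINE    # no recent heartbeat
    if running_jobs > 0:
        return MachineState.BUSY       # executing one or more jobs
    return MachineState.AVAILABLE      # responsive and idle


print(classify_machine(seconds_since_heartbeat=12, running_jobs=0).value)
print(classify_machine(seconds_since_heartbeat=12, running_jobs=2).value)
print(classify_machine(seconds_since_heartbeat=600, running_jobs=2).value)
```

Note that in this sketch a stale heartbeat takes priority over the job count, so a machine that stops reporting while running jobs is treated as Offline rather than Busy.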

Registering Machines

Initial Registration

Register a new machine using the CLI:

compute-share register --name "powerful-workstation" --token YOUR_ORG_TOKEN

After registration, the machine will appear in the Machines dashboard.

Machine Naming

Choose descriptive names that indicate:

  • Machine location (e.g., "office-gpu-1")
  • Hardware specs (e.g., "64gb-16core")
  • Purpose (e.g., "ml-training-box")

Admin Approval

By default, new machines require admin approval before they can accept jobs. Admins can approve or reject pending machines from the dashboard:

  1. Go to Machines
  2. Find the pending machine
  3. Click Approve or Reject

Managing Machines

Viewing Machine Details

The machine dashboard shows:

  • Current status (Available/Busy/Offline)
  • Last heartbeat timestamp
  • Running jobs count
  • Total jobs executed
  • Hardware information

Monitoring Health

Check machine health indicators:

  • Heartbeat: Should update every 30 seconds
  • Status: Should be "Available" when idle
  • Jobs: Monitor for failed or stuck executions

Removing Machines

To decommission a machine:

  1. Stop the agent on the machine:

    compute-share stop
    
  2. Remove the machine from the dashboard:

    • Navigate to Machines
    • Select the machine
    • Click Remove

Resource Limits

Configuring Limits

Control how much of your machine's resources can be used:

resources:
  cpu_limit: 80          # Max 80% CPU usage
  memory_limit: 16384    # Max 16 GB RAM (value in MB)
  disk_limit: 102400     # Max 100 GB disk (value in MB)
  max_concurrent_jobs: 4 # Max 4 jobs at once
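
To see how these limits interact, it can help to work out what each job gets when all slots are in use. The following is a minimal sketch, assuming the limits are shared evenly across concurrent jobs (this page does not say how the agent actually divides them) and that memory and disk values are in MB. The helper names `validate_limits` and `per_job_share` are hypothetical.

```python
# Values copied from the example configuration above.
limits = {
    "cpu_limit": 80,           # percent of total CPU
    "memory_limit": 16384,     # MB (assumed unit)
    "disk_limit": 102400,      # MB (assumed unit)
    "max_concurrent_jobs": 4,
}


def validate_limits(cfg):
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    if not 0 < cfg["cpu_limit"] <= 100:
        problems.append("cpu_limit must be greater than 0 and at most 100")
    for key in ("memory_limit", "disk_limit", "max_concurrent_jobs"):
        if cfg[key] <= 0:
            problems.append(f"{key} must be positive")
    return problems


def per_job_share(cfg):
    """Even split of each limit across the maximum number of concurrent jobs."""
    jobs = cfg["max_concurrent_jobs"]
    return {
        "cpu_percent": cfg["cpu_limit"] / jobs,
        "memory_mb": cfg["memory_limit"] // jobs,
        "disk_mb": cfg["disk_limit"] // jobs,
    }


print(validate_limits(limits))
print(per_job_share(limits))
```

With the example values, an even split leaves each of the four jobs 20% CPU, 4096 MB of RAM, and 25600 MB of disk, which is a useful sanity check before raising max_concurrent_jobs.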

GPU Support

For machines with GPUs:

resources:
  gpus:
    - device: 0
      memory: 8192  # MB
    - device: 1
      memory: 8192  # MB
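
As an illustration of how a per-device list like this might be used, the sketch below picks a GPU for a job based on its memory request. It is a hypothetical first-fit rule, not the Compute Share scheduler: the `pick_gpu` function, the `required_mb` parameter, and the `in_use` set are all assumptions introduced for the example.

```python
# Device list mirroring the example configuration above (memory in MB).
gpus = [
    {"device": 0, "memory": 8192},
    {"device": 1, "memory": 8192},
]


def pick_gpu(devices, required_mb, in_use=()):
    """Return the index of the first free device with enough memory, or None."""
    for gpu in devices:
        if gpu["device"] in in_use:
            continue                   # already assigned to another job
        if gpu["memory"] >= required_mb:
            return gpu["device"]
    return None                        # no single device can hold the job


print(pick_gpu(gpus, required_mb=6000))              # device 0 is free and large enough
print(pick_gpu(gpus, required_mb=6000, in_use={0}))  # device 0 is taken, so device 1
print(pick_gpu(gpus, required_mb=16000))             # exceeds every device's memory
```

In this sketch memory is never pooled across devices, so a 16000 MB request fails even though the two GPUs total 16384 MB.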

Troubleshooting

Machine Shows Offline

If a machine appears offline:

  1. Check that the agent is running:

    # The brackets keep grep from matching its own process
    ps aux | grep '[c]ompute-share'
    
  2. Verify network connectivity

  3. Check agent logs for errors

  4. Restart the agent

Jobs Not Being Assigned

Possible causes:

  • Machine status is not "Available"
  • Resource limits are too restrictive
  • Machine doesn't meet job requirements
  • Organization queue is empty
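
The causes above can be checked in order. The sketch below is a hypothetical diagnostic, not part of the Compute Share CLI or API: the function name, the argument shapes (plain dicts of resource name to amount), and the resource keys are assumptions chosen for illustration.

```python
def assignment_blockers(status, hardware, limits, required, queue_length):
    """List reasons a machine would not receive a job, following the causes above."""
    reasons = []
    if status != "Available":
        reasons.append(f"status is {status!r}, not 'Available'")
    for name, need in required.items():
        have = hardware.get(name, 0)   # what the machine physically has
        cap = limits.get(name, have)   # configured limit, if one is set
        if have < need:
            reasons.append(f"machine lacks {name}: needs {need}, has {have}")
        elif cap < need:
            reasons.append(f"{name} limit too restrictive: needs {need}, limit {cap}")
    if queue_length == 0:
        reasons.append("organization queue is empty")
    return reasons


reasons = assignment_blockers(
    status="Available",
    hardware={"memory_mb": 65536, "gpus": 0},
    limits={"memory_mb": 16384},
    required={"memory_mb": 32768, "gpus": 1},
    queue_length=3,
)
for reason in reasons:
    print(reason)
```

Here the machine has enough RAM but its configured limit is below the job's request, and it has no GPU at all, so both a limit problem and a hardware mismatch are reported.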

High Resource Usage

If jobs are consuming too much of a machine's resources, monitor usage and adjust the configuration:

  • Lower cpu_limit or memory_limit
  • Reduce max_concurrent_jobs
  • Check for runaway jobs
  • Review job resource requests

Best Practices

Maintenance

  • Keep the agent updated
  • Monitor disk space regularly
  • Review logs periodically
  • Test job execution after updates

Security

  • Keep registration tokens secure
  • Run agent with minimal privileges
  • Use firewall rules appropriately
  • Enable secure boot where possible

Performance

  • Allocate resources based on job patterns
  • Balance max_concurrent_jobs with available resources
  • Monitor for bottlenecks
  • Consider dedicated machines for heavy workloads

Next Steps