Machines
Machines are the compute resources that execute jobs in your Compute Share organization.
Overview
A machine in Compute Share represents a physical or virtual computer that runs the Compute Share agent and can execute jobs submitted by organization members.
Machine States
Machines can be in one of three states:
- Available 🟢 - Ready to accept new jobs
- Busy 🟡 - Currently executing one or more jobs
- Offline ⚫ - Agent is not responding (no recent heartbeat)
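The three states can be sketched as a small shell function. This is a minimal illustration, not how Compute Share itself computes status. It makes two assumptions: a machine counts as Offline once its last heartbeat is older than a timeout (90 seconds here, meaning three missed 30-second heartbeats; the real threshold is not documented), and a machine counts as Busy whenever it has at least one running job. The function name and its arguments are hypothetical.

```sh
# Hypothetical helper, not part of the compute-share CLI: classify a machine
# from the time of its last heartbeat and its number of running jobs.
# Usage: machine_state LAST_HEARTBEAT_EPOCH NOW_EPOCH RUNNING_JOBS [TIMEOUT_SECONDS]
# TIMEOUT_SECONDS defaults to 90, an assumed value (three missed 30-second heartbeats).
machine_state() {
  local last_heartbeat="$1" now="$2" running_jobs="$3" timeout="${4:-90}"

  if [ $((now - last_heartbeat)) -gt "$timeout" ]; then
    # No recent heartbeat: treat the agent as unresponsive, even mid-job.
    echo "Offline"
  elif [ "$running_jobs" -gt 0 ]; then
    echo "Busy"
  else
    echo "Available"
  fi
}

# Example: last heartbeat 20 seconds ago, no jobs running.
current=$(date +%s)
machine_state $((current - 20)) "$current" 0   # prints: Available
```

The function checks Offline first. As a result, a machine that stops sending heartbeats in the middle of a job is reported as Offline rather than Busy.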
Registering Machines
Initial Registration
Register a new machine using the CLI:
compute-share register --name "powerful-workstation" --token YOUR_ORG_TOKEN
After registration, the machine will appear in the Machines dashboard.
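If you script registration, you can keep the organization token out of your shell history by reading it from an environment variable. The sketch below wraps the documented `compute-share register --name ... --token ...` command. The function `register_machine` and the variable `COMPUTE_SHARE_TOKEN` are hypothetical names chosen for this example. The CLI is not assumed to read that variable on its own.

```sh
# Hypothetical wrapper around the documented `compute-share register` command.
# The token comes from COMPUTE_SHARE_TOKEN (a name chosen for this sketch),
# so it is never typed on the command line or saved in shell history.
# Usage: register_machine MACHINE_NAME
# Example (prompting without echoing the token):
#   stty -echo; printf 'Token: '; read -r COMPUTE_SHARE_TOKEN; stty echo; echo
#   export COMPUTE_SHARE_TOKEN
#   register_machine "office-gpu-1"
register_machine() {
  local name="$1"

  if [ -z "$name" ]; then
    echo "usage: register_machine MACHINE_NAME" >&2
    return 2
  fi
  if [ -z "${COMPUTE_SHARE_TOKEN:-}" ]; then
    echo "error: COMPUTE_SHARE_TOKEN is not set" >&2
    return 1
  fi

  compute-share register --name "$name" --token "$COMPUTE_SHARE_TOKEN"
}
```

The token is still passed as a command-line argument. While the command runs, other users on the same machine may be able to see it in the process list.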
Machine Naming
Choose descriptive names that indicate:
- Machine location (e.g., "office-gpu-1")
- Hardware specs (e.g., "64gb-16core")
- Purpose (e.g., "ml-training-box")
Admin Approval
By default, new machines require admin approval before they can accept jobs. Admins can approve machines from the dashboard:
- Go to Machines
- Find the pending machine
- Click Approve or Reject
Managing Machines
Viewing Machine Details
The machine dashboard shows:
- Current status (Available/Busy/Offline)
- Last heartbeat timestamp
- Running jobs count
- Total jobs executed
- Hardware information
Monitoring Health
Check machine health indicators:
- Heartbeat: Should update every 30 seconds
- Status: Should be "Available" when idle
- Jobs: Monitor for failed or stuck executions
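The heartbeat and idle-status indicators above can be turned into a quick check. This sketch encodes only those two rules. The function `check_health` is hypothetical, and its inputs are values you read from the machine details. The 60-second default (two heartbeat intervals) is an assumed tolerance for jitter, not a documented threshold.

```sh
# Hypothetical health check for the indicators above. The inputs are values
# you read from the machine details (dashboard); nothing here calls a
# compute-share API.
# Usage: check_health SECONDS_SINCE_HEARTBEAT STATUS RUNNING_JOBS [MAX_AGE_SECONDS]
# MAX_AGE_SECONDS defaults to 60 (two 30-second intervals), an assumed margin.
# Prints one line per problem and returns 1 if any problem was found.
check_health() {
  local age="$1" status="$2" jobs="$3" max_age="${4:-60}" problems=0

  # Heartbeats should arrive every 30 seconds.
  if [ "$age" -gt "$max_age" ]; then
    echo "heartbeat is ${age}s old (expected every 30s)"
    problems=1
  fi

  # An idle machine should report Available.
  if [ "$jobs" -eq 0 ] && [ "$status" != "Available" ]; then
    echo "idle machine reports '$status' instead of Available"
    problems=1
  fi

  return "$problems"
}
```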
Removing Machines
To decommission a machine:
- Stop the agent on the machine:
  compute-share stop
- Remove the machine from the dashboard:
  - Navigate to Machines
  - Select the machine
  - Click Remove
Resource Limits
Configuring Limits
Control how much of your machine's resources can be used:
resources:
  cpu_limit: 80             # Max 80% CPU usage
  memory_limit: 16384       # Max 16 GB RAM (value in MB)
  disk_limit: 102400        # Max 100 GB disk (value in MB)
  max_concurrent_jobs: 4    # Max 4 jobs at once
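Before tuning limits, it can help to check whether a given job would fit. This sketch rests on two assumptions. First, the limits apply to the machine as a whole, summed across running jobs. Second, memory and disk values are in MB, as the `# MB` comment in the GPU example suggests. `job_fits` and its arguments are hypothetical. The limit values are copied from the example above rather than read from the agent.

```sh
# Hypothetical pre-check: would one more job fit under the limits shown above?
# Assumptions: limits apply to the whole machine (summed over running jobs),
# and memory/disk are in MB. Values are copied from the example config.
CPU_LIMIT=80
MEMORY_LIMIT=16384
DISK_LIMIT=102400
MAX_CONCURRENT_JOBS=4

# Usage: job_fits RUNNING_JOBS USED_CPU USED_MEM USED_DISK REQ_CPU REQ_MEM REQ_DISK
job_fits() {
  local running="$1" used_cpu="$2" used_mem="$3" used_disk="$4"
  local req_cpu="$5" req_mem="$6" req_disk="$7"

  if [ "$running" -ge "$MAX_CONCURRENT_JOBS" ]; then
    echo "no free job slot ($running of $MAX_CONCURRENT_JOBS running)"; return 1
  fi
  if [ $((used_cpu + req_cpu)) -gt "$CPU_LIMIT" ]; then
    echo "cpu would reach $((used_cpu + req_cpu))% (limit ${CPU_LIMIT}%)"; return 1
  fi
  if [ $((used_mem + req_mem)) -gt "$MEMORY_LIMIT" ]; then
    echo "memory would reach $((used_mem + req_mem)) MB (limit $MEMORY_LIMIT MB)"; return 1
  fi
  if [ $((used_disk + req_disk)) -gt "$DISK_LIMIT" ]; then
    echo "disk would reach $((used_disk + req_disk)) MB (limit $DISK_LIMIT MB)"; return 1
  fi
  echo "fits"
}

# One job running (30% CPU, 4096 MB, 20000 MB disk); new job asks for 40%, 8192 MB, 10000 MB.
job_fits 1 30 4096 20000 40 8192 10000   # prints: fits
```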
GPU Support
For machines with GPUs:
resources:
  gpus:
    - device: 0
      memory: 8192  # MB
    - device: 1
      memory: 8192  # MB
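The following minimal sketch matches a job's GPU memory request against the configured devices. `pick_gpu` is hypothetical. It takes devices as `DEVICE:MEMORY_MB` pairs that mirror the config above, and it ignores memory already used by other jobs. It does not describe how the agent actually schedules GPUs.

```sh
# Hypothetical helper: pick the first GPU whose configured memory covers a
# job's request. Memory already in use by other jobs is not considered.
# Usage: pick_gpu REQUIRED_MB DEVICE:MEMORY_MB...
pick_gpu() {
  local required="$1" pair device memory
  shift

  for pair in "$@"; do
    device=${pair%%:*}    # text before the colon
    memory=${pair#*:}     # text after the colon
    if [ "$memory" -ge "$required" ]; then
      echo "$device"
      return 0
    fi
  done

  echo "no GPU with ${required} MB available" >&2
  return 1
}

pick_gpu 6000 0:8192 1:8192   # prints: 0
```

First-fit keeps the sketch simple. A best-fit choice (the smallest GPU that still satisfies the request) would leave larger GPUs free for larger jobs.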
Troubleshooting
Machine Shows Offline
If a machine appears offline:
- Check the agent is running:
  ps aux | grep compute-share
- Verify network connectivity
- Check agent logs for errors
- Restart the agent
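The process and log checks can be scripted. This sketch assumes the agent runs as a process named `compute-share`, as the `ps aux | grep compute-share` check suggests. No log location is documented here, so the sketch takes the log file path as an argument. `diagnose_offline` is a hypothetical name, and network checks and restarting are left to you.

```sh
# Hypothetical first-pass checks for an offline machine. Assumptions: the agent
# runs as a process named "compute-share", and you know where its log file is.
# Usage: diagnose_offline LOG_FILE
diagnose_offline() {
  local log_file="$1"

  if ! command -v pgrep >/dev/null 2>&1; then
    echo "agent process: unknown (pgrep not installed)"
  elif pgrep -x compute-share >/dev/null; then
    # -x matches the process name exactly, unlike `ps aux | grep`, which
    # also matches the grep command itself.
    echo "agent process: running"
  else
    echo "agent process: NOT running"
  fi

  if [ -r "$log_file" ]; then
    # Show the five most recent lines mentioning "error" (case-insensitive).
    echo "recent lines mentioning errors in $log_file:"
    grep -i error "$log_file" | tail -n 5
  else
    echo "log file not readable: $log_file"
  fi
}
```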
Jobs Not Being Assigned
Possible causes:
- Machine is still awaiting admin approval
- Machine status is not "Available"
- Resource limits are too restrictive
- Machine doesn't meet job requirements
- Organization queue is empty
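The causes listed above can be walked through in order. `why_no_jobs` is a hypothetical checklist function. Every input is something you read from the Machines dashboard or the job definition. The approval check comes from the admin approval requirement described earlier.

```sh
# Hypothetical checklist for "jobs not being assigned". It only encodes the
# causes listed above; all inputs are read manually, not fetched from an API.
# Usage: why_no_jobs STATUS APPROVED(yes|no) QUEUED_JOBS MEETS_REQUIREMENTS(yes|no)
why_no_jobs() {
  local status="$1" approved="$2" queued="$3" meets="$4" found=0

  if [ "$approved" != "yes" ]; then
    echo "machine is still awaiting admin approval"; found=1
  fi
  if [ "$status" != "Available" ]; then
    echo "machine status is $status, not Available"; found=1
  fi
  if [ "$queued" -eq 0 ]; then
    echo "organization queue is empty"; found=1
  fi
  if [ "$meets" != "yes" ]; then
    echo "job requirements not met or resource limits too restrictive"; found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "no obvious cause; check the agent logs"
  fi
}
```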
High Resource Usage
Monitor and adjust:
- Lower cpu_limit or memory_limit
- Reduce max_concurrent_jobs
- Check for runaway jobs
- Review job resource requests
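One way to spot runaway jobs is to compare each job's runtime against a threshold. This sketch assumes you already have job IDs and runtimes in seconds as `JOB_ID RUNTIME_SECONDS` lines. How you obtain them is not covered here. The 4-hour default threshold is an arbitrary choice, and `find_runaway_jobs` is hypothetical.

```sh
# Hypothetical filter: reads "JOB_ID RUNTIME_SECONDS" lines on stdin and
# prints jobs that have run longer than MAX_SECONDS (default 14400 = 4 hours).
# Usage: <job listing> | find_runaway_jobs [MAX_SECONDS]
find_runaway_jobs() {
  # "+ 0" forces numeric comparison in any POSIX awk.
  awk -v max="${1:-14400}" '$2 + 0 > max + 0 { print $1 " has run for " $2 "s" }'
}

printf 'job-1 600\njob-2 20000\n' | find_runaway_jobs   # prints: job-2 has run for 20000s
```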
Best Practices
Maintenance
- Keep the agent updated
- Monitor disk space regularly
- Review logs periodically
- Test job execution after updates
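For monitoring disk space, the standard `df` utility is enough. The sketch below reports how full the filesystem holding a directory is and warns above a threshold. `check_disk` and its 90% default are assumptions. The choice of directory is also left to you, because where the agent stores job data is not documented here.

```sh
# Minimal disk-space check. Usage: check_disk DIRECTORY [MAX_PERCENT]
# MAX_PERCENT defaults to 90, an assumed threshold.
check_disk() {
  local dir="$1" max="${2:-90}" used

  # df -P prints one line per filesystem; column 5 is "Capacity", e.g. "42%".
  used=$(df -P "$dir" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$used" ]; then
    echo "cannot read disk usage for $dir" >&2
    return 2
  fi

  if [ "$used" -gt "$max" ]; then
    echo "WARNING: filesystem holding $dir is ${used}% full (threshold ${max}%)"
    return 1
  fi
  echo "$dir: ${used}% used"
}
```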
Security
- Keep registration tokens secure
- Run the agent with minimal privileges
- Use firewall rules appropriately
- Enable secure boot where possible
Performance
- Allocate resources based on job patterns
- Balance max_concurrent_jobs with available resources
- Monitor for bottlenecks
- Consider dedicated machines for heavy workloads
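Balancing max_concurrent_jobs against the other limits is simple arithmetic. The number of jobs that can run at once is bounded by each limit divided by what a typical job needs, and the tighter bound wins. In the sketch below, `suggest_concurrency` is hypothetical, and the per-job figures are estimates you supply from your own job patterns.

```sh
# Hypothetical helper: suggest a max_concurrent_jobs value from memory and CPU limits.
# Usage: suggest_concurrency MEMORY_LIMIT_MB JOB_MEMORY_MB CPU_LIMIT_PERCENT JOB_CPU_PERCENT
suggest_concurrency() {
  if [ "$2" -le 0 ] || [ "$4" -le 0 ]; then
    echo "per-job memory and CPU must be positive" >&2
    return 1
  fi

  # Integer division: how many typical jobs each limit allows.
  local by_mem=$(( $1 / $2 )) by_cpu=$(( $3 / $4 ))

  # The tighter of the two bounds wins.
  if [ "$by_mem" -lt "$by_cpu" ]; then
    echo "$by_mem"
  else
    echo "$by_cpu"
  fi
}

# Example limits (16384 MB, 80% CPU) with jobs needing about 6000 MB and 20% CPU:
suggest_concurrency 16384 6000 80 20   # prints: 2
```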
Next Steps
- Learn about submitting jobs
- Manage your organization
- Review Installation for advanced configuration