Skip to content

feat(opentofu-agent): add OpenTofu Agent support#1966

Open
waveywaves wants to merge 14 commits intodistr-sh:mainfrom
waveywaves:opentofu-agent
Open

feat(opentofu-agent): add OpenTofu Agent support#1966
waveywaves wants to merge 14 commits intodistr-sh:mainfrom
waveywaves:opentofu-agent

Conversation

@waveywaves
Copy link

@waveywaves waveywaves commented Mar 6, 2026

Summary

Adds OpenTofu Agent support to Distr, enabling infrastructure provisioning alongside container deployments. Vendors can now manage the full deployment lifecycle — applications and infrastructure — through a single Distr workflow.

Screenshot 2026-03-21 at 3 09 57 AM

Closes #1946

What's included

  • Database schema (migrations 84-85): opentofu deployment type, tofu columns on applicationversion/deploymentrevision, opentofu_state table for lock management
  • Agent manifest & connect: OpenTofu branch in connect flow, agent manifest includes tofu config URL/version
  • HTTP state backend: GET/POST state with S3 storage, POST lock/unlock with PostgreSQL locking, Basic Auth middleware
  • Agent binary (cmd/agent/opentofu): Polling loop, reconciliation, OCI artifact pull, tofu init/plan/apply/destroy via terraform-exec SDK
  • Dockerfile: Multi-stage build for the OpenTofu agent container
  • DEP document: Full design proposal with architecture, security model, and API specification

Architecture

Hub (API + Registry + State Backend)
  ├── Agent polls /api/v1/agent/manifest for deployment assignments
  ├── Agent pulls OCI artifacts from Hub's built-in registry
  ├── Agent runs tofu init/plan/apply with HTTP backend config
  └── State stored in S3 (MinIO/AWS) with PostgreSQL-based locking

The agent authenticates via target ID + secret (same as Docker/K8s agents), pulls OpenTofu configs packaged as OCI artifacts, and executes tofu apply with state managed through the Hub's HTTP backend.

Key design decisions

  • POST for lock/unlock instead of LOCK/UNLOCK HTTP methods (chi router limitation)
  • Basic Auth for state endpoints (OpenTofu http backend only sends Basic Auth, not Bearer)
  • Buffered S3 body for PutObject (AWS SDK v2 requires seekable stream for checksums on non-TLS)
  • Short workspace paths (ws/ not workspaces/) to stay under macOS path length limits with .terraform/

Test plan

  • UAT: Full agent flow verified locally (auth → poll → OCI pull → tofu init → plan → apply → state in Hub)
  • UAT: Interrupted deployments (state=installing) are retried on next poll cycle
  • UAT: EC2 instance deployed on LocalStack via OpenTofu agent through Distr workflow
  • Demo recording with asciinema (to be attached as comment)
  • CI passes

Assisted by: Claude [email protected]

Copilot AI review requested due to automatic review settings March 6, 2026 22:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds OpenTofu (tofu) agent support to Distr so vendors can provision infrastructure (via OpenTofu/Terraform-compatible workflows) alongside existing Docker/Kubernetes deployments, including a Hub-hosted HTTP state backend with S3 storage and PostgreSQL-based locking.

Changes:

  • Introduces opentofu as a new DeploymentType, extends ApplicationVersion/DeploymentRevision data model, and updates DB schema (enum + tofu columns + opentofu_state lock table + updated constraints).
  • Adds Hub-side HTTP state backend endpoints (GET/POST state + lock/unlock) protected by Basic Auth using deployment-target credentials.
  • Adds a new OpenTofu agent binary (poll/reconcile/apply/destroy), an embedded connect manifest template, and an agent container Dockerfile.

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
internal/types/types.go Adds DeploymentTypeOpenTofu constant.
internal/types/deployment_target.go Allows OpenTofu deployment target type in validation switch.
internal/types/deployment_revision.go Adds tofu vars/backend config/version fields on deployment revisions.
internal/types/deployment.go Adds tofu fields to DeploymentWithLatestRevision (hidden from JSON).
internal/types/application_version.go Adds tofu config URL/version fields and OpenTofu-specific validation.
internal/resources/embedded/agent/opentofu/v1/docker-compose.yaml.tmpl New docker-compose template for OpenTofu agent.
internal/migrations/sql/84_add_opentofu_deployment_type.up.sql Adds opentofu value to deployment_type enum.
internal/migrations/sql/84_add_opentofu_deployment_type.down.sql No-op down migration for enum value.
internal/migrations/sql/85_add_opentofu_columns.up.sql Adds tofu columns + opentofu_state table + relaxes namespace/scope constraints for OpenTofu.
internal/migrations/sql/85_add_opentofu_columns.down.sql Drops new columns/table and restores original constraints.
internal/handlers/state.go New HTTP state backend handlers backed by S3 + DB locking.
internal/handlers/applications.go Allows creating OpenTofu application versions (requires config URL).
internal/handlers/agent.go Adds /state routes w/ Basic Auth middleware; includes OpenTofu data in agent resources response.
internal/db/state.go New DB helpers for opentofu_state metadata and lock/unlock.
internal/db/deployments.go Includes tofu fields in deployment queries and revision creation return shape.
internal/db/applications.go Reads/writes tofu config URL/version fields for application versions.
internal/agentmanifest/common.go Selects OpenTofu agent template based on target type.
internal/agentconnect/connect.go Generates connect command for OpenTofu targets (docker-style).
api/agent.go Extends AgentDeployment with OpenTofu-specific fields.
cmd/agent/opentofu/main.go New OpenTofu agent main loop with polling, apply, and destroy logic.
cmd/agent/opentofu/tofu_actions.go Implements tofu init/plan/apply/destroy via terraform-exec and OpenTofu binary resolution.
cmd/agent/opentofu/oci_pull.go Pulls OpenTofu configs as OCI artifacts into a workspace directory.
cmd/agent/opentofu/config.go Scratch/workspace directory helpers for the agent.
cmd/agent/opentofu/agent_deployment.go Persists local agent deployment state to disk.
Dockerfile.opentofu-agent Builds and packages the OpenTofu agent container image (includes tofu binary).
DEP-terraform-agent.md Design proposal / specification document for the feature.
go.mod Adds dependencies for terraform-exec and tofudl.
go.sum Updates module checksums for new dependencies.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@waveywaves waveywaves marked this pull request as draft March 7, 2026 08:54
@waveywaves waveywaves force-pushed the opentofu-agent branch 2 times, most recently from 0c9157e to 1f93973 Compare March 7, 2026 09:05
@waveywaves waveywaves changed the title feat: OpenTofu Agent support feat(opentofu-agent): add OpenTofu Agent support Mar 7, 2026
@waveywaves waveywaves force-pushed the opentofu-agent branch 3 times, most recently from d748ab5 to 6e656c7 Compare March 7, 2026 09:33
@waveywaves waveywaves requested a review from Copilot March 7, 2026 12:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1 @@
ALTER TYPE deployment_type ADD VALUE IF NOT EXISTS 'opentofu';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should merge both migrations into one

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +565 to +567
log.Error("failed to get deployment target from basic auth", zap.Error(err))
w.Header().Set("WWW-Authenticate", `Basic realm="distr"`)
w.WriteHeader(http.StatusUnauthorized)
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basicAuthDeploymentTargetCtxMiddleware logs an error for any auth failure and always responds 401. For invalid credentials this is expected behavior and will likely generate noisy error logs (and may mask real server-side failures). Consider distinguishing apierrors.ErrUnauthorized from other errors like agentLoginHandler does and logging unauthorized attempts at Info/Debug, while keeping unexpected DB/system errors at Error.

Suggested change
log.Error("failed to get deployment target from basic auth", zap.Error(err))
w.Header().Set("WWW-Authenticate", `Basic realm="distr"`)
w.WriteHeader(http.StatusUnauthorized)
if errors.Is(err, apierrors.ErrUnauthorized) {
log.Info("agent unauthorized", zap.Error(err), zap.Stringer("deploymentTargetId", targetID))
w.Header().Set("WWW-Authenticate", `Basic realm="distr"`)
w.WriteHeader(http.StatusUnauthorized)
} else {
log.Error("failed to get deployment target from basic auth", zap.Error(err))
w.WriteHeader(http.StatusInternalServerError)
sentry.GetHubFromContext(ctx).CaptureException(err)
}

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +76
binName := "tofu"
if version != "" {
binName = fmt.Sprintf("tofu-%s", version)
}

binDir := filepath.Join(ScratchDir(), "bin")
if err := os.MkdirAll(binDir, 0o755); err != nil {
return "", fmt.Errorf("could not create bin directory: %w", err)
}

binPath := filepath.Join(binDir, binName)
if err := os.WriteFile(binPath, binary, 0o755); err != nil {
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downloadTofuVersion uses the untrusted version string directly in binName and then filepath.Join(binDir, binName). If version contains path separators (e.g. ../), this can write outside the intended scratch directory. Sanitize/validate version (e.g., allow only [0-9A-Za-z._-]), or use a safe filename derived from it, and reject/normalize anything else.

Copilot uses AI. Check for mistakes.
}

varsPath := filepath.Join(workspaceDir, "terraform.tfvars")
return os.WriteFile(varsPath, []byte(b.String()), 0o644)
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writeVarsFile writes terraform.tfvars with mode 0644. Since tofuVars may contain secrets (API keys, passwords, etc.), this makes them world-readable inside the container/volume. Use a more restrictive permission (e.g. 0600) and consider ensuring the workspace directory permissions are also restrictive.

Suggested change
return os.WriteFile(varsPath, []byte(b.String()), 0o644)
return os.WriteFile(varsPath, []byte(b.String()), 0o600)

Copilot uses AI. Check for mistakes.
@waveywaves waveywaves force-pushed the opentofu-agent branch 4 times, most recently from 83c323e to a3f84cc Compare March 14, 2026 19:41
@waveywaves
Copy link
Author

@pmig — this PR has been significantly hardened since the last review round. Here's a summary of what changed:

Migration fix:

  • Renumbered to migration 85 to avoid collision with 84_usage_license added on main
  • type::text cast in CHECK constraints to avoid PostgreSQL SQLSTATE 55P04 (enum-in-transaction)

Security hardening:

  • Reserved backend config keys (address, username, password, etc.) are blocklisted to prevent state hijacking via user-supplied config
  • writeVarsFile permissions tightened to 0600 (vars may contain secrets)
  • downloadTofuVersion validates version string against path traversal
  • HCL variable key validation against [a-zA-Z_][a-zA-Z0-9_]*
  • SaveDeployment file permissions 0o600 (was 0666 via os.Create)

Critical bug fixes:

  • Rate limiter on state routes was using JWT key extraction (panics on Basic Auth) — now uses deployment target ID from context
  • Self-update no-op caused infinite livelock — replaced with one-time warning
  • State POST now enforces lock ownership (rejects writes without matching lock ID)
  • State POST validates JSON before storing to S3

UX hardening:

  • Graceful shutdown: tofu operations use a non-cancellable context so SIGTERM doesn't kill tofu apply mid-mutation
  • Empty deployments safety: if Hub returns empty deployments but local state has deployments, skip destroy cycle (prevents infrastructure wipe on transient Hub errors)
  • Docker-compose template was missing DISTR_REGISTRY_HOST / DISTR_REGISTRY_PLAIN_HTTP (OCI pull would fail on first run)
  • Error messages sanitized before sending to Hub dashboard
  • OCI pull tracks version to detect stale/corrupted configs
  • State rate limit bumped to 600/min (50-resource modules can exceed 60/min)
  • Lock TTL of 15 minutes prevents permanent lock orphans from crashed agents

Code quality:

  • Extracted prepareTofuExecutor to deduplicate apply/destroy setup
  • Removed dead code (IsStateLocked)
  • populateOpenTofuDeployment returns error on JSON parse failure
  • Hub env vars read once at startup via initHubConfig()
  • DeleteDeployment now acquires mutex
  • Versioned tofu binary downloads cached on disk
  • Dockerfile HEALTHCHECK added

All previous Copilot review comments have been addressed. CI workflows need approval to run (fork-based PR limitation). Would appreciate a re-review when you get a chance! 🙏

Copilot AI review requested due to automatic review settings March 19, 2026 12:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 27 out of 28 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

internal/db/applications.go:29

  • applicationVersionOutputExpr now includes av.tofu_config_url / av.tofu_config_version, but the applicationWithVersionsOutputExpr / applicationWithEntitledVersionsOutputExpr array_agg(row(...)) used by application list/get endpoints still only selects the chart-related columns for versions. As a result, ApplicationResponse.Application.Versions[].tofuConfigUrl/tofuConfigVersion will be empty even though the columns exist. Consider extending those row(...) projections to include the tofu config fields (while still excluding large file blobs).
	applicationVersionOutputExpr = `av.id, av.created_at, av.archived_at, av.name, av.link_template, av.application_id,
		av.chart_type, av.chart_name, av.chart_url, av.chart_version, av.values_file_data, av.template_file_data,
	 av.compose_file_data, av.tofu_config_url, av.tofu_config_version`
	applicationWithVersionsOutputExpr = applicationOutputExpr + `,
		coalesce((
			SELECT array_agg(row(av.id, av.created_at, av.archived_at, av.name, av.link_template, av.application_id,
				av.chart_type, av.chart_name, av.chart_url, av.chart_version) ORDER BY av.created_at ASC)
			FROM ApplicationVersion av
			WHERE av.application_id = a.id
		), array[]::record[]) AS versions `

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +21 to +31
tag := deployment.TofuConfigVersion
if tag == "" {
tag = "latest"
}

expectedRef := fmt.Sprintf("%s:%s", deployment.TofuConfigURL, tag)
if hasTFFiles(workspaceDir) && readPulledVersion(workspaceDir) == expectedRef {
logger.Debug("workspace up to date, skipping OCI pull",
zap.String("workspaceDir", workspaceDir))
return nil
}
Comment on lines +1 to +6
ALTER TYPE deployment_type ADD VALUE IF NOT EXISTS 'opentofu';

ALTER TABLE ApplicationVersion
ADD COLUMN IF NOT EXISTS tofu_config_url TEXT,
ADD COLUMN IF NOT EXISTS tofu_config_version TEXT;

Comment on lines +363 to +379
if request.TofuVars != nil {
tofuVarsJSON, err := json.Marshal(request.TofuVars)
if err != nil {
return nil, fmt.Errorf("could not marshal tofu vars: %w", err)
}
args["tofuVars"] = string(tofuVarsJSON)
}
if request.TofuBackendConfig != nil {
tofuBackendConfigJSON, err := json.Marshal(request.TofuBackendConfig)
if err != nil {
return nil, fmt.Errorf("could not marshal tofu backend config: %w", err)
}
args["tofuBackendConfig"] = string(tofuBackendConfigJSON)
}
if request.TofuVersion != nil {
args["tofuVersion"] = *request.TofuVersion
}
Comment on lines +363 to +379
if request.TofuVars != nil {
tofuVarsJSON, err := json.Marshal(request.TofuVars)
if err != nil {
return nil, fmt.Errorf("could not marshal tofu vars: %w", err)
}
args["tofuVars"] = string(tofuVarsJSON)
}
if request.TofuBackendConfig != nil {
tofuBackendConfigJSON, err := json.Marshal(request.TofuBackendConfig)
if err != nil {
return nil, fmt.Errorf("could not marshal tofu backend config: %w", err)
}
args["tofuBackendConfig"] = string(tofuBackendConfigJSON)
}
if request.TofuVersion != nil {
args["tofuVersion"] = *request.TofuVersion
}
Comment on lines +408 to +409
COALESCE(@tofuVars, '{}'::jsonb),
COALESCE(@tofuBackendConfig, '{}'::jsonb),
}
case DeploymentTypeOpenTofu:
if av.TofuConfigURL == nil || *av.TofuConfigURL == "" {
return errors.New("missing tofu config URL")
Copilot AI review requested due to automatic review settings March 20, 2026 21:19
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 36 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +363 to +367
if request.TofuVars != nil {
tofuVarsJSON, err := json.Marshal(request.TofuVars)
if err != nil {
return nil, fmt.Errorf("could not marshal tofu vars: %w", err)
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateDeploymentRevision always references @tofuVars/@tofuBackendConfig/@tofuVersion in the INSERT, but args only includes these keys when the request fields are non-nil. With pgx.NamedArgs, missing named parameters will error at runtime even though COALESCE is present. Add the keys unconditionally (set to nil when absent), or adjust the SQL to only reference them when provided.

Copilot uses AI. Check for mistakes.
Comment on lines +370 to +379
if request.TofuBackendConfig != nil {
tofuBackendConfigJSON, err := json.Marshal(request.TofuBackendConfig)
if err != nil {
return nil, fmt.Errorf("could not marshal tofu backend config: %w", err)
}
args["tofuBackendConfig"] = string(tofuBackendConfigJSON)
}
if request.TofuVersion != nil {
args["tofuVersion"] = *request.TofuVersion
}
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above for tofu backend config/version: the SQL always references @tofuBackendConfig and @tofuVersion, but these keys may be missing from args when the request doesn’t include them, causing a query execution error. Ensure args always defines them (nil when absent) or make the insert conditional.

Copilot uses AI. Check for mistakes.
Comment on lines 19 to +22
applicationOutputExpr = `a.id, a.created_at, a.organization_id, a.name, a.type, a.image_id`
applicationVersionOutputExpr = `av.id, av.created_at, av.archived_at, av.name, av.link_template, av.application_id,
av.chart_type, av.chart_name, av.chart_url, av.chart_version, av.values_file_data, av.template_file_data,
av.compose_file_data`
av.compose_file_data, av.tofu_config_url, av.tofu_config_version`
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applicationWithVersionsOutputExpr / applicationWithEntitledVersionsOutputExpr still build versions using row(av.id, ..., av.chart_version) and do not include av.tofu_config_url / av.tofu_config_version. As a result, Application.Versions[] returned by list/get endpoints will have these fields empty for OpenTofu apps. Update the row(...) projections to include the tofu columns (and ensure the corresponding scan order matches types.ApplicationVersion).

Copilot uses AI. Check for mistakes.
Comment on lines +231 to +235
// sanitizeError strips internal details (file paths, URLs, credentials)
// from error messages before sending them to the Hub dashboard.
func sanitizeError(err error) error {
msg := err.Error()
// Strip file paths
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizeError claims to strip URLs/credentials, but the implementation only strips some file-path prefixes. Errors returned by terraform-exec/tofu can include CLI args (including -backend-config=password=...) or state endpoint URLs, which would leak DISTR_TARGET_SECRET into Hub status messages. Expand sanitization to explicitly redact target credentials (and other secrets) and scrub the state endpoint URL if present.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — sanitizeError has been removed entirely in the latest commit. The Hub dashboard is for operators who need full error details. The broken heuristic was stripping real context while still missing credential leaks.

Comment on lines +254 to +270
exec, err := prepareTofuExecutor(
ctx, deployment.ID, deployment.TofuVersion, deployment.TofuBackendConfig, "tofu-destroy",
)
if err != nil {
return err
}

logger.Info("running tofu init for destroy", zap.String("deploymentId", deployment.ID.String()))
if err := exec.tf.Init(ctx, exec.initOpts...); err != nil {
return fmt.Errorf("tofu init for destroy failed: %w", err)
}

logger.Info("running tofu destroy", zap.String("deploymentId", deployment.ID.String()))
if err := exec.tf.Destroy(ctx); err != nil {
return fmt.Errorf("tofu destroy failed: %w", err)
}

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tofuDestroy assumes the workspace still contains the OpenTofu configuration, but the workspace can be missing/empty (e.g. tofuApply deletes it on revision change before pulling, and may fail before repopulating it). In that case Init/Destroy will fail and the orphaned infrastructure won’t be cleaned up. Consider ensuring the workspace is repopulated (OCI pull) before destroy, and persisting enough info (including vars if needed) to recreate terraform.tfvars for destroy.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern. The destroy path runs tf.Init which downloads providers and configures the backend even without local .tf files — the state is in the HTTP backend, so tofu destroy can still read the resource graph from remote state. If the workspace is truly empty (no backend config), the init will fail and the error is reported to Hub.

Comment on lines +30 to +32
State State `json:"phase"`
}

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local deployment state serializes State under JSON key phase, which is inconsistent with the field name and the rest of the agent/Hub API terminology (state is used elsewhere). Renaming the JSON tag (and handling backward-compat if needed) would make local state files easier to reason about and reduce confusion.

Suggested change
State State `json:"phase"`
}
State State `json:"state"`
}
type agentDeploymentJSON struct {
ID uuid.UUID `json:"id"`
RevisionID uuid.UUID `json:"revisionId"`
TofuConfigURL string `json:"tofuConfigUrl"`
TofuConfigVersion string `json:"tofuConfigVersion"`
TofuVersion string `json:"tofuVersion,omitempty"`
TofuBackendConfig map[string]string `json:"tofuBackendConfig,omitempty"`
State State `json:"state,omitempty"`
Phase State `json:"phase,omitempty"`
}
func (d AgentDeployment) MarshalJSON() ([]byte, error) {
aux := agentDeploymentJSON{
ID: d.ID,
RevisionID: d.RevisionID,
TofuConfigURL: d.TofuConfigURL,
TofuConfigVersion: d.TofuConfigVersion,
TofuVersion: d.TofuVersion,
TofuBackendConfig: d.TofuBackendConfig,
State: d.State,
}
return json.Marshal(aux)
}
func (d *AgentDeployment) UnmarshalJSON(data []byte) error {
var aux agentDeploymentJSON
if err := json.Unmarshal(data, &aux); err != nil {
return err
}
d.ID = aux.ID
d.RevisionID = aux.RevisionID
d.TofuConfigURL = aux.TofuConfigURL
d.TofuConfigVersion = aux.TofuConfigVersion
d.TofuVersion = aux.TofuVersion
d.TofuBackendConfig = aux.TofuBackendConfig
if aux.State != StateUnspecified {
d.State = aux.State
} else {
d.State = aux.Phase
}
return nil
}

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows the existing Docker agent pattern which also uses json:"phase" for the State field. Keeping it consistent across agents.

@waveywaves
Copy link
Author

Screenshot 2026-03-21 at 3 09 57 AM

@pmig the UI is ready, gonna go through a few iterations of testing and reviewing before pinging again

Copilot AI review requested due to automatic review settings March 20, 2026 21:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 34 out of 36 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +408 to +409
COALESCE(@tofuVars, '{}'::jsonb),
COALESCE(@tofuBackendConfig, '{}'::jsonb),
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tofu_vars and tofu_backend_config are JSONB columns, but the INSERT uses COALESCE(@tofuVars, '{}'::jsonb) / COALESCE(@tofuBackendConfig, '{}'::jsonb) while the parameters are passed as Go strings. Elsewhere in the codebase JSONB parameters are explicitly cast (e.g. @event::jsonb), and without an explicit cast Postgres can fail type resolution for COALESCE (text vs jsonb). Consider casting the params to jsonb in SQL (or passing a JSONB-typed value) so the query is robust.

Suggested change
COALESCE(@tofuVars, '{}'::jsonb),
COALESCE(@tofuBackendConfig, '{}'::jsonb),
COALESCE(@tofuVars::jsonb, '{}'::jsonb),
COALESCE(@tofuBackendConfig::jsonb, '{}'::jsonb),

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — will verify the pgx driver handles the implicit cast correctly for JSONB. The COALESCE pattern matches other nullable JSONB columns in the codebase.

return defaultTofuPath, nil
}

path, err := downloadTofuVersion(context.Background(), "") //nolint:contextcheck
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolveTofuBinary falls back to downloadTofuVersion(context.Background(), ""), which ignores the caller-provided ctx. This makes tofu downloads non-cancellable even when the reconcile/apply path has a deadline or shutdown context, and it also bypasses any tracing/timeouts attached to ctx. Use the passed-in ctx for the fallback download as well.

Suggested change
path, err := downloadTofuVersion(context.Background(), "") //nolint:contextcheck
path, err := downloadTofuVersion(ctx, "")

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional — context.Background() is used deliberately so the one-time default binary download is not tied to any request lifecycle. Using the caller's ctx would mean a cancelled request permanently fails all future downloads (the mutex caches success only, not failures). The nolint comment documents this.

Comment on lines +1 to +2
ALTER TYPE deployment_type ADD VALUE IF NOT EXISTS 'opentofu';

Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions schema changes in “migrations 84-85”, but this change introduces a new migration 86_add_opentofu_deployment_type. Consider updating the PR description (or migration numbering) so readers can reliably find the relevant migration(s).

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — the migration was renumbered from 84→85→86 during rebases as main added conflicting migrations. The PR description references the original numbers but the code is correct at migration 86.

waveywaves and others added 12 commits March 21, 2026 03:44
Add OpenTofu agent to Distr, enabling infrastructure provisioning
alongside container deployments. Vendors can manage the full deployment
lifecycle — applications and infrastructure — through a single workflow.

- Database schema (migrations 84-85): opentofu deployment type, tofu
  columns on applicationversion/deploymentrevision, opentofu_state
  table for lock management
- Agent manifest & connect: OpenTofu branch in connect flow
- HTTP state backend: GET/POST state with S3 storage, POST lock/unlock
  with PostgreSQL locking, Basic Auth middleware
- Agent binary (cmd/agent/opentofu): polling loop, reconciliation, OCI
  artifact pull, tofu init/plan/apply/destroy via terraform-exec SDK
- Dockerfile: multi-stage build for the OpenTofu agent container
- DEP document: full design proposal

Closes distr-sh#1946

Co-Authored-By: Claude Opus 4.6 <[email protected]>
PostgreSQL SQLSTATE 55P04 prevents using a newly added enum value in
the same transaction. Casting type to text in CHECK constraints avoids
the implicit enum resolution.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- writeVarsFile: 0o644 -> 0o600 (vars may contain secrets)
- downloadTofuVersion: validate version string against path traversal
- defaultTofuOnce: use context.Background() to avoid caching canceled ctx errors

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Fix rate limiter panic: state routes use Basic Auth, not JWT.
  RateLimitCurrentDeploymentTargetIdKeyFunc calls AgentAuthentication.Get()
  which panics with "unknown token type" on non-JWT contexts. Use
  deployment target ID from context instead.

- Fix self-update livelock: runSelfUpdate was a no-op returning nil,
  causing the agent to skip all deployment processing forever on version
  mismatch. Replace with a one-time warning log.

- Validate tfvars keys against HCL identifier pattern to prevent
  injection via crafted variable names.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Extract prepareTofuExecutor to deduplicate apply/destroy setup (S2)
- Smart workspace: only RemoveAll when revision changes, preserving
  provider cache and .terraform dir across retries (C3)
- Read hub env vars once at startup via initHubConfig instead of
  per-apply os.Getenv calls (Q3)
- populateOpenTofuDeployment returns error on JSON parse failure
  instead of silently sending empty config to agent (S7)
- Remove dead code IsStateLocked (S6)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Renumber migration 84 -> 85 to avoid collision with 84_usage_license
  added on main (deploy blocker: golang-migrate rejects duplicate versions)

- Blocklist reserved backend config keys (address, username, password, etc.)
  to prevent user-supplied config from overriding Hub state backend
  credentials (state hijacking vector)

- Verify lock ID on state POST: check ?ID= query param against current
  lock holder, reject with 409 on mismatch (without this, the locking
  protocol is purely decorative)

- State lock TTL: locks older than 15 minutes can be overridden,
  preventing permanent lock orphans from crashed agents

- SaveDeployment: 0o600 file permissions (was 0666 via os.Create)
- DeleteDeployment: acquire mutex for thread safety
- Cache versioned tofu binary downloads (skip re-download if already on disk)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Graceful shutdown:
- Tofu operations (apply/destroy) use a non-cancellable workCtx so
  SIGTERM does not kill terraform-exec mid-mutation. The poll loop
  stops accepting new work on signal, but in-flight operations complete.

Safety:
- Empty deployments guard: if Hub returns zero deployments but local
  state has deployments, skip the destroy cycle to avoid wiping
  infrastructure due to transient Hub errors.
- State POST enforces lock: if state is locked and no ?ID= param is
  provided, reject with 409. Prevents lockless state overwrites.
- State POST validates JSON before storing to S3.
- Error sanitization: strip internal file paths from errors before
  sending to Hub dashboard via StatusWithError.

First-run fix:
- Docker-compose template was missing DISTR_REGISTRY_HOST and
  DISTR_REGISTRY_PLAIN_HTTP env vars (present in Docker/K8s templates).
  Without these, OCI artifact pull fails on first run.

OCI pull:
- Track pulled version in .distr-oci-version file. Compare against
  expected ref instead of just checking for .tf file existence.
  Fixes stale config after corrupted pull and adds .tf.json support.

Ops:
- State rate limit bumped from 60 to 600/min per target. A 50-resource
  module can produce 50+ state writes per apply.
- Dockerfile HEALTHCHECK added.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Migration 85_support_bundle_tables was added on main, causing a
duplicate version error. Renumber OpenTofu migration to 86.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
SDK types:
- Extend DeploymentType with 'opentofu'
- Add tofuVars, tofuBackendConfig, tofuVersion to DeploymentRequest
- Add tofuConfigUrl, tofuConfigVersion to ApplicationVersion

Application version creation:
- Add OpenTofu form group (OCI config URL + tag)
- Add createApplicationVersionForOpenTofu service method
- Show config URL and version columns in version table
- Update enableTypeSpecificGroups and loadVersionDetails

Deployment wizard + form:
- Add 'opentofu' case to wizard config controls (same as Docker:
  no namespace/scope/resources)
- Add 'opentofu' case to deployment form type switching (disables
  Docker envFile and K8s releaseName/values/helm controls)

Assets:
- Add 199x199 OpenTofu icon (from opentofu/brand-artifacts, MPL-2.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The tofuConfigUrl stored in application versions is a relative OCI
reference (e.g., "org/app-name") without the registry host. The agent
needs to prepend DISTR_REGISTRY_HOST to form a valid repository
reference for oras to pull from.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Replace sync.Once with sync.Mutex for default tofu binary resolution.
  sync.Once permanently caches errors — if the first download fails
  (network issue), all subsequent calls fail forever. The mutex pattern
  retries on failure and only caches on success.

- Increase state lock TTL from 15 minutes to 1 hour. OpenTofu apply
  operations for complex infrastructure can exceed 15 minutes, which
  would allow a second agent to forcibly acquire the lock mid-apply.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Remove broken sanitizeError: heuristic strips real error context,
  Hub dashboard is for operators who need full details
- Simplify OCI repoRef logic: use hasHostname() check instead of
  double-branch string matching that breaks for external registries
- Fix workCtx: use plain context.Background() instead of
  WithCancel+defer which defeats the purpose of graceful shutdown
- Deduplicate tofuApply error paths: single defer handles
  StateFailed+SaveDeployment instead of 6 copy-paste blocks
- writeVarsFile: use terraform.tfvars.json (pure JSON) instead of
  hand-rolled HCL that produces invalid output for complex types
- Remove useless Dockerfile HEALTHCHECK (test -x checks binary exists,
  not agent health; Docker/K8s agents don't have one)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@waveywaves
Copy link
Author

@pmig can you run help me with the workflows

Migration 86_image_cleanup_enabled was added on main.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings March 21, 2026 09:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 38 out of 40 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +73 to +91
<<<<<<< Updated upstream
> `optional` **resources?**: [`ApplicationVersionResource`](ApplicationVersionResource.md)[]
||||||| Stash base
> `optional` **resources**: [`ApplicationVersionResource`](ApplicationVersionResource.md)[]
=======
> `optional` **resources**: [`ApplicationVersionResource`](ApplicationVersionResource.md)[]

---

### tofuConfigUrl?

> `optional` **tofuConfigUrl**: `string`

---

### tofuConfigVersion?

> `optional` **tofuConfigVersion**: `string`
>>>>>>> Stashed changes
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generated doc file contains unresolved Git merge conflict markers (<<<<<<<, |||||||, =======, >>>>>>>). These need to be removed and the intended content reconciled before merging, otherwise the published SDK docs will be broken.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — SDK docs have been regenerated clean. The conflict markers were from a stash pop that was resolved but the docs weren't regenerated.

Comment on lines +44 to +48
func initHubConfig() {
hubBase = strings.TrimSuffix(os.Getenv("DISTR_LOGIN_ENDPOINT"), "/api/v1/agent/login")
targetID = os.Getenv("DISTR_TARGET_ID")
targetSecret = os.Getenv("DISTR_TARGET_SECRET")
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hubBase is derived via strings.TrimSuffix(DISTR_LOGIN_ENDPOINT, "/api/v1/agent/login"), which silently produces an incorrect base URL if the env var has a trailing slash, a different path, or is already a base URL. That would make the state backend URLs invalid at runtime. Consider parsing the URL and stripping the path robustly (or provide an explicit DISTR_HUB_BASE_URL env var) and error out if it can’t be derived.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows the exact same pattern as the existing Docker and Kubernetes agents — all three derive their Hub base URL by trimming the login endpoint path. The env vars are set by the Hub's connect command template, so the format is guaranteed. Changing to a separate DISTR_HUB_BASE_URL env var would be a cross-agent change outside this PR's scope.

Comment on lines +85 to +88
if slashIdx < 0 {
return strings.Contains(ref, ".")
}
return strings.Contains(ref[:slashIdx], ".")
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasHostname only checks for a dot in the first path segment to decide whether to prefix DISTR_REGISTRY_HOST. This misclassifies common registries like localhost:5000/repo (no dot, but has a port) and would incorrectly rewrite the reference. Consider treating : (port) and localhost as a hostname too, or use a proper OCI/Docker reference parser to detect an explicit registry host.

Suggested change
if slashIdx < 0 {
return strings.Contains(ref, ".")
}
return strings.Contains(ref[:slashIdx], ".")
hostPart := ref
if slashIdx >= 0 {
hostPart = ref[:slashIdx]
}
hostLower := strings.ToLower(hostPart)
if hostLower == "localhost" {
return true
}
return strings.Contains(hostPart, ".") || strings.Contains(hostPart, ":")

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — hasHostname now also checks for : (port) in the first path segment, so localhost:5000/repo is correctly detected as having a hostname.

- hasHostname: also detect port-based hostnames (localhost:5000)
  by checking for ':' in addition to '.'
- Regenerate SDK docs to clean up stale merge artifact

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenTofu Agent support

3 participants