feat(opentofu-agent): add OpenTofu Agent support#1966
feat(opentofu-agent): add OpenTofu Agent support#1966waveywaves wants to merge 14 commits intodistr-sh:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds OpenTofu (tofu) agent support to Distr so vendors can provision infrastructure (via OpenTofu/Terraform-compatible workflows) alongside existing Docker/Kubernetes deployments, including a Hub-hosted HTTP state backend with S3 storage and PostgreSQL-based locking.
Changes:
- Introduces
opentofuas a newDeploymentType, extends ApplicationVersion/DeploymentRevision data model, and updates DB schema (enum + tofu columns +opentofu_statelock table + updated constraints). - Adds Hub-side HTTP state backend endpoints (GET/POST state + lock/unlock) protected by Basic Auth using deployment-target credentials.
- Adds a new OpenTofu agent binary (poll/reconcile/apply/destroy), an embedded connect manifest template, and an agent container Dockerfile.
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/types/types.go | Adds DeploymentTypeOpenTofu constant. |
| internal/types/deployment_target.go | Allows OpenTofu deployment target type in validation switch. |
| internal/types/deployment_revision.go | Adds tofu vars/backend config/version fields on deployment revisions. |
| internal/types/deployment.go | Adds tofu fields to DeploymentWithLatestRevision (hidden from JSON). |
| internal/types/application_version.go | Adds tofu config URL/version fields and OpenTofu-specific validation. |
| internal/resources/embedded/agent/opentofu/v1/docker-compose.yaml.tmpl | New docker-compose template for OpenTofu agent. |
| internal/migrations/sql/84_add_opentofu_deployment_type.up.sql | Adds opentofu value to deployment_type enum. |
| internal/migrations/sql/84_add_opentofu_deployment_type.down.sql | No-op down migration for enum value. |
| internal/migrations/sql/85_add_opentofu_columns.up.sql | Adds tofu columns + opentofu_state table + relaxes namespace/scope constraints for OpenTofu. |
| internal/migrations/sql/85_add_opentofu_columns.down.sql | Drops new columns/table and restores original constraints. |
| internal/handlers/state.go | New HTTP state backend handlers backed by S3 + DB locking. |
| internal/handlers/applications.go | Allows creating OpenTofu application versions (requires config URL). |
| internal/handlers/agent.go | Adds /state routes w/ Basic Auth middleware; includes OpenTofu data in agent resources response. |
| internal/db/state.go | New DB helpers for opentofu_state metadata and lock/unlock. |
| internal/db/deployments.go | Includes tofu fields in deployment queries and revision creation return shape. |
| internal/db/applications.go | Reads/writes tofu config URL/version fields for application versions. |
| internal/agentmanifest/common.go | Selects OpenTofu agent template based on target type. |
| internal/agentconnect/connect.go | Generates connect command for OpenTofu targets (docker-style). |
| api/agent.go | Extends AgentDeployment with OpenTofu-specific fields. |
| cmd/agent/opentofu/main.go | New OpenTofu agent main loop with polling, apply, and destroy logic. |
| cmd/agent/opentofu/tofu_actions.go | Implements tofu init/plan/apply/destroy via terraform-exec and OpenTofu binary resolution. |
| cmd/agent/opentofu/oci_pull.go | Pulls OpenTofu configs as OCI artifacts into a workspace directory. |
| cmd/agent/opentofu/config.go | Scratch/workspace directory helpers for the agent. |
| cmd/agent/opentofu/agent_deployment.go | Persists local agent deployment state to disk. |
| Dockerfile.opentofu-agent | Builds and packages the OpenTofu agent container image (includes tofu binary). |
| DEP-terraform-agent.md | Design proposal / specification document for the feature. |
| go.mod | Adds dependencies for terraform-exec and tofudl. |
| go.sum | Updates module checksums for new dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
0c9157e to
1f93973
Compare
d748ab5 to
6e656c7
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
6e656c7 to
9011b38
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9011b38 to
666aeef
Compare
| @@ -0,0 +1 @@ | |||
| ALTER TYPE deployment_type ADD VALUE IF NOT EXISTS 'opentofu'; | |||
There was a problem hiding this comment.
We should merge both migrations into one
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/handlers/agent.go
Outdated
| log.Error("failed to get deployment target from basic auth", zap.Error(err)) | ||
| w.Header().Set("WWW-Authenticate", `Basic realm="distr"`) | ||
| w.WriteHeader(http.StatusUnauthorized) |
There was a problem hiding this comment.
basicAuthDeploymentTargetCtxMiddleware logs an error for any auth failure and always responds 401. For invalid credentials this is expected behavior and will likely generate noisy error logs (and may mask real server-side failures). Consider distinguishing apierrors.ErrUnauthorized from other errors like agentLoginHandler does and logging unauthorized attempts at Info/Debug, while keeping unexpected DB/system errors at Error.
| log.Error("failed to get deployment target from basic auth", zap.Error(err)) | |
| w.Header().Set("WWW-Authenticate", `Basic realm="distr"`) | |
| w.WriteHeader(http.StatusUnauthorized) | |
| if errors.Is(err, apierrors.ErrUnauthorized) { | |
| log.Info("agent unauthorized", zap.Error(err), zap.Stringer("deploymentTargetId", targetID)) | |
| w.Header().Set("WWW-Authenticate", `Basic realm="distr"`) | |
| w.WriteHeader(http.StatusUnauthorized) | |
| } else { | |
| log.Error("failed to get deployment target from basic auth", zap.Error(err)) | |
| w.WriteHeader(http.StatusInternalServerError) | |
| sentry.GetHubFromContext(ctx).CaptureException(err) | |
| } |
| binName := "tofu" | ||
| if version != "" { | ||
| binName = fmt.Sprintf("tofu-%s", version) | ||
| } | ||
|
|
||
| binDir := filepath.Join(ScratchDir(), "bin") | ||
| if err := os.MkdirAll(binDir, 0o755); err != nil { | ||
| return "", fmt.Errorf("could not create bin directory: %w", err) | ||
| } | ||
|
|
||
| binPath := filepath.Join(binDir, binName) | ||
| if err := os.WriteFile(binPath, binary, 0o755); err != nil { |
There was a problem hiding this comment.
downloadTofuVersion uses the untrusted version string directly in binName and then filepath.Join(binDir, binName). If version contains path separators (e.g. ../), this can write outside the intended scratch directory. Sanitize/validate version (e.g., allow only [0-9A-Za-z._-]), or use a safe filename derived from it, and reject/normalize anything else.
cmd/agent/opentofu/tofu_actions.go
Outdated
| } | ||
|
|
||
| varsPath := filepath.Join(workspaceDir, "terraform.tfvars") | ||
| return os.WriteFile(varsPath, []byte(b.String()), 0o644) |
There was a problem hiding this comment.
writeVarsFile writes terraform.tfvars with mode 0644. Since tofuVars may contain secrets (API keys, passwords, etc.), this makes them world-readable inside the container/volume. Use a more restrictive permission (e.g. 0600) and consider ensuring the workspace directory permissions are also restrictive.
| return os.WriteFile(varsPath, []byte(b.String()), 0o644) | |
| return os.WriteFile(varsPath, []byte(b.String()), 0o600) |
83c323e to
a3f84cc
Compare
|
@pmig — this PR has been significantly hardened since the last review round. Here's a summary of what changed: Migration fix:
Security hardening:
Critical bug fixes:
UX hardening:
Code quality:
All previous Copilot review comments have been addressed. CI workflows need approval to run (fork-based PR limitation). Would appreciate a re-review when you get a chance! 🙏 |
e791ac3 to
cdddb49
Compare
cdddb49 to
62817e2
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 27 out of 28 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
internal/db/applications.go:29
applicationVersionOutputExprnow includesav.tofu_config_url/av.tofu_config_version, but theapplicationWithVersionsOutputExpr/applicationWithEntitledVersionsOutputExprarray_agg(row(...)) used by application list/get endpoints still only selects the chart-related columns for versions. As a result,ApplicationResponse.Application.Versions[].tofuConfigUrl/tofuConfigVersionwill be empty even though the columns exist. Consider extending those row(...) projections to include the tofu config fields (while still excluding large file blobs).
applicationVersionOutputExpr = `av.id, av.created_at, av.archived_at, av.name, av.link_template, av.application_id,
av.chart_type, av.chart_name, av.chart_url, av.chart_version, av.values_file_data, av.template_file_data,
av.compose_file_data, av.tofu_config_url, av.tofu_config_version`
applicationWithVersionsOutputExpr = applicationOutputExpr + `,
coalesce((
SELECT array_agg(row(av.id, av.created_at, av.archived_at, av.name, av.link_template, av.application_id,
av.chart_type, av.chart_name, av.chart_url, av.chart_version) ORDER BY av.created_at ASC)
FROM ApplicationVersion av
WHERE av.application_id = a.id
), array[]::record[]) AS versions `
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| tag := deployment.TofuConfigVersion | ||
| if tag == "" { | ||
| tag = "latest" | ||
| } | ||
|
|
||
| expectedRef := fmt.Sprintf("%s:%s", deployment.TofuConfigURL, tag) | ||
| if hasTFFiles(workspaceDir) && readPulledVersion(workspaceDir) == expectedRef { | ||
| logger.Debug("workspace up to date, skipping OCI pull", | ||
| zap.String("workspaceDir", workspaceDir)) | ||
| return nil | ||
| } |
| ALTER TYPE deployment_type ADD VALUE IF NOT EXISTS 'opentofu'; | ||
|
|
||
| ALTER TABLE ApplicationVersion | ||
| ADD COLUMN IF NOT EXISTS tofu_config_url TEXT, | ||
| ADD COLUMN IF NOT EXISTS tofu_config_version TEXT; | ||
|
|
| if request.TofuVars != nil { | ||
| tofuVarsJSON, err := json.Marshal(request.TofuVars) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not marshal tofu vars: %w", err) | ||
| } | ||
| args["tofuVars"] = string(tofuVarsJSON) | ||
| } | ||
| if request.TofuBackendConfig != nil { | ||
| tofuBackendConfigJSON, err := json.Marshal(request.TofuBackendConfig) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not marshal tofu backend config: %w", err) | ||
| } | ||
| args["tofuBackendConfig"] = string(tofuBackendConfigJSON) | ||
| } | ||
| if request.TofuVersion != nil { | ||
| args["tofuVersion"] = *request.TofuVersion | ||
| } |
| if request.TofuVars != nil { | ||
| tofuVarsJSON, err := json.Marshal(request.TofuVars) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not marshal tofu vars: %w", err) | ||
| } | ||
| args["tofuVars"] = string(tofuVarsJSON) | ||
| } | ||
| if request.TofuBackendConfig != nil { | ||
| tofuBackendConfigJSON, err := json.Marshal(request.TofuBackendConfig) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not marshal tofu backend config: %w", err) | ||
| } | ||
| args["tofuBackendConfig"] = string(tofuBackendConfigJSON) | ||
| } | ||
| if request.TofuVersion != nil { | ||
| args["tofuVersion"] = *request.TofuVersion | ||
| } |
| COALESCE(@tofuVars, '{}'::jsonb), | ||
| COALESCE(@tofuBackendConfig, '{}'::jsonb), |
| } | ||
| case DeploymentTypeOpenTofu: | ||
| if av.TofuConfigURL == nil || *av.TofuConfigURL == "" { | ||
| return errors.New("missing tofu config URL") |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 36 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if request.TofuVars != nil { | ||
| tofuVarsJSON, err := json.Marshal(request.TofuVars) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not marshal tofu vars: %w", err) | ||
| } |
There was a problem hiding this comment.
CreateDeploymentRevision always references @tofuVars/@tofuBackendConfig/@tofuVersion in the INSERT, but args only includes these keys when the request fields are non-nil. With pgx.NamedArgs, missing named parameters will error at runtime even though COALESCE is present. Add the keys unconditionally (set to nil when absent), or adjust the SQL to only reference them when provided.
| if request.TofuBackendConfig != nil { | ||
| tofuBackendConfigJSON, err := json.Marshal(request.TofuBackendConfig) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("could not marshal tofu backend config: %w", err) | ||
| } | ||
| args["tofuBackendConfig"] = string(tofuBackendConfigJSON) | ||
| } | ||
| if request.TofuVersion != nil { | ||
| args["tofuVersion"] = *request.TofuVersion | ||
| } |
There was a problem hiding this comment.
Same issue as above for tofu backend config/version: the SQL always references @tofuBackendConfig and @tofuVersion, but these keys may be missing from args when the request doesn’t include them, causing a query execution error. Ensure args always defines them (nil when absent) or make the insert conditional.
| applicationOutputExpr = `a.id, a.created_at, a.organization_id, a.name, a.type, a.image_id` | ||
| applicationVersionOutputExpr = `av.id, av.created_at, av.archived_at, av.name, av.link_template, av.application_id, | ||
| av.chart_type, av.chart_name, av.chart_url, av.chart_version, av.values_file_data, av.template_file_data, | ||
| av.compose_file_data` | ||
| av.compose_file_data, av.tofu_config_url, av.tofu_config_version` |
There was a problem hiding this comment.
applicationWithVersionsOutputExpr / applicationWithEntitledVersionsOutputExpr still build versions using row(av.id, ..., av.chart_version) and do not include av.tofu_config_url / av.tofu_config_version. As a result, Application.Versions[] returned by list/get endpoints will have these fields empty for OpenTofu apps. Update the row(...) projections to include the tofu columns (and ensure the corresponding scan order matches types.ApplicationVersion).
cmd/agent/opentofu/main.go
Outdated
| // sanitizeError strips internal details (file paths, URLs, credentials) | ||
| // from error messages before sending them to the Hub dashboard. | ||
| func sanitizeError(err error) error { | ||
| msg := err.Error() | ||
| // Strip file paths |
There was a problem hiding this comment.
sanitizeError claims to strip URLs/credentials, but the implementation only strips some file-path prefixes. Errors returned by terraform-exec/tofu can include CLI args (including -backend-config=password=...) or state endpoint URLs, which would leak DISTR_TARGET_SECRET into Hub status messages. Expand sanitization to explicitly redact target credentials (and other secrets) and scrub the state endpoint URL if present.
There was a problem hiding this comment.
Fixed — sanitizeError has been removed entirely in the latest commit. The Hub dashboard is for operators who need full error details. The broken heuristic was stripping real context while still missing credential leaks.
| exec, err := prepareTofuExecutor( | ||
| ctx, deployment.ID, deployment.TofuVersion, deployment.TofuBackendConfig, "tofu-destroy", | ||
| ) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| logger.Info("running tofu init for destroy", zap.String("deploymentId", deployment.ID.String())) | ||
| if err := exec.tf.Init(ctx, exec.initOpts...); err != nil { | ||
| return fmt.Errorf("tofu init for destroy failed: %w", err) | ||
| } | ||
|
|
||
| logger.Info("running tofu destroy", zap.String("deploymentId", deployment.ID.String())) | ||
| if err := exec.tf.Destroy(ctx); err != nil { | ||
| return fmt.Errorf("tofu destroy failed: %w", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
tofuDestroy assumes the workspace still contains the OpenTofu configuration, but the workspace can be missing/empty (e.g. tofuApply deletes it on revision change before pulling, and may fail before repopulating it). In that case Init/Destroy will fail and the orphaned infrastructure won’t be cleaned up. Consider ensuring the workspace is repopulated (OCI pull) before destroy, and persisting enough info (including vars if needed) to recreate terraform.tfvars for destroy.
There was a problem hiding this comment.
Valid concern. The destroy path runs tf.Init which downloads providers and configures the backend even without local .tf files — the state is in the HTTP backend, so tofu destroy can still read the resource graph from remote state. If the workspace is truly empty (no backend config), the init will fail and the error is reported to Hub.
| State State `json:"phase"` | ||
| } | ||
|
|
There was a problem hiding this comment.
The local deployment state serializes State under JSON key phase, which is inconsistent with the field name and the rest of the agent/Hub API terminology (state is used elsewhere). Renaming the JSON tag (and handling backward-compat if needed) would make local state files easier to reason about and reduce confusion.
| State State `json:"phase"` | |
| } | |
| State State `json:"state"` | |
| } | |
| type agentDeploymentJSON struct { | |
| ID uuid.UUID `json:"id"` | |
| RevisionID uuid.UUID `json:"revisionId"` | |
| TofuConfigURL string `json:"tofuConfigUrl"` | |
| TofuConfigVersion string `json:"tofuConfigVersion"` | |
| TofuVersion string `json:"tofuVersion,omitempty"` | |
| TofuBackendConfig map[string]string `json:"tofuBackendConfig,omitempty"` | |
| State State `json:"state,omitempty"` | |
| Phase State `json:"phase,omitempty"` | |
| } | |
| func (d AgentDeployment) MarshalJSON() ([]byte, error) { | |
| aux := agentDeploymentJSON{ | |
| ID: d.ID, | |
| RevisionID: d.RevisionID, | |
| TofuConfigURL: d.TofuConfigURL, | |
| TofuConfigVersion: d.TofuConfigVersion, | |
| TofuVersion: d.TofuVersion, | |
| TofuBackendConfig: d.TofuBackendConfig, | |
| State: d.State, | |
| } | |
| return json.Marshal(aux) | |
| } | |
| func (d *AgentDeployment) UnmarshalJSON(data []byte) error { | |
| var aux agentDeploymentJSON | |
| if err := json.Unmarshal(data, &aux); err != nil { | |
| return err | |
| } | |
| d.ID = aux.ID | |
| d.RevisionID = aux.RevisionID | |
| d.TofuConfigURL = aux.TofuConfigURL | |
| d.TofuConfigVersion = aux.TofuConfigVersion | |
| d.TofuVersion = aux.TofuVersion | |
| d.TofuBackendConfig = aux.TofuBackendConfig | |
| if aux.State != StateUnspecified { | |
| d.State = aux.State | |
| } else { | |
| d.State = aux.Phase | |
| } | |
| return nil | |
| } |
There was a problem hiding this comment.
This follows the existing Docker agent pattern which also uses json:"phase" for the State field. Keeping it consistent across agents.
523c2fb to
c257ce9
Compare
@pmig the UI is ready, gonna go through a few iterations of testing and reviewing before pinging again |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 34 out of 36 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| COALESCE(@tofuVars, '{}'::jsonb), | ||
| COALESCE(@tofuBackendConfig, '{}'::jsonb), |
There was a problem hiding this comment.
tofu_vars and tofu_backend_config are JSONB columns, but the INSERT uses COALESCE(@tofuVars, '{}'::jsonb) / COALESCE(@tofuBackendConfig, '{}'::jsonb) while the parameters are passed as Go strings. Elsewhere in the codebase JSONB parameters are explicitly cast (e.g. @event::jsonb), and without an explicit cast Postgres can fail type resolution for COALESCE (text vs jsonb). Consider casting the params to jsonb in SQL (or passing a JSONB-typed value) so the query is robust.
| COALESCE(@tofuVars, '{}'::jsonb), | |
| COALESCE(@tofuBackendConfig, '{}'::jsonb), | |
| COALESCE(@tofuVars::jsonb, '{}'::jsonb), | |
| COALESCE(@tofuBackendConfig::jsonb, '{}'::jsonb), |
There was a problem hiding this comment.
Good catch — will verify the pgx driver handles the implicit cast correctly for JSONB. The COALESCE pattern matches other nullable JSONB columns in the codebase.
| return defaultTofuPath, nil | ||
| } | ||
|
|
||
| path, err := downloadTofuVersion(context.Background(), "") //nolint:contextcheck |
There was a problem hiding this comment.
resolveTofuBinary falls back to downloadTofuVersion(context.Background(), ""), which ignores the caller-provided ctx. This makes tofu downloads non-cancellable even when the reconcile/apply path has a deadline or shutdown context, and it also bypasses any tracing/timeouts attached to ctx. Use the passed-in ctx for the fallback download as well.
| path, err := downloadTofuVersion(context.Background(), "") //nolint:contextcheck | |
| path, err := downloadTofuVersion(ctx, "") |
There was a problem hiding this comment.
This is intentional — context.Background() is used deliberately so the one-time default binary download is not tied to any request lifecycle. Using the caller's ctx would mean a cancelled request permanently fails all future downloads (the mutex caches success only, not failures). The nolint comment documents this.
| ALTER TYPE deployment_type ADD VALUE IF NOT EXISTS 'opentofu'; | ||
|
|
There was a problem hiding this comment.
The PR description mentions schema changes in “migrations 84-85”, but this change introduces a new migration 86_add_opentofu_deployment_type. Consider updating the PR description (or migration numbering) so readers can reliably find the relevant migration(s).
There was a problem hiding this comment.
Acknowledged — the migration was renumbered from 84→85→86 during rebases as main added conflicting migrations. The PR description references the original numbers but the code is correct at migration 86.
Add OpenTofu agent to Distr, enabling infrastructure provisioning alongside container deployments. Vendors can manage the full deployment lifecycle — applications and infrastructure — through a single workflow. - Database schema (migrations 84-85): opentofu deployment type, tofu columns on applicationversion/deploymentrevision, opentofu_state table for lock management - Agent manifest & connect: OpenTofu branch in connect flow - HTTP state backend: GET/POST state with S3 storage, POST lock/unlock with PostgreSQL locking, Basic Auth middleware - Agent binary (cmd/agent/opentofu): polling loop, reconciliation, OCI artifact pull, tofu init/plan/apply/destroy via terraform-exec SDK - Dockerfile: multi-stage build for the OpenTofu agent container - DEP document: full design proposal Closes distr-sh#1946 Co-Authored-By: Claude Opus 4.6 <[email protected]>
PostgreSQL SQLSTATE 55P04 prevents using a newly added enum value in the same transaction. Casting type to text in CHECK constraints avoids the implicit enum resolution. Co-Authored-By: Claude Opus 4.6 <[email protected]>
- writeVarsFile: 0o644 -> 0o600 (vars may contain secrets) - downloadTofuVersion: validate version string against path traversal - defaultTofuOnce: use context.Background() to avoid caching canceled ctx errors Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Fix rate limiter panic: state routes use Basic Auth, not JWT. RateLimitCurrentDeploymentTargetIdKeyFunc calls AgentAuthentication.Get() which panics with "unknown token type" on non-JWT contexts. Use deployment target ID from context instead. - Fix self-update livelock: runSelfUpdate was a no-op returning nil, causing the agent to skip all deployment processing forever on version mismatch. Replace with a one-time warning log. - Validate tfvars keys against HCL identifier pattern to prevent injection via crafted variable names. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Extract prepareTofuExecutor to deduplicate apply/destroy setup (S2) - Smart workspace: only RemoveAll when revision changes, preserving provider cache and .terraform dir across retries (C3) - Read hub env vars once at startup via initHubConfig instead of per-apply os.Getenv calls (Q3) - populateOpenTofuDeployment returns error on JSON parse failure instead of silently sending empty config to agent (S7) - Remove dead code IsStateLocked (S6) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Renumber migration 84 -> 85 to avoid collision with 84_usage_license added on main (deploy blocker: golang-migrate rejects duplicate versions) - Blocklist reserved backend config keys (address, username, password, etc.) to prevent user-supplied config from overriding Hub state backend credentials (state hijacking vector) - Verify lock ID on state POST: check ?ID= query param against current lock holder, reject with 409 on mismatch (without this, the locking protocol is purely decorative) - State lock TTL: locks older than 15 minutes can be overridden, preventing permanent lock orphans from crashed agents - SaveDeployment: 0o600 file permissions (was 0666 via os.Create) - DeleteDeployment: acquire mutex for thread safety - Cache versioned tofu binary downloads (skip re-download if already on disk) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Graceful shutdown: - Tofu operations (apply/destroy) use a non-cancellable workCtx so SIGTERM does not kill terraform-exec mid-mutation. The poll loop stops accepting new work on signal, but in-flight operations complete. Safety: - Empty deployments guard: if Hub returns zero deployments but local state has deployments, skip the destroy cycle to avoid wiping infrastructure due to transient Hub errors. - State POST enforces lock: if state is locked and no ?ID= param is provided, reject with 409. Prevents lockless state overwrites. - State POST validates JSON before storing to S3. - Error sanitization: strip internal file paths from errors before sending to Hub dashboard via StatusWithError. First-run fix: - Docker-compose template was missing DISTR_REGISTRY_HOST and DISTR_REGISTRY_PLAIN_HTTP env vars (present in Docker/K8s templates). Without these, OCI artifact pull fails on first run. OCI pull: - Track pulled version in .distr-oci-version file. Compare against expected ref instead of just checking for .tf file existence. Fixes stale config after corrupted pull and adds .tf.json support. Ops: - State rate limit bumped from 60 to 600/min per target. A 50-resource module can produce 50+ state writes per apply. - Dockerfile HEALTHCHECK added. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Migration 85_support_bundle_tables was added on main, causing a duplicate version error. Renumber OpenTofu migration to 86. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
SDK types: - Extend DeploymentType with 'opentofu' - Add tofuVars, tofuBackendConfig, tofuVersion to DeploymentRequest - Add tofuConfigUrl, tofuConfigVersion to ApplicationVersion Application version creation: - Add OpenTofu form group (OCI config URL + tag) - Add createApplicationVersionForOpenTofu service method - Show config URL and version columns in version table - Update enableTypeSpecificGroups and loadVersionDetails Deployment wizard + form: - Add 'opentofu' case to wizard config controls (same as Docker: no namespace/scope/resources) - Add 'opentofu' case to deployment form type switching (disables Docker envFile and K8s releaseName/values/helm controls) Assets: - Add 199x199 OpenTofu icon (from opentofu/brand-artifacts, MPL-2.0) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The tofuConfigUrl stored in application versions is a relative OCI reference (e.g., "org/app-name") without the registry host. The agent needs to prepend DISTR_REGISTRY_HOST to form a valid repository reference for oras to pull from. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Replace sync.Once with sync.Mutex for default tofu binary resolution. sync.Once permanently caches errors — if the first download fails (network issue), all subsequent calls fail forever. The mutex pattern retries on failure and only caches on success. - Increase state lock TTL from 15 minutes to 1 hour. OpenTofu apply operations for complex infrastructure can exceed 15 minutes, which would allow a second agent to forcibly acquire the lock mid-apply. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Remove broken sanitizeError: heuristic strips real error context, Hub dashboard is for operators who need full details - Simplify OCI repoRef logic: use hasHostname() check instead of double-branch string matching that breaks for external registries - Fix workCtx: use plain context.Background() instead of WithCancel+defer which defeats the purpose of graceful shutdown - Deduplicate tofuApply error paths: single defer handles StateFailed+SaveDeployment instead of 6 copy-paste blocks - writeVarsFile: use terraform.tfvars.json (pure JSON) instead of hand-rolled HCL that produces invalid output for complex types - Remove useless Dockerfile HEALTHCHECK (test -x checks binary exists, not agent health; Docker/K8s agents don't have one) Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
3b38a77 to
4164f54
Compare
|
@pmig can you run help me with the workflows |
Migration 86_image_cleanup_enabled was added on main. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 38 out of 40 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| <<<<<<< Updated upstream | ||
| > `optional` **resources?**: [`ApplicationVersionResource`](ApplicationVersionResource.md)[] | ||
| ||||||| Stash base | ||
| > `optional` **resources**: [`ApplicationVersionResource`](ApplicationVersionResource.md)[] | ||
| ======= | ||
| > `optional` **resources**: [`ApplicationVersionResource`](ApplicationVersionResource.md)[] | ||
|
|
||
| --- | ||
|
|
||
| ### tofuConfigUrl? | ||
|
|
||
| > `optional` **tofuConfigUrl**: `string` | ||
|
|
||
| --- | ||
|
|
||
| ### tofuConfigVersion? | ||
|
|
||
| > `optional` **tofuConfigVersion**: `string` | ||
| >>>>>>> Stashed changes |
There was a problem hiding this comment.
This generated doc file contains unresolved Git merge conflict markers (<<<<<<<, |||||||, =======, >>>>>>>). These need to be removed and the intended content reconciled before merging, otherwise the published SDK docs will be broken.
There was a problem hiding this comment.
Fixed — SDK docs have been regenerated clean. The conflict markers were from a stash pop that was resolved but the docs weren't regenerated.
| func initHubConfig() { | ||
| hubBase = strings.TrimSuffix(os.Getenv("DISTR_LOGIN_ENDPOINT"), "/api/v1/agent/login") | ||
| targetID = os.Getenv("DISTR_TARGET_ID") | ||
| targetSecret = os.Getenv("DISTR_TARGET_SECRET") | ||
| } |
There was a problem hiding this comment.
hubBase is derived via strings.TrimSuffix(DISTR_LOGIN_ENDPOINT, "/api/v1/agent/login"), which silently produces an incorrect base URL if the env var has a trailing slash, a different path, or is already a base URL. That would make the state backend URLs invalid at runtime. Consider parsing the URL and stripping the path robustly (or provide an explicit DISTR_HUB_BASE_URL env var) and error out if it can’t be derived.
There was a problem hiding this comment.
This follows the exact same pattern as the existing Docker and Kubernetes agents — all three derive their Hub base URL by trimming the login endpoint path. The env vars are set by the Hub's connect command template, so the format is guaranteed. Changing to a separate DISTR_HUB_BASE_URL env var would be a cross-agent change outside this PR's scope.
cmd/agent/opentofu/oci_pull.go
Outdated
| if slashIdx < 0 { | ||
| return strings.Contains(ref, ".") | ||
| } | ||
| return strings.Contains(ref[:slashIdx], ".") |
There was a problem hiding this comment.
hasHostname only checks for a dot in the first path segment to decide whether to prefix DISTR_REGISTRY_HOST. This misclassifies common registries like localhost:5000/repo (no dot, but has a port) and would incorrectly rewrite the reference. Consider treating : (port) and localhost as a hostname too, or use a proper OCI/Docker reference parser to detect an explicit registry host.
| if slashIdx < 0 { | |
| return strings.Contains(ref, ".") | |
| } | |
| return strings.Contains(ref[:slashIdx], ".") | |
| hostPart := ref | |
| if slashIdx >= 0 { | |
| hostPart = ref[:slashIdx] | |
| } | |
| hostLower := strings.ToLower(hostPart) | |
| if hostLower == "localhost" { | |
| return true | |
| } | |
| return strings.Contains(hostPart, ".") || strings.Contains(hostPart, ":") |
There was a problem hiding this comment.
Fixed — hasHostname now also checks for : (port) in the first path segment, so localhost:5000/repo is correctly detected as having a hostname.
- hasHostname: also detect port-based hostnames (localhost:5000) by checking for ':' in addition to '.' - Regenerate SDK docs to clean up stale merge artifact Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Summary
Adds OpenTofu Agent support to Distr, enabling infrastructure provisioning alongside container deployments. Vendors can now manage the full deployment lifecycle — applications and infrastructure — through a single Distr workflow.
Closes #1946
What's included
opentofudeployment type, tofu columns onapplicationversion/deploymentrevision,opentofu_statetable for lock managementcmd/agent/opentofu): Polling loop, reconciliation, OCI artifact pull,tofu init/plan/apply/destroyvia terraform-exec SDKArchitecture
The agent authenticates via target ID + secret (same as Docker/K8s agents), pulls OpenTofu configs packaged as OCI artifacts, and executes
tofu applywith state managed through the Hub's HTTP backend.Key design decisions
ws/notworkspaces/) to stay under macOS path length limits with.terraform/Test plan
Assisted by: Claude [email protected]