Technology

Using External S3 for Testing Stopped Making Sense, So I Built SeaweedFS Storage Myself

Anonymous
13 min read

Introduction

When you keep testing uploads, presigned URLs, public buckets, CDN caching, and thumbnails locally, wiring an external S3 service into every test-only loop starts to feel more awkward than expected. It was not just about cost. What kept bothering me was having to route every experiment through external accounts and resources.

So I changed direction this time. I decided to bring the storage I use repeatedly in tests on-prem, while keeping the interface the application sees as close to plain S3 as possible. I felt this was a much better shape: use an on-prem S3-compatible store in local and dev, then swap the same interface to AWS S3 or Cloudflare R2 in production when needed.

As a result, I built a single-node S3-compatible storage stack with SeaweedFS + Nginx CDN + TLS + TTL. This project was also a case where AI automated almost the entire flow, from research drafts and infrastructure config drafts to manifest writing and deployment procedure write-ups. In this post, all sensitive information such as real keys, Secrets, internal paths, and internal IPs is masked.

Why build it myself at all?

AWS S3 itself was not the problem. It just kept feeling less natural for test workloads.

  • I needed to repeatedly verify file upload flows in local and development environments.
  • I needed to split public and private buckets and check CDN behavior, presigned URLs, and thumbnail layers together.
  • I had many workflows that created and deleted temporary files on short cycles.
  • Running all of those experiments against external storage every time felt too heavy for what was supposed to be simple testing.

What I wanted was not a massive storage platform, but S3 compatibility sufficient for testing and verification, plus simple on-prem operability. More precisely, I needed the storage contract visible to developers to stay fixed as an S3-compatible interface, so only the backend implementation could be swapped by environment.

Why SeaweedFS?

I looked at the most familiar option first, but as of late 2025 it was hard to keep MinIO as the default choice. Because of the license change and the shift in community direction, I felt it had become an awkward fit for something I wanted to carry lightly over the long term.

On the other hand, Ceph RGW was a far more complete option, but it was too heavy for this goal. I needed single-node test storage and simple media handling, not a large-scale storage cluster.

So the final choice was SeaweedFS.

  • Its Apache 2.0 license keeps commercial constraints light.
  • It provides an S3 Gateway, so I did not need to heavily change the existing SDK flow.
  • For the main use cases such as uploads, downloads, bucket-level separation, and presigned URLs, it keeps the S3 interface almost intact.
  • It is easy to deploy directly to Kubernetes with Helm.
  • It is a good fit for small media files such as images, audio, and short videos.

Rather than finding a perfect S3 replacement, this was the most concise choice for my scope: a store that moves under almost the same contract as S3.

How I actually put it together

I kept the setup straightforward. I built an S3-compatible layer around SeaweedFS, put an Nginx cache layer in front of it, and added TLS, a thumbnail layer for public buckets, and a TTL policy for temporary files on top.

At a high level, the runtime pieces looked like this.

  • S3 endpoint: the primary storage endpoint the application talks to directly
  • CDN endpoint: cached delivery for public resources
  • thumbnail endpoint: an image transformation layer only for public buckets
  • bucket split: the static and public buckets are public, while the images, videos, bgm, and files buckets are private
  • TTL: the _tmp/ prefix expires automatically after two weeks
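
Worth noting: S3 Lifecycle rules are one of SeaweedFS's gaps (the research appendix below calls this out), so the `_tmp/` expiry above is handled with SeaweedFS's own TTL mechanism rather than through the S3 API. A hedged sketch of what that looks like in `weed shell` — the bucket path and prefix here are illustrative, not my actual layout, and flag names should be checked against your SeaweedFS version:

```shell
# Inside `weed shell`: attach a two-week TTL to everything under the
# _tmp/ prefix of a bucket. Path and bucket name are illustrative.
fs.configure -locationPrefix=/buckets/files/_tmp/ -ttl=2w -apply
```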

That let me validate the main test flows in one place: uploads, public access, private presigned URLs, CDN caching, thumbnail generation, and temporary file cleanup, all on top of the same storage.
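
Of these, presigned URLs are the easiest to misunderstand: presigning is pure client-side signing, with no storage round-trip involved, which is exactly why swapping the endpoint between on-prem and AWS is cheap. The sketch below builds a SigV4 query-presigned GET URL with nothing but `openssl`, assuming placeholder credentials, a hypothetical endpoint, and path-style addressing; in a real project the SDK's presigner does this for you.

```shell
#!/bin/sh
# Minimal SigV4 query presign, computed offline. All values are placeholders.
ACCESS_KEY="AKIA-PLACEHOLDER"
SECRET_KEY="secret-placeholder"
ENDPOINT="seaweedfs-s3.local:8333"      # hypothetical on-prem S3 gateway
REGION="us-east-1"
BUCKET="images"
OBJECT="photo.jpg"

AMZ_DATE=$(date -u +%Y%m%dT%H%M%SZ)
DATE=${AMZ_DATE%T*}                     # yyyymmdd part of the timestamp
SCOPE="$DATE/$REGION/s3/aws4_request"

# HMAC-SHA256 helper: $1 is an openssl -macopt (key:... or hexkey:...), $2 the data.
hmac_hex() { printf '%s' "$2" | openssl dgst -sha256 -mac HMAC -macopt "$1" | sed 's/^.* //'; }

# Alphabetically sorted query string; '/' in the credential must be percent-encoded.
CRED=$(printf '%s' "$ACCESS_KEY/$SCOPE" | sed 's|/|%2F|g')
QS="X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=$CRED&X-Amz-Date=$AMZ_DATE&X-Amz-Expires=600&X-Amz-SignedHeaders=host"

# Canonical request -> string to sign (path-style URI: /bucket/key).
CREQ="GET
/$BUCKET/$OBJECT
$QS
host:$ENDPOINT

host
UNSIGNED-PAYLOAD"
CREQ_HASH=$(printf '%s' "$CREQ" | openssl dgst -sha256 | sed 's/^.* //')
STS="AWS4-HMAC-SHA256
$AMZ_DATE
$SCOPE
$CREQ_HASH"

# Derive the signing key through the HMAC chain, then sign.
K_DATE=$(hmac_hex "key:AWS4$SECRET_KEY" "$DATE")
K_REGION=$(hmac_hex "hexkey:$K_DATE" "$REGION")
K_SERVICE=$(hmac_hex "hexkey:$K_REGION" "s3")
K_SIGNING=$(hmac_hex "hexkey:$K_SERVICE" "aws4_request")
SIG=$(hmac_hex "hexkey:$K_SIGNING" "$STS")

echo "http://$ENDPOINT/$BUCKET/$OBJECT?$QS&X-Amz-Signature=$SIG"
```

The printed URL only grants access if the gateway is configured with the same key pair; the point is that between environments, only the endpoint and credentials change.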

On the Spring Boot side, I also did not need to change the existing S3 integration very much. In local and test profiles, I only switched the endpoint to the on-prem S3-compatible address and aligned path-style access plus the public/private bucket rules. In other words, even after building new storage, the application code could keep its overall shape.
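
As a sketch, that profile switch can be as small as a few properties. This assumes Spring Cloud AWS 3.x-style property names and a made-up endpoint; the real project's keys and hosts are masked, so treat every value here as illustrative:

```yaml
# application-local.yml -- hypothetical, for the local/test profile only
spring:
  cloud:
    aws:
      region:
        static: us-east-1                 # S3-compatible stores still expect a region value
      credentials:
        access-key: ${S3_ACCESS_KEY}      # injected from the environment, never committed
        secret-key: ${S3_SECRET_KEY}
      s3:
        endpoint: http://seaweedfs-s3.local:8333   # on-prem S3 gateway
        path-style-access-enabled: true            # most S3-compatible stores need path-style
```

In the production profile the endpoint override simply disappears (or points at R2), and the rest of the integration stays untouched.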

That part was especially good. In local and dev, I can use an on-prem store such as SeaweedFS, and in production I can switch to AWS S3 or R2 while keeping almost the same application-facing interface. Changing the storage product no longer means blowing up the application's storage abstraction.

Put differently, the core value of this build was not merely standing up one storage server. It was fixing the code-facing interface to S3 compatibility even though the actual backend implementation can vary by environment. That kind of interface is good for testing, and it also makes later production storage changes much more flexible.

How far AI went this time

The interesting part of this work was not SeaweedFS itself, but how far AI could push the process.

In practice, AI fully automated the tasks below, and the human side only handled sensitive inputs and the final review.

  • researching S3-compatible storage candidates and drafting the comparison
  • narrowing the options after comparing MinIO, SeaweedFS, Ceph, and RustFS
  • drafting Helm values, Nginx cache manifests, and TLS configuration
  • organizing bucket structure, TTL policies, test procedures, and an operations checklist
  • writing deployment guides and troubleshooting documents

What this reinforced for me is that infrastructure work can also be automated surprisingly deeply once the requirements and security boundaries are defined first. Especially in a repeated flow like drafting -> generating config files -> documenting the deployment procedure -> applying it for real, the practical efficiency gain was obvious.

Closing

This was not some grand story about adopting a storage platform. It was closer to a record of pulling the S3 I kept using for tests a little more into my own hands.

Even so, it was meaningful enough. I can now validate real operational flows such as public and private resources, CDN delivery, presigned URLs, thumbnails, and TTL more often without external dependencies. More importantly, I now have a structure where local and dev can stay on-prem while production can choose S3 or R2 behind the same S3-compatible interface.

And one more thing became clearer. AI can automate not only code writing, but also infrastructure drafts, configuration, and deployment documents at a fairly high level. There is still work left for humans, but at least now the boundary of what can be delegated is much clearer.


Appendix

Below is the full text of documents/workspace/infrastructure/S3_COMPATIBLE_STORAGE_RESEARCH.md.

Research on S3-Compatible Object Storage Solutions

Date: 2025-12-30
Purpose: Selecting a self-hosted infrastructure solution to replace AWS S3

Background

MinIO status (December 2025)

MinIO is no longer recommended. The main reasons are:

  1. License issue: changed from Apache 2.0 to AGPL-3.0

    • If offered as a network service, source code disclosure is required
    • For commercial use, annual licensing starts at $96,000
  2. Moved into maintenance mode (December 2025)

    • No more new features, improvements, or PR acceptance
    • Security patches are applied only after case-by-case evaluation
    • Community edition binary distribution has ended (source code only)
  3. Management UI removed

    • Admin console functionality was removed from the community edition
    • Full features are available only in the paid version

Reference: InfoQ - MinIO in Maintenance Mode


Requirements

Item | Requirement
S3 compatibility | Required: must work with the AWS SDK as-is
License | Prefer a license without commercial restrictions (Apache 2.0, MIT, etc.)
Stored assets | Images, BGM, short-form video clips
CDN expansion | Must support a caching CDN through an Nginx reverse proxy
Deployment environment | Kubernetes + Helm charts
Cost | Reduce AWS costs with self-hosted infrastructure

Solution comparison

1. SeaweedFS

Item | Details
License | Apache 2.0 (free for commercial use)
Language | Go
Architecture | Master + Volume + Filer structure
Characteristics | O(1) disk seek, based on the Facebook Haystack architecture
S3 compatibility | Provides an S3 Gateway (fully supports core S3 operations)
Helm | Official Helm charts + Kubernetes Operator
Maturity | Production proven (deployments over 1.5PB)
Enterprise | Free up to 25TB, then $1/TB/month

Pros

  • Can handle tens of billions of files, which is ideal for media storage
  • Fast access to small files with O(1) seek
  • Better storage efficiency through Erasure Coding
  • Cloud tiering for automatic cold-data offloading
  • Supports FUSE mounts, WebDAV, and Hadoop integration

Cons

  • Some advanced S3 features are not supported (for example lifecycle policies)
  • Metadata backup is mandatory (if Filer metadata is lost, files become orphaned)

Helm installation

Bash
helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
helm upgrade --install seaweedfs seaweedfs/seaweedfs -n storage --create-namespace
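
In practice the install usually also wants a small values override to expose the S3 gateway. A hedged sketch follows; these key names match recent chart versions as far as I know, but verify them against the chart's own values.yaml:

```yaml
# values-s3.yaml -- illustrative override for the SeaweedFS Helm chart
filer:
  s3:
    enabled: true      # serve the S3 API from the filer
    port: 8333
    enableAuth: true   # require an access/secret key pair
```

Then pass it with `helm upgrade --install seaweedfs seaweedfs/seaweedfs -n storage -f values-s3.yaml`.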

Reference: SeaweedFS GitHub


2. RustFS

Item | Details
License | Apache 2.0
Language | Rust
Performance | 2.3x faster than MinIO for 4KB objects
S3 compatibility | Full S3 API support
Helm | Official Helm charts
Maturity | Beta as of December 2025 (0.0.77)

Pros

  • Supports MinIO migration and coexistence
  • Strong performance for small objects
  • Apache 2.0 license

Cons

  • Still in beta, so production use needs caution
  • Limited documentation and community support

Helm installation

Bash
helm repo add rustfs https://charts.rustfs.com
helm install rustfs rustfs/rustfs -n rustfs --create-namespace \
  --set ingress.className="nginx"

Reference: RustFS GitHub


3. Ceph RGW (via Rook)

Item | Details
License | LGPL 2.1
Language | C++ (data path), Go (Rook operator)
Architecture | Unified RADOS-based storage (Block + File + Object)
S3 compatibility | Best in class (passes 576 s3-tests)
Helm | Rook Operator available
Maturity | Enterprise-grade (proven for years)

Pros

  • Top-tier S3 compatibility with the broadest API coverage
  • Unified block, file, and object storage
  • Multi-tenancy and namespace isolation
  • Advanced Erasure Coding configuration
  • Exabyte-scale expansion

Cons

  • Complex to install and operate
  • High resource requirements (at least 3 nodes and a fast network)
  • Steep learning curve

Helm installation (Rook)

Bash
helm repo add rook-release https://charts.rook.io/release
helm install rook-ceph rook-release/rook-ceph -n rook-ceph --create-namespace
# A CephCluster CRD must still be applied afterwards

Reference: Rook Documentation


4. Garage

Item | Details
License | AGPL-3.0 (same issue as MinIO)
Language | Rust
Characteristics | Lightweight and specialized for geographically distributed deployment
S3 compatibility | Supports core S3 operations (advanced features are limited)
Helm | Community chart
Maturity | Suitable for small-scale self-hosting

Pros

  • Lightweight and resource-efficient
  • Built-in multi-zone and multi-site replication
  • Rust-based memory safety

Cons

  • AGPL-3.0 license with commercial-use constraints
  • No Erasure Coding support (3x replication only)
  • Limited advanced S3 features

Reference: Garage


S3 compatibility comparison

Solution | s3-tests passed | Evaluation
Ceph RGW | 576 | Best
Zenko CloudServer | 382 | Strong
MinIO | 321 | Good
SeaweedFS | 56 | Basic

SeaweedFS fully supports core S3 operations (PUT, GET, DELETE, LIST, etc.), but advanced features such as Object Lock and Lifecycle are limited.


Final recommendation: SeaweedFS

Why recommend it:

  1. Apache 2.0 license - fully open for commercial use
  2. Optimized for media storage - excellent for images, audio, and video
  3. Production proven - backed by years of large-scale deployment cases
  4. Kubernetes-friendly - official Helm charts and Operator
  5. Reasonable pricing - free up to 25TB, then $1/TB per month

Runner-up options

Situation | Recommendation
Need near-perfect S3 compatibility | Ceph RGW (if you can absorb the complexity)
Prefer newer tech and can tolerate experimentation | RustFS (if you can absorb the beta risk)
Small scale and AGPL is not a concern | Garage

Nginx CDN setup guide

Example CDN setup with SeaweedFS + Nginx:

Nginx cache configuration

Nginx
# /etc/nginx/nginx.conf

http {
    # cache storage settings
    proxy_cache_path /var/cache/nginx/s3
        levels=1:2
        keys_zone=s3_cache:100m
        max_size=50g
        inactive=7d
        use_temp_path=off;

    upstream seaweedfs_s3 {
        server seaweedfs-s3.storage.svc.cluster.local:8333;
        keepalive 64;
    }

    server {
        listen 80;
        server_name cdn.example.com;

        # image caching (7 days)
        location ~* \.(jpg|jpeg|png|gif|webp|ico|svg)$ {
            proxy_pass http://seaweedfs_s3;
            proxy_cache s3_cache;
            proxy_cache_valid 200 7d;
            proxy_cache_valid 404 1m;
            proxy_cache_use_stale error timeout updating;
            proxy_cache_lock on;

            add_header X-Cache-Status $upstream_cache_status;
            add_header Cache-Control "public, max-age=604800";

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # audio/video caching (30 days)
        location ~* \.(mp3|wav|ogg|mp4|webm|m4a)$ {
            proxy_pass http://seaweedfs_s3;
            proxy_cache s3_cache;
            proxy_cache_valid 200 30d;
            proxy_cache_valid 404 1m;
            proxy_cache_use_stale error timeout updating;
            proxy_cache_lock on;

            add_header X-Cache-Status $upstream_cache_status;
            add_header Cache-Control "public, max-age=2592000";

            # support Range requests (video streaming)
            proxy_set_header Range $http_range;
            proxy_set_header If-Range $http_if_range;

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # other files
        location / {
            proxy_pass http://seaweedfs_s3;
            proxy_cache s3_cache;
            proxy_cache_valid 200 1d;

            add_header X-Cache-Status $upstream_cache_status;

            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
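
One detail of the config above worth internalizing is `levels=1:2`: nginx names each cache file after the MD5 of the cache key (by default `$scheme$proxy_host$request_uri`, concatenated with no separators) and shards directories using the trailing characters of that hash. A quick sketch of where a given response would land on disk, reusing the cache path from the config:

```shell
#!/bin/sh
# Reproduce nginx's cache file layout for levels=1:2.
# Default cache key is $scheme$proxy_host$request_uri, joined without separators.
KEY='httpseaweedfs_s3/images/photo.jpg'
MD5=$(printf '%s' "$KEY" | openssl dgst -md5 | sed 's/^.* //')
LVL1=$(printf '%s' "$MD5" | tail -c 1)              # last hex char
LVL2=$(printf '%s' "$MD5" | tail -c 3 | head -c 2)  # the two chars before it
CACHE_FILE="/var/cache/nginx/s3/$LVL1/$LVL2/$MD5"
echo "$CACHE_FILE"
```

This is handy when you need to evict a single object by hand: delete that one file instead of flushing the whole cache zone.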

Cache status headers

  • HIT: served from cache
  • MISS: fetched from origin
  • STALE: an expired cache entry was served because the origin failed
  • UPDATING: background refresh in progress

Reference: NGINX Caching Guide


Kubernetes deployment architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Kubernetes Cluster                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Ingress (Nginx)                        │   │
│  │                   cdn.example.com                         │   │
│  └──────────────────────┬───────────────────────────────────┘   │
│                         │                                        │
│  ┌──────────────────────▼───────────────────────────────────┐   │
│  │              Nginx Cache Layer (CDN)                      │   │
│  │          /var/cache/nginx (PVC: 50Gi+)                   │   │
│  └──────────────────────┬───────────────────────────────────┘   │
│                         │                                        │
│  ┌──────────────────────▼───────────────────────────────────┐   │
│  │                   SeaweedFS                               │   │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐      │   │
│  │  │ Master  │  │ Volume  │  │ Volume  │  │ Volume  │      │   │
│  │  │  (x3)   │  │   #1    │  │   #2    │  │   #3    │      │   │
│  │  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘      │   │
│  │       │            │            │            │            │   │
│  │  ┌────▼────────────▼────────────▼────────────▼────┐      │   │
│  │  │              Filer (S3 Gateway)                 │      │   │
│  │  │           seaweedfs-s3:8333                     │      │   │
│  │  └─────────────────────────────────────────────────┘      │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                  Persistent Volumes                       │   │
│  │     Volume #1      Volume #2      Volume #3               │   │
│  │     (100Gi+)       (100Gi+)       (100Gi+)                │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Next steps

  1. Build a test environment with the SeaweedFS Helm chart
  2. Run S3 SDK integration tests (confirm compatibility with existing code)
  3. Set up the Nginx cache layer
  4. Run performance benchmarks (images, audio, video)
  5. Create a production deployment plan
