Using External S3 for Testing Stopped Making Sense, So I Built SeaweedFS Storage Myself
Introduction
When you keep testing uploads, presigned URLs, public buckets, CDN caching, and thumbnails locally, wiring an external S3 service into every test-only loop starts to feel more awkward than expected. It was not just about cost. What kept bothering me was having to route every experiment through external accounts and resources.
So I changed direction this time. I decided to bring the storage I use repeatedly in tests on-prem, while keeping the interface the application sees as close to plain S3 as possible. This felt like a much better shape: use an on-prem S3-compatible store in local and dev, then swap the same interface over to AWS S3 or Cloudflare R2 in production when needed.
As a result, I built a single-node S3-compatible storage stack with SeaweedFS + Nginx CDN + TLS + TTL. This project was also a case where AI automated almost the entire flow, from research drafts and infrastructure config drafts to manifest writing and deployment procedure write-ups. In this post, all sensitive information such as real keys, Secrets, internal paths, and internal IPs is masked.
Why build it myself at all?
AWS S3 itself was not the problem. It just kept feeling less natural for test workloads.
- I needed to repeatedly verify file upload flows in local and development environments.
- I needed to split public and private buckets and check CDN behavior, presigned URLs, and thumbnail layers together.
- I had many workflows that created and deleted temporary files on short cycles.
- Running all of those experiments against external storage every time felt too heavy for what was supposed to be simple testing.
What I wanted was not a massive storage platform, but S3 compatibility sufficient for testing and validation, plus simple on-prem operability. More precisely, I needed the storage contract visible to developers to stay fixed as an S3-compatible interface, so that only the backend implementation would be swapped by environment.
Why SeaweedFS?
I looked at the most familiar option first, but as of late 2025 it was hard to keep MinIO as the default choice. Because of the license change and the shift in community direction, I felt it had become an awkward fit for something I wanted to carry lightly over the long term.
On the other hand, Ceph RGW was a far more complete option, but it was too heavy for this goal. I needed single-node test storage and simple media handling, not a large-scale storage cluster.
So the final choice was SeaweedFS.
- Its Apache 2.0 license keeps commercial constraints light.
- It provides an S3 Gateway, so I did not need to heavily change the existing SDK flow.
- For the main use cases such as uploads, downloads, bucket-level separation, and presigned URLs, it keeps the S3 interface almost intact.
- It is easy to deploy directly to Kubernetes with Helm.
- It is a good fit for small media files such as images, audio, and short videos.
Rather than finding a perfect S3 replacement, this was the most concise choice for my scope: a store that behaves under almost the same contract as S3.
How I actually put it together
I kept the setup straightforward. I built an S3-compatible layer around SeaweedFS, put an Nginx cache layer in front of it, and added TLS, a thumbnail layer for public buckets, and a TTL policy for temporary files on top.
At a high level, the runtime pieces looked like this.
- S3 endpoint: the primary storage endpoint the application talks to directly
- CDN endpoint: cached delivery for public resources
- thumbnail endpoint: an image transformation layer only for public buckets
- bucket split: `static` and `public` are public, while `images`, `videos`, `bgm`, and `files` are private
- TTL: the `_tmp/` prefix expires automatically after two weeks
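The bucket split and TTL rule above can be summarized as a tiny helper. This is an illustrative sketch only: the bucket names and the `_tmp/` prefix come from this setup, but the function itself is hypothetical, not part of SeaweedFS or the application.

```python
# Hypothetical summary of the bucket/TTL policy described above.
PUBLIC_BUCKETS = {"static", "public"}
PRIVATE_BUCKETS = {"images", "videos", "bgm", "files"}
TMP_PREFIX = "_tmp/"
TMP_TTL_DAYS = 14  # temporary objects expire after two weeks

def access_policy(bucket: str, key: str) -> dict:
    """Classify an object: public vs presigned access, plus TTL for temp keys."""
    if bucket not in PUBLIC_BUCKETS | PRIVATE_BUCKETS:
        raise ValueError(f"unknown bucket: {bucket}")
    return {
        "public": bucket in PUBLIC_BUCKETS,
        "ttl_days": TMP_TTL_DAYS if key.startswith(TMP_PREFIX) else None,
    }
```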
That let me validate the main test flows in one place: uploads, public access, private presigned URLs, CDN caching, thumbnail generation, and temporary file cleanup, all on top of the same storage.
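One of those flows, private presigned URLs, is just deterministic request signing, which is why it works the same against an on-prem gateway as against AWS. As a rough illustration of what the SDK computes under the hood, here is a stdlib-only SigV4 query-presign sketch (path-style GET, `host` as the only signed header); the endpoint, credentials, and region below are placeholders, and real code should of course keep using the AWS SDK.

```python
import hashlib
import hmac
import urllib.parse
from datetime import datetime, timezone

def presign_get(bucket, key, *, endpoint, access_key, secret_key,
                region="us-east-1", expires=3600, now=None):
    """Build a SigV4 presigned GET URL (path-style). Illustrative sketch only."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = urllib.parse.urlparse(endpoint).netloc
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items()))
    # Canonical request: method, URI, query, headers, signed headers, payload hash
    canonical_request = "\n".join([
        "GET", f"/{bucket}/{key}", canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest()])
    def _hmac(key_bytes, msg):
        return hmac.new(key_bytes, msg.encode(), hashlib.sha256).digest()
    signing_key = _hmac(_hmac(_hmac(_hmac(
        ("AWS4" + secret_key).encode(), datestamp), region), "s3"), "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"{endpoint}/{bucket}/{key}?{canonical_query}&X-Amz-Signature={signature}"
```

Because nothing here depends on the backend, the same URL-signing contract holds whether the endpoint is SeaweedFS's S3 gateway or AWS S3 itself.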
On the Spring Boot side, I also did not need to change the existing S3 integration very much. In local and test profiles, I only switched the endpoint to the on-prem S3-compatible address and aligned path-style access plus the public/private bucket rules. In other words, even after building new storage, the application code could keep its overall shape.
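Leaving the Spring wiring itself aside, the profile switch boils down to a small table like the following sketch. The cluster-internal gateway address matches the one used later in the Nginx upstream; the dev and prod endpoints are placeholders, not the real (masked) addresses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class S3Config:
    endpoint: str
    path_style: bool  # on-prem S3-compatible gateways are typically used path-style

# Hypothetical per-profile mapping: local/dev stay on-prem, prod uses AWS S3.
PROFILES = {
    "local": S3Config("http://seaweedfs-s3.storage.svc.cluster.local:8333", True),
    "dev":   S3Config("https://s3.dev.example.com", True),
    "prod":  S3Config("https://s3.ap-northeast-2.amazonaws.com", False),
}

def s3_config(profile: str) -> S3Config:
    """Pick the endpoint for a profile; the SDK-facing contract stays identical."""
    return PROFILES[profile]
```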
That part was especially good. In local and dev, I can use an on-prem store such as SeaweedFS, and in production I can switch to AWS S3 or R2 while keeping almost the same application-facing interface. Changing the storage product no longer means blowing up the application's storage abstraction.
Put differently, the core value of this build was not merely standing up one storage server. It was fixing the code-facing interface to S3 compatibility even though the actual backend implementation can vary by environment. That kind of interface is good for testing, and it also makes later production storage changes much more flexible.
How far AI went this time
The interesting part of this work was not SeaweedFS itself, but how far AI could push the process.
In practice, AI fully automated the tasks below, and the human side only handled sensitive inputs and the final review.
- researching S3-compatible storage candidates and drafting the comparison
- narrowing the options after comparing `MinIO`, `SeaweedFS`, `Ceph`, and `RustFS`
- drafting Helm values, Nginx cache manifests, and TLS configuration
- organizing bucket structure, TTL policies, test procedures, and an operations checklist
- writing deployment guides and troubleshooting documents
What this reinforced for me is that infrastructure work can also be automated surprisingly deeply once the requirements and security boundaries are defined first. Especially in a repeated flow like drafting -> config file generation -> deployment-procedure documentation -> actual rollout, the practical efficiency gain was obvious.
Closing
This was not some grand story about adopting a storage platform. It was closer to a record of pulling the S3 I kept using for tests a little more into my own hands.
Even so, it was meaningful enough. I can now validate real operational flows such as public and private resources, CDN delivery, presigned URLs, thumbnails, and TTL more often without external dependencies. More importantly, I now have a structure where local and dev can stay on-prem while production can choose S3 or R2 behind the same S3-compatible interface.
And one more thing became clearer. AI can automate not only code writing, but also infrastructure drafts, configuration, and deployment documents at a fairly high level. There is still work left for humans, but at least now the boundary of what can be delegated is much clearer.
Appendix
Below is the full text of documents/workspace/infrastructure/S3_COMPATIBLE_STORAGE_RESEARCH.md.
Research on S3-Compatible Object Storage Solutions
Date: 2025-12-30
Purpose: Selecting a self-hosted infrastructure solution to replace AWS S3
Background
MinIO status (December 2025)
MinIO is no longer recommended. The main reasons are:
- License issue: changed from Apache 2.0 to AGPL-3.0
  - If offered as a network service, source code disclosure is required
  - For commercial use, annual licensing starts at $96,000
- Moved into maintenance mode (December 2025)
  - No more new features, improvements, or PR acceptance
  - Security patches are applied only after case-by-case evaluation
  - Community edition binary distribution has ended (source code only)
- Management UI removed
  - Admin console functionality was removed from the community edition
  - Full features are available only in the paid version
Reference: InfoQ - MinIO in Maintenance Mode
Requirements
| Item | Requirement |
|---|---|
| S3 compatibility | Required - must work with the AWS SDK as-is |
| License | Prefer a license without commercial restrictions (Apache 2.0, MIT, etc.) |
| Stored assets | Images, BGM, short-form video clips |
| CDN expansion | Must support a caching CDN through an Nginx reverse proxy |
| Deployment environment | Kubernetes + Helm charts |
| Cost | Reduce AWS costs with self-hosted infrastructure |
Solution comparison
1. SeaweedFS (Recommended)
| Item | Details |
|---|---|
| License | Apache 2.0 (free for commercial use) |
| Language | Go |
| Architecture | Master + Volume + Filer structure |
| Characteristics | O(1) disk seek, based on the Facebook Haystack architecture |
| S3 compatibility | Provides an S3 Gateway (fully supports core S3 operations) |
| Helm | Official Helm charts + Kubernetes Operator |
| Maturity | Production proven (deployments over 1.5PB) |
| Enterprise | Free up to 25TB, then $1/TB/month |
Pros
- Can handle tens of billions of files, which is ideal for media storage
- Fast access to small files with O(1) seek
- Better storage efficiency through Erasure Coding
- Cloud tiering for automatic cold-data offloading
- Supports FUSE mounts, WebDAV, and Hadoop integration
Cons
- Some advanced S3 features are not supported (for example lifecycle policies)
- Metadata backup is mandatory (if Filer metadata is lost, files become orphaned)
Helm installation
```bash
helm repo add seaweedfs https://seaweedfs.github.io/seaweedfs/helm
helm upgrade --install seaweedfs seaweedfs/seaweedfs -n storage --create-namespace
```
Reference: SeaweedFS GitHub
2. RustFS
| Item | Details |
|---|---|
| License | Apache 2.0 |
| Language | Rust |
| Performance | 2.3x faster than MinIO for 4KB objects |
| S3 compatibility | Full S3 API support |
| Helm | Official Helm charts |
| Maturity | Beta as of December 2025 (0.0.77) |
Pros
- Supports MinIO migration and coexistence
- Strong performance for small objects
- Apache 2.0 license
Cons
- Still in beta, so production use needs caution
- Limited documentation and community support
Helm installation
```bash
helm repo add rustfs https://charts.rustfs.com
helm install rustfs rustfs/rustfs -n rustfs --create-namespace \
  --set ingress.className="nginx"
```
Reference: RustFS GitHub
3. Ceph RGW (via Rook)
| Item | Details |
|---|---|
| License | LGPL 2.1 |
| Language | C++ (data path), Go (Rook operator) |
| Architecture | Unified RADOS-based storage (Block + File + Object) |
| S3 compatibility | Best in class (passes 576 s3-tests) |
| Helm | Rook Operator available |
| Maturity | Enterprise-grade (proven for years) |
Pros
- Top-tier S3 compatibility with the broadest API coverage
- Unified block, file, and object storage
- Multi-tenancy and namespace isolation
- Advanced Erasure Coding configuration
- Exabyte-scale expansion
Cons
- Complex to install and operate
- High resource requirements (at least 3 nodes and a fast network)
- Steep learning curve
Helm installation (Rook)
```bash
helm repo add rook-release https://charts.rook.io/release
helm install rook-ceph rook-release/rook-ceph -n rook-ceph --create-namespace
# A CephCluster CRD must then be applied
```
Reference: Rook Documentation
4. Garage
| Item | Details |
|---|---|
| License | AGPL-3.0 (same issue as MinIO) |
| Language | Rust |
| Characteristics | Lightweight and specialized for geographically distributed deployment |
| S3 compatibility | Supports core S3 operations (advanced features are limited) |
| Helm | Community chart |
| Maturity | Suitable for small-scale self-hosting |
Pros
- Lightweight and resource-efficient
- Built-in multi-zone and multi-site replication
- Rust-based memory safety
Cons
- AGPL-3.0 license with commercial-use constraints
- No Erasure Coding support (3x replication only)
- Limited advanced S3 features
Reference: Garage
S3 compatibility comparison
| Solution | s3-tests passed | Evaluation |
|---|---|---|
| Ceph RGW | 576 | Best |
| Zenko CloudServer | 382 | Strong |
| MinIO | 321 | Good |
| SeaweedFS | 56 | Basic |
SeaweedFS fully supports core S3 operations (`PUT`, `GET`, `DELETE`, `LIST`, etc.), but advanced features such as `Object Lock` and `Lifecycle` are limited.
Final recommendation
SeaweedFS (strongly recommended)
Why recommend it:
- Apache 2.0 license - fully open for commercial use
- Optimized for media storage - excellent for images, audio, and video
- Production proven - backed by years of large-scale deployment cases
- Kubernetes-friendly - official Helm charts and Operator
- Reasonable pricing - free up to 25TB, then $1/TB per month
Runner-up options
| Situation | Recommendation |
|---|---|
| Need near-perfect S3 compatibility | Ceph RGW (if you can absorb the complexity) |
| Prefer newer tech and can tolerate experimentation | RustFS (if you can absorb the beta risk) |
| Small scale and AGPL is not a concern | Garage |
Nginx CDN setup guide
Example CDN setup with SeaweedFS + Nginx:
Nginx cache configuration
```nginx
# /etc/nginx/nginx.conf
http {
    # Cache storage settings
    proxy_cache_path /var/cache/nginx/s3
                     levels=1:2
                     keys_zone=s3_cache:100m
                     max_size=50g
                     inactive=7d
                     use_temp_path=off;

    upstream seaweedfs_s3 {
        server seaweedfs-s3.storage.svc.cluster.local:8333;
        keepalive 64;
    }

    server {
        listen 80;
        server_name cdn.example.com;

        # Image caching (7 days)
        location ~* \.(jpg|jpeg|png|gif|webp|ico|svg)$ {
            proxy_pass http://seaweedfs_s3;
            proxy_cache s3_cache;
            proxy_cache_valid 200 7d;
            proxy_cache_valid 404 1m;
            proxy_cache_use_stale error timeout updating;
            proxy_cache_lock on;
            add_header X-Cache-Status $upstream_cache_status;
            add_header Cache-Control "public, max-age=604800";
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # Audio/video caching (30 days)
        location ~* \.(mp3|wav|ogg|mp4|webm|m4a)$ {
            proxy_pass http://seaweedfs_s3;
            proxy_cache s3_cache;
            proxy_cache_valid 200 30d;
            proxy_cache_valid 404 1m;
            proxy_cache_use_stale error timeout updating;
            proxy_cache_lock on;
            add_header X-Cache-Status $upstream_cache_status;
            add_header Cache-Control "public, max-age=2592000";
            # Range request support (video streaming)
            proxy_set_header Range $http_range;
            proxy_set_header If-Range $http_if_range;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # Other files
        location / {
            proxy_pass http://seaweedfs_s3;
            proxy_cache s3_cache;
            proxy_cache_valid 200 1d;
            add_header X-Cache-Status $upstream_cache_status;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
Cache status headers
- `HIT`: served from cache
- `MISS`: fetched from origin
- `STALE`: an expired cache entry was served because the origin failed
- `UPDATING`: background refresh in progress
Reference: NGINX Caching Guide
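The tiered cache durations in the Nginx configuration above (images 7 days, audio/video 30 days, everything else 1 day via `proxy_cache_valid 200`) can be mirrored in a small helper for sanity-checking tests. This is an illustrative sketch, not part of Nginx; the extension lists and durations are copied from the config.

```python
# Mirror of the Nginx proxy_cache_valid tiers for 200 responses.
IMAGE_EXTS = {"jpg", "jpeg", "png", "gif", "webp", "ico", "svg"}
MEDIA_EXTS = {"mp3", "wav", "ogg", "mp4", "webm", "m4a"}

def cache_valid_seconds(path: str) -> int:
    """How long the CDN layer keeps a 200 response for this path in cache."""
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    if ext in IMAGE_EXTS:
        return 7 * 24 * 3600    # images: 7d (604800s)
    if ext in MEDIA_EXTS:
        return 30 * 24 * 3600   # audio/video: 30d (2592000s)
    return 24 * 3600            # other files: 1d (86400s)
```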
Kubernetes deployment architecture
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Ingress (Nginx) │ │
│ │ cdn.example.com │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ Nginx Cache Layer (CDN) │ │
│ │ /var/cache/nginx (PVC: 50Gi+) │ │
│ └──────────────────────┬───────────────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────────────┐ │
│ │ SeaweedFS │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Master │ │ Volume │ │ Volume │ │ Volume │ │ │
│ │ │ (x3) │ │ #1 │ │ #2 │ │ #3 │ │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │ │ │
│ │ ┌────▼────────────▼────────────▼────────────▼────┐ │ │
│ │ │ Filer (S3 Gateway) │ │ │
│ │ │ seaweedfs-s3:8333 │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Persistent Volumes │ │
│ │ Volume #1 Volume #2 Volume #3 │ │
│ │ (100Gi+) (100Gi+) (100Gi+) │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Next steps
- Build a test environment with the SeaweedFS Helm chart
- Run S3 SDK integration tests (confirm compatibility with existing code)
- Set up the Nginx cache layer
- Run performance benchmarks (images, audio, video)
- Create a production deployment plan