Getting Started with RefPack
Welcome to RefPack! This guide walks you through understanding, creating, and using RefPack datasets, a standardized, secure format for distributing structured data.
Introduction
RefPack is a standardized ecosystem for distributing, validating, and consuming reference datasets. It solves common problems in data distribution such as version management, data integrity verification, and secure sharing across teams and organizations.
Whether you're sharing country codes between microservices, distributing ML training datasets, or publishing government open data, RefPack provides the tools and standards to make data distribution seamless and trustworthy.
What is a RefPack?
A RefPack is a standardized .zip archive (typically named dataset-name-1.0.0.refpack.zip) containing a well-defined set of files that ensure data integrity, authenticity, and usability.
File Structure
/
├── data.meta.json ← Package manifest (required)
├── data.meta.json.jws ← Cryptographic signature (required)
├── data.json ← Your dataset as JSON array (required)
├── data.schema.json ← JSON Schema for validation (optional)
├── data.changelog.json ← Version history (optional)
├── data.readme.md ← Documentation (optional)
└── assets/ ← Supplementary files (optional)
    ├── diagram.png
    └── raw-data.csv
- Manifest: Package metadata including ID, version, author, and description
- Signature: Cryptographic proof of authenticity using JSON Web Signatures (JWS)
- Data: Your actual dataset as a JSON array of objects
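The layout above can be checked without the RefPack CLI. The following is a minimal sketch using only Python's standard zipfile module; the helper names missing_required_files and build_demo_archive are hypothetical and not part of any RefPack tooling. It only confirms that the three required entries exist at the archive root and does not verify the signature.

```python
import io
import zipfile

# Required entries, taken from the file structure listed above.
REQUIRED = ("data.meta.json", "data.meta.json.jws", "data.json")


def missing_required_files(archive) -> list[str]:
    """Return the required entries that are absent from the archive root."""
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    return [name for name in REQUIRED if name not in names]


def build_demo_archive(entries: dict[str, str]) -> io.BytesIO:
    """Build an in-memory zip for demonstration (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    buffer.seek(0)
    return buffer


demo = build_demo_archive({"data.meta.json": "{}", "data.json": "[]"})
print(missing_required_files(demo))  # ['data.meta.json.jws']
```

An empty list means all required files are present; authenticity still has to be established with refpack validate.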
Key Benefits
Security & Trust
Cryptographic signatures ensure data integrity and authenticity. Know exactly who published your data and verify it hasn't been tampered with.
Version Control
Semantic versioning (SemVer) for datasets. Track changes, manage compatibility, and roll back when needed.
Discovery
Find datasets through registries, search by tags, and browse curated collections of reference data.
Standardization
Consistent format across all datasets makes integration predictable and tooling reusable.
Offline-First
Self-contained packages with embedded validation keys work without external dependencies.
Developer Tools
Rich CLI toolchain for creating, validating, and managing RefPack datasets in your workflows.
Installation
The RefPack CLI is available through multiple distribution channels. Choose the method that works best for your environment:
npm (Node.js)
# Install globally
npm install -g @refpack/cli
# Verify installation
refpack --version
.NET Global Tool
# Install as .NET global tool
dotnet tool install -g RefPack.Cli
# Verify installation
refpack --version
Quick Start
Ready to dive in? Here's a 5-minute walkthrough to create and publish your first RefPack:
Create a new RefPack
refpack scaffold --output ./my-dataset --id my-org/countries --title "World Countries" --author "Your Name"
Add your data
Edit ./my-dataset/data.json with your dataset (JSON array of objects)
Pack and sign
refpack pack \
--input ./my-dataset/ \
--output my-org-countries-1.0.0.refpack.zip \
--sign-key ./my-dataset/sign-key.pem \
--key-id "$(cat ./my-dataset/key-id.txt)"
Validate and publish
# Validate locally
refpack validate --package my-org-countries-1.0.0.refpack.zip
# Publish to registry
refpack push \
--package my-org-countries-1.0.0.refpack.zip \
--api-url https://stor.refwire.online/api/packages \
--api-key YOUR_API_KEY
Creating Your First RefPack (Detailed)
Let's walk through creating a RefPack step by step with detailed explanations.
Step 1: Scaffolding
The scaffold command creates a complete RefPack template with all necessary files:
refpack scaffold \
--output ./country-data \
--id acme-corp/iso-countries \
--title "ISO Country Codes" \
--author "Acme Data Team" \
--description "Complete ISO 3166-1 country codes with additional metadata"
This creates the following structure:
country-data/
├── data.meta.json ← Pre-filled manifest
├── data.json ← Template data (replace with yours)
├── data.schema.json ← Template schema (optional)
├── data.changelog.json ← Version history template
├── data.readme.md ← Documentation template
├── sign-key.pem ← Private signing key (keep secure!)
├── key-id.txt ← Key identifier
└── assets/ ← For supplementary files
Step 2: Adding Your Data
Replace the template content with your actual dataset:
Edit data.json
Your data must be a JSON array of objects:
[
  {
    "code": "US",
    "name": "United States",
    "alpha3": "USA",
    "numeric": "840",
    "region": "North America"
  },
  {
    "code": "CA",
    "name": "Canada",
    "alpha3": "CAN",
    "numeric": "124",
    "region": "North America"
  },
  {
    "code": "GB",
    "name": "United Kingdom",
    "alpha3": "GBR",
    "numeric": "826",
    "region": "Europe"
  }
]
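If you generate data.json from another system, a quick shape check before running the CLI can catch the most common mistake, a top-level object instead of an array. This is a minimal sketch using Python's standard json module; load_records is a hypothetical helper name, and the check covers only the "JSON array of objects" rule stated above.

```python
import json


def load_records(text: str) -> list[dict]:
    """Parse JSON text and confirm it is an array of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("top-level value must be a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
    return data


records = load_records('[{"code": "US", "name": "United States"}]')
print(len(records))  # 1
```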
Update data.meta.json
Ensure the manifest reflects your dataset:
{
  "id": "acme-corp/iso-countries",
  "version": "1.0.0",
  "title": "ISO Country Codes",
  "description": "Complete ISO 3166-1 country codes with regional metadata",
  "authors": ["Acme Data Team"],
  "createdUtc": "2025-05-29T10:30:00Z",
  "tags": ["countries", "iso", "reference", "geography"],
  "license": "MIT"
}
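A few manifest fields follow recognizable formats, and a small script can flag obvious slips before packing. The sketch below is an assumption-laden illustration: the function name manifest_problems is hypothetical, and the three rules (a publisher/name style id, a SemVer version, a UTC timestamp shaped like the example) are inferred from the sample manifest rather than taken from the RefPack Specification, which remains the authority on what is required.

```python
import re
from datetime import datetime

# SemVer shape: MAJOR.MINOR.PATCH with optional pre-release and build parts.
SEMVER = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?")


def manifest_problems(manifest: dict) -> list[str]:
    """Return human-readable problems found in a manifest dictionary."""
    problems = []
    # Assumption: ids look like "publisher/name", as in the example above.
    if not re.fullmatch(r"[^/\s]+/[^/\s]+", str(manifest.get("id", ""))):
        problems.append("id should look like publisher/name")
    if not SEMVER.fullmatch(str(manifest.get("version", ""))):
        problems.append("version should be MAJOR.MINOR.PATCH")
    try:
        # Assumption: timestamps use the same format as the example value.
        datetime.strptime(str(manifest.get("createdUtc", "")), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        problems.append("createdUtc should be a UTC timestamp like 2025-05-29T10:30:00Z")
    return problems


print(manifest_problems({
    "id": "acme-corp/iso-countries",
    "version": "1.0.0",
    "createdUtc": "2025-05-29T10:30:00Z",
}))  # []
```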
Define Schema (Optional)
Create data.schema.json to validate your data structure:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ISO Countries Schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "code": { "type": "string", "pattern": "^[A-Z]{2}$" },
      "name": { "type": "string", "minLength": 1 },
      "alpha3": { "type": "string", "pattern": "^[A-Z]{3}$" },
      "numeric": { "type": "string", "pattern": "^[0-9]{3}$" },
      "region": { "type": "string", "minLength": 1 }
    },
    "required": ["code", "name", "alpha3", "numeric", "region"],
    "additionalProperties": false
  }
}
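To see what this schema accepts and rejects, it can help to restate its rules by hand. The sketch below is not a JSON Schema validator; it is a hand-written Python equivalent of the five field rules above, using only the standard library, and record_errors is a hypothetical name. For real validation, use the CLI or a proper JSON Schema library.

```python
import re

# One rule per field: a regex mirroring "pattern", or None for "minLength": 1.
RULES = {
    "code": r"[A-Z]{2}",
    "name": None,
    "alpha3": r"[A-Z]{3}",
    "numeric": r"[0-9]{3}",
    "region": None,
}


def record_errors(record: dict) -> list[str]:
    """Return the rule violations for a single record."""
    errors = []
    # "additionalProperties": false means unknown fields are rejected.
    for field in sorted(set(record) - set(RULES)):
        errors.append(f"unexpected field: {field}")
    for field, pattern in RULES.items():
        value = record.get(field)
        if not isinstance(value, str):
            errors.append(f"missing or non-string field: {field}")
        elif pattern is None and len(value) < 1:
            errors.append(f"empty field: {field}")
        elif pattern is not None and not re.fullmatch(pattern, value):
            errors.append(f"bad format: {field}")
    return errors


good = {"code": "US", "name": "United States", "alpha3": "USA",
        "numeric": "840", "region": "North America"}
print(record_errors(good))  # []
print(record_errors({**good, "code": "usa"}))  # ['bad format: code']
```

Note that "numeric" is a string in this schema, so a leading zero such as "004" is preserved; storing it as a JSON number would lose it.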
Step 3: Local Validation
Before packing, validate your data structure:
# Validate JSON structure and schema
refpack validate-data --input ./country-data/
# Check for common issues
refpack lint --input ./country-data/
Step 4: Packing & Signing
Create the final RefPack with cryptographic signature:
refpack pack \
--input ./country-data/ \
--output acme-corp-iso-countries-1.0.0.refpack.zip \
--sign-key ./country-data/sign-key.pem \
--key-id "$(cat ./country-data/key-id.txt)" \
--verbose
sign-key.pem is your private key. Keep it secure and never commit it to version control. The public key is automatically embedded in the package signature.
Finding Packages
Discover RefPack datasets through multiple channels:
Web Interface
Browse and search packages on this registry:
- Browse All Packages - View all available datasets
- Use the search bar to find specific datasets by name or tags
- Filter by categories, publishers, or version ranges
CLI Search
# Search for packages
refpack search "country codes" --api-url https://stor.refwire.online/api/packages
# List packages by publisher
refpack list --publisher acme-corp --api-url https://stor.refwire.online/api/packages
# Show package details
refpack info --id acme-corp/iso-countries --api-url https://stor.refwire.online/api/packages
API Access
Programmatically discover packages using the REST API:
# Get package list
curl "https://stor.refwire.online/api/packages"
# Search packages
curl "https://stor.refwire.online/api/packages/search?q=countries"
# Get package metadata
curl "https://stor.refwire.online/api/packages/acme-corp/iso-countries/meta?version=1.0.0"
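When calling these endpoints from code, build the URLs with proper query encoding rather than string concatenation. The sketch below uses only Python's urllib.parse and mirrors the three curl calls above; the function names are hypothetical, and the endpoint paths are taken from those examples rather than from the Registry API reference, so check them there before relying on them.

```python
from urllib.parse import quote, urlencode

API_URL = "https://stor.refwire.online/api/packages"


def list_url() -> str:
    """URL for the package list."""
    return API_URL


def search_url(query: str) -> str:
    """URL for a search, with the query safely encoded."""
    return f"{API_URL}/search?{urlencode({'q': query})}"


def meta_url(package_id: str, version: str) -> str:
    """URL for one package version's metadata; '/' in the id is kept."""
    return f"{API_URL}/{quote(package_id)}/meta?{urlencode({'version': version})}"


print(search_url("country codes"))
# https://stor.refwire.online/api/packages/search?q=country+codes
print(meta_url("acme-corp/iso-countries", "1.0.0"))
# https://stor.refwire.online/api/packages/acme-corp/iso-countries/meta?version=1.0.0
```

The resulting strings can be passed to urllib.request.urlopen or any HTTP client of your choice.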
Downloading & Using Packages
Once you've found a package, there are several ways to download and use it:
Direct Download
Download packages directly from the web interface or CLI:
# Download specific version
refpack pull \
--id acme-corp/iso-countries \
--version 1.0.0 \
--dest ./downloads/ \
--api-url https://stor.refwire.online/api/packages
# Download and extract
refpack pull \
--id acme-corp/iso-countries \
--version 1.0.0 \
--extract ./extracted-data/ \
--api-url https://stor.refwire.online/api/packages
Programmatic Usage
Integrate RefPack directly into your applications:
Python Example
import refpack
import pandas as pd
# Load RefPack into pandas DataFrame
with refpack.open('countries-1.0.0.refpack.zip') as package:
    df = pd.DataFrame(package.data)
    print(f"Loaded {package.meta.title} v{package.meta.version}")
    print(f"Contains {len(df)} records")
JavaScript Example
const refpack = require('@refpack/client');
// Load and validate package (await needs an async function in CommonJS)
async function main() {
  const pkg = await refpack.load('countries-1.0.0.refpack.zip');
  console.log(`Package: ${pkg.meta.title}`);
  console.log(`Records: ${pkg.data.length}`);
}
main().catch(console.error);
Validation
Always validate packages after downloading:
# Validate package integrity and signature
refpack validate --package countries-1.0.0.refpack.zip --verbose
# Validate against specific schema
refpack validate --package countries-1.0.0.refpack.zip --schema ./my-schema.json
Publishing Packages
Share your RefPack datasets with the world:
Prerequisites
- Account Setup: Create an account on this registry
- API Key: Generate an API key from your Account Settings
- Package Validation: Ensure your package passes validation
Publishing Process
Final Validation
refpack validate --package my-dataset-1.0.0.refpack.zip --strict
Publish to Registry
refpack push \
--package my-dataset-1.0.0.refpack.zip \
--api-url https://stor.refwire.online/api/packages \
--api-key YOUR_API_KEY_HERE
Verify Publication
# Check package is available
refpack info --id my-org/my-dataset --api-url https://stor.refwire.online/api/packages
Publishing Best Practices
- Semantic Versioning: Follow SemVer strictly for version numbers
- Comprehensive Documentation: Include detailed README and changelog
- Schema Definition: Always provide JSON schema for data validation
- Testing: Validate packages in test environments before publishing
- Security: Keep private keys secure and rotate them regularly
Best Practices
Data Structure
- Consistent Fields: Use consistent field names and types across all records
- Minimal Schema: Include only necessary fields to keep datasets focused
- Null Values: Handle null/missing values consistently
- Data Types: Use appropriate JSON data types (string, number, boolean, array)
Versioning Strategy
- Major Version: Breaking changes to data structure or meaning
- Minor Version: New fields or non-breaking enhancements
- Patch Version: Bug fixes, data corrections, or updates
- Pre-release: Alpha/beta versions for testing
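The major/minor/patch rules above can be turned into a small check, for example in a release script that confirms the new version number matches the kind of change being shipped. This is a minimal sketch with hypothetical function names; it compares only the MAJOR.MINOR.PATCH core and deliberately ignores pre-release and build suffixes, so it does not implement full SemVer precedence.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch), dropping pre-release/build suffixes."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Classify the change from old to new as major, minor, or patch."""
    before, after = parse_version(old), parse_version(new)
    if after <= before:
        raise ValueError(f"{new} is not newer than {old}")
    if after[0] != before[0]:
        return "major"
    if after[1] != before[1]:
        return "minor"
    return "patch"


print(bump_kind("1.0.0", "1.0.1"))  # patch
print(bump_kind("1.0.1", "1.1.0"))  # minor
print(bump_kind("1.1.0", "2.0.0"))  # major
```

For the country dataset, correcting a misspelled name would be a patch, adding a new optional field a minor release, and renaming "code" a major release.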
Security
- Key Management: Store private keys securely, never in version control
- Key Rotation: Regularly rotate signing keys and update key IDs
- Validation: Always validate packages before use
- Trust Model: Establish trust through out-of-band key verification
Documentation
- Clear Descriptions: Write comprehensive package descriptions
- Usage Examples: Include examples in README files
- Change Logs: Document all changes between versions
- Data Dictionary: Explain all fields and their meanings
Next Steps
Now that you understand RefPack basics, explore these resources to deepen your knowledge:
RefPack Specification
Dive deeper into the RefPack Specification
CLI Guide
Explore all commands in the CLI Guide
Registry API
Learn about the Registry API for programmatic access
Packages
Browse existing packages for useful datasets.