Getting Started with RefPack

Welcome to RefPack! This guide walks you through understanding, creating, and using RefPack datasets: a standardized, secure format for distributing structured data.

New to RefPack? Think of it as a package manager for datasets, similar to npm for JavaScript or NuGet for .NET, but specifically designed for structured reference data like country codes, currency tables, and product catalogs.

Introduction

RefPack is a standardized ecosystem for distributing, validating, and consuming reference datasets. It solves common problems in data distribution such as version management, data integrity verification, and secure sharing across teams and organizations.

Whether you're sharing country codes between microservices, distributing ML training datasets, or publishing government open data, RefPack provides the tools and standards to make data distribution seamless and trustworthy.

What is a RefPack?

A RefPack is a standardized .zip archive (typically named dataset-name-1.0.0.refpack.zip) containing a well-defined set of files that ensure data integrity, authenticity, and usability.

File Structure

/
├── data.meta.json            ← Package manifest (required)
├── data.meta.json.jws        ← Cryptographic signature (required)
├── data.json                 ← Your dataset as JSON array (required)
├── data.schema.json          ← JSON Schema for validation (optional)
├── data.changelog.json       ← Version history (optional)
├── data.readme.md           ← Documentation (optional)
└── assets/                  ← Supplementary files (optional)
    ├── diagram.png
    └── raw-data.csv

Core Components:
  • Manifest: Package metadata including ID, version, author, and description
  • Signature: Cryptographic proof of authenticity using JSON Web Signatures (JWS)
  • Data: Your actual dataset as a JSON array of objects
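
The required entries in the file structure above can be checked with nothing more than a ZIP reader. The following is a minimal Python sketch, not part of the RefPack tooling: the helper name missing_required_files is hypothetical, and it only confirms that the three required files are present. It does not verify the signature.

```python
import io
import zipfile

# Required entries, taken from the file structure listed above.
REQUIRED_FILES = ("data.meta.json", "data.meta.json.jws", "data.json")


def missing_required_files(archive) -> list[str]:
    """Return the required entries that are absent from a RefPack archive.

    `archive` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    return [name for name in REQUIRED_FILES if name not in names]


# Demo: an in-memory archive that lacks the signature file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data.meta.json", "{}")
    zf.writestr("data.json", "[]")

print(missing_required_files(buffer))  # prints ['data.meta.json.jws']
```

An empty list means the layout is complete; signature and schema checks are still the job of refpack validate.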

Key Benefits

Security & Trust

Cryptographic signatures ensure data integrity and authenticity. Know exactly who published your data and verify it hasn't been tampered with.

Version Control

Semantic versioning (SemVer) for datasets. Track changes, manage compatibility, and roll back when needed.

Discovery

Find datasets through registries, search by tags, and browse curated collections of reference data.

Standardization

Consistent format across all datasets makes integration predictable and tooling reusable.

Offline-First

Self-contained packages with embedded validation keys work without external dependencies.

Developer Tools

Rich CLI toolchain for creating, validating, and managing RefPack datasets in your workflows.

Installation

The RefPack CLI is available through multiple distribution channels. Choose the method that works best for your environment:

npm (Node.js)

# Install globally
npm install -g @refpack/cli

# Verify installation
refpack --version

.NET Global Tool

# Install as .NET global tool
dotnet tool install -g RefPack.Cli

# Verify installation
refpack --version

Note: The exact installation method may vary based on the current distribution strategy. Check the CLI Guide for the most up-to-date installation instructions.

Quick Start

Ready to dive in? Here's a 5-minute walkthrough to create and publish your first RefPack:

Step 1: Create a new RefPack

refpack scaffold --output ./my-dataset --id my-org/countries --title "World Countries" --author "Your Name"

Step 2: Add your data

Edit ./my-dataset/data.json with your dataset (a JSON array of objects).

Step 3: Pack and sign

refpack pack \
  --input ./my-dataset/ \
  --output my-org-countries-1.0.0.refpack.zip \
  --sign-key ./my-dataset/sign-key.pem \
  --key-id "$(cat ./my-dataset/key-id.txt)"

Step 4: Validate and publish

# Validate locally
refpack validate --package my-org-countries-1.0.0.refpack.zip

# Publish to registry
refpack push \
  --package my-org-countries-1.0.0.refpack.zip \
  --api-url https://stor.refwire.online/api/packages \
  --api-key YOUR_API_KEY

Congratulations! You've created and published your first RefPack. It's now available for others to discover and use.

Creating Your First RefPack (Detailed)

Let's walk through creating a RefPack step by step with detailed explanations.

Step 1: Scaffolding

The scaffold command creates a complete RefPack template with all necessary files:

refpack scaffold \
  --output ./country-data \
  --id acme-corp/iso-countries \
  --title "ISO Country Codes" \
  --author "Acme Data Team" \
  --description "Complete ISO 3166-1 country codes with additional metadata"

This creates the following structure:

country-data/
├── data.meta.json          ← Pre-filled manifest
├── data.json              ← Template data (replace with yours)
├── data.schema.json       ← Template schema (optional)
├── data.changelog.json    ← Version history template
├── data.readme.md         ← Documentation template
├── sign-key.pem          ← Private signing key (keep secure!)
├── key-id.txt            ← Key identifier
└── assets/               ← For supplementary files

Step 2: Adding Your Data

Replace the template content with your actual dataset:

Edit data.json

Your data must be a JSON array of objects:

[
  {
    "code": "US",
    "name": "United States",
    "alpha3": "USA",
    "numeric": "840",
    "region": "North America"
  },
  {
    "code": "CA",
    "name": "Canada",
    "alpha3": "CAN",
    "numeric": "124",
    "region": "North America"
  },
  {
    "code": "GB",
    "name": "United Kingdom",
    "alpha3": "GBR",
    "numeric": "826",
    "region": "Europe"
  }
]
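
The "JSON array of objects" rule is easy to check before you pack. This is a minimal Python sketch using only the standard library; load_records is a hypothetical helper name, not a RefPack API, and it checks only the top-level shape, not individual fields.

```python
import json


def load_records(text: str) -> list[dict]:
    """Parse data.json content and confirm it is a JSON array of objects."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("top-level value must be a JSON array")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"item {index} is not a JSON object")
    return records


sample = '[{"code": "US", "name": "United States"}, {"code": "CA", "name": "Canada"}]'
print(len(load_records(sample)))  # prints 2
```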

Update data.meta.json

Ensure the manifest reflects your dataset:

{
  "id": "acme-corp/iso-countries",
  "version": "1.0.0",
  "title": "ISO Country Codes",
  "description": "Complete ISO 3166-1 country codes with regional metadata",
  "authors": ["Acme Data Team"],
  "createdUtc": "2025-05-29T10:30:00Z",
  "tags": ["countries", "iso", "reference", "geography"],
  "license": "MIT"
}
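
The archive names used in this guide appear to combine the manifest id and version. The Python sketch below derives a file name on that assumption: the naming rule (replace "/" with "-", then append the version and .refpack.zip) is inferred from the examples here rather than taken from the specification, and archive_name is a hypothetical helper.

```python
def archive_name(manifest: dict) -> str:
    """Build an archive file name from a manifest's id and version.

    Assumed convention, inferred from this guide's examples:
    "<id with / replaced by ->-<version>.refpack.zip".
    """
    package_id = manifest["id"].replace("/", "-")
    return f"{package_id}-{manifest['version']}.refpack.zip"


manifest = {"id": "acme-corp/iso-countries", "version": "1.0.0"}
print(archive_name(manifest))  # prints acme-corp-iso-countries-1.0.0.refpack.zip
```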

Define Schema (Optional)

Create data.schema.json to validate your data structure:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ISO Countries Schema",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "code": { "type": "string", "pattern": "^[A-Z]{2}$" },
      "name": { "type": "string", "minLength": 1 },
      "alpha3": { "type": "string", "pattern": "^[A-Z]{3}$" },
      "numeric": { "type": "string", "pattern": "^[0-9]{3}$" },
      "region": { "type": "string", "minLength": 1 }
    },
    "required": ["code", "name", "alpha3", "numeric", "region"],
    "additionalProperties": false
  }
}
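
To see what the schema above enforces, here is a hand-written Python check that mirrors its rules for a single record. It is an illustrative sketch only: a real JSON Schema validator would normally do this work, and the names PATTERNS, NON_EMPTY, and record_errors are hypothetical.

```python
import re

# Mirrors the "pattern" and "minLength" rules in the schema above.
# re.fullmatch is used, so the ^ and $ anchors are implied.
PATTERNS = {"code": r"[A-Z]{2}", "alpha3": r"[A-Z]{3}", "numeric": r"[0-9]{3}"}
NON_EMPTY = ("name", "region")
REQUIRED = set(PATTERNS) | set(NON_EMPTY)


def record_errors(record: dict) -> list[str]:
    """Return human-readable problems for one country record."""
    errors = [f"missing field: {field}" for field in sorted(REQUIRED - set(record))]
    # "additionalProperties": false means unknown fields are rejected.
    errors += [f"unexpected field: {field}" for field in sorted(set(record) - REQUIRED)]
    for field, value in record.items():
        if field not in REQUIRED:
            continue
        if not isinstance(value, str):
            errors.append(f"{field}: expected a string")
        elif field in PATTERNS and not re.fullmatch(PATTERNS[field], value):
            errors.append(f"{field}: {value!r} does not match {PATTERNS[field]}")
        elif field in NON_EMPTY and value == "":
            errors.append(f"{field}: must not be empty")
    return errors


good = {"code": "US", "name": "United States", "alpha3": "USA",
        "numeric": "840", "region": "North America"}
print(record_errors(good))  # prints []
```

Note that "numeric" is a string in this schema, so a value such as 840 (a JSON number) would be reported as an error.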

Step 3: Local Validation

Before packing, validate your data structure:

# Validate JSON structure and schema
refpack validate-data --input ./country-data/

# Check for common issues
refpack lint --input ./country-data/

Step 4: Packing & Signing

Create the final RefPack with cryptographic signature:

refpack pack \
  --input ./country-data/ \
  --output acme-corp-iso-countries-1.0.0.refpack.zip \
  --sign-key ./country-data/sign-key.pem \
  --key-id "$(cat ./country-data/key-id.txt)" \
  --verbose

Security Note: The sign-key.pem is your private key. Keep it secure and never commit it to version control. The public key is automatically embedded in the package signature.
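
If you are curious what the signature file carries, a JWS in compact form can be inspected with the standard library alone. The Python sketch below assumes data.meta.json.jws uses the JWS compact serialization (three dot-separated base64url segments); this guide does not confirm that, and the header values in the demo are made up. It only decodes the header. It does not verify anything.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode base64url text that has had its '=' padding stripped."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jws_protected_header(compact_jws: str) -> dict:
    """Return the decoded protected header of a compact-form JWS."""
    parts = compact_jws.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    return json.loads(b64url_decode(parts[0]))


# Demo with a made-up header, an empty (detached) payload, and a placeholder signature.
demo_header = {"alg": "ES256", "kid": "example-key-id"}  # illustrative values only
encoded = base64.urlsafe_b64encode(json.dumps(demo_header).encode()).rstrip(b"=").decode()
demo_jws = f"{encoded}..c2lnbmF0dXJl"

print(jws_protected_header(demo_jws)["kid"])  # prints example-key-id
```

Use refpack validate for actual signature verification; reading the header is useful only for checking which key identifier a package claims.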

Finding Packages

Discover RefPack datasets through multiple channels:

Web Interface

Browse and search packages on this registry:

  • Browse All Packages - View all available datasets
  • Use the search bar to find specific datasets by name or tags
  • Filter by categories, publishers, or version ranges

CLI Search

# Search for packages
refpack search "country codes" --api-url https://stor.refwire.online/api/packages

# List packages by publisher
refpack list --publisher acme-corp --api-url https://stor.refwire.online/api/packages

# Show package details
refpack info --id acme-corp/iso-countries --api-url https://stor.refwire.online/api/packages

API Access

Programmatically discover packages using the REST API:

# Get package list
curl "https://stor.refwire.online/api/packages"

# Search packages
curl "https://stor.refwire.online/api/packages/search?q=countries"

# Get package metadata
curl "https://stor.refwire.online/api/packages/acme-corp/iso-countries/meta?version=1.0.0"
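
The same endpoints can be called from Python. This sketch builds the URLs shown in the curl examples with proper query encoding; the helper names are hypothetical, and fetch_json assumes the API returns JSON, which this guide does not state explicitly.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_URL = "https://stor.refwire.online/api/packages"


def search_url(query: str) -> str:
    """URL for the search endpoint, with the query string safely encoded."""
    return f"{API_URL}/search?{urlencode({'q': query})}"


def meta_url(package_id: str, version: str) -> str:
    """URL for a package's metadata at a specific version."""
    return f"{API_URL}/{quote(package_id)}/meta?{urlencode({'version': version})}"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and parse the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(search_url("country codes"))
# prints https://stor.refwire.online/api/packages/search?q=country+codes
```

Calling fetch_json(meta_url("acme-corp/iso-countries", "1.0.0")) would issue the same request as the last curl command above.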

Downloading & Using Packages

Once you've found a package, there are several ways to download and use it:

Direct Download

Download packages directly from the web interface or CLI:

# Download specific version
refpack pull \
  --id acme-corp/iso-countries \
  --version 1.0.0 \
  --dest ./downloads/ \
  --api-url https://stor.refwire.online/api/packages

# Download and extract
refpack pull \
  --id acme-corp/iso-countries \
  --version 1.0.0 \
  --extract ./extracted-data/ \
  --api-url https://stor.refwire.online/api/packages

Programmatic Usage

Integrate RefPack directly into your applications:

Python Example

import refpack
import pandas as pd

# Load RefPack into pandas DataFrame
with refpack.open('countries-1.0.0.refpack.zip') as package:
    df = pd.DataFrame(package.data)
    print(f"Loaded {package.meta.title} v{package.meta.version}")
    print(f"Contains {len(df)} records")

JavaScript Example

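const refpack = require('@refpack/client');

// Top-level await is not available in a CommonJS module, so use an async function
async function main() {
  // Load and validate package (named pkg because `package` is reserved in strict mode)
  const pkg = await refpack.load('countries-1.0.0.refpack.zip');
  console.log(`Package: ${pkg.meta.title}`);
  console.log(`Records: ${pkg.data.length}`);
}

main().catch(console.error);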

Validation

Always validate packages after downloading:

# Validate package integrity and signature
refpack validate --package countries-1.0.0.refpack.zip --verbose

# Validate against specific schema
refpack validate --package countries-1.0.0.refpack.zip --schema ./my-schema.json

Publishing Packages

Share your RefPack datasets with the world:

Prerequisites

  1. Account Setup: Create an account on this registry
  2. API Key: Generate an API key from your Account Settings
  3. Package Validation: Ensure your package passes validation

Publishing Process

Step 1: Final Validation

refpack validate --package my-dataset-1.0.0.refpack.zip --strict

Step 2: Publish to Registry

refpack push \
  --package my-dataset-1.0.0.refpack.zip \
  --api-url https://stor.refwire.online/api/packages \
  --api-key YOUR_API_KEY_HERE

Step 3: Verify Publication

# Check package is available
refpack info --id my-org/my-dataset --api-url https://stor.refwire.online/api/packages

Publishing Best Practices

  • Semantic Versioning: Follow SemVer strictly for version numbers
  • Comprehensive Documentation: Include detailed README and changelog
  • Schema Definition: Always provide JSON schema for data validation
  • Testing: Validate packages in test environments before publishing
  • Security: Keep private keys secure and rotate them regularly

Best Practices

Data Structure

  • Consistent Fields: Use consistent field names and types across all records
  • Minimal Schema: Include only necessary fields to keep datasets focused
  • Null Values: Handle null/missing values consistently
  • Data Types: Use appropriate JSON data types (string, number, boolean, array)
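
The "consistent fields" guideline can be checked mechanically. The Python sketch below flags records whose field names or value types differ from the first record; inconsistent_records is a hypothetical helper, and treating the first record as the reference is a simplifying assumption.

```python
def inconsistent_records(records: list[dict]) -> list[int]:
    """Return indexes of records whose keys or value types differ from record 0."""
    if not records:
        return []
    expected = {key: type(value) for key, value in records[0].items()}
    mismatched = []
    for index, record in enumerate(records[1:], start=1):
        shape = {key: type(value) for key, value in record.items()}
        if shape != expected:
            mismatched.append(index)
    return mismatched


records = [
    {"code": "US", "numeric": "840"},
    {"code": "CA", "numeric": 124},   # number instead of string
    {"code": "GB"},                   # missing field
]
print(inconsistent_records(records))  # prints [1, 2]
```

A null value counts as a different type here, so this check is stricter than a schema that allows nullable fields.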

Versioning Strategy

  • Major Version: Breaking changes to data structure or meaning
  • Minor Version: New fields or non-breaking enhancements
  • Patch Version: Bug fixes, data corrections, or updates
  • Pre-release: Alpha/beta versions for testing
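
The major/minor/patch rules above translate directly into a small helper. This Python sketch handles plain MAJOR.MINOR.PATCH versions only; pre-release tags such as 1.0.0-beta.1 are out of scope, and bump is a hypothetical name.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # breaking change to structure or meaning
        return f"{major + 1}.0.0"
    if change == "minor":   # new fields or non-breaking enhancements
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # bug fixes and data corrections
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.4.2", "minor"))  # prints 1.5.0
```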

Security

  • Key Management: Store private keys securely, never in version control
  • Key Rotation: Regularly rotate signing keys and update key IDs
  • Validation: Always validate packages before use
  • Trust Model: Establish trust through out-of-band key verification

Documentation

  • Clear Descriptions: Write comprehensive package descriptions
  • Usage Examples: Include examples in README files
  • Change Logs: Document all changes between versions
  • Data Dictionary: Explain all fields and their meanings

Next Steps

Now that you understand RefPack basics, explore these resources to deepen your knowledge:

RefPack Specification

Dive deeper into the RefPack Specification

CLI Guide

Explore all commands in the CLI Guide

Registry API

Learn about the Registry API for programmatic access

Packages

Browse existing packages for useful datasets.