
Sarah Johnson
Principal Engineer
Sarah architects Kavod's shared payments infrastructure, handling millions of daily transactions with 99.99% uptime.
The Scale of Kavod Payments
Every ride on Buslyft, every stream on BantuStream, every property token purchase on GrandEstate, every royalty payout on Sonora Beats — they all flow through a single payment backbone: Karat Dollar.
As of February 2026, we process an average of 2 million transactions per day with a combined daily volume of $4.7 million. These transactions span:
- 12 currencies (USD, NGN, KES, GHS, ZAR, TZS, UGX, and more)
- 28 payment methods (cards, mobile money, bank transfer, USDC)
- 10 Kavod platforms (each with different transaction patterns and requirements)
This post explains the architecture that makes it all work — reliably, at scale, with 99.99% uptime over the past 12 months.
Event-Driven Architecture
Why Events?
Traditional payment systems use synchronous request-response patterns: the caller sends a payment request, waits for the payment processor to respond, and then continues. This works fine at low volume, but it creates several problems at scale:
- Tight coupling: Every upstream service needs to know the exact API of the payment service and handle its error codes
- Cascading failures: If the payment service is slow, every calling service gets slow
- Lost transactions: If the caller crashes between sending the request and receiving the response, the transaction status is unknown
Karat Dollar is built on an event-driven architecture where every state change is published as an immutable event to Apache Kafka. Services communicate by producing and consuming events, not by calling each other directly.
The Payment Event Stream
Every transaction progresses through a well-defined state machine, with each transition producing an event:
```
INITIATED → VALIDATED → ROUTED → SUBMITTED → PROCESSING → COMPLETED
                                                 │
                                               (or) → FAILED → RETRY → SUBMITTED → ...
```

Each event contains the full context needed to process it:
```typescript
interface PaymentEvent {
  eventId: string;              // Globally unique, used for idempotency
  transactionId: string;        // Groups all events for a single transaction
  eventType: PaymentEventType;  // INITIATED, VALIDATED, ROUTED, etc.
  timestamp: string;            // ISO 8601
  payload: {
    amount: number;
    currency: string;
    sourceMethod: PaymentMethod;
    destinationMethod: PaymentMethod;
    platformId: string;         // Which Kavod platform initiated this
    userId: string;
    metadata: Record<string, string>;
  };
  previousEventId: string;      // Linked list of events for this transaction
}
```

Service Topology
The payment pipeline consists of six core services, each consuming and producing events:
- Gateway Service: Accepts payment requests from platforms, validates input, assigns a transaction ID, and produces an `INITIATED` event
- Validation Service: Consumes `INITIATED` events. Checks fraud rules, verifies account balance, and validates the payment method. Produces a `VALIDATED` or `REJECTED` event
- Routing Service: Consumes `VALIDATED` events. Determines the optimal payment processor based on method, currency, cost, and availability. Produces a `ROUTED` event
- Processor Adapters: One adapter per external payment processor (Paystack, Flutterwave, M-Pesa API, Stripe, Circle). Each consumes `ROUTED` events for its processor, submits to the external API, and produces `SUBMITTED`/`PROCESSING`/`COMPLETED`/`FAILED` events
- Reconciliation Service: Continuously compares our internal ledger against processor settlement reports to detect discrepancies
- Notification Service: Consumes terminal events (`COMPLETED`, `FAILED`) and notifies the originating platform and user
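
Because every service in this pipeline reacts to events, a consumer can receive the same event more than once, for example when a stream is replayed after a restart. The following is a minimal sketch of how a consumer could use `eventId` to stay idempotent. The `IdempotentConsumer` class, the `MinimalEvent` shape, and the in-memory `Set` are assumptions made for illustration, not Karat Dollar's actual code.

```typescript
// Hypothetical, trimmed-down event shape: only the fields this sketch needs.
interface MinimalEvent {
  eventId: string; // globally unique, used for idempotency
  transactionId: string;
  eventType: string;
}

// Wraps a handler so that each eventId is processed at most once.
// The in-memory Set is an assumption for illustration; a real service
// would need a durable store that survives restarts.
class IdempotentConsumer<E extends MinimalEvent> {
  private readonly seen = new Set<string>();
  private readonly handler: (event: E) => void;

  constructor(handler: (event: E) => void) {
    this.handler = handler;
  }

  // Returns true if the event was handled, false if it was a duplicate.
  consume(event: E): boolean {
    if (this.seen.has(event.eventId)) return false;
    this.handler(event);
    // Mark as processed only after the handler returns, so a thrown
    // error leaves the event eligible for redelivery.
    this.seen.add(event.eventId);
    return true;
  }
}
```

Recording the ID only after the handler returns trades a possible repeat (if the process dies between the two steps) for never silently dropping an event, which is why the handler itself should also be safe to repeat.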
```
Platform ──> Gateway ──> Kafka ──> Validation ──> Kafka ──> Routing ──> Kafka ──> Processor
                                                                                     │
                                                                                     ▼
User <── Notification <── Kafka <── Reconciliation <────────────────────────── Kafka (terminal)
```

Payment Routing
The routing service is where much of the business intelligence lives. It selects the optimal processor for each transaction based on a scoring function:
```typescript
function scoreProcessor(tx: Transaction, processor: Processor): number {
  // Processors that cannot handle this transaction, or are unhealthy, are excluded
  if (!processor.supports(tx.currency, tx.sourceMethod)) return -Infinity;
  if (!processor.isHealthy()) return -Infinity;

  // Weighted sum of cost, success rate, speed, and availability; higher is better
  return (
    WEIGHT_COST * (1 - normalize(processor.feeForTx(tx))) +
    WEIGHT_SUCCESS_RATE * processor.recentSuccessRate(tx.currency) +
    WEIGHT_SPEED * (1 - normalize(processor.avgSettlementTime(tx.currency))) +
    WEIGHT_AVAILABILITY * processor.uptimeScore()
  );
}
```

When a processor is experiencing issues (elevated error rates or latency), the routing service automatically shifts traffic to alternatives. This active-active routing is a key factor in our 99.99% uptime: no single processor failure can bring down payments.
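
One way to apply a score like this is to evaluate every candidate and keep the highest, treating `-Infinity` as "not eligible". The sketch below takes the scoring function as a parameter. `pickBest`, `DemoProcessor`, and the sample values are hypothetical names and data chosen for illustration only.

```typescript
// Returns the candidate with the highest score, or undefined when every
// candidate scored -Infinity (that is, none is eligible).
function pickBest<T>(candidates: T[], score: (c: T) => number): T | undefined {
  let best: T | undefined;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    const s = score(candidate);
    // Strict comparison: -Infinity (and NaN) never replace the current best.
    if (s > bestScore) {
      best = candidate;
      bestScore = s;
    }
  }
  return best;
}

// Usage with made-up processors: "A" has the best base score but is unhealthy.
interface DemoProcessor {
  name: string;
  healthy: boolean;
  baseScore: number;
}

const demoProcessors: DemoProcessor[] = [
  { name: "A", healthy: false, baseScore: 0.9 },
  { name: "B", healthy: true, baseScore: 0.7 },
  { name: "C", healthy: true, baseScore: 0.8 },
];

const demoScore = (p: DemoProcessor): number => (p.healthy ? p.baseScore : -Infinity);

const chosen = pickBest(demoProcessors, demoScore);
console.log(chosen?.name); // "C"
```

Because an unhealthy processor scores `-Infinity`, traffic moves to the next-best candidate without any special failover branch in the selection code.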
Fallback and Retry Logic
When a transaction fails at the processor level, the system distinguishes between:
- Hard failures (invalid card number, insufficient funds, account closed): Not retried. The user is notified immediately
- Soft failures (timeout, rate limit, temporary processor error): Retried up to 3 times with exponential backoff. If all retries fail, the routing service re-routes to an alternative processor
```typescript
const RETRY_DELAYS = [2000, 8000, 30000]; // ms, exponential backoff

async function handleFailure(event: PaymentEvent, error: ProcessorError) {
  if (error.isHardFailure) {
    // Hard failures are never retried
    produce({ ...event, eventType: "FAILED", reason: error.code });
    return;
  }

  // metadata lives under payload and is Record<string, string>,
  // so the retry counter is stored as a string
  const metadata = event.payload.metadata;
  const retryCount = Number(metadata.retryCount ?? "0");

  if (retryCount < RETRY_DELAYS.length) {
    await delay(RETRY_DELAYS[retryCount]);
    produce({
      ...event,
      eventType: "RETRY",
      payload: {
        ...event.payload,
        metadata: { ...metadata, retryCount: String(retryCount + 1) },
      },
    });
  } else {
    // Retries exhausted: reroute to an alternative processor
    produce({ ...event, eventType: "REROUTE" });
  }
}
```

Reconciliation
Financial systems must be exactly correct. Our reconciliation service runs continuously, comparing three sources of truth:
- Internal ledger — Our event-sourced record of every transaction
- Processor reports — Settlement files from each payment processor (received daily or hourly depending on the processor)
- Bank statements — Actual funds movement in our settlement accounts
The reconciler flags three types of discrepancies:
- Missing transactions — Present in processor report but not in our ledger (or vice versa)
- Amount mismatches — Transaction exists in both systems but amounts differ
- Status mismatches — We show COMPLETED but the processor shows FAILED (or vice versa)
Each discrepancy generates an alert. Critical discrepancies (amount mismatches above $100, status mismatches) are escalated to the finance team immediately. In practice, our discrepancy rate is less than 0.002% of transactions, and most are resolved automatically within 24 hours.
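
The three discrepancy types can be illustrated with a small comparison keyed by transaction ID. This is a sketch under stated assumptions: it compares only two of the three sources (ledger and processor report), and `LedgerRecord`, `Discrepancy`, `reconcile`, and the use of integer minor units for amounts are all hypothetical choices, not the production reconciler.

```typescript
type Status = "COMPLETED" | "FAILED";

interface LedgerRecord {
  transactionId: string;
  amountMinor: number; // integer minor units (e.g. cents); an assumption
  status: Status;
}

type Discrepancy =
  | { kind: "MISSING"; transactionId: string; missingFrom: "ledger" | "processor" }
  | { kind: "AMOUNT_MISMATCH"; transactionId: string; ledger: number; processor: number }
  | { kind: "STATUS_MISMATCH"; transactionId: string; ledger: Status; processor: Status };

function reconcile(ledger: LedgerRecord[], processor: LedgerRecord[]): Discrepancy[] {
  const found: Discrepancy[] = [];
  const processorById = new Map<string, LedgerRecord>();
  for (const r of processor) processorById.set(r.transactionId, r);
  const ledgerIds = new Set<string>();

  for (const ours of ledger) {
    const id = ours.transactionId;
    ledgerIds.add(id);
    const theirs = processorById.get(id);
    if (theirs === undefined) {
      found.push({ kind: "MISSING", transactionId: id, missingFrom: "processor" });
      continue;
    }
    if (ours.amountMinor !== theirs.amountMinor) {
      found.push({
        kind: "AMOUNT_MISMATCH",
        transactionId: id,
        ledger: ours.amountMinor,
        processor: theirs.amountMinor,
      });
    }
    if (ours.status !== theirs.status) {
      found.push({
        kind: "STATUS_MISMATCH",
        transactionId: id,
        ledger: ours.status,
        processor: theirs.status,
      });
    }
  }

  // Second pass: anything the processor reported that the ledger never saw.
  for (const theirs of processor) {
    if (!ledgerIds.has(theirs.transactionId)) {
      found.push({ kind: "MISSING", transactionId: theirs.transactionId, missingFrom: "ledger" });
    }
  }
  return found;
}
```

Comparing integer minor units rather than floating-point amounts avoids false amount mismatches caused by rounding.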
Achieving 99.99% Uptime
99.99% uptime allows at most about 52.6 minutes of downtime per year (0.01% of 525,600 minutes). Here's how we achieve it:
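
For reference, the downtime budget behind an availability target is simple arithmetic. The helper below is an illustrative sketch; `downtimeBudgetMinutes` is a hypothetical name, and a 365-day year is assumed.

```typescript
// Allowed downtime, in minutes, for an availability target over a period.
function downtimeBudgetMinutes(availability: number, periodDays: number): number {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - availability);
}

console.log(downtimeBudgetMinutes(0.9999, 365).toFixed(2)); // "52.56"
console.log(downtimeBudgetMinutes(0.9999, 30).toFixed(2)); // "4.32"
```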
Multi-Region Deployment
Karat Dollar runs in two active regions (AWS eu-west-1 and af-south-1) with automatic failover. Kafka is replicated across regions with MirrorMaker 2. If an entire region goes down, traffic is redirected within 30 seconds.
Circuit Breakers
Every external call (to payment processors, to databases) is wrapped in a circuit breaker. When the error rate for a dependency exceeds 50% over a 10-second window, the circuit opens and all subsequent calls fail fast. This prevents a slow dependency from consuming all our threads and bringing down the entire system.
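
A circuit breaker with this behavior can be sketched as a sliding window of recent call outcomes. The 50% threshold and 10-second window come from the description above; the minimum call count, the 30-second cool-down before calls are allowed again, and the `CircuitBreaker` class itself are assumptions added to make the example complete.

```typescript
// Sliding-window circuit breaker sketch. minCalls and coolDownMs are
// assumptions for illustration; the text specifies only the 50% / 10 s rule.
class CircuitBreaker {
  private outcomes: { at: number; ok: boolean }[] = [];
  private openedAt: number | null = null;
  private readonly now: () => number;
  private readonly windowMs = 10_000;
  private readonly errorThreshold = 0.5;
  private readonly minCalls = 5; // assumption: avoid tripping on one failure
  private readonly coolDownMs = 30_000; // assumption: time before retrying

  // The clock is injectable so the breaker can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // True when a call may proceed; false means fail fast.
  allowRequest(): boolean {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.coolDownMs) {
      // Cool-down elapsed: close the circuit and start a fresh window.
      this.openedAt = null;
      this.outcomes = [];
      return true;
    }
    return false;
  }

  // Records a call result and opens the circuit when the error rate
  // inside the window exceeds the threshold.
  record(ok: boolean): void {
    const t = this.now();
    this.outcomes.push({ at: t, ok });
    this.outcomes = this.outcomes.filter((o) => t - o.at <= this.windowMs);
    const failures = this.outcomes.filter((o) => !o.ok).length;
    if (
      this.outcomes.length >= this.minCalls &&
      failures / this.outcomes.length > this.errorThreshold
    ) {
      this.openedAt = t;
    }
  }

  // Wraps an async call: fails fast when open, otherwise records the outcome.
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) throw new Error("circuit open");
    try {
      const result = await fn();
      this.record(true);
      return result;
    } catch (err) {
      this.record(false);
      throw err;
    }
  }
}
```

A caller would wrap each dependency in its own breaker instance, for example `breaker.call(() => submitToProcessor(tx))`, where `submitToProcessor` stands in for whatever client function is being protected.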
Chaos Engineering
We run weekly chaos experiments using LitmusChaos:
- Pod kill: Randomly terminate payment service pods during peak hours
- Network partition: Simulate network splits between regions
- Processor outage: Simulate a payment processor going completely offline
- Kafka broker failure: Kill individual Kafka brokers
Every chaos experiment must result in zero user-visible errors and zero lost transactions. When an experiment fails, we fix the root cause before the next week's run.
Monitoring and Alerting
We monitor approximately 2,400 metrics across the payment stack. Key SLOs:
| SLO | Target | Current |
|---|---|---|
| End-to-end success rate | ≥ 99.5% | 99.72% |
| P99 latency (card payments) | ≤ 5s | 3.1s |
| P99 latency (mobile money) | ≤ 15s | 11.2s |
| Reconciliation discrepancy rate | ≤ 0.01% | 0.002% |
| System availability | ≥ 99.99% | 99.993% |
Explore Karat Dollar at karatdollar.com.


