Payment System Optimization: A 30% Performance Case Study
Payment systems are the heartbeat of any e-commerce platform. At Agoda, where we process millions of transactions daily, every millisecond of latency directly affects revenue and customer experience. Here’s how we cut average processing time in our payment pipeline by 30%.
The Challenge
Initial State
- Average processing time: 850ms per transaction
- Peak load: 15,000 transactions/minute
- Database connections: Constantly maxed out
- Customer complaints: Payment timeouts during peak hours
Business Impact
- Revenue loss: $2M annually due to payment timeouts
- Customer experience: 15% abandonment rate during checkout
- Operational cost: High infrastructure scaling costs
Performance Analysis
Identifying Bottlenecks
We combined application-level metrics with database query analysis to find the bottlenecks:
// Application performance monitoring (Micrometer annotations assumed; the metric name goes in `value`)
@Timed(value = "payment.processing.time", description = "Payment processing duration")
@Counted(value = "payment.processing.count", description = "Payment processing count")
public PaymentResponse processPayment(PaymentRequest request) {
    // ... processing logic
}
Database Query Analysis
-- Slow query identified
SELECT p.*, pp.provider_name, pm.method_name, c.currency_code
FROM payments p
JOIN payment_providers pp ON p.provider_id = pp.id
JOIN payment_methods pm ON p.method_id = pm.id
JOIN currencies c ON p.currency_id = c.id
WHERE p.customer_id = ? AND p.status = 'PENDING'
ORDER BY p.created_at DESC;
-- Execution time: 450ms average
-- Called: 15,000 times/minute
Optimization Strategies
1. Database Optimization
Index Optimization
-- Added composite index
CREATE INDEX idx_payments_customer_status_created
ON payments(customer_id, status, created_at DESC);
-- Result: Query time reduced from 450ms to 45ms
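To see why this one query could keep the connection pool saturated, Little's law helps: the average number of queries in flight equals the call rate multiplied by the time each call takes. The sketch below is illustrative only. `ConnectionDemand` is a hypothetical helper, and it assumes the 15,000 calls/minute figure applies to this query and that each call holds one connection for its full duration.

```java
/** Little's law sketch: average in-flight queries = call rate x time per call. */
final class ConnectionDemand {

    /**
     * @param callsPerMinute how often the query runs
     * @param queryMillis    average execution time of one call
     * @return average number of connections busy with this query at any instant
     */
    static double busyConnections(double callsPerMinute, double queryMillis) {
        double callsPerSecond = callsPerMinute / 60.0;
        return callsPerSecond * (queryMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Figures taken from the query analysis above.
        System.out.println("before index: " + busyConnections(15_000, 450));
        System.out.println("after index:  " + busyConnections(15_000, 45));
    }
}
```

Under these assumptions the query alone occupies roughly 112 connections on average at 450ms, and about 11 at 45ms, which is consistent with the pool being maxed out before the index was added.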
Query Optimization
-- Optimized query with selective joins
SELECT p.id, p.amount, p.status, p.created_at,
pp.provider_name, pm.method_name, c.currency_code
FROM payments p
JOIN payment_providers pp ON p.provider_id = pp.id
JOIN payment_methods pm ON p.method_id = pm.id
JOIN currencies c ON p.currency_id = c.id
WHERE p.customer_id = ?
AND p.status = 'PENDING'
AND p.created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY p.created_at DESC
LIMIT 50;
2. Caching Strategy
Redis Implementation
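@Configuration
public class PaymentCacheConfig {
    @Bean
    public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory connectionFactory) {
        RedisTemplate<String, Object> template = new RedisTemplate<>();
        // Connection factory injected by Spring (Jedis or Lettuce, depending on configuration)
        template.setConnectionFactory(connectionFactory);
        // Store values as JSON so they stay readable across services
        template.setDefaultSerializer(new GenericJackson2JsonRedisSerializer());
        return template;
    }
}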
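@Service
public class PaymentCacheService {
    @Autowired
    private RedisTemplate<String, Object> redisTemplate;
    // Collaborators used below; the type names are assumed from the calls.
    @Autowired
    private PaymentMethodRepository paymentMethodRepository;
    @Autowired
    private ExchangeRateService exchangeRateService;

    // @Cacheable only takes effect with @EnableCaching and a CacheManager (Redis-backed here).
    @Cacheable(value = "payment-methods", key = "#customerId")
    public List<PaymentMethod> getCustomerPaymentMethods(String customerId) {
        return paymentMethodRepository.findByCustomerId(customerId);
    }

    @Cacheable(value = "exchange-rates", key = "#fromCurrency + ':' + #toCurrency")
    public ExchangeRate getExchangeRate(String fromCurrency, String toCurrency) {
        return exchangeRateService.getLatestRate(fromCurrency, toCurrency);
    }
}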
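For readers who want to see what the caching annotations do conceptually, here is a minimal in-memory cache-aside sketch with a time-to-live. It is not how Spring or Redis implement it: `TtlCache` is a hypothetical class, and the TTL is an assumption because the original configuration does not state one.

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal cache-aside sketch with a per-entry time-to-live (hypothetical class). */
final class TtlCache<K, V> {

    private static final class Entry<T> {
        final T value;
        final long expiresAtNanos;

        Entry(T value, long expiresAtNanos) {
            this.value = value;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;

    TtlCache(Duration ttl) {
        this.ttlNanos = ttl.toNanos();
    }

    /** Returns the cached value, calling the loader only on a miss or after expiry. */
    V get(K key, Function<K, V> loader) {
        long now = System.nanoTime();
        Entry<V> entry = entries.compute(key, (k, current) -> {
            if (current != null && now - current.expiresAtNanos < 0) {
                return current; // hit: entry is still fresh
            }
            // miss or expired: load from the source of truth and store it
            return new Entry<>(loader.apply(k), now + ttlNanos);
        });
        return entry.value;
    }

    /** Drops an entry, for example after the underlying record changes. */
    void evict(K key) {
        entries.remove(key);
    }
}
```

A call such as `cache.get("USD:THB", key -> loadRate(key))` (with `loadRate` standing in for the real lookup) hits the backing service once and serves later calls from memory until the entry expires or is evicted. Exchange rates and saved payment methods suit this pattern because they are read far more often than they change.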
Cache Warming Strategy
@Scheduled(fixedRate = 300_000) // every 5 minutes
public void warmCache() {
    // Pre-load frequently accessed data. Calls go through the paymentCacheService
    // bean so that @Cacheable applies; entries already cached are not reloaded.
    List<String> activeCustomers = getActiveCustomers();
    // Note: parallelStream() runs on the shared ForkJoinPool.commonPool()
    activeCustomers.parallelStream().forEach(customerId ->
        paymentCacheService.getCustomerPaymentMethods(customerId));
    // Pre-load exchange rates
    currencyPairs.forEach(pair ->
        paymentCacheService.getExchangeRate(pair.getFrom(), pair.getTo()));
}
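One design choice worth considering for warm-up jobs is running them on a small dedicated pool, so a slow warm-up cannot starve other work that shares the common pool. The following is a sketch under that assumption, not the code we ran: `CacheWarmer` and its parameters are hypothetical.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Sketch: warm a cache on a bounded, dedicated pool (hypothetical helper). */
final class CacheWarmer {

    /**
     * Runs the loader once per key on at most `parallelism` threads.
     * A failure for one key is logged and does not stop the others.
     *
     * @return true if every task finished within the timeout
     */
    static <K> boolean warm(List<K> keys, Consumer<K> loader,
                            int parallelism, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            for (K key : keys) {
                pool.execute(() -> {
                    try {
                        loader.accept(key);
                    } catch (RuntimeException e) {
                        System.err.println("warm-up failed for " + key + ": " + e);
                    }
                });
            }
        } finally {
            pool.shutdown(); // stop accepting work; queued tasks still run
        }
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

In a long-running service the pool would normally be created once and reused across runs; it is created per call here only to keep the sketch self-contained.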
3. Connection Pool Optimization
# HikariCP configuration (time values are in milliseconds)
spring:
  datasource:
    hikari:
      maximum-pool-size: 50
      minimum-idle: 10
      connection-timeout: 20000        # 20 seconds
      idle-timeout: 300000             # 5 minutes
      max-lifetime: 1200000            # 20 minutes
      leak-detection-threshold: 60000  # 60 seconds
4. Asynchronous Processing
@Service
public class AsyncPaymentProcessor {
    @Async("paymentTaskExecutor")
    // A retry re-runs the whole method, so each step below should be idempotent.
    @Retryable(value = {Exception.class}, maxAttempts = 3)
    public CompletableFuture<Void> processPaymentNotification(Payment payment) {
        // Send notifications asynchronously, off the payment request thread
        notificationService.sendPaymentConfirmation(payment);
        auditService.logPaymentEvent(payment);
        analyticsService.trackPaymentMetrics(payment);
        return CompletableFuture.completedFuture(null);
    }
}
@Configuration
@EnableAsync
public class AsyncConfig {
    @Bean(name = "paymentTaskExecutor")
    public TaskExecutor paymentTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(20);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("Payment-");
        executor.initialize();
        return executor;
    }
}
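A detail that is easy to miss with these three settings: the underlying `java.util.concurrent.ThreadPoolExecutor` only adds threads beyond the core size once the queue is full. With a core size of 20 and a queue of 500, the pool stays at 20 threads until 500 tasks are waiting. The sketch below demonstrates the behavior with deliberately tiny numbers; `PoolGrowthDemo` is a hypothetical helper, not part of our codebase.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Shows when a ThreadPoolExecutor grows past its core size (hypothetical demo). */
final class PoolGrowthDemo {

    /** Submits `tasks` blocking jobs and returns {poolSize, queued, rejected}. */
    static int[] submitBlocking(int core, int max, int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queueCapacity));
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // hold the thread until released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // default AbortPolicy: queue full and pool at max
            }
        }
        int[] snapshot = {pool.getPoolSize(), pool.getQueue().size(), rejected};
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();
        return snapshot;
    }

    public static void main(String[] args) {
        int[] s = submitBlocking(1, 2, 1, 4);
        System.out.println("threads=" + s[0] + " queued=" + s[1] + " rejected=" + s[2]);
    }
}
```

With core 1, max 2, and a queue of 1, four blocking tasks leave two threads running, one task queued, and one rejected. With a queue of 10, the same four tasks leave a single thread and three tasks waiting. The queue capacity therefore decides how quickly extra threads appear and when submissions start being rejected.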
5. Circuit Breaker Implementation
@Component
public class PaymentGatewayClient {
    @CircuitBreaker(name = "payment-gateway", fallbackMethod = "fallbackPayment")
    @TimeLimiter(name = "payment-gateway")
    @Retry(name = "payment-gateway")
    public CompletableFuture<PaymentResponse> processPayment(PaymentRequest request) {
        return CompletableFuture.supplyAsync(() -> paymentGatewayService.process(request));
    }

    // Fallback: same parameters as processPayment plus the triggering exception.
    public CompletableFuture<PaymentResponse> fallbackPayment(PaymentRequest request, Exception ex) {
        return CompletableFuture.completedFuture(
            PaymentResponse.builder()
                .status(PaymentStatus.PENDING)
                .message("Payment queued for processing")
                .build()
        );
    }
}
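The annotations hide the state machine that makes a circuit breaker useful, so here is a stripped-down, count-based version in plain Java. It is a sketch for illustration, not the Resilience4j implementation: `SimpleCircuitBreaker`, its threshold, and its wait time are all hypothetical, and it serializes calls with `synchronized` purely for simplicity.

```java
import java.util.function.Supplier;

/** Count-based circuit breaker sketch (hypothetical; not the Resilience4j implementation). */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openNanos;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openMillis * 1_000_000L;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (System.nanoTime() - openedAt < openNanos) {
                return fallback.get(); // fail fast without touching the gateway
            }
            state = State.HALF_OPEN; // wait elapsed: allow one trial call
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED; // success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = System.nanoTime();
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

Once the breaker opens, callers get the fallback immediately instead of waiting on a gateway that is already failing, which is what keeps threads and connections free during a provider outage.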
Results
Performance Improvements
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average Response Time | 850ms | 595ms | 30% faster |
| 95th Percentile | 1,200ms | 800ms | 33% faster |
| Database Connections | 90% usage | 45% usage | 50% reduction |
| Cache Hit Rate | 0% | 85% | New capability |
| Error Rate | 2.5% | 0.8% | 68% reduction |
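The percentages in the Improvement column are relative reductions, computed as (before - after) / before. A small sketch to reproduce them from the table's own figures (`Improvement` is a hypothetical helper name):

```java
/** Relative reduction between two measurements (hypothetical helper). */
final class Improvement {

    /** Returns the reduction from `before` to `after` as a percentage of `before`. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.println("avg response: " + percentReduction(850, 595));
        System.out.println("p95:          " + percentReduction(1200, 800));
        System.out.println("connections:  " + percentReduction(90, 45));
        System.out.println("error rate:   " + percentReduction(2.5, 0.8));
    }
}
```

The results match the table: about 30% for average response time, 33.3% (rounded to 33%) for the 95th percentile, 50% for connection usage, and 68% for error rate.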
Business Impact
- Revenue increase: $3.2M annually due to reduced payment failures
- Payment success rate: 92% (up from 85%)
- Infrastructure cost: 40% reduction in database resources needed
- Checkout abandonment: Reduced from 15% to 8%
Monitoring and Alerting
Key Metrics Dashboard
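@Component
public class PaymentMetrics {
    private final MeterRegistry meterRegistry;
    private final Timer processingTimer;

    public PaymentMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        // Register the timer once and reuse it for every recording
        this.processingTimer = Timer.builder("payment.processing.time")
            .register(meterRegistry);
    }

    public void recordPaymentProcessingTime(Duration duration) {
        // Record the measured duration passed in by the caller
        processingTimer.record(duration);
    }

    public void incrementPaymentCounter(String status, String method) {
        Counter.builder("payment.processed")
            .tag("status", status)
            .tag("method", method)
            .register(meterRegistry)
            .increment();
    }
}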
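We track the 95th percentile alongside the average because a handful of slow payments can hide behind a healthy-looking mean. The sketch below shows the difference using the nearest-rank method on made-up sample values. `LatencyStats` is a hypothetical helper, and production metrics libraries usually estimate percentiles from histograms rather than sorting raw samples.

```java
import java.util.Arrays;

/** Mean vs. nearest-rank percentile over raw latency samples (hypothetical helper). */
final class LatencyStats {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    static double mean(long[] samplesMillis) {
        return Arrays.stream(samplesMillis).average().orElseThrow();
    }

    public static void main(String[] args) {
        // Illustrative values only: nine fast payments and one slow one.
        long[] samples = {100, 100, 100, 100, 100, 100, 100, 100, 100, 1000};
        System.out.println("mean=" + mean(samples) + " p95=" + percentile(samples, 95));
    }
}
```

For these samples the mean is 190ms while the 95th percentile is 1000ms, which is why an alert on the percentile catches tail latency that an average would smooth over.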
Alert Configuration
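# High payment processing time
- alert: HighPaymentProcessingTime
  expr: payment_processing_time_p95 > 1000
  for: 5m
  labels:
    severity: warning
# High payment error rate: failed payments as a share of all payments (above 2%).
# Metric names depend on the exporter; Prometheus counters often carry a _total suffix.
- alert: HighPaymentErrorRate
  expr: sum(rate(payment_processed{status="error"}[5m])) / sum(rate(payment_processed[5m])) > 0.02
  for: 2m
  labels:
    severity: critical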
Lessons Learned
1. Measure Everything
- Implement comprehensive monitoring before optimization
- Use APM tools to identify real bottlenecks
- Set up alerting for key business metrics
2. Cache Strategically
- Cache frequently accessed, rarely changing data
- Implement cache warming for predictable load
- Monitor cache hit rates and adjust TTL accordingly
3. Optimize the Database Last
- Application-level optimizations often yield better results
- Index optimization can provide immediate wins
- Connection pooling is critical for high-concurrency systems
4. Async Where Possible
- Move non-critical operations to background processing
- Use message queues for decoupling
- Implement proper error handling for async operations
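As an example of the last point, the sketch below runs each non-critical step as its own future and converts failures into a result instead of letting them propagate, so a failing audit write cannot block the confirmation email or trigger a duplicate one. The class and step names are hypothetical, and it uses the default common pool where a real service would pass its own executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Sketch: independent async side effects with per-step error handling (hypothetical). */
final class AsyncSideEffects {

    /** Runs one step off the caller's thread and reports failure instead of throwing. */
    static CompletableFuture<String> runStep(String name, Runnable step) {
        return CompletableFuture.runAsync(step)
                .handle((ignored, error) -> error == null
                        ? name + ":ok"
                        : name + ":failed");
    }

    /** Starts all steps, then collects one outcome per step in iteration order. */
    static List<String> runAll(Map<String, Runnable> steps) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        steps.forEach((name, step) -> futures.add(runStep(name, step)));
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            results.add(future.join()); // never throws: handle() absorbed any failure
        }
        return results;
    }
}
```

Because each step reports its own outcome, a retry policy can target only the step that failed rather than repeating the whole batch.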
What’s Next?
Our next optimization phase focuses on:
- Machine learning for fraud detection optimization
- GraphQL implementation for flexible API responses
- Kubernetes deployment for better resource utilization
Working on payment system optimization? I’d love to hear about your challenges and share experiences. Connect with me on LinkedIn.