Billion User Recommendation System Solution: Vector Database + Real-time Computing Architecture for Millisecond Recommendations

Introduction (pain point analysis)

Dear developers and architects, are you struggling with the following issues?

As your business grows rapidly and your user base passes the billion mark, traditional recommender systems begin to fall short. Offline, batch-based recommendations update slowly and fail to capture users' real-time interests. Under sudden traffic peaks, response latency soars and the user experience suffers. At the same time, recalling a massive candidate set from the full item catalog and ranking it accurately is computationally expensive and slow, and it becomes a bottleneck for business growth.

If you are worried about recommendation timeliness, system scalability, and high-concurrency performance, Tencent Cloud offers a millisecond-level recommendation architecture built on a vector database and real-time computing.

Solution Architecture Diagram and Overview

The core design idea of this solution is to use real-time stream processing to capture the user's momentary interests, rely on a high-performance vector database for millisecond-level similarity retrieval, and produce accurate recommendations by combining long-term and short-term interests.
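One simple way to combine long-term and short-term interests is a weighted blend of two user vectors. The sketch below is illustrative only: the function names and the `alpha` weight are assumptions, not part of any Tencent Cloud product, and a production system might learn the fusion inside the model instead.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_interests(long_term, short_term, alpha=0.7):
    """Blend two interest vectors of equal dimension.

    alpha is a hypothetical tuning knob: the weight given to the
    short-term (real-time) vector. The result is unit length, so it can
    be used directly in a cosine or inner-product similarity query.
    """
    if len(long_term) != len(short_term):
        raise ValueError("vectors must have the same dimension")
    long_unit = l2_normalize(long_term)
    short_unit = l2_normalize(short_term)
    blended = [
        alpha * s + (1.0 - alpha) * l
        for l, s in zip(long_unit, short_unit)
    ]
    return l2_normalize(blended)


# A user with a long-term preference on axis 0 and a fresh interest on axis 1.
query_vector = fuse_interests([3.0, 0.0], [0.0, 4.0], alpha=0.5)
```

Normalizing both inputs first keeps the blend from being dominated by whichever vector happens to have the larger magnitude.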

The architecture diagram below shows how data flows between the core components:

[Figure: solution architecture diagram]

The workflow is as follows:

Real-time collection. Behavioral data generated by users on the front end (e.g., clicks, views, favorites) is captured in real time and sent to the TDMQ (RocketMQ Edition) message queue for peak shaving and decoupling.
Real-time processing. Streaming Computing Oceanus consumes the data in the message queue, performs real-time feature extraction and aggregation, and invokes the model for fast inference to generate the user's real-time interest vector.
Vector search. Both the users' real-time vectors and the preprocessed item vectors are stored in Tencent Cloud Vector Database. When a recommendation is needed, the business application queries the vector database directly, and it returns the set of most similar items within milliseconds.
Data and model foundation. The TBDS/WeData big data platform is responsible for offline data cleansing, integration, and building long-term user profiles. The TI-ONE machine learning platform trains and exports deep learning recommendation models, which support real-time computation and vectorization.
Business integration. Recommendation business logic (e.g., filtering and ranking rules) is deployed on CVM. It calls the other services securely and at high speed over the private network and returns the final recommendations to the user.
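The workflow above can be sketched end to end in a few lines. This is a toy, in-memory illustration under stated assumptions: a Python list stands in for the message queue, a loop stands in for the stream job, and a brute-force cosine scan stands in for the vector database's ANN index. The item vectors, event weights, decay factor, and function names are all hypothetical.

```python
import heapq
import math

# Hypothetical precomputed item embeddings (normally produced offline by the model).
ITEM_VECTORS = {
    "sneakers": [0.9, 0.1, 0.0],
    "running_socks": [0.8, 0.2, 0.1],
    "novel": [0.0, 0.1, 0.9],
    "cookbook": [0.1, 0.3, 0.8],
}

# Hypothetical weights: stronger signals move the interest vector more.
EVENT_WEIGHTS = {"view": 0.2, "click": 0.5, "favorite": 1.0}


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def update_interest(user_vec, event, decay=0.8):
    """Real-time processing step: decay old interest, add the new signal."""
    item_vec = ITEM_VECTORS[event["item_id"]]
    weight = EVENT_WEIGHTS[event["type"]]
    return [decay * u + weight * i for u, i in zip(user_vec, item_vec)]


def recall_top_k(user_vec, k, exclude=()):
    """Vector search step, done here by exhaustive scan instead of ANN.

    `exclude` models a simple business filter, e.g. items already seen.
    """
    scored = (
        (cosine(user_vec, vec), item_id)
        for item_id, vec in ITEM_VECTORS.items()
        if item_id not in exclude
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Stand-in for events consumed from the message queue.
events = [
    {"item_id": "sneakers", "type": "click"},
    {"item_id": "sneakers", "type": "favorite"},
]

user_vec = [0.0, 0.0, 0.0]
for event in events:
    user_vec = update_interest(user_vec, event)

recommendations = recall_top_k(user_vec, k=2, exclude={"sneakers"})
print(recommendations)  # ['running_socks', 'cookbook']
```

The exhaustive scan is linear in the number of items, which is exactly what an ANN index such as HNSW avoids at large scale; the rest of the flow keeps the same shape.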

The architecture addresses the three pain points raised in the introduction: timeliness, scalability, and performance.

Core Products and Components

Component	Role	Key configuration and selection recommendations	Why choose it
Tencent Cloud Vector Database (Tencent Cloud VectorDB)	The core of the system. Stores all item and user vectors and provides millisecond-level approximate nearest neighbor (ANN) search.	Choose a high-performance instance type and size it to the data volume (billions to tens of billions of vectors). Choose the HNSW index type when query performance is the priority.	Optimized for vector search, with performance far beyond that of traditional database solutions. A single index supports hundreds of billions of vectors, with 99.99% availability. The service is fully managed, which greatly reduces development and operations costs.
Streaming Computing Oceanus	The real-time computing brain. Consumes the user behavior stream, computes features in real time, and generates user vectors.	Choose the Flink version and set the number of CUs (compute units) according to data throughput. Enable checkpointing to guarantee state consistency.	A fully managed Apache Flink service with sub-second processing latency and high throughput. There is no cluster to operate, so the team can focus on business logic and implement complex event processing more easily.
Message Queue TDMQ (RocketMQ Edition)	The nerve center of the system. Receives all real-time user behavior data and buffers and decouples upstream and downstream systems.	Choose RocketMQ 5.x. Set the number of topic partitions to match the number of concurrent consumers to ensure throughput.	High throughput and low latency, supporting highly concurrent writes from hundreds of millions of users. Fully compatible with the Apache RocketMQ ecosystem, so it connects to existing systems without changes.
Big Data Platform WeData/TBDS	The data foundation. Responsible for offline ETL, data quality management, and building long-term user profiles.	Use WeData for data development and task scheduling, and TBDS for storage and computation on very large datasets.	Provides one-stop data governance, so the data fed into the model and the real-time system is accurate and reliable, which protects recommendation quality at the source.
Machine Learning Platform TI-ONE	The engine of the recommendation algorithms. Used to train and deploy deep learning models such as two-tower models and DNNs that generate high-quality vectors.	Use Notebook for feature engineering and modeling experiments, the training platform for large-scale distributed training, and the model service for one-click deployment.	Supports the full workflow from feature engineering to model serving. Built-in algorithm frameworks and optimization components improve the R&D efficiency of algorithm engineers.
Cloud Server CVM & Private Network VPC	Hosts the business logic. Used to deploy business applications such as the recommendation API service and the policy service.	Choose compute-optimized CVM instances. Deploy all components in the same VPC in the same region to minimize network latency and keep communication secure.	VPC gives all cloud products an isolated, secure, high-speed private network, which is the foundation for the performance and security of the whole system.
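When sizing the vector database instance to the data volume, a back-of-the-envelope memory estimate is a useful starting point. The sketch below assumes float32 vectors and uses a hypothetical `overhead_factor` to approximate index structures such as the HNSW graph; the real overhead depends on the product and index parameters, so treat the result as a rough guide rather than a specification.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Bytes needed for the raw vectors alone (float32 by default)."""
    return num_vectors * dim * bytes_per_value


def estimate_index_gib(num_vectors, dim, overhead_factor=1.5):
    """Rough memory estimate in GiB.

    overhead_factor is an assumed multiplier for index structures and
    metadata; it is not a published figure for any specific database.
    """
    return raw_vector_bytes(num_vectors, dim) * overhead_factor / 2**30


# Example: one billion 128-dimensional float32 vectors.
raw = raw_vector_bytes(1_000_000_000, 128)
print(raw)  # 512000000000 bytes, about 477 GiB before any index overhead
print(round(estimate_index_gib(1_000_000_000, 128), 1))  # 715.3
```

Halving the dimension or quantizing values to fewer bytes shrinks the footprint proportionally, which is often the first lever to consider before moving to a larger specification.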

Summary of solution benefits

- Millisecond response. Backed by the retrieval performance of Tencent Cloud Vector Database, recommendation recall latency drops to the millisecond range, which keeps the user experience smooth.
- Horizontal scalability. Every component in the architecture is distributed and scales out, so the system can grow from millions to hundreds of billions of users and items.
- Accurate recommendations. Combining real-time computation with vector retrieval reflects users' long-term preferences while also capturing their real-time interests, which improves recommendation accuracy.
- Stable and reliable. Fully managed services provide automatic failover and high availability, with system availability of up to 99.99% to protect business continuity.
- Lower cost, higher efficiency. Removing infrastructure operations work lets development and algorithm teams focus on business innovation and reduces total cost of ownership (TCO).

Application Scenarios and Applicable Customers

Typical application scenarios:
- E-commerce platforms. Real-time personalized recommendations such as "Guess You Like" and "Watch Again" to increase click-through rate and GMV.
- Content feeds and short-video platforms. Quickly refresh the feed based on users' real-time browsing behavior to increase engagement and session length.
- Music and radio apps. Generate next-track recommendations in real time for an immersive experience.
Applicable customer characteristics:
- A very large user base (more than one million daily active users) that creates serious performance and scalability challenges.
- A business that places very high demands on the timeliness of recommendation results and needs to react quickly to the latest user behavior.
- A technical team that wants an industry-leading architecture while reducing O&M investment so it can focus on core business logic.
