Are you an LLM? You can read better optimized documentation at /guide/faq/perf-vs.md for this page in Markdown format
Performance Comparison
TODO
Core Metrics of Distributed ID Schemes
- Global (Same Business) Uniqueness: Uniqueness guarantee is a necessary condition for ID. Assuming ID is not unique will cause primary key conflicts, which is easy to understand.
- The so-called global uniqueness usually does not mean uniqueness across all business services, but uniqueness across different deployment replicas of the same business service. For example, multiple deployment replicas of the Order service generating
Idfor thet_ordertable require global uniqueness. As for whether theIDgenerated fort_order_itemis unique witht_order, it does not affect the uniqueness constraint and will not cause any side effects. The same applies to different business modules. That is, uniqueness mainly solves the ID conflict problem.
- The so-called global uniqueness usually does not mean uniqueness across all business services, but uniqueness across different deployment replicas of the same business service. For example, multiple deployment replicas of the Order service generating
- Orderliness: The orderliness guarantee is necessary for query data structure algorithms (except Hash algorithms), and is the prerequisite for binary search (divide and conquer).
- MySQL-InnoDB B+ tree is the most widely used. Assuming Id is unordered, the B+ tree, in order to maintain the orderliness of ID, will frequently insert in the middle of the index and move the positions of subsequent nodes, even leading to frequent page splits, which has a huge impact on performance. If we can guarantee the orderliness of ID, this situation is completely different, only needing append write operations. Therefore, the orderliness of ID is very important and is an inevitable characteristic of ID design.
- Throughput/Performance (ops/time): The number of IDs that can be generated per unit time (per second). Generating ID is a very high-frequency operation and is also the most basic. Assuming ID generation is slow, no matter how the system is optimized, better performance cannot be obtained.
- Generally, we first generate the ID, then perform the write operation. Assuming ID generation is slow, the overall performance upper limit will be limited, which should be easy to understand.
- Stability (time/op): The stability metric can generally be analyzed using percentile sampling of each operation time, for example, CosId percentile sampling P9999=0.208 us/op, meaning 0% ~ 99.99% of unit operation times are less than or equal to 0.208 us/op.
- Percentile WIKI: A statistical term. If a set of data is sorted from small to large and the corresponding cumulative percentiles are calculated, then the value corresponding to a certain percentile is called the percentile number, denoted as Pk for the k-th percentile. Percentiles are used to compare the relative position of individuals in a group.
- Why not use the average time per operation: Can the average of Jack Ma's net worth and yours be meaningful?
- Can the minimum time per operation and maximum time per operation be used as reference? Because minimum and maximum values only describe the boundary conditions, although they can be used as a reference for stability, they are still not comprehensive. Moreover, percentiles already cover these two indicators.
- Autonomy (Dependency): Mainly refers to whether there is dependency on external environment, for example, segment mode strongly depends on third-party storage middleware to obtain
NextMaxId. Autonomy will also affect availability. - Availability: The availability of distributed ID is mainly affected by autonomy, for example, SnowflakeId is affected by clock rollback, leading to a short period of unavailability. While segment mode is affected by the availability of third-party distributors (
NextMaxId).- Availability WIKI: The proportion of total available time for a functional individual within a given time interval.
- MTBF: Mean Time Between Failures
- MDT: Mean Time To Repair/Recovery
- Availability=MTBF/(MTBF+MDT)
- Assuming MTBF is 1 year, MDT is 1 hour, then
Availability=(365*24)/(365*24+1)=0.999885857778792≈99.99%, which is what we usually call four 9s availability.
- Adaptability: Refers to the adaptive ability when facing changes in the external environment. Here we mainly talk about the dynamic scaling of distributed ID performance when facing traffic bursts.
- SegmentChainId can dynamically scale based on starvation state for safe distance.
- SnowflakeId conventional bit allocation scheme has constant performance of 409.6W, although different TPS performance can be obtained by adjusting the bit allocation scheme, but changing the bit allocation method is destructive, generally determined according to the business scenario and no longer changed.
- Storage Space: Still using MySQL-InnoDB B+ tree as an example, secondary indexes store primary key values, the larger the primary key, the more memory cache and disk space it occupies. The less data stored in Page pages, the more disk IO accesses will increase. In short, under the premise of meeting business needs, occupying as little storage space as possible is a good design principle in most scenarios.
Core Algorithms of Distributed ID
Bit Partition Algorithm (SnowflakeId)
| Performance (Throughput) | Stability (Percentile) | Autonomy (Dependency) | Machine Number Allocator | Machine Number Recycling | Usage Method | |
|---|---|---|---|---|---|---|
| CosId | 4,096,000(ops/s) | P9999=0.244(us/op) | First startup, depends on machine number allocator | Manual allocator, K8S, relational database, Redis, ZooKeeper | Supported | SDK(recommended)/RPC/RESTful |
| Leaf | ZooKeeper | |||||
| uid-generator | Relational database | |||||
| TinyID | Does not support bit partition algorithm |
Segment Algorithm (SegmentId)
| Performance (Throughput) | Stability (Percentile) | Autonomy (Dependency) | Segment Distributor | Adaptability | Storage Space | Usage Method | |
|---|---|---|---|---|---|---|---|
| CosId | 127,439,148(ops/s) | P9999=0.208(us/op) | Depends on segment distributor | Relational database, Redis, ZooKeeper | Supports Step auto-scaling | 64-bit | SDK(recommended)/RPC/RESTful |
| Leaf | MySql | ||||||
| uid-generator | Does not support segment algorithm | ||||||
| TinyID | Database |
