Han Xin code (汉信码 in Chinese, Chinese-sensible code) is a two-dimensional (2D) matrix barcode symbology invented in 2007[1] by the Chinese company The Article Numbering Center of China[2] (中国物品编码中心 in Chinese) to break the monopoly of QR code. Like QR code, Han Xin code consists of black and white square modules arranged in a square grid on a white background. It has four finder patterns and other markers that allow camera-based readers to recognize it. Han Xin code uses Reed–Solomon error correction, which makes it possible to read damaged symbols. It is currently published as ISO/IEC 20830:2021.[3]
The main advantage (and design requirement) compared with QR code is the built-in ability to encode Chinese characters natively, where QR code natively supports Japanese characters. In its largest version, 84 (189×189 modules),[4] Han Xin code can encode 7827 numeric characters, 4350 English text characters, 3261 bytes or 1044–2174 Chinese characters (depending on the character region). Han Xin code encodes the full ISO/IEC 646 character set rather than the restricted set of Latin characters supported by QR code. This makes Han Xin code more suitable for encoding English text or GS1 Application Identifiers[5] data.
Additionally, Han Xin code can encode Unicode characters from other languages in a special Unicode mode,[3]: 5.4.12 which has built-in lossless compression for the UTF-8 character set, and it supports Extended Channel Interpretation. Han Xin code also has a special compact mode for URI encoding, which can reduce the size of barcodes that encode links to web pages.
History and standards
During the 10th Five-Year Plan of China, the Chinese company The Article Numbering Center of China (中国物品编码中心 in Chinese) started research[6] into its own replacement for QR code in order to remove the Japanese monopoly on 2D barcodes. In 2007, the new barcode standard, now known as Han Xin code, was published as GB/T 21049-2007[1] under the name Chinese-sensible code.
In 2015, a group of ISO/IEC JTC 1/SC 31 started work[9] on Han Xin code as an international standard and published it as ISO/IEC 20830:2021[3] in 2021.
In 2022, the Chinese-sensible code standard was revised as GB/T 21049-2022[10] and renamed Han Xin code to be consistent with the ISO standard.
A set of patents related to Han Xin code encoding and decoding is registered with the European Patent Office and the United States Patent and Trademark Office:
European Patent Office EP3330887B1 by Fujian Landi Commercial Equipment Co Ltd "Chinese-sensitive code feature pattern detection method and system"[11]
United States Patent US10095903B2 by Ingenico Fujian Technology Co Ltd "Block decoding method and system for two-dimensional code"[12]
United States Patent US10528781B2 by Ingenico Fujian Technology Co Ltd "Detection method and system for characteristic patterns of Han Xin codes"[13]
Application
Han Xin code can be used in the same way as QR code. At present, Han Xin code is used mostly in China,[14] because it has a built-in ability to encode Chinese characters. Nevertheless, most barcode printers[15] and barcode scanners[16] support Han Xin code. Han Xin code can be scanned on iOS[17] and Android[18] mobile devices, and many barcode libraries[19][20] support reading and writing it.
The main advantages of Han Xin code are:
built-in ability to encode Chinese characters;[21]
compact GS1 Application Identifiers data encoding compared with QR code;
full ISO/IEC 646 support for compact numeric and text encoding.
Barcode design
Figure: Barcode structure of Han Xin code.
Han Xin code represents data in black and white square modules, where a dark module is a binary one and a light module is a binary zero. Han Xin code can also be printed in inverse colors,[3]: 4.1.2 but many barcode readers disable this option by default. The modules are arranged in a square region whose size ranges from 23 × 23 modules (Version 1) to 189 × 189 modules (Version 84). Like QR code, Han Xin code has no rectangular versions of the kind DataMatrix has, which restricts its use in some cases. The size of a Han Xin code version can be calculated with the following formula:
modules per side = 21 + 2 × version
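The relation between version number and symbol size can be checked against the sizes quoted in this article (Version 1 is 23 × 23, Version 22 is 65 × 65, Version 84 is 189 × 189), all of which fit a step of two modules per version. The following Python sketch assumes that linear relation; the function name `symbol_size` is illustrative.

```python
def symbol_size(version: int) -> int:
    """Return the number of modules per side for a Han Xin code version.

    Assumes the linear relation 21 + 2 * version, which matches the
    sizes quoted in the article for versions 1, 22 and 84.
    """
    if not 1 <= version <= 84:
        raise ValueError("Han Xin code versions range from 1 to 84")
    return 21 + 2 * version


for version in (1, 4, 22, 84):
    side = symbol_size(version)
    print(f"Version {version}: {side} x {side} modules")
```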
Han Xin code symbol is constructed from the following elements:[3]: 4.2
Quiet Zone – surrounds the symbol on all four sides and is at least 3X wide;
Finder Pattern – consists of four Position Detection Patterns, which are placed at the four corners of the symbol and are used to detect the symbol position and area;
Alignment Patterns and Assistant Alignment Patterns – are present from Version 4 onwards and help with decoding distorted symbols;
Structural Information Regions – surround all four Position Detection Patterns and encode symbol parameters such as version, mask and error correction level;
Data Regions – masked binary data encoded in black and white modules.
Finder pattern
Figure: Han Xin code finder pattern.
The Finder Pattern[3]: 4.2.3 consists of four Position Detection Patterns located at the four corners of the barcode. Each Position Detection Pattern is 7 × 7 modules in size and is built from five overlaid squares: dark 7 × 7 modules, light 6 × 6 modules, dark 5 × 5 modules, light 4 × 4 modules and dark 3 × 3 modules.
The scanning ratio of each Position Detection Pattern is 1:1:1:1:3 or 3:1:1:1:1, depending on the scanning direction. The orientation of the four patterns allows the barcode location and orientation to be detected unambiguously.
Every pattern has a Position Detection Pattern separator[3]: 4.2.4 with a Structural Information Region aligned to it.
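The construction from five squares and the resulting 1:1:1:1:3 scanning ratio can be illustrated with a short Python sketch. It assumes that all five squares share one corner of the 7 × 7 area (the bottom-right corner is chosen here for illustration); the function names are illustrative and are not taken from the standard.

```python
def position_detection_pattern():
    """Return a 7x7 grid (1 = dark, 0 = light) built from overlaid squares."""
    size = 7
    grid = [[0] * size for _ in range(size)]
    # Squares of side 7, 6, 5, 4, 3 alternate dark/light, starting with dark.
    for index, side in enumerate((7, 6, 5, 4, 3)):
        colour = 1 if index % 2 == 0 else 0
        # Assumption: every square is anchored at the bottom-right corner.
        for row in range(size - side, size):
            for col in range(size - side, size):
                grid[row][col] = colour
    return grid


def run_lengths(line):
    """Collapse a line of modules into the lengths of its same-colour runs."""
    runs = []
    previous = None
    for module in line:
        if module == previous:
            runs[-1] += 1
        else:
            runs.append(1)
            previous = module
    return runs


pattern = position_detection_pattern()
for row in pattern:
    print("".join("#" if module else "." for module in row))

# Scan line through the 3x3 core, in both directions.
print(run_lengths(pattern[-1]))
print(run_lengths(pattern[-1][::-1]))
```

Scanning the line that passes through the 3 × 3 core gives runs of 1, 1, 1, 1 and 3 modules in one direction and 3, 1, 1, 1, 1 in the other, which is the ratio described above.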
Alignment pattern
The Alignment Patterns[3]: 4.2.5 are added to Han Xin code from Version 4 onwards (Versions 1–3 do not have alignment patterns) and are used to locate module positions precisely in distorted barcodes. Alignment patterns in Han Xin code are divided into:
Alignment Pattern – a set of step-wise alignment lines;
Assistant Alignment Pattern – 6 modules, including 5 light modules and 1 dark module.
The Alignment Pattern is made up of a dark line and an adjacent light line on its lower side, each one module wide. The Assistant Alignment Pattern, consisting of 5 light modules and 1 dark module, marks the edge of a region block with its dark module.
Figures: Han Xin code versions 1, 4, 22 and 84, showing different alignment pattern placement.
Structural information
Figure: Structural information placement of Han Xin code.
The Han Xin code Structural Information Region[3]: 4.2.7 is a one-module-wide region surrounding the four Position Detection Patterns. Han Xin code has two identical Structural Information arrays, each made of 34 data modules. Every Structural Information array is split into parts of 17 modules, which are placed around the Position Detection Patterns.
Structural Information Region encodes the following data:[3]: Annex E
Version + 20 (bits 0–7);
Error correction level (bits 8–9);
Mask index (bits 10–11);
Reed–Solomon error correction data (bits 12–27);
Bits 28–33 are ignored and may have any value (they are sometimes filled with a white and black sequence).
Metadata bits 0–11 are split into three 4-bit tetrads (m2, m1, m0) and supplemented with four error correction tetrads (r3, r2, r1, r0).
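Han Xin Code Structural information bits

| Content | Tetrad | Bits |
| --- | --- | --- |
| Version + 20 | m2 | X0–X3 |
| Version + 20 | m1 | X4–X7 |
| Error correction level, Mask index | m0 | X8–X9, X10–X11 |
| Error correction codewords | r3 | X12–X15 |
| Error correction codewords | r2 | X16–X19 |
| Error correction codewords | r1 | X20–X23 |
| Error correction codewords | r0 | X24–X27 |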
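The packing of the 12 metadata bits into the tetrads m2, m1 and m0 can be sketched in Python as follows. The sketch assumes that bit 0 is the most significant bit of the 12-bit word; the function name `pack_structural_info` is illustrative, and the four Reed–Solomon tetrads r3 to r0 are deliberately not computed here.

```python
def pack_structural_info(version, ec_level_bits, mask_index):
    """Pack the 12 metadata bits and return the tetrads [m2, m1, m0].

    ec_level_bits and mask_index are the 2-bit codes stored in the symbol
    (for example 0b00 for level L1 and 0b00 for the non-masking pattern).
    """
    if not 1 <= version <= 84:
        raise ValueError("version must be in the range 1..84")
    if not 0 <= ec_level_bits <= 3 or not 0 <= mask_index <= 3:
        raise ValueError("level and mask are 2-bit values")
    # Bits 0-7: version + 20, bits 8-9: level, bits 10-11: mask.
    # Assumption: bit 0 is the most significant bit of the 12-bit word.
    word = ((version + 20) << 4) | (ec_level_bits << 2) | mask_index
    return [(word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF]


# Version 1, level L1 (00), non-masking (00): version + 20 = 21 = 00010101b
print(pack_structural_info(1, 0b00, 0b00))
```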
Data masking
To bring the ratio of dark to light modules in the symbol close to 1:1, a masking algorithm[3]: 5.8.4 is used. The masking sequence is applied to the Data Region with the XOR operation. The Finder Pattern, Alignment Patterns and Structural Information Regions are excluded from masking. The following table shows the mask pattern conditions and the pattern reference that is stored in the Structural Information Region.
Han Xin Code masking pattern algorithm

| Condition of masking solution | Data mask pattern reference |
| --- | --- |
| Non-masking | 00 |
| (i + j) mod 2 = 0 | 01 |
| ((i + j) mod 3 + (j mod 3)) mod 2 = 0 | 10 |
| (i mod j + j mod i + i mod 3 + j mod 3) mod 2 = 0 | 11 |
i - Row index of the symbol.
j - Column index of the symbol.
Both i and j start from (1,1), the top left corner module of the symbol. When the masking solution condition is true, the resulting mask bit is 1.
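The mask conditions can be written directly as a small Python function. This is a minimal sketch: `mask_bit` follows the conditions in the table, while `apply_mask` and its `is_data` callback are hypothetical helpers used only to show that function modules are left untouched.

```python
def mask_bit(i, j, pattern):
    """Mask bit for row i and column j (both 1-based); pattern is 0b00..0b11."""
    if i < 1 or j < 1:
        raise ValueError("row and column indices start from 1")
    if pattern == 0b00:
        return 0  # non-masking
    if pattern == 0b01:
        condition = (i + j) % 2 == 0
    elif pattern == 0b10:
        condition = ((i + j) % 3 + j % 3) % 2 == 0
    elif pattern == 0b11:
        condition = (i % j + j % i + i % 3 + j % 3) % 2 == 0
    else:
        raise ValueError("pattern must be a 2-bit value")
    return 1 if condition else 0


def apply_mask(modules, pattern, is_data):
    """XOR the mask into data modules only.

    modules is a list of rows of 0/1 values; is_data(i, j) is a hypothetical
    callback that returns True for modules belonging to the Data Region.
    """
    return [
        [
            bit ^ mask_bit(r + 1, c + 1, pattern) if is_data(r + 1, c + 1) else bit
            for c, bit in enumerate(row)
        ]
        for r, row in enumerate(modules)
    ]


# A 2x2 all-light area under mask 01 becomes a checkerboard.
print(apply_mask([[0, 0], [0, 0]], 0b01, lambda i, j: True))
```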
Error correction
Han Xin code uses Reed–Solomon error correction. The encoded data is represented as an array of bytes (8-bit codewords). The data array is divided into blocks,[3]: Annex B and a sequence of error correction codewords is generated for each block and appended to the end of that block. After this, all blocks are merged sequentially into one byte stream.
The polynomial arithmetic for Han Xin code uses the finite field generator polynomial x^8 + x^6 + x^5 + x + 1 (355 or 101100011b)[3]: 5.5 with initial root = 1.
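The field arithmetic can be sketched in Python by reading the binary value 101100011b (355) as the field polynomial x^8 + x^6 + x^5 + x + 1. The sketch builds exponent and logarithm tables, multiplies field elements, and forms a Reed–Solomon generator polynomial whose roots start at the first power of the primitive element (the "initial root = 1" above). The names `build_tables`, `gf_mul` and `rs_generator` are illustrative, and the sketch does not reproduce the block layout of the standard.

```python
FIELD_POLY = 0b101100011  # 355: x^8 + x^6 + x^5 + x + 1


def build_tables(poly=FIELD_POLY):
    """Return (exp, log) tables for GF(2^8) with primitive element 2."""
    exp = [0] * 255
    log = [0] * 256
    value = 1
    for power in range(255):
        exp[power] = value
        log[value] = power
        value <<= 1
        if value & 0x100:  # reduce modulo the field polynomial
            value ^= poly
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


def rs_generator(n_ec):
    """Generator polynomial with roots alpha^1 .. alpha^n_ec.

    Coefficients are listed from the highest degree down to the constant.
    """
    gen = [1]
    for i in range(1, n_ec + 1):
        root = EXP[i % 255]
        nxt = [0] * (len(gen) + 1)
        for k, coef in enumerate(gen):
            nxt[k] ^= coef                    # coef * x
            nxt[k + 1] ^= gf_mul(coef, root)  # coef * root
        gen = nxt
    return gen


print(gf_mul(2, 128))   # x * x^7 = x^8, reduced by the field polynomial
print(rs_generator(2))  # (x + alpha)(x + alpha^2)
```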
The number of error correction codewords depends on the symbol version and error correction level and ranges from 16% to 60% of the symbol's codewords, which allows 8% to 30% damage to be corrected.[3]: 5.6.2
Han Xin Code error correction levels features

| Error correction level | Recovery capacity % (approximation) | Encoding of error correction level |
| --- | --- | --- |
| L1 | 8% | 00 |
| L2 | 15% | 01 |
| L3 | 23% | 10 |
| L4 | 30% | 11 |
Data region
Han Xin code data is encoded as a byte array. The data byte array is split into error correction blocks, and error correction codewords (bytes) are appended to each block. The error correction blocks are then joined into one codeword array:[3]: 5.8.3
As an example, consider Han Xin code version 5 with error correction level L4. It has 27 data codewords in 2 error correction blocks, whose sizes (data codewords, error correction codewords) are (14, 20) and (13, 22):
(D1...D14, D15...D27) => (D1...D14, E1.1...1.20) + (D15...D27, E2.1...2.22) => (D1...D14, E1.1...1.20, D15...D27, E2.1...2.22) => (C1...C69)
D(x) - Data codewords.
E(b.x) - error codeword, where b is block number and x position in block.
C(x) - resulted codewords.
In the next step, the resulting codeword array C(x) is split into blocks of 13 bytes, and codewords at the same position in each block are taken in turn to form a new codeword array. The result is a byte array of the same size, interleaved with a stride of 13.
(C1...C13, C14...C26, ..., Cn...Cn+12) => (C1, C14, ..., Cn, ..., C13, C26, ..., Cn+12) => (CM1...CMn+12)
CM(x) – array of codewords (bytes) interleaved with a stride of 13.
After these operations, the resulting codewords are placed into the data region row by row, from left to right and from top to bottom. As a result, damage along a horizontal line affects fewer codewords, while damage along a vertical line affects more codewords.
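The block assembly and the stride-13 interleaving described above can be sketched in Python for the version 5, level L4 example. The error correction codewords are represented by placeholder labels rather than computed values, and the function name `interleave_by_13` is illustrative.

```python
def interleave_by_13(codewords):
    """Take the codewords at position 0 of every 13-byte block, then position 1, ..."""
    return [c for start in range(13) for c in codewords[start::13]]


# Version 5, level L4: two blocks of (data, error correction) codewords.
blocks = [(14, 20), (13, 22)]
data = [f"D{n}" for n in range(1, 28)]

stream, pos = [], 0
for block_no, (n_data, n_ec) in enumerate(blocks, start=1):
    stream += data[pos:pos + n_data]
    # Placeholder labels stand in for the real Reed-Solomon codewords.
    stream += [f"E{block_no}.{n}" for n in range(1, n_ec + 1)]
    pos += n_data

mixed = interleave_by_13(stream)
print(len(stream))  # total number of codewords C1...C69
print(mixed[:7])    # C1, C14, C27, C40, C53, C66, C2
```

Because 69 is not a multiple of 13, the last block is shorter, so the first four positions contribute six codewords each and the remaining positions contribute five.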
Encoding
Han Xin code can encode 7827 numeric characters, 4350 English text characters, 3261 bytes or 1044–2174 Chinese characters in its largest version, 84.[3]: Annex C Additionally, it supports special Unicode and industrial modes. All modes can be mixed to obtain the most compact encoding of the data. The following table shows how much data can be encoded with different barcode versions and error correction levels.
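Han Xin Code versions and information capacity

| Version | Size | Error correction level | Data codewords | Error correction codewords | Numeric | Text | Bytes | Chinese characters |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 23×23 | L1 | 21 | 4 | 45 | 26 | 18 | 6–12 |
| 1 | 23×23 | L4 | 9 | 16 | 15 | 10 | 6 | 2–4 |
| ... | | | | | | | | |
| 22 | 65×65 | L1 | 354 | 68 | 843 | 470 | 351 | 113–234 |
| 22 | 65×65 | L4 | 168 | 254 | 399 | 222 | 165 | 53–110 |
| ... | | | | | | | | |
| 84 | 189×189 | L1 | 3264 | 622 | 7827 | 4350 | 3261 | 1044–2174 |
| 84 | 189×189 | L4 | 1554 | 2332 | 3723 | 2070 | 1551 | 497–1034 |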
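The numeric, text and byte capacities in the table can be reproduced from the number of data codewords with a short Python sketch. It assumes a symbol that uses a single mode whose only overhead is the 4-bit mode indicator plus a 10-bit terminator (Numeric), a 6-bit terminator (Text) or a 13-bit counter (Binary), and it counts only whole three-digit groups in Numeric mode. The function name `capacities` is illustrative.

```python
def capacities(data_codewords):
    """Return (numeric digits, text characters, bytes) for a single-mode symbol."""
    bits = 8 * data_codewords
    # Numeric: 4-bit indicator + 10-bit terminator, 10 bits per 3 digits.
    numeric = 3 * ((bits - 4 - 10) // 10)
    # Text: 4-bit indicator + 6-bit terminator, 6 bits per character.
    text = (bits - 4 - 6) // 6
    # Binary: 4-bit indicator + 13-bit counter, 8 bits per byte.
    byte = (bits - 4 - 13) // 8
    return numeric, text, byte


for label, codewords in (("1-L1", 21), ("22-L4", 168), ("84-L1", 3264)):
    print(label, capacities(codewords))
```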
Encoding modes
All encoding modes can be split into the following groups:[3]: 5.3.1
Numeric mode which includes digits encoding: 0–9;
Text mode which supports the full ISO/IEC 646 character set;
Binary (Byte) mode which encodes byte values 0–255;
Chinese Characters modes, a group of 4 modes which encode Chinese characters from the GB 18030 code page (up to 1587600 code positions in the largest, 4-byte region);
GS1 mode which encodes GS1 Application Identifiers[5] data;
URI mode which encodes URI links in compact encoding.
Han Xin Code mode characteristics

| Mode | Mode indicator | Bits per character |
| --- | --- | --- |
| Numeric | 0001b | 3.3 (10 bits for three digits) |
| Text | 0010b | 6 |
| Binary Byte | 0011b | 8 |
| Common Chinese Characters in Region One | 0100b | 12 |
| Common Chinese Characters in Region Two | 0101b | 12 |
| GB18030 2-byte Region | 0110b | 15 |
| GB18030 4-byte Region | 0111b | 21 |
| ECI | 1000b | Variable (multi-byte mode) |
| Unicode | 1001b | Adaptive (lossless compression) |
| GS1 | 11100001b | Variable (Numeric + Text modes) |
| URI | 11100010b | Variable (2–7 bits per character) |
Numeric mode
The input data string in Numeric mode[3]: 5.4.4 is divided into groups of three digits (the last group may contain fewer than three), and each group is encoded in 10 bits (0000000000b – 1111100111b). The mode data is prefixed with the mode indicator 0001b and ends with a mode terminator, which also indicates the number of digits in the last group.
Han Xin Code numeric mode terminators

| Numeric characters in last group | Mode terminator |
| --- | --- |
| 1 | 1111111101b |
| 2 | 1111111110b |
| 3 | 1111111111b |

As an example, the digit sequence 12700402 is encoded as follows:
Prefix => 0001b
127 => 0001111111
004 => 0000000100
02 => 0000000010
Terminator => 1111111110b
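The same steps can be sketched in Python. The function name `encode_numeric` is illustrative; it returns the bit groups as a list so that the output can be compared with the example above.

```python
NUMERIC_TERMINATORS = {1: "1111111101", 2: "1111111110", 3: "1111111111"}


def encode_numeric(digits):
    """Return the Numeric mode bit groups: indicator, 10-bit groups, terminator."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("Numeric mode accepts only the digits 0-9")
    groups = [digits[k:k + 3] for k in range(0, len(digits), 3)]
    bits = ["0001"]  # mode indicator
    bits += [format(int(group), "010b") for group in groups]
    # The terminator also records how many digits the last group holds.
    bits.append(NUMERIC_TERMINATORS[len(groups[-1])])
    return bits


print(encode_numeric("12700402"))
```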
Text mode
Text mode encodes characters from the ISO/IEC 646 character set. Each character is represented by 6 bits.[3]: 5.4.5 The characters are divided into two subsets, the Text1 sub-mode and the Text2 sub-mode. The 6-bit value 111110b switches between the text sub-modes, and 111111b is the mode terminator. Text mode starts in the Text1 sub-mode.
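Han Xin Code Text1 sub-mode

| Character | ASCII value | Encoding value | Character | ASCII value | Encoding value | Character | ASCII value | Encoding value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 48 | 000000b | L | 76 | 010101b | g | 103 | 101010b |
| 1 | 49 | 000001b | M | 77 | 010110b | h | 104 | 101011b |
| 2 | 50 | 000010b | N | 78 | 010111b | i | 105 | 101100b |
| 3 | 51 | 000011b | O | 79 | 011000b | j | 106 | 101101b |
| 4 | 52 | 000100b | P | 80 | 011001b | k | 107 | 101110b |
| 5 | 53 | 000101b | Q | 81 | 011010b | l | 108 | 101111b |
| 6 | 54 | 000110b | R | 82 | 011011b | m | 109 | 110000b |
| 7 | 55 | 000111b | S | 83 | 011100b | n | 110 | 110001b |
| 8 | 56 | 001000b | T | 84 | 011101b | o | 111 | 110010b |
| 9 | 57 | 001001b | U | 85 | 011110b | p | 112 | 110011b |
| A | 65 | 001010b | V | 86 | 011111b | q | 113 | 110100b |
| B | 66 | 001011b | W | 87 | 100000b | r | 114 | 110101b |
| C | 67 | 001100b | X | 88 | 100001b | s | 115 | 110110b |
| D | 68 | 001101b | Y | 89 | 100010b | t | 116 | 110111b |
| E | 69 | 001110b | Z | 90 | 100011b | u | 117 | 111000b |
| F | 70 | 001111b | a | 97 | 100100b | v | 118 | 111001b |
| G | 71 | 010000b | b | 98 | 100101b | w | 119 | 111010b |
| H | 72 | 010001b | c | 99 | 100110b | x | 120 | 111011b |
| I | 73 | 010010b | d | 100 | 100111b | y | 121 | 111100b |
| J | 74 | 010011b | e | 101 | 101000b | z | 122 | 111101b |
| K | 75 | 010100b | f | 102 | 101001b | | | |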
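Han Xin Code Text2 sub-mode

| Character | ASCII value | Encoding value | Character | ASCII value | Encoding value | Character | ASCII value | Encoding value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NUL | 0 | 000000b | NAK | 21 | 010101b | . | 46 | 101010b |
| SOH | 1 | 000001b | SYN | 22 | 010110b | / | 47 | 101011b |
| STX | 2 | 000010b | ETB | 23 | 010111b | : | 58 | 101100b |
| ETX | 3 | 000011b | CAN | 24 | 011000b | ; | 59 | 101101b |
| EOT | 4 | 000100b | EM | 25 | 011001b | < | 60 | 101110b |
| ENQ | 5 | 000101b | SUB | 26 | 011010b | = | 61 | 101111b |
| ACK | 6 | 000110b | ESC | 27 | 011011b | > | 62 | 110000b |
| BEL | 7 | 000111b | SP | 32 | 011100b | ? | 63 | 110001b |
| BS | 8 | 001000b | ! | 33 | 011101b | @ | 64 | 110010b |
| HT | 9 | 001001b | " | 34 | 011110b | [ | 91 | 110011b |
| LF | 10 | 001010b | # | 35 | 011111b | \ | 92 | 110100b |
| VT | 11 | 001011b | $ | 36 | 100000b | ] | 93 | 110101b |
| FF | 12 | 001100b | % | 37 | 100001b | ^ | 94 | 110110b |
| CR | 13 | 001101b | & | 38 | 100010b | _ | 95 | 110111b |
| SO | 14 | 001110b | ' | 39 | 100011b | ` | 96 | 111000b |
| SI | 15 | 001111b | ( | 40 | 100100b | { | 123 | 111001b |
| DLE | 16 | 010000b | ) | 41 | 100101b | \| | 124 | 111010b |
| DC1 | 17 | 010001b | * | 42 | 100110b | } | 125 | 111011b |
| DC2 | 18 | 010010b | + | 43 | 100111b | ~ | 126 | 111100b |
| DC3 | 19 | 010011b | , | 44 | 101000b | DEL | 127 | 111101b |
| DC4 | 20 | 010100b | - | 45 | 101001b | | | |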
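Text mode encoding with sub-mode switching can be sketched in Python from the two tables above. The sketch assumes that the sub-mode switch is the 6-bit value 111110b and that a switch is emitted whenever the next character is missing from the current subset; the names `TEXT1`, `TEXT2` and `encode_text` are illustrative.

```python
import string

# Text1: digits, upper-case and lower-case letters map to values 0..61.
TEXT1 = {
    ch: value
    for value, ch in enumerate(
        string.digits + string.ascii_uppercase + string.ascii_lowercase
    )
}

# Text2: the remaining ISO/IEC 646 characters, in ascending ASCII order.
_TEXT2_CODES = (
    list(range(0, 28)) + list(range(32, 48)) + list(range(58, 65))
    + list(range(91, 97)) + list(range(123, 128))
)
TEXT2 = {chr(code): value for value, code in enumerate(_TEXT2_CODES)}

SWITCH = 0b111110      # assumption: 6-bit sub-mode switch
TERMINATOR = 0b111111  # mode terminator


def encode_text(text):
    """Return the Text mode bit string for text (mode indicator included)."""
    bits = ["0010"]
    current, other = TEXT1, TEXT2  # Text mode starts in Text1
    for ch in text:
        if ch not in current:
            if ch not in other:
                raise ValueError(f"{ch!r} is not an ISO/IEC 646 character")
            bits.append(format(SWITCH, "06b"))
            current, other = other, current
        bits.append(format(current[ch], "06b"))
    bits.append(format(TERMINATOR, "06b"))
    return "".join(bits)


print(encode_text("A1"))
print(encode_text("a.b"))
```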
Binary byte mode
Binary mode encodes an array of arbitrary byte values (0–255). Binary mode[3]: 5.4.6 consists of the binary mode indicator 0011b, a 13-bit binary counter and the data bytes, each converted to an 8-bit sequence. No mode terminator is required.
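A minimal Python sketch of this layout is shown below. It assumes that the 13-bit counter holds the number of data bytes; the function name `encode_binary` is illustrative.

```python
def encode_binary(data: bytes) -> str:
    """Return the Binary mode bit string: indicator, 13-bit counter, data bytes."""
    if len(data) >= 1 << 13:
        raise ValueError("the 13-bit counter cannot hold this many bytes")
    # Assumption: the counter stores the number of bytes that follow.
    counter = format(len(data), "013b")
    payload = "".join(format(byte, "08b") for byte in data)
    return "0011" + counter + payload


print(encode_binary(b"\x01\xff"))
```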
Chinese Characters modes
Chinese Characters modes are a set of four modes which encode Chinese characters from the GB 18030 code page.
Han Xin Code Chinese Characters modes

| Mode | Mode indicator | Bits | Encoding characters count | Description |
| --- | --- | --- | --- | --- |
| Common Chinese Characters in Region One mode[3]: 5.4.7 | 0100b | 12 | 4074 | Encodes characters from the GB 18030 regions in which: the first byte is in the range B0 to D7 and the second byte is in the range A1 to FE (3760 characters); the first byte is in the range A1 to A3 and the second byte is in the range A1 to FE (282 characters); the two-byte value is in the range A8A1 to A8C0 (32 characters). |
| Common Chinese Characters in Region Two mode[3]: 5.4.8 | 0101b | 12 | 3008 | Encodes characters from the GB 18030 region in which the first byte is in the range D8 to F7 and the second byte is in the range A1 to FE (3008 characters). |
| GB18030 2-byte Region mode | 0110b | 15 | 23940 | Encodes characters from the GB 18030 region in which the first byte is in the range 81 to FE and the second byte is in the range 40 to 7E or 80 to FE (23940 characters). |
| GB18030 4-byte Region mode | 0111b | 21 | 1587600 | Encodes characters from the GB 18030 region in which the first byte is in the range 81 to FE, the second byte is in the range 30 to 39, the third byte is in the range 81 to FE and the fourth byte is in the range 30 to 39 (1587600 characters). |
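The byte ranges in the table are enough to decide which Chinese Characters mode applies to a GB 18030 byte sequence. The following Python sketch only classifies a character; it does not compute the 12-, 15- or 21-bit value stored in the symbol. The function name `chinese_mode` is illustrative, and the sketch prefers the two 12-bit regions over the generic 2-byte region because they are more compact.

```python
def chinese_mode(gb: bytes) -> str:
    """Name the Chinese Characters mode that covers one GB 18030 character."""
    if len(gb) == 2:
        b1, b2 = gb
        if 0xA1 <= b2 <= 0xFE:
            in_region_one = (
                0xB0 <= b1 <= 0xD7
                or 0xA1 <= b1 <= 0xA3
                or (b1 == 0xA8 and b2 <= 0xC0)
            )
            if in_region_one:
                return "Common Chinese Characters in Region One"
            if 0xD8 <= b1 <= 0xF7:
                return "Common Chinese Characters in Region Two"
        if 0x81 <= b1 <= 0xFE and (0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE):
            return "GB18030 2-byte Region"
    elif len(gb) == 4:
        b1, b2, b3, b4 = gb
        if (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
                and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
            return "GB18030 4-byte Region"
    raise ValueError("not a 2-byte or 4-byte GB 18030 character")


# Python ships a gb18030 codec, so a character can be classified directly.
print(chinese_mode("中".encode("gb18030")))
```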
Unicode mode
Unicode mode[3]: 5.4.12 encodes the UTF-8 character set with built-in lossless compression. In Unicode mode, the input data is analysed by a self-adaptive algorithm. First, the input data is divided and combined into pre-encoding sub-sequences of 1-, 2-, 3- or 4-byte patterns; second, a run-length data compression algorithm is applied to encode each sub-sequence of the input data.
In short, Unicode mode looks for character sub-pages in which all characters of the same language (Cyrillic, Greek, French, German and so on) share the same prefix byte sequence, and it encodes only the differences from that prefix.
GS1 mode
Han Xin code GS1 mode[3]: 5.4.13 indicates that the represented data is defined by the GS1 General Specifications. GS1 mode encodes data in Numeric and Text modes. Other modes may be used, but GS1 mode must be the first mode in the symbol and the decoded data must be returned with a GS1 flag. <FNC1> (if required) must be encoded as 1111101000b in Numeric mode (a Numeric mode group encodes only three digits, so 1111101000b, the value 1000, is treated as a special character). If an <FNC1> must be inserted while the encoder is in any mode other than Numeric, that mode must be terminated and Numeric mode must be started. The GS1 mode indicator is 11100001b and the GS1 mode terminator is 11111111b.
The data in GS1 mode is split into GS1 Application Identifier chunks, which are then compacted with the most suitable modes. As an example, the following data can be encoded:
(10)123456ABC<FNC1>(240)DATA
The data is encoded in the following way:
<11100001b> <Numeric 10123456> <Text ABC> <Numeric mode selector> <1111101000b> <Numeric 240> <Text DATA> <11111111b>
URI mode
Han Xin code URI mode[3]: 5.4.14 encodes URI links compactly. The URI mode indicator is 11100010b and the URI mode terminator is 111b. URI mode can encode data in three character sets, URI-A, URI-B and URI-C,[3]: Annex M each with its own sub-mode terminator. URI mode can also encode %XX data in a special Percent-Encoding sub-mode, in which three characters are encoded in 8 bits.
Han Xin Code URI submodes

| Charset | Charset indicator |
| --- | --- |
| URI-A | 001b |
| URI-B | 010b |
| URI-C | 011b |
| Percent-Encoding | 100b |
| URI Mode Terminator | 111b |
The Percent-Encoding sub-mode encodes %XX data as 8-bit sequences and does not require a terminator. To encode %XX data in this sub-mode, the sub-mode indicator (100b) is added first, followed by an 8-bit counter (counter = length of the %XX data / 3) and then the data itself, in which each triplet such as %FF, %ff or %00 is written as a single byte (0xFF, 0xFF or 0x00).
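The Percent-Encoding layout can be sketched in Python as follows. The sketch covers only a run of %XX triplets inside URI mode (not the URI-A, URI-B or URI-C character sets), and the function name `encode_percent_run` is illustrative.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def encode_percent_run(text: str) -> str:
    """Encode a run such as '%FF%00': indicator, 8-bit counter, one byte per triplet."""
    if not text or len(text) % 3 != 0:
        raise ValueError("input must be a non-empty sequence of %XX triplets")
    triplets = [text[k:k + 3] for k in range(0, len(text), 3)]
    if len(triplets) > 255:
        raise ValueError("the 8-bit counter cannot hold this many triplets")
    values = []
    for triplet in triplets:
        if triplet[0] != "%" or not set(triplet[1:]) <= HEX_DIGITS:
            raise ValueError(f"{triplet!r} is not a %XX triplet")
        values.append(int(triplet[1:], 16))
    counter = format(len(triplets), "08b")  # counter = len(text) / 3
    return "100" + counter + "".join(format(v, "08b") for v in values)


print(encode_percent_run("%FF%00"))
```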