zmc
2023-12-22 9fdbf60165db0400c2e8e6be2dc6e88138ac719a
U
L±d9!ã@s<ddlmZmZddlmZddlmZGdd„deƒZdS)é)ÚListÚUnioné)Ú CharSetProber)Ú ProbingStatecseZdZdZdZdZddœ‡fdd„ Zddœ‡fdd    „ Zee    dœd
d „ƒZ
ee    dœd d „ƒZ e dœdd„Z e dœdd„Zedœdd„Zedœdd„Zedœdd„Zedœdd„Zeeddœdd„Zeeddœdd„Zeeefed œd!d"„Zeedœd#d$„ƒZe dœd%d&„Z‡ZS)'Ú UTF1632Proberad
    This class simply looks for occurrences of zero bytes, and infers
    whether the file is UTF-16 or UTF-32 (little-endian or big-endian).
    For instance, files looking like ( \0 \0 \0 [nonzero] )+
    have a good probability to be UTF32BE.  Files looking like ( \0 [nonzero] )+
    may be guessed to be UTF16BE, and inversely for little-endian varieties.
    ég®Gázî?N)Úreturncsntƒ ¡d|_dgd|_dgd|_tj|_ddddg|_d|_    d|_
d|_ d|_ d|_ d|_| ¡dS©NréF)ÚsuperÚ__init__ÚpositionÚ zeros_at_modÚnonzeros_at_modrÚ    DETECTINGÚ_stateÚquadÚinvalid_utf16beÚinvalid_utf16leÚinvalid_utf32beÚinvalid_utf32leÚ'first_half_surrogate_pair_detected_16beÚ'first_half_surrogate_pair_detected_16leÚreset©Úself©Ú    __class__©úLd:\z\workplace\vscode\pyvenv\venv\Lib\site-packages\chardet/utf1632prober.pyr )s
  zUTF1632Prober.__init__csftƒ ¡d|_dgd|_dgd|_tj|_d|_d|_    d|_
d|_ d|_ d|_ ddddg|_dSr
)r rrrrrrrrrrrrrrrrrr r8s
  zUTF1632Prober.resetcCs4| ¡r dS| ¡rdS| ¡r$dS| ¡r0dSdS)Nzutf-32bezutf-32lezutf-16bezutf-16lezutf-16)Úis_likely_utf32beÚis_likely_utf32leÚis_likely_utf16beÚis_likely_utf16lerrrr Ú charset_nameFszUTF1632Prober.charset_namecCsdS)NÚrrrrr ÚlanguageSszUTF1632Prober.languagecCstd|jdƒS)Nçð?g@©Úmaxrrrrr Úapprox_32bit_charsWsz UTF1632Prober.approx_32bit_charscCstd|jdƒS)Nr(g@r)rrrr Úapprox_16bit_charsZsz UTF1632Prober.approx_16bit_charscCsj| ¡}||jkoh|jd||jkoh|jd||jkoh|jd||jkoh|jd||jkoh|j S©Nrréé)r+ÚMIN_CHARS_FOR_DETECTIONrÚEXPECTED_RATIOrr©rZ approx_charsrrr r!]s
ÿþýûzUTF1632Prober.is_likely_utf32becCsj| ¡}||jkoh|jd||jkoh|jd||jkoh|jd||jkoh|jd||jkoh|j Sr-)r+r0rr1rrr2rrr r"gs
ÿþýûzUTF1632Prober.is_likely_utf32lecCsV| ¡}||jkoT|jd|jd||jkoT|jd|jd||jkoT|j S)Nrr/rr.)r,r0rr1rrr2rrr r#qs
ÿÿþûzUTF1632Prober.is_likely_utf16becCsV| ¡}||jkoT|jd|jd||jkoT|jd|jd||jkoT|j S)Nrr.rr/)r,r0rr1rrr2rrr r${s
ÿÿþûzUTF1632Prober.is_likely_utf16le)rr    cCs¨|ddksL|ddksL|ddkrR|ddkrRd|dkrHdkrRnnd|_|ddksž|ddksž|ddkr¤|ddkr¤d|dkršdkr¤nnd|_d    S)
        Validate if the quad of bytes is valid UTF-32.
 
        UTF-32 is valid in the range 0x00000000 - 0x0010FFFF
        excluding 0x0000D800 - 0x0000DFFF
 
        https://en.wikipedia.org/wiki/UTF-32
        rrééØr.éßTr/N)rr)rrrrr Úvalidate_utf32_characters…s8
 
ÿ
þ
ý
ýýý
 
ÿ
þ
ý
ýýý
z'UTF1632Prober.validate_utf32_characters)Úpairr    cCsô|jsNd|dkrdkr*nnd|_qxd|dkrBdkrxnqxd|_n*d|dkrfdkrrnnd|_nd|_|jsÆd|dkr–dkr¢nnd|_qðd|dkrºdkrðnqðd|_n*d|dkrÞdkrênnd|_nd|_d    S)
a9
        Validate if the pair of bytes is valid UTF-16.
 
        UTF-16 is valid in the range 0x0000 - 0xFFFF excluding 0xD800 - 0xDFFF
        with an exception for surrogate pairs, which must be in the range
        0xD800-0xDBFF followed by 0xDC00-0xDFFF
 
        https://en.wikipedia.org/wiki/UTF-16
        r4réÛTéÜr5FrN)rrrr)rr7rrr Úvalidate_utf16_characters›s 
z'UTF1632Prober.validate_utf16_characters)Úbyte_strr    cCsœ|D]}|jd}||j|<|dkrX| |j¡| |jdd…¡| |jdd…¡|dkrt|j|d7<n|j|d7<|jd7_q|jS)Nr r/rr.r)rrr6r:rrÚstate)rr;ÚcZmod4rrr Úfeed»s
 
 zUTF1632Prober.feedcCsF|jtjtjhkr|jS| ¡dkr.tj|_n|jdkr@tj|_|jS)Ngš™™™™™é?i)rrÚNOT_MEÚFOUND_ITÚget_confidencerrrrr r<Ês 
 
zUTF1632Prober.statecCs(| ¡s | ¡s | ¡s | ¡r$dSdS)Ng333333ë?g)r$r#r"r!rrrr rA×sþýüûøzUTF1632Prober.get_confidence) Ú__name__Ú
__module__Ú __qualname__Ú__doc__r0r1r rÚpropertyÚstrr%r'Úfloatr+r,Úboolr!r"r#r$rÚintr6r:rÚbytesÚ    bytearrayrr>r<rAÚ __classcell__rrrr rs*     
 
 
 
  rN)ÚtypingrrÚ charsetproberrÚenumsrrrrrr Ú<module>s