zmc
2023-12-22 9fdbf60165db0400c2e8e6be2dc6e88138ac719a
U
¡ý°d‚ ã@shdZddlZddlZddlmZddlmZddlmZm    Z    m
Z
ej j e ddGd    d
„d
e
e    ƒƒZ dS) zDTests to ensure that the html5lib tree builder generates good trees.éN)Ú BeautifulSoup)Ú SoupStraineré)ÚHTML5LIB_PRESENTÚHTML5TreeBuilderSmokeTestÚSoupTestz?html5lib seems not to be present, not testing its tree builder.)Úreasonc@s”eZdZdZedd„ƒZdd„Zdd„Zdd    „Zd
d „Z    d d „Z
dd„Z dd„Z dd„Z dd„Zdd„Zdd„Zdd„Zdd„Zdd„Zd d!„Zd"S)#ÚTestHTML5LibBuilderz"See ``HTML5TreeBuilderSmokeTest``.cCsddlm}|S)Nr)ÚHTML5TreeBuilder)Z bs4.builderr
)Úselfr
©r úNd:\z\workplace\vscode\pyvenv\venv\Lib\site-packages\bs4/tests/test_html5lib.pyÚdefault_builders z#TestHTML5LibBuilder.default_builderc    Csrtdƒ}d}tjdd}t|d|d}W5QRX| ¡| |¡ksHt‚|\}|jtks\t‚dt    |j
ƒksnt‚dS)NÚbz<p>A <b>bold</b> statement.</p>T)ÚrecordÚhtml5lib)Z
parse_onlyz4the html5lib tree builder doesn't support parse_only) rÚwarningsÚcatch_warningsrÚdecodeZ document_forÚAssertionErrorÚfilenameÚ__file__ÚstrÚmessage)r ZstrainerÚmarkupÚwÚsoupÚwarningr r r Útest_soupstrainersz%TestHTML5LibBuilder.test_soupstrainercCsd}| |d¡| d¡dS)z8html5lib inserts <tbody> tags where other parsers don't.z[<table id="1"><tr><td>Here's another table:<table id="2"><tr><td>foo</td></tr></table></td>z†<table id="1"><tbody><tr><td>Here's another table:<table id="2"><tbody><tr><td>foo</td></tr></tbody></table></td></tr></tbody></table>z{<table><thead><tr><td>Foo</td></tr></thead><tbody><tr><td>Bar</td></tr></tbody><tfoot><tr><td>Baz</td></tr></tfoot></table>N)Z assert_soup)r rr r r Útest_correctly_nested_tables&sþÿz0TestHTML5LibBuilder.test_correctly_nested_tablescCs$d}| |¡}d|j ¡ks t‚dS)Nzy<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
   <p>foo</p>
  </body>
</html>s
<p>foo</p>)rÚpÚencoder©r rrr r r Ú(test_xml_declaration_followed_by_doctype:s    
z<TestHTML5LibBuilder.test_xml_declaration_followed_by_doctypecCs:d}| |¡}d|j ¡ks t‚dt| d¡ƒks6t‚dS)Nz%<p><em>foo</p>
<p>bar<a></a></em></p>zD<body><p><em>foo</em></p><em>
</em><p><em>bar<a></a></em></p></body>ér ©rÚbodyrrÚlenÚfind_allr"r r r Útest_reparented_markupHs
z*TestHTML5LibBuilder.test_reparented_markupcCs:d}| |¡}d|j ¡ks t‚dt| d¡ƒks6t‚dS)Nz&<p><em>foo</p>
<p>bar<a></a></em></p>
zE<body><p><em>foo</em></p><em>
</em><p><em>bar<a></a></em></p>
</body>r$r r%r"r r r Ú+test_reparented_markup_ends_with_whitespaceOs
z?TestHTML5LibBuilder.test_reparented_markup_ends_with_whitespacecCsLd}| |¡}|jdd\}}| d¡\}}|j|ks:t‚|j|ksHt‚dS)zƒVerify that we keep the two whitespace nodes in this
        document distinct when reparenting the adjacent <tbody> tags.
        z,<table> <tbody><tbody><ims></tbody> </table>ú ©ÚstringÚtbodyN)rr(Ú next_elementr)r rrZspace1Zspace2Ztbody1Ztbody2r r r Ú<test_reparented_markup_containing_identical_whitespace_nodesUs 
zPTestHTML5LibBuilder.test_reparented_markup_containing_identical_whitespace_nodescCs^d}| |¡}|j}d|jks"t‚|jdd}|jddd}||jksLt‚||jksZt‚dS)NzF<div><a>aftermath<p><noscript>target</noscript>aftermath</a></p></div>Útargetr,Z    aftermathéÿÿÿÿ)rÚnoscriptr/rÚfindr(Zprevious_element)r rrr3r1Zfinal_aftermathr r r Ú*test_reparented_markup_containing_children`s
 z>TestHTML5LibBuilder.test_reparented_markup_containing_childrencCs$d}| |¡}t|ƒ d¡s t‚dS)z(Processing instructions become comments.s<?PITarget PIContent?>z<!--?PITarget PIContent?-->N)rrÚ
startswithrr"r r r Útest_processing_instructionps
z/TestHTML5LibBuilder.test_processing_instructioncCs8d}| |¡}| d¡\}}||ks(t‚||k    s4t‚dS)Ns<a class="my_class"><p></a>Úa)rr(r)r rrZa1Za2r r r Útest_cloned_multivalue_nodevs
 
 z/TestHTML5LibBuilder.test_cloned_multivalue_nodecCs$d}| |¡}d|j ¡ks t‚dS)Ns<table><td></tbody>Az><body>A<table><tbody><tr><td></td></tr></tbody></table></body>)rr&rrr"r r r Útest_foster_parenting}s
z)TestHTML5LibBuilder.test_foster_parentingcCsLd}| |¡}dd„|dƒDƒdd„|dƒDƒt| d¡ƒdksHt‚d    S)
        Test that extraction does not destroy the tree.
 
        https://bugs.launchpad.net/beautifulsoup/+bug/1782928
        zW
<html><head></head>
<style>
</style><script></script><body><p>hello</p></body></html>
cSsg|] }| ¡‘qSr ©Úextract©Ú.0Úsr r r Ú
<listcomp>sz7TestHTML5LibBuilder.test_extraction.<locals>.<listcomp>ÚscriptcSsg|] }| ¡‘qSr r;r=r r r r@sÚstyler rN)rr'r(rr"r r r Útest_extraction‚s
 
z#TestHTML5LibBuilder.test_extractioncCsFd}| |¡}g}| d¡D]}| | d¡¡qt|ƒdksBt‚dS)z‚
        Test that empty comment does not break structure.
 
        https://bugs.launchpad.net/beautifulsoup/+bug/1806598
        zI
<html>
<body>
<form>
<!----><input type="text">
</form>
</body>
</html>
ÚformÚinputrN)rr(Úextendr'r)r rrÚinputsrDr r r Útest_empty_comment”s     
z&TestHTML5LibBuilder.test_empty_commentcCszd}| |¡}d|jjkst‚d|jjks.t‚d|j d¡jksDt‚|j|dd}d|jjjksdt‚d|jjjksvt‚dS)Nz=
   <p>
 
<sourceline>
<b>text</b></sourceline><sourcepos></p>r$éÚ
sourcelineF)Zstore_line_numbersÚ    sourcepos)rr rJrrKr4Únamer"r r r Útest_tracking_line_numbersªs
z.TestHTML5LibBuilder.test_tracking_line_numberscCsdS)Nr )r r r r Útest_special_string_containers¸sz2TestHTML5LibBuilder.test_special_string_containersc    CsjdD]`\}}}d|}| |¡j}| ¡}d| d¡}||ksDt‚|jdd}d|}||kst‚qdS)N))z&RightArrowLeftArrow;u⇄s&rlarr;)z&models;u⊧s&models;)z&Nfr;u𝔑s&Nfr;)z&ngeqq;u≧̸s&ngeqq;)z&not;õ¬s&not;)z&Not;u⫬s&Not;)z&quot;ú"ó")z&there4;õ∴ó&there4;)z &Therefore;rRrS)z &therefore;rRrS)z&fjlig;Úfjsfj)z&sqcup;u⊔s&sqcup;)z&sqcups;u⊔︀s&sqcups;)z&apos;ú'ó')z&verbar;ú|ó|z <div>%s</div>s <div>%s</div>Úutf8Úhtml)Ú    formatter)rÚdivr!r)    r Z input_elementZoutput_unicodeZoutput_elementrr\Zwithout_elementÚexpectZ with_elementr r r Útest_html5_attributes¾s       z)TestHTML5LibBuilder.test_html5_attributesN)Ú__name__Ú
__module__Ú __qualname__Ú__doc__Úpropertyrrrr#r)r*r0r5r7r9r:rCrHrMrNr^r r r r r    s$
  r    )rbZpytestrZbs4rZ bs4.elementrÚrrrÚmarkZskipifr    r r r r Ú<module>s  þ