fume-manage-python.git

U  
¡ý°dG:ã@sdZdZdgZddlmZddlZddlZddlmZm    Z    m
Z
mZmZddl mZmZddlmZmZmZmZmZd    ZGd
ddeeZGdddeZdS) zCUse the HTMLParser library to parse HTML files that aren't too bad.ÚMITÚHTMLParserTreeBuilderé)Ú
HTMLParserN)ÚCDataÚCommentÚDeclarationÚDoctypeÚProcessingInstruction)ÚEntitySubstitutionÚ UnicodeDammit)ÚDetectsXMLParsedAsHTMLÚParserRejectedMarkupÚHTMLÚHTMLTreeBuilderÚSTRICTzhtml.parserc@s|eZdZdZdZdZddZddZdd    ZdddZ    dd dZ
ddZddZddZ ddZddZddZddZdS) ÚBeautifulSoupHTMLParserz¸A subclass of the Python standard library's HTMLParser class, which
    listens for HTMLParser events and translates them into calls
    to Beautiful Soup's tree construction API.
    ÚignoreÚreplacecOs4| d|j¡|_tj|f||g|_| ¡dS)aConstructor.
 
        :param on_duplicate_attribute: A strategy for what to do if a
            tag includes the same attribute more than once. Accepted
            values are: REPLACE (replace earlier values with later
            ones, the default), IGNORE (keep the earliest value
            encountered), or a callable. A callable must take three
            arguments: the dictionary of attributes already processed,
            the name of the duplicate attribute, and the most recent value
            encountered.
        Úon_duplicate_attributeN)ÚpopÚREPLACErrÚ__init__Úalready_closed_empty_elementZ_initialize_xml_detector)ÚselfÚargsÚkwargs©rúNd:\z\workplace\vscode\pyvenv\venv\Lib\site-packages\bs4/builder/_htmlparser.pyr.sÿ    z BeautifulSoupHTMLParser.__init__cCst|dS)N)r )rÚmessagerrrÚerrorJszBeautifulSoupHTMLParser.errorcCs|j||dd}| |¡dS)zÏHandle an incoming empty-element tag.
 
        This is only called when the markup looks like <tag/>.
 
        :param name: Name of the tag.
        :param attrs: Dictionary of the tag's attributes.
        F)Úhandle_empty_elementN)Úhandle_starttagÚ handle_endtag)rÚnameÚattrsÚtagrrrÚhandle_startendtagZsz*BeautifulSoupHTMLParser.handle_startendtagTcCsÎi}|D]`\}}|dkrd}||kr\|j}||jkr6qd|d|jfkrN|||<qd||||n|||<d}q| ¡\}    }
|jj|dd||    |
d}|r¶|jr¶|r¶|j|dd|j     |¡|j
dkrÊ| |¡dS)a3Handle an opening tag, e.g. '<tag>'
 
        :param name: Name of the tag.
        :param attrs: Dictionary of the tag's attributes.
        :param handle_empty_element: True if this tag is known to be
            an empty-element tag (i.e. there is not expected to be any
            closing tag).
        NÚz"")Ú
sourcelineÚ    sourceposF)Úcheck_already_closed)rÚIGNORErÚgetposÚsoupr!Zis_empty_elementr"rÚappendZ    _root_tagZ_root_tag_encountered)rr#r$r Z    attr_dictÚkeyÚvalueZon_dupeÚ    attrvaluer(r)r%rrrr!is6
 
 
þ
 
z'BeautifulSoupHTMLParser.handle_starttagcCs,|r||jkr|j |¡n|j |¡dS)zõHandle a closing tag, e.g. '</tag>'
        
        :param name: A tag name.
        :param check_already_closed: True if this tag is expected to
           be the closing portion of an empty-element tag,
           e.g. '<tag></tag>'.
        N)rÚremover-r")rr#r*rrrr" s    z%BeautifulSoupHTMLParser.handle_endtagcCs|j |¡dS)z4Handle some textual data that shows up between tags.N)r-Úhandle_data©rÚdatarrrr3²sz#BeautifulSoupHTMLParser.handle_datacCsê| d¡rt| d¡d}n$| d¡r8t| d¡d}nt|}d}|dkr|jjdfD]B}|sbqXzt|g |¡}WqXtk
r}zW5d}~XYqXXqX|sÔzt|}Wn&t    t
fk
rÒ}zW5d}~XYnX|pÚd}| |¡dS)z×Handle a numeric character reference by converting it to the
        corresponding Unicode character and treating it as textual
        data.
 
        :param name: Character number, possibly in hexadecimal.
        ÚxéÚXNézwindows-1252uï¿½)Ú
startswithÚintÚlstripr-Úoriginal_encodingÚ    bytearrayÚdecodeÚUnicodeDecodeErrorÚchrÚ
ValueErrorÚ OverflowErrorr3)rr#Z    real_namer5ÚencodingÚerrrÚhandle_charref¶s*
 
z&BeautifulSoupHTMLParser.handle_charrefcCs0tj |¡}|dk    r|}nd|}| |¡dS)zÈHandle a named entity reference by converting it to the
        corresponding Unicode character(s) and treating it as textual
        data.
 
        :param name: Name of the entity reference.
        Nz&%s)r
ZHTML_ENTITY_TO_CHARACTERÚgetr3)rr#Ú    characterr5rrrÚhandle_entityrefÞs
z(BeautifulSoupHTMLParser.handle_entityrefcCs&|j ¡|j |¡|j t¡dS)zOHandle an HTML comment.
 
        :param data: The text of the comment.
        N)r-ÚendDatar3rr4rrrÚhandle_commentñs
z&BeautifulSoupHTMLParser.handle_commentcCs6|j ¡|tdd}|j |¡|j t¡dS)zYHandle a DOCTYPE declaration.
 
        :param data: The text of the declaration.
        zDOCTYPE N)r-rJÚlenr3rr4rrrÚhandle_declús
z#BeautifulSoupHTMLParser.handle_declcCsN| ¡ d¡r$t}|tdd}nt}|j ¡|j |¡|j |¡dS)z{Handle a declaration of unknown type -- probably a CDATA block.
 
        :param data: The text of the declaration.
        zCDATA[N)Úupperr:rrLrr-rJr3)rr5ÚclsrrrÚunknown_decls
z$BeautifulSoupHTMLParser.unknown_declcCs0|j ¡|j |¡| |¡|j t¡dS)z\Handle a processing instruction.
 
        :param data: The text of the instruction.
        N)r-rJr3Z_document_might_be_xmlr    r4rrrÚ    handle_pis
 
z!BeautifulSoupHTMLParser.handle_piN)T)T)Ú__name__Ú
__module__Ú__qualname__Ú__doc__r+rrrr&r!r"r3rFrIrKrMrPrQrrrrr$s
7
(    
rcsNeZdZdZdZdZeZeee    gZ
dZdfdd    ZdddZ d    d
ZZS) rzpA Beautiful soup `TreeBuilder` that uses the `HTMLParser` parser,
    found in the Python standard library.
    FTNcslt}dD]}||kr
| |¡}|||<q
tt|jf||pBg}|pJi}| |¡d|d<||f|_dS)aConstructor.
 
        :param parser_args: Positional arguments to pass into 
            the BeautifulSoupHTMLParser constructor, once it's
            invoked.
        :param parser_kwargs: Keyword arguments to pass into 
            the BeautifulSoupHTMLParser constructor, once it's
            invoked.
        :param kwargs: Keyword arguments for the superclass constructor.
        )rFÚconvert_charrefsN)ÚdictrÚsuperrrÚupdateÚparser_args)rrZZ parser_kwargsrZextra_parser_kwargsÚargr0©Ú    __class__rrr*s 
 
 
zHTMLParserTreeBuilder.__init__c    cs\t|tr|dddfVdS|g}|g}||g}t|||d|d}|j|j|j|jfVdS)aÜRun any preliminary steps necessary to make incoming markup
        acceptable to the parser.
 
        :param markup: Some markup -- probably a bytestring.
        :param user_specified_encoding: The user asked to try this encoding.
        :param document_declared_encoding: The markup itself claims to be
            in this encoding.
        :param exclude_encodings: The user asked _not_ to try any of
            these encodings.
 
        :yield: A series of 4-tuples:
         (markup, encoding, declared encoding,
          has undergone character replacement)
 
         Each 4-tuple represents a strategy for converting the
         document to Unicode and parsing it. Each strategy will be tried 
         in turn.
        NFT)Úknown_definite_encodingsÚuser_encodingsZis_htmlÚexclude_encodings)Ú
isinstanceÚstrrÚmarkupr=Zdeclared_html_encodingZcontains_replacement_characters)    rrcZuser_specified_encodingZdocument_declared_encodingr`r^r_Z try_encodingsZdammitrrrÚprepare_markupCs"
ûþz$HTMLParserTreeBuilder.prepare_markupc
Csh|j\}}t||}|j|_z| |¡Wn*tk
rT}zt|W5d}~XYnX| ¡g|_dS)z{Run some incoming markup through some parsing process,
        populating the `BeautifulSoup` object in self.soup.
        N)rZrr-ÚfeedÚAssertionErrorr Úcloser)rrcrrÚparserrErrrrets
 
zHTMLParserTreeBuilder.feed)NN)NNN)rRrSrTrUZis_xmlZ    picklableÚ
HTMLPARSERÚNAMErrÚfeaturesZTRACKS_LINE_NUMBERSrrdreÚ __classcell__rrr\rrs
ÿ
1)rUÚ__license__Ú__all__Úhtml.parserrÚsysÚwarningsZbs4.elementrrrrr    Z
bs4.dammitr
rZbs4.builderrr rrrrirrrrrrÚ<module>sÿ    z